LOOK MAA I AM ON FRONT PAGE

  • SoftestSapphic@lemmy.world · 2 months ago

    Wow it’s almost like the computer scientists were saying this from the start but were shouted over by marketing teams.

  • minoscopede@lemmy.world · 2 months ago

    I see a lot of misunderstandings in the comments 🫤

    This is a pretty important finding for researchers, and it’s not obvious by any means. This finding is not showing a problem with LLMs’ abilities in general. The issue they discovered is specifically for so-called “reasoning models” that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

    Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that’s a flaw that needs to be corrected before models can actually reason.
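
    As a rough illustration of that incentive problem, here is a minimal sketch (a toy example, not the paper’s training setup; `grade_step` is a hypothetical step verifier): an outcome-only reward cannot tell a lucky guess from a sound derivation.

    ```python
    # Toy sketch: outcome-only reward vs. a process-aware reward.
    # Nothing here comes from the paper; grade_step is a hypothetical step verifier.

    def outcome_only_reward(final_answer: str, correct: str) -> float:
        """Rewards only the final answer; the intermediate 'reasoning' is never checked."""
        return 1.0 if final_answer.strip() == correct.strip() else 0.0

    def process_aware_reward(steps: list[str], final_answer: str, correct: str,
                             grade_step) -> float:
        """Also rewards intermediate steps that a verifier judges to be valid."""
        step_score = sum(grade_step(s) for s in steps) / max(len(steps), 1)
        answer_score = 1.0 if final_answer.strip() == correct.strip() else 0.0
        return 0.5 * step_score + 0.5 * answer_score

    # Under outcome_only_reward, nonsense steps followed by a lucky correct answer
    # score exactly the same as a sound derivation.
    ```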

    • Knock_Knock_Lemmy_In@lemmy.world · 2 months ago

      When given explicit instructions to follow, the models failed because they had not seen similar instructions before.

      This paper shows that there is no reasoning in LLMs at all, just extended pattern matching.

      • MangoCats@feddit.it · 2 months ago

        I’m not trained or paid to reason, I am trained and paid to follow established corporate procedures. On rare occasions my input is sought to improve those procedures, but the vast majority of my time is spent executing tasks governed by a body of (not quite complete, sometimes conflicting) procedural instructions.

        If AI can execute those procedures as well as, or better than, human employees, I doubt employers will care if it is reasoning or not.

          • MangoCats@feddit.it · 2 months ago

            Well - if you want to devolve into argument, you can argue all day long about “what is reasoning?”

            • Knock_Knock_Lemmy_In@lemmy.world · 2 months ago

              You were starting a new argument. Let’s stay on topic.

              The paper implies “reasoning” is the application of logic. It shows that LRMs are great at copying logic but can’t follow simple instructions that haven’t been seen before.

    • REDACTED@infosec.pub · 2 months ago

      What confuses me is that we seemingly keep moving the goalposts on what counts as reasoning. Not too long ago, some smart algorithms or a bunch of if/then instructions in software officially counted, by definition, as software/computer reasoning. Logically, CPUs do it all the time. Suddenly, when AI is doing that with pattern recognition, memory and even more advanced algorithms, it’s no longer reasoning? I feel like at this point the more relevant question is “What exactly is reasoning?”. Before you answer, understand that most humans seemingly live by pattern recognition, not reasoning.

      https://en.wikipedia.org/wiki/Reasoning_system
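
      For reference, the if/then “reasoning systems” that article describes can be as small as this (a made-up forward-chaining example in Python, not tied to any particular system):

      ```python
      # Minimal forward-chaining rule engine - the classic if/then "reasoning system".
      # Rules and facts are invented purely for illustration.

      rules = [
          ({"has_feathers", "lays_eggs"}, "is_bird"),
          ({"is_bird", "cannot_fly"}, "is_flightless_bird"),
      ]
      facts = {"has_feathers", "lays_eggs", "cannot_fly"}

      changed = True
      while changed:
          changed = False
          for conditions, conclusion in rules:
              if conditions <= facts and conclusion not in facts:
                  facts.add(conclusion)  # derive a new fact from existing ones
                  changed = True

      print(facts)  # now also contains "is_bird" and "is_flightless_bird"
      ```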

      • MangoCats@feddit.it · 2 months ago

        I think that as we approach the uncanny valley of machine intelligence, it’s no longer a cute cartoon but a menacing, creepy, not-quite imitation of ourselves.

  • Communist@lemmy.frozeninferno.xyz · 2 months ago

    I think it’s important to note (I’m not an LLM - I know that phrase triggers you to assume I am) that they haven’t proven this is an inherent architectural issue, which I think would be the next step for that assertion.

    Do we know that they don’t and can’t reason, or do we just know that for certain problems they jump to memorized solutions? Is it possible to create an arrangement of weights that can genuinely reason, even if the current models don’t? That’s the big question that needs answering. It’s still possible that we just haven’t properly incentivized reasoning over memorization during training.

    If someone can objectively answer “no” to that, the bubble collapses.

    • MouldyCat@feddit.uk · 2 months ago

      In case you haven’t seen it, the paper is here - https://machinelearning.apple.com/research/illusion-of-thinking (PDF linked on the left).

      The puzzles the researchers have chosen are spatial and logical reasoning puzzles - so certainly not the natural domain of LLMs. Unfortunately, the paper doesn’t give a clear definition of reasoning; I might surmise it as “analysing a scenario and extracting rules that allow you to achieve a desired outcome”.

      They also don’t provide the prompts they use - not even for the cases where they say they provide the algorithm in the prompt, which makes that aspect less convincing to me.

      What I did find noteworthy was how the models were able to provide around 100 steps correctly for larger Tower of Hanoi problems, but only 4 or 5 correct steps for larger River Crossing problems. I think the River Crossing problem is like the one where you have a boatman who wants to get a fox, a chicken and a bag of rice across a river, but can only take two in his boat at one time? In any case, the researchers suggest that this could be because there are plenty of examples of Tower of Hanoi with larger numbers of disks, while there are not so many examples of the River Crossing with far more than the typical number of items being ferried across. That would be more evidence that the LLMs (and LRMs) are merely recalling examples they’ve seen, rather than genuinely working them out.
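
      For context on those step counts: the standard recursive Tower of Hanoi solution is trivial to state but exponential in moves, so roughly 100 correct steps corresponds to only about a 6- or 7-disk instance. A generic textbook solver (not the paper’s prompts) makes this concrete:

      ```python
      # Classic recursive Tower of Hanoi solver; an n-disk instance takes 2**n - 1 moves.

      def hanoi(n, source="A", target="C", spare="B", moves=None):
          if moves is None:
              moves = []
          if n == 1:
              moves.append((source, target))
          else:
              hanoi(n - 1, source, spare, target, moves)  # park n-1 disks on the spare peg
              moves.append((source, target))              # move the largest disk
              hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top
          return moves

      for n in (3, 7, 10):
          print(n, "disks:", len(hanoi(n)), "moves")  # 7, 127 and 1023 moves
      ```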

    • El Barto@lemmy.world · 2 months ago

      LLMs deal with tokens. Essentially, they predict the next item in a series of tokens.

      Humans do much, much, much, much, much, much, much more than that.
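
      For what it’s worth, the “predict the next token” loop really is that simple at inference time; whether the mechanism amounts to reasoning is the part being argued about. A bare-bones sketch (the `model` and `tokenizer` objects are hypothetical stand-ins, not a real library API):

      ```python
      # Bare-bones greedy autoregressive generation: the model only ever answers
      # "which token comes next?". model and tokenizer are hypothetical stand-ins.

      def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
          tokens = tokenizer.encode(prompt)           # text -> token ids
          for _ in range(max_new_tokens):
              probs = model.next_token_probs(tokens)  # distribution over the vocabulary
              best = max(range(len(probs)), key=probs.__getitem__)
              tokens.append(best)                     # greedily pick the likeliest token
          return tokenizer.decode(tokens)             # token ids -> text
      ```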

        • stickly@lemmy.world · 2 months ago

          You are either vastly overestimating the Language part of an LLM or simplifying human physiology back to the Greeks’ Four Humours theory.

          • Zexks@lemmy.world · 2 months ago

            No. I’m not. You’re nothing more than a protein-based machine on a slow burn. You don’t even have control over your own decisions. This is a proven fact. You’re just an ad hoc justification machine.

  • surph_ninja@lemmy.world · 2 months ago

    You assume humans do the opposite? We literally institutionalize humans who don’t follow set patterns.

  • billwashere@lemmy.world · 2 months ago

    When are people going to realize that, in its current state, an LLM is not intelligent? It doesn’t reason. It does not have intuition. It’s a word predictor.

    • x0x7@lemmy.world · 2 months ago

      Intuition is about the only thing it has. It’s a statistical system. The problem is that it doesn’t have logic. We assume that because it’s computer-based it must be more logic-oriented, but it’s the opposite. That’s the problem: we can’t get it to do logic very well, because it basically feels out the next token by something like instinct. In particular, it doesn’t mask out or disregard irrelevant information very well when two segments are near each other in embedding space, since proximity doesn’t guarantee relevance (toy illustration at the end of this comment). So the model just weighs all of this info, relevant or irrelevant, into a weighted feeling for the next token.

      This is the core problem. People can handle both fuzzy topics and discrete topics, but we really struggle to create any system that can do both the way we can. Either we write programming logic that is purely discrete, or we build statistical systems that are fuzzy.

      Of course, this issue of masking out information that is close in embedding space but irrelevant to a logical premise is something many humans are bad at too. But high-functioning humans aren’t, and we can’t get these models to copy that ability. Too many people, sadly many on the left in particular, will treat association not just as always relevant but sometimes as equivalence: racism is associated with Nazism, Nazism is associated with patriarchy, patriarchy is historically related to the origins of capitalism, ∴ Nazism ≡ capitalism - even though National Socialism was anti-capitalist. Associative thinking removes nuance, and sadly some people think this way. They 100% can be replaced by LLMs today, because the LLM at least mimics what logic looks like better, even though it is still built on blind association. It just has more blind associations, and fine-tuned weighting for summing them, than a human does. So it can carry that appearance of logic further than a human on the associative thought train can.
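
      The toy illustration of the embedding-proximity point mentioned above (vectors are invented and tiny; real models use thousands of dimensions, but the failure mode is the same): an item that is merely close to the query in embedding space gets nearly the same weight as a genuinely relevant one, so nothing masks it out.

      ```python
      # Toy demo: cosine similarity in embedding space does not distinguish
      # "relevant" from "merely nearby". All vectors are made up.
      import math

      def cosine(u, v):
          dot = sum(a * b for a, b in zip(u, v))
          norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
          return dot / norm

      def softmax(xs):
          exps = [math.exp(x) for x in xs]
          return [e / sum(exps) for e in exps]

      query      = [0.9, 0.1, 0.0]    # what the model is currently "about"
      relevant   = [0.8, 0.2, 0.1]    # genuinely on-topic context
      irrelevant = [0.85, 0.05, 0.2]  # off-topic, but nearby in embedding space

      weights = softmax([cosine(query, relevant), cosine(query, irrelevant)])
      print(weights)  # roughly [0.50, 0.50] - the irrelevant item is weighted almost equally
      ```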

  • mfed1122@discuss.tchncs.de · 3 months ago

    This sort of thing has been published a lot for a while now, but why is it assumed that this isn’t what human reasoning consists of? Isn’t all our reasoning ultimately a form of pattern memorization? I sure feel like it is. So to me, all these studies that prove they’re “just” memorizing patterns don’t prove anything other than that, unless coupled with research on the human brain to prove we do something different.

    • amelia@feddit.org · 2 months ago

      This. Same with the discussion about consciousness. People always claim that AI is not real intelligence, but no one can ever define what real/human intelligence is. It’s like people believe in something like a human soul without admitting it.

    • technocrit@lemmy.dbzer0.com · 2 months ago

      why is it assumed that this isn’t what human reasoning consists of?

      Because science doesn’t work like that. Nobody should assume wild hypotheses without any evidence whatsoever.

      Isn’t all our reasoning ultimately a form of pattern memorization? I sure feel like it is.

      You should get a job in “AI”. smh.

      • mfed1122@discuss.tchncs.de · 2 months ago

        Sorry, I can see why my original post was confusing, but I think you’ve misunderstood me. I’m not claiming that I know the way humans reason. In fact, you and I are in total agreement that it is unscientific to assume hypotheses without evidence. This is exactly what I am saying is the mistake in the statement “AI doesn’t actually reason, it just follows patterns”. That is unscientific if we don’t know whether or not “actually reasoning” consists of following patterns, or something else. As far as I know, the jury is out on the fundamental nature of how human reasoning works. It’s my personal, subjective feeling that human reasoning works by following patterns. But I’m not saying “AI does actually reason like humans because it follows patterns like we do”. Again, I see how what I said could have come off that way. What I mean more precisely is:

        It’s not clear whether AI’s pattern-following techniques are the same as human reasoning, because we aren’t clear on how human reasoning works. My intuition tells me that humans doing pattern-following seems just as valid an initial guess as humans not doing pattern-following, so shouldn’t we have studies to back up whichever direction we lean?

        I think you and I are in agreement, we’re upholding the same principle but in different directions.