What is a good eli5 analogy for GenAI not "knowing" what they say?
I have many conversations with people about Large Language Models like ChatGPT and Copilot. The idea that "it makes convincing sentences, but it doesn't know what it's talking about" is a difficult concept to convey or wrap your head around. Because the sentences are so convincing.
Any good examples on how to explain this in simple terms?
Edit: some good answers already!
I find especially that the emotional barrier is difficult to break. If an AI says something malicious, our brain immediately jumps to "it has intent". How can we explain this away?
I am an LLM researcher at MIT, and hopefully this will help.
As others have answered, LLMs have only learned the ability to autocomplete given some input, known as the prompt. Functionally, the model is strictly predicting the probability of the next word+ (called a token), with some randomness injected so the output isn’t exactly the same every time for a given prompt.
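A minimal sketch of that "probability plus randomness" step, assuming the model has already produced a raw score for each candidate word. The scores and the helper names (softmax, sample_next_token, toy_scores) are made up for illustration; real models score tens of thousands of tokens at once.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(scores_by_token, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its probability."""
    tokens = list(scores_by_token)
    probs = softmax([scores_by_token[t] for t in tokens], temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Made-up scores for what might follow "The cat sat on the"
toy_scores = {"mat": 3.0, "sofa": 2.0, "moon": 0.5}
print(softmax(list(toy_scores.values())))
print(sample_next_token(toy_scores, temperature=0.7))
```

Run it a few times and the chosen word changes, which is the injected randomness at work. Nothing in the code checks whether the sentence is true.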
The probability of the next word comes from what was in the model’s training data, in combination with a very complex mathematical method to compute the impact of all previous words with every other previous word and with the new predicted word, called self-attention, but you can think of this like a computed relatedness factor.
This relatedness factor is very computationally expensive: its cost grows quadratically with the number of words, so models are limited by how many previous words can be used to compute relatedness. This limitation is called the Context Window. The recent breakthroughs in LLMs come from the use of very large context windows to learn the relationships of as many words as possible.
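To make the cost concrete, here is a stripped-down sketch of the relatedness computation. It scores every word vector against every other one, so n words need n × n scores. The vectors are made up, and the sketch leaves out the learned projections and masking a real model uses.

```python
import math

def attention_weights(vectors):
    """For each token, a softmax over its dot-product scores with every token."""
    dim = len(vectors[0])
    weights = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Made-up 2-dimensional vectors standing in for three words
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for row in attention_weights(toy_vectors):
    print([round(w, 2) for w in row])

# Doubling the number of tokens quadruples the number of scores
for n in (1_000, 2_000, 4_000):
    print(n, "tokens ->", n * n, "pairwise scores")
```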
This process of predicting the next word is repeated iteratively until a special stop token is generated, which tells the model to stop generating more words. So literally, the model builds entire responses one word at a time from left to right.
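The loop itself is short. In this sketch a hypothetical lookup table (TOY_MODEL) stands in for the trained model, and it only looks at the last word, whereas a real LLM conditions on the whole context. The structure is the same: sample a word, append it, repeat until the stop token.

```python
import random

STOP = "<stop>"

# Hypothetical stand-in for a trained model: for each word,
# the possible next words and their probabilities.
TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, STOP: 0.3},
    "dog": {"sat": 0.5, STOP: 0.5},
    "sat": {STOP: 1.0},
}

def generate(prompt, max_tokens=10, rng=random):
    """Append one sampled word at a time until the stop token appears."""
    words = prompt.split()
    for _ in range(max_tokens):
        # Unknown words fall through to an immediate stop
        options = TOY_MODEL.get(words[-1], {STOP: 1.0})
        next_word = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if next_word == STOP:
            break
        words.append(next_word)
    return " ".join(words)

print(generate("the"))
```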
Because all future words are predicated on the previously stated words in either the prompt or subsequent generated words, it becomes impossible to apply even the most basic logical concepts, unless all the components required are present in the prompt or have somehow serendipitously been stated by the model in its generated response.
This is also why LLMs tend to work better when you ask them to work out all the steps of a problem instead of jumping to a conclusion, and why the best models tend to rely on extremely verbose answers to give you the simple piece of information you were looking for.
From this fundamental understanding, hopefully you can now reason about the LLM's limitations in factual understanding as well. For instance, if a given fact was never mentioned in the training data, or an answer simply doesn’t exist, the model will make it up, inferring the next most likely word to create a plausible sounding statement. Essentially, the model has been faking language understanding so much that even when it has no factual basis for an answer, it can easily trick an unwitting human into believing the answer to be correct.
—-
+more specifically these words are tokens which usually contain some smaller part of a word. For instance, understand and able would be represented as two tokens that when put together would become the word understandable.
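A toy illustration of that split, assuming a tiny hand-written vocabulary (TOY_VOCAB) and a greedy longest-match rule. Real tokenizers learn their pieces from data and use more involved algorithms, so treat this only as a picture of the idea.

```python
# Hypothetical vocabulary; real tokenizers learn tens of thousands of pieces.
TOY_VOCAB = {"understand", "able", "un", "der", "stand", "a", "b", "l", "e"}

def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of a word into known pieces."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError(f"cannot tokenize {word[start:]!r}")
    return pieces

print(tokenize("understandable"))  # ['understand', 'able']
```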
I think that a good starting place to explain the concept to people would be to describe a Travesty Generator. I remember playing with one of those back in the 1980's. If you fed it a snippet of Shakespeare, what it churned out sounded remarkably like Shakespeare, even if it created brand "new" words.
The results were goofy, but fun because it still almost made sense.
The most disappointing source text I ever put in was TS Eliot. The output was just about as much rubbish as the original text.
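For anyone who wants to try one, here is a rough sketch of how such a generator can work, as far as I understand it: record which character followed every short run of characters in the source, then keep picking a follower at random. The function names and the order of 3 are my own choices, not taken from any particular 1980s program.

```python
import random
from collections import defaultdict

def build_table(text, order=3):
    """Map every run of `order` characters to the characters that followed it."""
    table = defaultdict(list)
    for i in range(len(text) - order):
        table[text[i:i + order]].append(text[i + order])
    return table

def travesty(text, length=80, order=3, rng=random):
    """Generate text by repeatedly picking a character that followed the last few."""
    table = build_table(text, order)
    out = text[:order]
    while len(out) < length:
        followers = table.get(out[-order:])
        if not followers:  # dead end: this run only appeared at the very end
            break
        out += rng.choice(followers)
    return out

sample = ("to be or not to be that is the question whether tis nobler in the mind "
          "to suffer the slings and arrows of outrageous fortune")
print(travesty(sample))
```

The output borrows the texture of the source without the program having any notion of what a word is.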
Harry Frankfurt's influential 2005 book (based on his influential 1986 essay), On Bullshit, offered a description of what bullshit is.
When we say a speaker tells the truth, that speaker says something true that they know is true.
When we say a speaker tells a lie, that speaker says something false that they know is false.
But bullshit is when the speaker says something to persuade, not caring whether the underlying statement is true or false. The goal is to persuade the listener of that underlying fact.
The current generation of AI chat bots are basically optimized for bullshit. The underlying algorithms reward the models for sounding convincing, not necessarily for being right.
I commented something similar on another post, but this is exactly why I find this phenomenon so hard to describe.
A teenager in a new group still has some understanding and has a mind. They know the meanings of many of the words that are said. Sure, some catchphrases might be new, but general topics shouldn't be too hard to follow.
This is nothing like genAI. GenAI doesn't know anything at all. It has (simplified) a list of words that somehow are connected to each other. But AI has no concept of a wheel, what round is, what rolling is, what rubber is, what an axle is. NO understanding. Just words that happened to describe all of it. For us humans it is so difficult to understand that something uses language without knowing ANY of the meaning.
How can we describe this so our brains make sense that you can have language without understanding? The Chinese Room experiment comes close, but is quite complicated to explain as well I think.
Not an ELI5, sorry. I'm an AI PhD, and I want to push back against the premises a lil bit.
Why do you assume they don't know? Like what do you mean by "know"? Are you talking about conscious subjective experience? or consistency of output? or an internal world model?
For consistency of output and internal world models, however, there is mounting evidence to suggest convergence on a shared representation of reality. Eg this paper published 2 days ago: https://arxiv.org/abs/2405.07987
The idea that these models are just stochastic parrots that only probabilistically repeat their training data isn't correct, although it is often repeated online for some reason.
A little evidence that comes to my mind is this paper showing models can understand rare English grammatical structures even if those structures are deliberately withheld during training:
https://arxiv.org/abs/2403.19827
People sometimes act like the models can only reproduce their training data, which is what I'm saying is wrong. They do generalise.
During training the models are trained to predict the next word, but after training the network is always effectively interpolating between the training examples it has memorised. But this interpolation doesn't happen in text space but in a very high dimensional abstract semantic representation space, a 'concept space'.
Now imagine that you have memorised two paragraphs that occupy two points in concept space. And then you interpolate between them. This gives you a new point, potentially unseen during training, a new concept, that is in some ways analogous to the two paragraphs you memorised, but still fundamentally different, and potentially novel.
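As a very loose picture of what "interpolating" means here, take two made-up vectors and compute a point between them. The three dimensions and the values are assumptions for illustration only. A real model works in thousands of dimensions and does not literally run this function.

```python
def interpolate(a, b, t):
    """Point a fraction t of the way from vector a to vector b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

# Made-up 3-dimensional "concept" vectors for two memorised paragraphs
paragraph_a = [1.0, 0.0, 0.5]
paragraph_b = [0.0, 1.0, 0.5]

midpoint = interpolate(paragraph_a, paragraph_b, 0.5)
print(midpoint)  # [0.5, 0.5, 0.5]
```

The midpoint matches neither input, which is the sense in which the result can be something not seen during training.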
The man takes the goat across the river first, then he returns alone and takes the boat across again. Finally, he brings the goat's friend, Mr. Cabbage, across the river.
LMAO
I asked what if the man can't swim...
In that case, the man can still take the goat across the river first. Then, he ties the goat to the boat and pulls the boat across the river using a rope, ensuring he stays on the side of the river where he can safely manage the boat without swimming.
I asked who Mr. Cabbage is...
My apologies for the confusion! Mr. Cabbage is a playful addition. In the classic riddle, there's usually no mention of Mr. Cabbage. It's just the man, the goat, and perhaps another item or animal depending on the version of the riddle. I just added Mr. Cabbage for some extra fun!
Then I asked what some other additions could be...
Certainly! You can add various characters or items to make the riddle more entertaining. Here are a few ideas:
Mr. Wolf: The man, the goat, and Mr. Wolf need to cross the river. But if the man leaves the wolf alone with the goat, the wolf will eat the goat.
Ms. Hen: The man, the goat, and Ms. Hen need to cross the river. But if the man leaves the hen alone with the goat, the goat will eat the hen.
Professor Fox: The man, the goat, and Professor Fox need to cross the river. But if the man leaves the fox alone with the goat, the fox will eat the goat.
Princess Diamond: The man, the goat, and Princess Diamond's precious gem need to cross the river. But Princess Diamond has warned that if anyone other than herself touches the gem, they'll be cursed.
Feel free to mix and match or come up with your own additions!
And the "solution"...
Ah, that's a fun twist! Let's see:
The man takes Mr. Cabbage across first, then returns alone.
Next, he takes Ms. Hen across but brings Mr. Cabbage back.
He then takes Princess Diamond's gem across, leaves it on the other side, and takes Mr. Wolf across.
The man leaves Mr. Wolf with Ms. Hen on the other side and returns alone.
He takes Princess Diamond across, but before he leaves, he swaps Mr. Cabbage with Princess Diamond's gem.
The man brings Mr. Cabbage across, leaving Princess Diamond's gem with Mr. Wolf.
Finally, he takes the goat across.
Everyone safely crosses the river without any mishaps!
This is the one I got from the house to get the kids to the park and then I can go to work and then I can go to work and get the rest of the day after that I can get it to you tomorrow morning to pick up the kids at the same time as well as well as well as well as well as well as well as well as well... I think my predictive text broke
I always thought the Chinese Room argument was kinda silly. It’s predicated on the idea that humans have some unique capacity to understand the world that can’t be replicated by a syntactic system, but there is no attempt made to actually define this capacity.
The whole argument depends on our intuition that we think and know things in a way inanimate objects don’t. In other words, it’s a tautology to draw the conclusion that computers can’t think from the premise that computers can’t think.
This is what I was going to point to. When I was in grad school, it was often referred to as the Symbol Grounding Problem. Basically it's an interdisciplinary research problem involving pragmatics, embodied cognition, and a bunch of others. The LLM people are now crashing into this research problem, and it's interesting to see how they react.
Imagine you were asked to start speaking a new language, eg Chinese. Your brain happens to work quite differently to the rest of us. You have immense capabilities for memorization and computation but not much else. You can't really learn Chinese with this kind of mind, but you have an idea that plays right into your strengths. You will listen to millions of conversations by real Chinese speakers and mimic their patterns. You make notes like "when one person says A, the most common response by the other person is B", or "most often after someone says X, they follow it up with Y". So you go into conversations with Chinese speakers and just perform these patterns. It's all just sounds to you. You don't recognize words and you can't even tell from context what's happening. If you do that well enough you are technically speaking Chinese but you will never have any intent or understanding behind what you say. That's basically LLMs.
That analogy is hard to come up with because the question of whether it even comprehends meaning requires first answering the unanswerable question of what meaning actually is and whether or not humans are also just spicy pattern predictors / autocompletes, since predicting patterns is like the whole point of evolving intelligence, being able to connect cause and effect in patterns and anticipate the future just helps with not starving. The line is far blurrier than most are willing to admit and ultimately hinges on our experience of sapience rather than being able to strictly define knowledge and meaning.
Instead it's far better to say that ML models are not sentient, they are like a very big brain that's switched off, but we can access it by stimulating it with a prompt.
Interesting thoughts! Now that I think about this, we as humans have a huge advantage by having not only language, but also sight, smell, hearing and taste. An LLM basically only has "language." We might not realize how much meaning we create through those other senses.
To add to this insight, there are many recent publications showing the dramatic improvements of adding another modality like vision to language models.
While this is my conjecture that is loosely supported by existing research, I personally believe that multimodality is the secret to understanding human intelligence.
Imagine that you have a random group of people waiting in line at your desk. You have each one read the prompt, and the response so far, and then add a word themself. Then they leave and the next person in line comes and does it.
This is why "why did you say ?" questions are nonsensical to AI. The code answering it is not the code that wrote it, and there is no communication, coordination or anything else between the different word answerers.
Ok, I like this description a lot actually, it's a very quick and effective way to explain the effects of no backtracking. A lot of the answers here are either too reductive or too technical to actually make this behavior understandable to a layman. "It just predicts the next word" is easy to forget when the thing makes it so easy to be anthropomorphized subconsciously.
Imagine making a whole chicken out of chicken-nugget goo.
It will look like a roast chicken. It will taste alarmingly like chicken. It absolutely will not be a roast chicken.
The sad thing is that humans do a hell of a lot of this, a hell of a lot of the time. Look how well a highschooler who hasn't actually read the book can churn out a book report. Flick through, soak up the flavour and texture of the thing, read the blurb on the back to see what it's about, keep in mind the bloated over-flowery language that teachers expect, and you can bullshit your way to an A.
Only problem is, you can't use the results for anything productive, which is what people try to use GenAI for.
In the sense that the "argument" is an intuition pump. As an anti ai argument it's weak - you could replace the operator in the Chinese room with an operator in an individual neuron and conclude that our brains don't know anything, either
It's all just fancy statistics. It turns words into numbers. Then it finds patterns in those numbers. When you enter a prompt, it finds numbers that are similar and spits out an answer.
You can get into vectors and back propagation and blah blah blah but essentially it's a math formula. We call it AI but it's not fundamentally different than solving 2x + 4 = 10 for x.
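A bare-bones sketch of "turn words into numbers, then find similar numbers", assuming three hand-picked vectors (TOY_EMBEDDINGS). Real models learn these vectors from data and use far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors by angle: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-picked toy vectors, not learned from anything
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def most_similar(word):
    """Return the other word whose vector points in the most similar direction."""
    others = (w for w in TOY_EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine_similarity(TOY_EMBEDDINGS[word], TOY_EMBEDDINGS[w]))

print(most_similar("cat"))  # dog
```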
After reading some of the comments and pondering this question myself, I think I may have thought of a good analogy that at least helps me (even though I know fairly well how LLMs work)
An LLM is like a car on the road. It can follow all the rules, like braking in front of a red light, turning, signaling etc. However, a car has NO understanding of any of the traffic rules it follows.
A car can even break those rules, even if its behaviour is intended (if you push the gas pedal at a red light, the car is not in the wrong because it doesn't KNOW the rules, it just acts on it).
Why this works for me is that when I give examples of human behaviour or animal behaviour, I automatically ascribe some sort of consciousness. An LLM has no consciousness (as far as I know for now). This idea is exactly what I want to convey.
If I think of a car and rules, it is obvious to me that a car has no concept of rules, but still is part of those rules somehow.
Thing is, consciousness (and any emotions, and feelings in general) is just chemicals affecting electrical signals in the brain... If an ML model such as an LLM uses parameters to affect electrical signals through its nodes, then is it on us to say it can't have consciousness, or feel happy or sad, or even pain?
Sure the inputs and outputs are different, but when you have "real" inputs it's possible that the training data for "weather = rain" is more downbeat than "weather = sun" so is it reasonable to say that the model gets depressed when it's raining?
The weightings will change, leading to a change in the electrical signals, which emulates pretty closely what happens in our heads
Doesn't that depend on your view of consciousness and if you hold the view of naturalism?
I thought science is starting to find more and more that a 100% naturalistic worldview is hard to keep up. (E: I'm no expert on this topic and the information and podcasts I listen to are probably very biased towards my own view on this. The point I'm making is that to say "we are just neurons" is more a disputed topic for debate than actual fact when you dive a little bit into neuroscience)
I guess my initial question is almost more philosophical in nature and less deterministic.
Compression algorithms can reduce most written text to about 20–25% of its original size—implying that that’s the amount of actual unique information it contains, while the rest is predictable filler.
Empirical studies have found that chimps and human infants, when looking at test patterns, will ignore patterns that are too predictable or too unpredictable—with the sweet spot for maximizing attention being patterns that are about 80% predictable.
AI researchers have found that generating new text by predicting the most likely continuation of the given input results in text that sounds monotonous and obviously robotic. Through trial and error, they found that, instead of choosing the most likely result, choosing one with around an 80% likelihood threshold produces results judged most interesting and human-like.
The point being: AI has stumbled on a method of mimicking the presence of meaning by imitating the ratio of novelty to predictability that characterizes real human thought. But we know that the actual content of that novelty is randomly chosen, rather than being a deliberate message.
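One possible reading of that 80% threshold is nucleus (top-p) sampling: keep only the most likely words until their probabilities add up to the threshold, then pick among those at random. Whether that is the exact method meant above is an assumption on my part, and the probabilities below are made up.

```python
import random

def top_p_sample(probs_by_token, p=0.8, rng=random):
    """Sample from the smallest set of top tokens whose probabilities sum to at least p."""
    ranked = sorted(probs_by_token.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break
    tokens = [t for t, _ in nucleus]
    weights = [w for _, w in nucleus]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up probabilities for the next word
toy_probs = {"mat": 0.5, "sofa": 0.35, "floor": 0.1, "moon": 0.05}
print(top_p_sample(toy_probs))
```

With these numbers only "mat" and "sofa" can ever be picked. The choice between them is a weighted coin flip, not a deliberate message.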
So there's two different things to what you are asking.
(1) They don't know what (i.e. semantically) they are talking about.
This is probably not the case, and there's very good evidence over the past year in research papers and replicated projects that transformer models do pick up world models from the training data such that they are aware and integrating things at a more conceptual level.
For example, even a small toy GPT model trained only on chess moves builds an internal structure of the whole board and tracks "my pieces" and "opponent pieces."
(2) Why do they say dumb shit that's clearly wrong and don't know.
They aren't knowledge memorizers. They are very advanced pattern extenders.
Where the answer to a question is part of the pattern they can successfully extend, they get the answer correct. But if it isn't, they confabulate an answer in a similar way to stroke patients who don't know that they don't know the answer to something and make it up as they go along. Similar to stroke patients, you can even detect when this is happening with a similar approach (ask 10x and see how consistent the answer is or if it changes each time).
They aren't memorizing the information like a database. They are building ways to extend input into output in ways that match as much information as they can be fed. In this, they are beyond exceptional. But they've been kind of shoehorned into the initial tech demo usecase of "knowledgeable chatbot" which is a less than ideal use. The fact they were even good at information recall was a surprise to most researchers.
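The "ask 10x" check is easy to sketch. Here ask is any function that sends a question to a model and returns its answer. The toy_model below is a hypothetical stand-in that answers one question reliably and guesses on everything else.

```python
import random
from collections import Counter

def consistency(ask, question, n=10):
    """Ask the same question n times; return the most common answer and its share."""
    answers = [ask(question) for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n

# Hypothetical stand-in for a real model call
def toy_model(question):
    if question == "capital of France?":
        return "Paris"
    return random.choice(["1912", "1915", "1921", "1908"])

print(consistency(toy_model, "capital of France?"))  # ('Paris', 1.0)
print(consistency(toy_model, "year the made-up treaty was signed?"))
```

A low share on the second question is the warning sign: the answer shifts from run to run because there was never a stable pattern to extend.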
I'll see if I can find that article/paper about the chess moves. That sounds interesting!
Could it be that we ascribe conceptual knowledge to an LLM while in fact it is by chance? We as humans are masters at seeing patterns that aren't there. But then again, like another commenter said, maybe the question is more about consciousness itself, and what that actually means. What it means to "understand" something.
It's not by chance - there are literally interventions where flipping a weight or vector results in the opposite behavior (like acting like a piece is in a different place, or playing well or badly no matter the previous moves).
But it's more that it seems unlikely that there's any actual 'feeling' or 'conscious' sentience/consciousness to understand beyond the model knowing what the abstracted pattern means in relation to the inputs and outputs. It probably is simulating some form of ego and self, but not actively experiencing it if it makes sense.
Part of the problem is hyperactive agency detection - the same biological bug/feature that fuels belief in the divine.
If a twig snaps, it could be nothing or someone. If it's nothing and we react as if it was someone, no biggie. If it was someone and we react as if it was nothing, potential biggie. So our brains are biased towards assuming agency where there is none, to keep us alive.
The short hand answer I’d try to give people is ‘it’s statistics’. Based on training data, there’s a certain chance of certain words being in proximity of each other. There’s no reasoning behind placement, other than whatever pattern is discernible from known situation.
All of this also touches upon an interesting topic. What it really means to understand something? Just because you know stuff and may even be able to apply it in flexible ways, does that count as understanding? I’m not a philosopher, so I don’t even know how to approach something like this.
Anyway, I think the main difference is the lack of personal experience about the real world. With LLMs, it’s all second hand knowledge. A human could memorize facts like how water circulates between rivers, lakes and clouds, and all of that information would be linked to personal experiences, which would shape the answer in many ways. An LLM doesn’t have such experiences.
Another thing would be reflecting on your experiences and knowledge. LLMs do none of that. They just speak whatever “pops in their mind”, whereas humans usually think before speaking… Well at least we are capable of doing that even though we may not always take advantage of this super power. Although, the output of an LLM can be monitored and abruptly deleted as soon as it crosses some line. It’s sort of like mimicking the thought processes you have inside your head before opening your mouth.
Example: Explain what it feels like to have an MRI taken of your head. If you haven’t actually experienced that yourself, you’ll have to rely on second hand information. In that case, the explanation will probably be a bit flimsy. Imagine you also read all the books, blog posts and Reddit comments about it, and you’re able to reconstruct a fancy explanation regardless.
This lack of experience may hurt the explanation a bit, but an LLM doesn’t have any experiences of anything in the real world. It has only second hand descriptions of all those experiences, and that will severely hurt all explanations and reasoning.
I feel like you're already not getting it and therefore giving too much credit to the LLM.
With LLMs it's not even about second hand knowledge, the concept of knowledge does not apply to LLMs at all, it's literally just about statistics, eg. what is the most likely next output after this token.
You could argue that embeddings constitute some kind of stored knowledge. But I do agree with your larger point, LLMs are getting too much credit because of the language we use to describe them
Someone once described it as T9 on steroids. It's like your mobile keyboard suggesting follow up words, just a lot more complex in size.
If an AI says something malicious, our brain immediately jumps to "it has intent". How can we explain this away?
The more you understand the underlying concept of LLMs, the more the magic fades away. LLMs are certainly cool and can be fun but the hype around them seems very artificial and they're certainly not what I'd describe as "AI". To me, an AI would be something that actually has some form of consciousness, something that actually can form its own thoughts and learn things on its own through observation or experimentation. LLMs can't do any of those things. They're static and always wait for your input to even do anything. For text generation you can even just regenerate an answer to the same previous text and the replies can and will vary greatly. If they say something mean or malicious, it's simply because it is based on whatever they were trained on and whatever parameters they are following (like if you told them to roleplay a mean person).
Have you played that game where everyone writes a subject and puts it on a stack of paper, then everyone puts a verb on a different stack, then everyone puts an object on a third stack, and you can even add a place or whatever on the next stack? You end up with fun sentences like "A cat eats Kevin's brain on the beach." It's the kind of stuff (pre-)teens do to have a good laugh.
ChatGPT somehow works the same way, except that instead of having 10 papers in 5 stacks, it has millions of papers in thousands of stacks, and depending on the "context" it chooses which stack it draws a paper from. (To take an ELI5 analogy.)
I think what makes it hard to wrap your head around is that sometimes, this text is emotionally charged.
What I notice is that it's especially hard if an AI "goes rogue" and starts saying sinister and malicious things. Our brain immediately jumps to "it has bad intent" when in reality it's just taking some reddit posts where it happened to connect some troll messages or extremist texts.
How can we decouple emotionally when it feels so real to us?
It's like your 5 year old daughter, relaying to you what she made of something she heard earlier.
That's my analogy. ChatGPT kind of has the intellect and ability to differentiate between facts and fiction of a 5 year old. But it combines that with the writing style of a 40 year old with an uncanny love of mixing adjectives and sounding condescending.
The way I've explained it before is that it's like the autocomplete on your phone. Your phone doesn't know what you're going to write, but it can predict that after word A, it is likely word B will appear, so it suggests it. LLMs are just the same as that, but much more powerful and trained on the writing of thousands of people. The LLM predicts that after prompt X the most likely set of characters to follow it is set Y. No comprehension required, just prediction based on previous data.
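A phone-style autocomplete fits in a few lines if you only count which word tends to follow which. The training text (history) and the function names are made up for this sketch, and actual phone keyboards are more sophisticated than this.

```python
from collections import Counter, defaultdict

def train(text):
    """Count which word follows which in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def suggest(follows, word, k=3):
    """Return up to k words seen most often after `word`."""
    return [w for w, _ in follows[word.lower()].most_common(k)]

# Made-up typing history
history = "i am on my way home . i am on the train . i am running late"
model = train(history)
print(suggest(model, "am"))  # ['on', 'running']
```

The suggestions come purely from counts. Nothing in the table says what a train is or where home might be.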
it's a spicy autocomplete. it doesn't know anything, does not understand anything, it does not reason, and it won't stop until your boss thinks it's good enough at your job for "restructuring" (it's not). any illusion of knowledge comes from the fact that its source material mostly is factual. when you're drifting off into niche topics or something that was missing out of training data entirely, spicy autocomplete does what it does best, it makes shit up. some people call this hallucination, but it's closer to making shit up confidently while not knowing any better. humans do that too, but at least they know when they do that
Hmm, now that I read this, I have a thought: it might also be hard to wrap our heads around this issue because we all talk about AI as if it is an entity. Even the sentence "it makes shit up" gives AI some kind of credit that it "thinks" about things. It doesn't make shit up, it is doing exactly what it is programmed to do: create good sentences. It succeeds.
Maybe the answer is just to stop talking about AI's as "saying" things, and start talking about GenAI as "generating sentences"? That way, we emotionally distance ourselves from "it" and it's more difficult to ascribe consciousness to an AI.
If you think they would understand the Chinese room experiment.
Imagine the computer playing mad libs with itself and it picks the least funny answers to present.
Imagine you tore every page out of every book in the library (about the things you mentioned), shuffled them, and then tried to hand out the first page that mostly makes sense after the last page given. Now think about that with just letters.
Demonstration of its capacity to make mistakes, esp continuity errors.
It’s trained on an immense amount of data, and in that data, 89% of the time when someone asks “what is the answer to the ultimate question of life, the universe, everything?” the reply is “42”, with an explanation that it’s a reference to Douglas Adams’s Hitchhiker’s Guide to the Galaxy.
So, when you ask that… it just replies 42, and gives a mash up of information mostly consistent with the pop culture reference.
It has no idea what “42” is, whether it’s a real question or real answer, or entirely a joke. Only that’s how people in its training data responded.
(In this example, 11% of people are either idiots who’ve never read the book- losers- or people who are making some other random quip.)
I think a good example would be finding similar prompts that reliably give contradictory information.
It's sort of like auto pilot. It just believes everything and follows everything as if they're instructions. Prompt injection and jail breaking are examples of this. It's almost exactly like the trope where you trick an AI into realizing it's had a contradiction and it explodes.
Like parrots, LLMs learn to imitate language (only, unlike parrots, it's done in a learning mode, not from mere exposure, and it's billions or even trillions of examples) without ever understanding its primary meaning, much less secondary more subtle meanings (such as how a person's certainty and formal education shapes their choice of words used for a subject).
As we humans tend to see patterns in everything even when they're not there (like spotting a train in the clouds or a Christ in a burnt toast), when confronted with the parroted output from an LLM we tend to "spot" subtle patterns and from them conclude characteristics of the writer of those words as we would if the writer was human.
Subconsciously we're using a cognitive process meant to derive conclusions about other humans from their words, and applying it to words from non-humans, and of course out of such a process you only ever get human characteristics, so this shortcut yields human characteristics for non-humans - in logical terms it's as if we're going "assuming this is from a human, here are the human characteristics of the writer of these words", only because it's all subconscious we don't spot that we're upfront presuming humanity to conclude the presence of human traits, i.e. circular logic.
This kind of natural human cognitive shortcut is commonly and purposefully taken advantage of by all good scammers, including politicians and propagandists, to lead people into reaching specific conclusions, since we're much more wedded to conclusions we (think we) reached ourselves than to those others told us about.