
ChatGPT's hallucination problem is getting worse according to OpenAI's own tests and nobody understands why

www.pcgamer.com


42 comments
  • Who knew the self-reinforcing Markov chain would do this?

    Literally anyone with a brain in compsci 5 years ago? Nah that's "hallucinations".

    • I don't know what a Markov chain is, nor do I know dick about computer science, so correct me if I'm wrong, but I'm assuming the AI is picking up its own shitty data and regurgitating it, but worse? And this just keeps happening over and over again?

      The more I think about this, the more it seems like the AI is just going to destroy the internet completely, huh?

      • A Markov chain predicts the next state based only on the current state. If today is sunny, how likely is it that tomorrow will be rainy? Mathematically, this can be reduced to a Markov chain (so we don't have to take into account the season, weather patterns, or anything like that for this example).

        But a Markov chain isn't just saying how likely it is to be rainy on a given day; it's saying how likely it is to be rainy tomorrow based on today. If today is sunny, there's a, let's say, 70% chance that tomorrow will be rainy. If today is rainy, there's a 40% chance that tomorrow will be rainy (and conversely a 60% chance that tomorrow will be sunny, because the probabilities of the possible next states must always add up to 100%).

        Autocorrect works similarly. It predicts the next word based on the current word you've typed out. LLMs are kinda glorified Markov chains because they also predict words (well, tokens, which are about 3 to 4 characters each), but they do it over a much larger "current state": the chat history, custom instructions if you gave any on ChatGPT, etc. The context that gets passed along with your prompt is itself a bunch of tokens, and the AI generates one token at a time until, little by little, it has formed the full response that it outputs.

        Viewed as a Markov chain, an LLM works like this: if I give it the sentence "Hello! My name is", for example, it will predict which token is most likely to follow and output it. We can assume this should be a name, but truthfully we don't know the exact probabilities of the next state. If I give it "Hello, my name is" instead, changing just a couple of characters might also change the prediction weighting. I say "might" because AI is a black box and we don't really see what happens when the data passes through the neurons. (There's a tiny Python sketch of this idea at the end of this comment.)

        However, if you send that sentence to ChatGPT, it will correctly tell you that your message got cut off and ask you to finish it. They do some post-training fine-tuning to get it to do that. Compare to DeepSeek without the reasoning model.
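
        If it helps, here's a minimal Python sketch of the toy weather chain above, plus the same machinery pointed at words. All the probabilities are the made-up ones from this comment, not anything real:

          import random

          # Transition table: P(tomorrow | today). Each row of probabilities sums to 1.
          weather = {
              "sunny": {"sunny": 0.3, "rainy": 0.7},  # 70% chance of rain after a sunny day
              "rainy": {"sunny": 0.6, "rainy": 0.4},  # 40% chance of rain after a rainy day
          }

          def next_state(table, current):
              # Sample the next state using only the current state; that's the whole "Markov" part.
              options = list(table[current].keys())
              probs = list(table[current].values())
              return random.choices(options, weights=probs)[0]

          # Simulate a week of weather, one day at a time.
          day = "sunny"
          week = [day]
          for _ in range(6):
              day = next_state(weather, day)
              week.append(day)
          print(week)

          # The exact same function works on text: treat the prompt as the current state
          # and sample the next token. (Made-up numbers; a real LLM conditions on the whole
          # context with learned weights instead of a hand-written table.)
          prompt_table = {"Hello! My name is": {"Alice": 0.4, "Bob": 0.35, "Sam": 0.25}}
          print(next_state(prompt_table, "Hello! My name is"))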

      • Basically, you're correct. All the rest below is just me blabbin'.

        I've read a bit about Markov chains a few times, and we talked about them in a few of my comp sci classes, but I'm not an expert or anything. The basic explanation of them is that it's a function that returns the probability of which state should occur next. A fancy way of asking "what are the odds of X, Y, or Z happening?".

        The classic example is you feed the chain something like a book, or all the books you have. Then you can ask what's likely to come after a word, and it will either spit out a single word or give you a list with the likelihood of every word it "knows". The word-suggestion algorithms in texting apps and the like most likely used them.

        You can train them on some sample data, then "customize" them so they "learn" from what the end user types. That's why my phone likes to suggest one person's name after I type "hi": because I've typed "hi x" a lot. You can also have them "look back" more than one word; classic chains only look at the most recent input. Those longer-look-back chains are much more complex, and the time, space, and energy required to compute them blow up, so in general the look-back has only ever extended a few dozen words. (There's a rough sketch of this, and of the self-feeding thing mentioned below, at the end of this comment.)

        All of that might sound familiar if you've heard much about how LLMs work. Because (in my opinion) it's the same. I haven't seen a compelling argument that LLMs/machine learning aren't reducible to Markov chains yet. (In the same way all computers are reducible to Turing machines. Both machines suffer the same limits; one just hides some of the math so a normal human can use them.)

        That isn't to say LLMs aren't more powerful in practice (they can look back a few hundred words fairly OK), but they suffer the same limits Markov chains inherently do. I.e.: the output is only as good as the inputs; deciding the output is subjective (because you can choose the most common option, or pick randomly, or ...); and they fundamentally don't "know" anything beyond their state machine. They can't verify. They can't research. As of right now, neither system can even look at its original inputs. (And if they included those inputs, that'd take up a few terabytes of data.) All those books, text messages, Reddit posts, all reduced to relations between words represented by probabilities.

        I swear feeding a Markov's output back into itself was discussed in one of my classes, and the professor was like, yeah, it improves it a bit, but if you do it more than X amount then the whole thing falls apart. This was before 2020, in undergrad classes. People 100% said the same about LLMs before they started posting online, and now the Silicon Valley dweebs are hitting the same problem. I swear tech bros, besides being the most hiterite prone, love to recreate disasters. Same happened with "crypto" shit and basic financial schemes.

        TLDR: Fuck "AI". My AI class sucked because it was in Java, but at least we were making a silly game AI (specific game withheld for privacy, but it was 20+ years old). If they're not talking about minimaxing and search trees and master agents, and are instead pushing this crap, comp sci is overcooked.
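
        Since I'm blabbing anyway, here's a rough Python sketch of both of those ideas: a word-level chain with an adjustable look-back, then feeding its own output back in as training data. The toy corpus and numbers are just for illustration, not the experiment from that class:

          import random
          from collections import Counter, defaultdict

          def train(words, order=2):
              # Count which word follows each window of `order` words.
              model = defaultdict(Counter)
              for i in range(len(words) - order):
                  model[tuple(words[i:i + order])][words[i + order]] += 1
              return model

          def generate(model, order, length=50):
              # Walk the chain: look back `order` words, sample the next one.
              out = list(random.choice(list(model)))
              while len(out) < length:
                  counts = model.get(tuple(out[-order:]))
                  if not counts:  # dead end, restart from a random state
                      out.extend(random.choice(list(model)))
                      continue
                  options, weights = list(counts), list(counts.values())
                  out.append(random.choices(options, weights=weights)[0])
              return out

          corpus = ("the cat sat on the mat and the dog sat on the rug "
                    "while the cat watched the dog chew on the old mat").split()

          # Retrain the chain on its own output a few times. Any word it happens not to
          # emit in a round is gone for good, so the variety can only hold steady or shrink.
          words = corpus
          for round_num in range(5):
              model = train(words, order=2)
              words = generate(model, order=2)
              print(f"round {round_num}: {len(set(words))} distinct words left")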

        • It is not true that a pretrained transformer is reducible to a Markov chain in the same way that a computation is reducible to a Turing machine. While any computation could be achieved by a Turing machine (given arbitrary time), not every pretrained transformer could be described by a (sufficiently complicated) Markov chain of arbitrary length.

          One reason is the attention mechanism, which allows a transformer to weight some tokens in a sequence differently than others (there's a bare-bones sketch of what that looks like at the end of this comment).

          But the biggest difference is that a Markov chain can only consider an entire state at once and produce the next state sequentially, while transformers can consider many parts of a sequence as individual states. Transformers are also capable of autoregression (they're basically just RNNs + attention heads), while Markov chains are not: in a chain, each new state transforms the old state rather than simply being appended to it.

          Now, if you take the attention mechanism out of the transformer, you're basically just left with an RNN, and if you then take the autoregression out of the RNN, you've got a Markov chain in the limit, probably. So you could argue a Markov chain is in the limit of an LLM (kind of obvious, since you should expect an LLM trained on a Markov chain to predict it totally), but you could never argue an LLM can be produced in the limit of a Markov chain (you can train any arbitrarily large Markov chain and it will never get anywhere close to an LLM).
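
          In case "weight some tokens differently than others" sounds hand-wavy, here's a bare-bones scaled dot-product self-attention in Python/NumPy. The random vectors stand in for token embeddings; a real transformer adds learned projection matrices, multiple heads, and many stacked layers on top of this:

            import numpy as np

            def self_attention(x):
                # Scaled dot-product attention with Q = K = V = x (no learned projections here).
                d = x.shape[-1]
                scores = x @ x.T / np.sqrt(d)                   # how much each token attends to every other token
                scores -= scores.max(axis=-1, keepdims=True)    # subtract the row max for numerical stability
                weights = np.exp(scores)
                weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row of weights sums to 1
                return weights @ x, weights                     # each output mixes the whole sequence

            rng = np.random.default_rng(0)
            tokens = rng.normal(size=(5, 8))   # 5 "tokens", each an 8-dimensional vector
            out, attn = self_attention(tokens)
            # Each row shows how strongly one position weights all 5 positions, not just the previous one,
            # and those weights are computed from the content of the sequence, not read off a fixed table.
            print(np.round(attn, 2))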

      • It happened way fucking quicker than I expected
