Sabir_Hajjar [comrade/them] @Sabir_Hajjar@hexbear.net

  • It is not true that a pretrained transformer is reducible to a Markov chain in the same way that a computation is reducible to a Turing machine. While any computation can be carried out by a Turing machine (given arbitrary time), not every pretrained transformer can be described by a (sufficiently complicated) Markov chain of arbitrary order.

    One reason is the attention mechanism, which allows a transformer to weight some tokens in a sequence differently from others.
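
    For a concrete picture, here's a rough NumPy sketch of scaled dot-product self-attention (toy sizes, purely illustrative names and numbers, not pulled from any real model) - each row of the weight matrix ends up as a different distribution over the tokens:

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X):
        # scores[i, j] = how strongly token i attends to token j
        scores = X @ X.T / np.sqrt(X.shape[-1])
        weights = softmax(scores, axis=-1)
        return weights @ X, weights

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))      # 5 tokens, 8-dim embeddings (toy numbers)
    out, w = self_attention(X)       # Q = K = V = X for simplicity
    print(w.round(2))                # each row: a different weighting over the 5 tokens
    ```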

    But the biggest difference is that a Markov chain can only consider its entire current state at once and produce the next state from it, while a transformer can attend to many parts of a sequence as individual states. Transformers are also autoregressive (they're basically RNNs plus attention heads), while Markov chains are not: each new state replaces the old one instead of being appended to a growing context.
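
    To make the contrast concrete, here's a toy sketch (made-up words and probabilities, just for illustration): the Markov step sees only the current state, while the autoregressive step sees the whole prefix, so two prefixes ending in the same word can still get different predictions.

    ```python
    import random

    # Markov chain: P(next | current) is a fixed lookup on the current state alone.
    transitions = {"the": {"cat": 0.6, "dog": 0.4},
                   "cat": {"sat": 1.0},
                   "dog": {"ran": 1.0}}

    def markov_step(current):
        nxt = transitions[current]
        return random.choices(list(nxt), weights=list(nxt.values()))[0]

    # Stand-in "model": a lookup keyed on the *entire* prefix, so prefixes that
    # end in the same word can still disagree about what comes next.
    prefix_model = {("the", "cat"): "sat", ("a", "cat"): "purred"}

    def autoregressive_step(prefix):
        return prefix_model[tuple(prefix)]

    print(markov_step("cat"))                   # always "sat": only "cat" is visible
    print(autoregressive_step(["the", "cat"]))  # "sat"
    print(autoregressive_step(["a", "cat"]))    # "purred": same last token, different prediction
    ```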

    Now, if you take the attention mechanism out of a transformer, you're basically left with an RNN, and if you then take the autoregression out of the RNN, you've got something like a Markov chain in the limit, probably. So you could argue that a Markov chain is in the limit of an LLM (kind of obvious, since you should expect an LLM trained on the output of a Markov chain to predict it perfectly), but you could never argue that an LLM can be produced in the limit of a Markov chain - you can make the Markov chain arbitrarily large and train it on as much data as you like, and it will never get anywhere close to an LLM.
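
    A sketch of that degenerate end of the spectrum (toy corpus, illustrative only): "training" a first-order Markov chain is just counting bigrams and normalizing, and making it bigger only means counting longer n-grams over a larger state space - nothing in that procedure ever gives you attention.

    ```python
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the dog sat on the rug".split()

    # Count transitions between adjacent tokens.
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    # Maximum-likelihood transition probabilities P(next | prev).
    chain = {prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
             for prev, nxt in counts.items()}

    print(chain["the"])   # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
    ```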