I assume they all crib from the same training sets, but surely one of the billion-dollar companies behind them can make their own?

  • T156@lemmy.world · 17 points · 18 hours ago

    It’s something of a law of averages. At its core, an LLM is a sophisticated text-prediction algorithm: it boils the entire corpus of human language down into numeric tokens, averages over their statistics, and builds entire sentences by repeatedly determining the most likely next word to fill the space.

    Given enough data, and an LLM needs a tremendous amount of it, patterns start to emerge, and many of those same patterns end up surfacing across the different LLMs we see.
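
    For a concrete sense of what “predicting the most likely next word” means, here’s a minimal sketch using a toy bigram model. Real LLMs use learned neural networks over subword tokens rather than raw word counts, and the corpus and names here are made up for illustration, but the generation loop has the same shape.

    ```python
    from collections import Counter, defaultdict

    # Toy bigram "language model": the next token is predicted as the
    # most frequent follower observed in the training text.
    corpus = "the cat sat on the mat and the cat slept".split()

    follower_counts = defaultdict(Counter)
    for current, nxt in zip(corpus, corpus[1:]):
        follower_counts[current][nxt] += 1

    def predict_next(token: str) -> str:
        # Pick the statistically most likely next token.
        return follower_counts[token].most_common(1)[0][0]

    # Build a sentence by repeatedly appending the most likely next word.
    text = ["the"]
    for _ in range(4):
        text.append(predict_next(text[-1]))
    print(" ".join(text))  # -> "the cat sat on the"
    ```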

    • kromem@lemmy.world · 1 point · 9 minutes ago

      It’s more like they’re sophisticated world-modeling programs: they build a world model (or an approximate “bag of heuristics”) of the state of the provided context and the kind of environment that produced it, and then synthesize that world model into extending the context one token at a time.

      But the models have been found to be predicting further than one token ahead, and they have all sorts of wild internal mechanisms for modeling the text context, like building full board states to predict board-game moves in Othello-GPT, or the number-comparison helices in Haiku 3.5.
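
      The Othello-GPT finding, for what it’s worth, came out of linear probing: training a small linear classifier to read the board state back out of the model’s hidden activations. Here’s a rough sketch of that technique, with made-up dimensions and synthetic stand-in data rather than the original code:

      ```python
      import torch
      import torch.nn as nn

      # Sketch of linear probing (illustrative; not the Othello-GPT code).
      # The idea: if a single linear map can recover the full board from
      # hidden activations, the model must be representing the board
      # internally, even though it was only trained to predict moves.
      HIDDEN_DIM = 512  # assumed transformer hidden size
      N_SQUARES = 64    # Othello board squares
      N_STATES = 3      # empty / mine / yours

      probe = nn.Linear(HIDDEN_DIM, N_SQUARES * N_STATES)
      optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
      loss_fn = nn.CrossEntropyLoss()

      # Stand-in data: activations captured mid-game, paired with the
      # true board state at that point (random here for the sketch).
      hidden_states = torch.randn(256, HIDDEN_DIM)
      board_labels = torch.randint(0, N_STATES, (256, N_SQUARES))

      for _ in range(100):
          logits = probe(hidden_states).view(-1, N_SQUARES, N_STATES)
          loss = loss_fn(logits.permute(0, 2, 1), board_labels)
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()

      # High accuracy on held-out games would indicate the board state
      # is linearly decodable from the activations.
      ```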

      The popular reductive “next token” rhetoric is pretty outdated at this point. It’s kind of like saying that all a calculator does is take the numbers correlated with button presses and display different numbers on a screen. While technically correct, that glosses over a lot of important complexity between the two steps, and the omission makes the explanation misleading overall.