

You’re correct that a collection of deterministic elements will produce a deterministic result.
LLMs produce a probability distribution over next tokens and then randomly sample one of them. That’s where the non-determinism enters the system. Even if you set the temperature to 0 you’re going to get some randomness. Floating-point arithmetic on a GPU isn’t associative, and the order of parallel operations can vary from run to run, so two logits that should be distinct can round to the same floating-point value or even swap ranks between runs. When that happens, which token gets selected is effectively a hardware-level coin toss.
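To make the mechanics concrete, here’s a minimal sketch of temperature sampling in Python with numpy. The logit values are made up for illustration; real models produce one logit per vocabulary token:

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Sample a token index from a vector of logits at a given temperature."""
    logits = np.asarray(logits, dtype=np.float64)
    rng = rng or np.random.default_rng()
    if temperature == 0:
        # Greedy decoding: take the highest logit. Ties are broken by
        # whatever order argmax scans in -- nothing meaningful.
        return int(np.argmax(logits))
    # Softmax with temperature: lower temperature sharpens the distribution.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

# Two values that are distinct as float64 but collapse to the same
# float32 once rounded -- the kind of tie described above.
a = np.float32(1.00000001)
b = np.float32(1.00000002)
print(a == b)  # True: both round to the same float32
```

With `temperature=0` this degenerates to argmax, so any run-to-run variation has to come from the logits themselves shifting under the floating-point effects described above.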
You can test this empirically. Set the temperature to 0 and ask it, “give me a random number”. You’ll rarely get the same number twice in a row, no matter how similar you try to make the starting conditions.
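If you want to run that experiment yourself, here’s a rough sketch assuming the OpenAI Python SDK with an `OPENAI_API_KEY` in your environment; the model name is just a placeholder for whatever you have access to:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for _ in range(5):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute your own
        messages=[{"role": "user", "content": "give me a random number"}],
        temperature=0,
    )
    print(resp.choices[0].message.content)
```

Even with the temperature pinned to 0 and an identical prompt every time, the printed numbers typically vary across runs.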
How could anyone know this?
Is there some test of understanding that humans can pass and AIs can’t? And if there are humans who can’t pass it, do we consider them unintelligent?
We don’t even need to set the bar that high. Is there some definition of “understanding” that humans meet and AIs don’t?