I say, a double metaphor is always a beauty. And if it turns out to be a twin - even more so. -- Man, what the heck are you on about? -- Well, hear this and see for yourself.
Every toddler knows that LLMs have a tunable parameter called "temperature". Crank it up, and the bot starts mumbling. Temperature has a long history in computing. The concept is borrowed from physics: a particle moves according to a field or a flow and should eventually settle down, though perhaps very slowly (who said 'glass'?). Mostly the particle proceeds steadily towards its final position, but a high temperature allows it occasionally to go against the pull and change its trajectory.
Frankly, I do not understand enough physics to explain this model properly, but some 70 years ago someone carried this seed from physics into the science of algorithms, where it grew into a sizeable garden of search and sampling algorithms, and much more.
How do you find the minimum of a function (e.g., the answer with the smallest error)? Think of the graph of your function as a surface. Drop a ball on it somewhere and let it roll under gravity. It will settle at a point with the locally lowest value. But if you allow your ball to climb upward sometimes, you may discover a much deeper valley that lies just across the hill. How often these random moves against gravity happen is controlled by the temperature. One famous algorithm, simulated annealing, starts with a high temperature to enable hectic initial exploration, then cools down to converge to a minimum - hopefully a deep one.
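Here is a minimal sketch of that recipe in Python. The bumpy function f, the step size, and the geometric cooling schedule are my own illustrative choices, not part of any canonical version:

```python
import math
import random

def f(x):
    # A bumpy 1-D landscape with several valleys (an arbitrary example).
    return x * x + 10 * math.sin(x)

def anneal(f, x0, t0=10.0, cooling=0.995, steps=10_000):
    """Simulated annealing: accept an uphill move with probability exp(-delta/T)."""
    x, t = x0, t0
    best_x, best_val = x, f(x)
    for _ in range(steps):
        candidate = x + random.uniform(-1.0, 1.0)   # a random nearby point
        delta = f(candidate) - f(x)
        # Downhill moves are always taken; uphill ones only sometimes,
        # and the hotter it is, the more often.
        if delta < 0 or random.random() < math.exp(-delta / t):
            x = candidate
            if f(x) < best_val:
                best_x, best_val = x, f(x)
        t *= cooling    # cool down: ever fewer moves against gravity
    return best_x, best_val

print(anneal(f, x0=8.0))  # usually lands near the deepest valley, x ≈ -1.3
```

Started at x0 = 8.0, a purely greedy ball would get stuck in the shallow valley near x ≈ 3.8; the hot early phase lets it hop over the hill into the deeper one.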
Back to LLMs. Every toddler knows that an LLM acts by computing the next word at each step. But the toddler's younger sibling also knows that it actually computes a distribution, that is, a probability for each possible word. At temperature zero it would always choose the one with the maximal probability (i.e., minimal error). When the temperature is high, it may occasionally choose a less probable word instead; the higher the temperature, the more often less probable words appear, and the model becomes non-deterministic and rambling. So here is the first prong of the metaphor: high temperature means jolting motion, fluctuations and randomness; low temperature - calm, order and determinism. Coming directly from physics, the best of sciences.
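For concreteness, a toy sketch of the mechanism in Python - the four-word vocabulary and the next-word scores ("logits") are invented for illustration. Dividing the scores by the temperature before the softmax sharpens the distribution when T is small and flattens it when T is large:

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick an index from softmax(logits / T): small T ≈ argmax, large T ≈ uniform."""
    if temperature <= 0:
        # The degenerate, fully deterministic case: always the top word.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

# Invented vocabulary and invented next-word scores, purely for illustration.
vocab = ["cat", "dog", "stochastic", "marmalade"]
logits = [2.0, 1.5, 0.3, -1.0]
for t in (0.2, 1.0, 2.0):
    picks = [vocab[sample_with_temperature(logits, t)] for _ in range(10)]
    print(t, picks)   # low T: almost always "cat"; high T: more variety
```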
The complementary metaphoric branch is, of course, human body temperature. Look at J. Doe, a reasonable citizen of sane and predictable behaviour. Let her run a fever - and she starts hallucinating, making unexpected physical and linguistic turns and jerks, becoming irritable and random. Cool her down below 32 degrees Celsius - and no nonsense will ever come of her.
So it seems that two distinct senses of temperature both give us a useful analogy. But are they really that different? The toddler's nanny is well aware that our enzymes are tuned to work within a small interval around normal body temperature. Turn up the heat and you get protein conformation changes, unstable or undesired rates of biochemical reactions, shifts in half-lives, etc. In particular, neurons fire in unexpected directions, well-established connections falter, and new ones form too easily. In the opposite direction, turn it cold, and nothing wants to move anymore: reactions shut down, rigidity creeps in.
Suddenly we see that these two stories are not unrelated parables from different realms of wisdom, but manifestations of one fundamental principle. We also see how little we understand about how that principle acts on ourselves: I've met no clear and comprehensive account of the adventures of our body and mind under high fever (links, anyone?).
What to make of it? Hell if I know. Once upon a time, a person's disposition was explained by the unique blend of their bodily humours. Maybe since then we've gotten too carried away imagining our psyche as a high-level programme run by an interpreter, inside a sandbox, on a virtual machine, under an operating system, while the physics and chemistry of our body are only the hardware. Maybe the levels of abstraction are not that cleanly separated. Just think of it.