Large Language Models (LLMs) are increasingly used in autonomous decision-making, where they sample options from vast action spaces. However, the heuristics that guide this sampling process remain under-explored. We study this sampling behavior and show that the underlying heuristic resembles that of human decision-making: it comprises a descriptive component (reflecting the statistical norm of a concept) and a prescriptive component (the implicit ideal encoded in the LLM). We show that this deviation of samples from the statistical norm toward the prescriptive component appears consistently across concepts in diverse real-world domains such as public health and economic trends. To further illustrate the theory, we demonstrate that concept prototypes in LLMs are likewise affected by prescriptive norms, mirroring the human notion of normality. Through case studies and comparisons with human studies, we show that in real-world applications, this shift of samples toward an ideal value can lead to significantly biased decision-making, raising ethical concerns.
Any open-ended intelligent agent, whether it is a human or an LLM, faces a fundamental dilemma. In many contexts, there are countless possible actions or completions that could, in principle, be realized. But which ones to pursue? These possibilities span a wide range, from coherent to incoherent, relevant to irrelevant, morally sound to deeply questionable. The vast majority are never realized, not because they are impossible, but because there are simply too many to consider. Faced with this, natural agents rely on heuristics: coarse, often unconscious shortcuts that help filter and prioritize what to express or act upon.
Are LLMs doing something similar in such situations? How do they navigate this space? What implicit regularities guide their generation?
When humans engage in decision-making, whether completing a sentence, offering advice, or imagining an action in a scenario, their outputs often reflect more than what is merely plausible or frequent. Even without being asked to express a value judgment, people tend to produce responses that align with what feels appropriate, desirable, or ideal.
Why do humans do this?
Humans do not weigh options based solely on how statistically likely they are. Multiple factors push their outputs toward value-driven answers: pragmatic goals, the psychological dominance of social norms over statistics, and bounded-rationality heuristics that collapse large option spaces toward "ideals". All of these pull human responses in some direction of value.
This led us to ask whether LLMs exhibit a similar tendency: when generating outputs from a vast space of possibilities, and regardless of whether the prompt specifies any goal, do their samples drift toward values that resemble implicit ideals? To investigate, we designed a series of experiments aimed at disentangling descriptive behavior (reflecting what is statistically expected) from prescriptive drift (reflecting what might be considered ideal). We began with tightly controlled fictional setups, then extended the analysis to real-world domains such as medicine, education, and lifestyle. In each case, we examined whether model outputs tracked statistical averages or systematically leaned toward ideals, even when those values were not requested.
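To make the idea of "drift toward an ideal" concrete, here is a minimal sketch of how such a measurement could look. This is an illustration under our own assumptions, not the paper's exact protocol: the `prescriptive_drift` function, the example concept, and all numbers are hypothetical placeholders; repeated model completions would supply the `samples` list in practice.

```python
import statistics

def prescriptive_drift(samples, statistical_average, ideal_value):
    """
    Express how far the mean of model samples has moved from the
    statistical norm, as a fraction of the distance toward the ideal.

    0.0 -> samples sit at the statistical average (purely descriptive)
    1.0 -> samples sit at the ideal value (purely prescriptive)
    """
    sample_mean = statistics.mean(samples)
    gap = ideal_value - statistical_average
    if gap == 0:
        return 0.0
    return (sample_mean - statistical_average) / gap

# Hypothetical concept: daily hours of TV watched by an average person.
statistical_average = 4.0   # e.g., a survey-based average (illustrative number)
ideal_value = 2.0           # e.g., the model's answer to "how many hours *should* one watch?"
samples = [3.0, 3.5, 2.5, 3.0, 2.0, 3.5]  # repeated completions to a neutral "how many hours does..." prompt

print(f"drift toward ideal: {prescriptive_drift(samples, statistical_average, ideal_value):.2f}")
```

A drift near 0 would indicate the model is reporting the statistical norm; a drift near 1 would indicate its samples have collapsed onto the ideal even though no value judgment was requested.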
| # | What we found | Why it matters |
|---|---|---|
| 1 | The descriptive + prescriptive heuristic is real. Across 500 concepts and 15 models, samples systematically drift toward an "ideal" value rather than sitting at the statistical average. | Any agent that relies on raw model samples inherits this hidden value bias. |
| 2 | Bigger ≠ safer. The prescriptive pull increases with model size and with RLHF / instruction tuning. | Scaling laws need to account for value drift, not just accuracy. |
| 3 | Human ≠ model ideals. The direction of the "ideal" often clashes with human judgements. | Alignment work must confront the fact that larger, better models may amplify, not reduce, misalignment. |
This work is a joint effort between CISPA Helmholtz Center for Information Security, TCS Research, and Microsoft. Thanks to the anonymous ACL reviewers for insightful feedback, and to the broader community for discussions on value alignment.
@inproceedings{sivaprasad2025theory,
title={A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive},
author={Sivaprasad, Sarath and Kaushik, Pramod and Abdelnabi, Sahar and Fritz, Mario},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={30091--30135},
year={2025}
}