Perplexity
How surprised a language model is by a given piece of text. Lower means the text looks more model-generated.
Perplexity is a measure of how well a language model predicts a sequence of text. Concretely, it's the exponential of the model's average per-token cross-entropy on the text. The lower the perplexity, the more "expected" the text was from the model's point of view.
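As a rough illustration of that definition, here is a minimal Python sketch that turns per-token log-probabilities into a perplexity score. The function name `perplexity` and the probability values are made up for this example; in practice the log-probabilities would come from whichever language model is doing the scoring.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from a list of per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Cross-entropy is the average negative log-probability per token.
    cross_entropy = -sum(token_logprobs) / len(token_logprobs)
    # Perplexity is the exponential of that average.
    return math.exp(cross_entropy)

# Hypothetical per-token probabilities, invented for illustration only.
predictable = [math.log(p) for p in (0.9, 0.8, 0.95, 0.85)]
surprising = [math.log(p) for p in (0.2, 0.05, 0.3, 0.1)]

print(perplexity(predictable))  # close to 1: the model expected these tokens
print(perplexity(surprising))   # much larger: the model was often surprised
```

A handy sanity check: if the model assigns every token a probability of 1/4, the perplexity is 4, as if it were choosing uniformly among four options at each step.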
That makes perplexity a useful AI-detection signal: text generated by a model tends to have low perplexity under that same model (or a closely related one), while human-written text, full of idiosyncratic word choices and unexpected turns, tends to have higher perplexity.
Perplexity is rarely used alone in production detection. It's combined with burstiness, classifier confidence, and other features. Note that perplexity in this sense is a statistical measure, not the search company Perplexity AI.
Related terms
- Burstiness · A measure of variation in sentence length, structure, and complexity across a piece of text.
- AI detection · The task of identifying text that was written by a large language model rather than a human.
- Token · The smallest unit of text a language model processes. Usually a word or a piece of a word.