Token

The smallest unit of text a language model processes. Usually a word or a piece of a word.

Tokenizers split text into tokens: often whole words for common English words, sub-word pieces for rarer or longer words, and individual characters for punctuation.
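To make the splitting behavior concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary and the `toy_tokenize` function are hypothetical and exist only for illustration; production tokenizers learn much larger vocabularies from data and differ in detail.

```python
import re

# Hypothetical toy vocabulary: a few common whole words plus sub-word pieces.
VOCAB = {"the", "cat", "sat", "token", "iz", "ation", "un", "believ", "able"}

def toy_tokenize(text: str) -> list[str]:
    """Greedy longest-match split that falls back to single characters."""
    tokens = []
    # Separate runs of word characters from individual punctuation marks.
    for chunk in re.findall(r"\w+|[^\w\s]", text.lower()):
        i = 0
        while i < len(chunk):
            # Try the longest vocabulary entry starting at position i.
            for j in range(len(chunk), i, -1):
                if chunk[i:j] in VOCAB:
                    tokens.append(chunk[i:j])
                    i = j
                    break
            else:
                # No vocabulary entry matched: emit one character.
                tokens.append(chunk[i])
                i += 1
    return tokens

print(toy_tokenize("The cat sat."))
print(toy_tokenize("Tokenization"))
```

With this vocabulary, the common words come out whole, "Tokenization" is split into sub-word pieces, and the period becomes its own token.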

Roughly, 1 token ≈ 4 characters of English text, or about ¾ of a word. At that rate, a 500-word essay is about 670 tokens.
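The rule of thumb can be turned into a quick estimator. This is a rough sketch: `estimate_tokens` is a hypothetical helper, the two ratios are the approximations quoted above, and real counts depend on the tokenizer and the text.

```python
def estimate_tokens(text: str) -> dict[str, int]:
    """Estimate the token count of English text with two rules of thumb."""
    by_chars = round(len(text) / 4)              # about 4 characters per token
    by_words = round(len(text.split()) / 0.75)   # about 3/4 of a word per token
    return {"by_chars": by_chars, "by_words": by_words}

essay = "word " * 500  # stand-in for a 500-word essay
print(estimate_tokens(essay))
```

The two estimates rarely agree exactly, so treat either one as a ballpark figure rather than a count.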

Tokens matter for AI detection because most detection signals, such as perplexity, surprisal, and attention patterns, are computed at the token level. They also matter for pricing in detector APIs: AI Detector API's free tier is 1,000 *requests* per month, not 1,000 tokens, so you can process essays of any reasonable length within that quota.
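As an illustration of a token-level signal, the sketch below computes surprisal and perplexity from per-token probabilities using the standard definitions: surprisal is the negative log probability of a token, and perplexity is the exponential of the mean surprisal. The function names and probability values are made up for the example; in practice the probabilities come from a language model scoring each token, and this is not a description of how any particular detector works.

```python
import math

def surprisal(prob: float) -> float:
    """Negative log probability of one token, in nats."""
    return -math.log(prob)

def perplexity(token_probs: list[float]) -> float:
    """Exponential of the mean per-token surprisal."""
    mean_surprisal = sum(surprisal(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_surprisal)

# Made-up probabilities a model might assign to each token in two texts.
predictable = [0.9, 0.8, 0.85, 0.9]
surprising = [0.2, 0.05, 0.3, 0.1]

print(perplexity(predictable))
print(perplexity(surprising))
```

The text whose tokens the model finds more probable gets the lower perplexity.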

Related terms

Perplexity · How surprised a language model is by a given piece of text. Lower means the text looks more model-generated.
AI detection · The task of identifying text that was written by a large language model rather than a human.
