
Token

The smallest unit of text a language model processes — usually a word or a piece of a word.

Tokenizers split text into tokens. Common English words often map to a single token, rarer or longer words are broken into sub-word pieces, and punctuation marks are usually tokens of their own.
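The splitting behavior can be illustrated with a toy greedy longest-match tokenizer. This is a simplified sketch, not how any particular production tokenizer works: `VOCAB` and `tokenize` are hypothetical names, and real tokenizers (BPE-based ones, for example) learn vocabularies of tens of thousands of pieces from data.

```python
import re

# Hypothetical toy vocabulary; real tokenizers learn far larger ones from data.
VOCAB = {"the", "cat", "sat", "is", "token", "ization", "un", "believ", "able"}


def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split text into words and single punctuation marks, then break each
    word into the longest vocabulary pieces available, left to right."""
    tokens: list[str] = []
    for chunk in re.findall(r"[a-z0-9]+|[^a-z0-9\s]", text.lower()):
        i = 0
        while i < len(chunk):
            # Try the longest candidate first and shrink it until a vocabulary
            # entry matches; a single character is always accepted as a fallback.
            for j in range(len(chunk), i, -1):
                piece = chunk[i:j]
                if piece in vocab or j == i + 1:
                    tokens.append(piece)
                    i = j
                    break
    return tokens


print(tokenize("The cat sat."))
print(tokenize("Tokenization is unbelievable!"))
```

Common words such as `the` come out whole, `tokenization` splits into `token` + `ization`, and `!` becomes its own token. A word with no matching vocabulary pieces falls back to single characters.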

Roughly, 1 token ≈ 4 characters of English text, or about ¾ of a word. By that rule, a 500-word essay comes to roughly 650 to 700 tokens.
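The rule of thumb translates directly into a quick estimator. This is a sketch built only on the two ratios above; the function names are hypothetical, and exact counts always depend on the specific tokenizer.

```python
# Rule-of-thumb ratios for English text; real counts depend on the tokenizer.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def tokens_from_chars(n_chars: int) -> int:
    """Estimate tokens from a character count (about 4 characters per token)."""
    return round(n_chars / CHARS_PER_TOKEN)


def tokens_from_words(n_words: int) -> int:
    """Estimate tokens from a word count (about 3/4 of a word per token)."""
    return round(n_words / WORDS_PER_TOKEN)


print(tokens_from_words(500))   # 667
print(tokens_from_chars(2000))  # 500
```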

Tokens matter for AI detection because most detection signals, such as perplexity, surprisal, and attention patterns, are computed at the token level. Token counts can also drive pricing in detector APIs, but AI Detector API's free tier is metered in *requests*: 1,000 per month, not 1,000 tokens, so essays of any reasonable length fit within that quota.
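To show what "computed at the token level" means for two of those signals, the sketch below turns per-token probabilities into surprisal and perplexity. It assumes you already have each token's probability from some language model; the probability values in the example are made up for illustration, and the function names are hypothetical.

```python
import math


def surprisals(probs: list[float]) -> list[float]:
    """Surprisal of each token in bits: -log2 p(token | preceding tokens)."""
    return [-math.log2(p) for p in probs]


def perplexity(probs: list[float]) -> float:
    """Perplexity = 2 ** (mean surprisal in bits), which equals the inverse
    geometric mean of the per-token probabilities."""
    s = surprisals(probs)
    return 2 ** (sum(s) / len(s))


# Hypothetical per-token probabilities for a four-token sentence.
probs = [0.9, 0.6, 0.05, 0.8]
print(surprisals(probs))  # the 0.05 token stands out with high surprisal
print(perplexity(probs))
```

Lower perplexity means the text was more predictable to the model overall, while per-token surprisal shows which individual tokens were unexpected. How a given detector combines these signals is outside the scope of this sketch.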

