Token
The smallest unit of text a language model processes — usually a word or a piece of a word.
A token is the smallest unit of text a language model processes. Tokenizers split text into tokens: common English words often map to a single token, rarer or longer words are broken into sub-word pieces, and punctuation marks usually become tokens of their own.
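To illustrate how one word can become several tokens, here is a toy greedy longest-match tokenizer. It is a minimal sketch under stated assumptions: `VOCAB` is a tiny hypothetical vocabulary chosen for the example, and real tokenizers (BPE, WordPiece, SentencePiece) learn vocabularies of tens of thousands of entries and add word-boundary markers that this toy omits.

```python
import re

# Hypothetical toy vocabulary: common words are whole tokens, while rarer
# words must be assembled from sub-word pieces.
VOCAB = {"the", "cat", "sat", "is", "token", "iz", "ation", "un", "believ", "able"}


def tokenize(text: str) -> list[str]:
    """Greedy longest-match sub-word tokenization over the toy VOCAB."""
    tokens = []
    # Split into runs of letters/digits and single punctuation characters.
    for word in re.findall(r"[a-z0-9]+|[^a-z0-9\s]", text.lower()):
        start = 0
        while start < len(word):
            # Shrink the candidate until it is in VOCAB; if nothing matches,
            # fall back to a single character (this also covers punctuation).
            end = len(word)
            while end > start + 1 and word[start:end] not in VOCAB:
                end -= 1
            tokens.append(word[start:end])
            start = end
    return tokens


print(tokenize("The cat sat."))
print(tokenize("Tokenization is unbelievable!"))
```

With this vocabulary, "The cat sat." yields one token per word plus one for the period, while "Tokenization" splits into `token`, `iz`, `ation` and "unbelievable" into `un`, `believ`, `able`.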
Roughly, 1 token ≈ 4 characters of English text, or about ¾ of a word. A 500-word essay is therefore roughly 650 to 700 tokens (500 ÷ 0.75 ≈ 667).
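The two rules of thumb can be written as quick estimators. This is a sketch assuming ordinary English prose; the function names are illustrative, and an actual tokenizer will report somewhat different counts depending on the text and the model.

```python
def estimate_tokens_from_words(word_count: int) -> int:
    # Rule of thumb: 1 token ≈ 3/4 of a word, so tokens ≈ words * 4 / 3.
    return round(word_count * 4 / 3)


def estimate_tokens_from_chars(char_count: int) -> int:
    # Rule of thumb: 1 token ≈ 4 characters of English text.
    return round(char_count / 4)


print(estimate_tokens_from_words(500))   # 667
print(estimate_tokens_from_chars(2000))  # 500
```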
Tokens matter for AI detection because most detection signals are computed at the token level: perplexity, surprisal, and attention patterns. They also matter for pricing in detector APIs. AI Detector API's free tier is 1,000 *requests* per month, not 1,000 tokens, so you can process essays of any reasonable length within that quota.
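To make "computed at the token level" concrete, the sketch below derives surprisal and perplexity from per-token probabilities. It assumes you already have the probability a language model assigned to each actual token; the probabilities in the example are made up, and this is the textbook definition, not any particular detector's implementation.

```python
import math


def surprisal(p: float) -> float:
    """Surprisal of one token with model probability p, in bits: -log2(p)."""
    return -math.log2(p)


def perplexity(token_probs: list[float]) -> float:
    """Perplexity = 2 ** (mean surprisal in bits) over the token sequence."""
    mean_bits = sum(surprisal(p) for p in token_probs) / len(token_probs)
    return 2 ** mean_bits


# Hypothetical probabilities for a three-token sequence.
probs = [0.5, 0.25, 0.125]
print([surprisal(p) for p in probs])  # [1.0, 2.0, 3.0]
print(perplexity(probs))              # 4.0
```

Lower perplexity means the model found the text more predictable, which is why perplexity serves as one of the token-level signals detectors look at.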