Token
In one line: The unit AI models read and write in. Roughly 4 characters or 0.75 words. Pricing and context windows are measured in tokens.
What is a token?
A token is the basic unit that language models read and write. It is not a word: in English it is roughly 3–4 characters, which is often only part of a word. A long word such as 'tokenization' typically splits into two to four tokens (for example, token + ization), and a leading space is usually merged into the first token rather than counted on its own; the exact split depends on the tokenizer. LLMs never process raw text directly; they convert everything to tokens first via a tokenizer, then generate output one token at a time.
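To make the idea of subword splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is a toy: the vocabulary `TOY_VOCAB` and the function `greedy_tokenize` are hypothetical names made up for illustration, and production tokenizers use learned byte-pair-encoding merges rather than this simple lookup. The pieces you get depend entirely on the vocabulary.

```python
def greedy_tokenize(text, vocab):
    """Split text into the longest matching vocabulary pieces, left to right."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Nothing matched: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy vocabulary; real vocabularies hold tens of thousands of pieces.
TOY_VOCAB = {" token", "token", "ization", "iz", "ation", " cat", "cat"}

print(greedy_tokenize(" tokenization", TOY_VOCAB))  # [' token', 'ization']
```

Note how the leading space travels with the first piece, and how a character that is not in the vocabulary becomes its own token.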
Token conversion table
| Unit | Tokens | Approx. words | Example |
|---|---|---|---|
| One character | ~0.25 | - | 'a' |
| One short word | 1 | 1 | 'cat' |
| One long word | 2–4 | 1 | 'tokenization' |
| Short paragraph | ~100 | ~75 | Three or four sentences |
| Typical email | ~1,000 | ~750 | A detailed business email |
| Short article | ~4,000 | ~3,000 | A 3,000-word blog post |
| Novel (e.g. HP book 1) | ~100,000 | ~77,000 | Harry Potter and the Philosopher's Stone |
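The rules of thumb behind this table can be turned into a quick estimator. This is a rough sketch, assuming the page's ratios of about 4 characters and 0.75 words per token; the function names are illustrative, and real counts vary by tokenizer and by text.

```python
import math

# Rules of thumb from this page; treat the results as estimates only.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_text(text: str) -> int:
    """Estimate tokens from character count (about 4 characters per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from word count (about 0.75 words per token)."""
    return round(word_count / WORDS_PER_TOKEN)


print(estimate_tokens_from_words(750))   # 1000
print(estimate_tokens_from_words(3000))  # 4000
```

For an exact figure, run the text through the tokenizer of the model you plan to use.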
Why tokens matter
- API pricing: Charged per million input and output tokens. Output tokens cost more than input tokens on most APIs. Check plan pricing for current rates.
- Context window limits: Measured in tokens. Claude Sonnet 4's 200K-token window fits roughly 150,000 English words, enough for an entire novel. See context window for more.
- Speed and latency: More output tokens mean a longer wait. Concise prompts reduce both cost and latency.
- Plan allowances: AskAI.free Pro and Max plans include generous monthly token budgets; the FAQ explains what each plan includes. Use the token counter to estimate how much a task will use.
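The pricing and context-window points above reduce to simple arithmetic. The sketch below uses placeholder prices (3.00 and 15.00 per million tokens) that are assumptions for illustration only, not rates from any provider. It also assumes that the prompt and the reserved output share one context window, which is common but worth confirming for your model.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Return the estimated cost of one request, in the prices' currency."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


def fits_context(prompt_tokens, max_output_tokens, context_window):
    """Check whether the prompt plus the reserved output fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


# Placeholder prices, for illustration only.
cost = estimate_cost(10_000, 2_000,
                     input_price_per_million=3.00,
                     output_price_per_million=15.00)
print(f"Estimated cost: {cost:.2f}")

print(fits_context(150_000, 4_000, context_window=200_000))  # True
```

In this example the 2,000 output tokens cost about as much as the 10,000 input tokens, which is why trimming output length often saves more than trimming the prompt.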
Non-English and code token efficiency
| Language / content type | Tokens per 100 words | vs English |
|---|---|---|
| English | ~133 | Baseline |
| Spanish / French | ~150 | ~13% more |
| Arabic | ~170 | ~28% more |
| Chinese (GPT-4o) | ~100 | ~25% fewer (improved tokenizer) |
| Chinese (GPT-3.5) | ~250 | ~88% more |
| Code (Python/JS) | ~110 | ~17% fewer |
| Dense prose / legal text | ~140 | ~5% more |
Older tokenizers were heavily optimised for English, making non-English text significantly more expensive. The o200k_base tokenizer used in GPT-4o dramatically improved efficiency for Chinese, Japanese, Korean, and Arabic. Use the free token counter to see exact counts for your text and model.
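The table above can be used as a lookup to budget tokens for different content types. This is a sketch that copies the approximate figures from the table; the dictionary and function names are illustrative, and the numbers are rough averages rather than guarantees.

```python
# Approximate tokens per 100 words, copied from the table above.
TOKENS_PER_100_WORDS = {
    "English": 133,
    "Spanish / French": 150,
    "Arabic": 170,
    "Chinese (GPT-4o)": 100,
    "Chinese (GPT-3.5)": 250,
    "Code (Python/JS)": 110,
    "Dense prose / legal text": 140,
}


def estimate_tokens(content_type: str, word_count: int) -> int:
    """Estimate the token count for a given content type and word count."""
    return round(word_count * TOKENS_PER_100_WORDS[content_type] / 100)


def percent_vs_english(content_type: str) -> int:
    """Return the rounded percentage difference from the English baseline."""
    ratio = TOKENS_PER_100_WORDS[content_type] / TOKENS_PER_100_WORDS["English"]
    return round((ratio - 1) * 100)


print(estimate_tokens("Arabic", 1000))          # 1700
print(percent_vs_english("Spanish / French"))   # 13
print(percent_vs_english("Chinese (GPT-4o)"))   # -25
```

A negative percentage means fewer tokens than the same number of English words.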
Token example
If you are using AskAI.free, a practical way to understand tokens is to ask a model to explain them, then ask for a concrete example in your own workflow. For example: "Explain tokens for someone using AI to write, code, research, or create images."
This turns the term from a dictionary definition into a decision-making tool: you can see when it affects prompt quality, model choice, output reliability, privacy, cost, or how much context the AI can use.
Why understanding tokens matters
Understanding tokens matters because it changes how you choose, prompt, compare, or trust AI systems. If you understand the term, you can ask better questions, spot weak answers faster, and choose the right model or tool for the job.
A common mistake is treating 'token' as isolated jargon. It usually connects to nearby ideas like Tokenizer and Training data, so check those next if you want the full picture.
Common mistake with tokens
The most common mistake is using the term as a label without changing behavior. When tokens come up, ask what action should change: the prompt, the model, the input length, the evidence you request, or the way you verify the answer.