Tokenizer
In one line: The component that converts text into tokens (and back). Different models use different tokenizers, which is why the same sentence has different token counts in GPT and Claude.
A tokenizer is the component that converts text into tokens before it is sent to an LLM (and converts the model's output tokens back into text).
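A minimal sketch of that round trip, using tiktoken, OpenAI's open-source tokenizer library (the sample sentence is illustrative):

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the GPT-4-family tokenizer

text = "Tokenizers split text into subword units."
tokens = enc.encode(text)          # text -> list of integer token IDs
print(tokens)
print(enc.decode(tokens) == text)  # IDs -> text; the round trip is lossless: True
```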
Different models use different tokenizers:
- GPT-4 family uses cl100k_base
- GPT-4o uses o200k_base
- Claude uses Anthropic's own tokenizer (BPE-based)
- Gemini uses SentencePiece
The same text produces a different token count depending on the model. English token counts are roughly comparable across tokenizers; non-English text can take 2-3x more tokens, and code can also use more tokens than prose.
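To see the difference directly, the sketch below counts tokens for the same strings with the GPT-4 and GPT-4o tokenizers via tiktoken; the sample strings are illustrative, and exact counts vary with the text:

```python
import tiktoken

samples = {
    "english": "The quick brown fox jumps over the lazy dog.",
    "japanese": "素早い茶色の狐はのろまな犬を飛び越える。",
    "code": "def add(a, b):\n    return a + b",
}

# Same strings, two tokenizers: token counts differ per model family.
for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    for label, text in samples.items():
        print(f"{name:12s} {label:8s} {len(enc.encode(text))} tokens")
```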
See it in action: ask any AI about tokenizers on AskAI.free.