Tokenizer

In one line: The component that converts text into tokens (and back). Different models use different tokenizers, which is why the same sentence can have a different token count in GPT and Claude.

A tokenizer is the component that converts text into tokens before they are sent to an LLM, and converts the model's output tokens back into text.

Different models use different tokenizers:

  • GPT-4 family uses cl100k_base
  • GPT-4o uses o200k_base
  • Claude uses Anthropic's own tokenizer (BPE-style)
  • Gemini uses SentencePiece

Same text, different token count depending on the model. English text tokenizes to roughly similar counts across tokenizers, but non-English languages can take 2-3x more tokens, and code often uses more tokens than prose.
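To make the "same text, different count" point concrete, here is a minimal sketch using a toy greedy longest-match tokenizer with two hypothetical vocabularies (these are invented for illustration, not the real cl100k_base or o200k_base vocabularies):

```python
# Toy illustration only: greedy longest-match tokenization against two
# made-up vocabularies, showing how the same text splits into a
# different number of tokens depending on the vocabulary.

def tokenize(text, vocab):
    """Match the longest vocabulary entry at each position; fall back
    to single characters (similar in spirit to byte-level fallback in BPE)."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # single-character fallback
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens

vocab_a = {"token", "izer", "s split ", "text"}          # hypothetical
vocab_b = {"tok", "en", "izers", " split ", "te", "xt"}  # hypothetical

text = "tokenizers split text"
print(len(tokenize(text, vocab_a)), tokenize(text, vocab_a))  # 4 tokens
print(len(tokenize(text, vocab_b)), tokenize(text, vocab_b))  # 6 tokens
```

Real tokenizers learn their vocabularies from data (e.g. via BPE merges) rather than using hand-written sets, but the effect is the same: a larger or better-matched vocabulary covers the same text in fewer tokens.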
