Attention
In one line: The mathematical mechanism that lets transformers focus on different parts of the input when generating each output token.
Attention is the core mechanism behind every modern LLM. When the model is generating the next word, attention lets it weigh every previous word — figuring out which ones matter most for what comes next.
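To make that concrete, here is a minimal sketch of scaled dot-product attention in NumPy. The function name, shapes, and toy data are illustrative, not taken from any particular model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weigh each value vector by how well its key matches the query.

    Q: (len_q, d_k) queries, K: (len_k, d_k) keys, V: (len_k, d_v) values.
    Returns an array of shape (len_q, d_v).
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled to keep the softmax stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row becomes attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Output is a weighted average of the value vectors.
    return weights @ V

# Toy example: 4 tokens with 8-dimensional embeddings attending to each other.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # Q, K, V from the same sequence
print(out.shape)  # (4, 8)
```

Each row of the weight matrix says how much that token "pays attention" to every other token; the output mixes value vectors in exactly those proportions.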
The 2017 paper Attention Is All You Need introduced the transformer architecture, which uses stacked attention layers and replaced the older RNN approach. Every model on AskAI.free — ChatGPT, Claude, Gemini, DeepSeek — uses attention.
Self-attention relates the tokens of a single sequence to each other: while generating, each new token attends to everything already in the context. Cross-attention lets one sequence attend to a different one, such as text tokens attending to image features in a vision-language model. The difference is shown in the sketch below.
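Reusing the scaled_dot_product_attention sketch above, the only difference between the two is where the keys and values come from. The text/image names and shapes here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
text_tokens   = rng.normal(size=(4, 8))   # the sequence being generated
image_patches = rng.normal(size=(16, 8))  # a separate input, e.g. image features

# Self-attention: queries, keys, and values all come from the same sequence.
self_out = scaled_dot_product_attention(text_tokens, text_tokens, text_tokens)

# Cross-attention: queries come from the text, keys/values from the other input.
cross_out = scaled_dot_product_attention(text_tokens, image_patches, image_patches)

print(self_out.shape, cross_out.shape)  # (4, 8) (4, 8)
```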
See it in action — ask any AI about attention on AskAI.free.
Try it free →