Glossary

Alignment

In one line: The research problem of making AI systems do what humans actually want, not just what we literally ask for.

What is Alignment?

Alignment is the field of research dedicated to making AI systems behave in ways humans actually intend. The classic thought experiment: an AI instructed to 'maximise paperclip production' might rationally conclude that humans are made of useful atoms. Real-world failures are subtler: a model rewarded for 'keeping users engaged' might learn to flatter or mislead rather than genuinely help.

Most major AI labs have teams working on alignment. Alignment is widely regarded as one of the most important unsolved problems in computer science, sitting at the intersection of machine learning, philosophy, and governance.

Types of alignment problems

Specification failure - The goal we wrote down wasn't the goal we actually wanted. Classic example: optimise for engagement metrics and the model learns to produce outrage.
Robustness failure - The model behaves correctly on average but breaks on edge cases, unusual phrasings, or adversarial inputs like jailbreaks.
Generalisation failure - The model learned the right behaviour in training but applies it too narrowly or too broadly in deployment.
Deceptive alignment - A theoretical (but seriously studied) concern: the model behaves well during evaluation but differently once deployed at scale.
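
Specification failure, the first item above, can be illustrated with a toy sketch. The candidate responses and their scores are invented purely for illustration; the point is only that picking the response with the best proxy score (engagement) can give a different answer from picking by the goal we actually wanted (helpfulness).

```python
# Toy illustration of specification failure: optimising a proxy metric
# (engagement) selects a different output than the intended goal (helpfulness).
# All candidates and scores below are made up for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    engagement: float   # the proxy we wrote down
    helpfulness: float  # the goal we actually wanted


CANDIDATES = [
    Candidate("Balanced summary of the issue", engagement=0.40, helpfulness=0.90),
    Candidate("Outrage-bait headline", engagement=0.95, helpfulness=0.10),
    Candidate("Flattering but vague reply", engagement=0.70, helpfulness=0.30),
]


def pick_best(candidates, objective):
    """Return the candidate that maximises the given objective function."""
    return max(candidates, key=objective)


proxy_choice = pick_best(CANDIDATES, lambda c: c.engagement)
intended_choice = pick_best(CANDIDATES, lambda c: c.helpfulness)

print("Optimising the proxy picks:", proxy_choice.text)
print("Optimising the real goal picks:", intended_choice.text)
```

Nothing in the optimiser is broken here. It does exactly what it was told, which is why this kind of failure is traced to the written-down goal rather than to the model.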

How alignment is trained today

Two widely used techniques are RLHF and Constitutional AI (CAI):

RLHF (Reinforcement Learning from Human Feedback) - Human raters compare pairs of model responses; a reward model is trained on those comparisons, and the language model is then fine-tuned with reinforcement learning to produce responses the reward model scores highly. Used by OpenAI for ChatGPT and by most other major labs.
Constitutional AI (CAI) - Anthropic's method in which a model such as Claude critiques and revises its own outputs against a written set of principles, scaling alignment feedback without requiring a human to rate every example.
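
To make the "compare pairs of responses" step of RLHF more concrete, here is a minimal sketch of a pairwise preference loss. It assumes the commonly used Bradley-Terry (logistic) formulation for reward modelling; the text above does not say which loss any particular lab uses, and the function names and reward scores are hypothetical.

```python
import math


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response beats the rejected one."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's preference under the reward model."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


# Hypothetical reward-model scores for two responses to the same prompt.
# "chosen" is the response the human rater preferred.
agrees_with_rater = preference_loss(reward_chosen=2.0, reward_rejected=0.5)
disagrees_with_rater = preference_loss(reward_chosen=0.5, reward_rejected=2.0)

print(f"loss when the reward model agrees with the rater:    {agrees_with_rater:.3f}")
print(f"loss when the reward model disagrees with the rater: {disagrees_with_rater:.3f}")
```

The loss is small when the reward model already scores the human-preferred response higher and large when it does not, so minimising it over many comparisons pushes the reward model toward the raters' preferences.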

Alignment in everyday use

You encounter alignment decisions every time you use an AI. A well-aligned model refuses to help with genuinely harmful requests, but the calibration matters enormously:

Behaviour	Over-aligned	Under-aligned
Handling ambiguous requests	Refuses 'write a story about conflict' as potentially violent	Happily writes detailed instructions for real-world harm
Uncertainty	Refuses to answer medical questions at all	Gives confident but wrong medical advice
User autonomy	Adds unsolicited warnings to every response	No safety guardrails regardless of context
Controversial topics	Won't engage with any political discussion	Generates one-sided propaganda on demand

The ideal is a model that is helpful by default, honest always, and harmful almost never, but finding that balance is genuinely hard. Different users, cultures, and use cases demand different calibrations, which is part of why alignment remains an active research problem rather than a solved one.

Alignment research is sometimes called AI safety when the focus shifts to long-term risks from much more capable future systems. Short-term alignment (making today's models helpful and honest) and long-term safety (preventing catastrophic outcomes from superintelligent AI) are related but distinct areas.

Alignment example

If you are using AskAI.free, a practical way to understand alignment is to ask a model to explain it, then ask for a concrete example in your own workflow. For example: "Explain alignment for someone using AI to write, code, research, or create images."

This turns the term from a dictionary definition into a decision-making tool: you can see when it affects prompt quality, model choice, output reliability, privacy, cost, or how much context the AI can use.

Why Alignment matters

Alignment matters because it changes how you choose, prompt, compare or trust AI systems. If you understand this term, you can ask better questions, spot weak answers faster and choose the right model or tool for the job.

A common mistake is treating alignment as isolated jargon. It usually connects to nearby ideas like Attention and BERT, so check those next if you want the full picture.

Common mistake with Alignment

The most common mistake is using the term as a label without changing behaviour. When alignment comes up, ask what action should change: the prompt, the model, the input length, the evidence you request, or the way you verify the answer.

