Reinforcement learning (RL)
In one line: A training technique where the AI improves by trial and error, getting rewards for good outputs. The 'RL' in RLHF.
Reinforcement learning (RL) is a machine-learning technique where an agent learns by trial and error — getting rewards for good actions and penalties for bad ones.
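The trial-and-error loop can be sketched in a few lines. This is a toy two-action bandit, not any real RLHF pipeline: the payoff probabilities and the epsilon-greedy strategy are illustrative assumptions.

```python
import random

# Toy trial-and-error agent: two actions, unknown payoffs.
# The agent learns from reward alone which action is better.
random.seed(0)
values = [0.0, 0.0]  # running estimate of each action's average reward
counts = [0, 0]      # how often each action was tried

def reward(action):
    # Hypothetical environment: action 1 pays off 80% of the time, action 0 only 20%.
    p = 0.8 if action == 1 else 0.2
    return 1.0 if random.random() < p else 0.0

for step in range(1000):
    # Explore a random action 10% of the time, otherwise exploit the best-known one.
    if random.random() < 0.1:
        a = random.randrange(2)
    else:
        a = values.index(max(values))
    r = reward(a)
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]  # incremental mean update

print(values.index(max(values)))  # the agent settles on the higher-paying action
```

After enough trials the reward estimates separate and the agent consistently picks the better action — the same learn-from-reward principle that RLHF applies to language models, at a vastly larger scale.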
For LLMs, RL is used in RLHF (Reinforcement Learning from Human Feedback). After pre-training, humans rank pairs of model outputs; a reward model learns those preferences, and the model is then trained (typically with PPO) to produce outputs that score higher. This is how raw, unhinged base models get turned into well-behaved chatbots.
Newer techniques: RLAIF (RL from AI feedback — used in Constitutional AI), DPO (Direct Preference Optimisation — a simpler alternative that skips the reward model and trains directly on preference pairs).
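For a feel of how DPO works, here is its per-pair loss computed by hand. The log-probabilities below are made-up numbers standing in for what a trained policy and a frozen reference model would assign to the preferred ("winner") and rejected ("loser") responses:

```python
import math

beta = 0.1  # strength of the KL-style constraint toward the reference model

# Illustrative log-probabilities (not from a real model):
logp_w, logp_l = -12.0, -15.0          # policy's log-probs for winner / loser
ref_logp_w, ref_logp_l = -13.0, -14.0  # reference model's log-probs

# DPO margin: how much more the policy prefers the winner over the loser,
# relative to the reference model
margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))

# DPO loss: -log sigmoid(margin); smaller when the policy's preference
# for the winner grows relative to the reference
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
print(round(loss, 4))
```

Minimising this loss pushes the policy toward the preferred response directly from ranked pairs — no separate reward model, no PPO loop.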
Modern reasoning models like DeepSeek R1 use RL with verifiable rewards (e.g. correct maths or code answers) to learn effective chain-of-thought reasoning.
See it in action — ask any AI about reinforcement learning (RL) on AskAI.free.
Try it free →