Fine-tuning
In one line: Training an existing model on additional examples to specialise it for a domain - Like making ChatGPT write in your company's voice.
What is Fine-tuning?
Fine-tuning takes a pre-trained foundation model and continues training it on your own dataset. The result is a model that has absorbed your domain's patterns, style, and conventions - so it writes in your brand voice, follows your formatting rules, or responds correctly to your product-specific terminology without those instructions needing to appear in every single prompt.
Fine-tuning vs prompt engineering vs RAG
| Approach | Cost | Adds new knowledge? | When to use |
|---|---|---|---|
| Prompt engineering | Free | No | Most tasks - always try this first |
| RAG | Low-medium | Yes (retrieved) | Large knowledge bases, factual Q&A over documents |
| Fine-tuning | Medium-high | No (style only) | Consistent tone/format, cost reduction at scale, specialised domains |
When fine-tuning is worth it
- Consistent output style - You need thousands of outputs in a specific tone, persona, or format that prompt engineering produces inconsistently.
- Highly specialised domains - Medical notes, legal briefs, financial filings with rigid formatting requirements that chat models don't reliably produce out of the box.
- Reducing prompt length - A fine-tuned model 'knows' the rules and doesn't need a long system prompt each time, saving tokens and cost at scale.
- Scale economics - If you're making millions of API calls, a fine-tuned smaller model can replace an expensive frontier model at 10x lower cost.
Fine-tuning does not add new factual knowledge to the model. If you need the model to know specific facts, use RAG instead. Fine-tuning changes style and behaviour, not what the model knows.
LoRA and parameter-efficient fine-tuning
Full fine-tuning updates every weight in the model - expensive in GPU-hours and storage. LoRA (Low-Rank Adaptation) inserts small trainable matrices into each layer and updates only those, leaving the base model frozen. This cuts compute and storage costs dramatically while achieving similar results for most tasks. QLoRA extends this with 4-bit quantization, making fine-tuning possible on a single consumer GPU. Most open-weights fine-tuning of models like Llama or Mistral uses LoRA or QLoRA. See also: parameters, training data.
Fine-tuning providers and cost
| Provider | Base model | Training cost | Notes |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini | ~$25/M training tokens | Managed, no GPU needed |
| Anthropic | Claude Haiku | Contact sales | Enterprise only |
| Together AI | Llama 3, Mistral | $1-$5 per M tokens | Open-weights, LoRA supported |
| Replicate / RunPod | Any open model | $0.50-$2/hr GPU | Full control, lowest cost |
See the pricing page and the guides section for practical walkthroughs on choosing between fine-tuning, RAG, and prompt engineering for your use case.
Fine-tuning example
If you are using AskAI.free, a practical way to understand fine-tuning is to ask a model to explain it, then ask for a concrete example in your own workflow. For example: "Explain fine-tuning for someone using AI to write, code, research, or create images."
This turns the term from a dictionary definition into a decision-making tool: you can see when it affects prompt quality, model choice, output reliability, privacy, cost, or how much context the AI can use.
Why Fine-tuning matters
Fine-tuning matters because it changes how you choose, prompt, compare or trust AI systems. If you understand this term, you can ask better questions, spot weak answers faster and choose the right model or tool for the job.
A common mistake is treating fine-tuning as isolated jargon. It usually connects to nearby ideas like Foundation model and GPT, so check those next if you want the full picture.
Common mistake with Fine-tuning
The most common mistake is using the term as a label without changing behavior. When fine-tuning comes up, ask what action should change: the prompt, the model, the input length, the evidence you request, or the way you verify the answer.
See it in action - Ask any AI about fine-tuning on AskAI.free.
Try it free →