How instruction tuning makes a model helpful
What you’ll learn
This is lesson 1 of Phase 4 (How models learn to be helpful) in Track 5 (AI Foundations). Phase 3 ended with a pretrained base model: trillions of tokens and months of compute spent making it a great autocompleter. That model is impressive in one way and limited in another. It knows facts, languages, and idioms, but it does not follow instructions. Write “summarize this paragraph” and it produces more paragraph, not a summary. The chat assistants you actually use behave differently.
Supervised fine-tuning (SFT) is the first of the post-training stages that close that gap. It uses the same training objective as pretraining (predict the next token) on a much smaller, much higher-quality dataset of human-written instruction-response pairs. The drop in volume is dramatic: pretraining sees trillions of tokens, while SFT sees thousands to hundreds of thousands of curated examples, and that is enough for the model to learn the pattern of instruction-following. The knowledge was already in the weights from pretraining; SFT teaches the model when to apply it.
The lesson also names the parameter-efficient variant (LoRA: small low-rank adapters on top of frozen base weights), describes what kind of model you have at the end (instruction-following but not yet preference-aligned), and surfaces the structural limitation that makes the next lesson necessary: SFT can only teach the model what to predict, not what not to predict.
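To make "same next-token objective, different data" concrete, here is a minimal pure-Python sketch of the loss on one instruction-response pair. Everything in it is illustrative: the 4-token vocabulary, the scores, and the function names are hypothetical. The `response_only_mask` is also an assumption. Counting only response tokens towards the loss is a common convention in practice, but the lesson itself only says that the objective is next-token prediction.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over one list of vocabulary scores."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_z for s in scores]


def next_token_loss(scores_per_position, token_ids, loss_mask):
    """Average next-token cross-entropy over the positions selected by loss_mask.

    scores_per_position[t] holds the model's scores for the token at index t + 1.
    loss_mask[t] is 1 if the prediction of token t + 1 counts towards the loss.
    """
    total, count = 0.0, 0
    for t in range(len(token_ids) - 1):
        if loss_mask[t]:
            target = token_ids[t + 1]
            total -= log_softmax(scores_per_position[t])[target]
            count += 1
    if count == 0:
        raise ValueError("loss_mask selects no positions")
    return total / count


# Toy sequence over a 4-token vocabulary (all values hypothetical):
# tokens 0 and 1 stand in for the instruction, tokens 2 and 3 for the response.
token_ids = [0, 1, 2, 3]

# A model with no preference between tokens, and one that favours the right token.
untrained_scores = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
tuned_scores = [
    [0.0, 5.0, 0.0, 0.0],  # favours token 1
    [0.0, 0.0, 5.0, 0.0],  # favours token 2
    [0.0, 0.0, 0.0, 5.0],  # favours token 3
]

pretraining_mask = [1, 1, 1]    # every next-token prediction counts
response_only_mask = [0, 1, 1]  # assumption: only the response tokens count

untrained_loss = next_token_loss(untrained_scores, token_ids, response_only_mask)
tuned_loss = next_token_loss(tuned_scores, token_ids, response_only_mask)
print(untrained_loss, tuned_loss)
```

The loss function is identical in both settings; only the data and, under the masking assumption, the set of positions that count are different. A real training loop would compute the same quantity on batches of tokenized pairs and backpropagate through it.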
Where this fits
This is lesson 1 of Phase 4, How models learn to be helpful, and the phase opener. Phase 3 ended with a pretrained base model and the engineering tricks that make pretraining tractable. Phase 4 is the post-training arc: how you turn a base model into something useful. This lesson covers SFT as the foundation. The next two lessons in the phase complete the post-training arc: Preferences into reward signals (how human preference data becomes a reward model that scores responses) and RLHF and DPO (how those reward signals actually update the model). The previous lesson in Track 5 was Why precision matters: quantization and mixed precision (Phase 3 closer).
Before you start
Prerequisites: the Phase 3 closer on quantization and mixed precision, and ideally the rest of Phase 3. You should be comfortable with what a pretrained base model is, what its objective was during training, and the rough scale of pretraining (trillions of tokens, months of compute). No new math beyond the previous phases.
By the end, you’ll be able to
- Distinguish a base model from an instruction-tuned model and explain what each one can and cannot do
- Describe the SFT mechanism (same next-token loss as pretraining, applied to curated instruction-response pairs)
- Explain why SFT teaches response shape rather than new knowledge, and why high-quality examples in the thousands are often enough
- Recognize LoRA as a parameter-efficient way to do SFT (small low-rank adapters on top of frozen base weights)
- Identify the structural limitation of SFT (no negative signal, only positive examples) that motivates the next lesson on preference data and reward modeling
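As a rough illustration of the LoRA objective in the list above, the sketch below adds a low-rank update `B @ A` to a frozen weight matrix `W` and compares parameter counts. It is a sketch under stated assumptions, not the lesson's implementation: the layer size (4096 by 4096), the rank (8), the `scale` factor, and all function names are hypothetical. Starting `B` at zero, so the adapted layer initially matches the base layer, follows the usual LoRA setup.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Output of a frozen layer W plus a trainable low-rank update B @ A.

    W has shape (d_out, d_in) and stays frozen.
    A has shape (rank, d_in) and B has shape (d_out, rank); only these train.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d_out, d_in, rank):
    """Return (parameters in W, parameters in the adapter A and B)."""
    return d_out * d_in, rank * (d_out + d_in)


# Tiny rank-1 example (hypothetical values).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen base weights, shape (2, 3)
A = [[0.5, -0.5, 1.0]]                   # trainable, shape (1, 3)
B_init = [[0.0], [0.0]]                  # trainable, starts at zero, shape (2, 1)
x = [1.0, 2.0, 3.0]

base_output = matvec(W, x)
initial_output = lora_forward(W, A, B_init, x)  # equals base_output while B is zero

# Parameter comparison for one hypothetical 4096 x 4096 layer at rank 8.
full_params, adapter_params = lora_param_counts(4096, 4096, 8)
print(full_params, adapter_params, adapter_params / full_params)
```

Under these assumed sizes the adapter holds 65,536 trainable parameters against 16,777,216 in the frozen matrix, which is about 0.4 percent.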
Time and difficulty
- Read time: about 18 minutes
- Practice time: about 12 minutes (a base-vs-SFT classification exercise plus flashcards)
- Difficulty: standard