How instruction tuning makes a model helpful

This is lesson 1 of Phase 4 (How models learn to be helpful) in Track 5 (AI Foundations). Phase 3 ended with a pretrained base model: trillions of tokens and months of compute spent making it a great autocompleter. The model is impressive in one specific way and limited in another. It knows facts, languages, and idioms. It does not follow instructions. You write “summarize this paragraph” and it produces more paragraph, not a summary. The chat assistants you actually use behave differently, and this phase explains how they get that way.

Supervised fine-tuning (SFT) is the first of the post-training stages that close that gap. It uses the same training objective as pretraining (predict the next token) on a much smaller, much higher-quality dataset of instruction-response pairs hand-written by humans. The volume drop is dramatic: pretraining sees trillions of tokens; SFT sees thousands to hundreds of thousands of curated examples, and that is enough for the model to learn the pattern of instruction-following. The knowledge was already in the weights from pretraining; SFT teaches the model when to apply it.

The lesson also introduces the parameter-efficient variant (LoRA: small low-rank adapters on top of frozen base weights), describes what kind of model you have at the end (instruction-following but not yet preference-aligned), and surfaces the structural limitation that makes the next lesson necessary: SFT shows the model only positive examples, so it can teach what to predict but not what to avoid.
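To make the "same objective, different data" point concrete, here is a minimal sketch in plain Python. Everything in it is illustrative: `toy_model` is a hypothetical stand-in that returns uniform logits, and the token ids are made up. The only thing it is meant to show is that one loss function scores both a pretraining sequence and an instruction-response pair. It scores every token in the pair; whether instruction tokens are also scored in practice varies by pipeline and is not something this lesson specifies.

```python
import math

def next_token_loss(logits_per_step, target_ids):
    """Mean cross-entropy of the target tokens given one logit vector per step."""
    total = 0.0
    for logits, target in zip(logits_per_step, target_ids):
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += -(logits[target] - log_z)
    return total / len(target_ids)

def toy_model(input_ids, vocab_size):
    """Hypothetical stand-in for a language model: uniform logits at every position."""
    return [[0.0] * vocab_size for _ in input_ids]

def sequence_loss(token_ids, vocab_size=8):
    # Next-token prediction: position i is trained to predict token i + 1.
    inputs, targets = token_ids[:-1], token_ids[1:]
    return next_token_loss(toy_model(inputs, vocab_size), targets)

# Pretraining-style example: a run of raw text tokens (made-up ids).
pretraining_tokens = [1, 4, 2, 7, 3]

# SFT-style example: instruction tokens followed by response tokens (made-up ids).
instruction_tokens = [5, 6, 1]
response_tokens = [2, 2, 7]
sft_tokens = instruction_tokens + response_tokens

# The same function scores both; only the data changed.
pretraining_loss = sequence_loss(pretraining_tokens)
sft_loss = sequence_loss(sft_tokens)
```

In a real training loop the logits would come from the model being tuned and the loss would drive a gradient step; the sketch stops at the loss because that is the part pretraining and SFT share.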

Where this lesson sits: it opens Phase 4, the post-training arc that turns a base model into something useful. Phase 3 covered pretraining and the engineering tricks that make it tractable. This lesson covers SFT as the foundation. The next two lessons complete the arc: Preferences into reward signals (how human preference data becomes a reward model that scores responses) and RLHF and DPO (how those reward signals actually update the model). The previous lesson in Track 5 was Why precision matters: quantization and mixed precision (the Phase 3 closer).

Prerequisites: the Phase 3 closer on quantization and mixed precision, and ideally the rest of Phase 3. You should be comfortable with what a pretrained base model is, what its objective was during training, and the rough scale of pretraining (trillions of tokens, months of compute). No new math beyond the previous phases.

  • Distinguish a base model from an instruction-tuned model and explain what each one can and cannot do
  • Describe the SFT mechanism (same next-token loss as pretraining, applied to curated instruction-response pairs)
  • Explain why SFT teaches response shape rather than new knowledge, and why high-quality examples in the thousands are often enough
  • Recognize LoRA as a parameter-efficient way to do SFT (small low-rank adapters on top of frozen base weights)
  • Identify the structural limitation of SFT (no negative signal, only positive examples) that motivates the next lesson on preference data and reward modeling
  • Read time: about 18 minutes
  • Practice time: about 12 minutes (a base-vs-SFT classification exercise plus flashcards)
  • Difficulty: standard
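The LoRA objective listed above ("small low-rank adapters on top of frozen base weights") can be pictured with a rough sketch like the following. It is an illustration under stated assumptions, not the lesson's implementation: the layer sizes and rank are arbitrary, the names `W`, `A`, and `B` are chosen for this example, `B` starts at zero so the adapter initially changes nothing, and the scaling factor that LoRA implementations usually apply to the update is omitted.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x):
    """Frozen base projection plus a low-rank correction: W x + B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + u for b, u in zip(base, update)]

# Arbitrary illustrative sizes (assumptions, not values from the lesson).
d_in, d_out, rank = 64, 64, 4
rng = random.Random(0)

# Frozen base weight: never updated during fine-tuning.
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
# Trainable adapter factors: A projects down to `rank`, B projects back up.
A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero start: adapter is a no-op at first

x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
y = lora_forward(W, A, B, x)

# Parameters a full fine-tune of this layer would update vs. what the adapter trains.
full_params = d_out * d_in           # 4096
lora_params = rank * (d_in + d_out)  # 512
```

For this toy layer the adapter trains 512 numbers instead of 4096, and the gap widens as the layer grows while the rank stays small.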