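The post-training pipeline: supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF: a reward model plus PPO), and direct preference optimization (DPO), plus the parameter-efficient methods LoRA and QLoRA. When to fine-tune, and when retrieval-augmented generation (RAG) or prompt engineering is the better choice.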
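The two preference-based stages are closely related, and a minimal sketch makes the link concrete. It assumes each response's log-probability has already been summed over its tokens. An RLHF reward model is commonly trained with a Bradley-Terry pairwise loss on (chosen, rejected) pairs. DPO applies the same loss form to an implicit reward, beta times the policy's log-probability minus a frozen reference model's, so it needs neither a separate reward model nor a PPO loop. The function names and the default `beta=0.1` are illustrative assumptions, not any library's API.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss for a reward model:
    -log sigmoid(r(x, y_chosen) - r(x, y_rejected))."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative strength of the implicit KL constraint
) -> float:
    """DPO loss for one preference pair (hypothetical helper).

    Implicit reward: beta * (log pi(y|x) - log pi_ref(y|x)).
    The loss is the Bradley-Terry loss on the difference of implicit rewards.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


if __name__ == "__main__":
    # Policy identical to reference: loss is log 2 (no preference expressed yet).
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy shifted toward the chosen response: loss drops below log 2.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

When the policy and reference assign the same log-probabilities, the DPO loss equals log 2. It falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference. That anchoring to the reference model is what keeps DPO from drifting arbitrarily far from the SFT starting point.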