The post-training pipeline: SFT, RLHF (reward model + PPO) and DPO, plus LoRA/QLoRA. When to fine-tune vs. use RAG vs. rely on prompt engineering.
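The sketch below makes one of these stages concrete by showing the per-pair DPO objective. It assumes the summed log-probabilities of the preferred response (y_w) and the rejected response (y_l) have already been computed twice: once under the policy being trained and once under a frozen reference model, typically the SFT checkpoint. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are hypothetical choices made for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * margin).

    Hypothetical helper for illustration. Each *_logp is the summed
    log-probability of a full response under the trained policy or
    under the frozen reference model (assumed: the SFT model).
    """
    # Implicit reward margin: how much more the policy favors y_w over y_l
    # than the reference model does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(x) == softplus(-x); computed in a numerically stable way.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# If the policy has not moved away from the reference, the loss is log 2.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# If the policy favors the chosen response more than the reference does, the loss drops below log 2.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

Unlike the RLHF route, this objective requires neither a separately trained reward model nor PPO rollouts. The reference model enters only through its log-probabilities, and `beta` controls how far the policy may drift from it.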