RLHF

Fine-tuning and post-training (LoRA, DPO, RLHF)

The post-training pipeline: supervised fine-tuning (SFT), RLHF (reward model + PPO) and DPO, plus LoRA/QLoRA. Also covers when to fine-tune rather than use RAG or prompt engineering.

2026-06-14 14 min read