Mixture of Experts (MoE): router, top-k routing, auxiliary load-balancing loss, expert capacity, expert parallelism. Total vs. active parameters (Switch Transformer, GLaM, Mixtral, DeepSeek-V3).
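
The topics above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the implementation used by any of the named models: the function names `route_top_k` and `moe_param_counts` are hypothetical, routing is done greedily token by token in pure Python, the auxiliary loss follows the Switch-style form N * sum_i(f_i * P_i) computed from top-1 choices, and tokens that overflow an expert's capacity are simply dropped. Expert parallelism (placing experts on different devices) is not modelled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k, capacity):
    """Route each token to its top-k experts, subject to a per-expert capacity.

    router_logits: one list of num_experts scores per token (router output).
    Returns (assignments, aux_loss, load), where assignments[t] is a list of
    (expert_index, gate_weight) pairs for token t.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    load = [0] * num_experts          # tokens accepted by each expert
    first_choice = [0] * num_experts  # top-1 picks, counted before capacity
    assignments = []
    for p in probs:
        top = sorted(range(num_experts), key=lambda e: p[e], reverse=True)[:k]
        first_choice[top[0]] += 1
        kept = []
        for e in top:
            if load[e] < capacity:    # assumption: overflowing tokens are dropped
                load[e] += 1
                kept.append(e)
        norm = sum(p[e] for e in kept)
        # Renormalise gate weights over the experts that accepted the token.
        assignments.append([(e, p[e] / norm) for e in kept])

    # Switch-style auxiliary loss: f_i is the fraction of tokens whose top-1
    # choice is expert i, P_i is the mean router probability for expert i.
    f = [c / num_tokens for c in first_choice]
    P = [sum(p[e] for p in probs) / num_tokens for e in range(num_experts)]
    aux_loss = num_experts * sum(fi * Pi for fi, Pi in zip(f, P))
    return assignments, aux_loss, load


def moe_param_counts(shared, per_expert, num_experts, k):
    """Total vs. active parameters for one token (hypothetical sizes).

    shared: parameters every token uses (attention, embeddings, router).
    per_expert: parameters in a single expert.
    """
    total = shared + num_experts * per_expert
    active = shared + k * per_expert
    return total, active
```

With perfectly balanced top-1 routing the auxiliary loss in this sketch equals 1, and it grows as tokens concentrate on fewer experts, which is why it is added (scaled by a small coefficient) to the training loss. `moe_param_counts` shows why total and active parameter counts diverge: every expert contributes to the total, but only `k` experts run for a given token.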