KV-cache

Long context and KV-cache management

Why long context is costly: attention compute scales as O(n²) in sequence length n, while the KV cache grows linearly with n. Covers MQA/GQA, FlashAttention, PagedAttention, RoPE/YaRN, and attention sinks.

2026-06-13 · 12 min read
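
To make the two scaling claims in the summary concrete, here is a rough back-of-the-envelope sketch. The model shape (32 layers, 32 query heads, head dimension 128), the fp16 element size, and the function names are illustrative assumptions, not figures from the article. It counts K and V tensors for the cache and QK^T score entries for attention, ignoring causal masking and any kernel-level optimizations.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2, batch=1):
    """Approximate KV-cache size: linear in seq_len.

    Two tensors (K and V) per layer, each shaped
    [batch, n_kv_heads, seq_len, head_dim]. bytes_per_elem=2 assumes fp16/bf16.
    """
    return 2 * n_layers * batch * n_kv_heads * seq_len * head_dim * bytes_per_elem


def attention_score_entries(seq_len, n_layers, n_heads):
    """Number of QK^T score entries for a full prefill: quadratic in seq_len."""
    return n_layers * n_heads * seq_len * seq_len


# Hypothetical model shape, for illustration only.
N_LAYERS, N_HEADS, HEAD_DIM = 32, 32, 128

# Multi-head attention: one KV head per query head.
mha = kv_cache_bytes(8192, N_LAYERS, n_kv_heads=N_HEADS, head_dim=HEAD_DIM)
# Grouped-query attention: 8 KV heads shared across the 32 query heads.
gqa = kv_cache_bytes(8192, N_LAYERS, n_kv_heads=8, head_dim=HEAD_DIM)

print(f"MHA cache at 8K tokens: {mha / 2**30:.1f} GiB")
print(f"GQA cache at 8K tokens: {gqa / 2**30:.1f} GiB")

# Doubling the context doubles the cache but quadruples the score entries.
cache_ratio = kv_cache_bytes(16384, N_LAYERS, N_HEADS, HEAD_DIM) / mha
score_ratio = (attention_score_entries(16384, N_LAYERS, N_HEADS)
               / attention_score_entries(8192, N_LAYERS, N_HEADS))
print(cache_ratio, score_ratio)
```

Under these assumptions, cutting the number of KV heads (as MQA/GQA do) shrinks the cache proportionally, while the quadratic term is what techniques such as FlashAttention address at the kernel level.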