# KV Cache
Status: Seedling
The KV cache stores the attention keys and values, per layer, for tokens that have already been processed during LLM inference.
## Mental Model
During autoregressive generation, the model produces one token at a time, and each new token attends to every token before it. Without caching, every step would recompute the keys and values for the entire prefix.
The KV cache avoids that repeated work by keeping earlier keys and values in memory, so each decode step computes keys and values only for the newest token and appends them.
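
A minimal sketch of the idea, assuming a single attention head in a single layer and a fixed list of input vectors standing in for token embeddings (a real model feeds each generated token back in). All names here (`decode_no_cache`, `decode_with_cache`, `Wq`, `Wk`, `Wv`) are hypothetical and chosen for illustration. It counts key/value projections to show how much work the cache saves while producing the same outputs.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

def decode_no_cache(xs, Wq, Wk, Wv):
    """Recompute keys and values for the whole prefix at every step."""
    outputs, projections = [], 0
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        projections += 2 * len(prefix)
        outputs.append(attend(matvec(Wq, xs[t]), keys, values))
    return outputs, projections

def decode_with_cache(xs, Wq, Wk, Wv):
    """Project only the newest token and append it to the cache."""
    key_cache, value_cache = [], []
    outputs, projections = [], 0
    for x in xs:
        key_cache.append(matvec(Wk, x))
        value_cache.append(matvec(Wv, x))
        projections += 2
        outputs.append(attend(matvec(Wq, x), key_cache, value_cache))
    return outputs, projections

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d_model, num_tokens = 4, 6  # toy sizes, assumed for illustration
Wq, Wk, Wv = (rand_matrix(d_model, d_model) for _ in range(3))
xs = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(num_tokens)]

naive_out, naive_proj = decode_no_cache(xs, Wq, Wk, Wv)
cached_out, cached_proj = decode_with_cache(xs, Wq, Wk, Wv)
print(naive_proj, cached_proj)  # 42 12
```

For six tokens the uncached loop performs 42 key/value projections and the cached loop performs 12. The gap grows quadratically with sequence length, while the cache itself grows only linearly.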
## Why It Matters
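The KV cache is central to inference performance because it trades memory for speed: the cache grows with every token of every active sequence, in exchange for skipping recomputation at each decode step.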
It affects:
- GPU memory usage.
- Maximum context length.
- Batch size.
- Decode throughput.
- Scheduling complexity.
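
To make the memory side concrete, here is a rough back-of-the-envelope estimate, assuming standard attention where every layer stores one key and one value vector per KV head per token. The function name `kv_cache_bytes` and the configuration numbers (32 layers, 32 KV heads, head dimension 128, 2-byte elements, a 16 GiB budget) are illustrative assumptions, not figures for any specific model or engine.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_element=2):
    """Estimate KV cache size: 2 tensors (K and V) per layer per token."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_element)

GIB = 1024 ** 3

# Hypothetical model shape, assumed for illustration only.
shape = dict(num_layers=32, num_kv_heads=32, head_dim=128)

per_token = kv_cache_bytes(**shape, seq_len=1, batch_size=1)
one_sequence = kv_cache_bytes(**shape, seq_len=4096, batch_size=1)

# How many 4096-token sequences fit in an assumed 16 GiB cache budget?
budget = 16 * GIB
max_batch = budget // one_sequence

print(per_token)            # 524288 bytes, i.e. 512 KiB per token
print(one_sequence / GIB)   # 2.0
print(max_batch)            # 8
```

Under these assumptions, doubling the context length halves the number of sequences that fit in the same budget, which is why context length, batch size, and GPU memory have to be considered together.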
## Related
- [[Articles/Inference Core Loop of an Inference Engine]]
- [[Articles/Inference Prefill and Decode]]
- [[Topics/AI Infrastructure]]