# KV Caching ([[Large Language Models|LLM]] inference)

- [LLM Inference Series: 3. KV caching explained | by Pierre Lienhart | Medium](https://medium.com/@plienhar/llm-inference-series-3-kv-caching-unveiled-048152e461c8)
- [LLM Inference Series: 4. KV caching, a deeper look | by Pierre Lienhart | Medium](https://medium.com/@plienhar/llm-inference-series-4-kv-caching-a-deeper-look-4ba9a77746c8)

The second article goes into the details of how reusing the KV cache across requests is useful in conversational contexts. It also explains why you get billed for your input tokens: their keys and values end up occupying GPU memory for as long as the KV cache is kept around.
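
A back-of-the-envelope sketch of that memory cost, assuming an illustrative Llama-2-7B-style configuration (32 layers, 32 KV heads, head dimension 128, fp16 weights); the numbers are assumptions for illustration, not taken from the articles:

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache one token occupies: one key vector and one value
    vector per layer, each of size num_kv_heads * head_dim."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

# Assumed Llama-2-7B-style config: 32 layers, 32 KV heads, head_dim 128, fp16.
per_token = kv_cache_bytes_per_token(32, 32, 128, 2)
print(per_token / 1024, "KiB per token")    # 512.0 KiB per token

# A 4,000-token conversation kept cached between turns:
print(4000 * per_token / 2**30, "GiB")      # ~1.95 GiB of GPU memory
```

At roughly half a megabyte per token under these assumptions, it is easy to see why providers charge for input tokens whose KV entries they keep resident on the GPU between turns.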