🦙 Understanding LLaMA2 Part 2 KV Cache - WHEN MOORE'S LAW ENDS

#software #ai #llm #open-source

[[🦙 Understanding LLaMA2 Part 1 Model Architecture]] [[🦙 Understanding LLaMA2 Part 2 KV Cache]] [[🦙 Understanding LLaMA2 Part 3 PyTorch Implementation]] [[🦙 Understanding LLaMA2 Part 4 ExecuTorch Runtime]] [[🦙 Understanding LLaMA2 Part 5 Training with TinyStories]]

Following up on [[🦙 Understanding LLaMA2 Part 1 Model Architecture]], the diagram below shows the LLaMA model architecture with KV Cache support. It uses the same legend and abbreviations as Part 1.

![[llama2_architecture_kvcache.png]]
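
To make the idea behind the diagram concrete, here is a minimal sketch of a KV cache for a single attention head, in plain Python. It is an illustration under simplifying assumptions, not the implementation from this series: the query, key, and value vectors are assumed to be already projected, there are no rotary embeddings, no multiple heads, and no batching. The names `KVCache`, `decode_step`, and `full_recompute` are hypothetical and chosen only for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


class KVCache:
    """Hypothetical cache: stores the key and value of every past position."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(cache, q, k, v):
    """Process one new token: store its k/v, then attend over the whole cache."""
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)


def full_recompute(qs, ks, vs):
    """Reference without a cache: causal attention rebuilt for every position."""
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(qs))]


# Toy inputs (assumed to be already-projected q/k/v vectors, one per token).
qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

cache = KVCache()
cached_out = [decode_step(cache, q, k, v) for q, k, v in zip(qs, ks, vs)]
reference_out = full_recompute(qs, ks, vs)
```

Both paths produce the same outputs. The difference is that `decode_step` only needs the key and value of the newest token as input at each step, because earlier ones are read back from the cache instead of being recomputed.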