Large language models (LLMs) are deep [[neural networks]], typically based on the [[Transformer architecture]], trained on massive text corpora to understand and generate human-like language. By learning statistical relationships between words, sentences, and larger contexts, they can perform a wide range of natural language tasks “in-context” (i.e. by conditioning on a [[prompt]]) without task-specific fine-tuning; a few-shot prompt illustrating this appears at the end of this note.

---

**Core Architecture & Training**

- **Transformer backbone**: LLMs use self-attention layers to model long-range dependencies in text, scaling up to hundreds of billions (or even trillions) of parameters (sketched below).
- **[[Pre-training]]**: They ingest web-scraped text, books, code repositories, etc., optimizing a language-modeling objective: predict the next [[token]] (a toy version also appears below).
- **Emergent abilities**: As model size and data scale grow, models exhibit surprising zero- and few-shot performance on translation, summarization, question answering, code synthesis, and more.

---

**Key Capabilities**

- **Text generation**: Coherent paragraphs, dialogue, creative writing.
- **Code assistance**: Translating natural language into programming code and vice versa.
- **[[Reasoning]]**: Logical puzzles, math word problems (to a degree), and chain-of-thought when prompted.
- **[[Multimodal]] extensions**: Some LLMs have been extended to process images or other modalities alongside text.

---

**Limitations & Challenges**

- **[[Hallucinations]]**: Confidently generating incorrect or fabricated facts.
- **Bias and ethics**: Reflecting issues present in the training data (stereotypes, toxicity).
- **Compute cost**: Training and inference can be extremely resource-intensive.
- **Interpretability**: Internal workings remain largely opaque, posing risks in safety-critical applications.

---

**Prominent Examples**

- **[[🏦 OpenAI|OpenAI]]'s GPT-3 & GPT-4**, used in [[💾 ChatGPT]]
  - GPT-3 (175 B parameters) demonstrated broad zero-shot/few-shot capabilities.
  - GPT-4 (undisclosed size) improved reasoning, coding, and safety guardrails.
- **[[🏦 Alphabet ($GOOGL, $GOOG)|Google]]'s [[💾 Gemini|Gemini]]** (successor to PaLM & PaLM 2)
- **[[🏦 Meta ($META)|Meta]]'s [[LLaMA]] (LLaMA 2, 3)**
  - Open-weight models ranging from roughly 7 B to 70 B parameters, optimized for research and fine-tuning.
- **[[🏦 Anthropic|Anthropic]]'s [[💾 Claude|Claude]]**
  - Emphasizes constitutional AI for safer, more controllable dialogue.

---

These models form the backbone of modern AI-driven applications, from chatbots and virtual assistants to code-generation tools and content-creation platforms, while ongoing research tackles their limitations around factuality, fairness, and efficiency.
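
---

**Illustrative Sketches**

The following is a minimal numpy sketch of the self-attention operation behind the "Transformer backbone" bullet above: single-head, scaled dot-product attention with the causal mask that decoder-only LLMs use. All names and sizes (`d_model`, `d_head`, the random weights) are illustrative, not taken from any real model.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over one sequence.

    x: (seq_len, d_model) token embeddings.
    w_q, w_k, w_v: (d_model, d_head) learned projections.
    Returns: (seq_len, d_head) context vectors.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.shape[-1]
    # Every position scores every other position directly, which is
    # how attention models long-range dependencies in one step.
    scores = q @ k.T / np.sqrt(d_head)          # (seq_len, seq_len)
    # Causal mask: a token may not attend to positions after it,
    # as required by the next-token prediction objective.
    n = scores.shape[0]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)          # rows sum to 1
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                    # 5 tokens, d_model = 16
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # (5, 8)
```

Real models stack dozens of such layers, each with many heads, plus feed-forward blocks and normalization; scaling those pieces up is what yields the parameter counts quoted above.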
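
To make the [[Pre-training]] objective concrete, here is a toy next-token model, a smoothed bigram counter standing in for a real Transformer, evaluated with the same average negative log-likelihood that pre-training minimizes. The corpus and the add-one smoothing are invented for illustration.

```python
import math
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))

# Count (current, next) pairs to estimate P(next | current);
# add-one smoothing keeps unseen pairs from having zero probability.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def prob(nxt: str, cur: str) -> float:
    return (bigrams[(cur, nxt)] + 1) / (contexts[cur] + len(vocab))

# The language-modeling loss: average negative log-likelihood of the
# actual next token at every position. LLM pre-training minimizes the
# same quantity, just with a Transformer producing the probabilities.
nll = -sum(math.log(prob(nxt, cur)) for cur, nxt in zip(corpus, corpus[1:]))
avg = nll / (len(corpus) - 1)
print(f"avg NLL: {avg:.3f}  perplexity: {math.exp(avg):.2f}")
```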
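
Finally, a sketch of the few-shot, in-context behavior mentioned in the introduction: the task is specified entirely inside the [[prompt]], with no gradient updates. The format echoes the translation examples popularized by the GPT-3 paper; the exact completion is model-dependent.

```python
# A few-shot prompt: two worked examples, then a query. A capable LLM,
# conditioned on this text, typically continues with "loutre de mer",
# inferring the task from the pattern alone.
prompt = """Translate English to French.

English: cheese
French: fromage

English: good morning
French: bonjour

English: sea otter
French:"""
print(prompt)
```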