Large language models (LLMs) are deep [[neural networks]], typically based on the [[Transformer architecture]], that are trained on massive text corpora to understand and generate human-like language. By learning statistical relationships between words, sentences, and larger contexts, they can perform a wide range of natural-language tasks “in-context” (i.e. by conditioning on a [[prompt]]) without task-specific fine-tuning.
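As a concrete illustration of in-context learning, a few demonstrations placed directly in the prompt can specify a task with no parameter updates; a minimal sketch, with translation pairs adapted from the GPT-3 paper's few-shot demo:

```python
# Few-shot "in-context" prompt: the task is specified entirely by
# examples in the context window; no weights are updated.
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "plush giraffe => girafe en peluche\n"
    "cheese sandwich =>"
)
print(prompt)  # fed to a large LLM, the expected continuation is " sandwich au fromage"
```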
---
**Core Architecture & Training**
- **Transformer backbone**: LLMs use self-attention layers to model long-range dependencies in text, scaling up to hundreds of billions (or even trillions) of parameters; a minimal attention sketch follows this list.
- **[[Pre-training]]**: They ingest web-scraped text, books, code repositories, etc., optimizing a language-modeling objective (predict the next [[token]]); the second sketch below shows this loss.
- **Emergent abilities**: As model size and training data grow, models exhibit surprising zero-shot and few-shot performance on translation, summarization, question answering, code synthesis, and more.
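The mixing operation at the heart of the Transformer is scaled dot-product self-attention; a single-head toy version with the causal mask used in decoder-only LLMs (sizes and weights here are arbitrary, for illustration only):

```python
import math
import torch

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask:
    the core mixing operation of a decoder-only Transformer block."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv                  # (T, d) queries/keys/values
    scores = q @ k.T / math.sqrt(k.shape[-1])         # pairwise token affinities
    T = scores.shape[0]
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))  # hide future tokens
    return torch.softmax(scores, dim=-1) @ v          # weighted mix of values

T, d = 4, 8                                           # toy sizes: 4 tokens, width 8
x = torch.randn(T, d)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)     # torch.Size([4, 8])
```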
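The pre-training loss itself is just next-token cross-entropy over a shifted sequence; a sketch with random stand-ins for the model's logits:

```python
import torch
import torch.nn.functional as F

vocab, batch, seq = 100, 2, 16
tokens = torch.randint(vocab, (batch, seq))   # a batch of token ids
logits = torch.randn(batch, seq, vocab)       # stand-in for model(tokens)

# shift by one: the prediction at position t is scored against token t+1
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),        # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),                # targets: the *next* tokens
)
print(loss)  # average negative log-likelihood per predicted token
```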
---
**Key Capabilities**
- **Text generation**: Coherent paragraphs, dialogue, creative writing (see the generation example after this list).
- **Code assistance**: Translating natural language to programming code and vice versa.
- **[[Reasoning]]**: Logical puzzles, math word problems (to a degree), and chain-of-thought when prompted (second example below).
- **[[Multimodal]] extensions**: Some LLMs have been extended to process images or other modalities alongside text.
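In practice, generation is a one-liner with the Hugging Face `transformers` library; this sketch assumes `pip install transformers torch` and uses the small "gpt2" checkpoint as a stand-in for the far larger models discussed on this page:

```python
# Minimal text generation with a pretrained causal language model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=40)
print(result[0]["generated_text"])
```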
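Chain-of-thought is a prompting technique rather than an API: appending a cue such as "Let's think step by step." (Kojima et al., 2022) often elicits intermediate reasoning before the final answer. The question below is a standard grade-school arithmetic item used for illustration:

```python
# Zero-shot chain-of-thought prompt: the trailing cue invites the model
# to write out intermediate steps before answering.
prompt = (
    "Q: A cafeteria had 23 apples. It used 20 to make lunch and "
    "bought 6 more. How many apples do they have?\n"
    "A: Let's think step by step."
)
# sent to an instruction-tuned LLM, a typical completion is:
# "23 - 20 = 3 apples remain; 3 + 6 = 9. The answer is 9."
```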
---
**Limitations & Challenges**
- **[[Hallucinations]]**: Confidently generating incorrect or fabricated facts.
- **Bias and ethics**: Reflecting issues present in training data (stereotypes, toxicity).
- **Compute cost**: Training and inference can be extremely resource-intensive (see the back-of-the-envelope estimate after this list).
- **Interpretability**: Internal workings remain largely opaque, posing risks in safety-critical applications.
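The training cost can be ballparked with the common approximation FLOPs ≈ 6 · N · D for N parameters and D training tokens (Kaplan et al., 2020); the sustained-throughput figure below is a hypothetical assumption chosen for round numbers:

```python
# Back-of-the-envelope training cost: ~6 floating-point operations
# per parameter per training token.
params = 175e9          # N: GPT-3 parameter count
tokens = 300e9          # D: GPT-3's reported training tokens
flops = 6 * params * tokens
print(f"{flops:.2e} training FLOPs")        # ~3.15e+23

# assuming (hypothetically) 1e15 sustained FLOP/s on one accelerator:
years = flops / 1e15 / (86400 * 365)
print(f"~{years:.0f} single-device years")  # ~10
```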
---
**Prominent Examples**
- **[[🏦 OpenAI|OpenAI]]'s GPT-3 & GPT-4**, the model series behind [[💾 ChatGPT]]
    - GPT-3 (175B parameters) demonstrated broad zero-shot/few-shot capabilities.
    - GPT-4 (undisclosed size) improved reasoning, coding, and safety guardrails.
- **[[🏦 Alphabet ($GOOGL, $GOOG)|Google]]'s [[💾 Gemini|Gemini]]** (successor to PaLM & PaLM 2)
- **[[🏦 Meta ($META)|Meta]]'s [[LLaMA]] family (LLaMA, Llama 2, Llama 3)**
    - Open-weight models ranging from roughly 7B to 70B parameters, optimized for research and fine-tuning.
- **[[🏦 Anthropic|Anthropic]]'s [[💾 Claude|Claude]]**
    - Emphasizes Constitutional AI for safer, more controllable dialogue.
---
These models form the backbone of modern AI-driven applications—from chatbots and virtual assistants to code generation tools and content-creation platforms—while ongoing research tackles their limitations around factuality, fairness, and efficiency.