# PaLM

[PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311)

- Pathways Language Model: a single 540 billion parameter dense Transformer language model
- few-shot language understanding and generation, drastically reducing the number of task-specific training examples needed to adapt the model to a particular application
- trained on 6144 TPU v4 chips
- breakthrough performance on reasoning tasks that require multi-step logical inference
- combination of scale and chain-of-thought prompting, where the model is explicitly prompted to generate a natural language logical inference chain before making its prediction (see the prompt sketch after this list)
- writes explicit logical inference chains to both explain jokes and answer complex questions about scenarios
- [Big-Bench](Big-Bench.md)
- results suggest that the improvements from scale for few-shot language understanding have not yet plateaued: comparing PaLM 540B against identically trained 62B and 8B model variants, improvements are typically log-linear
- certain capabilities of language models only emerge when trained at sufficient scale, and additional capabilities could emerge from future generations of models
- demonstrates that prompting the model to generate explicit inference chains can drastically increase the quality of the predictions themselves
- the model's generation (rather than just understanding) capabilities can be immensely beneficial even for tasks that are modeled as categorical prediction or regression, which typically do not require significant language generation
- comprehensive analysis of bias and toxicity, and a study of the extent of training data memorization with respect to model scale
- ethical considerations related to large language models and potential mitigation strategies
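
Chain-of-thought prompting is a prompt-format technique rather than a PaLM-specific API: the few-shot exemplars show a worked reasoning chain before the final answer, so the model is nudged to emit its own chain before answering. A minimal sketch of assembling such a prompt (the exemplar text and helper names are illustrative, not taken from the paper):

```python
# Minimal chain-of-thought prompt builder (illustrative, not the paper's code).
# Each exemplar pairs a question with an explicit reasoning chain and answer.

EXEMPLARS = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
                    "How many tennis balls does he have now?",
        "chain": "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
                 "5 + 6 = 11.",
        "answer": "11",
    },
]

def build_cot_prompt(question: str) -> str:
    """Assemble a few-shot prompt whose exemplars demonstrate reasoning chains."""
    parts = []
    for ex in EXEMPLARS:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['chain']} The answer is {ex['answer']}.\n"
        )
    # The model is expected to continue with its own chain, then the answer.
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_cot_prompt(
        "A cafeteria had 23 apples. They used 20 and bought 6 more. "
        "How many apples do they have?"
    ))
```

The resulting string would be sent to the language model as a standard few-shot prompt; only the exemplar format changes, not the decoding procedure.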