If one thing has become clear over the past year, it's that many well-intentioned leaders, pundits, and practitioners are not speaking the same language when it comes to artificial intelligence. If they were, it's unlikely we would have seen more than 800 new pieces of AI-specific legislation across the globe in 2024 — not to mention countless news stories, blog posts, and social media threads — painting a haphazard and inconsistent view of what AI is and isn't, how it works, and, as a result, how to govern it effectively.

One major stumbling block is that the term "artificial intelligence" is now too broad and imprecise for effective governance. What’s more, it comes with a litany of preconceptions based in science fiction and, maybe worse, clickbait journalism. It's more accurate to think of AI as an ecosystem with distinct components:
1. Applications: Consumer-facing products and services using AI models (e.g., chatbots, image generators, and copywriting programs).
2. Models: Mathematical functions represented by arrays of numbers ("weights"), resulting from statistical optimization on vast datasets. Models are software, not conscious entities.
3. Infrastructure: Hardware, software, and data resources needed to train and run AI models.
Modern AI models are not weapons, but technological advancements similar to the printing press or search engines. They run on code written by humans, and their weights are fundamentally simply information.
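The claim that weights are "fundamentally simply information" is easy to demonstrate. Below is a deliberately tiny sketch in Python (the numbers are invented): the entire "model" is a handful of stored values, and producing an output is nothing more than arithmetic over them. Real models work the same way, just with billions of values.

```python
# A toy "model": its weights are just stored numbers, and inference is
# plain arithmetic over them. The values here are invented for illustration.
weights = [0.8, -0.3, 0.5]
bias = 0.1

def predict(features):
    """Weighted sum of inputs: the core operation inside any neural network."""
    return bias + sum(w * x for w, x in zip(weights, features))

print(predict([1.0, 2.0, 3.0]))
```

Nothing here is conscious or agentic; scale aside, a frontier model's weights are the same kind of artifact as this three-number list.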
To harness AI's potential to reshape global power, economy, and society, we must understand how these models work and avoid hasty restrictions on innovation. In order to foster innovation, decision makers should recognize these distinctions. Should risks arise in specific layers of the AI ecosystem, we should address them narrowly.
This post will break down the fundamentals of modern AI models for those who are not AI technology experts, to help inform:
- What AI experts and practitioners mean when they use certain terms.
- The recent history of AI models and how we arrived at our current state.
- How global investments, laws, and attitudes about AI will affect the level of benefit we ultimately experience from it. This spans a range of concerns, including energy, compute infrastructure, data access, and model outputs.
## AI model basics
One thing that complicates discussions about AI is that relatively few people will ever train a model. Rather, the vast majority of human interaction with AI models is via a consumer application carrying out an inference workload (e.g., prompting ChatGPT to write a "thank you" letter, or a Tesla camera identifying a pedestrian). This is clearly interesting on its own, but engaging with these systems only at the surface level often leaves a misguided impression of how they work.
If we want to speak about models intelligently, we need to understand what they actually are and actually can do.
AI models are fundamental, but constrained, components of larger applications, products, or systems (e.g., GPT-4 is a model, while ChatGPT is a product), acting as progress multipliers that make almost everything more efficient, more powerful, and more productive. Without an application to expose their capabilities, we might think of models the same way we think about databases or automobile engines — remarkable feats of engineering that aren't very useful on their own. Or, going back to the printing press analogy, we might think of models as the press, and books, pamphlets, and other printed media as the applications.
## AI lexicon
A short glossary of commonly used terms you'll find throughout this essay and in the industry:
|Term|Definition|
|---|---|
|Foundation models|Large-scale AI models trained on vast amounts of data that can be adapted for a wide range of tasks. They serve as a base for more specialized applications.|
|Generative AI|AI systems capable of creating new content such as text, images, audio, or video based on training data and input prompts.|
|Inference|The process of using a trained AI model to make predictions or generate outputs based on new input data.|
|Training|The process of teaching an AI model to perform tasks by exposing it to large amounts of data and adjusting its parameters.|
|Supervised learning|A machine learning approach where the model is trained on labeled data, learning to map inputs to known outputs.|
|Unsupervised learning|A machine learning approach where the model learns patterns and structures from unlabeled data without predefined outputs.|
|Reinforcement learning|A machine learning approach where an agent learns to make decisions by taking actions in an environment to maximize a reward signal.|
|Red-teaming|The practice of stress-testing AI systems by deliberately attempting to find flaws, vulnerabilities, or unintended behaviors.|
|Token|A unit of text or data that an AI model processes. In language models, tokens can be words, parts of words, or individual characters.|
|Parameters|The adjustable values within an AI model that are learned during training and determine its behavior.|
|Cluster|A group of interconnected computers working together to perform complex computations, often used in AI training.|
|GPU|Graphics Processing Unit; specialized hardware designed for parallel processing, crucial for AI computations.|
|Vector|A mathematical representation of data in a multi-dimensional space, often used to encode information in AI models.|
|Latent space|A compressed representation of data within an AI model, capturing essential features and relationships.|
|Context window|The amount of preceding text or data an AI model can consider when generating outputs or making predictions.|
|Prompting|The technique of providing input or instructions to an AI model to guide its output or behavior.|
|FLOPs|Floating Point Operations per Second; a measure of computational performance often used to compare AI hardware.|
|Neural network|A machine learning model inspired by biological neural networks, consisting of interconnected nodes organized in layers.|
|Narrow AI|AI systems designed to perform specific tasks within a limited domain, as opposed to general AI.|
|General AI|AI systems with human-like ability to understand, learn, and apply knowledge across a wide range of tasks and domains.|
|Deep learning|A subset of machine learning using neural networks with multiple layers to learn hierarchical representations of data.|
|Machine learning|A branch of AI focused on creating systems that can learn from and improve with experience, without being explicitly programmed.|
|Modality|The type or category of data an AI system can process, such as text, images, audio, or video.|
|Fine-tuning|The process of adapting a pre-trained model to a specific task or domain by training it on a smaller, specialized dataset.|

## A very brief history of AI
Conceptually, the north star of a generally intelligent model capable of reasoning as well as, or better than, humans has been a fixture in science fiction and academia for hundreds of years. It wasn't until recently, however, that progress in this direction became more practical at large scale. Although we had many successful applications of narrow AI, it was the advent of GPT-3, ChatGPT and their ilk between 2020 and 2022 that led to the emergence of systems that could be defined as general AI.
#### Pre-2012: Expert systems
In the decades leading up to that breakthrough, though, we saw many advances and commercial applications in related fields and subfields, such as expert systems and machine learning. Examples of prominent efforts from this era might include IBM's famous Deep Blue chess-playing expert system (which was essentially programmed with the rules of chess and a map of possible moves and outcomes), or machine learning models used by financial institutions to do everything from detecting fraud to predicting the value of trades.
While these systems could excel at their prescribed tasks, the primary drawback was that building them was a manual affair. Expert systems had to be programmed with all the rules of a game like chess, as well as a catalog of available and best moves in any given scenario within a match. Machine learning experts — then a precious commodity — and subject-matter experts had to define the relevant features or signals of a machine learning model and dictate how they should be weighted when generating an output or prediction. Systems behaved how they were programmed to behave, and building them was quite difficult.

#### 2012-2017: Deep learning and reinforcement learning
Skipping over a lot of details, the AI field hit an inflection point in 2012 with the advent of deep learning. What set deep learning apart from previous attempts at AI was, in part, how it took advantage of the parallel-processing capabilities of graphics processing units (GPUs) and web data to scale out an existing field of research around "neural networks." Neural networks can automatically detect relevant features from raw inputs (optical character recognition on bank checks was an early example), but they were long limited by compute capacity and a lack of training data.
That all changed thanks to abundant training data (especially images) becoming available on the web and, as noted, the availability of powerful GPUs. More data and more compute meant researchers could build "deeper" models that could "learn" more effectively. Research teams focused on perfecting model architectures that outperform previous ones on established benchmarks — accurately classifying images on industry-standard datasets such as ImageNet, for example.
As a result, using computer vision to classify images was an early production application of deep learning. This era also brought us advances in voice recognition, natural language processing (NLP), and other modalities that helped power popular products such as Amazon Echo devices, Spotify music recommendations, and early versions of Google Translate.
But for all its advances, the deep learning era was still defined by bespoke systems and architectures, largely trained via supervised learning on hand-labeled data sets. It was easier to build and deploy better machine learning models, but each one had limited utility.
The AI field experienced another major breakthrough just a couple of years later when reinforcement learning hit the public eye, most famously when DeepMind's AlphaGo roundly defeated a human expert in the notoriously complex game of Go. Beyond the idea of letting a model test out myriad novel solutions to solve a loosely defined task — many more than a human could do, or even conceive of — reinforcement learning also introduced the idea of training a model to choose its outputs based on its developer's preferences. In the game of Go, that might be winning at all costs; but someone building an enterprise chatbot, for example, might want to discourage the model from responding with offensive language or sensitive data.

#### 2018-present: Foundation models
So far, the big engineering shift in the current era of AI is standardization on relatively simple model architectures — the transformer architecture for language, and the related diffusion architecture for images — that benefit greatly from more compute power and data. At the macro level, of course, it's what these models enable that has changed how the world views AI: Generating high-quality (and mostly accurate) text, images, code, audio, and more to match almost any conceivable prompt (in natural language, no less!) is an incredibly popular use case.
It's difficult to overstate the importance of the shift to generative AI from a user perspective. Whereas previous generations of models were largely focused on classification (e.g., identifying animals in images or determining the sentiment of a tweet) or translating speech to computer-readable text (e.g., Siri and Google Assistant), these new ones unlocked creativity. Excitement to discover what they can produce and how broadly we can apply them has resulted in an incredible pace of adoption, tool creation, and new capabilities.
A very simplified explanation of how a transformer works goes something like this: A model is trained on text inputs, and uses a concept called "self-attention" to analyze long passages of text, thus forming a deep and complex mapping of words, concepts, and other important features into vectors (this forms the latent space). When a transformer model is analyzing prompt inputs or generating outputs, it's the distance between vectors in the latent space that helps the model connect concepts and predict the next logical word. As evidenced by the capabilities of large language models, these mappings are mind-blowingly complex (even if the output is not always accurate).
A key distinction from previous approaches to both NLP and vectorization is that transformer-based LLMs do this across much longer passages, without forgetting what they've already calculated earlier in that same passage. So although all language models essentially do the same thing — predict the next logical word to a query or input — today's LLMs appear much more intelligent, even creative, because they can retain much more context and are trained on much more data. (The same holds true for image models and pixel prediction, audio models and note prediction, etc.)
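To make the self-attention idea concrete, here's a stripped-down sketch in Python. It omits the learned query/key/value projections and multiple attention heads that real transformers use — each token simply attends to every token based on raw vector similarity — but the "weighted mix over the whole passage" mechanism is the same. The toy vectors are invented for illustration.

```python
import math

def softmax(xs):
    """Turn raw similarity scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Each token's new vector is a weighted mix of every token's vector,
    with weights given by scaled similarity between tokens."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in vectors])
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        out.append(mixed)
    return out

# Three toy "word vectors"; in a real transformer these come from learned
# embeddings and are first projected into query/key/value spaces.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(self_attention(tokens))
```

Notice that the first two (similar) vectors end up blended mostly with each other: that is the sense in which attention "connects concepts" across a passage.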
Equally as important is the transferability of transformer-based foundation models across modalities, including language, images, video, audio, and 3D. And whereas previous generations of AI models might excel at narrow tasks, these newer models can excel across many domains with relatively minimal tweaking. Today, it's perfectly reasonable to expect a language model to also perform well across tasks such as coding, mathematics, and industry-specific benchmarks (e.g., legal bar exams).
Thankfully for the teams building these models, GPU performance and digital data volumes have both scaled up immensely in the past decade. So rather than trying to milk incremental improvements out of new, bespoke architectures, AI teams can build amazing models by accumulating more (and better) data and optimizing their systems to handle even more parameters.
These models came to be known as foundation models because of the sheer amount of data on which they are trained — billions or trillions of tokens acquired from web scraping, books, images, videos, code repositories, etc. Particularly as it relates to text, which is still much more abundant than other media, we're talking about a significant portion of all human knowledge ever digitized. If you need a model to power an application for a particular industry, the chances are that a foundation model already has an "understanding" of that space and/or can be easily fine-tuned to acquire that understanding.
### What’s coming next
Apart from advances in foundational concerns like model architecture — where it will be difficult to predict what catches on — a major focus of AI researchers going forward will be on improving things like reliability and interpretability. Interpretability in AI refers to our ability to understand and explain how models arrive at their outputs. As models grow more complex, making them more interpretable becomes crucial for trust, debugging, and ethical considerations.
Several approaches are being developed to improve AI interpretability; two big ones are mechanistic interpretability and probing:
- Mechanistic interpretability aims to reverse-engineer the internal workings of neural networks, trying to understand how specific neurons or groups of neurons contribute to the model's behavior.
- Probing involves adding small trainable components to a frozen pretrained model to investigate its internal representations and predict specific properties or behaviors of the model.
These methods are still in early stages, but they're crucial for making AI systems more transparent, accountable, and reliable. As adoption picks up in mission-critical settings where large amounts of money — possibly even lives — are at stake, the companies selling foundation-model-powered products and applications understand the necessity of making sure things operate how they’re supposed to. Creativity is a great feature in many consumer and artistic use cases, but reliability and consistency often rule the day in more “serious” endeavors.
## Types of transformer-based models
When thinking about AI and how it actually works, it's helpful to understand that although many models function via similar methods, no two are really the same. As noted above, earlier computer vision models, for example, are architecturally much different than today's popular text-to-image models — and even among generative models and foundation models, different architectures produce different results.
Here are some of the most common techniques powering today's leading AI models, some of which we've already mentioned, as well as examples of the types of transformative applications being built on top of them.
### Autoregressive language models
Large language models (LLMs) like GPT-4 (OpenAI), Claude 3.5 (Anthropic), LLaMA 3 (Meta), Gemini 1.5 Pro (Google), and Mistral Large 2 (Mistral) are perhaps the most visible type of AI models today. The scale of data used to train these LLMs is mind-boggling. For example, the Fineweb dataset used to train some recent LLMs contains over 15 trillion tokens of high-quality web content. This is equivalent to about 30 billion pages of text — every word of which has been mapped across a huge number of dimensions and assigned a vector value.
We refer to this class of models as autoregressive because, as explained above, they work by predicting the next token (word or subword) in a sequence, using the correlations and connections they learned during the training stage. This allows them to generate human-like text and perform a wide range of language-related tasks.
Language models come in many sizes depending on their use case, and are most widely adopted as components within popular applications like the multi-purpose ChatGPT; task-specific products like GitHub Copilot for programming; and any number of chatbots, writing aids or auto-correct features, and other embedded product capabilities.
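The "predict the next token, append it, repeat" loop can be sketched in a few lines of Python. The hand-written bigram table below stands in for a real model's learned probabilities (the numbers are invented), but the greedy generation loop is essentially the one an LLM runs at inference time.

```python
# A toy autoregressive "language model": given the latest token, it assigns
# probabilities to the next one. The table is a hypothetical stand-in for
# the billions of learned parameters in a real LLM.
bigram_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "dog": {"ran": 0.7, "sat": 0.3},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def generate(prompt_token, max_tokens=10):
    tokens = [prompt_token]
    for _ in range(max_tokens):
        # Greedy decoding: always pick the most probable next token.
        nxt = max(bigram_probs[tokens[-1]].items(), key=lambda kv: kv[1])[0]
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return " ".join(tokens)

print(generate("the"))  # prints "the cat sat"
```

Real LLMs condition on the entire context window rather than just the last token, and usually sample from the probability distribution instead of always taking the top choice, but the loop itself is the same.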
### Image and video diffusion models
Another class of model that has seen rapid progress is diffusion models for image and video generation. These work by gradually adding statistical noise to images, then learning to reverse that process and regenerate the image.
Because these models are primarily trained on images or videos combined with descriptive captions, labels, or other metadata, they can generate high-quality visual content from text descriptions or other inputs. Popular image models and/or associated products (many image models are highly proprietary) include FLUX, Ideogram, Midjourney, and Stable Diffusion.
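The forward half of that process — gradually drowning an image in noise — can be sketched in a few lines. The noise schedule below is a deliberately simplified assumption; real diffusion models use carefully tuned schedules and, crucially, a learned network that runs the process in reverse.

```python
import random

random.seed(0)  # deterministic noise for the demo

def add_noise(pixels, noise_level):
    """Blend a signal toward Gaussian noise; noise_level runs from 0 to 1."""
    keep = (1 - noise_level) ** 0.5
    return [keep * p + (noise_level ** 0.5) * random.gauss(0, 1) for p in pixels]

image = [0.2, 0.8, 0.5, 0.9]  # a tiny four-"pixel" image
steps = [add_noise(image, t / 10) for t in range(0, 11, 2)]
print(steps[0])   # noise_level 0.0: the image is untouched
print(steps[-1])  # noise_level 1.0: pure noise; the original signal is gone
```

Training teaches a network to undo one of these noising steps at a time; generation then starts from pure noise and applies that learned reversal repeatedly, guided by the text prompt.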

Flux by Black Forest Labs, which is open source, showcases the dramatic improvements in output quality and diversity that generative visual models have undergone over the last decade.
Luma Dream Machine, a state-of-the-art video-generation model, took things even further by training on more than 200 trillion tokens. This allows it to generate high-quality — and cohesive — videos from text prompts. Because it was trained on so many video frames, it's able to include seemingly small but very impressive features, such as accurate 3D rendering and light refraction, and consistency across scenes.
#### Multi/omnimodal engines
The cutting edge of AI model research is in mixed-modal models that can work across different types of data — text, images, video, and more.
Meta's CM3Leon (pronounced chameleon, like the lizard) model is a prime example of this approach. It uses a unified architecture to process both text and images, representing everything as discrete tokens. This allows it to seamlessly reason over and generate interleaved sequences of text and images. The model was trained on over 15 trillion tokens of mixed-modal data, including text, images, and interleaved documents.
This type of unified architecture opens up exciting possibilities for AI systems that can fluidly work across modalities, much like humans do. It points to a future where the boundaries between different types of data and tasks become increasingly blurred. Multimodal models could ultimately help bring about major advances in areas such as virtual reality and robotics — including robots that can see, hear, read, and speak.
However, the word “reason” has different meanings when applied to AI models versus human brains. Whereas exposure to enough examples can help models recognize cause and effect (that a teetering vase is likely to fall and break, for example), they don’t understand why things happen. That capability still lies exclusively with humans. When faced with almost any given scenario, we can apply years of experience, education, abstract thinking, and gut-feeling to guide our predictions about how it will play out both immediately and several steps down the road.
## How AI models are built
Having covered some of the high-level concepts and evolutionary milestones for AI, it's time to drill down a little deeper and discuss how today's AI models are actually built. This is important to understand: The better that we all understand each other, the more productive discussions we can have about the role of AI in our society and how it intersects with other areas of technology, law, and humanity.
### Compute and infrastructure
At the heart of modern AI data centers lies a piece of hardware originally designed for rendering video game graphics: the GPU. These specialized processors have become the backbone of AI computation, powering everything from model training to inference.
GPUs excel at AI workloads due to their architecture, which is optimized for parallel processing. While a standard CPU (like the Intel chip in your laptop) might have 4 to 64 cores, a modern GPU can have thousands of smaller, more specialized cores. This design allows GPUs to perform many simple calculations simultaneously, which is ideal for the matrix multiplications that form the basis of neural network computations.
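That last point is worth making concrete: in a matrix multiplication, every output cell is an independent dot product, so thousands of GPU cores can each compute a cell at the same time. The plain-Python sketch below computes the cells one after another, but nothing in the math requires that ordering — which is exactly the property GPUs exploit.

```python
def matmul(A, B):
    """Multiply two matrices. Each (i, j) output cell is an independent
    dot product of a row of A with a column of B."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # -> [[19, 22], [43, 50]]
```

A neural network's forward pass is, at bottom, a long chain of operations like this one, applied to matrices with thousands or millions of rows and columns.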
As noted above, the journey of GPUs from gaming to AI began in earnest in the early 2010s when researchers discovered they could repurpose graphics cards for machine learning tasks. Nvidia, recognizing this trend, began optimizing their GPUs for AI workloads, introducing specialized tensor cores and developing the CUDA programming model to make it easier for developers to harness GPU power for general-purpose computing.
Today's AI-focused GPUs are marvels of engineering. Take Nvidia's latest offering, the H200 GPU, which boasts:
- 80 billion transistors
- 141GB of high-bandwidth HBM3e memory
- Up to 4.8 TB/s of memory bandwidth
- 66.2 TFLOPS (trillion floating-point operations per second) for computations
To put this in perspective, a single Nvidia H200 GPU can perform as many calculations per second as several supercomputers from the early 2000s combined. But this performance doesn't come for free. A single top-end GPU from Nvidia can cost up to $40,000 and consume significant amounts of power — up to 700 watts under load, contributing to the massive energy demands of modern AI data centers.
The relationship between compute infrastructure (GPUs, storage, and high-speed networking) and AI model performance is unique in the world of software engineering. While adding more compute might allow most applications to run faster or handle more users in production, adding more compute during the training stage allows an AI model to handle more parameters faster, and thus directly reduces model development time. Even if it were technically possible to train a large model on a single GPU, it would take many years to complete and would be subject to failure modes that massively distributed clusters can help mitigate.
There is no "works on my machine" scenario when it comes to training AI models — you either have the resources to train a state-of-the-art model, or you don't. Meta's LLaMA 3 405B model, for instance, was trained on over 15 trillion tokens of data, utilizing a system containing 16,000 NVIDIA H100 GPUs, advanced high-speed networking, and a 240-petabyte (or 240 million gigabyte!) file system. The training process consumed enough electricity to power a small city for months; estimates suggest pre-training alone took just over three months.
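The "small city" claim holds up to back-of-the-envelope math. The sketch below uses rough assumptions for per-GPU power draw, facility overhead, and duration; none of these figures come from Meta.

```python
gpus = 16_000
watts_per_gpu = 700   # assumed sustained draw per H100 under load
overhead = 1.5        # assumed factor for CPUs, networking, and cooling
days = 95             # "just over three months" of pre-training

total_megawatts = gpus * watts_per_gpu * overhead / 1e6
megawatt_hours = total_megawatts * 24 * days
print(f"~{total_megawatts:.1f} MW sustained, ~{megawatt_hours:,.0f} MWh total")
```

That works out to roughly 17 MW sustained and tens of thousands of megawatt-hours over the run — comparable, under these assumptions, to the ongoing consumption of a small city.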
To get a sense of how far things have advanced in just a few years, consider that some educated estimates suggest it took OpenAI around 34 days to train its GPT-3 model in 2020 (which underpinned the original incarnation of ChatGPT in 2022) on an earlier generation of GPUs. Utilizing the same H100 GPUs that Meta had available for LLaMA 3, the pre-training might have taken just over two days.
However, neither the peak performance of that training system (measured in FLOPs), nor the cost to build it, are intrinsically connected to production capabilities. In part, this is because, as we'll explain below, so many improvements, guardrails, and other features are implemented during the post-training phase. Additionally, as the underlying techniques and technologies improve, and investment in AI rises, we expect to see continued improvements in the cost-size-performance ratio.
Looking ahead, for example, the next frontier in AI computation may lie in more specialized hardware. Application-Specific Integrated Circuits (ASICs) designed explicitly for AI workloads are beginning to emerge, and Google's Tensor Processing Units (TPUs) and various AI startups' custom silicon promise even greater efficiency for specific AI tasks. Still, the flexibility and widespread support for GPUs ensure they will remain a critical component of AI infrastructure for years to come.
### Pre-training
When people talk about "training" models, they're most often referring to the pre-training stage. It's during this process, which can take months and involve several distinct steps, that models ingest the huge volumes of data we often associate with them. During pre-training, the model learns a general understanding of the data distribution, forming a foundation that can be adapted to many downstream tasks.
And, at least for the foreseeable future, it’s during pre-training where compute performance really matters: All else being equal, the system architecture and available FLOPs will determine how many parameters a model can realistically include while still finishing the training run in an acceptable time frame.
However, it's not as simple as just collecting as much data as possible and getting to work. To begin with, teams building state-of-the-art models that strive for high accuracy and other performance metrics must take steps to ensure that they actually want to use all the data present in their sources — redundant, inappropriate, inaccurate, or otherwise low-quality data is often pruned from the data set. Any data utilized in pre-training must also be processed to fit a preferred structure (e.g., extracting just the relevant plain text content from webpages), size (e.g., size constraints for image and video files), or other criteria. The importance of data quality can't be overstated: Having a large, diverse, and high-quality dataset is often more important than simply increasing model size.
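The pruning and structuring described above can be sketched as a simple filter. Real pipelines use far more sophisticated techniques (fuzzy deduplication, quality classifiers, language identification); the heuristics and threshold below are made-up placeholders.

```python
def clean_corpus(docs, min_words=20):
    """Drop exact duplicates and documents that fail a crude quality check."""
    seen = set()
    kept = []
    for doc in docs:
        text = doc.strip()
        if len(text.split()) < min_words:  # too short to be useful
            continue
        if text in seen:                   # exact duplicate of an earlier doc
            continue
        seen.add(text)
        kept.append(text)
    return kept

docs = ["short snippet", "a" + " long document" * 15, "a" + " long document" * 15]
print(len(clean_corpus(docs)))  # -> 1: the short doc and the duplicate are gone
```

At web scale, steps like these remove a large share of raw crawl data before training ever begins, which is a big part of why data engineering is as important as model architecture.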
Once prepared, it's time to expose the model to this vast dataset, allowing it to learn patterns and relationships within the data.
### Post-training
Even after they've ingested and mapped all that pre-training data, AI models are not finished — at least not in the sense that they're usable for real-world tasks. They still need to go through a post-training phase, which is where the teams building the models do things like test and tune their accuracy, red-team them to identify safety issues, and generally make sure the model is performing (1) as expected and (2) as well as possible across key domains. Techniques like reinforcement learning, RLHF, and fine-tuning (explained below) all come into play during this step.
Essentially, post-training is where human feedback (as well as any automated processes the team might have constructed) takes a model from the equivalent of a college freshman to the equivalent of a PhD candidate on the skills that matter. Its skills might become both broader and more specific, and it has learned when to shut up. For example, if you've heard of "guardrails" with AI models (i.e., developer-imposed limitations on topics about which a model can reply, or "personas" it can take on), these are products of the post-training process.
The Midjourney text-to-image model illustrates how important a strategic post-training process can be. By releasing their model on Discord and ingesting every image that users rated, upvoted, and shared, Midjourney created a powerful flywheel: As people used their model to generate images, those images and the user feedback on likes, dislikes, and aesthetic preferences were fed back in as training data, making the model better at generating what people liked. Midjourney may have trained its model on over a billion image-feedback pairs gathered in this way, and the result was a highly performant model that didn't require a massive compute cluster.
### Fine-tuning
Fine-tuning is a technique for improving the performance of a foundation model on a specific set of data, or within a specific field. For example, an AI lab training a model like, say, Codestral, will supplement the original model with code samples and other relevant data types in order to improve the model specifically in the realm of software programming. Alternatively, an end-user — say, a large bank — might fine-tune an open source LLM on its own collection of fraud data, customer interactions, or other data specific to its business.
The ability to fine-tune a model can be a blessing for organizations that have specific needs that a generalized foundation model can't handle. Because neither OpenAI nor Google nor any other AI lab has access to company-specific data, even their best models will struggle on company-specific tasks. Even if a model isn't open source in the technical sense that open source software purists prefer, fine-tuning does provide the expected experience of adapting a core piece of infrastructure to any specific user's needs.
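The mechanics can be illustrated with a deliberately tiny stand-in: a one-parameter model "pre-trained" on general data, then fine-tuned by continuing gradient descent on a small domain-specific dataset. This is purely illustrative — fine-tuning an LLM applies the same idea to billions of parameters, often via parameter-efficient methods — and all the data points are invented.

```python
def train(data, w_init, lr=0.01, steps=500):
    """Fit y = w * x by gradient descent on mean squared error,
    starting from the given initial weight."""
    w = w_init
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

general_data = [(1, 2.0), (2, 4.1), (3, 5.9)]  # "web-scale" data, roughly y = 2x
domain_data = [(1, 3.0), (2, 6.1), (3, 8.9)]   # company data, roughly y = 3x

w_pretrained = train(general_data, w_init=0.0)                     # pre-training
w_finetuned = train(domain_data, w_init=w_pretrained, steps=100)   # fine-tuning
print(round(w_pretrained, 2), round(w_finetuned, 2))  # -> 1.99 2.99
```

The key point: fine-tuning starts from the pre-trained weight rather than zero, so far fewer steps (and far less data) are needed to adapt the model to the new domain.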
### Inference
As explained above, inference is the process of using the trained model to generate outputs for new inputs. Or, less formally, the model takes a prompt or query and generates a response based on its training. This is where the "magic" happens from the user's perspective.
Because inference is the nexus between models and products or applications, though, companies building models must think much more practically about inference. Key considerations for inference include:
- Speed: How quickly can the model generate a response?
- Compute requirements: What hardware is needed to run the model efficiently?
- Coherence and relevance: How well does the output match the input and context?
Essentially, these all boil down to money. If a company is serving its model's capabilities via API, every API call (or prompt/query) costs money. Therefore, it's critical to host the models on infrastructure that can deliver acceptable performance at a price that won't scare away users. If a company is selling or open sourcing its models, they must tune them to perform at acceptable levels on hardware that customers could reasonably afford. And, in any circumstance, the model's responses must be good enough that someone would actually pay to use it.
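To show how those considerations reduce to dollars, here's a toy unit-economics calculation for serving a model via API. Every number below is a made-up placeholder, not any real provider's pricing or throughput.

```python
price_per_1m_output_tokens = 10.00  # hypothetical price charged to users
gpu_hour_cost = 4.00                # hypothetical cloud rate for one GPU
tokens_per_second = 50              # assumed generation speed on that GPU

tokens_per_hour = tokens_per_second * 3600
revenue_per_hour = tokens_per_hour / 1_000_000 * price_per_1m_output_tokens
margin = revenue_per_hour - gpu_hour_cost
print(f"revenue ${revenue_per_hour:.2f} per GPU-hour, margin ${margin:.2f}")
```

With these assumed numbers the provider loses money on every GPU-hour — which is exactly why serving teams obsess over batching many users onto one GPU, shrinking models, and buying more efficient hardware.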
Increasingly, model providers are betting that users will pay more for inference — in either time or money — if there is a commensurate improvement in performance. This is called test-time-compute scaling, and it’s exemplified by models like OpenAI's o1. This approach allows models to improve their performance by using more computational resources during inference – essentially giving the model more time to "think" before answering. Similar to how a human might take longer to solve a complex problem, these models can produce better results when allowed to run for longer periods or use more processing power.
This technique highlights a key difference between AI and traditional software: for AI, additional compute doesn't just make things faster, it can actually improve the quality of outputs. The o1 model demonstrates that performance consistently improves with more time spent thinking, pushing the boundaries of what's possible in AI reasoning.
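The simplest form of this idea is best-of-n sampling: generate many candidate answers and keep the one that scores highest. The sketch below uses a random number as a stand-in for "generate an answer and score it," which is an illustrative assumption, not how any particular model works:

```python
# Sketch of the simplest form of test-time compute scaling: sample many
# candidate answers and keep the best one under a scoring function.
# Spending more samples (more compute) can only improve the best score.
import random

def candidate_quality(rng: random.Random) -> float:
    # Stand-in for "generate one candidate answer and score it."
    return rng.random()

def best_of_n(n: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    return max(candidate_quality(rng) for _ in range(n))

# More inference-time compute (larger n) yields a better-or-equal best answer.
for n in (1, 4, 16, 64):
    print(f"n={n:3d}  best score: {best_of_n(n):.3f}")
```

Models like o1 use more sophisticated strategies than best-of-n, but the economic logic is the same: each extra candidate costs compute, so better answers cost more to produce.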
### Evaluations and testing
Although the academic roots of AI research produced largely simplistic, one-size-fits-all measures of base models, we are moving toward “in-context” evaluation. Instead of measuring performance in the lab, this new paradigm measures success (and risk) within the context of the application, use case, and industry where the model actually interacts with the real world.
This is a positive development: Must-have capabilities for an LLM-powered screenwriting assistant, for example, might be wholly irrelevant in a national security environment. Beyond relevance, in-context evaluation also allows developers to test their models on industry-specific benchmarks, as well as security and regulatory requirements, and to continuously improve performance based on real-world usage in particular fields.
### Open source release
After training and evaluating a model, some teams and companies will opt to release it as open source. When a model is open sourced, its weights (the arrays of numbers that determine how the model maps a given input to an output), architecture, and/or training data are made publicly available, allowing others to use, modify, and build upon it. This builds on decades of successful efforts around open source AI research, as well as the undeniably beneficial effects of open source software as a whole.
There are several reasons why an organization might choose to open source a model, many of which boil down to the fact that global collaboration by users, experts, and even other companies in the same space results in safer, higher-performing software. These include:
1. Accelerating research: By making models public, researchers worldwide can study and improve them.
2. Fostering collaboration and democratizing access: Open models encourage usage and development by a wider range of developers and companies, creating a shared foundation for the AI community to work on.
3. Improving transparency and developing standards: Open models can be more easily scrutinized for biases, security issues, or other bugs, and by a much larger number of people. These improved models provide a better baseline for new projects that build upon them.
As just one example, the Mistral Large 2 LLM achieves results similar to those of models more than three times its size by integrating novel efficiency methods from its open source community. In another example, an independent research firm called Nous Research is working on a project that would allow smaller teams to train large AI models across shared, less-expensive GPU resources and standard internet connections, rather than on expensive high-end GPUs wired together by high-speed interconnects.
There are also business considerations, such as attracting the level of engineering talent that prefers to work on broadly accessible projects, and achieving product “stickiness” with users who have invested resources (beyond just licensing costs) into helping a new product evolve.
## Barriers to progress in AI
While the potential of AI and today's models is immense, there are also significant barriers preventing even faster progress.
Fortunately, most of these are solvable with adequate prioritization and investment from countries, states, universities, and other institutions with a vested interest in technological progress.
### Compute
Compute scarcity has become a major bottleneck for AI progress. Training a state-of-the-art model like GPT-4 requires over 10,000 petaflop/s-days — a scale only accessible to a handful of tech giants. For context, a single petaflop/s-day is equivalent to performing 10¹⁵ neural net operations per second for 24 hours straight.
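The arithmetic behind that unit is simple enough to check by hand:

```python
# Back-of-envelope: what "10,000 petaflop/s-days" means in raw operations.
PETAFLOP = 1e15                  # operations per second
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

ops_per_pfs_day = PETAFLOP * SECONDS_PER_DAY      # 8.64e19 operations
training_run_ops = 10_000 * ops_per_pfs_day       # ~8.64e23 operations

print(f"1 petaflop/s-day = {ops_per_pfs_day:.2e} operations")
print(f"10,000 pfs-days  = {training_run_ops:.2e} operations")
```

Nearly a septillion operations for one training run — a useful number to keep in mind when weighing claims about who can and cannot train frontier models.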
This compute barrier limits model size and slows iterations for most researchers. A model like GPT-3, with 175 billion parameters, can cost over $10 million to train. Even with access to funds, chip shortages mean waiting months for high-end GPUs. Inference is costly as well — OpenAI spends an estimated $700,000 daily on inference for ChatGPT.

To democratize AI, we need a 10-100x improvement in compute efficiency and access in the next 2-5 years. Hardware breakthroughs like photonic chips and analog AI accelerators show promise. Distributed compute fabrics and algorithmic scaling advances are also key.
But most of all, we need bold policy action to drive compute abundance: incentives for chip factories, support for open compute infrastructure, and a concerted push to offset 100x the carbon footprint of AI research by 2030. We can't afford to let the AI revolution be constrained by 20th century computing. With the right choices now, efficient exaflop systems will empower amazing AI for all within a decade.
Addressing these compute barriers will be crucial for unlocking the next wave of AI advancements. This will likely require a combination of technological innovations in hardware, more efficient algorithms, and increased investment in AI infrastructure.
### Governance
Increasing scrutiny from governments and calls for AI regulation threaten to slow the pace of research and deployment. To make matters worse, much proposed (and actual) legislation is based on hypothetical, unsubstantiated fears about what AI might one day be able to do, often pushed by questionable think tanks and opposed by current AI practitioners.
Navigating complex regulatory landscapes across different jurisdictions is also becoming a major challenge for AI companies and researchers. Whether it's between states in the United States, or across national borders, a patchwork of regulations makes it difficult to construct a viable strategy for incorporation, product/research design, and customer base.
Essentially, lawmakers across the globe are racing to score political points on a hot issue, rather than encouraging more AI development, investment, and education. And despite some recent wins for AI developers, such as California Governor Gavin Newsom’s decision not to sign that state’s controversial [SB 1047 (‘23-’24)](https://legiscan.com/CA/text/SB1047/id/2999979) into law, appetite for AI regulation remains strong across various governments and industries.
### Energy
As AI models have grown exponentially larger, so too has the appetite for compute. A modern AI data center can consume over 100 megawatts of power — equivalent to the electricity needs of 75,000 homes or the energy required to melt 150 tons of steel in an electric arc furnace.
This insatiable demand has created a power paradox at the heart of the AI industry. Companies are racing to build ever-larger data centers, but the availability of electricity is fast becoming the binding constraint on their ambitions. This is true despite remarkable advances in data center efficiency and cooling over the past decade and a half, largely driven by hyperscale companies like Google and Meta.

Even as individual data centers have become more efficient, the sheer scale of AI computation threatens to overwhelm these gains. While global data center workloads increased by 340% between 2015 and 2022, energy consumption rose by "only" 20-70%. But the AI revolution may shatter this trend. The analysts at SemiAnalysis project that data center electricity consumption could triple by 2030, potentially reaching 4.5% of global electricity demand.
Along with other data center demands, the availability of reliable power will shape the future of AI as much as any algorithm or dataset. As one industry insider put it, "data centers are on a one to two-year build cycle, but energy availability is three years to none." Bridging this gap may well determine which nations lead the AI race in the decades to come.
### Data
High-quality, diverse training data is becoming a scarce resource. As models consume more and more of the available human-generated content on the internet, researchers are turning to synthetic data generation and other novel approaches to feed the data-hungry training processes. This is especially true for language and text data, and will become increasingly true for other modalities, like video, as investment in multimodal models ramps up.
However, it's still unclear whether synthetic data will yield results on par with organically generated data — especially when the synthetic data in question has been produced by AI models themselves. There's a risk that the errors, idiosyncrasies, or other unwanted characteristics of that data will poison future models and make true improvement much more difficult. This is not ideal: according to the "scaling laws" observed by the AI community, a larger collection of high-quality data can reduce the need for huge models and the compute they require.
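A rough sense of why data can substitute for model size comes from the general shape of these scaling laws, in which loss falls with both parameter count and dataset size. The coefficients below are invented for illustration and are not the fitted values from any published study:

```python
# Illustrative scaling law: model loss falls with both model size N (parameters)
# and dataset size D (tokens), following L(N, D) = E + A/N^alpha + B/D^beta.
# All coefficients here are made up for illustration, not fitted values.
def loss(n_params: float, n_tokens: float) -> float:
    E, A, B, alpha, beta = 1.7, 400.0, 4000.0, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

small_model_more_data = loss(1e9, 1e12)   # 1B params trained on 1T tokens
big_model_less_data = loss(1e10, 1e10)    # 10B params trained on 10B tokens

print(f"1B params, 1T tokens:   loss {small_model_more_data:.3f}")
print(f"10B params, 10B tokens: loss {big_model_less_data:.3f}")
```

Under curves of this shape, a smaller model trained on much more high-quality data can outperform a far larger model that is starved of data — which is exactly why data scarcity is such a pressing bottleneck.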

Further complicating the issue is that high-profile lawsuits asserting intellectual property claims over training data are forcing startups to spend their limited resources on legal defense or extortive licensing fees instead of innovation. The uncertainty and expense are not only a major distraction; they also preclude otherwise efficient practices that may well be perfectly lawful.
### Algorithmic progress
Designing optimal model architectures remains more art than science. Researchers must rely heavily on empirical experiments and intuition, as we lack robust theoretical frameworks for predicting how different architectural choices will impact model performance.
While transformer-based models have dominated the field, exploration of alternative architectures continues. Codestral Mamba exemplifies this trend, using state space models (SSMs) instead of attention mechanisms. This approach offers advantages like linear time inference, efficient memory usage, and strong parallelizability. Mamba shows particular promise for code-related tasks and can handle extremely long context windows of up to 256k tokens.
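The core idea of an SSM can be seen in a scalar sketch of its recurrence: the next hidden state is a fixed blend of the previous state and the new input, and the output is read off the state. Real models use learned matrices (and, in Mamba's case, input-dependent parameters); the numbers here are arbitrary:

```python
# Minimal sketch of the linear recurrence at the heart of state space models:
#   h_t = a * h_{t-1} + b * x_t,    y_t = c * h_t
# Each step touches only a constant amount of state, so generating over a
# sequence is linear in its length (attention, by contrast, is quadratic).
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    h, ys = 0.0, []
    for x in xs:             # one pass: O(sequence length)
        h = a * h + b * x    # update the hidden state
        ys.append(c * h)     # read out an output
    return ys

print(ssm_scan([1.0, 0.0, 0.0, 1.0]))
```

Because the state is a fixed-size summary of everything seen so far, an SSM never needs to re-read the whole context on each step — the property that makes very long context windows tractable.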
The success of Codestral Mamba highlights the potential for architectural breakthroughs that could dramatically improve model performance or efficiency. It also suggests we may see more specialized architectures optimized for particular use cases. However, widespread adoption of new architectures faces hurdles, as the AI community has built up significant tooling and optimization techniques around transformer models.
As AI models continue to grow in size and complexity, the search for more efficient and capable architectures will remain a critical area of research.
### Talent
The pool of researchers and engineers with the skills to push the boundaries of AI is currently limited. As a result, competition for top talent is fierce, driving up costs and concentrating expertise in a handful of well-resourced labs.
This is a problem in many domains, but possibly more so with regard to AI. One reason is that the technological aspects of AI are intrinsically connected with "soft" concerns relating to national and even global culture, and we should be leery about giving huge corporations even more de facto control over such things. Another major concern is that a dearth of AI talent among democratic nations will help cede control to less-democratic ones that view AI as a competitive advantage and treat the development of AI skills as a national priority.
Fortunately, this is an obstacle we can address head-on with more investment in education. Countries, or even states, that prioritize relevant math and computer science curricula as early as middle school will have an advantage in generating a talent pool that can contribute to advancing AI instead of merely consuming it. They will help ensure that national economies continue to thrive, militaries remain on the cutting edge, and the future of AI is shaped by the ideals of free markets and free nations.
## What now?
When it comes to artificial intelligence, the cat is out of the bag. Although the past couple of years of progress might seem like they materialized out of thin air, these recent advances are just the latest products of decades of AI research — turbocharged by the web and the availability of compute power that would make Seymour Cray blush. To date, we’ve only seen glimpses of AI’s power to transform everything from personal creativity to the global economy, but those changes are coming.
The questions now are who will shape the future of AI and who will reap its ample rewards. The raw materials are code, math, data, compute, and energy, so the competition is open to nearly every state, province, and nation on earth. We would like to see the United States and our allies maintain the leadership position, and, despite our differences, we believe that most AI skeptics and lawmakers would like to see the same thing.

If we share a high-level understanding of artificial intelligence, we can have more productive discussions about how to govern it. Those discussions might be about specific definitions in a piece of legislation or a debate over how seriously to take the idea of existential risk. They might span areas as diverse as copyright law, energy policy, and international trade. But the goal is that these discussions do happen, and that they aren’t derailed by fundamental misunderstandings of how AI works — much less what it is or is not. If you’d like to talk about how to maximize the benefits of AI across as many people as possible, please do reach out.