This isn't a press release or a technical deep dive. Instead, it's my attempt to explore and explain some of the key ideas discussed in the keynote—concepts that resonated with me and made me think about their implications.
You can access the whole keynote on YouTube below.
<iframe width="700" height="350" src="https://www.youtube.com/embed/k82RwXqZHY8" title="NVIDIA CEO Jensen Huang Keynote at CES 2025" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
---
## Section 1 - A short history of NVIDIA
[Founded in 1993](https://www.nvidia.com/en-gb/about-nvidia/corporate-timeline/) by Jensen Huang, Chris Malachowsky, and Curtis Priem, NVIDIA began as a company focused on developing graphics chips.
### 1999 - GPU makes better graphics possible
NVIDIA introduced the world’s first **graphics processing unit (GPU)**, the [GeForce 256](https://blogs.nvidia.com/blog/first-gpu-gaming-ai/). This innovation transformed PC gaming and 3D graphics, offering a dedicated transform and lighting engine that dramatically improved realism and rendering capabilities in personal computers.
### 2006 - CUDA makes general purpose computing on GPUs possible
NVIDIA took another major leap with the release of [CUDA](https://www.youtube.com/watch?v=IzU4AVcMFys) (Compute Unified Device Architecture). CUDA enabled **general-purpose computing on GPUs**, unlocking their potential beyond graphics and making them indispensable for scientific, engineering, and data-intensive applications.
### 2012 - CUDA makes training deeper (and more powerful) AI models possible
CUDA played a pivotal role in advancing deep learning. In 2012, AlexNet, a landmark convolutional neural network, used CUDA to process large image datasets, leading to a transformative win in that year’s ImageNet challenge.
- AlexNet introduced a deeper network design with roughly 60 million parameters. Training such a model on CPUs would have been prohibitively slow.
- Even with GPUs, memory limitations on a single GTX 580 GPU (3GB) required the use of **parallel training on two GPUs**, a feat made possible by CUDA. [Read the paper](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf).
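To make "general-purpose computing on GPUs" a little more concrete, here's a minimal sketch in Python using PyTorch (which builds on CUDA under the hood). This is not how AlexNet was implemented (that was written directly against the CUDA C API); it only illustrates the core idea of offloading a parallel numerical workload to the GPU.

```python
# Minimal illustration of general-purpose GPU computing via PyTorch (built on CUDA).
# Not how AlexNet was written (that used the CUDA C API directly); this just shows
# the idea of running the same numerical workload on CPU vs. GPU.
import torch

def gpu_matmul_demo(n: int = 1024) -> None:
    a = torch.randn(n, n)
    b = torch.randn(n, n)

    c_cpu = a @ b  # matrix multiply on the CPU

    if torch.cuda.is_available():
        # Move the data to the GPU and run the same operation there,
        # where thousands of cores execute it in parallel.
        c_gpu = a.cuda() @ b.cuda()
        torch.cuda.synchronize()  # GPU kernels launch asynchronously; wait for completion
        print("max difference CPU vs GPU:", (c_cpu - c_gpu.cpu()).abs().max().item())
    else:
        print("No CUDA-capable GPU found; ran on CPU only.")

if __name__ == "__main__":
    gpu_matmul_demo()
```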
## Section 2 - Types of Artificial Intelligence
A quick search for the *different types of AI* turns up a variety of classifications, depending on the criteria used to distinguish them. In this keynote, the distinction is based on what the AI can do.
![[../assets/nvidia_keynote_types_ai.png]]
[Video Timestamp](https://youtu.be/k82RwXqZHY8?t=388)
- **Perception AI**: This type of AI specialises in understanding and interpreting sensory inputs like images, words, and sounds.
- _Example_: Your phone's ability to recognise your face for unlocking (facial recognition technology) or virtual assistants like Siri and Alexa understanding your spoken commands.
- **Generative AI**: As the name suggests, this AI can create content such as images, text, music, or even videos.
- _Example_: Tools like ChatGPT (for generating text) or DALL-E (for creating images) are great examples of generative AI at work. It’s like having a digital artist or writer on demand!
- **Agentic AI**: These are AIs that can perceive, reason, and act autonomously to achieve specific goals. They go beyond just understanding or generating; they can make decisions and act based on them.
- _Example_: AI-powered trading bots in financial markets or complex game-playing AIs like AlphaGo, which can strategise and make moves independently.
- **Physical AI**: This is AI applied to the physical world, where it interacts with its environment and performs tasks.
- _Example_: Self-driving cars like those from Tesla or Waymo and delivery robots used in warehouses. These combine sensors, perception, and decision-making to operate in the real world.
> "Machine Learning has changed how every application is going to be built, how computing will be done and the possibilities beyond"
## Section 3 - Advancements with NVIDIA DLSS
> "GeForce enabled AI to reach the masses, and now AI is coming home to GeForce"
As someone who has never really been a gamer, a lot of what was discussed in this section didn't immediately make sense to me. So, I went down the rabbit hole of "gamer-speak" to get a better grasp of these concepts. Now, I think I have a reasonable understanding to share.
To enjoy PC gaming, one crucial element is having graphics that look as close to reality as possible, with no lag or performance hiccups. However, rendering every pixel in high resolution is computationally heavy and not ideal for performance. That’s where NVIDIA’s DLSS (Deep Learning Super Sampling) comes in. It uses deep learning (a key method in AI) to convert low-resolution images into higher resolution ones. Essentially, it delivers stunning visuals without overloading your system.
![[../assets/nvidia_dlss.png]]
[Source](https://images.nvidia.com/aem-dam/Solutions/geforce/ada/ada-lovelace-architecture/nvidia-ada-gpu-science.pdf)
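NVIDIA's actual DLSS network and training pipeline are proprietary, so treat the following as a toy sketch of the underlying idea only: render at a lower resolution, then let a small learned model reconstruct a higher-resolution frame. The architecture, layer sizes, and scale factor below are arbitrary choices for illustration.

```python
# Toy sketch of learned upscaling (the idea behind DLSS, not NVIDIA's actual model):
# render a low-resolution frame, then reconstruct a higher-resolution one with a
# small neural network instead of computing every pixel.
import torch
import torch.nn as nn

class ToyUpscaler(nn.Module):
    def __init__(self, scale: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 3 * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a spatially larger image
        )

    def forward(self, low_res: torch.Tensor) -> torch.Tensor:
        return self.net(low_res)

# Pretend this is a rendered 1080p frame (batch, RGB channels, height, width).
low_res_frame = torch.rand(1, 3, 1080, 1920)
high_res_frame = ToyUpscaler(scale=2)(low_res_frame)
print(high_res_frame.shape)  # torch.Size([1, 3, 2160, 3840]) -- a 4K-sized output
```

A real super-resolution model would of course be trained on pairs of low- and high-resolution frames (plus motion vectors, in DLSS's case) rather than used untrained like this.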
In the keynote, Huang unveiled DLSS 4, the latest iteration of NVIDIA’s DLSS technology ([Learn more here](https://www.nvidia.com/en-gb/geforce/technologies/dlss/)). He explained real-time, AI-infused graphics generation with DLSS 4 as a two-step process: first, the engine uses programmable shading and ray-tracing acceleration to generate some of the pixels. These are computationally intensive operations. Then AI steps in, using these generated pixels as a reference to create the remaining ones. The AI is trained on a supercomputer and performs inference on your local GPU to generate these additional pixels in real time. Visually, it looks something like this:
![[../assets/dlss_pixel_ai.png]]
One standout feature of DLSS 4 is [multi-frame generation](https://www.nvidia.com/en-gb/geforce/technologies/dlss/). Essentially, it can create up to three additional frames for every frame that’s actually computed. For example, during the real-time demo at CES, only 2 million pixels were computed out of the 33 million pixels required for four frames in 4K resolution. AI handled the rest, which means higher-quality graphics with much less computational load during gameplay. Yaay, AI!
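Those pixel counts are easy to sanity-check: a 4K frame is 3840 × 2160 ≈ 8.3 million pixels, so four frames come to roughly 33 million, and if only about 2 million are fully computed, the AI is filling in around 94% of what you see.

```python
# Back-of-the-envelope check of the multi-frame generation numbers from the keynote.
pixels_per_4k_frame = 3840 * 2160            # ~8.3 million pixels
total_pixels = pixels_per_4k_frame * 4       # 1 rendered frame + 3 generated frames
computed_pixels = 2_000_000                  # figure quoted during the demo

print(f"total pixels across 4 frames: {total_pixels:,}")                                # 33,177,600
print(f"share filled in by AI: {(total_pixels - computed_pixels) / total_pixels:.1%}")  # ~94%
```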
Of course, training the AI is computationally expensive, but once trained, inference is expected to be efficient. However, it’s not entirely clear to me who bears the costs of training and what exactly these costs mean. According to [research from Epoch AI](https://epoch.ai/blog/how-much-does-it-cost-to-train-frontier-ai-models), the costs of training include expenses for staff, hardware, cloud compute, energy, data center infrastructure, and more. These costs currently reach tens of millions of dollars for the most advanced models.
Apart from the economic cost, there’s also a significant environmental load. Epoch AI estimates that the most power-intensive training runs could draw around 1 GW of power by 2029. To put that into perspective, the ten largest power plants in the US have capacities of roughly 3 GW to 7 GW each. Balancing these costs with the benefits of AI technologies like DLSS will be an ongoing challenge for the industry. But for now, yayy AI!
![[../assets/training_cost_power.png]]
[Source](https://arxiv.org/pdf/2405.21015)
## Section 4 - The GeForce RTX 50 Series
Again, this was another area where I initially felt completely out of my depth. While I work in tech, hardware and infrastructure have never been my strong suits. My interest in these topics has only recently developed, driven by necessity. After all, any meaningful conversation about AI advancements requires at least a basic understanding of the hardware that powers it.
![[../assets/rtx_50_series.png]]
[Video Timestamp](https://youtu.be/k82RwXqZHY8?t=1170)
### Tech Specs of The GeForce RTX 5090 GPU
- Built with 92 billion transistors and rated at 4,000 AI TOPS (trillions of operations per second), aimed at AI-related tasks and generative applications.
- Capable of 4 petaflops of AI performance, 380 ray tracing teraflops, and 125 shader teraflops.
- Equipped with GDDR7 ("G7") memory, doubling memory bandwidth to 1.8 TB/s compared to the previous generation.
- Features an AI-Management Processor and neural texture compression to enhance data handling and efficiency.
If, like me before watching this keynote, you're unsure what the above numbers actually mean, let me try to make them clearer.
### GPU Metrics and their Meanings
**92 billion transistors** - A transistor is the basic building block of a GPU. More transistors allow the GPU to process more instructions simultaneously, improving performance. However, more transistors also mean higher manufacturing costs and energy consumption.
**4,000 AI TOPS** - TOPS stands for **trillions of operations per second**. AI TOPS refers to *how many operations the GPU can handle when running AI tasks*. Fancy LLMs that generate content need to process a LOT of AI workloads, and they need to process them AS FAST AS THEY CAN. So, higher AI TOPS = better performance for generative AI workloads.
**4 petaflops of AI performance** - FLOPS stands for **floating-point operations per second**. This indicates the GPU's ability to perform the floating-point calculations that AI workloads rely on. The higher, the better.
**380 ray tracing teraflops** - [Ray tracing](https://developer.nvidia.com/discover/ray-tracing#:~:text=Ray%20tracing%20generates%20computer%20graphics,back%20to%20the%20light%20sources.) is a graphics technique used to create realistic lighting and reflections. This number measures how quickly the GPU can compute ray-traced effects. Higher teraflops mean smoother, more realistic visuals in games and simulations.
**125 shader teraflops** - Shaders are small programs that run on the GPU to handle image-rendering tasks, like colours and textures. More shader teraflops mean better overall graphics performance. Higher is better, especially for gaming and creative work.
**G7 memory with 1.8 TB/s memory bandwidth** - Memory bandwidth is the rate at which data can be read from or written to memory. Higher bandwidth allows the GPU to handle more data-intensive tasks without bottlenecks. Doubling the previous generation's bandwidth is a significant improvement, and higher is always better.
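To build some intuition for how these headline figures interact, here's a rough back-of-the-envelope calculation using only the numbers above. It deliberately ignores precision, sparsity, caches, and everything else that determines real-world performance, so treat it purely as intuition-building; the 16 GB figure is an illustrative model size, not a spec of the card.

```python
# Rough intuition from the headline RTX 5090 figures (ignores precision, sparsity,
# caches, etc. -- order-of-magnitude reasoning only).
ai_ops_per_second = 4e15             # "4 petaflops of AI performance"
bandwidth_bytes_per_second = 1.8e12  # "1.8 TB/s memory bandwidth"

# How many operations the GPU must perform per byte fetched from memory
# to keep its compute units busy instead of waiting on data.
ops_per_byte = ai_ops_per_second / bandwidth_bytes_per_second
print(f"~{ops_per_byte:,.0f} operations per byte moved")  # ~2,222

# Time to stream a hypothetical 16 GB set of model weights once at full bandwidth
# (16 GB is an illustrative number, not a spec of the card).
weights_bytes = 16e9
print(f"one pass over 16 GB of weights: {weights_bytes / bandwidth_bytes_per_second * 1000:.1f} ms")
```

The takeaway is that raw compute and memory bandwidth have to grow together: a GPU with enormous TOPS but starved of bandwidth spends its time waiting on data.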
The RTX 5070 offers performance comparable to earlier high-end GPUs like the RTX 4090, but at a lower price point ($549). The RTX 5070 laptop, priced at $1,299, delivers desktop-level performance. This lack of a blow-up in cost despite much better performance has been made possible by **neural rendering**. Neural rendering, where AI predicts and creates portions of images, has become a central component of how these GPUs operate. During the CES demo, Huang showed how neural rendering can handle the majority of pixels in 4K frames, significantly reducing the computational workload.
> "The future of computer graphics is neural rendering, the fusion of AI and computer graphics"
## Section 5 - The GB200 NVLink72
![[../assets/grace_blackwell_nvlink72.png]]
Large Language Models (LLMs) come with massive computational requirements, as discussed earlier. These demands translate into higher energy consumption, increased operational costs, and reduced profitability—particularly when targeting the average consumer, where pricing constraints limit revenue potential. NVIDIA’s [Blackwell architecture](https://resources.nvidia.com/en-us-blackwell-architecture) has been purpose-built to address these challenges, optimizing for the specific needs of AI and providing more efficient and cost-effective solutions for scaling AI systems.
The [GB200 NVLink72](https://www.nvidia.com/en-gb/data-center/gb200-nvl72/) is designed to handle the growing computational needs of AI systems. Interconnected systems like this are what generate the *tokens* (outputs) of LLMs such as o3. The system combines 36 Grace CPUs and 72 Blackwell GPUs into a unified rack-scale setup that functions as a single large GPU, enabling faster processing for trillion-parameter LLMs and reportedly delivering a 30x improvement in real-time inference.
This system weighs 1.5 tonnes and is powered by the Grace Blackwell Superchip, which is noted for its ability to process massive data volumes, including the equivalent of the world’s internet traffic in a single second. Efficiency benchmarks indicate a **4x improvement in performance per watt** and a **3x improvement in performance per dollar** compared to the previous generation.
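To make the efficiency claims tangible, here's a trivial calculation of what they imply for a fixed amount of work. Only the 4x and 3x ratios come from the keynote; the baseline energy and cost figures are made-up placeholders.

```python
# What "4x performance per watt" and "3x performance per dollar" imply for a fixed
# workload. The 100 kWh / $100 baseline is a made-up placeholder; only the ratios
# are from the keynote.
perf_per_watt_gain = 4
perf_per_dollar_gain = 3

baseline_energy_kwh = 100.0  # hypothetical energy to serve some fixed inference workload
baseline_cost_usd = 100.0    # hypothetical cost for the same workload

print(f"energy for the same work: {baseline_energy_kwh / perf_per_watt_gain:.0f} kWh (was 100)")
print(f"cost for the same work:   ${baseline_cost_usd / perf_per_dollar_gain:.0f} (was $100)")
```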
In just one generation, these performance improvements have been remarkable, leading to significant energy efficiency gains and reduced power consumption. However, it’s important to acknowledge that AI systems still consume substantial energy compared to the _pre-generative AI_ era. From an environmental perspective, true progress will be achieved when AI not only minimises its environmental footprint but also actively contributes to _offsetting_ environmental damage, rather than adding to it.
## Section 6 - Agentic AI
Interacting with an AI chatbot is fairly straightforward. You ask it something, and it answers. All it needs to do is interact with you in the moment and provide a response based on what it already knows. This process is direct and self-contained. With agentic AI, the workflow changes fundamentally. When you provide an input, it doesn’t just process your query in isolation. Instead, it spins up multiple models and programs, schedules tasks, and executes them in coordination to accomplish more complex goals.
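As a concrete (and entirely hypothetical) illustration of that shift, here's a minimal agent loop in Python. None of the function or tool names correspond to a real framework or NVIDIA product; the point is only the shape of the workflow: plan, call several tools or models, then combine the results.

```python
# Hypothetical sketch of an agentic loop: instead of answering in one shot, the
# system plans, dispatches work to tools/models, and combines the results.
# All names here are made up for illustration; this is not a real framework.
from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    query: str

def plan(goal: str) -> list[Step]:
    """Stand-in for a planning model that decomposes a goal into steps."""
    return [Step("search", goal), Step("summarise", goal)]

def run_tool(step: Step) -> str:
    """Stand-in for dispatching to a retrieval system, another model, an API, etc."""
    tools = {
        "search": lambda q: f"[documents relevant to: {q}]",
        "summarise": lambda q: f"[summary of findings about: {q}]",
    }
    return tools[step.tool](step.query)

def agent(goal: str) -> str:
    results = [run_tool(step) for step in plan(goal)]  # execute the plan step by step
    return "\n".join(results)  # a real agent would reason over these and possibly re-plan

print(agent("compare two GPUs for local AI inference"))
```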
In the keynote, Huang discusses advancements from NVIDIA to simplify the development of agentic workflows.
![[../assets/nvidia_agentic.png]]
[Video Timestamp](https://youtu.be/k82RwXqZHY8?t=2259)
The screenshot above shows the three layers NVIDIA provides for developers building agentic workflows: NVIDIA NIMs, which are essentially pre-packaged AI models; NVIDIA NeMo, a pipeline for onboarding digital employees; and finally NVIDIA AI Blueprints.
### NVIDIA NIM
[NVIDIA NIM](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/) simplifies AI deployment by providing pre-packaged models in containers, making them easy to integrate into various systems. "Easy" because these are just API calls at the end of the day. These models cater to diverse use cases, including vision, speech, language understanding, and physical AI. And since every major cloud provider offers NVIDIA GPUs, NIMs can run on any of them.
These NIMs can be accessed [here](https://build.nvidia.com/models), and there is documentation on how to get started with [NIMs for LLMs](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html). A fun little NIM that caught my eye is the [google/deplot model](https://build.nvidia.com/google/google-deplot) that translates images of plots into tables.
![[../assets/nim_example.png]]
An [example](https://build.nvidia.com/google/google-deplot) of a NIM.
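In case "just API calls" sounds abstract, here's a hedged sketch of what calling a hosted NIM can look like. NVIDIA's hosted endpoints follow the OpenAI-compatible API, so the standard `openai` Python client works; the endpoint URL and model name below mirror the examples on build.nvidia.com and may change, and vision models like deplot use a different request format, so always check the model's page for the exact invocation.

```python
# Hedged sketch of calling a hosted NIM. The hosted endpoints are OpenAI-API-compatible,
# so the standard `openai` client works. The URL and model name mirror the examples on
# build.nvidia.com at the time of writing and may change; check the model card.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # key obtainable via build.nvidia.com
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",    # illustrative model name
    messages=[{"role": "user", "content": "In one sentence, what is a NIM?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```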
### NVIDIA NeMo
[NVIDIA NeMo](https://www.nvidia.com/en-gb/ai-data-science/products/nemo/) introduces AI as part of the workforce by enabling the onboarding of AI systems tailored to an organisation’s needs. These AI agents are positioned as digital coworkers capable of handling specific tasks, reflecting a shift in enterprise operations. This approach suggests that IT departments might evolve to manage digital AI agents in roles similar to how HR oversees human employees today.
> "IT department of a company in the future is going to be the HR department of digital AI agents in the future"
### NVIDIA AI Blueprints
[NVIDIA Blueprints](https://build.nvidia.com/blueprints) provide open-source templates designed to simplify the creation and deployment of AI agents. These templates act as a foundation, offering predefined structures and configurations that developers can use to build customised AI solutions quickly. By reducing the need to start from scratch, Blueprints enable teams to focus on refining and deploying their AI systems efficiently.
An example of a blueprint is the [digital-human blueprint](https://build.nvidia.com/nvidia/digital-humans-for-customer-service/blueprintcard). Head over to the link for plenty of information about the blueprint: what it does, what's included, an example walkthrough, system requirements, and so on. I've included a screenshot of the architecture diagram below to give a sense of how many processes are involved in a very common agentic workflow: a digital human customer-service agent.
![[../assets/digital-human-nvidia.png]]
[Source](https://assets.ngc.nvidia.com/products/api-catalog/digital-humans-for-customer-service/diagram.png)
## Section 7 - Physical AI
> "Cosmos world foundation model being open, we really hope will do for the robotics and ai industry what Llama3 has done for enterprise AI"
![[../assets/physical_ai_nvidia.png]]
[Video Timestamp](https://youtu.be/k82RwXqZHY8?t=3161)
In contrast to large language models (LLMs), which focus on linguistic tasks, understanding the physical world calls for a "world foundation model" (WFM): a model that aims to emulate a human-like, intuitive understanding of physics, motion, and environmental interactions.
In the keynote, Huang announced the release of [NVIDIA Cosmos](https://www.nvidia.com/en-gb/ai/cosmos/), a WFM platform. It offers state-of-the-art generative world foundation models, advanced tokenizers, guardrails, and an accelerated data processing and curation pipeline.
Trained on 20 million hours of video from diverse physical scenarios—like humans walking, natural environments, and fast camera movements—Cosmos excels in capturing the essence of the physical world. Its key features include:
- **Model Architecture**: Cosmos includes different types of WFMs, such as autoregressive and diffusion models, alongside advanced video tokenizers.
- **Video Processing**: It includes the first CUDA- and AI-accelerated video processing and curation pipeline, designed to handle vast datasets effectively.
- **Open Licensing**: Cosmos is open-licensed and accessible on GitHub, encouraging collaboration and innovation.
- **Video Captioning**: Cosmos can caption videos, creating valuable data for training other AI models, including LLMs.
[NVIDIA Omniverse](https://www.youtube.com/watch?v=dvdB-ndYJBM) is a real-time 3D graphics collaboration platform. One of its key uses is creating a "digital twin": a virtual replica of real-world infrastructure. It's basically a simulator. Connecting it to Cosmos is like giving Cosmos the contextual ground truth of a scene, which the AI can then build on when generating new content.
![[../assets/omniverse_cosmos.png]]
[Video Timestamp](https://youtu.be/k82RwXqZHY8?t=3646)
Every robotics system for industrial applications needs a three-computer solution:
1. **DGX**: Used to train AI models.
2. **AGX**: Deployed to make AI systems autonomous.
3. **Digital Twin**: The bridge between DGX and AGX. This is where the AI can test what it has learned before stepping into the real world, and it's exactly where Omniverse combined with Cosmos can help.
![[../assets/3_computer_solution.png]]
[Video Timestamp](https://youtu.be/k82RwXqZHY8?t=3774)
Check out a portion of the keynote here to see how the above 3 computer solution works in practice for the [use case of industrial digitalisation](https://youtu.be/k82RwXqZHY8?t=3894).
Apart from this, one of Cosmos's standout benefits is its ability to generate synthetic data samples that replicate real-world scenarios. These samples are invaluable for training autonomous vehicles (AVs) and [robots](https://blogs.nvidia.com/blog/isaac-gr00t-blueprint-humanoid-robotics/), as they simulate the complex environments these systems encounter. By significantly expanding the training dataset, Cosmos helps improve the accuracy and robustness of autonomous systems. Additionally, synthetic data is expected to be more cost-effective than real-world data collection over time, making it a scalable solution for advancing physical AI.
## Section 8 - Project DIGITS
In 2016, NVIDIA built the DGX-1, an out-of-the-box AI supercomputer, and OpenAI was its first recipient. Almost a decade later, AI has reached the masses and become a cornerstone of modern computing: a "new way of doing computing" that is reshaping industries worldwide.
In response, NVIDIA has developed [Project DIGITS](https://www.nvidia.com/en-gb/project-digits/). It's basically an AI supercomputer that is *far more affordable* and *accessible* than the supercomputers we generally hear about in popular science, and it's capable of running large models with up to 200 billion parameters. Some of the features Huang spoke about include:
- **Compact Design:** Smaller than its predecessor, yet equally powerful, thanks to the GB10—the smallest and most advanced Blackwell chip.
- **Full NVIDIA Stack:** Runs NVIDIA’s entire ecosystem of AI and accelerated computing software, ensuring maximum compatibility and performance.
- **Wireless Capability:** A truly modern design, Project DIGITS can operate wirelessly, reducing setup complexity and enabling flexible deployment.
- **Cloud-Ready Platform:** Effectively a cloud computing platform that sits right on your desk, it allows users to access it like a cloud supercomputer.
- **DGX Cloud Integration:** Project DIGITS is fully compatible with NVIDIA’s DGX Cloud, bringing enterprise-level AI capabilities to a personal scale.
- **Scalable Connectivity:** With ConnectX, multiple DIGITS devices can be connected, enabling the creation of a powerful, distributed AI supercomputing network.
These are expected to hit the market around May 2025, starting at $3,000. I am quite excited for this one, even though procuring it would mean bidding adieu to nights out and fancy trips for the foreseeable future. But hey, with a supercomputer at home, why would I want to step out!?
*Fin.*