[12 Days of OpenAI](https://openai.com/12-days/) was an exciting series of updates and product demos unveiled by OpenAI in December 2024. This initiative showcased a wide range of advancements across AI capabilities, tools, and integrations, setting new benchmarks for what AI can do. In this article, I summarise these releases, sharing my personal experiences, thoughts, and experiments with some of the most intriguing and impactful features.

![[../assets/openai_shipmas.webp]]
*DALL-E Prompt - "a ship docked at a port in the arctic. container cargo is being loaded onto the ship by reindeers and elves. there is snowfall, anime, wide aspect ratio"*

## Summary

For those who prefer short-form content, here is the tl;dr version of OpenAI's 12 days of Christmas.

1. **Smarter AI Models**: The o1 model now works faster, smarter, and can process text + images.
2. **Personalised AI**: Fine-tune ChatGPT to excel in tasks unique to your needs.
3. **Video Made Simple**: Sora transforms your words into short, shareable videos.
4. **Collaborative Workspace**: Canvas combines AI-powered writing and coding tools in one space.
5. **Apple Integration**: ChatGPT boosts Siri’s abilities for smarter help and better results.
6. **Voice and Video**: Talk to ChatGPT naturally and watch it guide you step-by-step.
7. **Organised Projects**: Group chats, files, and instructions for a streamlined workflow.
8. **Smarter Searching**: Real-time web searches and embedded videos in your chats.
9. **Better Developer Tools**: New APIs make building with AI faster and easier.
10. **AI on Call**: ChatGPT now works via phone or WhatsApp for instant help.
11. **Desktop Ready**: A new app integrates ChatGPT seamlessly into your computer.
12. **Future of AI**: The o3 model showcases breakthrough reasoning like never before.

## Day 1: The Release of o1 Full Version and o1 Pro Mode

On Day 1, OpenAI released two things - the full version of a new large language model and a new commercial tier. The former is the full release of the o1 model, billed as OpenAI's strongest model so far. The latter is ChatGPT Pro, which offers more compute and, naturally, comes at 10x the cost of ChatGPT Plus.

In September 2024, ChatGPT's [o1 model was released in preview mode](https://openai.com/index/introducing-openai-o1-preview/). The key differentiator between o1 and the previous 4o model is that OpenAI claims o1 can think through problems before it responds, a bit like most human problem solving. Alongside it came o1-mini, a faster, cheaper version of o1. As an in-preview product, o1 was not without its faults: responses could be very delayed (10 seconds to respond to a "Hi!") and it was not multimodal[^1].

With this release, o1 gets performance improvements that drastically reduce the wait time on tasks. As per the team, it is 50% faster than o1-preview and makes 34% fewer mistakes. The o1 model is also now multimodal, which I personally find very exciting.

Let's see how this works in practice through a simple example. In one of my past articles, I had drawn up a graph using [Excalidraw](https://excalidraw.com/) to represent the rise and decline of motivation when I want to start a new project. I now want to understand what aspects of psychology can help me make sense of this. For the experiment, I have used both 4o and o1.
![[../assets/multimodal_example.png]]

ChatGPT provides a cohesive answer even though almost all the information it needs is present ONLY in the image provided. From observation, I felt the answer provided by o1 was better (even though 4o does a very decent job), as it was succinct and made clearer callbacks to the text in my diagram.

> [!tip]- Click to see the response from o1
> From a psychological perspective, this chart illustrates a very common emotional trajectory when pursuing any challenging goal or creative endeavor:
>
> **Early Enthusiasm (“I am gonna be famous!”)** - At first, excitement and motivation surge. The mind crafts idealized visions of success—e.g., you imagine your work going viral or receiving universal acclaim. This optimism often leads to a burst of energy and dedicated effort. Psychologically, you are fueled by **intrinsic motivation** (because it’s fun, new, and full of possibility) and by the **dopamine spike** that comes with novelty and anticipation.
>
> **Peak of Expectation (“Hmm…”)** - As time goes on, the initial rush of enthusiasm remains high, and you start feeling a hint of uncertainty (“Is it really going to turn out the way I imagined?”). Effort is still strong, but cracks of doubt may surface. This stage often involves a **cognitive dissonance** between the early high hopes and the first whiffs of reality. You might find yourself vacillating between euphoric expectation and creeping self-questioning.
>
> **Reality Check** - This is the vertical red bar—where you confront the actual complexities, setbacks, or external feedback (e.g., critiques, rejections, or just realizing that success is more complicated than you thought). Psychologically, this triggers a **discrepancy** between your mental model of how things “should” go and how they are _actually_ going. Feelings of disillusionment, frustration, or even shame may arise here.
>
> **Negative Self-Evaluation (“Noooo! I suck!”)** - After the reality check, you dip into a low—often called the **“Valley of Despair”** in learning or the creative process. The mind can flip from inflated optimism to harsh self-criticism or even despair. You may conclude that all your effort was wasted or that you’re incapable. This reflects our innate tendency toward **catastrophic thinking** when reality does not match ideals. Perfectionistic or all-or-nothing thinking makes the lows feel especially sharp at this point.
>
> **Potential Outcomes Going Forward** - Though not shown explicitly, there is typically a gradual climb out of this trough if you persist. From a psychological lens, this involves developing:
> 1. **Realistic Expectations** – Shifting from “I’m going to be instantly famous” to “I’m improving my craft day by day.”
> 2. **Self-Compassion** – Recognizing that setbacks are normal and do not reflect your overall worth.
> 3. **Growth Mindset** – Understanding that skill and success evolve through sustained effort and learning from mistakes.
>
> In sum, this curve represents a **classic motivational arc**: early excitement, a euphoric peak, a stark encounter with reality, and the subsequent emotional slump. The psychological key is learning that the slump is not failure but part of the process—by reframing it as an opportunity to refine your approach, you can navigate beyond the dip and continue on a more sustainable, grounded path.

I am no psychology expert, but the answer is pretty much good enough for me.
As with all LLMs, o1 might not be perfect, but it can definitely be a worthy tool for visual learners and writers like me who prefer to draw things out to understand concepts. The multimodal capability lets o1 analyse diverse inputs, such as screenshots or handwritten notes. For instance, a student can upload notes, and o1 can summarise them, generate questions, or visualise concepts, showing its versatile potential.

Now, coming to [ChatGPT Pro](https://openai.com/index/introducing-chatgpt-pro/), this is a new pricing tier. It offers better reliability and accuracy[^2] than o1. Users of this tier have unlimited access to all models released up to and including o1 pro mode, as well as advanced voice mode. Priced at $200/month, a 10x jump from ChatGPT Plus, it is recommended only for those power users who are already pushing the limits of o1 - for example, those solving very complicated problems in math or scientific research. As it uses a lot more compute, it can also be slower than o1. So, unless one really can't do something with o1, I'd say Pro mode is not really the better financial choice.

In the near future, OpenAI plans to integrate o1 into its APIs and expand Pro mode with more compute power, web browsing, and file uploads. OpenAI is also opening up grant funding for U.S. medical researchers. I can only imagine this is a strategic move to become the market leader for AI usage in the scientific research community.

## Day 2: Fine-Tuning o1 with Your Own Data

On Day 2, OpenAI unveiled the ability to fine-tune the o1 model using your own datasets. Leveraging reinforcement fine-tuning (RFT), this approach allows users to enhance o1's reasoning capabilities for custom domains. Although the full release is planned for next year, this preview demonstrates immense potential.

[Reinforcement Fine Tuning (RFT)](https://help.openai.com/en/articles/10250364-how-to-access-reinforcement-fine-tuning) is an alternative to supervised fine-tuning when the focus is on improving the process of reaching an answer rather than just reproducing the answer itself. Supervised fine-tuning is ideal for tasks with clear, explicit answers, like summarisation or classification, while RFT optimises how the model reasons its way to an answer, especially when answers need to be grounded in deep domain expertise.

This method of fine-tuning enables o1 to learn new reasoning patterns and adapt to specialised domains. By reinforcing correct thought processes and disincentivising wrong ones, RFT mirrors techniques OpenAI uses internally. This means users can tailor the model for expert-level performance in fields requiring systematic reasoning, such as medicine, law, or engineering.

To showcase the potential of RFT, the OpenAI team in their demo used RFT to improve o1-mini's performance and make it outperform o1 (a naturally more powerful model than o1-mini) at answering questions over a large corpus of medical and genetic data. See the screenshot below, taken from the video.

![[../assets/RFT_openai.png]]
[Source](https://youtu.be/yCIYS9fx56U?t=828)

This feature is invaluable for professionals needing expert domain knowledge. For example, a healthcare researcher could fine-tune o1 to analyse complex medical data, or a legal expert could tailor it to interpret case law. By bridging the gap between general-purpose AI and domain-specific expertise, RFT makes a fine-tuned model both domain-specific and broadly capable, offering a new kind of adaptability.
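To make the reinforcement idea concrete, here is a small, purely illustrative Python sketch. This is *not* OpenAI's RFT API (which is only in research preview); the grader, the dataset item, and the candidate answers are all made up, but they show the shape of the loop: sample answers, score them with a domain grader, and use those scores to reinforce good reasoning paths.

```python
# Illustrative sketch only - not OpenAI's RFT API. It shows how a grader turns
# candidate answers into rewards that a reinforcement fine-tuning run could use.

def grader(predicted: str, reference: str) -> float:
    """Toy grader: full credit for an exact match, partial credit for a mention."""
    predicted, reference = predicted.strip().lower(), reference.strip().lower()
    if predicted == reference:
        return 1.0
    if reference in predicted:
        return 0.5
    return 0.0

# A hypothetical training item, in the spirit of the medical/genetics demo.
item = {
    "prompt": "Which gene is most commonly associated with cystic fibrosis?",
    "reference": "CFTR",
}

# In a real RFT run the model samples the answers and the policy update happens
# inside OpenAI's training infrastructure; here we just score hand-written candidates.
candidate_answers = ["CFTR", "The CFTR gene on chromosome 7.", "BRCA1"]

for answer in candidate_answers:
    reward = grader(answer, item["reference"])
    # reward > 0 reinforces the reasoning path that produced the answer,
    # reward == 0 discourages it.
    print(f"{answer!r:40} -> reward {reward}")
```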
OpenAI plans to expand this program, inviting researchers to explore its potential through initiatives like the [RFT Research Program](https://openai.com/form/rft-research-program/).

## Day 3: Sora – Turning Text into Videos

Day 3 of OpenAI’s 12 Days introduced [Sora](https://sora.com/) Turbo, the latest version of Sora, a tool that generates videos directly from text, offering ChatGPT Plus and Pro subscribers a creative and flexible way to bring ideas to life visually.

Sora is a video generation tool designed to transform textual descriptions into short videos, providing customisation options such as aspect ratio (horizontal, square, or vertical), resolution (480p for fastest results, 720p at 4x slower speed, or 1080p at 8x slower speed), video length (from 5 to 20 seconds), and variations (one, two, or four versions of a video). Users can also apply preset styles or create their own for unique visual effects.

Beyond basic customisation, [Sora supports advanced features that elevate creativity](https://openai.com/index/sora-is-here/). The **storyboard functionality** allows users to add different descriptions to timeline cards, enabling Sora to connect actions between them seamlessly. Images can also be used to create storyboards, making the process even more versatile. Features like **Remix** let users update videos by modifying the original, while **Recut** enables trimming and extending videos with additional directions. The **Loop** feature helps create seamless looping videos, and **Blend** merges two scenes into a single cohesive video. These tools make Sora a powerful yet approachable platform for video creation.

Sora also fosters a sense of community through its [Explore](https://sora.com/explore) feature, which allows users to view and draw inspiration from videos created by others, encouraging collaboration and creativity.

Currently, Sora is available in select countries, starting with the U.S., and offers 50 monthly generations for Plus users and up to 500 fast-mode generations for Pro users, who also enjoy unlimited slow-queue generations. Although it is not yet available in regions like Great Britain and the EU (sad me!), OpenAI plans to expand its availability.

OpenAI has ensured that Sora’s capabilities remain conservative to prevent misuse and ensure legality. Videos generated by Sora include [C2PA metadata](https://c2pa.org/), a standard for content provenance and authenticity, making it clear they are AI-generated. This approach builds on OpenAI’s implementation of similar metadata in [DALL-E](https://help.openai.com/en/articles/8912793-c2pa-in-dall-e-3), reinforcing transparency and ethical use.

The OpenAI team underscores that while Sora is not intended to replace professional filmmaking, it serves as a versatile tool for creating engaging visual content quickly. Its applications span marketing campaigns, educational videos, and more, empowering individuals to visualise their ideas in entirely new ways with user-friendly controls. By combining innovation and accessibility, Sora bridges the gap between creativity and technology.

## Day 4: Canvas – A New Collaborative Workspace

Day 4 of OpenAI’s 12 Days marked the [official release of Canvas, a full-fledged feature moving out of beta](https://openai.com/index/introducing-canvas/). Canvas introduced a dual-pane interface where the left pane hosts the familiar ChatGPT chat and the right pane is a collaborative workspace for projects.
This setup transforms ChatGPT into an interactive writing or coding assistant, allowing users to work seamlessly on documents and code while engaging with ChatGPT.

#### **Using Canvas for Writing**

When it comes to writing, Canvas offers a range of features that make the writing and editing process intuitive and efficient. Commonly used shortcuts include suggesting edits, adjusting text length, modifying reading levels, applying a final polish, and even adding emojis.

To provide an example of just how good[^3] this can be, I tried generating a story based on my own cricketing interests. I have used the 4o model for this.

![[../assets/canvas_openai.png]]

If you are interested in reading the whole generated story, [see here](https://chatgpt.com/share/6769c107-33cc-8011-b6b1-54ebb0c23e8a).

One of Canvas' standout capabilities is the ability to leave comments directly on specific sections of a document. This makes feedback clear, precise, and targeted, significantly improving the editing experience. Building on the previous example, here is how comments work. *I did have to ask ChatGPT twice because it errored out the first time*.

![[../assets/canvas_comments_openai.png]]

As I see it, Canvas is like having an AI-infused Google Docs where the AI acts as your collaborator. Pretty cool! If you have any kind of reservations around such a workflow, I would urge you to read this [brilliant piece on why using AI for your work is actually a smart call, and not taboo](https://every.to/learning-curve/the-future-of-work-isn-t-about-doing-less-it-s-about-doing-better).

#### **Using Canvas for Coding**

A remarkable addition to Canvas is its support for running Python code. If a user adds Python code to the workspace, ChatGPT recognises it and spins up a runtime environment, enabling users to debug code directly within Canvas. This is made possible by a WebAssembly Python emulator integrated into the workspace, creating a seamless development and testing experience.

Once again, I have tried a simple enough example. My aim is to see if it can generate a plot (similar to what the OpenAI team showed in their demo). The results are quite promising!

![[../assets/canvas_code.png]]

If you are interested in the whole chat, [see here](https://chatgpt.com/share/6769c945-bf84-8011-a832-f974e2fd8bf3).
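To give a feel for what actually runs in that right-hand panel, here is a small, hypothetical snippet of the kind you could paste into Canvas and then ask ChatGPT to tweak or debug. It assumes numpy and matplotlib are importable in the in-browser Python runtime, which may not hold for every package.

```python
# A toy plotting snippet to hand to Canvas - package availability in the
# WebAssembly runtime is an assumption, not a guarantee.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Imaginary runs scored per over in a T20 innings.
overs = np.arange(1, 21)
runs = np.clip(np.round(6 + 4 * np.sin(overs / 3) + rng.normal(0, 2, overs.size)), 0, None)

plt.bar(overs, runs, color="seagreen")
plt.xlabel("Over")
plt.ylabel("Runs scored")
plt.title("Runs per over (made-up data)")
plt.show()
```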
Canvas will soon also be available within custom GPTs, another OpenAI feature which enables users to create their own GPTs that contextually have domain information.

## Day 5: ChatGPT Integration with Apple Intelligence

Day 5 of OpenAI’s 12 Days introduced a groundbreaking integration between ChatGPT and Apple Intelligence, bringing the power of conversational AI to Apple’s ecosystem. This collaboration enhances the capabilities of Apple’s devices, offering users new ways to interact with technology through three key integrations.

The first integration enables Siri to hand off complex tasks to ChatGPT. By delegating tasks that require more in-depth reasoning or nuanced conversation, Siri enhances its utility while leveraging ChatGPT’s strengths. For instance, users can ask Siri to draft a detailed email or brainstorm ideas, and ChatGPT seamlessly takes over, providing a polished response. [Learn more about using ChatGPT with Siri](https://support.apple.com/guide/iphone/use-chatgpt-with-apple-intelligence-iph00fd3c8c2/ios?utm_source=chatgpt.com).

The second integration focuses on writing tools, allowing users to compose documents from scratch directly through their Apple devices. This feature empowers users to create professional-quality content effortlessly, whether drafting reports, crafting letters, or outlining creative projects. [Explore the full capabilities of Apple Intelligence](https://www.apple.com/newsroom/2024/12/apple-intelligence-now-features-image-playground-genmoji-and-more/?utm_source=chatgpt.com).

The third integration involves camera control, where ChatGPT helps describe what the camera sees. This capability can be invaluable for visually impaired users or those needing real-time assistance in interpreting their environment. From identifying objects to describing scenes, this feature demonstrates the practical and inclusive applications of AI. [Read more about ChatGPT camera integrations](https://support.apple.com/guide/iphone/use-chatgpt-with-apple-intelligence-iph00fd3c8c2/ios?utm_source=chatgpt.com).

As a recent iPhone user, I am quite keen to see how this integration would help me. I am a bit confused about where Siri’s boundaries stop and ChatGPT takes over, but I guess this is something we will learn more about over time.

## Day 6: Advanced Voice and Video Features

Day 6 of OpenAI’s 12 Days brought an update with advanced voice capabilities and a new video mode, further enhancing ChatGPT’s multimodal functionality. These features enable users to interact naturally through voice inputs and receive voice outputs, making interactions more seamless and engaging.

The advanced voice feature, which was [previously made available last year](https://openai.com/index/chatgpt-can-now-see-hear-and-speak/?utm_source=chatgpt.com), allows users to communicate with ChatGPT using voice, with the AI responding in kind. Additionally, the new video and screen-sharing functionalities enable real-time collaboration. For example, ChatGPT can guide users through step-by-step instructions for tasks like making coffee or assembling furniture, adding a dynamic and practical layer to user interactions. This video mode is currently available only on the iOS and Android apps (for Plus and Pro users).

I have tried my hand at experimenting with this feature too. The results are so exciting! It is like having my own [Pokédex](https://www.youtube.com/watch?v=wVcerPofkE0)!

![[../assets/openai_video.png]]

A festive addition to this update is the introduction of Santa mode, available globally. This feature brings a touch of seasonal fun to ChatGPT interactions, making it perfect for engaging with children or adding a holiday theme to conversations. Users can activate Santa mode by tapping the snowflake icon on the ChatGPT home screen or through voice settings. [Discover more about Santa mode](https://help.openai.com/en/articles/10139238-santa-s-voice-in-chatgpt?utm_source=chatgpt.com).

![[../assets/santa_mode.png]]

## Day 7: Projects in ChatGPT

Day 7 of OpenAI’s 12 Days introduced "Projects," a feature designed to enhance organisation and personalisation within ChatGPT. This functionality allows users to streamline their workflows by grouping conversations, files, and instructions into dedicated project spaces. [Learn more about Projects](https://help.openai.com/en/articles/10169521-using-projects-in-chatgpt?utm_source=chatgpt.com).

With Projects, users can upload files, set custom instructions, and tailor ChatGPT’s behaviour for all conversations within a specific project. The feature also lets users organise their conversations effectively, making it easier to manage complex or ongoing tasks.
Creating a project is simple: users can add a title, assign a folder colour for easy identification, and start new chats directly within the project folder. Additionally, older chats can be dragged and dropped into the folder for consolidation.

One of the standout capabilities of Projects is the ability to upload files, such as data or documentation, directly into the project space. This integration ensures that all relevant materials are easily accessible and centralised. Users can also provide custom instructions to refine how ChatGPT responds to queries, tailoring interactions to the unique needs of the project.

Once again, I have a demo of how this feature works. I have created a project called "Becoming a better cricketer". In the project, I have uploaded 9 files with match reports I had written up last season on my individual performances. I have also provided an instruction so ChatGPT understands how to answer questions. I have asked two questions, and ChatGPT has answered them by referring to the content in the uploaded match reports. I can't wait to show up on the field as a better cricketer next season!!![^4]

![[../assets/chatgpt_project.png]]

Recent coverage highlights how Projects can revolutionise workflows. For instance, [The Verge](https://www.theverge.com/2024/12/13/24320800/openai-chatgpt-projects-folders-ai-chats?utm_source=chatgpt.com) describes them as "fancy folders for your AI chats," while [TechRadar](https://www.techradar.com/computing/artificial-intelligence/chatgpts-new-projects-feature-can-organize-your-ai-clutter?utm_source=chatgpt.com) emphasises their potential to "organise your AI clutter." This feature is available to Plus, Pro, and Team users, making it a versatile tool for individuals and collaborative teams alike. By combining organisational tools with ChatGPT’s conversational prowess, Projects represents a significant step forward in optimising productivity and collaboration.

## Day 8: ChatGPT Search and Real-Time Information Access

Day 8 of OpenAI’s 12 Days rolled out the search functionality in ChatGPT to all logged-in users. This was [initially available only to paid customers](https://openai.com/index/introducing-chatgpt-search/) when it was released two months ago.

One of the most exciting aspects of this update is the ability to search as you talk to ChatGPT. This feature allows users to access up-to-date web information directly within their conversations, offering a seamless way to find and use real-time data. ChatGPT can now embed videos in its output, enhancing its ability to provide rich, engaging responses. Additionally, it provides sources for the information it retrieves, adding a layer of transparency and reliability to its outputs. The mobile experience has also been improved, with faster and more responsive interactions, making it even more convenient for users on the go.

![[../assets/openai_search.png]]

Here is an example of ChatGPT giving me a video embed of Daniel Day-Lewis' acting. This is almost like having a distraction-free YouTube!

A notable addition is the ChatGPT search extension, accessible from the browser bar. This feature streamlines navigation and eliminates the need to sift through multiple links on traditional search engines. It’s particularly appealing for users who prefer a direct, AI-driven approach to finding information. However, in my opinion, as with any AI tool, the effectiveness of this extension may depend on the type of task and the level of trust users place in the AI’s responses.
Voice-enabled search is another standout feature, allowing users to access up-to-date web information through conversational interactions. This offers a hands-free, intuitive way to engage with ChatGPT, though it invites comparisons to voice assistants like Siri. How this differs in practicality and scope from existing voice technologies remains to be fully explored.

## Day 9: Developer Tools and OpenAI Dev Day

Day 9 of OpenAI’s 12 Days focused on developers, unveiling new tools and updates announced during OpenAI’s Dev Day. These releases aim to empower developers with improved APIs, fine-tuning options, and expanded SDKs. [Learn more about the updates from OpenAI Dev Day](https://openai.com/index/o1-and-new-tools-for-developers/).

One major highlight is the release of the o1 API, now out of preview. This updated API introduces several powerful features. Verbatim from the OpenAI page, these are:

- **Function Calling**: Seamlessly connect o1 to external data and APIs. [Learn more about Function Calling](https://platform.openai.com/docs/guides/function-calling).
- **Structured Outputs**: Generate responses that reliably adhere to your custom JSON Schema. [Read about Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs).
- **Developer Messages**: Specify instructions or context for the model to follow, such as defining tone, style, and other behavioral guidance.
- **Vision Capabilities**: Reason over images to unlock applications in science, manufacturing, or coding where visual inputs are crucial.
- **Lower Latency**: o1 uses, on average, 60% fewer reasoning tokens than o1-preview for a given request, making it faster and cheaper to use.

Real-time capabilities have also been enhanced, with WebRTC integration enabling developers to add real-time features with minimal code. Cost reductions further increase accessibility for real-time API applications, solidifying OpenAI’s commitment to affordable and scalable solutions.

Preference fine-tuning is another exciting addition, now available through the API. Unlike traditional supervised fine-tuning, this method uses Direct Preference Optimization to help the model differentiate between preferred and non-preferred outputs. This approach is particularly effective for tasks where "better" responses are subjective, offering more flexibility to developers seeking tailored solutions.

To support broader adoption, OpenAI has also introduced Go and Java SDKs, currently in beta. These SDKs expand the ecosystem, making it easier for developers working in diverse environments to integrate OpenAI’s tools into their workflows.
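As a quick illustration of how two of these pieces fit together, here is a minimal sketch using the official OpenAI Python SDK: a developer message sets the behaviour and a JSON Schema constrains the output. Treat the model name and parameter details as assumptions to check against the current API reference, not a definitive recipe.

```python
# Minimal sketch of developer messages + Structured Outputs with the o1 API.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment;
# exact model names and availability depend on your account.
from openai import OpenAI

client = OpenAI()

answer_schema = {
    "name": "unit_conversion",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "steps": {"type": "array", "items": {"type": "string"}},
            "final_answer": {"type": "string"},
        },
        "required": ["steps", "final_answer"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="o1",
    messages=[
        # "developer" messages play the role that "system" messages do for other models.
        {"role": "developer", "content": "You are a terse assistant. Show your working briefly."},
        {"role": "user", "content": "A train covers 120 km in 1.5 hours. What is its average speed in km/h?"},
    ],
    response_format={"type": "json_schema", "json_schema": answer_schema},
)

print(response.choices[0].message.content)  # JSON that conforms to the schema above
```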
## Day 10: ChatGPT Comes to the Telephone

Day 10 of OpenAI’s 12 Days takes a quirky turn with the introduction of ChatGPT on the telephone. This new feature allows users to engage with ChatGPT by dialling **1-800-ChatGPT (1-800-242-8478)**. Currently, this service is available for calls within the USA, while users in other countries can access it via WhatsApp. If making a call, [up to 15 minutes are free](https://www.theverge.com/2024/12/18/24324376/openai-shipmas-1-800-chatgpt-whatsapp).

The primary goal of this feature is to make ChatGPT accessible even without a data connection. However, I think it raises an amusing question: how would WhatsApp even work without data? Clearly, this particular convenience caters to the U.S. audience for now.

Though I do not yet see the impact this feature can have, the update nevertheless highlights OpenAI’s efforts to diversify how users can interact with ChatGPT, pushing the boundaries of accessibility and convenience in creative ways. Only time will tell if this feature will get enough buy-in from consumers.

## Day 11: ChatGPT Desktop App and "Work with Apps"

Day 11 of OpenAI’s 12 Days saw the release of the ChatGPT desktop app, introducing the "Work with Apps" feature. This capability allows ChatGPT to automatically pull context from your system, removing the need to copy-paste content manually. [Learn more about this feature here](https://help.openai.com/en/articles/10119604-work-with-apps-on-macos).

The idea behind "Work with Apps" is to position ChatGPT as a versatile tool that integrates seamlessly with other applications, aiming to be a one-stop solution for productivity tasks. For instance, it can work in tandem with existing tools to streamline workflows.

![[../assets/chatgpt_desktop.png]]
[Source](https://help.openai.com/en/articles/10119604-work-with-apps-on-macos)

Personally, I’m uncertain about this feature’s appeal, partly because I don’t have a macOS system that meets the recommended specifications. When it comes to coding, I find this feature less compelling, as tools like GitHub Copilot, which integrate directly into IDEs like VSCode, offer a more robust experience (even if we set aside concerns about exposing an entire codebase). If, as showcased in demos, this feature is intended to assist with writing, I’d argue that ChatGPT’s Canvas feature already fulfils that need effectively.

That said, there’s no harm in OpenAI exploring new avenues. However, as a consumer, this feels like a weaker addition to the lineup, though it may find its niche among macOS users with specific needs.

## Day 12: o3 Preview

Day 12 of OpenAI’s 12 Days concluded with the announcement of the o3 model preview, marking a significant leap in AI capabilities. The [o3 model surpasses o1 in several advanced domains](https://www.datacamp.com/blog/o3-openai), including competitive coding, complex mathematics, and PhD-level science questions. Notably, while an expert PhD achieves around 70% accuracy on these challenging scientific questions, o3 scores an impressive 87.7%.

Additionally, o3 [performed exceptionally well on the ARC AGI benchmark, scoring 87.5%](https://arcprize.org/blog/oai-o3-pub-breakthrough). The ARC AGI benchmark, short for the Abstraction and Reasoning Corpus for Artificial General Intelligence, tests an AI’s ability to think and reason like a human. Instead of evaluating tasks based on patterns the AI has seen before, this benchmark focuses on whether the AI can handle entirely new problems without prior specific training.

Imagine you’re given a puzzle where you have to figure out the rules based only on a few examples, and then solve new puzzles using those rules. Humans excel at this because we can reason abstractly and apply knowledge to unfamiliar situations. The ARC AGI benchmark measures whether AI can perform this type of reasoning. For instance, the AI might be shown a sequence of shapes that follow a hidden pattern (e.g., each shape is rotated by 90 degrees) and then asked to predict the next shape in the sequence. A high score, like o3’s 87.5%, indicates significant progress toward general intelligence, where AI can tackle novel and abstract problems like humans do.
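To make that concrete, here is a toy, made-up puzzle in Python in the spirit of ARC (real ARC items are richer coloured grids, and this is only my illustration, not an actual benchmark task). The hidden rule behind the example pairs is a 90-degree clockwise rotation; the test is whether a solver can infer that rule from the examples alone and apply it to an unseen grid.

```python
# Toy ARC-style task (illustrative only): infer the hidden rule from a few
# input/output pairs, then apply it to a new input.

def rotate_90(grid):
    """Rotate a small grid of ints 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

# Two demonstration pairs that silently encode the rule "rotate 90 degrees clockwise".
examples = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
    ([[0, 2], [0, 0]], [[0, 0], [0, 2]]),
]

# A real solver has to discover the rule itself; here we just check that the
# rotation hypothesis explains every demonstration pair...
assert all(rotate_90(inp) == out for inp, out in examples)

# ...and then apply it to an unseen test input, which is what ARC actually scores.
test_input = [[3, 0], [3, 0]]
print(rotate_90(test_input))  # [[3, 3], [0, 0]]
```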
And if you take a look at mainstream media, almost everyone is going bonkers over this possibility of AGI (in fact, they have been for quite some time now)!

Currently, o3 is not available for general use but is accessible to safety researchers for evaluation and testing purposes. This ensures that its deployment aligns with robust safety standards and ethical considerations. [Learn more about early access for safety testing here](https://openai.com/index/early-access-for-safety-testing/).

---

Thanks for reading if you are with me so far! I wrote this around Christmas 2024, so here is my favourite Christmas music. If you are an Office fan, you will probably agree with me!

<iframe style="border-radius:12px" src="https://open.spotify.com/embed/track/3sBnSBnzpEYKpJYGsAIbup?utm_source=generator" width="100%" height="152" frameBorder="0" allowfullscreen="" allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture" loading="lazy"></iframe>

[^1]: Multimodal AI is capable of taking in different input types at the same time, such as text, images, and voice.
[^2]: **Accuracy** is about giving the right answer in the moment, while **reliability** is about giving the right answer consistently over time and across different situations.
[^3]: When I say *good*, I don't mean it's better than having a person write this. I just think it does a much better job than what we generally like to give it credit for!
[^4]: Well, I am joking. So don't hold me to this!