OpenAI’s New o3 and o4-mini Models Can Now ‘Think With Images’

OpenAI’s Latest AI Models: o3 and o4-mini
OpenAI has introduced two new AI models, o3 and o4-mini, which mark a notable advance in visual reasoning. The models can reason about images with the kind of step-by-step analysis previously associated with text. As part of that reasoning, they can crop, zoom, and rotate an image to examine it more closely, which improves their understanding of visual data.
What Does “Thinking with Images” Mean?
“Thinking with images” describes how these models work with visual information as part of their reasoning. Unlike earlier models that relied on separate visual recognition systems, o3 and o4-mini integrate images directly into their chain of thought, reasoning across text and images together. According to OpenAI, this integration yields more thorough and accurate results.
In practice, when users provide an image such as a handwritten math problem, a blurry street sign, or a complex chart, the models can analyze it and explain their reasoning step by step. OpenAI emphasizes that these improved visual capabilities help ChatGPT handle complex queries more effectively than before.
Key Features of o3 and o4-mini
- Native integration of text and image reasoning
- Ability to manipulate images during analysis (crop, rotate, zoom)
- Stronger performance on complex visual problems
Performance Comparison with Previous Models
OpenAI reports that o3 and o4-mini outperform their predecessors on several key benchmarks, setting state-of-the-art results on demanding multimodal tests, including:
- STEM question-answering (such as MathVista)
- Chart reading and reasoning capabilities (CharXiv)
- Visual search performance (V*)
According to OpenAI, the models performed especially well on visual reasoning tasks, reaching 95.7% accuracy on the V* visual search benchmark.
Challenges and Limitations
Despite these advances, OpenAI acknowledges that the new models have limitations. They sometimes produce overly long reasoning chains, performing unnecessary tool calls and image manipulations. They can also misperceive an image even when they use their tools correctly. Finally, their reliability can vary: repeated attempts at the same task may yield different answers.
Access to o3 and o4-mini
As of April 16, 2025, o3 and o4-mini are available to ChatGPT Plus, Pro, and Team subscribers, replacing older models such as o1 and o3-mini. ChatGPT Enterprise and Edu users are expected to gain access soon, and free users can try o4-mini by selecting the new “Think” option.