AI Struggles to Read Clocks

AI’s Struggles with Time Interpretation
Advances in AI Technology
Artificial intelligence (AI) has made incredible strides in recent years. It can create lifelike images, compose stories, assist with educational tasks, and even help scientists predict how proteins fold. However, new research highlights a significant shortcoming of these systems: their inability to accurately tell time.
Recent Research Findings
Researchers at the University of Edinburgh conducted a study that assessed the time-telling abilities of seven notable multimodal large language models (MLLMs). These models are designed to understand and generate various forms of media, including text and images. Their findings, which will be published in April, reveal that these AI systems struggle with even simple time-related questions based on images of clocks and calendars.
The researchers explained that accurately interpreting time from visual inputs is a vital skill for many real-world applications, ranging from scheduling events to operating autonomous vehicles. Despite the progress in MLLMs, most studies have concentrated on areas such as object recognition and scene analysis, leaving temporal reasoning comparatively unexplored.
The Testing Process
In their study, the team tested several AI models, including OpenAI’s GPT-4o and GPT-o1, Google DeepMind’s Gemini 2.0, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.2-11B-Vision-Instruct, Alibaba’s Qwen2-VL-7B-Instruct, and ModelBest’s MiniCPM-V-2.6. The models were shown various images of analog clocks, featuring different dial designs, Roman numerals, and even some without second hands, alongside ten years’ worth of calendar images.
The researchers posed straightforward questions like "What time is shown on the clock in the image?" for clock images, and more complex queries for calendars, such as "What day of the week is New Year’s Day?" and "What is the 153rd day of the year?"
Understanding AI Performance
Reading analog clocks and understanding calendars involves several intricate cognitive processes. These include detailed visual recognition—like determining the position of clock hands and the layout of days on a calendar—and complex numerical reasoning, such as calculating day offsets.
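The day-offset arithmetic behind the calendar questions, such as finding the 153rd day of a year, takes only a few lines of Python. This is a sketch of the underlying calculation, not of how the models process it; the function names are illustrative:

```python
from datetime import date, timedelta

def nth_day_of_year(year: int, n: int) -> date:
    """Return the calendar date of the n-th day of the given year."""
    return date(year, 1, 1) + timedelta(days=n - 1)

def weekday_of_new_years_day(year: int) -> str:
    """Return the weekday name of January 1st of the given year."""
    return date(year, 1, 1).strftime("%A")

# In a non-leap year such as 2025, day 153 falls on June 2nd.
print(nth_day_of_year(2025, 153))      # 2025-06-02
print(weekday_of_new_years_day(2025))  # Wednesday
```

The trivial nature of this code underlines the study's point: the arithmetic itself is easy once the inputs are symbolic; the models' difficulty lies in combining visual recognition with that reasoning.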
Unfortunately, the results were not promising. On average, the models were able to read the time on analog clocks correctly less than 25% of the time. They faced particular challenges when interpreting clocks with Roman numerals or unusual designs, as well as those missing a second hand. The difficulties seem to stem from the models’ struggles to detect clock hands and interpret their positions accurately.
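To make the geometry concrete: once the hand positions on a dial are known, converting them to a time reading is simple arithmetic. The sketch below assumes hand angles measured in degrees clockwise from the 12 o'clock position (the function and parameter names are hypothetical):

```python
def clock_time_from_angles(hour_angle: float, minute_angle: float) -> str:
    """Convert clock-hand angles (degrees clockwise from 12) to an H:MM string."""
    minute = round(minute_angle / 6) % 60    # minute hand sweeps 6 degrees per minute
    hour = int(hour_angle // 30) % 12 or 12  # hour hand sweeps 30 degrees per hour
    return f"{hour}:{minute:02d}"

print(clock_time_from_angles(97.5, 90.0))  # 3:15
print(clock_time_from_angles(0.0, 0.0))    # 12:00
```

As the study suggests, the hard part for the models is not this conversion but the step before it: reliably detecting the hands and estimating their angles from a rendered dial.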
Highlights of the Results
Among the tested models, Google’s Gemini 2.0 performed best on the clock task, while OpenAI’s GPT-o1 achieved an 80% accuracy rate for calendar-related questions—significantly better than its counterparts. However, even the top-performing model still made mistakes around 20% of the time.
Rohit Saxena, a co-author of the study and PhD student at the University of Edinburgh, emphasized the implications of these findings. He noted that most people learn to tell time and use calendars during childhood, which makes the limitations of AI in these areas particularly notable.
For AI to be effectively integrated into environments that depend on accurate time management, such as scheduling, process automation, and assistive technologies, these deficits in basic functionality will need to be addressed.
Implications for Future AI Development
While AI technology continues to advance and can tackle sophisticated subjects, these research findings underscore the necessity for developers to focus on enhancing the models’ ability to interpret time. Until these issues are resolved, relying on AI for time-sensitive tasks may not be advisable.