AI Search Engines Struggling with Inaccuracies

Chatbots Provide Incorrect Answers to Over 60% of Queries, Study Reveals
Overview of AI Search Tools
A study conducted by the Tow Center for Digital Journalism has raised significant concerns regarding the reliability of AI search tools, which are increasingly being viewed as alternatives to traditional search engines. The research highlights troubling inconsistencies and inaccuracies that may impact how users perceive and utilize these AI technologies in their search for information.
Key Findings from the Study
The research assessed eight AI search engines:
- ChatGPT Search
- Gemini
- Perplexity
- Perplexity Pro
- DeepSeek Search
- Microsoft’s Copilot
- Grok-2 Search
- Grok-3 Search
The study analyzed responses to 200 randomly selected articles from 20 different news organizations, each chosen so that its exact excerpt appeared in the top three Google search results. Responses were graded on the accuracy of the citation: whether the chatbot correctly identified the article, its publisher, and its URL.
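The grading scheme described above can be sketched as a simple scoring function. This is a hypothetical illustration only: the field names and the three-way scoring categories are assumptions, not the study's actual rubric.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    """The three attributes the study checked (illustrative names)."""
    article_title: str
    publisher: str
    url: str

def grade_response(expected: Citation, returned: Citation) -> str:
    """Compare a chatbot's citation against the known source.

    Returns 'correct' when all three fields match, 'partially correct'
    when some match, and 'incorrect' when none do.
    """
    matches = sum([
        expected.article_title.lower() == returned.article_title.lower(),
        expected.publisher.lower() == returned.publisher.lower(),
        expected.url == returned.url,
    ])
    if matches == 3:
        return "correct"
    if matches > 0:
        return "partially correct"
    return "incorrect"
```

For example, a response that names the right article and publisher but links the wrong URL would be graded "partially correct" under this sketch.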
Inaccuracy Rates
The study’s findings indicated that:
- More than 60% of the responses from the chatbots were incorrect overall.
- Grok-3 Search performed worst, providing correct answers only 4% of the time.
- Microsoft’s Copilot was unable to respond to 104 out of 200 queries, and of the 96 responses it provided, only 16 were completely accurate.
- ChatGPT Search addressed all 200 queries, achieving correct answers just 28% of the time, while being entirely wrong in 57% of cases.
This widespread inaccuracy supports ongoing concerns about AI systems, which can confidently assert misinformation—often referred to as “hallucinations.” These inaccuracies were previously underscored in a 2023 report by Ted Gioia, which described ChatGPT’s tendency to present incorrect data with undue certainty.
Misleading Confidence
The study found that some AI chatbots even fabricated references and failed to provide necessary citations when asked. Particularly troubling is the observation that paid tools, like Perplexity Pro (costing $20/month) and Grok-3 Search (costing $40/month), exhibited higher error rates than freely available counterparts.
Moreover, some chatbots contravened the Robots Exclusion Protocol, accessing material from publishers that had blocked them. Perplexity Pro, for instance, correctly cited excerpts from articles it was not supposed to be able to access nearly one-third of the time.
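The Robots Exclusion Protocol works through a plain-text robots.txt file that compliant crawlers are expected to consult before fetching a page. Python's standard library ships a parser for it; the sketch below shows the check a well-behaved crawler would perform (the crawler name and URLs are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks a crawler named "ExampleBot"
robots_txt = """\
User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks permission before fetching a URL:
print(parser.can_fetch("ExampleBot", "https://example.com/article"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/article"))    # True
```

The check is purely advisory: nothing technically prevents a crawler from ignoring the file, which is exactly the behavior the study flagged.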
The Role of Partnerships
The accuracy of chatbot responses did not necessarily improve even with partnerships between AI companies and news organizations. The findings revealed varying levels of accuracy when citing partnered publishers, emphasizing the need for caution in relying solely on AI for reliable information.
Implications for Users
The research underscores a critical issue: many users, particularly younger ones, may treat AI as a quick and authoritative source of information. That reliance could erode essential research and analytical skills.
The Tow Center advocates a shift in perspective: AI should be seen as a tool for enhancing human capabilities, not a substitute for traditional research methods.
The hope is that users will approach AI tools with a critical mindset, always verifying information against trusted sources rather than relying solely on chatbot outputs.