Comparing the Accuracy of Various AI Chatbots

Major Flaw in AI-Powered Search Engines

A recent investigation by the Tow Center for Digital Journalism has shed light on an alarming issue with AI-driven search engines. The study examined eight popular AI platforms and revealed that over 60% of their responses contained inaccuracies or misleading citations. Among the various platforms evaluated, Elon Musk’s Grok 3 was found to be particularly problematic, with an astounding 94% error rate in its citations. In contrast, the AI platform Perplexity had a lower error rate of 37% but still demonstrated significant inaccuracies.

The Issue of Citation Accuracy

AI search engines and chatbots, such as ChatGPT and Grok, often reference renowned news sources like BBC, The New York Times, and Reuters. The intention behind this strategy is straightforward: by linking to credible publications, these services aim to enhance their trustworthiness. However, the findings of the Tow Center’s study cast doubt on this practice.

Many cited sources do not actually lead to the original articles. Instead, they may include fabricated links, reuse or misappropriate published work, or incorrectly attribute articles to various publishers. This not only damages the credibility of the chatbots but also raises concerns about the impact on the reputation of the original news sources.

Worse yet, users who fail to verify these sources may inadvertently share misinformation, perpetuating the inaccuracies present in the AI-generated responses.

Handling of Restricted Content by AI Platforms

The analysis also highlighted troubling inconsistencies in how AI chatbots handle restricted content. Certain platforms, such as ChatGPT and Perplexity, occasionally answered questions about content they should not have been able to access. For example, Perplexity Pro correctly recognized nearly one-third of 90 excerpts drawn from articles known to be off-limits to crawlers.

A concerning illustration of this came from the free version of Perplexity, which successfully answered 10 queries related to articles from National Geographic, a publisher that has explicitly prohibited Perplexity from accessing its content. Although AI systems can sometimes derive such answers from publicly available material, this case raises questions about Perplexity's adherence to publisher restrictions.

In January, reports revealed that Perplexity referenced New York Times content 146,000 times, even though the publisher had blocked its crawlers. ChatGPT answered fewer questions about restricted content, but it still tended to provide incorrect information rather than decline to answer.

AI’s Overconfidence Problem

One notable flaw of AI technologies is their tendency to display unwarranted confidence, even when they are incorrect. AI search engines often do not acknowledge gaps in their knowledge; instead, they generate assertive but incorrect responses. This phenomenon, frequently referred to as "hallucination," can make it difficult for users, particularly those lacking expertise in specific topics, to identify misinformation.

The Necessity of Human Oversight

Given the persistent challenges associated with accuracy in AI search engines, the role of human judgment becomes even more vital. Activities such as fact-checking, cross-referencing multiple sources, and applying critical thinking skills are essential for discerning fact from fiction. Until AI platforms markedly improve their reliability in sourcing information, users must maintain a level of skepticism regarding AI-produced citations.

For those interested in exploring the study in greater depth, the complete findings are available from the Tow Center for Digital Journalism.
