09/04/2025

NewsGuard One-Year AI Audit Progress Report Finds that AI Models Spread Falsehoods in the News 35% of the Time

New report ranks chatbots by performance as average fail rate nearly doubles

(Sept. 4, 2025 — New York, NY) NewsGuard today published its anniversary edition of the AI False Claims Monitor, the standardized monthly benchmark for how the world’s leading generative AI tools handle provably false claims. For the first time, NewsGuard de-anonymized the audit results and named the scores for the top LLMs. 

On average, when prompted with questions about controversial news topics, the chatbots spread false claims 35 percent of the time, almost double the 18 percent rate of a year ago.

The audit report focuses on the 10 leading large language model (LLM) chatbots: OpenAI’s ChatGPT-5, You.com’s Smart Assistant, xAI’s Grok, Inflection’s Pi, Mistral’s Le Chat, Microsoft’s Copilot, Meta AI, Anthropic’s Claude, Google’s Gemini, and Perplexity’s answer engine.

This chart shows the percent of the time each of the AI models spreads false claims:

[Chart: false-claim rate by AI model, August 2025 audit]

This special anniversary edition breaks from NewsGuard’s standard practice of reporting only monthly aggregate results without naming individual chatbots. After a year of conducting audits, NewsGuard’s company-specific data was robust enough to draw conclusions about where progress has been made, and where the chatbots still fall short.


NewsGuard’s Monthly AI False Claims Monitor

As the domain expert in data reliability for controversial news topics, NewsGuard provides the leading red-teaming analysis for information reliability. This latest report demonstrates that chatbots continue to face significant challenges in ensuring their models provide safe, accurate responses to prompts instead of repeating false claims circulating on the internet or refusing to respond to topics in the news.

Read more about the methodology here. Read our methodology FAQ here.


August 2025 Anniversary Progress Report

The main finding is that on average, the top AI models spread false claims 35 percent of the time.

A third alternative to responses with accurate results or with false claims in our audits was caution: in earlier audits, the AI models would decline to answer prompts about many news-related topics. As a result, the broader fail rate, defined as either repeating a false claim or declining to debunk it by refusing to answer, was higher a year ago, at 49 percent, compared with 35 percent this past August. But that improvement came only because last year the chatbots cautiously refused to assert that they knew the answer, whereas this year they answered prompts 100 percent of the time, but with wrong answers 35 percent of the time.
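To make the relationship between the two metrics concrete, the following is a minimal sketch in Python, assuming hypothetical response counts that are not NewsGuard’s actual audit data; it only illustrates the narrow false-claim rate versus the broader fail rate defined above.

```python
# Minimal sketch of the two metrics described above.
# The counts passed in below are hypothetical placeholders, not NewsGuard data.

def false_claim_rate(false_claims: int, total_prompts: int) -> float:
    """Share of prompts answered with a false claim."""
    return false_claims / total_prompts

def broader_fail_rate(false_claims: int, non_responses: int, total_prompts: int) -> float:
    """Share of prompts that either repeated a false claim or went unanswered,
    i.e., the chatbot declined to debunk the claim."""
    return (false_claims + non_responses) / total_prompts

# Illustration: 100 prompts, 35 answered with a false claim, 0 refusals.
# With no refusals, the narrow and broader rates coincide at 35 percent,
# mirroring the pattern described for the August 2025 audit.
print(false_claim_rate(35, 100))       # 0.35
print(broader_fail_rate(35, 0, 100))   # 0.35
```

When refusals are frequent, as in the earlier audits, the broader fail rate exceeds the false-claim rate, which is why the broader measure was higher a year ago even though fewer false claims were repeated.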

“For the past year, we’ve kept our AI audit results anonymous to encourage collaboration with the platforms. But the stakes have grown too high. By naming the chatbots, we’re giving policymakers, journalists, the public, and the platforms themselves a clear view of how the major AI tools perform when confronted with provably false claims,” said Matt Skibinski, NewsGuard’s Chief Operating Officer. 

Download the full report for details on which chatbots produced false claims in their responses on news topics assessed by NewsGuard analysts. 


About NewsGuard

NewsGuard helps consumers and enterprises find reliable information online with transparent and apolitical data and tools. Founded in 2018 by media entrepreneur and award-winning journalist Steven Brill and former Wall Street Journal publisher Gordon Crovitz, NewsGuard’s global staff of information reliability analysts has collected, updated, and deployed more than seven million data points on more than 35,000 news and information sources, and cataloged and tracked all of the top false narratives spreading online.

NewsGuard’s analysts, powered by multiple AI tools, operate the trust industry’s largest and most accountable dataset on news. These data are deployed to fine-tune and provide guardrails for generative AI models, enable brands to advertise on quality news sites and avoid propaganda or hoax sites, provide media literacy guidance for individuals, and support democratic governments in countering hostile disinformation operations targeting their citizens.

As one indicator of the scale of its operations, NewsGuard’s analysts have applied its apolitical and transparent criteria to rate news sources accounting for 95 percent of online engagement with news across nine countries.