Securing LLMs Against Foreign Influence Operations

NewsGuard recently introduced guardrails that completely detoxified AI models of hostile disinformation campaigns. By overlaying two NewsGuard datasets on a commercial large language model, red-teaming analysts eliminated all false claims seeded by Russian influence operations. By contrast, when the datasets were not overlaid, the same prompts led the top 10 chatbots to repeat Russian disinformation in one of every five responses.

 

Download the Report

To download this NewsGuard report, please fill out your details below and you will be redirected to it. If you'd like to learn more about working with NewsGuard, email [email protected].
