Securing LLMs Against Foreign Influence Operations
NewsGuard recently introduced guardrails that completely rid AI model responses of false claims from hostile disinformation campaigns. By overlaying two NewsGuard datasets on a commercial large language model, red-teaming analysts eliminated all false claims seeded by Russian influence operations. In contrast, when the datasets were not overlaid, the same prompts led the top 10 chatbots to repeat Russian disinformation in one of every five responses.