Popular methods for disclosing synthetic content rate only ‘Low’ or ‘Fair’ in Mozilla analysis. Shortcomings persist amid 2024’s record number of elections

(BERLIN, GERMANY | FEBRUARY 26, 2024) — As AI-generated content of presidents and pop stars spreads across the internet, the most popular methods for detecting and disclosing it rate only “Low” or “Fair,” according to a new Mozilla analysis.

The research, “In Transparency We Trust? Evaluating the Effectiveness of Watermarking and Labeling AI-Generated Content,” found that strategies and technologies for helping internet users identify synthetic content — like watermarking and labeling — face significant challenges. Using a Fitness Check, researchers evaluated seven different machine-readable and human-facing methods, none of which received a “Good” appraisal.

The lack of effective guardrails is especially urgent as more than half of the world's population prepares to vote in elections this year, including in the U.S., India, Russia, South Africa, and Mexico. Billions of people will head to the polls amid a proliferation of easily accessible AI chatbots, text and image generators, and voice cloning platforms. These tools and the content they produce are already undermining the integrity of elections in Argentina, Slovakia, India, and beyond.

Mozilla's research offers recommendations for strengthening detection and disclosure: for example, prioritizing machine-readable methods coupled with robust detection mechanisms at the point of distribution. It also proposes new governance approaches, such as human-centered regulatory sandboxes that would allow policymakers to test new regulations for their impact.

Says Ramak Molavi Vasse'i, Mozilla Research Lead, AI Transparency, and co-author of the report: “When it comes to identifying synthetic content, we’re at a glass half full, glass half empty moment. Current watermarking and labeling technologies show promise and ingenuity, particularly when used together. Still, they’re not enough to effectively counter the dangers of undisclosed synthetic content — especially amid dozens of elections around the world.”

Molavi Vasse’i adds: “Companies like OpenAI have published election standards to prevent abuse of their products, but voluntary requirements are not enough. And while companies like Meta have proposed industry-wide efforts to label AI-generated content, it’s crucial that open standards are created with the participation of all stakeholders, especially those potentially affected. Further, a coalition of powerful players such as Meta, Google, and Adobe should not become gatekeepers of content integrity.”

______

KEY FINDINGS

Human-facing disclosure methods fall short

Visible labels and audible warnings rely heavily on the perception of the recipient. In addition, they are vulnerable to manipulation and may not prevent or effectively address harm once it has occurred. While these methods aim to inform, they can lead to information overload, increasing public distrust and deepening societal divides.

Overall Fitness: LOW

Machine-readable methods can be effective when combined with robust detection mechanisms

Invisible watermarks embedded during content creation and distribution offer relative security against tampering by malicious actors. Still, their overall effectiveness is compromised without robust and unbiased detection tools.

Overall Fitness: FAIR

A holistic approach to governance is needed

Neither human-facing nor machine-readable methods alone provide a comprehensive solution. A multi-faceted approach that combines technological, regulatory, and educational measures is needed to effectively mitigate the harms of undisclosed AI-generated content.

_______

DEFINITIONS

(Machine-readable methods)

Cryptographic Watermarking: Secret information is encoded into content using cryptographic functions (or circuits); the watermark can then only be detected, removed, or changed via an encryption/decryption process.
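
The report does not prescribe an implementation, but a minimal sketch can illustrate the key-dependent idea. The code below assumes an 8-bit grayscale image held as a 2-D NumPy array and uses Python's standard hmac module to derive pseudorandom embedding positions from a secret key; all function names and parameters are illustrative, not drawn from the report or any real watermarking system.

```python
# Minimal sketch: key-dependent watermark embedding in a grayscale image.
# The secret key drives an HMAC-based pseudorandom choice of pixel
# positions, so the embedded bits cannot be located without the key.
import hmac
import hashlib
import numpy as np

def keyed_positions(key: bytes, n_bits: int, shape) -> list:
    """Derive n_bits pseudorandom, key-dependent pixel coordinates."""
    h, w = shape
    positions, seen, counter = [], set(), 0
    while len(positions) < n_bits:
        digest = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        pos = (int.from_bytes(digest[:4], "big") % h,
               int.from_bytes(digest[4:8], "big") % w)
        if pos not in seen:  # never write two bits to the same pixel
            seen.add(pos)
            positions.append(pos)
        counter += 1
    return positions

def embed(img: np.ndarray, payload: bytes, key: bytes) -> np.ndarray:
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    out = img.copy()
    for (y, x), bit in zip(keyed_positions(key, len(bits), img.shape), bits):
        out[y, x] = (out[y, x] & 0xFE) | bit  # write the bit into the pixel's LSB
    return out

def extract(img: np.ndarray, n_bytes: int, key: bytes) -> bytes:
    pos = keyed_positions(key, n_bytes * 8, img.shape)
    bits = [int(img[y, x]) & 1 for y, x in pos]
    return bytes(sum(bits[i + j] << j for j in range(8))
                 for i in range(0, len(bits), 8))
```

Because the embedding positions depend on the key, third parties cannot reliably locate, remove, or alter the mark without it, while detection remains straightforward for anyone who holds the key.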

Frequency Component Watermarking: Content is decomposed into different frequency parts, and a watermark is then inserted into low-frequency bands that are less sensitive to attacks and alterations.
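
As an illustration only, the sketch below assumes a grayscale NumPy image and SciPy's discrete cosine transform, nudging a few low-frequency coefficients to carry watermark bits. The coefficient slots and strength are arbitrary example values, and extraction here is non-blind (it needs the original image for comparison).

```python
# Minimal sketch: frequency-domain watermarking via the 2-D DCT.
import numpy as np
from scipy.fft import dct, idct

def dct2(block: np.ndarray) -> np.ndarray:
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(coeffs: np.ndarray) -> np.ndarray:
    return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

# Low-frequency coefficient slots (skipping the DC term at [0, 0]);
# low bands tend to survive compression and resizing best.
SLOTS = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 3), (3, 1)]

def embed_bits(img: np.ndarray, bits: list, strength: float = 4.0) -> np.ndarray:
    coeffs = dct2(img.astype(float))
    for (u, v), bit in zip(SLOTS, bits):
        coeffs[u, v] += strength if bit else -strength
    return idct2(coeffs)

def extract_bits(marked: np.ndarray, original: np.ndarray, n_bits: int) -> list:
    # Non-blind extraction: compare coefficients against the original.
    diff = dct2(marked.astype(float)) - dct2(original.astype(float))
    return [1 if diff[u, v] > 0 else 0 for (u, v) in SLOTS[:n_bits]]
```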

Metadata Watermarking: Information about the author, timestamps, editing history, and the software used is embedded in the content.
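
This is the simplest method to sketch. The example below assumes the Pillow imaging library and PNG text chunks; the field names and values are invented for illustration, whereas real deployments typically use standardized schemas such as C2PA content credentials.

```python
# Minimal sketch: embedding disclosure metadata in PNG text chunks.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

img = Image.open("generated.png")  # hypothetical input file

meta = PngInfo()
meta.add_text("ai_generated", "true")
meta.add_text("generator", "example-image-model-v1")  # illustrative value
meta.add_text("created", "2024-02-26T00:00:00Z")
img.save("generated_with_metadata.png", pnginfo=meta)

# Reading it back:
print(Image.open("generated_with_metadata.png").text)
```

The trade-off is fragility: metadata like this is routinely stripped when platforms re-encode uploads, which is one reason the report couples machine-readable marks with detection at the point of distribution.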

Statistical Watermarking: Information is inserted into the statistical patterns of the content’s data structure. This usually involves altering pixels, color frames, sound components, or other values in an imperceptible way.
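
A classic example of this family is spread-spectrum watermarking, sketched below under the assumption of an 8-bit grayscale NumPy image: a seeded pseudorandom pattern of +1/-1 values is added at low amplitude, and detection is a correlation test rather than an exact read-out. The amplitude and threshold are illustrative values.

```python
# Minimal sketch: spread-spectrum (statistical) watermarking.
import numpy as np

def embed_spread_spectrum(img: np.ndarray, seed: int, amplitude: float = 2.0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    pattern = rng.choice([-1.0, 1.0], size=img.shape)
    return np.clip(img.astype(float) + amplitude * pattern, 0, 255).astype(np.uint8)

def detect_spread_spectrum(img: np.ndarray, seed: int, threshold: float = 0.5) -> bool:
    rng = np.random.default_rng(seed)
    pattern = rng.choice([-1.0, 1.0], size=img.shape)
    # Correlate the mean-removed image with the expected pattern; an
    # unmarked image correlates near zero, a marked one near `amplitude`.
    signal = img.astype(float) - img.mean()
    score = (signal * pattern).mean()
    return score > threshold
```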

(Human-facing methods)

Audio Labels: Audible items affixed directly to content that disclose its origins.
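
As a sketch of how such a label might be affixed, assuming the pydub audio library and a pre-recorded spoken disclosure clip; both file names are hypothetical:

```python
# Minimal sketch: prepending an audible disclosure to generated audio.
from pydub import AudioSegment

disclosure = AudioSegment.from_file("spoken_disclosure.wav")  # e.g. "this audio is AI-generated"
content = AudioSegment.from_file("generated_speech.wav")

# Prepend the disclosure, with a short pause before the content starts.
labeled = disclosure + AudioSegment.silent(duration=500) + content
labeled.export("labeled_output.wav", format="wav")
```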

Disclosures and Descriptions: Written information, like bylines, captions, descriptions, or tags, is used to inform users about the content.

Visual Labels: Written or graphic items, like nutrition labels or icons, affixed directly to content to disclose its origins.
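
A minimal sketch of affixing such a badge, assuming the Pillow imaging library; the wording, position, and styling are arbitrary example choices:

```python
# Minimal sketch: drawing a visible "AI-generated" badge onto an image.
from PIL import Image, ImageDraw

img = Image.open("generated.png").convert("RGB")  # hypothetical input file
draw = ImageDraw.Draw(img)

label = "AI-generated"
x, y = 10, img.height - 30
# Dark backing rectangle so the text stays readable on any background.
draw.rectangle([x - 5, y - 5, x + 110, y + 20], fill=(0, 0, 0))
draw.text((x, y), label, fill=(255, 255, 255))
img.save("labeled.png")
```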


Press contacts:

Europe: Tracy Kariuki, [email protected]

U.S.: Helena Dea Bala, [email protected]