As concerns mount over the impact of deepfakes and other AI-generated or AI-modified content, discourse over how to tackle this challenge is expanding. Tech platforms and AI companies make announcements, coalitions are formed, and regulations are proposed. At Mozilla, where we promote trustworthy AI and call the tech giants to account for their actions, we are working to understand the policy choices that are being made, while tracking their enforcement and watching how they play out in the real world.

A clear-eyed analysis of platform policy is impossible without addressing the elephant in the room: Is it possible to reliably determine whether a piece of content was created or modified by AI? Are those social media posts written by real people or by ChatGPT? Has that video been edited to make a politician do something they didn’t? Is that audio clip generated by AI to impersonate the voice of a famous person? Addressing that elephant, we at Mozilla have kicked off a program of research to understand the space of “AI detectors” and evaluate their effectiveness.

Is it possible to reliably determine whether a piece of content was created or modified by AI? Are those social media posts written by real people or by ChatGPT?

~

A simple framework

There is a zoo of different approaches and techniques out there. Here’s a simple primer.

How synthetic?

As Mozilla’s recent report “In Transparency We Trust?”, which evaluates the effectiveness of transparency approaches to AI content, puts it, “syntheticity is a spectrum”. Content may be a raw, untouched photo or a human-written composition; it may include some minimal editing by AI or other tools; it may be heavily processed, whether through something like Photoshop or by an AI model; or it may be the entirely synthetic output of an AI model. It’s also important to note that modern smartphone cameras are often AI-enhanced, as with Apple’s “Portrait mode”, further blurring the line between real and synthetic.

While acknowledging the full spectrum, we’ll tend to talk simply about AI-generated content, meaning content that is completely synthetic, and AI-modified content, meaning content that has been modified by AI in a way that is potentially misleading or harmful. We’ll refer to both cases in general as “AI content”.

What kind of content?

It’s important to remember that there are many different types of content that AI can modify or generate, and the risks vary: AI-generated text may allow propaganda to spread more easily on social media, while AI-generated audio clips may be used to fabricate accusations against a public figure.

Generally, we will think about text, images, audio, and video as the modalities that generative AI works with, and in future articles we will do deep dives into each of them.

AI Detectors and Provenance Tools

There are different approaches to tackling the risks of AI content. Mozilla’s report “In Transparency We Trust?” focuses on provenance tools: ways of marking a piece of content with information about how it was created and/or modified, whether with a real camera or an AI tool. AI detectors, on the other hand, are designed to determine if a given piece of content was AI-generated or -modified after the fact – the promise is that you put in a piece of content and it tells you whether AI was involved.

An AI detector is an appealing notion – a tool that can examine a text, image, video, or audio clip and decisively determine whether it was created or modified using an AI tool. But such tools are unlikely to be reliable for all applications. A detector might claim 99% accuracy in correctly identifying human-written essays, yet even that may be unacceptably unreliable if it means falsely accusing one out of every hundred students of submitting an AI-generated essay. And these methods are likely to fall into an arms race between improving detection tools on one side and, on the other, improving AI models and tools designed to disguise their output.
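To make the numbers concrete, here is a back-of-the-envelope sketch in Python. The figures (10,000 essays, a 5% share of AI-written submissions, 95% sensitivity, 99% specificity) are entirely hypothetical, chosen only to illustrate how error rates translate into false accusations once you account for how rare the thing being detected actually is.

```python
# Hypothetical illustration of the false-accusation arithmetic above.
# All numbers are invented for the example; this is not a benchmark
# of any real AI detector.

def detector_outcomes(total: int, ai_share: float,
                      sensitivity: float, specificity: float) -> dict:
    """Split a batch of essays into correct and incorrect detector flags."""
    ai_essays = total * ai_share
    human_essays = total - ai_essays
    true_positives = ai_essays * sensitivity             # AI essays correctly flagged
    false_positives = human_essays * (1 - specificity)   # human essays wrongly flagged
    precision = true_positives / (true_positives + false_positives)
    return {
        "falsely_accused_students": round(false_positives),
        "share_of_flags_that_are_wrong": round(1 - precision, 3),
    }

# 10,000 essays, 5% actually AI-written, detector with 95% sensitivity and
# 99% specificity ("99% accuracy on human-written essays"):
print(detector_outcomes(10_000, ai_share=0.05, sensitivity=0.95, specificity=0.99))
# -> about 95 students falsely accused, and roughly 1 in 6 flags is wrong
```

The point is not the particular numbers but the shape of the problem: the rarer AI content is in the pool being checked, the larger the share of a detector’s flags that turn out to be mistakes.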

As Nick Clegg, President of Global Affairs at Meta, wrote in a recent blog post:

"This work is especially important as this is likely to become an increasingly adversarial space in the years ahead. People and organizations that actively want to deceive people with AI-generated content will look for ways around safeguards that are put in place to detect it. Across our industry and society more generally, we’ll need to keep looking for ways to stay one step ahead."

The Content Authenticity Initiative is a great example of the provenance approach. The idea is that when a camera captures an image, it embeds a cryptographic signature into that image attesting that it was captured with that camera. Compliant editing tools then add further signed information about each edit made and how it was performed. The final image can be checked with a verification tool that reports the full history of the image’s provenance.
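To show the general shape of that mechanism, here is a deliberately simplified sketch in Python. It is not the actual C2PA/Content Credentials format; the record structure, the helper functions, and the use of bare Ed25519 keys are all invented for illustration.

```python
# Simplified sketch of a signed provenance chain: each step hashes the
# current content bytes plus the previous record's signature and signs the
# result, so a verifier can replay the edit history. Not the real C2PA format.

import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

def add_record(chain, content_bytes, action, private_key):
    """Append a signed provenance record (capture, crop, retouch, ...)."""
    prev_sig = chain[-1]["signature"] if chain else b""
    digest = hashlib.sha256(content_bytes + prev_sig + action.encode()).digest()
    chain.append({
        "action": action,
        "digest": digest,
        "signature": private_key.sign(digest),
        "public_key": private_key.public_key(),
    })
    return chain

def verify_chain(chain, final_content_bytes):
    """Check every signature and that the last record matches the final content."""
    for record in chain:
        try:
            record["public_key"].verify(record["signature"], record["digest"])
        except InvalidSignature:
            return False
    prev_sig = chain[-2]["signature"] if len(chain) > 1 else b""
    expected = hashlib.sha256(
        final_content_bytes + prev_sig + chain[-1]["action"].encode()
    ).digest()
    return expected == chain[-1]["digest"]

camera_key, editor_key = Ed25519PrivateKey.generate(), Ed25519PrivateKey.generate()
photo = b"raw sensor data"
chain = add_record([], photo, "capture", camera_key)
edited = photo + b" (cropped)"
chain = add_record(chain, edited, "crop", editor_key)
print(verify_chain(chain, edited))   # True: history checks out
print(verify_chain(chain, photo))    # False: content no longer matches the record
```

A real deployment has to solve the harder problems this sketch skips, such as binding keys to trusted hardware or organizations, embedding the records in the media file itself, and surviving format conversions.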

Trust can be strengthened for images created and edited with these tools, but what about all the images captured with cameras that do not support this standard? For the foreseeable future, we are living in a world in which most content does not conform to these sorts of standards. As such, it’s unreasonable to expect the public to mistrust every image that does not carry a provenance record.

Still, these tools have real potential value. A news media company might decide to ensure that provenance is tracked for every photo in its publications, helping to establish trust. But there will always be limitations. There will always be a legitimate need to capture content anonymously and untraceably, such as a human-rights defender gathering evidence under a repressive regime. And these tools will never be perfectly secure – a determined actor might physically manipulate the hardware in a compliant camera to trick it into signing inauthentic content that it is fed.

Both types of tools may have their place. Provenance tools may be more reliable, at least to the extent that they are standardized, effectively deployed, and reliably used. AI detectors, if adequately accurate, could be even more important as they will support analysis of content that does not comply with provenance standards.

For more details, the Partnership on AI has an excellent “Glossary for Synthetic Media Transparency Methods” that maps to and builds upon this simple framework.

Context and the human element

It’s important to remember that the involvement of AI doesn’t mean a piece of content is necessarily false or misleading. Likewise, no AI is needed to create deceptive content – consider the rise of “shallowfakes”, in which real photos are shared with misleading context, for example posting an image of a war zone from many years ago and claiming it shows present-day lawlessness in an urban area.

It’s also important to remember that content does not exist in a vacuum. Information held by tech platforms can help establish the trustworthiness of content, including the history of the account that posted it.

Any technical solution may raise the bar for the difficulty or cost of deploying AI content maliciously, but none of them will ever be a complete solution. The human factor is critical. Educating people to think critically about the content they encounter, providing resources for fact-checking organizations and forensic content analysis, and giving researchers and civil society transparency into AI systems are all crucial mechanisms for protecting our information ecosystem.

The problem is complex. As one recent report shows, deepfakes don’t have to deceive to be effective: even content the audience knows to be AI-generated can make for effective propaganda. A healthy information ecosystem depends on a healthy system of trust. We’ve seen how social media may have eroded our trust in institutions like media, academia, and government. These new challenges may turn out to be a call to action to rebuild that trust.

In an era in which the trustworthiness of individual pieces of content is impossible to establish, trust in the institution that publishes the content may be critical. And the technology that allows news media brands to cryptographically sign their content has existed for many years. It’s simple for the Washington Post, for example, to prove that it published an article. So if people trust the Washington Post, they may not need to worry about whether each photo in that article is real.
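The mechanics involved are ordinary public-key signatures. A minimal sketch, using a hypothetical publisher key and article text, might look like this:

```python
# Minimal sketch of publisher-level content signing. The keys and article
# text are hypothetical; a real deployment would also need a trustworthy
# way to publish and rotate the outlet's public key.

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

publisher_key = Ed25519PrivateKey.generate()   # held privately by the newsroom
publisher_pub = publisher_key.public_key()     # published for readers and verifiers

article = "Headline and full article text go here".encode()
signature = publisher_key.sign(article)        # distributed alongside the article

try:
    publisher_pub.verify(signature, article)   # anyone can run this check
    print("Valid: this outlet published exactly these bytes.")
except InvalidSignature:
    print("Invalid: the content was altered or did not come from this outlet.")
```

The hard part is not the cryptography but distributing and trusting the publisher’s key, which is exactly the kind of institutional trust described above.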

What to do?

Current approaches are summarized nicely in Microsoft’s recent announcement: “Meeting the moment: combating AI deepfakes in elections through today’s new tech accord” in which the company proposes eight commitments, divided into three broad categories:

  1. Addressing deepfake creation (referring to guardrails in AI systems and provenance tools)
  2. Detecting and responding to deceptive deepfakes (referring to AI detectors)
  3. Transparency and resilience (in which they talk about the human element)

(Of course, Microsoft is just one of twenty companies involved in the Munich accord, and it’s important to note that their own tools have played a role in recent viral deepfakes and that they have ignored concerns about the risks of their products.)

In a world in which AI detectors and provenance tools are highly effective, these categories might be seen as ordered from most to least important – if we can prevent the creation of deepfakes, the problem is solved. However, as we will see, there are always going to be gaps in the technical solutions, and so transparency and resilience will be the most valuable tools for protecting our information ecosystem.

Appropriately weighting these approaches will depend on understanding the effectiveness of each option. For that reason, in our coming articles, we will share results from our evaluations of the real-world effectiveness of AI detection tools.