How To Spot A Deepfake When It’s AI-Generated Audio

Have you ever had one of those moments that makes you go, “I can’t believe President Biden would call people and tell them not to vote,” only to find out that President Biden did not in fact call people and tell them not to vote? Deepfakes are getting convincing and it’s becoming hard to tell what actually happened and what was generated by software. Enter Reality Defender, a service that helps you discern true footage from the fake stuff.

Reality Defender offers a way to detect video deepfakes, but the service can be used to detect audio deepfakes too. Hearing fake audio may not be what comes to mind when you think of the term “deepfake,” but software-generated audio can be problematic in its own way. “The impact of fake video is obvious, but people haven’t thought of what audio deepfakes can do,” says Zee Fryer, a senior data engineer with Reality Defender. “When you start telling people the possibilities — bank fraud, phishing scams, etc. — the dangers become clear quickly.” Think of someone posing as your sibling asking for money.

Zee’s colleague, Scott Steinhardt, Reality Defender’s head of marketing, notes how many video deepfakes are low quality and easy to suss out. “We call these ‘cheap fakes,’” Scott says. “Unless you have access to incredible processing power, many video deepfakes are low quality. Images are better but there’s still that uncanny valley effect. With audio, though, you can clone someone’s voice with such accuracy and efficiency that it calls into question everything you hear.” Great!

How To Spot A Deepfake — Even Audio Ones

We all know to look at hands and teeth and for that general “weirdness” when trying to suss out a deepfake image. What about audio?

Reality Defender tells real audio from fake using an AI model. The company trained this model by feeding it both genuine and fabricated audio. When it comes to the fake stuff, “the model listens for anomalies,” says Zee. “When you listen to Amazon’s Alexa speak, for example, the speed and tone don’t vary. So we look for irregularities like that rather than the actual content of what folks are saying.”
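To make Zee’s point about unvarying speed and tone concrete, here is a toy Python sketch. It is not Reality Defender’s model, which is a trained AI system whose internals the article does not describe. The function names, the 0.05 cutoff and the per-segment measurements are all made up for illustration; the sketch only shows what “barely varies” could mean as a number.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the standard deviation divided by the mean (0.0 if the mean is zero)."""
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


def looks_too_uniform(pitches_hz, syllables_per_sec, threshold=0.05):
    """Flag a clip whose pitch AND pace both barely vary.

    `threshold` is an arbitrary illustrative cutoff, not a tuned value.
    """
    return (
        coefficient_of_variation(pitches_hz) < threshold
        and coefficient_of_variation(syllables_per_sec) < threshold
    )


# Hypothetical per-segment measurements (pitch in Hz, pace in syllables per second).
natural = looks_too_uniform([180, 210, 165, 240, 190], [3.1, 4.6, 2.8, 5.0, 3.9])
monotone = looks_too_uniform([200, 201, 199, 200, 200], [4.0, 4.0, 4.1, 4.0, 3.9])

print(natural, monotone)
```

A single hand-picked rule like this would be easy to fool, which is one reason a detector would learn from many real and fake examples instead.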

What about the real stuff? When it comes to detecting organic audio, Reality Defender trained its AI using Mozilla’s Common Voice. The Common Voice project is an effort to make speech recognition and voice-based AI more inclusive. In addition to its very knowledgeable product director who has thoughts on Apple’s foray into voice AI, a big plus of Common Voice is its openness. Companies like Reality Defender can use the open dataset to train tools like fake-voice detectors.

Can AI-detection tools be culturally biased? Yes, they can.

“Common Voice is great because it contains different dialects, accents, a good balance of pitches and speaking speeds,” says Zee. “And it’s all tagged and labeled. Plus it’s all public domain. We try to take copyright seriously and with other datasets, researchers haven’t always made everything available to use.”

Common Voice’s diversity of data improves deepfake detection. “Many audio models tend to be biased toward recognizing western, American voices,” Zee says. “This can include deepfake detection models as well as anything that relies on listening to audio to spit out transcriptions, carry out voice print identification or even when you use Alexa and Siri. What this means in the real world is that if the model encounters an accent it hasn’t heard before, it’ll take a guess. In the case of a deepfake detector, it may classify a voice as a deepfake if it can’t recognize that what it may be hearing is due to an accent.” Common Voice offers variety, which can be a start in combating the problem.
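One way to see the bias Zee describes is to compare how often genuine clips get wrongly flagged as fake for each accent. The Python sketch below does that on made-up results; the accent labels, the data and the function name are hypothetical, and nothing here reflects Reality Defender’s actual evaluation process.

```python
from collections import defaultdict


def false_positive_rate_by_accent(results):
    """Return {accent: share of genuine clips wrongly flagged as fake}.

    `results` is an iterable of (accent, is_actually_fake, flagged_as_fake).
    """
    genuine = defaultdict(int)
    wrongly_flagged = defaultdict(int)
    for accent, is_actually_fake, flagged_as_fake in results:
        if is_actually_fake:
            continue  # only genuine clips can be false positives
        genuine[accent] += 1
        if flagged_as_fake:
            wrongly_flagged[accent] += 1
    return {accent: wrongly_flagged[accent] / genuine[accent] for accent in genuine}


# Made-up detector output for two hypothetical accent groups.
results = [
    ("accent_a", False, False),
    ("accent_a", False, False),
    ("accent_a", False, False),
    ("accent_a", False, True),
    ("accent_b", False, True),
    ("accent_b", False, True),
    ("accent_b", False, False),
    ("accent_b", False, False),
    ("accent_b", True, True),  # a real deepfake, correctly flagged, so ignored here
]

rates = false_positive_rate_by_accent(results)
print(rates)
```

A large gap between groups would suggest the detector is treating an unfamiliar accent as a sign of fakery, which is the failure that more varied training data is meant to reduce.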

What does Common Voice think? For one, Common Voice product director Em Lewis-Jong is a fan of the service. “We love what Reality Defender is doing with Common Voice,” said Em. “This kind of deepfake detection should be available to all language communities!” In an effort to further inclusion in this space, Em and crew are working on expansions to make Common Voice’s dataset more useful for organizations like Reality Defender. “For example, people read differently than they organically speak in a conversation. So this year we’re expanding the rollout of a platform we built called Spontaneous Speech that captures that more organic mode of speech. We’re also continuing to work with communities and partners to grow our languages, variants, accents and demographic representation.”

We may be moving toward a future where Reality Defender’s deepfake detection tools become a necessity. If you ask Scott and Zee, it’s not enough for tools like these to be used by everyday people. “Imagine a fabricated image or piece of fake audio goes viral on social media,” says Zee. “It’s good if me and Scott check it and know it’s fake, but it’s even better if social media platforms implement these tools. Then everyone using that platform would immediately know it’s fake.” It’s important for these tools to not only be effective, but near-ubiquitous. “If I have access to tools like these and you don’t, that’s unfair,” says Scott. “The onus shouldn’t be on you to decide what’s real and fake, it should be on the platforms.”

Audio Deepfakes Sound Like Trouble — This Company Uses Common Voice To Detect Them

Written By: Xavier Harding

Edited By: Audrey Hingle, Kevin Zawacki, Tracy Kariuki, Xavier Harding

Art By: Shannon Zepeda
