If you follow Mozilla on TikTok, you probably saw that iOS 17 on iPhone now lets you make a clone of your voice. The feature is equal parts creepy and cool — creepy because a copy of your voice can now live on your phone. But also cool, because those who have lost the ability to speak can use the feature to communicate out loud using their phone’s apps.
We here at Mozilla Foundation know a thing or two about voice and AI. Mozilla’s Common Voice project seeks to make speech recognition more multilingual and inclusive. Everyday people like you can donate audio samples of your voice, and those creating AI systems can then use that dataset to make sure their voice products represent and understand voices heard across the world.
It’s no surprise that the folks over at Common Voice have been experimenting with Apple’s new voice feature and they’ve got thoughts. Em Lewis-Jong is Common Voice’s product director — here's what has her both hopeful and worried about iOS 17’s Personal Voice feature.
It’s easy to forget how impressive it is that many folks walk around with a supercomputer in their pocket. Similarly, it’s easy to take for granted just how impressive a feature like Personal Voice in iOS 17 actually is. “Even five years ago, doing this on-device, and getting this kind of quality, wouldn’t have been possible,” says Em. “It’s amazing that Apple’s solution only requires 150 utterances — it’s a step ahead of anything else consumer-facing that I’ve seen.”
Perhaps more impressive than the tech are its real-world uses. The accessibility benefits of Apple’s Personal Voice feature can’t be overstated: it significantly improves on what was previously available. Em points to an example she saw via Common Voice: “When I first joined Common Voice, it was really exciting to hear some of the different text-to-speech applications,” says Em. “One of them was a professor who was going to lose her voice due to a medical illness but really wanted to continue lecturing in her own voice. Use cases like that are why I think tools like these are really exciting.”
This feature is English-only. Here’s why that’s a problem
In many ways, English is the language of the internet — 64% of websites use English as their primary language. Similarly, tech companies often release their products in English first. Case in point: Apple’s Personal Voice feature. Designed in California, Personal Voice in iOS 17 is only available in Apple’s first language.
With over 100 languages, Common Voice is angling to reduce the internet’s reliance on English. According to Em, data on English speech is easy to find, data on English voices with non-dominant accents less so, and data on languages from low-resource communities even less so.
“It’s really a vicious cycle,” says Em. “The internet is basically in a few languages so the next generation communicate mostly online using second or third languages — English, Spanish, French — while the language of their grandparents becomes increasingly forgotten. It’s normal and natural for Apple to roll out in English first, but it reinforces a dynamic we often see where anglocentrism of technology has real consequences for internet users whose first language isn’t a dominant one.”
Is Apple using Personal Voice to affect how it builds its products? Unclear.
Apple is upfront about this feature and how it handles user privacy. Apple’s explainer on Personal Voice notes that training the AI happens locally on your device. The company is also forthright that your voice print touches the cloud if you have “share across devices” enabled. That said, what is Apple doing with all the voice data it collects to train your voice in the first place?
“Your personal voice is protected locally and on the cloud, but, from what I’ve seen, Apple hasn’t said anything about the data you handed over to train the model in the first place,” says Em. “We know the synthesized voice is stored locally or is end-to-end encrypted if you share between Apple devices. What about the voice data used to train the synthesized voice? Where does it go? Is it continually used by Apple at all? Apple does explicitly say that it can use your speech clips to improve its products and services for things like Siri, so it would not be a wild assumption to think that that’s what they might be doing here.”
For those who know where to find the feature, Apple’s Personal Voice offers a user-friendly introduction to the world of AI-powered vocals. But it’s just a start, especially considering it’s only available in one language.
English will probably be the internet’s default language for a while. Ask Common Voice and they’d say part of the fix here is data donation. “More diverse training data is part of the solution for this,” says Em. “Communities need to get together and mobilize to create datasets. Waiting for companies to come along and fix it and take an interest in their language community isn’t the way to go — because if companies don’t see serious commercial viability, they often won’t go there. So it really makes sense for communities to try and solve this problem themselves and say, ‘okay, we want speech recognition to work for us, we’re going to collect that data for ourselves and for our communities and for folks that speak our language.’”