FairEVA’s research reveals that dangerously few datasets are used in voice recognition training

(SAN FRANCISCO, USA | MONDAY, MAY 22) Your voice is your personal information and a powerful tool that is increasingly being used for access. For example, your voice can be used to log in to your bank account in Canada, as a password in Kenya, or, in Mexico, to access pension funds by proving you’re still alive.

These examples are shared by FairEVA, a Mozilla Technology Fund (MTF) awardee, in their Voice Biometrics 101 video “Rethinking Voice in the Age of AI.” An open-source project, FairEVA seeks to “help developers, researchers, and consumers to anticipate harms and build more inclusive voice systems.” MTF funds open-source software that addresses internet health issues in alignment with Mozilla’s mission.

Says Wiebke Hutiri, Team Lead at FairEVA: “If you phone a call center they say ‘this call is being recorded for training and quality purposes’. What is being trained? And is it identifying your voice? Is it identifying you? Does it then associate, for example, your emotional state and personal information with your voice? This is why we created a video to explain to people what voice recognition is and where it's used, and why bias might be a problem.”

This video is just one component of the work FairEVA is launching. They have also produced a dataset audit for speaker recognition training and evaluation datasets, and a Python library called bt4vt for researchers and developers to recognize bias in automatic speech processing models.

The bt4vt library, which currently supports bias tests for speaker verification, provides evaluation measures and visualizations to interrogate model performance across demographic groups. The dataset audit studied over 700 scientific papers from the last 10 years and found that the rapid adoption of deep learning has favored training and evaluation on one dataset, VoxCeleb.
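To illustrate what a bias test for speaker verification involves, the following minimal Python sketch compares error rates across demographic groups at a fixed decision threshold. It does not use the bt4vt API; the function name, group labels, scores, and threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def error_rates_by_group(trials, threshold):
    """Compute per-group error rates for speaker verification trials.

    `trials` is an iterable of (group, score, is_same_speaker) tuples.
    A trial is accepted when its score is at or above `threshold`.
    This is an illustrative helper, not part of bt4vt.
    """
    counts = defaultdict(lambda: {"fn": 0, "pos": 0, "fp": 0, "neg": 0})
    for group, score, is_same_speaker in trials:
        c = counts[group]
        accepted = score >= threshold
        if is_same_speaker:
            c["pos"] += 1
            if not accepted:
                c["fn"] += 1  # genuine speaker wrongly rejected
        else:
            c["neg"] += 1
            if accepted:
                c["fp"] += 1  # impostor wrongly accepted

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
        }
    return rates


# Made-up trials: (demographic group, similarity score, same speaker?)
trials = [
    ("group_a", 0.91, True),
    ("group_a", 0.42, True),
    ("group_a", 0.10, False),
    ("group_a", 0.65, False),
    ("group_b", 0.88, True),
    ("group_b", 0.79, True),
    ("group_b", 0.30, False),
    ("group_b", 0.20, False),
]

rates = error_rates_by_group(trials, threshold=0.6)
for group, group_rates in sorted(rates.items()):
    print(group, group_rates)
```

In this toy data, one group sees more false rejections and false acceptances than the other at the same threshold. A gap of that kind is what a bias evaluation is meant to surface.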

The FairEVA team has conducted a follow-up study on this dataset and found that it falls short of representing people with different voices equitably. This means a limited number of voices is being used to evaluate speaker recognition in research. This shortage of diverse evaluation datasets in research carries over into technology products, which can be biased as a result.

Hutiri says that, unlike the face recognition space, where there is a lot of advocacy and research on bias and fairness, voice recognition is under-researched in this respect. Voice recognition is also often confused with automatic speech recognition, such as when Zoom turns your voice into captions. FairEVA seeks to bridge this gap and break new ground by creating resources for civil activists and by building software to evaluate bias in voice recognition. Hutiri adds that bias is currently difficult to test for, which is why more research is needed in this field.

FairEVA’s work can be accessed at https://www.faireva.org/projects.

The Mozilla Technology Fund (MTF) supports open-source technologists whose work furthers promising approaches to solving pressing internet health issues. The 2023 MTF cohort will focus on an emerging, under-resourced area of tech with a real opportunity for impact: auditing tools for AI systems.

Press contact: Shandukani O. Mulaudzi, [email protected]