A profile feature of WEDO, one of the winning teams in Mozilla’s Common Voice ‘Our Voice’ competition. In conversation with the team’s representative: Dr. Nattiya Kanhabua.
It is not uncommon for smart home assistants such as Alexa, Siri, Cortana, and Google Assistant to fail at recognizing particular accents or dialects. A prompt such as “Alexa, will it rain today?” can easily elicit an unclear or misguided response such as “There are four restaurants in your area today!”
One way to change the course of bias and its manifestation is to diversify the dataset on which these models are trained. This is what inspired WEDO, a local tech firm in Thailand whose mission is to create innovations by Thai people that have a positive impact on people, communities, and the environment. The team is building Thai-language voice-activated smart home appliances, such as speakers and faucets, while ensuring that the training dataset is gender inclusive. They are also building voice-enabled devices with cameras and navigation features for the visually impaired and the elderly, utilizing Mozilla’s Common Voice data corpus.
‘We want to be the owners of our language and take a front seat in designing products that work for everyone,’ says Dr. Nattiya Kanhabua, WEDO’s project lead. ‘But again, garbage in, garbage out: good-quality, unbiased outputs require quality data. That’s why we built an automatic speech recognition (ASR) model for Thai with a special focus on gender inclusivity,’ she adds.
Their winning project analyzed release 11 of the Common Voice dataset. Although the dataset is male-dominated, with the female subset roughly 0.6 times smaller than the male subset, their experiments showed that the female recordings were more recognizable and performed slightly better than the male recordings. Read about Mozilla's Our Voice Competition winners.
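Analyses like this start from the metadata shipped with each Common Voice release, where every validated clip carries a (sometimes empty) gender field. A minimal sketch of counting clips per reported gender, using a hypothetical snippet of a `validated.tsv` file (the column names follow Common Voice’s convention, but the rows here are made up for illustration):

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a Common Voice validated.tsv file;
# real releases list one validated clip per row, with a "gender"
# column that may be left blank by the contributor.
sample_tsv = """client_id\tpath\tsentence\tgender
a1\tclip1.mp3\tสวัสดี\tmale
a2\tclip2.mp3\tขอบคุณ\tfemale
a3\tclip3.mp3\tสบายดีไหม\tmale
a4\tclip4.mp3\tลาก่อน\t
"""

def gender_counts(tsv_text: str) -> Counter:
    """Count clips per reported gender, treating blanks as 'unknown'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["gender"] or "unknown" for row in reader)

counts = gender_counts(sample_tsv)
print(counts)  # Counter({'male': 2, 'female': 1, 'unknown': 1})
```

Comparing these per-gender counts across releases is how an imbalance like the one WEDO reported becomes visible before any model is trained.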
They proposed a gender classification model that classifies speaker gender with an F1-score of 0.95, and used it to analyze and validate performance bias between the male and female subsets. Read the full scope of their proposal here.
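For readers unfamiliar with the metric: the F1-score is the harmonic mean of precision and recall, so a score of 0.95 means the classifier balances few false alarms with few missed cases. A small illustration of the arithmetic (the confusion counts below are invented for the example, not WEDO’s actual numbers):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)   # fraction of positive predictions that were right
    recall = tp / (tp + fn)      # fraction of actual positives that were found
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 95 clips correctly labeled, 5 false alarms, 5 misses.
print(round(f1_score(tp=95, fp=5, fn=5), 2))  # 0.95
```

Because F1 ignores true negatives, it is a common choice when the classes are imbalanced, as the male/female split in the dataset was here.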
The team is forging ahead, using open-source data to train voice recognition models for all genders, accents, and dialects; these models will be used in designing health-tracking wearables, AI visual aids for the blind, and AI audiobooks for the elderly, the blind, and everyone else.
‘Soon, we want to develop more open use-case studies that illustrate how to leverage open-source data to build fair and inclusive tech products, enabling people from all walks of life to interact with technology,’ she concludes.