
This is a profile of Bülent Özden, one of the four winners of the Mozilla Common Voice ‘Our Voices’ competition.

Bülent Özden is a computer engineer and museologist. He started working with voice AI in 2020 and immediately recognized the bias problem and the importance of dataset quality. He began thinking about ways to filter machine learning datasets to detect bias, a focus that would draw him back from decades of designing museum exhibitions to designing AI models for voice-driven exhibits.

Özden is now one of the four winners of the Mozilla Common Voice ‘Our Voices’ competition, which asked technologists worldwide to propose innovative solutions for making voice technology less biased and more inclusive. He receives a cash prize of $2,000 USD.

‘It’s all in the dataset!’ says Özden. ‘The crucial part of designing any AI model is examining how healthy data is. If you just take a dataset and use it without a bias check, you most probably get a biased model.’

His winning project is a toolbox that helps detect bias in datasets. Embedded within the Mozilla Common Voice platform, the feature analyzes the data and provides insights about it, alerting developers to potential biases. Currently, the feature automatically computes key statistics such as gender and age distribution, the number of sentences, and the duration of each voice contribution, and visualizes them on a dashboard.
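The kind of summary such a dashboard shows can be sketched in a few lines of Python. This is not Özden's implementation: the `Clip` record, its field names, and the `summarize` helper are hypothetical, chosen for illustration rather than taken from the Common Voice schema or codebase.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Clip:
    # Hypothetical per-clip record; field names are illustrative assumptions,
    # not the actual Common Voice schema.
    client_id: str    # anonymized contributor ID
    sentence: str     # text that was read aloud
    gender: str       # "" when the contributor did not report it
    age: str          # age bucket such as "twenties"; "" when unreported
    duration_ms: int  # length of the recording in milliseconds


def _shares(labels):
    """Return each label's fraction of the total count."""
    counts = Counter(labels)
    n = sum(counts.values())
    return {label: count / n for label, count in counts.items()} if n else {}


def summarize(clips):
    """Compute dashboard-style statistics for a list of clips."""
    total_ms = sum(c.duration_ms for c in clips)
    return {
        "clips": len(clips),
        "unique_sentences": len({c.sentence for c in clips}),
        "speakers": len({c.client_id for c in clips}),
        "total_hours": total_ms / 3_600_000,
        "mean_clip_seconds": (total_ms / len(clips) / 1000) if clips else 0.0,
        # Shares are computed per clip; unreported values are grouped as "unknown".
        "gender_share": _shares(c.gender or "unknown" for c in clips),
        "age_share": _shares(c.age or "unknown" for c in clips),
    }
```

Shares here are counted per clip. Because a few prolific contributors can dominate a dataset, a real bias check would likely also report the same distributions per speaker.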

While analyzing the Turkish dataset, Özden realized that there are two ways to handle a gender bias check. He could cut the data down to equal contributions from male and female voices and work with a smaller dataset, which reduces accuracy, or he could improve the dataset's health by ensuring that there are more female voice contributors and that their contributions are of good quality.
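The first option, balancing by random undersampling, can be sketched as follows. This is a generic technique, not necessarily what Özden used; the `balance_by_undersampling` function and the dict-based clip records are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_undersampling(clips, key, seed=0):
    """Randomly drop clips so every group has as many clips as the smallest group.

    `key` maps a clip to its group label (e.g. gender). Clips with a missing
    label form their own group, so filter them out first if that is not wanted.
    """
    groups = defaultdict(list)
    for clip in clips:
        groups[key(clip)].append(clip)
    if not groups:
        return []
    target = min(len(members) for members in groups.values())
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, target))
    return balanced


clips = [{"gender": "male"}] * 900 + [{"gender": "female"}] * 100
balanced = balance_by_undersampling(clips, key=lambda c: c["gender"])
```

With 900 male and 100 female clips, only 200 clips survive, so 80% of the data is discarded. That loss is the accuracy cost Özden describes, and it is why recruiting more female contributors is the healthier fix.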

Özden hopes that technologists using Common Voice, and volunteers like himself who work on improving the dataset for their own language, can use his tool to quickly identify and mitigate biases and build healthier datasets. His work will be crucial for upcoming dataset releases.
