AI training data is not representative of the world, and this phenomenon is well documented. Marginalised communities are consistently under-represented in datasets: from dermatological image datasets that don't represent People of Colour, to speech datasets that don't represent 99% of the world's languages. Bias can be embedded in our technology, further entrenching inequity. One of the reasons training data bias is particularly pernicious is that it makes bias even less visible, and thus harder to challenge.

Mozilla Common Voice invites people to share information about characteristics that might influence the way that they speak, precisely so we can identify bias and design mitigation strategies with communities. We know that - for example - if a dataset does not include any speakers with a particular accent, then technology built on this dataset is unlikely to perform well for those people. This experience can be alienating, isolating - and in fact, dangerous. As more and more devices become voice operated, the stakes for making sure everyone can be heard get higher and higher.

Nobody is forced to share information about themselves in order to take part in Common Voice - you can submit clips without adding any metadata about your age, sex or accent.

But if you do share some information, it helps us to 1) identify representation issues, 2) engage with communities on mobilisation strategies to address those issues, and 3) make technology work better for everyone.

Have questions, ideas or concerns? Reach out to us at [email protected]