Common Voice

From smart speakers to smart phones, speech recognition has revolutionized the way we interact with machines. While the technology continues to mature and become more ubiquitous, we’re still seeing significant barriers to innovation for most of the world’s languages. Our goal is to change this status quo, starting with a unique partnership to create open data for the Kinyarwanda language.

Speech recognition data is siloed. Only a handful of companies handle the majority of speech recognition interactions, and they often do so using their own proprietary data, which stifles innovation in the broader ecosystem.

Existing speech recognition services are only available in major languages. Currently, none of the main players in the global voice assistant market (Amazon’s Alexa, Apple’s Siri, and Google Home) supports a single native African language. They also tend to work better for men than for women and struggle to understand people with different accents, all of which results from biases in the data on which they are trained.

Languages are central to our culture. But the steady advance of technology has not benefited all languages equally. There are approximately 7,100 living languages in the world today, and only a fraction of these are currently supported by voice technologies. Speakers of these “under-served languages” are ignored in voice-enabled applications and services, further entrenching lines of inequity caused by technology. This is an important problem that Mozilla and our partners have been actively working together to solve, because truly representative speech recognition is not a convenience factor but a matter of inclusion and accessibility.

Common Voice launched to help address exactly these biases and the resulting inequalities in technology accessibility. Since 2017 we've made unparalleled progress in terms of language representation: no comparable initiative or open (CC0) dataset reaches the diversity of over 50 languages, importantly including under-resourced and under-served languages, making it the largest multilingual public domain voice dataset.

Among these is Kinyarwanda, a widely spoken language in Rwanda with over 12 million speakers. Technology adoption is starting to gain momentum in the country. Many services are going digital, notably government services, which in fact are now only accessible online. While generally being a positive development, this poses not only significant challenges to the people depending on these services, but also to solution providers who face only a very slow increase in platform adoption due to a lack of digital literacy.

Fortunately, the local tech ecosystem is starting to organize itself, supported by governmental and non-governmental initiatives, funding and mentorship. Here’s how:

Background

Last year, Mozilla and Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) co-hosted an ideation hackathon in Kigali to create a data corpus for Kinyarwanda and to lay the foundation for local voice-recognition applications. The hackathon gave rise to Digital Umuganda, a Rwandan startup that proposed a solution using a mechanism rooted in Rwandan culture: ‘Umuganda’, a concept of self-help and cooperation. On the last Saturday of every month, people gather in their communities and pool their efforts to build physical infrastructure such as roads and schools. Digital Umuganda brought this concept to the digital realm to help build digital infrastructure such as voice data.

“Technology literacy”, limiting the ability to mobilize contributors

Boris Mugisha, Community Manager for Digital Umuganda, faced enormous challenges while organizing contributors. “It was such a huge challenge to make contributors understand why they should donate their voice,” he says. “People in their early 20s were likely to recognize the benefit of a common voice dataset. Some of the issues were managing the gender ratio among Commoneers, more contributors were men”.

“Commoneers” are volunteers who help Digital Umuganda mobilize contributors and manage offline events. Communities were formed in schools, so when schools closed, it was sometimes impossible to gather people for events.

Lack of resources slowing voice data creation

Despite the progress made by Boris in building the Commoneers community, enormous challenges remained. Limited access to the internet was by far the most prevalent. “In some cases, contributors will not have enough money to get access to the internet,” Boris says. “You will also find volunteers willing to contribute but not having handheld devices... Internet is still expensive, a gigabyte costs a dollar. Considering that the majority of contributors are university students and as gatherings are not permitted due to the coronavirus pandemic. Events couldn’t take place at physical locations which were filled with public wifi. This led to poor voice donation since I couldn’t train new contributors and check the quality of voice data being uploaded.”

Yvette Umubyeyi, who also contributed to and mobilized the Kinyarwanda voice data creation, faced the same challenges. She started to contribute to the Common Voice platform during the COVID-19 pandemic in April. “It was very hard for me to contribute since it was in the lockdown, I didn’t have enough notion on how to contribute,” says Yvette.

The startup also faced the challenge of obtaining publicly available sentences, which sometimes comes at a cost. Media partners like IGIHE and the Rwanda Broadcast Agency (RBA), who were familiar with the Umuganda concept, quickly understood the importance of building an open Kinyarwanda voice dataset.

Commoneers’ active participation

Despite the challenges faced while building the community and creating the Kinyarwanda voice dataset, the startup managed to collect 1,211 hours of Kinyarwanda voice data and brought together a diverse group of over 420 contributors. “This was possible by finding a venue where they could provide free internet and if contributors live in the same neighborhood, I send them internet packages so that they continue donating their voice”, says Boris.

Boris also mentioned that it was easier to encourage people to contribute during the lockdown since people had extra time. The company had to change its main strategy of “building community by hosting offline events” because gatherings were not permitted. Digital Umuganda continued to rely on its core contributors as Commoneers to keep up the momentum, both by adding more contributions and by mobilizing other contributors. Boris thinks that contributions might decrease after COVID, since most contributors will go back to their normal lives. He is planning to set up a hybrid model of on-site and off-site recording, drawing on the team's experience in organizing and managing events and involving Commoneers to mobilize contributions at online events. This should speed up contributions and also help the team cope with unforeseen circumstances like those they faced this year.

Next steps

Digital Umuganda has recently been selected, along with nine other winners, in the Smart Development Hack, a European Commission-led hackathon designed to produce digital solutions to tackle the effects of the coronavirus pandemic in developing countries around the world. Digital Umuganda will receive technical and methodological support as well as over one million euros for project implementation in Rwanda.

“Mbaza”, the AI-backed COVID-19 chatbot with speech-to-text and text-to-speech functionality, is a project in collaboration with partners such as the Rwanda Information Society Authority (RISA), Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ), Translators without Borders (TWB), and Mozilla. The project will provide critical COVID-19 information in the Kinyarwanda language.

The team behind Digital Umuganda wants to extend access to information and services through voice technology to more languages and countries. This requires collaboration with different stakeholders. They are not only building the dataset but also working on products and services that are locally relevant and that relate to their mission, “a mission to democratize access to information and service”. The company strongly believes in the role the community plays in achieving this mission. “Without a community, there is no data. We are building this community dedicated to digital infrastructure, raising awareness around Common Voice, artificial intelligence, and the benefits of voice technology for currently underserved languages”, says Audace Niyonkuru, Chief Executive Officer of Digital Umuganda.

Kinyarwanda is one of the fastest-growing languages on the Common Voice platform, with 1,000+ hours added in the past few months. This increase is thanks to a partnership among Mozilla, the German Ministry for Economic Cooperation and Development, and a young AI startup in Rwanda called “Digital Umuganda”, which built a local network of contributors and supporters around the initiative.

The language has recently benefited from a speech recognition model, which will open up new opportunities for the local tech ecosystem and entrepreneurs.

“Mbaza”, which will be the first project to benefit from this technology, will provide COVID-19 information to the local community in Kinyarwanda.

It’s just the beginning of a journey that will shift the terms of digital access and inclusion in Rwanda and beyond.