Common Voice

Common Voice is the most diverse open voice dataset in the world. Most voice datasets are owned by companies, which stifles innovation. They also under-represent almost every language in the world, as well as people of colour, disabled people, women and LGBTQIA+ people. We want to change that by mobilising people everywhere to share their voice.

Programmatic work

What are Mozilla Common Voice fellows?

Mozilla Fellows are activists, open-source researchers, engineers, and technology policy experts who work on the front lines of the internet health movement. Fellows develop new thinking on how to address emerging threats and challenges facing a healthy internet. Four fellows from Rwanda, Kenya and Tanzania have been selected as leaders in the open voice technology space who are embedded in our focus communities. These fellows have led the growth of the platform, connected the project to diverse stakeholders and championed the value of greater investment in African languages.

Meet the 2021/2022 Kiswahili fellows

Britone Mwasaru

Britone Mwasaru will be working on voice technology with a focus on the Kiswahili language in order to address the exclusion of those whose first or preferred language is Kiswahili. Before joining Mozilla, Britone was Director of Technology at Swahilipot Hub, where he led the use and adoption of technology in the technology and arts community in Mombasa and the Coast region of Kenya.

Kathleen Siminyu

Kathleen Siminyu is an AI researcher who has focused on Natural Language Processing for African languages. She will be joining Mozilla Foundation as a Machine Learning Fellow to support the development of a Kiswahili Common Voice dataset and to build speech transcription models for end use cases in the agricultural and financial domains. In her NLP research, Kathleen has previously worked on speech transcription for Luhya languages and contributed to machine translation for Kenyan languages as part of Masakhane. Before joining Mozilla, Kathleen was Regional Coordinator of AI4D Africa, where she worked with ML and AI communities in Africa to run various programs.

Rebecca

Rebecca will be working on establishing and supporting diverse Kiswahili language and tech communities along axes of gender, age, regional origin, accent and vernacular usage, towards building an open voice dataset in Kiswahili. She will work to ensure that the dataset accurately represents the Kiswahili-speaking population, with the goal of encouraging adoption and implementation of voice technology. Before joining Mozilla, Rebecca was an Internet Society fellow, an AfriSIG fellow, a Google Policy fellow, a National Geographic Explorer and a digital rights program officer at Paradigm Initiative.

Read more about Mozilla Fellowships here


Mozilla Common Voice Kiswahili use case Awards

Common Voice awarded eight projects up to USD 50,000 each. The projects leverage the Kiswahili language and voice technology to increase social and economic opportunities for marginalized groups in Kenya, Tanzania, and the Kiswahili-speaking Democratic Republic of Congo.

These grants are supported by the Gates Foundation in collaboration with the Foreign, Commonwealth and Development Office (FCDO) and GIZ, in support of a gender-conscious and community-centred approach to tech development.

Read the grant announcement here

Meet the awardees:

ChamaChat
ChamaChat by Ujuzi Craft LTD | Kenya

A Chama management system with a chatbot that interacts with members and gives voice replies in Kiswahili via SMS and WhatsApp. It connects to the group's payment API, i.e. the M-Pesa API. Members can interact with the Chama admin bot for a variety of functions, including checking balances, requesting loans and receiving transaction statements.

Kiazi Bora
Kiazi Bora by Sustain Earth's Environment Africa | Tanzania

Kiazi Bora, “Quality Potatoes” in Swahili, is a voice-enabled application that informs vulnerable women living in rural areas and marginalized communities of Tanzania about the nutritional value of Orange Fleshed Sweet Potatoes (OFSP), farming skills for better yields, and detailed market availability for raw or processed OFSP food products, all through an app built on a voice data set.

Learn more about Kiazi Bora.

Wezesha na Kabambe
Wezesha na Kabambe by University of Westminster, UK | Moi University, Kenya | Technical University of Kenya | Western Michigan University, USA

Wezesha na Kabambe is a mobile-enabled Swahili audio chatbot that does not rely on internet connectivity. It is developed in collaboration with rural smallholder women farmers in Kenya as an alternative source of agricultural information. Built using the Mozilla Swahili data sets, the chatbot can be used on both feature phones (kabambes) and smartphones by rural smallholder farmers. The interactive Swahili chatbot is powered by a database of frequently asked questions from smallholder women farmers, a marginalized and digitally excluded group. It is inspired by the existing familiarity with, adoption of, and acceptance of mobile technologies in rural areas in Kenya.

Learn more about Wezesha na Kabambe.

LivHealth Kiswahili Corpus
LivHealth Kiswahili Corpus by Badili Innovations | Kenya

LivHealth Kiswahili Corpus aims to empower local communities to correctly identify livestock syndromes and get timely interventions from qualified livestock practitioners. Using Natural Language Processing (NLP), Machine Learning (ML), and Artificial Intelligence (AI), the project will build Kiswahili text-to-speech models for disseminating disease information to marginalized communities. Working closely with their partner, One Health Center in Africa (OHRECA) based at ILRI, they will enhance the functionality of the LivHealth system to give local communities easy, on-demand access to disease information in Kiswahili.

Imarika
Imarika by Strathmore University | Kenya

Imarika is a conversational chatbot offering digital climate advisory services in English and Swahili that will help smallholder farmers adapt to changing weather patterns. The project aims to address the vulnerability of farmers to weather unpredictability caused by the lack of accessible, reliable, and localized weather forecasts. Access to weather information is highly variable across sub-Saharan Africa and is usually limited to low-accuracy national or regional forecasts broadcast on radio or TV. The project specifically hopes to serve smallholder farmers, who often have limited access to localized climate advisory services due to barriers such as slow technology penetration or digital illiteracy.

Learn more about Imarika.

Paza Sauti
Paza Sauti by Tech Innovators Network Ltd | Kenya

The project is developing a chatbot and an interactive voice response service that will provide voice-enabled services in the domain of business registration and raise awareness about the use of collateral (security) to access credit in Kenya. The main objective is to increase financial literacy around using movable property as collateral to access credit, particularly among women in business and especially in agriculture. Although getting credit has become easier, most members of the population are still unaware that they can access further credit by using movable property as collateral. This project continues an ongoing collaboration in the domain of financial inclusion with the Business Registration Service (BRS), a state corporation that serves the Kenyan public.

Kiswahili Text and Voice Recognition Platform (KTVRP)
Kiswahili Text and Voice Recognition Platform (KTVRP) for Agricultural Advisory and Financial Services for Smallholder Farmers by Duniacom Group, LLC | Tanzania / United States

A majority of smallholder farmers in Tanzania are able to communicate only in spoken Kiswahili and its dialects. A text- and voice-based platform made available in the language of the underserved (i.e., Kiswahili) would be key to wide access, adoption, and usage of digital agricultural advisory and financial services in Tanzania. The objective is to develop a text and voice recognition platform that will offer smallholder farmers in the Tanzanian maize value chain personalized, automated digital financial and non-financial services based on location, agro-ecological zone, and crop cycle. Based on gender-disaggregated data from the pilot phase, it is anticipated that the majority of participants will be women.

Learn more about KTVRP.

Haki des femmes
Haki des femmes by Core23Lab | Democratic Republic of Congo

Haki will leverage voice technology to provide access to legal information and support for women in the Katanga and Lualaba provinces of the Democratic Republic of Congo, to ensure they have the right to access, use, inherit, control, and own land. The majority of women in the DRC often lose their access to land after the passing of a husband or other loved one due to a lack of knowledge of land rights. This solution will help women access information and legal support in Kiswahili to secure their land rights.

Learn more about Haki des femmes.
