Creating community-driven datasets: Insights from Mozilla Common Voice activities in East Africa

March 7, 2023
AI Fairness, Accountability and Transparency / Internet Health / Movement Building / Voice Technology

Overview

Since 2019, the Mozilla Foundation has been working with the GIZ FAIR Forward initiative to promote the creation and use of open voice data and technology in the East African languages Kinyarwanda, Kiswahili, and Luganda. This includes the crowdsourcing of large voice datasets together with local communities using the Mozilla Common Voice platform.

This report summarizes the lessons learned and the strategies that the three voice communities used to create publicly available datasets. It is based on interviews with people who have been driving these efforts, including Mozilla Fellows, community coordinators, contributors, and supporting donor organizations.

The insights and recommendations are meant to give existing and future voice communities, as well as the organizations that support them, concrete and practical guidance on both the technical steps and the social dynamics involved in creating community-driven datasets.

Collaborators

GIZ FAIR Forward