Speech recognition products have a language problem.
Although several voice-activated chatbots and assistants fluently speak English and other European languages, there are fewer alternatives for African languages and other low-resourced languages.
Mozilla’s Common Voice Kiswahili team has been organizing a four-part hackathon series to increase the number of voice technologists across East Africa by equipping technical communities with the skills to train voice technology.
For voice technology to make a significant step from novelty to everyday utility, it needs to be inclusive. Inclusivity can take various forms: broadening the voice dataset to feature varied dialects, balancing contributions across genders, or increasing the number of local community developers who are actively building voice technology.
Run in partnership with Africa’s Talking, Zindi and Swahilipot Hub, the hackathons attracted at least 20 participants. Each series consists of two training workshops, culminating in an ideation hackathon.
In the first session, participants attended an induction workshop introducing them to artificial intelligence and machine learning, steadily narrowing their focus to how Speech-to-Text (STT) systems work. Attendees used Kiswahili datasets to experiment and refine their skills in training STT language models. Thereafter, the hackathons took off in various locations, and the final part of the series had participants pitch ideas to a panel of judges for a chance to win USD 1,000 for first place, USD 500 for second place, and USD 200 for third place.
The first round of hackathons was held in Kenya and Tanzania at Swahilipot Hub and Africa’s Talking, and a few more are in progress. In Kenya, the second round of hackathons will end on December 5 and 6, while the project pitching and awards will take place simultaneously at Swahilipot Hub in Mombasa and at Africa’s Talking in Nairobi on December 9. In Tanzania, the ideation pitching took place on December 2 at Africa’s Talking in Arusha.
“Capacity building is crucial in advancing inclusive voice technologies. These series of hackathons are a stepping stone in shifting the approach on how we build technologies with and for communities,” says Kathleen Siminyu, Mozilla’s Common Voice Machine Learning Fellow.
This series of activities aims to increase the impact of Mozilla’s Common Voice Kiswahili dataset by fostering a developer community with the skills needed to use the dataset to develop solutions. Once solutions for speech recognition in Kiswahili exist, such as readily available APIs, hackathon participants are challenged to use this voice capability by implementing it in end-user applications that solve various societal problems.
Lessons from the first iteration highlight the need for ecosystem-level support if local voice solutions are to disrupt tech. The team has learned that participants need ways to build an understanding of applications, and that investment in infrastructure is needed, such as free credits for platforms like Colab and RunPod. These platforms could help meet the model training and inference requirements of challenges involving larger datasets and models.