Common Voice

Common Voice is the most diverse open voice dataset in the world. Most voice datasets are owned by companies, which stifles innovation. They also under-represent almost every language in the world, as well as people of colour, disabled people, women and LGBTQIA+ people. We want to change that by mobilising people everywhere to share their voice.

Upcoming Events

Join the Common Voice Community and Team!

Common Voice Festival First Edition Mombasa

24-25.02.2023

Join us this February 24th to 25th at Swahilipot Hub, in Mombasa, for a two-day Common Voice event. Common Voice is a multi-language, publicly available voice dataset, powered by the voices of volunteer contributors around the world. People who want to build voice applications can use the dataset to train machine learning models.

Day 1 - Awareness and engagement day focused on dataset growth: voice validation and contributions, 9:00 AM - 4:00 PM

On the first day we will learn about Common Voice and how to contribute to the project. People will be able to contribute their own voice recordings and validate recordings that others have already contributed. There will be prizes for the top contributors. Register here.

Day 2 - Model Competition open to all across Kenya

The challenge will kick off on Wednesday 20th February and run until Friday 24th February at 3pm EAT.

Register here.

Join us on Day 2 in person and virtually as we finalise the Model Competition and announce the winners. There is a total cash prize of KES 100,000 for the competition.

Participants should bring their technical capabilities to help train models on the Kiswahili voice dataset.

This coding challenge shows a typical workflow for training and testing a Speech to Text model on Kiswahili data from Mozilla Common Voice. Challenge instructions:

  1. Download 10k instances of Mozilla Common Voice data for Kiswahili (pre-formatted for this particular challenge)
  2. Configure the training and testing runs
  3. Train a new model
  4. Test the model and display its performance
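For step 4, Speech to Text performance is commonly reported as word error rate (WER): the number of word-level edits needed to turn the model's output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that calculation, not the challenge's official scoring method. The function names and the sample sentences are made up for illustration.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(pairs):
    """Compute WER over (reference, hypothesis) transcript pairs."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        total_errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("reference transcripts contain no words")
    return total_errors / total_words


# Illustrative (made-up) reference transcripts and model outputs.
sample_pairs = [
    ("habari ya asubuhi", "habari za asubuhi"),  # one substituted word
    ("karibu mombasa", "karibu mombasa"),        # exact match
]

print(f"WER: {word_error_rate(sample_pairs):.1%}")
```

A lower WER means fewer word-level mistakes. The sketch lowercases both transcripts before comparing them; whether to also strip punctuation is a choice that should be applied the same way to every model being compared.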

We will use Coqui STT, a Speech to Text framework from coqui.ai. You can check out the Coqui STT documentation here.

Submitted solutions will be ranked by the Mozilla Common Voice team and the awards announced on the 25th of February at the event!

We welcome everyone across Kenya to participate in the competition. Once registration is complete, a Zoom link will be available here for the session where participants will get information on how to join, how to participate and how to make submissions.

We look forward to seeing the community come together to build a diverse voice dataset for the Kiswahili language.
