Common Voice

Common Voice is the most diverse open voice dataset in the world. Most voice datasets are owned by companies, which stifles innovation. They also under-represent almost every language in the world, as well as people of colour, disabled people, women and LGBTQIA+ people. We want to change that by mobilising people everywhere to share their voice.

Platform and Dataset

Common Voice is the largest crowdsourced multilingual speech dataset in the world

Why Common Voice?

Voice-enabled technology is becoming increasingly ubiquitous - from smartphone assistants to wearable healthcare devices to language practice software. It’s also leaving a lot of people behind. Voice assistants currently support fewer than 1% of the world’s languages! Even when a community’s language is supported, its speakers might not be understood, because AI training data regularly under-represents gender-diverse communities, people of colour, and those with marginalised or non-native accents.

We’re here to change that by making it easy for people like you to share your voice.

How does Common Voice work?

  • Step 1. Someone asks for a language to be added.
  • Step 2. The website text is translated into that language by volunteers.
  • Step 3. Sentences are collected for people to read aloud.
  • Step 4. We launch the Common Voice platform in this language.
  • Step 5. People come and contribute their voices.
  • Step 6. Other people validate those voice clips.
  • Step 7. We release the dataset every 3 months.
  • Step 8. The cycle continues! More sentences, more clips, more validations! We always need your help!

So how do I get involved?

Thanks for asking! Check the platform to see whether your language needs more clips, more validations, or more sentences.

If you’re not sure - go ahead and contribute some clips! It only takes a couple of moments to make AI work better for everyone.

If you want to get more deeply involved, why not become a community mobiliser? From educating people online about AI inclusion issues to running local events in your community, there are lots of ways to help! Just reach out and we’ll put you in touch with the right people.

What’s next for the Common Voice platform?

In terms of contributor experience, we’re currently working to make the platform easier to use for people in low-bandwidth contexts and to make it simpler for new people to get involved quickly. For app architecture and infrastructure, we’re making the platform more scalable and the dataset more segmentable for data consumers.

Next year, we will be working on some major platform evolutions - including expanding into spontaneous speech. If you’re an engineer or data scientist who wants to help out - get in touch!
