The open source Common Voice speech corpus is driven by the CC0 text collected and written by community members to power the future of speech technologies in their chosen languages. As part of our roadmap commitment to making it easier and faster for communities to contribute to Common Voice, we’ve added a feature allowing logged-in users to contribute batches of up to 1,000 sentences at once.
How it works:
Once you’ve logged into the Common Voice platform, select the Write dropdown to contribute your own original sentences or copyright-free text with a citation. To submit multiple sentences at the same time, select the “Small batch sentence submission” option at the top of the Write page.
Add up to 1,000 sentences, one per line. You can optionally add sentence domains, but the domains must be the same for every sentence in a small batch submission. To use different domains, make separate submissions.
If you’re submitting sentences to a specific variant of a language, please make sure that all sentences in a single batch submission are from the same variant.
You will need to include a citation describing where the sentences come from, so we can check that they’re copyright-free. Please make separate submissions for batches of sentences from different sources.
If you’re submitting hundreds of sentences at once, it can be difficult to make sure they’re all properly formatted. If any sentences in your submission are rejected, you will see an option to download them so you can re-format and submit them again.
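If you keep your sentences in a text file, it can help to tidy them up before pasting them into the form. The snippet below is only a rough sketch of one way to do that locally: it is not part of the Common Voice platform and does not reproduce its validation. The function name `prepare_batch` and the constant `MAX_BATCH_SIZE` are made up for this illustration; the only figure taken from this post is the 1,000-sentence limit.

```python
MAX_BATCH_SIZE = 1000  # per-batch limit described in this post


def prepare_batch(raw_text: str) -> list[str]:
    """Return one cleaned sentence per line, ready to paste into the form.

    Hypothetical local helper: drops blank lines and exact duplicates,
    and collapses stray whitespace inside each line.
    """
    seen = set()
    sentences = []
    for line in raw_text.splitlines():
        sentence = " ".join(line.split())  # trim and collapse whitespace
        if sentence and sentence not in seen:
            seen.add(sentence)
            sentences.append(sentence)
    if len(sentences) > MAX_BATCH_SIZE:
        raise ValueError(
            f"{len(sentences)} sentences found; a small batch holds at most "
            f"{MAX_BATCH_SIZE}"
        )
    return sentences
```

Joining the result with `"\n".join(...)` gives text with one sentence per line. Remember that each batch should still share a single source, domain, and language variant.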
What if I have more than 1,000 sentences?
The new small batch sentence submission feature is a great way for contributors to quickly add up to 1,000 sentences. To send us larger numbers of sentences, please use the “Bulk sentence upload” option from the Write page of the Common Voice platform. You can find more documentation for the bulk sentence upload process on GitHub.
Why do I have to wait between small batch and bulk sentence submissions?
To prevent spam and abuse, a short wait has been built into both the small batch sentence submission and bulk sentence upload processes. If you see a message asking you to wait, please give it a few moments before submitting another small batch or bulk sentence upload.
What kinds of sentences can I contribute to Common Voice?
Sentences written or collected for the Common Voice platform should be ones that can be comfortably read aloud in around 15 seconds. They should be either sentences from copyright-free sources or your own original work that you’re comfortable contributing to a CC0 dataset. Please don’t submit sentences that come from copyrighted sources or that have been generated or translated using AI or other automated tooling.
Please don’t use numerals in your sentences, since they can be confusing to read aloud. If you want to include numbers, please spell them out, for example writing “four hundred and nine” instead of “409”.
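For a large batch, a quick local check can point out sentences that still contain numerals before you submit. This is a minimal sketch rather than the platform's own validation, and the function name `sentences_with_numerals` is made up for this example. It relies on Python's `str.isdigit`, which also recognises digits from non-Latin scripts.

```python
def sentences_with_numerals(sentences: list[str]) -> list[str]:
    """Return the sentences that contain at least one digit character."""
    return [s for s in sentences if any(ch.isdigit() for ch in s)]


flagged = sentences_with_numerals([
    "I counted 409 birds.",
    "I counted four hundred and nine birds.",
])
print(flagged)  # only the first sentence is flagged
```

Flagged sentences can then be rewritten by hand with the numbers spelled out.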
You can learn more about what makes a great sentence in the sentence guidelines!
The language I want to write in isn’t on Common Voice!
At the time this blog post was written, Common Voice had 130 languages live on our platform. That sounds like a lot, but there are so many more languages out there in the world! If you speak a language that you would like to see included in Common Voice, please send us a language request using the “Request a language” option at the top of the languages page.
I want to have more input into my language on Common Voice
If you're interested in having a say in your language, we have great news! Common Voice is currently calling for participants in our Language Representative Program. As a first step, we are creating a database of contacts for each language community. This group will consist of volunteers who will act as primary references for any language-specific questions and concerns. If you're interested, you can sign up here.