Mozilla’s Common Voice now features more than 60 hours of speech data in eight new Indigenous languages. Meet the Taiwan language volunteers at RightsCon on February 27, 2025, in Taipei.
As thousands of languages face extinction, linguistic and heritage preservation has never been more critical. In Taiwan, a grassroots volunteer community is using Mozilla’s Common Voice platform, the world’s largest publicly crowdsourced open speech dataset, to preserve Indigenous languages and help build inclusive, voice-enabled AI solutions.
Common Voice, a volunteer-led initiative covering over 200 languages, including Traditional Mandarin and Taiwanese Hokkien, now adds eight Indigenous Formosan languages: Atayal, Bunun, Paiwan, Rukai, Oponoho, Teldreka, Seediq, and Sakizaya. The launch of these languages coincides with International Mother Language Day, celebrated this year on February 21, 2025.
Over 60 hours of speech data have already been collected by local Indigenous language teachers across the island, with the help of the Mozilla Taiwan community, led by Irvin Chen, in collaboration with the Wikimedia Foundation in Taiwan. The dataset will be available for download in June.
“We carry our identity and heritage through our language. By bringing our culture into technology, we’re not just preserving words, we’re keeping our cultures alive,” says Chen.
The addition of Taiwanese Indigenous languages is part of Mozilla’s Open Multilingual Speech initiative, a broader effort to support more ultra-low-resource and Indigenous languages. This first round has included over 70 communities in Southeast Asia and beyond.
“We love seeing local communities mobilize around their languages. Common Voice is really their project. This embodies the true spirit of open-source collaboration and community engagement in shaping ethical AI,” says EM Lewis-Jong, Common Voice Product Director at the Mozilla Foundation.
Common Voice datasets are free for anyone to use and have been widely adopted, powering everything from audio translation software for healthcare to voice applications that help women better understand and exercise their land rights.
“We carry our identity and heritage through our language. By bringing our culture into technology, we’re not just preserving words, we’re keeping our cultures alive.”
Irvin Chen, Taiwan Community Volunteer Lead
Meet the Volunteer Community at RightsCon
The Mozilla Taiwan community will have a dedicated booth at RightsCon Taiwan on February 27, where attendees can learn more about Common Voice and help contribute to the initiative. Additionally, on Saturday, February 22, the Taiwanese language community will take part in the g0v bimonthly hackathon. You can also learn more about Common Voice in Taiwan by visiting the project website: moztw.org/common-voice
Join the Movement
At Mozilla, we believe that everyone can shape AI. Be part of a global community advancing these efforts today by joining our Common Voice community Discord and signing up for the Mozilla Foundation’s newsletter to get updates about other Mozilla initiatives.