

Participants will explore how to represent Mixe, Chatino, and other languages in voice technology

Event features four $20,000 MXN cash prizes


(MEXICO CITY, MEXICO | WEDNESDAY, APRIL 20, 2022) — Mozilla Common Voice is hosting a five-day hackathon and competition in Mexico City in late April and early May to help voice technology datasets and products better represent the languages and accents of Mexico. Participants can be technologists, data scientists, or simply people who speak Mexican Spanish or other languages of Mexico.

The event is held in collaboration with the Institutes of Mathematical & Anthropological Research at the Universidad Nacional Autónoma de México. It will feature a workshop, talks, a panel discussion, and a hackathon with four $20,000 MXN prizes. The event will run from Friday, April 29 to Tuesday, May 3, both online and in person at the Universidad Nacional Autónoma de México in Mexico City. Winners will be announced and will demo their projects on Thursday, May 5.

Register here.

Mozilla Common Voice is an open-source initiative to make voice technology more inclusive. People can donate their voices to an open-source dataset, and technologists can then use that data to train speech models for new products. To date, Common Voice has collected 406 hours of Spanish voice data from more than 20,000 speakers.

Says E-M Lewis-Jong, Mozilla’s Product Lead for Common Voice: “Current voice technology fails to recognize a wide variety of languages and accents. The result? Millions of people are locked out of using critical technology. This event is an opportunity to mitigate that bias, especially when it comes to the languages and accents of Mexico.”

Highlights of the event include:

Hackathon with four $20,000 MXN cash prizes. Participants who build speech software and datasets have the opportunity to win a cash prize. The categories are: (1) using a pre-trained speech model for Spanish; (2) using a pre-trained speech model for an indigenous language of Mexico; (3) data collection tools for Spanish; (4) data collection tools for indigenous languages of Mexico.

Talks and panels by indigenous language experts. Experts will discuss the biases and opportunities of voice technology for indigenous languages in a series of talks titled "On Speech Technologies for the Indigenous Languages of America." The event will also feature a panel discussion among:

  • Tajëëw Diaz Robles: A Mixe woman from Tlahuitoltepec, she is part of the Colmix collective and currently coordinates the Endless Oaxaca Multilingüe project of the Alfredo Harp Helú Foundation in Oaxaca. She is also part of the Network of Digital Activists in Indigenous Languages.

  • Dr. Hilaria Cruz: A Chatino speaker from San Juan Quiahije, she is currently a linguist in the Department of Comparative Humanities at the University of Louisville in the United States. She specializes in language documentation and revitalisation, and works on computational methods for language documentation and on creating books for children.

  • Huber Benítez Meili: He holds a degree in Guarani and German. He is a co-founder of Avañe’ẽ Kuaareka Aty, an organization supporting terminology, translation, and interpretation in Guarani, and of the association of German teachers of Paraguay.