A profile picture of Davud Kakaie


This is a profile of Davud Kakaie, one of the winners of Mozilla Common Voice’s “Our Voices” competition.


Oftentimes, people take speech-to-text technology for granted — a tool that can effortlessly understand their spoken words and turn them into text, making countless tasks more efficient.

But there are millions of people who don’t have that luxury — like Kurdish-speaking people living in Turkey, Iran, and elsewhere around the world. About half of the Kurdish population speaks Northern Kurdish (Kurmanji), and half speaks Central Kurdish (Sorani).

“There are between 30 and 45 million Kurdish people according to different estimations, but none of the Kurdish dialects have a speech-to-text toolkit. There is none,” explains Davud Kakaie, a self-taught software developer based in Iran and one of the winners of Mozilla Common Voice’s “Our Voices” competition.

That’s because Kurdish voice datasets, which are needed to train voice technology to understand and transcribe the language, are scarce. “I wanted to do my best to change that,” Kakaie explains.

And so Kakaie developed the first-ever speech-to-text model for Kurdish, which won one of the four “Our Voices” awards. To build it, Kakaie used the 122 hours of Sorani speech collected by Common Voice, together with QuartzNet, a neural network architecture for automatic speech recognition.

“With this work, I am on the verge of releasing a free, small and efficient software package giving users the ability to use voice-enabled Kurdish solutions,” Kakaie explains. The software is under 100 MB and available for Windows, Linux, and Microsoft Office. Kakaie is also working on an alternative version that is only 11 MB, intended for low-resource devices such as smartphones and for large-scale deployments.

In the months ahead, Kakaie is excited to see Kurdish speakers put the new technology to work. “I’ve seen people enjoy interacting with voice-enabled applications. With this model I hope Kurdish speakers can simplify their workload a bit too,” he says. “There are lots of opportunities — we’ve seen the power of voice-enabled tools in our everyday devices.”

The product will also be available as a Software Development Kit (SDK), giving other developers a simple API for building their own applications on top of it. The SDK will also be able to serve any compatible model, regardless of the language it was trained on. “I’m letting my imagination go wild,” Kakaie says.

