Language is an important component of digital inclusion. Much of today's technology is built in countries where English is the main language. Beyond device access and affordability, language becomes a huge barrier to accessing digital services.
The ability to use the language you speak every day to access digital services is important for digital inclusion, and it is even better when this works on low-end devices. The capability to process natural language on a remote platform or on an edge computer becomes critical for reaching a huge demographic that is not served. Voice becomes a critical component of digital inclusion, and enabling startups, businesses and organizations in communities where language is a barrier is very important. This is why I believe the Common Voice platform really matters.
Making voice datasets openly available allows for innovation and research around languages that are not served. This lowers the barrier to entry for those who are building for and researching these languages. The capital required to collect a diverse voice dataset to train machine learning models can be high. A community approach to building open voice datasets yields diverse data, which reduces bias in the tools built.
Voice is critical, but a privacy-first approach is very important. Tools that collect and process voice data need to put people’s privacy first.
I am very excited to see what is built using the Swahili voice datasets that are now available on the Common Voice platform.