HYDERABAD: Soon you may be able to interact with virtual assistants like Siri and Alexa, and smart devices like Google Home, in regional languages, as the International Institute of Information Technology-Hyderabad (IIIT-H) is working on a unique speech technology in Telugu.
A year-long pilot project costing Rs 1 crore has been granted by the ministry of electronics and information technology to the Language Technologies Research Centre at IIIT-H under the central government’s National Mission on Natural Language Translation, also known as ‘Bahubashak’. As part of this, a team of researchers has to develop speech datasets in Telugu.
Currently, users in India can interact with smart devices such as Alexa only in English or Hindi. Researchers said this is primarily because there are not enough datasets available in other regional languages. The idea behind developing datasets in regional languages is to enable smart devices and assistants to remove language barriers in India.
“Artificial intelligence completely depends on datasets. The more data is fed into the system, the more efficient it becomes. We will be collecting close to 2,000 hours in Telugu as part of this pilot project,” said Prof Prakash Yalla, primary investigator for the project, who would also be one of the advisors at the national level.
The researchers would crowd-source conversations in Telugu to develop the speech datasets. IIIT-H would also be a nodal agency to establish protocols and standards for other regional languages. “We have to set protocols and mechanisms to enable crowd-sourcing of content. So, we will be deciding on how to make people contribute their conversations, incentives, transcription, quality and privacy of such content,” Prof Yalla said.