Vivekananda Pani, CTO and co-founder of Reverie Language Technologies, likens the development of AI for Indian languages to a journey to the moon: walking there would take 25 lifetimes, but building a rocket can drastically shorten the trip. The primary challenge is the scarcity of high-quality data in Indian languages. English, by contrast, has abundant digital content for training AI models.
The Data Dilemma and Potential Solutions
Pani points out how little naturally produced data in Indian languages exists online, in stark contrast to English. Artificial methods such as machine translation can convert English data into Indian languages, but he suggests a more sustainable approach: encouraging the creation of natural data from diverse media, such as transcriptions of audio and video.
Standardization and the Digital Divide
One issue is the lack of standardization for Indian languages in the digital world. Different keyboards and typing styles produce different representations of the same word, whereas English text is largely uniform. The problem is compounded by the fact that many people type messages in their native languages using English letters, even if they don’t know English.
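As a minimal illustration of the representation problem, assuming the text is stored as Unicode: the Devanagari letter क़ (qa) can be entered either as a single precomposed code point (U+0958) or as क (U+0915) followed by a nukta sign (U+093C), depending on the keyboard or input method. The two look identical on screen but compare as different strings until both are normalized, here with Python's standard `unicodedata.normalize`. The helper name `normalize_text` is hypothetical, and normalization only resolves this kind of encoding variation; it does not reconcile spelling differences or text typed in English letters.

```python
import unicodedata

# Two encodings of the same visible letter क़ (qa):
precomposed = "\u0958"        # DEVANAGARI LETTER QA, one code point
decomposed = "\u0915\u093c"   # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA

def normalize_text(text: str) -> str:
    # Hypothetical helper: map text to Unicode Normalization Form C (NFC)
    # so that canonically equivalent sequences compare equal.
    return unicodedata.normalize("NFC", text)

# Raw strings differ even though they render the same.
print(precomposed == decomposed)                                   # False
# After normalization both become the same code point sequence.
print(normalize_text(precomposed) == normalize_text(decomposed))   # True
```

Normalizing every input to one form before storing or training is a common way to keep a dataset from silently counting the same word as two different tokens.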
Building AI for Native Languages
Pani emphasizes the need to develop AI models specifically for Indian languages, given that fewer than 7% of the Indian population is fluent in English. He also notes that India’s startup landscape is evolving, with growing belief in the potential of AI following OpenAI’s achievements.
Hardware Hurdles and Data Control
Building AI hardware in India presents another challenge, as the country currently lacks sufficient skills and relies on expensive imports. Pani also echoes concerns about data control, noting that Indian researchers often have to pay to access data that Indian users have “donated” to foreign companies. He advocates greater government involvement in setting standards and laying the foundations of India’s computing ecosystem, as China and Japan have done.