In the quiet village of Agara, nestled amidst the picturesque landscapes of rural India, a remarkable transformation is taking place. Preethi P., like many others in her region, would typically spend her days sewing clothes and earning less than $1 a day. Today, she is using her native Kannada language to contribute to an initiative that has the potential to reshape AI and data annotation in India.
Karya, a startup founded in 2021 by Manu Chopra, is at the forefront of this change. The social impact organization has teamed up with tech industry giants, including Google and Microsoft, to address one of the most significant challenges in AI: the need for high-quality data to better serve billions of non-English-speaking users in India.
The Power of Data Annotation
Preethi and 70 other workers in Agara and neighboring villages have joined forces with Karya to collect and label text, voice, and image data in India's vernacular languages. These data annotation workers play a crucial role in providing the foundation for AI chatbots and virtual assistants to generate relevant responses, making AI more accessible to non-English speakers.
What sets Karya apart from other data vendors is its commitment to fair compensation. Most of the workers, predominantly women in rural communities, receive up to 20 times the prevailing minimum wage, significantly improving their livelihoods. For Preethi, the income she earns by working with Karya has allowed her to make substantial improvements to her home.
Manu Chopra, the 27-year-old computer engineer behind Karya, stresses the importance of fair pay, calling "poor pay for such work" an "industry failure." Karya addresses this by ensuring that data annotation workers are compensated well for their contributions.
Silicon Valley's Partnership with Karya
Silicon Valley giants have recognized the potential of Karya's approach and are turning to the startup to meet the demand for high-quality data. Microsoft has used Karya's services to source local speech data for its AI products. The Bill & Melinda Gates Foundation collaborates with Karya to reduce gender biases in data used for large language models. Google, meanwhile, is partnering with Karya and other local organizations to collect speech data in 85 Indian districts, with plans to expand to all districts and develop generative AI models for 125 Indian languages.
The key challenge is that non-English datasets are inadequate: conversational data in Indian languages is scarce, and little content in those languages has been digitized. Many existing AI models have been trained predominantly on English-language internet data, which leads to problems such as inaccurate words and faulty grammar when the models are applied to South Asian languages.
Karya's approach aims not only to enhance data quality but also to empower workers in rural areas who may not have access to such opportunities otherwise. The startup's app works offline and provides voice support for those with limited literacy, making it accessible to a wide range of workers.
Manu Chopra's vision extends beyond data annotation. He aspires to combat poverty through Karya, drawing on his own experience of growing up in an impoverished neighborhood. His startup has made it possible to collect vast amounts of speech data, improving the quality and diversity of data used in AI systems and research.
The Impact of Fair Compensation
Karya's efforts to provide fair compensation to workers have been recognized by the tech community. Saikat Guha, a researcher at Microsoft Research India, commended the data Karya delivers, emphasizing that fair pay results in better data quality. The startup is also involved in a project to reduce gender-related biases in large language models, a critical step toward more inclusive AI.
Karya's ambitions extend beyond India: the company is exploring opportunities to offer its platform as a service to organizations in Africa and South America. Its model not only enhances data quality but also empowers individuals in rural areas, giving them a chance to earn and educate their families.
The transformation happening in villages like Agara and Yelandur reflects a significant shift in the AI and data annotation industry, driven by a commitment to fair compensation, quality data, and inclusivity. Karya's partnership with tech giants like Google and Microsoft paves the way for a future where AI truly serves the diverse needs of billions of non-English speakers in India and beyond.