About IndoKosh
Building rights-verified language data for India's many voices.
IndoKosh is a platform for creating a large corpus of high-quality translations and natural speech recordings in Indian languages. The work starts with carefully translated sentences and grows through reviewed voice recordings from real speakers.
Translate
Contributors translate English sentences into Indian languages using native fluency, local usage, and cultural context.
Record
Speakers record natural audio for approved translations, helping build datasets that capture how the languages are actually spoken, including accents and rhythm.
Preserve
Every reviewed contribution helps preserve linguistic knowledge and supports future work in language access, speech technology, and AI systems.
Why it matters
Many Indian languages are underrepresented in modern language and speech technology. IndoKosh focuses on collecting data that is useful, reviewed, and properly authorized so it can support research, product development, and preservation over time.
Quality and rights
Contributions go through assignment and review workflows before they become part of the dataset. Contributors also accept a content assignment and data consent agreement, so translations and recordings can be used with clear rights and in a responsible way.
Want to contribute?
IndoKosh currently supports contributor workflows for translating and recording Indian-language sentences, with an initial focus on Hindi, Tamil, and Telugu.