The new Bulbul V3 model focuses on capturing India’s linguistic diversity, enabling real-time, expressive voice output for conversational AI, customer engagement platforms, and large-scale enterprise deployments.
Indian artificial intelligence start-up Sarvam has introduced an upgraded version of its text-to-speech technology, marking a significant step toward building more natural and inclusive voice-based AI systems for the country’s diverse linguistic landscape.
The new model, named Bulbul V3, delivers enhanced speech quality by better capturing regional accents, scripts and speaking styles commonly found across India. According to the company, the model currently supports more than 11 Indian languages and features over 35 distinct voices recorded by professional voice artists. Sarvam said it plans to expand coverage to all 22 constitutionally recognised Indian languages in the coming months.
Bulbul V3 is designed to convert written text into lifelike speech by analysing not just the words but also contextual cues such as pauses, emphasis, tone and pacing. The company said these prosodic elements make the audio output sound more conversational and expressive in real-world applications. A low-latency streaming mode generates speech in real time, making the model suitable for interactive and live use cases.
Built for India’s linguistic complexity
Sarvam highlighted that Indian speech patterns present unique challenges, including frequent code-switching between languages, regional variations in pronunciation, and the importance of correctly handling names and emotional context. Bulbul V3 has been developed to handle such complexity without disrupting speech flow, the company said.
The model also includes a consent-based voice cloning feature, allowing enterprises to create customised AI-generated voices at scale. Sarvam said safeguards have been built in to ensure responsible use of the technology, particularly in high-volume commercial deployments.
Testing, access and broader AI push
Sarvam said Bulbul V3 was independently evaluated through a blind listening study covering 11 languages. While some global competitors ranked higher on overall audio quality, the company claimed its model performed strongly in full-band evaluations and delivered superior results in telephony-grade audio, with fewer pronunciation errors.
The release forms part of Sarvam’s ongoing rollout of new AI tools ahead of the India-AI Impact Summit 2026 in New Delhi. The start-up has also been selected under the government’s Rs 10,300-crore India AI Mission to help develop sovereign large language models, which are expected to be showcased at the summit.
Developers can access Bulbul V3 through Sarvam's dashboard, and the company is offering unlimited API usage for a limited period to encourage experimentation and adoption.
