AI4Bharat TTS Service¶
Documentation for the AI4Bharat Text-to-Speech (TTS) service integration.
Overview¶
This optional service provides natural-sounding speech synthesis for Indic languages using AI4Bharat's Indic Parler-TTS model.
Supported Languages:
- Hindi (hi)
- Tamil (ta)
- Telugu (te)
- Kannada (kn)
- Malayalam (ml)
- Bengali (bn)
- Punjabi (pa)
- Marathi (mr)
- Gujarati (gu)
- And more
Advantages:
- Free and open-source
- Optimized for Indic languages
- Self-hosted (no calls to external APIs)
- Natural-sounding voices
- Multiple speaker options
- Can run on a GPU for better performance
Quick Start¶
Installation¶
cd ai4bharat_tts_server
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Download models
python download_models.py
Running the Service¶
# Development
python server.py
# Via Docker
docker build -t ai4bharat-tts .
docker run -p 8002:8002 ai4bharat-tts
# With GPU support
docker run --gpus all -p 8002:8002 ai4bharat-tts
Configuration¶
Environment Variables¶
# Server
HOST=0.0.0.0
PORT=8002
WORKERS=2
# Model
MODEL_NAME=indic-parler-hi
MODEL_PATH=/models
# Device: cuda or cpu
DEVICE=cuda
ENABLE_CACHING=true
# Audio
SAMPLE_RATE=16000
AUDIO_FORMAT=wav
# Speed range: 0.5 - 2.0
SPEED=1.0
# Logging
LOG_LEVEL=INFO
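A server might read these variables at startup roughly as follows. This is a minimal sketch, not the service's actual loader: the `Settings` dataclass and `load_settings` function are hypothetical names, and the defaults simply mirror the values listed above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    device: str
    sample_rate: int
    speed: float
    enable_caching: bool


def load_settings(env=None):
    """Build Settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env

    device = env.get("DEVICE", "cuda").lower()
    if device not in ("cuda", "cpu"):
        raise ValueError(f"DEVICE must be 'cuda' or 'cpu', got {device!r}")

    speed = float(env.get("SPEED", "1.0"))
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"SPEED must be between 0.5 and 2.0, got {speed}")

    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8002")),
        device=device,
        sample_rate=int(env.get("SAMPLE_RATE", "16000")),
        speed=speed,
        enable_caching=env.get("ENABLE_CACHING", "true").lower() == "true",
    )
```

Validating `DEVICE` and `SPEED` up front means a typo in the environment fails at startup rather than on the first synthesis request.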
API Endpoints¶
Synthesize¶
Endpoint: POST /synthesize
Request:
{
"text": "नमस्ते, आपका स्वागत है।",
"language": "hi",
"speaker": "female",
"speed": 1.0
}
Response:
Content-Type: audio/wav
Body: binary WAV audio (16-bit PCM, 16 kHz)
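Since the response body is raw audio, a client can sanity-check it with Python's standard `wave` module before playing or storing it. The sketch below assumes the response is a standard RIFF/WAV container in the format stated above; `describe_wav` is a hypothetical helper name, and the silent mono clip is generated locally only to keep the example self-contained.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Return (channels, sample_width_bytes, sample_rate, duration_seconds)."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), wav.getsampwidth(), rate, frames / rate


def make_silence(seconds=1.0, rate=16000):
    """Build a silent 16-bit mono WAV clip as a stand-in for a real response."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


channels, width, rate, duration = describe_wav(make_silence())
# For 16-bit PCM at 16 kHz, expect width == 2 and rate == 16000
```

In real use, pass the bytes returned by `POST /synthesize` to `describe_wav` instead of the generated clip.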
Health Check¶
Endpoint: GET /health
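A caller can use this endpoint to wait for the service to finish loading before sending synthesis requests. The sketch below assumes only that a healthy service answers with HTTP 200; the response body is not documented here, so it is ignored. `is_healthy`, `wait_until_healthy`, and their parameters are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url, timeout=5.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(base_url, attempts=5, delay=5.0, probe=is_healthy):
    """Poll the health endpoint up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For a local deployment this would be called as `wait_until_healthy("http://localhost:8002")`. The `probe` parameter exists so the polling logic can be tested without a running server.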
Speakers & Voices¶
Available Speakers¶
Each language supports multiple speakers:
Hindi (hi):
- female (default)
- male
- child
Tamil (ta):
- female (default)
- male
Telugu (te):
- female (default)
- male
The other supported languages follow the same pattern.
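Because speaker names vary by language, a client may want to validate the pair before sending a request. The table below is a sketch built only from the three languages listed above; entries for other languages are not documented here and would need to be filled in from the service itself. `SPEAKERS` and `resolve_speaker` are hypothetical names.

```python
# Speakers per language, copied from the list above; the first entry is the default.
SPEAKERS = {
    "hi": ("female", "male", "child"),
    "ta": ("female", "male"),
    "te": ("female", "male"),
}


def resolve_speaker(language, speaker=None):
    """Return a valid speaker for `language`, using the default when none is given."""
    try:
        available = SPEAKERS[language]
    except KeyError:
        raise ValueError(f"No speaker list known for language {language!r}") from None
    if speaker is None:
        return available[0]
    if speaker not in available:
        raise ValueError(
            f"Speaker {speaker!r} is not available for {language!r}; "
            f"choose one of {', '.join(available)}"
        )
    return speaker
```

For example, `resolve_speaker("hi")` falls back to the default voice, while `resolve_speaker("ta", "child")` raises a `ValueError` because no child voice is listed for Tamil.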
Integration with VoiceERA¶
Configuration¶
In voice_2_voice_server/.env:
TTS_PROVIDER=ai4bharat
TTS_SERVICE_URL=http://ai4bharat_tts_server:8002
TTS_LANGUAGE=hi
TTS_SPEAKER=female
TTS_SPEED=1.0
Usage¶
import aiohttp


class AI4BharatTTS:
    """Async client for the AI4Bharat TTS service."""

    def __init__(self, service_url, language="hi", speaker="female"):
        self.service_url = service_url
        self.language = language
        self.speaker = speaker

    async def synthesize(self, text, speed=1.0):
        """Return the synthesized audio as WAV bytes."""
        payload = {
            "text": text,
            "language": self.language,
            "speaker": self.speaker,
            "speed": speed,
        }
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.service_url}/synthesize", json=payload
            ) as response:
                # Fail on HTTP errors instead of returning an error body as audio
                response.raise_for_status()
                return await response.read()
Voice Customization¶
Adjust Speed¶
# Slow down speech
await tts.synthesize("Hello", speed=0.8)
# Speed up speech
await tts.synthesize("Hello", speed=1.2)
# Range: 0.5 (very slow) to 2.0 (very fast)
Choose Speaker¶
# `session` is an open aiohttp.ClientSession; `url` points at the /synthesize endpoint

# Female voice
response = await session.post(
    url,
    json={
        "text": "नमस्ते",
        "speaker": "female"
    }
)

# Male voice
response = await session.post(
    url,
    json={
        "text": "नमस्ते",
        "speaker": "male"
    }
)
Performance¶
Benchmarks¶
| Language | Quality | Speed | GPU |
|---|---|---|---|
| Hindi | High | Real-time | Recommended |
| Tamil | High | Real-time | Recommended |
| Telugu | High | Real-time | Recommended |
| Kannada | High | Real-time | Recommended |
Optimization Tips¶
- Enable audio caching for repeated text
- Use GPU for production deployments
- Pre-warm models on service startup
- Batch synthesis requests when possible
- Use an appropriate sample rate (16 kHz recommended)
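No batch endpoint is documented here, so one way to approximate batching on the client side is to send several requests concurrently while capping how many are in flight. The sketch below assumes a `synthesize` coroutine like the one in the Usage section; `synthesize_many` and `max_concurrency` are hypothetical, and `fake_synthesize` stands in for the real network call.

```python
import asyncio


async def synthesize_many(synthesize, texts, max_concurrency=4):
    """Run `synthesize(text)` for every text, at most `max_concurrency` at a time.

    Results are returned in the same order as `texts`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(text):
        async with semaphore:
            return await synthesize(text)

    return await asyncio.gather(*(worker(text) for text in texts))


async def fake_synthesize(text):
    """Stand-in for a real TTS call; returns the UTF-8 bytes of the text."""
    await asyncio.sleep(0)
    return text.encode("utf-8")


results = asyncio.run(synthesize_many(fake_synthesize, ["नमस्ते", "வணக்கம்"]))
```

The cap matters because an unbounded burst of requests can exhaust GPU memory on the server; a small limit keeps throughput up without overloading it.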
Caching Strategy¶
Enable Caching¶
ENABLE_CACHING=true
CACHE_SIZE_MB=500
CACHE_TTL_HOURS=24
How Caching Works¶
Request: "नमस्ते" in Hindi
  │
  ├─► Check cache
  │     ├─► Cache HIT: Return cached audio
  │     └─► Cache MISS: Generate and cache
  │
  └─► Return audio to client
Cache Key¶
cache_key = f"{text}:{language}:{speaker}:{speed}"
# Example: "नमस्ते:hi:female:1.0"
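The lookup flow and key format above can be sketched as a small in-memory cache. This is an illustration, not the service's implementation: `AudioCache` is a hypothetical class, it bounds the cache by entry count rather than megabytes for brevity, and it swaps the plain string key for a SHA-256 hash of the JSON-encoded fields so that long texts, or texts containing `:`, still map to compact, unambiguous keys.

```python
import hashlib
import json
import time


def make_cache_key(text, language, speaker, speed):
    """Hash the four request fields into a fixed-length key."""
    raw = json.dumps([text, language, speaker, speed], ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class AudioCache:
    """In-memory cache with a time-to-live per entry (hypothetical sketch)."""

    def __init__(self, ttl_seconds=24 * 3600, max_entries=1000, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock
        self._entries = {}  # key -> (expires_at, audio_bytes)

    def get_or_generate(self, text, language, speaker, speed, generate):
        """Return cached audio on a hit; otherwise call `generate()` and store the result."""
        key = make_cache_key(text, language, speaker, speed)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache HIT
        audio = generate()  # cache MISS (absent or expired)
        self._entries.pop(key, None)  # drop any expired copy of this key
        if len(self._entries) >= self.max_entries:
            # Evict the oldest inserted entry (dicts preserve insertion order)
            self._entries.pop(next(iter(self._entries)))
        self._entries[key] = (now + self.ttl_seconds, audio)
        return audio
```

A caller would wrap the model invocation in `generate`, for example `cache.get_or_generate(text, "hi", "female", 1.0, lambda: run_model(text))`, where `run_model` is a placeholder for the actual synthesis call.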
Troubleshooting¶
Service won't start¶
# Check Python version
python --version # Should be 3.10+
# Check dependencies
pip list | grep torch
# Download models
python download_models.py
No audio output¶
- Verify that the text is written in the requested language
- Check that the speaker name is valid for the language
- Ensure the model has been downloaded completely
- Check the service logs for errors
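The first check in the list above can be partly automated by comparing the script of the input text with the requested language. The sketch below relies on Unicode character names from the standard `unicodedata` module; the `EXPECTED_SCRIPT` mapping is an assumption covering only the languages named in this document, and a script match does not prove the language is right (Hindi and Marathi both use Devanagari).

```python
import unicodedata

# Unicode script-name prefix expected for each language code (assumed mapping).
EXPECTED_SCRIPT = {
    "hi": "DEVANAGARI",
    "mr": "DEVANAGARI",
    "ta": "TAMIL",
    "te": "TELUGU",
    "kn": "KANNADA",
    "ml": "MALAYALAM",
    "bn": "BENGALI",
    "pa": "GURMUKHI",
    "gu": "GUJARATI",
}


def script_matches(text, language):
    """Return True if every letter in `text` belongs to the script expected for `language`."""
    expected = EXPECTED_SCRIPT[language]
    # Only letters are checked; digits, punctuation, and combining marks are skipped
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    return all(unicodedata.name(ch, "").startswith(expected) for ch in letters)
```

This check is strict: text that mixes in English words fails it, so treat a `False` result as a hint to inspect the input rather than as a hard error.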
Audio quality issues¶
- Verify that the language parameter matches the language of the text
- Try a different speaker
- Adjust the speed parameter
- Check the input text for special characters
Out of memory¶
- Reduce cache size
- Disable caching if not needed
- Switch to CPU (DEVICE=cpu) if GPU memory is the bottleneck
- Enable model quantization
Slow synthesis¶
- Enable GPU if available
- Check system resources (CPU, memory)
- Enable caching for repeated text
- Reduce batch size
Advanced Configuration¶
Docker Compose Integration¶
services:
  ai4bharat_tts_server:
    build: ./ai4bharat_tts_server
    container_name: voicera_tts
    restart: unless-stopped
    ports:
      - "8002:8002"
    environment:
      HOST: 0.0.0.0
      PORT: 8002
      MODEL_PATH: /models
      DEVICE: cuda
    volumes:
      - ./models:/models
    # GPU support (requires the NVIDIA Container Toolkit on the host)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8002/health"]
      interval: 5s
      timeout: 5s
      retries: 5
Next Steps¶
- STT Service - Speech-to-Text documentation
- Configuration - Full configuration guide
- Quick Start - Get VoiceERA running