AI4Bharat STT Service¶

Documentation for the AI4Bharat Speech-to-Text (STT) service integration.

Overview¶

This optional service provides high-accuracy speech-to-text for Indic languages using AI4Bharat's IndicConformer model.

Supported Languages:
- Hindi (hi)
- Tamil (ta)
- Telugu (te)
- Kannada (kn)
- Malayalam (ml)
- Bengali (bn)
- Punjabi (pa)
- Marathi (mr)
- Gujarati (gu)
- And more...

Advantages:
- Free and open-source
- Optimized for Indic languages
- Self-hosted (no external API calls needed)
- Low latency
- Can run on GPU for better performance

Quick Start¶

Installation¶

cd ai4bharat_stt_server

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Download models
python download_models.py

Running the Service¶

# Development
python server.py

# Via Docker
docker build -t ai4bharat-stt .
docker run -p 8001:8001 ai4bharat-stt

# With GPU support
docker run --gpus all -p 8001:8001 ai4bharat-stt

Configuration¶

Environment Variables¶

# Server
HOST=0.0.0.0
PORT=8001
WORKERS=4

# Model
MODEL_NAME=indic-conformer-hi
MODEL_PATH=/models
DEVICE=cuda                    # cuda or cpu
BATCH_SIZE=32

# Audio
SAMPLE_RATE=16000
AUDIO_FORMAT=wav

# Logging
LOG_LEVEL=INFO
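How server.py actually reads these variables is not shown here. As a rough sketch, the settings could be loaded and validated as follows, assuming the values listed above double as defaults; `Settings` and `load_settings` are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables listed above (hypothetical helper)."""
    host: str
    port: int
    workers: int
    model_name: str
    model_path: str
    device: str
    batch_size: int
    sample_rate: int
    audio_format: str
    log_level: str


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default).

    Assumption: the documented example values are used as fallbacks.
    """
    env = os.environ if env is None else env
    device = env.get("DEVICE", "cuda")
    if device not in ("cuda", "cpu"):
        raise ValueError(f"DEVICE must be 'cuda' or 'cpu', got {device!r}")
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8001")),
        workers=int(env.get("WORKERS", "4")),
        model_name=env.get("MODEL_NAME", "indic-conformer-hi"),
        model_path=env.get("MODEL_PATH", "/models"),
        device=device,
        batch_size=int(env.get("BATCH_SIZE", "32")),
        sample_rate=int(env.get("SAMPLE_RATE", "16000")),
        audio_format=env.get("AUDIO_FORMAT", "wav"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of os.environ, for example `load_settings({"DEVICE": "cpu"})`, makes it easy to try a CPU-only configuration without touching the shell environment.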

API Endpoints¶

Transcribe (Streaming)¶

Endpoint: POST /transcribe

Request:

{
  "audio": "base64-encoded-audio",
  "language": "hi",
  "format": "wav"
}

Response:

{
  "transcript": "नमस्ते, यह एक परीक्षण संदेश है।",
  "confidence": 0.98,
  "language": "hi"
}
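The request and response shapes above can be built and checked with the standard library alone. The following is a minimal sketch; `build_transcribe_payload` and `parse_transcribe_response` are hypothetical helper names, and the sketch assumes all three response fields are always present.

```python
import base64
import json


def build_transcribe_payload(audio_bytes, language="hi", audio_format="wav"):
    """Return the JSON-serializable body for POST /transcribe."""
    return {
        # Raw audio bytes are sent as base64 text.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        "format": audio_format,
    }


def parse_transcribe_response(body):
    """Parse a response body and return (transcript, confidence, language)."""
    data = json.loads(body)
    # Assumption: a successful response carries all three documented fields.
    missing = {"transcript", "confidence", "language"} - set(data)
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return data["transcript"], float(data["confidence"]), data["language"]
```

The payload can be passed to any HTTP client as the JSON body; decoding the "audio" field with base64 yields the original bytes again.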

Health Check¶

Endpoint: GET /health
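The response body of the health endpoint is not documented here, so a client can only rely on the HTTP status. A minimal readiness probe, assuming HTTP 200 means the service is up, might look like this (`is_healthy` is a hypothetical helper name):

```python
import http.client
import urllib.request


def is_healthy(base_url, timeout=2.0):
    """Return True if GET {base_url}/health answers with HTTP 200.

    Assumption: a 200 status means the service is ready; the body is ignored.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and non-2xx HTTP errors.
        return False
```

For example, `is_healthy("http://localhost:8001")` can be polled in a loop before sending the first transcription request.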

Integration with VoiceERA¶

Configuration¶

In voice_2_voice_server/.env:

STT_PROVIDER=ai4bharat
STT_SERVICE_URL=http://ai4bharat_stt_server:8001
STT_LANGUAGE=hi

Usage¶

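import base64

import aiohttp


class AI4BharatSTT:
    """Async client for the AI4Bharat STT service."""

    def __init__(self, service_url, language="hi"):
        self.service_url = service_url.rstrip("/")
        self.language = language

    async def transcribe(self, audio_data):
        """Send raw audio bytes to /transcribe and return the transcript text."""
        payload = {
            "audio": base64.b64encode(audio_data).decode(),
            "language": self.language,
        }
        async with aiohttp.ClientSession() as session:
            # The context manager releases the connection once the body is read.
            async with session.post(
                f"{self.service_url}/transcribe", json=payload
            ) as response:
                # Surface HTTP errors here instead of a KeyError below.
                response.raise_for_status()
                result = await response.json()
                return result["transcript"]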

Performance¶

Benchmarks¶

Language	Accuracy	Speed	GPU
Hindi	95%+	Real-time	Recommended
Tamil	92%+	Real-time	Recommended
Telugu	93%+	Real-time	Recommended
Kannada	91%+	Real-time	Recommended

Optimization Tips¶

- Use GPU for production deployments
- Batch audio chunks for better throughput
- Pre-download models to avoid startup delays
- Use an appropriate sample rate (16 kHz recommended)
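The batching tip can be illustrated with a small grouping helper. This is only a sketch: how the service feeds batches to the model is not described here, and `batch_chunks` is a hypothetical name. The default of 32 mirrors the BATCH_SIZE value from the configuration section.

```python
def batch_chunks(chunks, batch_size=32):
    """Yield lists of at most batch_size audio chunks, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for chunk in chunks:
        batch.append(chunk)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly smaller, batch.
    if batch:
        yield batch
```

Because it is a generator, the helper also works on a stream of chunks whose total length is not known in advance.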

Troubleshooting¶

Service won't start¶

# Check Python version
python --version  # Should be 3.10+

# Check dependencies
pip list | grep torch

# Download models
python download_models.py

Low accuracy¶

- Verify audio quality (16 kHz, mono)
- Check that the language configuration matches the audio
- Ensure the model is properly downloaded
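The audio check can be automated for WAV input with Python's standard wave module. The sketch below only inspects the header (sample rate and channel count); `check_wav` is a hypothetical helper name, and the expected rate defaults to the SAMPLE_RATE value from the configuration section.

```python
import io
import wave


def check_wav(wav_bytes, expected_rate=16000):
    """Return a list of problems found in a WAV payload (empty if none)."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"audio has {channels} channels, expected mono")
    return problems
```

Running this on the bytes before base64-encoding them catches mismatched recordings early; audio that fails the check would need to be resampled or downmixed with a separate tool.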

Out of memory¶

- Reduce BATCH_SIZE in the environment configuration
- Set DEVICE=cpu to run on CPU instead of GPU during development
- Enable model quantization for production

Next Steps¶

- TTS Service: Text-to-Speech documentation
- Configuration: Full configuration guide
- Quick Start: Get VoiceERA running