Data Flow Through the System¶
This document explains how data moves through VoiceERA during various operations.
Voice Call Data Flow¶
The most complex and important flow in VoiceERA.
Complete Voice Call Sequence¶
┌─────────────────────────────────────────────────────────────────────┐
│ Step 1: Call Initiation │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Caller (Phone) ──[Ring & Answer]──► Vobiz Platform │
│ │ │
│ ▼ │
│ Voice Server [WebSocket Ready] │
│ │ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Step 2: Audio Input (Caller Speaks) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Caller Audio Frames ──[Vobiz]──► Voice Server │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ [Buffer] [Resample] [Validate Audio] │
│ │ │ │ │
│ └───────────────┼───────────────┘ │
│ │ │
│ ▼ │
│ [STT Service] │
│ │ │
│ ▼ │
│ "Hello, I'd like..." │
│ (Transcript) │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Step 3: LLM Processing (Generate Response) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Transcript ──► [System Prompt + Context] ──► LLM API │
│ │ │
│ ▼ │
│ OpenAI, Claude, etc. │
│ │ │
│ ▼ │
│ "Sure, I'd be happy to help..."│
│ (LLM Response) │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Step 4: TTS Synthesis (Voice Generation) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Response Text ──► TTS API ──► Voice Generation │
│ │ │
│ ▼ │
│ [Audio Synthesis] │
│ │ │
│ ▼ │
│ Audio Frames │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Step 5: Audio Output (Caller Hears Response) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Audio Frames ──► [Vobiz] ──► Caller (Phone Speaker) │
│ │
│ [Loop continues] │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Step 6: Call Termination & Storage │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Caller hangs up ──► Voice Server │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ [Save Audio] [Save Transcript] [Log Metadata] │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ MinIO MongoDB MongoDB │
│ (.wav file) (transcript) (call_log) │
│ │
└─────────────────────────────────────────────────────────────────────┘
Detailed Data Structures¶
Audio Frame:
{
"type": "audio",
"session_id": "uuid",
"timestamp": 1234567890,
"sequence": 1,
"data": "<base64-encoded-audio>",
"format": "pcm_16k"
}
STT Request:
{
"audio": "<audio-data>",
"language": "en",
"model": "nova-2"
}
STT Response:
{
"transcript": "Hello, I'd like to know about your services",
"confidence": 0.98,
"alternatives": [
{ "transcript": "Hello, I'd like to know about the services" }
]
}
LLM Request:
{
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "You are a helpful customer service agent..."
},
{
"role": "user",
"content": "Hello, I'd like to know about your services"
}
],
"temperature": 0.7,
"max_tokens": 150
}
LLM Response:
{
"choices": [
{
"message": {
"role": "assistant",
"content": "Sure, I'd be happy to help! We offer..."
}
}
],
"usage": {
"prompt_tokens": 45,
"completion_tokens": 78,
"total_tokens": 123
}
}
Call Log (MongoDB):
{
"_id": ObjectId(...),
"campaign_id": "uuid",
"phone_number": "+1234567890",
"caller_id": "uuid",
"duration_seconds": 120,
"status": "completed",
"transcript": "...",
"emotions": ["satisfied", "engaged"],
"sentiment": "positive",
"created_at": ISODate("2024-01-29T10:30:00Z"),
"updated_at": ISODate("2024-01-29T10:32:00Z")
}
Recording (MinIO):
Path: recordings/{campaign_id}/{call_id}.wav
Size: ~960 KB (2-minute call at 16kHz)
Format: WAV (PCM, 16-bit, 16kHz mono)
Agent Creation Data Flow¶
┌─────────────────────────────┐
│ Frontend: Create Agent Form │
└─────────────────────────────┘
│
▼
┌──────────────┐
│ Validate Form│
└──────────────┘
│
▼
POST /agents (JSON)
Request Payload:
{
"name": "Sales Agent",
"llm_provider": "openai",
"llm_model": "gpt-4",
"stt_provider": "deepgram",
"tts_provider": "cartesia",
"system_prompt": "You are a sales agent...",
"language": "en"
}
Processing in Backend:
1. Validate request data
2. Check user permissions
3. Create Agent document in MongoDB
4. Generate config file
5. Save config to MinIO
6. Return Agent ID to frontend
Response:
{
"id": "agent-uuid",
"name": "Sales Agent",
"status": "active",
"created_at": "2024-01-29T10:30:00Z"
}
Stored in MongoDB (agents collection):
{
"_id": ObjectId(...),
"id": "agent-uuid",
"user_id": "user-uuid",
"name": "Sales Agent",
"llm_provider": "openai",
"llm_model": "gpt-4",
"stt_provider": "deepgram",
"tts_provider": "cartesia",
"system_prompt": "You are a sales agent...",
"language": "en",
"status": "active",
"created_at": ISODate(...),
"updated_at": ISODate(...)
}
Campaign Launch Data Flow¶
┌──────────────────────────┐
│ Backend: Create Campaign │
└──────────────────────────┘
│
▼
POST /campaigns (JSON)
{
"name": "New Year Sale",
"agent_id": "agent-uuid",
"phone_numbers": ["+1234567890", "+1234567891"],
"start_time": "2024-02-01T00:00:00Z",
"end_time": "2024-02-28T23:59:59Z"
}
┌──────────────────────────────┐
│ Backend: Validation & Setup │
└──────────────────────────────┘
│
┌───────┼────────┐
│ │ │
▼ ▼ ▼
1. Fetch 2. Check 3. Create
Agent Limits Campaign
Config Document
┌──────────────────────┐
│ Save to MongoDB │
│ campaigns collection │
└──────────────────────┘
│
▼
┌───────────────────┐
│ Schedule calls │
│ via Vobiz API │
└───────────────────┘
│
▼
Response: campaign_id
Analytics Data Flow¶
Real-time call data
│
├─► MongoDB (Raw logs)
│
├─► Aggregation Pipeline
│
└─► Cached metrics
(Redis optional)
Frontend Dashboard
│
▼
GET /analytics/calls?agent_id=...
GET /analytics/sentiment?campaign_id=...
GET /analytics/top-phrases?agent_id=...
│
▼
Backend aggregates
│
▼
Return JSON metrics
│
▼
Display in charts
Analytics Data in MongoDB:
{
"_id": ObjectId(...),
"type": "call_analytics",
"agent_id": "agent-uuid",
"campaign_id": "campaign-uuid",
"period": "2024-01-29",
"metrics": {
"total_calls": 150,
"completed_calls": 145,
"average_duration": 180,
"sentiment": {
"positive": 95,
"neutral": 40,
"negative": 10
},
"top_phrases": [
"pricing",
"features",
"support"
]
}
}
Recording Retrieval Data Flow¶
User Action: Download call recording
Frontend
│
├─► User clicks "Download"
│
├─► GET /call-logs/{call_id}
│
▼
Backend
│
├─► Fetch metadata from MongoDB
│
├─► Generate pre-signed URL for MinIO
│
▼
Response
│
├─► {
"call_id": "...",
"download_url": "https://minio:9001/...",
"transcript": "...",
"duration": 120
}
│
▼
Frontend
│
├─► User clicks download link
│
└─► Browser downloads audio from MinIO
Authentication Data Flow¶
┌──────────────────────────────┐
│ Frontend: Login Form │
│ Email: user@example.com │
│ Password: •••••••• │
└──────────────────────────────┘
│
▼
POST /auth/login
{
"email": "user@example.com",
"password": "••••••••"
}
┌──────────────────────────────┐
│ Backend: Authenticate │
└──────────────────────────────┘
│
┌─────┴─────┐
│ │
▼ ▼
1. Fetch 2. Hash &
User Compare
from password
MongoDB
│ │
└─────┬─────┘
│
▼
┌──────────────┐
│ Credentials │
│ Valid? │
└──────────────┘
│
├─── YES ──► Generate JWT
│ │
│ ▼
│ {
│ "user_id": "uuid",
│ "email": "...",
│ "iat": 1234567890,
│ "exp": 1234571490,
│ "permissions": ["read", "write"]
│ }
│ │
│ ▼
│ Response:
│ {
│ "token": "eyJ0eXAi...",
│ "expires_in": 3600
│ }
│
├─── NO ──► Response 401 Unauthorized
Caching & Performance¶
Backend Caching¶
Request 1: GET /agents/123
│
├─► Cache MISS
│ │
│ ▼
│ Query MongoDB
│ │
│ ▼
│ Store in cache (TTL: 10min)
│ │
└─────► Return to client
Request 2: GET /agents/123 (within 10min)
│
├─► Cache HIT
│ │
└─────► Return from cache (no DB query)
Database Query Optimization¶
Without index (slow):
SCAN all documents in call_logs collection
Filter by campaign_id
O(n) - millions of documents
With index (fast):
LOOKUP in index for campaign_id
O(log n) - direct access
Error Handling & Recovery¶
STT Failure Scenario¶
Voice Server receives audio
│
▼
Send to STT Service
│
X Error: Service unavailable
│
▼
Retry with backoff (1s, 2s, 4s)
│
├─► Success on retry 2
│ │
│ ▼
│ Continue call flow
│
└─► All retries failed
│
▼
Fallback: Ask user to repeat
│
▼
If persistent: End call gracefully
Summary¶
| Operation | Data Path | Storage | Latency |
|---|---|---|---|
| Voice Call | Audio frames → STT → LLM → TTS | MinIO + MongoDB | <500ms |
| Create Agent | Form → Backend → MongoDB | MongoDB | <200ms |
| Launch Campaign | Form → Backend → Vobiz | MongoDB | <500ms |
| Get Analytics | Query MongoDB → Aggregate | MongoDB | <1000ms |
| Download Recording | Pre-signed URL → MinIO | MinIO | Depends on file size |
Next Steps¶
- System Design Details - Component implementation details
- REST API - Endpoint specifications
- WebSocket API - Real-time communication protocol