
Data Flow Through the System

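This document explains how data moves through VoiceERA during its main operations: voice calls, agent creation, campaign launches, analytics queries, recording retrieval, and authentication.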

Voice Call Data Flow

The voice call is the most complex and most important flow in VoiceERA.

Complete Voice Call Sequence

┌─────────────────────────────────────────────────────────────────────┐
│ Step 1: Call Initiation                                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Caller (Phone) ──[Ring & Answer]──► Vobiz Platform                │
│                                           │                         │
│                                           ▼                         │
│                      Voice Server [WebSocket Ready]                │
│                                           │                         │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Step 2: Audio Input (Caller Speaks)                                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Caller Audio Frames ──[Vobiz]──► Voice Server                     │
│                                        │                            │
│                        ┌───────────────┼───────────────┐            │
│                        │               │               │            │
│                        ▼               ▼               ▼            │
│                    [Buffer]        [Resample]   [Validate Audio]   │
│                        │               │               │            │
│                        └───────────────┼───────────────┘            │
│                                        │                            │
│                                        ▼                            │
│                                   [STT Service]                    │
│                                        │                            │
│                                        ▼                            │
│                                   "Hello, I'd like..."              │
│                                   (Transcript)                      │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Step 3: LLM Processing (Generate Response)                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Transcript ──► [System Prompt + Context] ──► LLM API              │
│                                                   │                 │
│                                                   ▼                 │
│                                              OpenAI, Claude, etc.   │
│                                                   │                 │
│                                                   ▼                 │
│                                      "Sure, I'd be happy to help..."│
│                                      (LLM Response)                 │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Step 4: TTS Synthesis (Voice Generation)                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Response Text ──► TTS API ──► Voice Generation                    │
│                                   │                                │
│                                   ▼                                │
│                            [Audio Synthesis]                       │
│                                   │                                │
│                                   ▼                                │
│                              Audio Frames                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Step 5: Audio Output (Caller Hears Response)                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Audio Frames ──► [Vobiz] ──► Caller (Phone Speaker)              │
│                                                                      │
│                        [Loop continues]                            │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Step 6: Call Termination & Storage                                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Caller hangs up ──► Voice Server                                  │
│                          │                                          │
│          ┌───────────────┼───────────────┐                         │
│          │               │               │                         │
│          ▼               ▼               ▼                         │
│       [Save Audio]   [Save Transcript] [Log Metadata]              │
│          │               │               │                         │
│          ▼               ▼               ▼                         │
│        MinIO          MongoDB         MongoDB                      │
│      (.wav file)      (transcript)    (call_log)                   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
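Steps 2 through 5 repeat once per caller utterance. A minimal sketch of one turn, assuming the STT, LLM, and TTS services are wrapped as plain callables; `CallSession` and `handle_turn` are hypothetical names used for illustration and are not taken from the VoiceERA codebase:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class CallSession:
    """Hypothetical per-call state; the real Voice Server may track more."""
    system_prompt: str
    history: List[Message] = field(default_factory=list)


def handle_turn(
    session: CallSession,
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str, List[Message]], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one caller utterance through Steps 2-4 and return audio to play."""
    transcript = stt(audio)  # Step 2: speech to text
    session.history.append({"role": "user", "content": transcript})

    reply = llm(session.system_prompt, session.history)  # Step 3: generate reply
    session.history.append({"role": "assistant", "content": reply})

    return tts(reply)  # Step 4: synthesize; the caller streams it out (Step 5)
```

Passing the services in as callables keeps the turn logic testable without network access; the accumulated `history` is what would later be stored as the transcript in Step 6.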

Detailed Data Structures

Audio Frame:

{
  "type": "audio",
  "session_id": "uuid",
  "timestamp": 1234567890,
  "sequence": 1,
  "data": "<base64-encoded-audio>",
  "format": "pcm_16k"
}
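A sketch of how such a frame could be built and parsed. It assumes that `pcm_16k` means 16-bit PCM at 16 kHz (consistent with the recording format listed later) and that frames travel as JSON text; the helper names are hypothetical:

```python
import base64
import json
import time
from typing import Optional


def make_audio_frame(session_id: str, sequence: int, pcm: bytes,
                     timestamp: Optional[int] = None) -> str:
    """Serialize raw PCM bytes into the audio frame JSON shown above."""
    frame = {
        "type": "audio",
        "session_id": session_id,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "sequence": sequence,
        "data": base64.b64encode(pcm).decode("ascii"),
        "format": "pcm_16k",
    }
    return json.dumps(frame)


def parse_audio_frame(message: str) -> bytes:
    """Return the raw PCM payload, rejecting frames that cannot be audio."""
    frame = json.loads(message)
    if frame.get("type") != "audio":
        raise ValueError("not an audio frame")
    pcm = base64.b64decode(frame["data"], validate=True)
    if len(pcm) % 2 != 0:
        # Assumption: samples are 16-bit, so the payload has an even length.
        raise ValueError("16-bit PCM payload must have an even byte count")
    return pcm
```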

STT Request:

{
  "audio": "<audio-data>",
  "language": "en",
  "model": "nova-2"
}

STT Response:

{
  "transcript": "Hello, I'd like to know about your services",
  "confidence": 0.98,
  "alternatives": [
    { "transcript": "Hello, I'd like to know about the services" }
  ]
}

LLM Request:

{
  "model": "gpt-4",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful customer service agent..."
    },
    {
      "role": "user",
      "content": "Hello, I'd like to know about your services"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}

LLM Response:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Sure, I'd be happy to help! We offer..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 78,
    "total_tokens": 123
  }
}
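The request and response shapes above can be handled with two small helpers. This is a sketch only: the function names are hypothetical, the default values simply mirror the sample request, and no network call is made:

```python
from typing import Dict, List, Optional


def build_llm_request(system_prompt: str, transcript: str,
                      history: Optional[List[Dict[str, str]]] = None,
                      model: str = "gpt-4") -> dict:
    """Assemble a chat request: system prompt, prior turns, newest transcript."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": transcript})
    return {
        "model": model,
        "messages": messages,
        "temperature": 0.7,
        "max_tokens": 150,
    }


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of a response shaped like the sample above."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("LLM response contained no choices")
    return choices[0]["message"]["content"]
```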

Call Log (MongoDB):

{
  "_id": ObjectId(...),
  "campaign_id": "uuid",
  "phone_number": "+1234567890",
  "caller_id": "uuid",
  "duration_seconds": 120,
  "status": "completed",
  "transcript": "...",
  "emotions": ["satisfied", "engaged"],
  "sentiment": "positive",
  "created_at": ISODate("2024-01-29T10:30:00Z"),
  "updated_at": ISODate("2024-01-29T10:32:00Z")
}

Recording (MinIO):

Path: recordings/{campaign_id}/{call_id}.wav
Size: ~3.8 MB (2-minute call at 16 kHz, 16-bit mono)
Format: WAV (PCM, 16-bit, 16kHz mono)
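The size of an uncompressed recording follows directly from the format parameters. A small sketch, assuming a canonical 44-byte WAV header; both function names are hypothetical:

```python
def wav_size_bytes(duration_seconds: int, sample_rate: int = 16_000,
                   bits_per_sample: int = 16, channels: int = 1,
                   header_bytes: int = 44) -> int:
    """Estimate the size of an uncompressed PCM WAV file."""
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return header_bytes + duration_seconds * bytes_per_second


def recording_path(campaign_id: str, call_id: str) -> str:
    """Build the object path using the layout shown above."""
    return f"recordings/{campaign_id}/{call_id}.wav"
```

For a 2-minute call, `wav_size_bytes(120)` returns 3,840,044 bytes: 16,000 samples per second at 2 bytes each is 32,000 bytes per second, times 120 seconds, plus the header.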


Agent Creation Data Flow

┌─────────────────────────────┐
│ Frontend: Create Agent Form │
└─────────────────────────────┘
          │
          ▼
   ┌──────────────┐
   │ Validate Form│
   └──────────────┘
          │
          ▼
  POST /agents (JSON)

Request Payload:

{
  "name": "Sales Agent",
  "llm_provider": "openai",
  "llm_model": "gpt-4",
  "stt_provider": "deepgram",
  "tts_provider": "cartesia",
  "system_prompt": "You are a sales agent...",
  "language": "en"
}

Processing in Backend:

1. Validate request data
2. Check user permissions
3. Create Agent document in MongoDB
4. Generate config file
5. Save config to MinIO
6. Return Agent ID to frontend
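Steps 1, 3, and 6 of the list above can be sketched as follows. The permission check and config generation (steps 2, 4, and 5) are left out, and the database write is represented by an injected `insert_one` callable; all function names here are hypothetical:

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, List, Optional

REQUIRED_FIELDS = (
    "name", "llm_provider", "llm_model", "stt_provider",
    "tts_provider", "system_prompt", "language",
)


def validate_agent_payload(payload: dict) -> List[str]:
    """Step 1: return a list of problems; an empty list means valid."""
    errors = []
    for name in REQUIRED_FIELDS:
        value = payload.get(name)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{name}: required non-empty string")
    return errors


def create_agent(payload: dict, user_id: str,
                 insert_one: Callable[[dict], object],
                 now: Optional[datetime] = None) -> dict:
    """Steps 3 and 6: store the agent document and build the API response."""
    errors = validate_agent_payload(payload)
    if errors:
        raise ValueError("; ".join(errors))
    created = now or datetime.now(timezone.utc)
    agent_id = str(uuid.uuid4())
    document = {
        **{name: payload[name] for name in REQUIRED_FIELDS},
        "id": agent_id,
        "user_id": user_id,
        "status": "active",
        "created_at": created,
        "updated_at": created,
    }
    insert_one(document)  # e.g. the insert method of a MongoDB collection
    return {
        "id": agent_id,
        "name": document["name"],
        "status": "active",
        "created_at": created.isoformat().replace("+00:00", "Z"),
    }
```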

Response:

{
  "id": "agent-uuid",
  "name": "Sales Agent",
  "status": "active",
  "created_at": "2024-01-29T10:30:00Z"
}

Stored in MongoDB (agents collection):

{
  "_id": ObjectId(...),
  "id": "agent-uuid",
  "user_id": "user-uuid",
  "name": "Sales Agent",
  "llm_provider": "openai",
  "llm_model": "gpt-4",
  "stt_provider": "deepgram",
  "tts_provider": "cartesia",
  "system_prompt": "You are a sales agent...",
  "language": "en",
  "status": "active",
  "created_at": ISODate(...),
  "updated_at": ISODate(...)
}


Campaign Launch Data Flow

┌───────────────────────────┐
│ Frontend: Create Campaign │
└───────────────────────────┘
          │
          ▼
  POST /campaigns (JSON)

  {
    "name": "New Year Sale",
    "agent_id": "agent-uuid",
    "phone_numbers": ["+1234567890", "+1234567891"],
    "start_time": "2024-02-01T00:00:00Z",
    "end_time": "2024-02-28T23:59:59Z"
  }

┌──────────────────────────────┐
│ Backend: Validation & Setup  │
└──────────────────────────────┘
          │
  ┌───────┼────────┐
  │       │        │
  ▼       ▼        ▼
1. Fetch 2. Check 3. Create
   Agent  Limits   Campaign
   Config         Document

┌──────────────────────┐
│ Save to MongoDB      │
│ campaigns collection │
└──────────────────────┘
          │
          ▼
  ┌───────────────────┐
  │ Schedule calls    │
  │ via Vobiz API     │
  └───────────────────┘
          │
          ▼
  Response: campaign_id
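The validation stage of this flow might look like the sketch below. It assumes phone numbers must be in E.164 form and that "Check Limits" includes a cap on the number of recipients; the cap of 1000 and the function names are hypothetical:

```python
import re
from datetime import datetime
from typing import List

# E.164: a leading "+" followed by at most 15 digits, the first one non-zero.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def parse_utc(timestamp: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def validate_campaign(payload: dict, max_numbers: int = 1000) -> List[str]:
    """Return a list of problems; an empty list means the payload is valid."""
    errors = []
    numbers = payload.get("phone_numbers") or []
    if not numbers:
        errors.append("phone_numbers: at least one number is required")
    if len(numbers) > max_numbers:
        errors.append(f"phone_numbers: more than {max_numbers} numbers")
    invalid = [n for n in numbers if not E164.fullmatch(n)]
    if invalid:
        errors.append(f"phone_numbers: not in E.164 format: {invalid}")
    try:
        if parse_utc(payload["start_time"]) >= parse_utc(payload["end_time"]):
            errors.append("start_time must be before end_time")
    except (KeyError, ValueError):
        errors.append("start_time and end_time must be ISO 8601 timestamps")
    return errors
```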

Analytics Data Flow

Real-time call data
       │
       ├─► MongoDB (Raw logs)
       │
       ├─► Aggregation Pipeline
       │
       └─► Cached metrics
           (Redis optional)

Frontend Dashboard
       │
       ▼
GET /analytics/calls?agent_id=...
GET /analytics/sentiment?campaign_id=...
GET /analytics/top-phrases?agent_id=...
       │
       ▼
  Backend aggregates
       │
       ▼
  Return JSON metrics
       │
       ▼
  Display in charts

Analytics Data in MongoDB:

{
  "_id": ObjectId(...),
  "type": "call_analytics",
  "agent_id": "agent-uuid",
  "campaign_id": "campaign-uuid",
  "period": "2024-01-29",
  "metrics": {
    "total_calls": 150,
    "completed_calls": 145,
    "average_duration": 180,
    "sentiment": {
      "positive": 95,
      "neutral": 40,
      "negative": 10
    },
    "top_phrases": [
      "pricing",
      "features",
      "support"
    ]
  }
}
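A pure-Python sketch of how the `metrics` block could be derived from raw call logs. In the sample document the sentiment counts sum to `completed_calls` (95 + 40 + 10 = 145), so this sketch assumes that both sentiment and average duration are computed over completed calls only; `top_phrases` is omitted, and the function name is hypothetical:

```python
from collections import Counter
from typing import Iterable


def summarize_calls(call_logs: Iterable[dict]) -> dict:
    """Aggregate call_log documents into the metrics shape shown above."""
    logs = list(call_logs)
    completed = [c for c in logs if c.get("status") == "completed"]
    sentiment = Counter(c["sentiment"] for c in completed if c.get("sentiment"))
    if completed:
        total_seconds = sum(c["duration_seconds"] for c in completed)
        average = round(total_seconds / len(completed))
    else:
        average = 0
    return {
        "total_calls": len(logs),
        "completed_calls": len(completed),
        "average_duration": average,
        "sentiment": {
            label: sentiment.get(label, 0)
            for label in ("positive", "neutral", "negative")
        },
    }
```

In production the same grouping would more likely run inside MongoDB's aggregation pipeline, so that raw logs never leave the database.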


Recording Retrieval Data Flow

User Action: Download call recording

Frontend
    │
    ├─► User clicks "Download"
    │
    ├─► GET /call-logs/{call_id}
    │
    ▼
Backend
    │
    ├─► Fetch metadata from MongoDB
    │
    ├─► Generate pre-signed URL for MinIO
    │
    ▼
Response
    │
    ├─► {
         "call_id": "...",
         "download_url": "https://minio:9001/...",
         "transcript": "...",
         "duration": 120
       }
    │
    ▼
Frontend
    │
    ├─► User clicks download link
    │
    └─► Browser downloads audio from MinIO
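The backend half of this flow can be sketched as a function that turns call metadata into the response shown above. URL signing is delegated to an injected `presign` callable (with the MinIO Python client this role would typically be played by `presigned_get_object`); the `call_id` field on the log document, the 15-minute expiry, and the function name are assumptions:

```python
from datetime import timedelta
from typing import Callable


def build_download_response(
    call_log: dict,
    presign: Callable[[str, timedelta], str],
    expires: timedelta = timedelta(minutes=15),
) -> dict:
    """Combine call metadata with a short-lived download URL."""
    object_name = f"recordings/{call_log['campaign_id']}/{call_log['call_id']}.wav"
    return {
        "call_id": call_log["call_id"],
        "download_url": presign(object_name, expires),
        "transcript": call_log.get("transcript", ""),
        "duration": call_log["duration_seconds"],
    }
```

Because the URL is signed and expires, the browser can fetch the audio directly from MinIO without the backend proxying the file.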

Authentication Data Flow

┌──────────────────────────────┐
│ Frontend: Login Form         │
│ Email: user@example.com      │
│ Password: ••••••••           │
└──────────────────────────────┘
          │
          ▼
POST /auth/login
{
  "email": "user@example.com",
  "password": "••••••••"
}

┌──────────────────────────────┐
│ Backend: Authenticate        │
└──────────────────────────────┘
          │
    ┌─────┴─────┐
    │           │
    ▼           ▼
1. Fetch   2. Hash &
   User       Compare
   from       password
   MongoDB
    │           │
    └─────┬─────┘
          │
          ▼
    ┌──────────────┐
    │ Credentials  │
    │ Valid?       │
    └──────────────┘
          │
          ├─── YES ──► Generate JWT
          │                │
          │                ▼
          │           {
          │             "user_id": "uuid",
          │             "email": "...",
          │             "iat": 1234567890,
          │             "exp": 1234571490,
          │             "permissions": ["read", "write"]
          │           }
          │                │
          │                ▼
          │           Response:
          │           {
          │             "token": "eyJ0eXAi...",
          │             "expires_in": 3600
          │           }
          │
          └─── NO ──► Response 401 Unauthorized
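The "Hash & Compare" and "Generate JWT" steps can be illustrated with the standard library alone. This is a sketch under stated assumptions: passwords are stored as PBKDF2-SHA256 hashes with a per-user salt, and tokens are signed with HS256. The source does not say which algorithms VoiceERA uses, and a production service would normally rely on a maintained JWT library instead of hand-rolled signing:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def verify_password(password: str, salt: bytes, stored_hash: bytes,
                    iterations: int = 100_000) -> bool:
    """Hash the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, stored_hash)


def issue_token(user_id: str, email: str, secret: bytes,
                now: Optional[int] = None, ttl: int = 3600) -> dict:
    """Build the login response with a signed token carrying the claims above."""
    issued_at = int(time.time()) if now is None else now
    header = {"typ": "JWT", "alg": "HS256"}
    claims = {
        "user_id": user_id,
        "email": email,
        "iat": issued_at,
        "exp": issued_at + ttl,
        "permissions": ["read", "write"],
    }
    compact = (",", ":")  # no whitespace in the encoded JSON
    signing_input = ".".join(
        b64url(json.dumps(part, separators=compact).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret, signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return {"token": f"{signing_input}.{b64url(signature)}", "expires_in": ttl}
```

Note that `exp` is always `iat + ttl`, which matches the sample claims (1234571490 - 1234567890 = 3600) and the `expires_in` value in the response.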

Caching & Performance

Backend Caching

Request 1: GET /agents/123
  │
  ├─► Cache MISS
  │      │
  │      ▼
  │   Query MongoDB
  │      │
  │      ▼
  │   Store in cache (TTL: 10min)
  │      │
  └─────► Return to client

Request 2: GET /agents/123 (within 10min)
  │
  ├─► Cache HIT
  │      │
  └─────► Return from cache (no DB query)
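The hit/miss behavior above corresponds to a read-through cache with a time-to-live. A minimal in-process sketch; the class name is hypothetical, and a shared store such as Redis would replace the dictionary when several backend instances are running:

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Read-through cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache HIT: no database query
        value = loader(key)  # cache MISS: e.g. query MongoDB
        self._entries[key] = (now + self._ttl, value)
        return value
```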

Database Query Optimization

Without index (slow):

SCAN all documents in call_logs collection
Filter by campaign_id
O(n) - millions of documents

With index (fast):

LOOKUP in index for campaign_id
O(log n) - B-tree index lookup instead of a full scan
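The difference between the two access paths can be illustrated in plain Python. This is an analogy only: MongoDB indexes are B-trees, while the sketch uses a sorted list with binary search, which has the same logarithmic lookup cost. In MongoDB itself the index would be created with something like `db.call_logs.createIndex({ campaign_id: 1 })`. The names below are hypothetical:

```python
from bisect import bisect_left, bisect_right
from typing import List


def scan(call_logs: List[dict], campaign_id: str) -> List[dict]:
    """Without an index: inspect every document, O(n)."""
    return [doc for doc in call_logs if doc["campaign_id"] == campaign_id]


class CampaignIndex:
    """Sorted (campaign_id, position) pairs standing in for a database index."""

    def __init__(self, call_logs: List[dict]) -> None:
        self._docs = call_logs
        pairs = sorted((doc["campaign_id"], pos) for pos, doc in enumerate(call_logs))
        self._keys = [key for key, _ in pairs]
        self._positions = [pos for _, pos in pairs]

    def find(self, campaign_id: str) -> List[dict]:
        """With an index: binary search for the key range, O(log n + matches)."""
        lo = bisect_left(self._keys, campaign_id)
        hi = bisect_right(self._keys, campaign_id)
        return [self._docs[pos] for pos in self._positions[lo:hi]]
```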


Error Handling & Recovery

STT Failure Scenario

Voice Server receives audio
    │
    ▼
Send to STT Service
    │
    X Error: Service unavailable
    │
    ▼
Retry with backoff (1s, 2s, 4s)
    │
    ├─► Success on retry 2
    │      │
    │      ▼
    │   Continue call flow
    │
    └─► All retries failed
         │
         ▼
      Fallback: Ask user to repeat
      │
      ▼
      If persistent: End call gracefully
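The retry branch of this scenario can be sketched as one initial attempt followed by up to three retries, waiting 1 s, 2 s, and 4 s before each. The exception type and function name are hypothetical; returning `None` signals the caller to fall back to asking the user to repeat:

```python
import time
from typing import Callable, Optional, Sequence


class STTUnavailable(Exception):
    """Hypothetical error raised when the STT service cannot be reached."""


def transcribe_with_retry(
    stt: Callable[[bytes], str],
    audio: bytes,
    delays: Sequence[float] = (1, 2, 4),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try once, then retry after each delay; return None if every attempt fails."""
    try:
        return stt(audio)
    except STTUnavailable:
        pass
    for delay in delays:
        sleep(delay)  # exponential backoff between attempts
        try:
            return stt(audio)
        except STTUnavailable:
            continue
    return None  # all retries failed: caller asks the user to repeat
```

The `sleep` parameter is injectable so the backoff schedule can be tested without real waiting.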

Summary

Operation           Data Path                        Storage          Latency
Voice Call          Audio frames → STT → LLM → TTS   MinIO + MongoDB  <500ms
Create Agent        Form → Backend → MongoDB         MongoDB          <200ms
Launch Campaign     Form → Backend → Vobiz           MongoDB          <500ms
Get Analytics       Query MongoDB → Aggregate        MongoDB          <1000ms
Download Recording  Pre-signed URL → MinIO           MinIO            Depends on file size
