System Architecture Overview

This document provides a high-level overview of the VoiceERA system architecture using the C4 Model for system design.

System Context (Level 1)

A "10,000-foot view" of how VoiceERA interacts with users and external systems.

Overview

[Figure: Level 1 system context diagram]

Key Actors

  • End Users: People making/receiving voice calls
  • Platform Operators: Administrators managing agents and campaigns
  • AI Providers: External services for LLM, STT, TTS
  • Telephony Provider: Vobiz platform for call handling

Container Diagram (Level 2)

This level zooms in to show the major technology choices and inter-service communication.

Architecture Components

[Figure: Level 2 container diagram]

Service Responsibilities

Service        Technology                    Responsibility
───────        ──────────                    ──────────────
Frontend       Next.js, React, TailwindCSS   User interface, dashboards, real-time call monitoring
Backend        FastAPI, Python               API endpoints, data persistence, authentication, orchestration
Voice Server   Pipecat, Python               Real-time audio processing, agent orchestration, AI integration
MongoDB        NoSQL Database                Store users, agents, campaigns, call logs, transcripts
MinIO          Object Storage                Store audio files, recordings, transcripts
External AI    Various APIs                  LLM, STT, TTS, Translation services

Key Design Patterns

1. Microservices Architecture

  • Independent, deployable services
  • Clear separation of concerns
  • Horizontal scaling capability

2. API-First Design

  • RESTful APIs for data operations
  • WebSockets for real-time communication
  • Clear service boundaries

3. Stateless Processing

  • Frontend and Backend are stateless
  • Enables horizontal scaling
  • Voice Server maintains session state (WebSocket connections)

4. External Service Integration

  • Pluggable AI providers
  • Flexible STT/TTS/LLM configuration
  • Fallback and failover support
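The fallback behaviour listed above can be sketched as an ordered list of interchangeable providers that are tried in turn. This is only an illustration: the `Provider` signature, `call_with_fallback`, and the two example providers are hypothetical names, not part of VoiceERA or of any provider SDK.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider signature: text in, text out. Real STT/TTS providers
# would take or return audio, but the fallback logic has the same shape.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider raised an error."""


def call_with_fallback(
    providers: Sequence[Tuple[str, Provider]], prompt: str
) -> Tuple[str, str]:
    """Try providers in order; return (name, result) from the first that succeeds."""
    errors: List[str] = []
    for name, provider in providers:
        try:
            return name, provider(prompt)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_llm(prompt: str) -> str:
    """Example provider that always fails, standing in for an outage."""
    raise TimeoutError("primary provider timed out")


def backup_llm(prompt: str) -> str:
    """Example provider that always succeeds."""
    return f"echo: {prompt}"
```

Calling `call_with_fallback([("primary", flaky_llm), ("backup", backup_llm)], "hello")` returns the backup provider's name together with its result. Because providers share one signature, swapping or reordering them is a configuration change rather than a code change.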

5. Data Separation

  • Structured data in MongoDB
  • Unstructured content (audio/files) in MinIO
  • Clear data ownership per service
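One way to picture this split is a call record whose structured fields live in a document, while the audio bytes live in object storage and are referenced only by key. The sketch below uses in-memory dictionaries as stand-ins for MongoDB and MinIO; the field names and the `recordings/<call_id>.wav` key layout are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InMemoryStores:
    """Stand-ins for the two stores (not real MongoDB or MinIO clients)."""

    documents: Dict[str, dict] = field(default_factory=dict)  # role of MongoDB
    objects: Dict[str, bytes] = field(default_factory=dict)   # role of MinIO


def save_call(stores: InMemoryStores, call_id: str, transcript: str, audio: bytes) -> dict:
    """Store the audio as an object and keep only a reference in the document."""
    object_key = f"recordings/{call_id}.wav"  # hypothetical key layout
    stores.objects[object_key] = audio
    document = {
        "call_id": call_id,
        "transcript": transcript,
        "recording_key": object_key,      # pointer to the blob, not the blob itself
        "recording_bytes": len(audio),
    }
    stores.documents[call_id] = document
    return document


def load_recording(stores: InMemoryStores, call_id: str) -> bytes:
    """Resolve the document's reference back to the stored audio."""
    document = stores.documents[call_id]
    return stores.objects[document["recording_key"]]
```

Keeping only the key in the document keeps records small and queryable, while large binary content is served from storage built for it.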

Communication Patterns

Synchronous Communication

Frontend ──REST/HTTP──► Backend ──driver──► MongoDB
         ◄─────────────         ◄──────────

Real-Time Communication

Frontend ──WebSocket────► Voice Server ───► External AI
         ◄─────────────────             ◄──
         (Audio frames, metadata)

Event-Based Processing

Backend ───write────► MongoDB
           (User event, call data)
             │
             ├──────► MinIO (Audio file)
             │
             └──────► Voice Server (via API)

Data Flow at a Glance

Voice Call Flow

1. User calls phone number
         ▼
2. Vobiz routes call to Voice Server
         ▼
3. Voice Server authenticates with Backend
         ▼
4. Voice Server processes audio:
   - STT: Audio ──► Text
   - LLM: Text ──► Response
   - TTS: Response ──► Audio
         ▼
5. Voice Server streams audio back to caller
         ▼
6. Backend logs call metadata & stores recordings
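Step 4 of the flow above is a three-stage chain. The sketch below shows that chain for a single conversational turn with stub stages; it is not Pipecat code, and a real voice server would stream audio frames rather than process a whole utterance at once. The function names are hypothetical.

```python
from typing import Callable

# Stage signatures assumed for illustration only.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def run_turn(
    audio_in: bytes, stt: SpeechToText, llm: LanguageModel, tts: TextToSpeech
) -> bytes:
    """Run one caller utterance through STT, then LLM, then TTS."""
    text = stt(audio_in)   # Audio -> Text
    reply = llm(text)      # Text -> Response
    return tts(reply)      # Response -> Audio


# Stub stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

With the stubs, `run_turn(b"hello", fake_stt, fake_llm, fake_tts)` yields the bytes of "You said: hello". Passing the stages in as parameters mirrors the pluggable-provider pattern described earlier.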

Deployment Architecture

Docker Containers

Each service runs in its own container:

[Figure: Docker containers diagram]

Volume Mounts

Host Machine          Docker Container
─────────────         ────────────────
./voicera_backend  ──► /app
./voicera_frontend ──► /app
./data/mongodb     ──► /data/db
./data/minio       ──► /data

Security Layers

Authentication & Authorization

User Login
   │
   ▼
Backend (JWT Token generation)
   │
   ▼
Frontend (Store JWT)
   │
   ▼
Voice Server (Validate JWT)
   │
   ▼
Audio Processing
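The token hand-off above can be illustrated with a small issue/validate pair. The source does not state which signing algorithm or claims VoiceERA uses, so this sketch assumes an HS256 token with a shared secret and only `sub` and `exp` claims. It uses the standard library to stay self-contained; a production service would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> str:
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 900,
                now: Optional[float] = None) -> str:
    """Backend side: build header.payload.signature (assumed HS256)."""
    issued_at = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": user_id, "exp": issued_at + ttl_seconds}).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', secret)}"


def validate_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Voice Server side: check signature and expiry, then return the claims."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    # The algorithm is fixed to HS256 here; the header's "alg" is never trusted.
    expected = _sign(f"{header}.{payload}", secret)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

In this sketch the Backend and the Voice Server must share the secret. An asymmetric scheme, where the Voice Server holds only a public key, would avoid that, but the source does not say which approach is used.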

Data Protection

  • In Transit: TLS/HTTPS for all external API calls
  • At Rest: Database and storage encryption
  • Access Control: MongoDB authentication, MinIO IAM

Scalability Considerations

Horizontal Scaling

Stateless Services:

  • Frontend (multiple replicas behind load balancer)
  • Backend (multiple replicas with shared MongoDB)

Stateful Services:

  • Voice Server (sticky sessions or session store like Redis)
  • MongoDB (replica set or sharding)
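Sticky sessions for the Voice Server mean that every message for a given call reaches the same replica. One illustrative way to get that property is rendezvous (highest-random-weight) hashing, sketched below. This is an assumption for the sake of example, not a description of how VoiceERA routes calls; load balancers commonly offer cookie- or IP-based stickiness instead. The replica names are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_replica(session_id: str, replicas: Sequence[str]) -> str:
    """Pick the replica with the highest hash score for this session.

    The choice depends only on the session ID and the set of replicas,
    so any router instance makes the same decision without shared state.
    """
    if not replicas:
        raise ValueError("no replicas available")

    def score(replica: str) -> int:
        digest = hashlib.sha256(f"{replica}|{session_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(replicas, key=score)
```

For example, `pick_replica("call-42", ["voice-1", "voice-2", "voice-3"])` returns the same replica on every call. Removing a replica only reassigns the sessions that were mapped to it; sessions on the remaining replicas stay where they are.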

Vertical Scaling

  • Increase CPU/Memory for services
  • GPU acceleration for STT/TTS
  • Connection pooling for databases

Technology Stack Summary

Layer            Technology       Version
─────            ──────────       ───────
Frontend         Next.js          16+
                 React            18+
                 TailwindCSS      4+
Backend          FastAPI          0.100+
                 Python           3.10+
                 Uvicorn          Latest
Voice            Pipecat          Latest
                 Python           3.10+
Database         MongoDB          5.0+
Storage          MinIO            Latest
Infrastructure   Docker           20.10+
                 Docker Compose   1.29+
                 Nginx            Latest

Next Steps