Product · 2024

Real-Time Voice AI Backend (Custom LLM over WebSocket)

Overview

A FastAPI backend that serves a custom LLM to a real-time conversational-voice platform (Retell) over a WebSocket, with optional inbound and outbound phone calls via Twilio. It streams OpenAI completions back to the voice agent as they are generated, which keeps latency low, and it supports LLM function calling.

Why It Exists

To explore building voice AI agents on our own LLM logic rather than a black box, by plugging a custom model endpoint into a managed speech and telephony layer. The project validates that architecture for latency-sensitive, tool-using voice assistants.

What We Built

A WebSocket server (server.py/server2.py) exposing an /llm-websocket endpoint that Retell connects to; LLM orchestration modules (llm.py, llm_in_connection.py, llm_with_func_calling.py) covering plain streaming and function-calling variants; and a Twilio integration (twilio_server.py, call.py) for placing and receiving phone calls. Local exposure is handled through ngrok, per the README. The pinned stack includes fastapi, uvicorn, openai, retell-sdk, and twilio.
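
The streaming path described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses only the standard library, fake_token_stream is a hypothetical stand-in for a streaming OpenAI call, send_text stands in for a WebSocket send method, and the frame fields (response_id, content, content_complete) are assumptions about the message shape rather than a confirmed copy of Retell's contract.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable


async def fake_token_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM call; yields text deltas."""
    for word in f"You said: {prompt}".split(" "):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield word + " "


async def relay_response(
    response_id: int,
    prompt: str,
    send_text: Callable[[str], Awaitable[None]],
) -> None:
    """Forward each delta as its own JSON frame, then send a closing frame."""
    async for delta in fake_token_stream(prompt):
        # Assumed frame shape: one partial-content frame per delta.
        await send_text(json.dumps({
            "response_id": response_id,
            "content": delta,
            "content_complete": False,
        }))
    # An empty final frame marks the end of this response.
    await send_text(json.dumps({
        "response_id": response_id,
        "content": "",
        "content_complete": True,
    }))


async def collect_frames(response_id: int, prompt: str) -> list[dict]:
    """Run the relay against an in-memory sink instead of a real socket."""
    sent: list[str] = []

    async def send_text(message: str) -> None:
        sent.append(message)

    await relay_response(response_id, prompt, send_text)
    return [json.loads(message) for message in sent]


if __name__ == "__main__":
    for frame in asyncio.run(collect_frames(1, "hello there")):
        print(frame)
```

Sending each delta as soon as it arrives, instead of waiting for the full completion, is what lets the voice layer start speaking early. In a FastAPI handler, send_text would plausibly be the connection's own send method.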

Technologies & Approach

FastAPI + Uvicorn for an async WebSocket server, the OpenAI SDK for streaming completions, the Retell SDK for the voice agent contract, and Twilio for telephony. The streaming and function-calling code paths are kept separate so the two approaches can be compared.
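
One detail of the function-calling path is worth illustrating: when a completion is streamed, a tool call's JSON arguments arrive as string fragments that must be joined before parsing. The sketch below assumes simplified plain-dict deltas rather than real SDK objects, and the end_call tool and TOOLS registry are hypothetical names chosen for illustration, not taken from the project.

```python
import json
from typing import Any, Callable, Iterable


def end_call(message: str) -> dict[str, Any]:
    """Hypothetical tool: say a closing line, then hang up."""
    return {"say": message, "end_call": True}


# Hypothetical registry mapping tool names to local Python callables.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {"end_call": end_call}


def assemble_tool_call(
    deltas: Iterable[dict[str, str]],
) -> tuple[str, dict[str, Any]]:
    """Join streamed fragments into one tool name and one parsed argument dict."""
    name = ""
    arg_parts: list[str] = []
    for delta in deltas:
        # The name typically appears once; keep the first non-empty value.
        name = name or delta.get("name", "")
        arg_parts.append(delta.get("arguments", ""))
    # Only the concatenation of all fragments is valid JSON.
    return name, json.loads("".join(arg_parts))


def dispatch(deltas: Iterable[dict[str, str]]) -> dict[str, Any]:
    """Assemble the call, look the tool up, and invoke it with its arguments."""
    name, args = assemble_tool_call(deltas)
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)


if __name__ == "__main__":
    sample = [
        {"name": "end_call", "arguments": ""},
        {"arguments": '{"mess'},
        {"arguments": 'age": "Goodbye"}'},
    ]
    print(dispatch(sample))
```

Keeping this logic apart from the plain streaming path mirrors the separation described above: the plain path can forward text immediately, while this path has to buffer until the arguments are complete.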

Outcome / Impact

Proved that a custom, function-calling LLM backend can be driven end-to-end by a managed voice platform and a telephony provider. The README is candid that using the OpenAI endpoint (rather than Azure OpenAI) introduces variable latency, which is a useful, honest finding for anyone productionizing voice AI. Archived as R&D.

Capabilities Demonstrated

  • Real-time streaming voice AI over WebSockets
  • Custom LLM integration with managed conversational-voice platforms
  • LLM function/tool calling in a live agent loop
  • Telephony integration (Twilio) for inbound/outbound calls