What is a voice agent?
A voice agent is an AI system that processes spoken input, reasons over it with a large language model (LLM), and responds with synthesized speech, all in real time. Common uses include customer service bots, voice assistants, and accessibility tools.
What stack do real-time voice agents use?
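A typical stack combines Deepgram or Whisper for speech-to-text (STT), GPT-4o or Claude for reasoning, ElevenLabs or Play.ai for text-to-speech (TTS), and LiveKit or Daily for WebRTC transport. Frameworks such as LiveKit Agents and Pipecat orchestrate the full pipeline, so you don't have to wire each component together yourself.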
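To make the division of labor concrete, here is a framework-agnostic sketch of the STT, LLM, and TTS stages as swappable Python components. The `SpeechToText`, `LanguageModel`, and `TextToSpeech` protocols, the `VoiceAgent` class, and the fake implementations are hypothetical names for illustration; they are not the APIs of Deepgram, OpenAI, ElevenLabs, LiveKit Agents, or Pipecat. The agent also hands text to TTS one sentence at a time instead of waiting for the whole reply, an assumed design choice that lets speech begin before the LLM has finished.

```python
from typing import Iterator, Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, prompt: str) -> Iterator[str]: ...  # yields text tokens as they arrive


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class VoiceAgent:
    """Hypothetical orchestrator running one user turn through STT, LLM, and TTS."""

    def __init__(self, stt: SpeechToText, llm: LanguageModel, tts: TextToSpeech) -> None:
        self.stt = stt
        self.llm = llm
        self.tts = tts

    def handle_turn(self, audio: bytes) -> Iterator[bytes]:
        transcript = self.stt.transcribe(audio)
        buffer = ""
        for token in self.llm.reply(transcript):
            buffer += token
            # Send each completed sentence to TTS right away
            if buffer.rstrip().endswith((".", "?", "!")):
                yield self.tts.synthesize(buffer.strip())
                buffer = ""
        # Flush any trailing text that did not end with punctuation
        if buffer.strip():
            yield self.tts.synthesize(buffer.strip())


# Fake components standing in for real STT/LLM/TTS providers
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is already text


class EchoLLM:
    def reply(self, prompt: str) -> Iterator[str]:
        yield from ["You ", "said: ", prompt, ". ", "Anything ", "else?"]


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")  # placeholder for synthesized audio bytes


agent = VoiceAgent(FakeSTT(), EchoLLM(), FakeTTS())
for chunk in agent.handle_turn(b"hello"):
    print(chunk)
# b'YOU SAID: HELLO.'
# b'ANYTHING ELSE?'
```

Swapping Whisper for Deepgram, for example, would only mean writing another class with a `transcribe` method; the orchestrator itself stays the same.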
What latency is needed for a good voice agent experience?
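End-to-end latency under 800 ms feels natural. Above 1.5 s, the gap feels like a laggy phone call. Getting below 500 ms requires streaming at every stage (STT, LLM output, and TTS) and piping the stages together, so each one starts working on the partial output of the stage before it instead of waiting for that stage to finish.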
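Why streaming matters can be seen with a back-of-the-envelope model. It is idealized: it measures time to first audio from the moment the user stops speaking, assumes a streamed stage can start on the first chunk from the stage before it, and ignores network transport and turn-detection delays. The `Stage` class, the `time_to_first_audio` function, and all timing numbers are hypothetical illustrations, not benchmarks of any provider.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    first_output_ms: float  # delay until the stage emits its first partial output
    full_output_ms: float   # delay until the stage has produced its complete output


def time_to_first_audio(stages: list[Stage], streaming: bool) -> float:
    """Idealized model: batch stages wait for full upstream output; streamed stages start on the first chunk."""
    if streaming:
        return sum(s.first_output_ms for s in stages)
    return sum(s.full_output_ms for s in stages)


# Hypothetical, illustrative numbers only (not measurements of any real service)
pipeline = [
    Stage("stt", first_output_ms=100, full_output_ms=300),
    Stage("llm", first_output_ms=250, full_output_ms=1500),
    Stage("tts", first_output_ms=120, full_output_ms=900),
]

print(time_to_first_audio(pipeline, streaming=False))  # 2700
print(time_to_first_audio(pipeline, streaming=True))   # 470
```

In this model a batch pipeline pays every stage's full duration, while a streamed one pays only each stage's first-output delay.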
How do voice agents handle interruptions?
Good voice agents use Voice Activity Detection (VAD) to notice when the user starts speaking and immediately stop the agent's own speech output, a behavior known as barge-in. LiveKit Agents and Pipecat support this natively.
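The barge-in logic can be sketched with a toy energy-threshold VAD: a microphone frame counts as loud when its root-mean-square (RMS) energy crosses a threshold, and a few loud frames in a row stop the agent's playback. This is a deliberately simple stand-in; production systems usually rely on a trained VAD model instead (LiveKit Agents and Pipecat, for instance, both offer Silero VAD integrations). The `EnergyVAD` and `AgentPlayback` classes, the threshold, the frame size, and the 16 kHz sample rate are all assumptions for illustration.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples assumed in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class EnergyVAD:
    """Toy VAD: reports speech start once `min_speech_frames` loud frames occur in a row."""

    def __init__(self, threshold: float = 0.02, min_speech_frames: int = 3) -> None:
        self.threshold = threshold
        self.min_speech_frames = min_speech_frames
        self._loud_run = 0

    def is_speech_start(self, frame: Sequence[float]) -> bool:
        if rms(frame) >= self.threshold:
            self._loud_run += 1
        else:
            self._loud_run = 0
        # True only on the frame where the loud run first reaches the required length
        return self._loud_run == self.min_speech_frames


class AgentPlayback:
    """Hypothetical playback controller that supports barge-in."""

    def __init__(self, vad: EnergyVAD) -> None:
        self.vad = vad
        self.speaking = False
        self.interrupted = False

    def start_speaking(self) -> None:
        self.speaking = True
        self.interrupted = False

    def on_mic_frame(self, frame: Sequence[float]) -> None:
        # Feed every frame to the VAD; if the user starts talking mid-reply, cut the agent off
        if self.vad.is_speech_start(frame) and self.speaking:
            self.speaking = False
            self.interrupted = True


playback = AgentPlayback(EnergyVAD(threshold=0.02, min_speech_frames=3))
playback.start_speaking()
silence = [0.001] * 160   # 10 ms of near-silence at an assumed 16 kHz sample rate
voice = [0.1, -0.1] * 80  # 10 ms of a loud synthetic signal
for frame in [silence, voice, voice, voice]:
    playback.on_mic_frame(frame)
print(playback.speaking, playback.interrupted)  # False True
```

Requiring several consecutive loud frames (`min_speech_frames`) is a cheap guard against a single cough or click cutting the agent off.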