Real-time conversational AI over voice

Voice agents combine speech-to-text, LLM reasoning, and text-to-speech into sub-second pipelines for phone bots, voice assistants, and real-time communication apps.

Frequently asked questions about Voice Agents

What is a voice agent?

A voice agent is an AI system that processes spoken input, reasons with an LLM, and responds with synthesized speech, all in real time. Typical uses include customer service bots, voice assistants, and accessibility tools.

What stack do real-time voice agents use?

A typical stack uses Deepgram or Whisper for speech-to-text (STT), GPT-4o or Claude for reasoning, ElevenLabs or Play.ai for text-to-speech (TTS), and LiveKit or Daily for WebRTC transport. Frameworks such as LiveKit Agents and Pipecat abstract the full stack.
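
As a rough illustration of how these stages fit together, here is a minimal sketch of one conversational turn. Every class and method name (`SpeechToText`, `Reasoner`, `TextToSpeech`, `VoicePipeline`, `handle_turn`, and the stub stages) is hypothetical and invented for this example. None of it is the API of Deepgram, Whisper, ElevenLabs, LiveKit Agents, or Pipecat, and the WebRTC transport layer is left out.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Reasoner(Protocol):
    def reply(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Stub stages: stand-ins for real vendor clients, not real APIs.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the audio bytes are already text


class CannedLLM:
    def reply(self, transcript: str) -> str:
        return f"You said: {transcript}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # pretend the encoded text is audio


@dataclass
class VoicePipeline:
    stt: SpeechToText
    llm: Reasoner
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one user turn through STT, then the LLM, then TTS."""
        transcript = self.stt.transcribe(audio_in)
        answer = self.llm.reply(transcript)
        return self.tts.synthesize(answer)


pipeline = VoicePipeline(EchoSTT(), CannedLLM(), BytesTTS())
audio_out = pipeline.handle_turn(b"hello")
```

Because each stage sits behind a small interface, one provider can be swapped for another without touching the rest of the pipeline. This version is deliberately non-streaming: each stage waits for the previous one to finish.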

What latency is needed for a good voice agent experience?

End-to-end latency under 800 ms feels natural. Above 1.5 s, the pause feels like a laggy phone call. Getting below 500 ms requires streaming STT, streaming LLM output, and streaming TTS, piped together so that each stage starts working on partial output from the previous one.
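
The idea of piping streaming stages together can be sketched with plain Python generators. This is a toy model under stated assumptions, not a real integration: `llm_tokens`, `sentences`, and `tts_chunks` are hypothetical helpers, the token list is made up, and "audio" is just encoded text. The `events` log records the order in which the stages run.

```python
from typing import Iterable, Iterator

events: list[str] = []  # records the order in which stages do work


def llm_tokens() -> Iterator[str]:
    """Hypothetical LLM stream: yields a reply one token at a time."""
    for token in ["Sure", ".", " Your", " order", " shipped", "."]:
        events.append(f"llm:{token.strip()}")
        yield token


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group tokens into sentences and emit each one as soon as it ends."""
    buffer = ""
    for token in tokens:
        buffer += token
        if token.endswith((".", "?", "!")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush any trailing partial sentence


def tts_chunks(sentence_stream: Iterable[str]) -> Iterator[bytes]:
    """Hypothetical TTS stage: 'synthesizes' each sentence on arrival."""
    for sentence in sentence_stream:
        events.append(f"tts:{sentence}")
        yield sentence.encode("utf-8")


# Generators are lazy, so TTS handles the first sentence
# before the LLM has produced the rest of the reply.
audio = list(tts_chunks(sentences(llm_tokens())))
```

In the recorded `events`, the entry `tts:Sure.` appears before `llm:shipped`, which is the point of streaming: time to first audio depends on the first sentence, not on the full reply.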

How do voice agents handle interruptions?

Good voice agents use Voice Activity Detection (VAD) to detect when the user starts speaking and immediately stop the agent's speech output, a behavior known as barge-in. LiveKit Agents and Pipecat handle this natively.
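
To make the barge-in logic concrete, here is a minimal sketch that uses a crude energy threshold as a stand-in for a real VAD model. The `BargeInController` class, its threshold, and its frame count are all hypothetical values chosen for illustration. This is not how LiveKit Agents or Pipecat implement interruption handling.

```python
import math
from dataclasses import dataclass, field


def rms(frame: list[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


@dataclass
class BargeInController:
    threshold: float = 500.0    # hypothetical energy threshold for "speech"
    min_speech_frames: int = 3  # consecutive loud frames needed to trigger
    agent_speaking: bool = False
    _loud_run: int = field(default=0, init=False)

    def start_agent_speech(self) -> None:
        self.agent_speaking = True
        self._loud_run = 0

    def on_mic_frame(self, frame: list[int]) -> bool:
        """Feed one microphone frame; return True if the agent was interrupted."""
        if rms(frame) >= self.threshold:
            self._loud_run += 1
        else:
            self._loud_run = 0  # a quiet frame resets the run (filters short noise)
        if self.agent_speaking and self._loud_run >= self.min_speech_frames:
            self.agent_speaking = False  # stand-in for cancelling TTS playback
            return True
        return False


quiet = [10, -12, 8, -9]
loud = [4000, -3800, 4100, -3900]

controller = BargeInController()
controller.start_agent_speech()
frames = [quiet, loud, loud, quiet, loud, loud, loud]
results = [controller.on_mic_frame(f) for f in frames]
```

Requiring several consecutive loud frames trades a little reaction time for fewer false interruptions from coughs or background noise. In this run only the final frame triggers the interruption, after three loud frames in a row.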
