Voice Agents

Real-time conversational AI over voice

Voice agents combine speech-to-text, LLM reasoning, and text-to-speech into sub-second pipelines for phone bots, voice assistants, and real-time communication apps.



Frequently asked questions about Voice Agents

What is a voice agent?
A voice agent is an AI system that processes spoken input, reasons with an LLM, and responds with synthesized speech, all in real time. Typical uses include customer service bots, voice assistants, and accessibility tools.
What stack do real-time voice agents use?
A typical stack uses Deepgram or Whisper for speech-to-text (STT), GPT-4o or Claude for reasoning, ElevenLabs or Play.ai for text-to-speech (TTS), and LiveKit or Daily for WebRTC transport. Frameworks such as LiveKit Agents and Pipecat abstract the full stack.
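As a rough illustration of how the three stages chain together, here is a minimal sketch using plain Python generators. The functions transcribe, reason, and synthesize are hypothetical stand-ins, not the API of any vendor named above, and the WebRTC transport layer is left out entirely.

```python
from typing import Iterable, Iterator


# Hypothetical stand-ins: a real agent would call an STT, LLM, and TTS
# service at these three points. None of this mirrors a vendor API.
def transcribe(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    """Fake streaming STT: treats each audio chunk as one UTF-8 encoded word."""
    for chunk in audio_chunks:
        yield chunk.decode("utf-8")


def reason(words: Iterable[str]) -> Iterator[str]:
    """Fake LLM: reads the whole utterance, then streams a reply token by token."""
    utterance = " ".join(words)
    for token in f"You said: {utterance}".split():
        yield token


def synthesize(tokens: Iterable[str]) -> Iterator[bytes]:
    """Fake streaming TTS: emits one 'audio' chunk per reply token."""
    for token in tokens:
        yield token.encode("utf-8")


def run_pipeline(audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Chain STT -> LLM -> TTS so each stage consumes the previous one lazily."""
    return synthesize(reason(transcribe(audio_chunks)))


reply_audio = list(run_pipeline([b"hello", b"agent"]))
print(reply_audio)
```

Because every stage is a generator, the TTS stand-in can emit its first chunk as soon as the LLM stand-in yields its first token, which is the property that frameworks handling the full stack are built around.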
What latency is needed for a good voice agent experience?
End-to-end latency under 800 ms feels natural; above 1.5 s, the pause feels like the lag on a delayed phone call. Getting below 500 ms requires streaming STT, streaming LLM output, and streaming TTS, all piped together so that each stage starts working on the previous stage's first partial output.
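To see why streaming matters for the latency budget, the sketch below compares a batch pipeline (each stage waits for the previous one to finish) with a streaming pipeline (each stage starts on the first partial output). The Stage type and all millisecond figures are illustrative assumptions, not measurements of any real service, and the model ignores network and transport overhead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    first_output_ms: int  # time until the stage emits its first partial result
    full_output_ms: int   # time until the stage has finished completely


def batch_latency_ms(stages: List[Stage]) -> int:
    """Each stage waits for the previous stage's complete output."""
    return sum(s.full_output_ms for s in stages)


def streaming_latency_ms(stages: List[Stage]) -> int:
    """Each stage starts as soon as the previous stage emits something."""
    return sum(s.first_output_ms for s in stages)


# Made-up numbers for illustration only.
STAGES = [
    Stage("stt", first_output_ms=150, full_output_ms=400),
    Stage("llm", first_output_ms=200, full_output_ms=1200),
    Stage("tts", first_output_ms=100, full_output_ms=600),
]

print(batch_latency_ms(STAGES))      # 2200
print(streaming_latency_ms(STAGES))  # 450
```

Under these assumed figures, only the streaming variant lands under the 500 ms mark; the batch variant is well past the 1.5 s point where the delay becomes noticeable.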
How do voice agents handle interruptions?
Good voice agents use Voice Activity Detection (VAD) to detect when the user starts speaking and immediately stop the agent's speech output, a behavior known as barge-in. LiveKit Agents and Pipecat handle this natively.
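The barge-in logic can be sketched with a simple energy-threshold detector. This is a deliberately simplified stand-in for a real VAD, which would typically use a trained model, and it does not reflect how LiveKit Agents or Pipecat implement interruptions. The BargeInController class, the threshold of 500, and the two-frame debounce are all assumptions made for illustration.

```python
from typing import Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square energy of one frame of PCM samples."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


class BargeInController:
    """Hypothetical controller: stops agent speech when the user talks over it."""

    def __init__(self, threshold: float = 500.0, min_speech_frames: int = 2):
        self.threshold = threshold                  # assumed energy cutoff for "speech"
        self.min_speech_frames = min_speech_frames  # debounce against short noise bursts
        self.agent_speaking = False
        self._speech_run = 0

    def start_speaking(self) -> None:
        self.agent_speaking = True
        self._speech_run = 0

    def on_mic_frame(self, frame: Sequence[int]) -> bool:
        """Feed one microphone frame; returns True if it triggered an interruption."""
        if rms(frame) >= self.threshold:
            self._speech_run += 1
        else:
            self._speech_run = 0
        if self.agent_speaking and self._speech_run >= self.min_speech_frames:
            # A real agent would also cancel TTS playback and pending LLM output here.
            self.agent_speaking = False
            return True
        return False


controller = BargeInController()
controller.start_speaking()
silence = [0] * 160
speech = [1000] * 160
events = [controller.on_mic_frame(f) for f in (silence, speech, speech, speech)]
print(events)  # [False, False, True, False]
```

The interruption fires on the second consecutive loud frame rather than the first, trading a little reaction time for fewer false triggers from clicks or background noise.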
