Posts Tagged "real-time-systems"
Voice Agents Don't Need to Be Faster — They Need to Feel Faster
Two agents with identical latency can feel completely different. The gap is fixable at the orchestration layer, without touching the model.
Voice Agents Don't Know When You're Done Talking
Most builders assume end-of-turn detection is a silence threshold. That model breaks in production. The fix is architectural: four probabilistic events, speculative reasoning, and everything downstream cancellable.
The Boring Stuff That Keeps AI Running at 3am
Exponential backoff, dual timeouts, SSE heartbeats, idempotency caches — the unglamorous patterns that keep LLM-powered systems running at 3am.
Sub-10ms AI Responses Without Calling the LLM
Users ask similar questions in different words. Semantic caching with pgvector turns repeated intent into instant answers — no LLM call, no embedding, no retrieval pipeline.