Posts Tagged "real-time-systems"
Voice Agents Don't Know When You're Done Talking
Most builders assume end-of-turn detection is a silence threshold. That model breaks in production. The fix is architectural: four probabilistic events, speculative reasoning, and everything downstream cancellable.
The Boring Stuff That Keeps AI Running at 3am
Exponential backoff, dual timeouts, SSE heartbeats, idempotency caches — the unglamorous patterns that keep LLM-powered systems running at 3am.
Sub-10ms AI Responses Without Calling the LLM
Users ask similar questions in different words. Semantic caching with pgvector turns repeated intent into instant answers — no LLM call, no embedding, no retrieval pipeline.