Posts Tagged "production-ai"

Voice Agents Don't Know When You're Done Talking

Most builders assume end-of-turn detection is a silence threshold. That model breaks in production. The fix is architectural: four probabilistic events, speculative reasoning, and everything downstream cancellable.

25 Issues Overnight: Batch AI That Doesn't Need You

The leap from AI-assisted coding to autonomous batch processing: fresh context per task, filesystem locks, model routing, and orchestration that runs while you sleep.

The Boring Stuff That Keeps AI Running at 3am

Exponential backoff, dual timeouts, SSE heartbeats, idempotency caches: the unglamorous patterns that keep LLM-powered systems up when nobody is watching.

Sub-10ms AI Responses Without Calling the LLM

Users ask similar questions in different words. Semantic caching with pgvector turns repeated intent into instant answers — no LLM call, no embedding, no retrieval pipeline.

Your AI Forgot What You Said 30 Messages Ago

Context windows fill up fast in long AI conversations. Sliding windows, progressive compression, and token budgeting — the patterns I built before I knew their names.