Posts Tagged "production-ai"

An AI Wrote 200% Where It Meant 20%. The Bound Caught It.

Confidence is what the model thinks of itself. Bounds are what your system thinks of the model. They are independent signals; use both.

There Is No Workflow Engine. It's Just Git.

My agent pipeline runs for hours, survives Ctrl+C, and resumes from where it stopped. There's no Temporal, no queue, no database. The state lives in YAML files and commits.

I ran a maintenance agent for 10 days. The number that mattered was 14

A maintenance agent filed 559 bugs and fixed 412 on its own. The interesting number is the 14 it refused to touch.

Parallelism Was a Workaround. Agents Make It Optional.

Feature branches and merge queues were how we coped with slow humans. When an agent finishes a task in ten minutes, the overhead is the bottleneck.

Don't Read the PDF. Write the Parser.

I stopped feeding hospital PDFs to a vision model. When the layout changes, the AI fixes the parser instead — and production never sees a token.

Voice Agents Don't Know When You're Done Talking

Most builders assume end-of-turn detection is a silence threshold. That model breaks in production. The fix is architectural: four probabilistic events, speculative reasoning, and everything downstream cancellable.

The 8th Specialist: An AI That Breaks Things on Purpose

An 8th specialist that touches the browser — navigating, clicking, and typing to find bugs that scripted journeys miss. How exploratory testing with Playwright catches what verification alone cannot.

Specialist Agents: Looking at Every Page with Different Eyes

Seven specialists, each with their own checklist — QA, UX, UI, Security, Performance, Data Leakage, Language. How splitting evaluation into focused agents catches more.