The AI Development Loop — 5-Part Series
  1. AI-Assisted Development: A Loop, Not a Chat
  2. /plan-issue: Collaborative Planning with AI
  3. /work-issue: Autonomous Implementation
  4. /qa-run: AI-Driven QA That Closes the Loop
  5. Specialist Agents: Looking at Every Page with Different Eyes

AI-Assisted Development: A Loop, Not a Chat

I’ve been building a healthcare compliance platform (brot.health, Go + React) with Claude Code for the past few months. I’ve been working with LLMs as coding partners for over a year, but this is the first project where I’ve built the entire architecture around AI-assisted workflows from day one — treating the AI not as a sidekick but as a full-time team. Along the way I’ve hit the same wall that I think most people hit with AI-assisted development: the bottleneck isn’t the AI — it’s me.

I need to explain context, review output, catch hallucinations, verify behavior, and file new work. The AI writes code fast, but only if I do the work of feeding it the right inputs and validating the right outputs. That overhead was eating all the time I thought I was saving.

So I built a workflow around it. Three slash commands that form a loop. This is what works for me — your mileage may vary, and it’s definitely not the only way. But it’s made AI development manageable for my project and my team size (one, plus the AI).

The problem with working inline

My early approach was entirely inline — I’d open Claude Code, describe what I wanted, let it write and edit files directly, fix what didn’t work, ask for more changes, fix more things. It worked for small things. It fell apart once the project grew.

The issues I kept running into:

  • Context drift. The AI forgets what it was doing. I re-explain the same architecture decisions.
  • Scope creep. A “simple feature” touches 8 files across backend, frontend, and database. The AI doesn’t know where to stop.
  • No verification. I eyeball the output and ship it. Bugs pile up.
  • No feedback loop. QA findings live in my head. They never become structured work items that the AI can pick up later.

The loop: Plan, Work, QA

The biggest lesson I’ve learned is that AI development works best when it mirrors how structured teams work — not as a single stream of consciousness, but as distinct phases with clear handoffs. I use Linear as my issue tracker, and the three phases map to transitions in the issue lifecycle:

/plan-issue  →  /work-issue  →  /qa-run
     ↑                             │
     └────── new issues from QA ───┘

Each phase has a clear input, a clear output, and a clear handoff:

  1. /plan-issue — I sit with the AI and refine a vague idea into a concrete, implementable spec. This is collaborative — the AI explores the codebase, I make decisions. The output is a Linear issue (moved to Todo) with acceptance criteria, technical approach, and testing strategy. No code is written.
  2. /work-issue — The AI picks up a Todo issue from Linear and implements it autonomously. It reads the spec, writes code following existing patterns, writes tests, runs linting, self-reviews, commits, and marks the Linear issue Done. I’m not involved. One issue at a time, fresh context each time.
  3. /qa-run — The AI drives a real browser through predefined user journeys. At every step, multiple specialist sub-agents observe the same page in parallel — one checks functional correctness, another evaluates UX, another probes security, another measures performance. Each brings a different lens to the same screen. Findings get structured into reports with severity levels and evidence. Blockers and bugs become new Linear issues, which feed back into /plan-issue.
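For reference, these are Claude Code custom slash commands: markdown prompt files in .claude/commands/ that the CLI exposes as /name, with $ARGUMENTS standing in for whatever you type after the command. A stripped-down sketch of the planning command’s shape (illustrative; my actual prompt is longer):

```markdown
<!-- .claude/commands/plan-issue.md (sketch, not the full prompt) -->
Read docs/architecture/ before anything else.

Take the Linear issue referenced in $ARGUMENTS and refine it into an
implementable spec:

1. Explore the relevant code and summarize current behavior.
2. Ask me clarifying questions; I make the product decisions.
3. Write acceptance criteria, technical approach, and testing strategy
   into the issue, then move it to Todo.

Write no code.
```

The other two commands follow the same pattern: a prompt file that fixes the phase’s inputs, outputs, and stopping point.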

What I learned about giving the AI a memory

One thing I didn’t expect to matter so much: the AI needs its own reference material. I ended up creating a docs/ folder in the project root with architecture documentation that the AI reads during both planning and implementation:

docs/
  architecture/
    platform-overview.md    # System design, service boundaries
    tech-stack.md           # Stack conventions, library choices
    data-model.md           # Database schema, relationships
  features/                 # Feature-specific docs
  workflow/                 # Business process docs
  api/                      # API surface docs

The AI reads these docs at the start of every /plan-issue and /work-issue run. This gives it persistent architectural context without me re-explaining things each time.

Keeping these docs current is part of the workflow — /plan-issue checks if the planned work would change the architecture or data model, and asks me if I want to update the docs. I resisted this at first because it felt like overhead. But I learned that stale docs cause worse problems than no docs — the AI confidently follows outdated patterns. The maintenance cost is small compared to debugging implementations based on wrong assumptions.

What actually changed for me

Each phase only needs its own slice of context. The AI doesn’t need to hold the entire project in its head. /plan-issue only needs the issue and the relevant code. /work-issue gets a spec with file paths and acceptance criteria. /qa-run follows journey steps and checks what it sees.

I only show up where it matters. Planning is where I make product decisions, answer questions, decide scope. Implementation follows patterns. QA follows checklists. I spend my time on the parts where thinking is needed.

QA findings go back into the issue tracker. They’re not notes in my head — they’re reports with severity levels and evidence. I can turn a “[WARNING] Analytics patterns page has 6 failing API endpoints” into a Linear issue in minutes. That issue goes through /plan-issue, gets picked up by /work-issue, and the next /qa-run checks if it’s fixed.
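To make that concrete, a finding in a QA report looks roughly like this (the severity tag and headline are real; the journey name and evidence details below are invented for illustration):

```markdown
## [WARNING] Analytics patterns page has 6 failing API endpoints

- Journey: Analytics → Patterns, step 4
- Evidence: screenshot of the empty dashboard; browser console shows
  six 500 responses from pattern-related endpoints
- Impact: page renders, but all data sections stay empty
```

Because each entry already has a headline, evidence, and an impact statement, copying it into Linear is mostly mechanical.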

Headless batch mode. /work-issue works fine interactively, but the nice thing is you can also run it headless with claude -p. There’s a shell script that processes multiple issues one after the other, each in a fresh context:

./scripts/work-linear-issues.sh -n 10 --stop-on-error

Each issue gets its own context window, its own commit, its own Linear update. I get push notifications via ntfy on my phone as each issue completes.
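The real script is specific to my setup, but the shape is simple. A minimal sketch (the flag names and the claude -p invocation mirror what’s above; run_batch is a name I’m using here, and how the script asks Linear for the next Todo issue is omitted):

```shell
#!/usr/bin/env sh
# Sketch of a batch runner in the spirit of scripts/work-linear-issues.sh.
# CLAUDE_CMD can be overridden for testing; the real script also posts
# ntfy notifications and pulls the next Todo issue from Linear.
run_batch() {
  claude_cmd="${CLAUDE_CMD:-claude}"
  count=1
  stop_on_error=0

  # Parse -n <count> and --stop-on-error, as in the invocation above.
  while [ $# -gt 0 ]; do
    case "$1" in
      -n) count="$2"; shift 2 ;;
      --stop-on-error) stop_on_error=1; shift ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
  done

  i=1
  while [ "$i" -le "$count" ]; do
    # One non-interactive Claude Code run per issue: fresh context,
    # its own commit, its own Linear update.
    if ! $claude_cmd -p "/work-issue"; then
      echo "issue $i failed" >&2
      if [ "$stop_on_error" -eq 1 ]; then
        return 1
      fi
    fi
    i=$((i + 1))
  done
}

run_batch "$@"
```

The key property is in the loop body: every issue gets a separate claude -p process, which is what guarantees the fresh context per issue.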

My actual daily rhythm

My workflow has settled into a natural cycle:

  • Afternoon: I run /plan-issue on 40-50 issues. This is the collaborative part — I’m making product decisions, answering questions, scoping work. It fills most of the afternoon.
  • Evening: I kick off the batch script and close my laptop. The issues process overnight, one by one, each in a fresh context.
  • Morning: I review the commits, check the Linear updates, and run /qa-run on the relevant journeys. QA findings become new issues in Backlog. The laptop has been working while I slept.

My laptop basically runs around the clock: issues overnight, QA in the morning while I go through the commits. I still review everything. I still make every product decision. But instead of writing all the code myself, I plan 40-50 issues in an afternoon, go to sleep, and wake up with most of them done. The ones that failed, I re-plan with more detail.

What this doesn’t solve

I want to be honest about the limitations:

  • It still requires my time. Planning takes 5-10 minutes per issue. Reviewing commits takes time. Running QA takes time. It’s less time than doing everything myself, but it’s not zero.
  • The AI still makes mistakes. Bad implementations happen. Tests sometimes pass for the wrong reasons. Self-review catches some issues but not all. I review every commit in the morning.
  • It works for my project size. This is a single codebase, single developer, maybe 30k lines. I haven’t tried this on a large team or a monorepo.
  • Vague issues produce vague implementations. Garbage in, garbage out. If I skip planning, autonomous implementation is a coin flip.
  • Not every issue succeeds. Out of 8 issues overnight, maybe 1-2 fail. They stay In Progress in Linear with a comment explaining what went wrong. I re-plan them with more detail and they usually work on the second pass.

What’s next

In the next posts, I’ll walk through each phase — not just how the commands work, but the specific lessons I’ve learned about what makes each phase reliable (and what I got wrong along the way).