/qa-run: AI-Driven QA That Closes the Loop

This is Part 4 of my series on AI-assisted development. Part 1 covers the loop, Part 2 covers planning, Part 3 covers implementation.

After /work-issue commits code, how do I know it actually works? The command runs unit tests and linting, but that doesn’t catch what users see — broken layouts, confusing UX, missing permissions, slow loads, state that doesn’t update after an action.

I built /qa-run for this, and it taught me something I didn’t expect: the most valuable thing AI can do for QA isn’t finding bugs — it’s looking at the same page from perspectives I wouldn’t think to check. It opens a real browser, goes through predefined user journeys, and at each step a bunch of specialist AI agents look at the page and report what they find. Findings go back into Linear as new issues — which closes the loop.

How I set it up

/qa-run 07                           # Run journey 07 (dashboard navigation)
/qa-run 01                           # Run journey 01 (full entity lifecycle)
/qa-run 03 --specialists=qa,security # Only run QA and security specialists
/qa-run 07 --step=5                  # Resume from step 5

Phase 0: Setup

The AI reads the journey file from qa/journeys/, loads the specialist checklists from qa/specialists/, creates a report file, and checks the app is running by hitting http://localhost:5173/login.
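Concretely, the on-disk layout looks something like this — qa/journeys/ and qa/specialists/ are real; the filenames and the reports/ directory are my guesses at a plausible structure:

```text
qa/
├── journeys/
│   ├── 01-entity-lifecycle.md       # hypothetical filenames
│   └── 07-dashboard-navigation.md
├── specialists/
│   ├── qa.md
│   ├── ux.md
│   └── security.md
└── reports/                         # assumed location for generated reports
    └── j07-report.md
```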

Phase 1: Step-by-step execution

Each journey is a sequence of user actions — login, navigate, click, fill forms, verify state. For each step:

  1. Do the action — click, navigate, fill a form (real browser, real clicks via Playwright)
  2. Take a snapshot — grab the page state, screenshot, console messages, network requests
  3. Let the specialists look at it — multiple AI agents check the page in parallel
  4. Write it down — add findings to the report file

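The loop above can be sketched in Python. Everything here is a stub — `Snapshot`, `run_specialist`, and the step format are illustrative stand-ins for the real browser and AI sub-agents, not the actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    """Page state captured after a step (stubbed: no real browser here)."""
    url: str
    console: list = field(default_factory=list)
    network: list = field(default_factory=list)

def run_specialist(name, snapshot, expected):
    # Stand-in for a sub-agent call; the real specialists are AI agents
    # evaluating the snapshot against their own checklist.
    return {"specialist": name, "severity": "OK",
            "note": f"{snapshot.url}: matches '{expected}'"}

def run_step(step, report):
    snapshot = Snapshot(url=step["url"])           # 1-2: act, then snapshot
    with ThreadPoolExecutor() as pool:             # 3: specialists in parallel
        findings = list(pool.map(
            lambda s: run_specialist(s, snapshot, step["expected"]),
            step["specialists"]))
    report.extend(findings)                        # 4: write it down
    # A single BLOCKER stops the journey instead of auto-fixing in context.
    return not any(f["severity"] == "BLOCKER" for f in findings)

report = []
step = {"url": "/items/42", "expected": "detail view loads",
        "specialists": ["qa", "ux", "performance"]}
ok = run_step(step, report)
```

The return value models the stop-on-blocker rule: the driver keeps stepping only while `run_step` returns `True`.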
Phase 2: Report

After all steps are done (or something blocks the journey), the AI writes a summary — how many issues by severity, a sorted list of findings, and top-5 things to fix.
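Mechanically, the summary is a severity histogram plus a sort. A minimal sketch — the severity names come from the reports, but the exact ranking order is my assumption:

```python
from collections import Counter

SEVERITY_ORDER = ["BLOCKER", "BUG", "WARNING", "OK"]  # worst first

def summarize(findings):
    """Count findings by severity and sort worst-first for the report."""
    counts = Counter(f["severity"] for f in findings)
    ranked = sorted(findings,
                    key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    return counts, ranked

findings = [
    {"severity": "OK", "note": "/reviews loaded"},
    {"severity": "WARNING", "note": "/analytics endpoints failing"},
    {"severity": "OK", "note": "/calendar loaded"},
]
counts, ranked = summarize(findings)
```

The top-5 recommendations are the AI's editorial pass over `ranked`, not something a script produces.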

The specialist model

This is the part I find most interesting. Each step gets evaluated by multiple specialist agents running in parallel — each with their own checklist and perspective. I’ll cover the specialist model in depth in Part 5, but here’s the overview.

I have 7 specialists, each defined as a markdown file in qa/specialists/:

| Specialist | What it looks for |
|------------|-------------------|
| QA | Functional correctness — does the page load, does the data match, do transitions work |
| UX | Usability — is navigation clear, are there dead ends, is cognitive load reasonable |
| UI | Visual quality — layout, spacing, responsive behavior, color contrast |
| Security | Auth issues, session management, data exposure, cross-role leaks |
| Performance | Load times, console errors, unnecessary network requests |
| Data Leakage | Cross-role data exposure, API over-fetching |
| Language | Spanish localization consistency (the app is in Spanish) |
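Each specialist file is just a checklist plus a reporting contract. Here's a plausible shape for qa/specialists/security.md — this is my sketch, not the real file:

```markdown
# Security Specialist

Evaluate each page snapshot for:
- Auth: is the current role allowed to see this route?
- Session: tokens in URLs, missing logout, stale sessions
- Data exposure: other users' or admin-only data in responses
- Cross-role leaks: nav items or API data from a higher-privilege role

Report each finding as:
- [SEVERITY] route — one-line description
  where SEVERITY is BLOCKER | BUG | WARNING | OK
```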

They run as parallel sub-agents via Claude Code’s Task tool. For a step with 5 specialists, all 5 evaluate simultaneously. Each gets the page snapshot, console messages, network requests, and the step’s expected behavior, and returns findings with severity ratings.

If any specialist reports a BLOCKER, the journey stops. I’d rather see the blocker, plan a fix, and re-run than have the AI try to auto-fix in context.

The journeys

Journeys are step-by-step scripts in qa/journeys/. They model real user workflows:

| ID | Journey | Roles | Duration |
|----|---------|-------|----------|
| J01 | Core entity lifecycle | Manager, User, Reviewer | 15-20 min |
| J02 | Template management | Admin | 10-15 min |
| J03 | Data entry flow | User | 10-15 min |
| J04 | Review & approval | Reviewer | 10-15 min |
| J05 | Export & reporting | Admin, Reviewer | 10-15 min |
| J06 | User management | Admin, Manager | 10-15 min |
| J07 | Dashboard & navigation | All roles | 10-15 min |
| J08 | Full lifecycle end-to-end | All roles | 30+ min |

Each journey defines:

  • Metadata — which roles, prerequisites (like make reset for fresh data), priority
  • Steps — action, role, which specialists evaluate, expected behavior, and what counts as a blocker

Here’s what a step looks like in the journey definition:

### Step 6: User Opens Detail View
- **Action**: Click on a pending item to open /items/:id
- **Role**: User
- **Specialists**: [qa, ux, ui, performance]
- **Expected**:
  - Detail view loads with correct data
  - Action buttons are visible and enabled
  - Progress indicator is shown
  - No console errors
- **Blocking**: Detail view fails to load or actions are disabled

What a report looks like

Here’s a real excerpt from a J07 (Dashboard & Navigation) report:

## Step 8: Reviewer Accessible Routes
**Status**: PASS
**Action**: Navigate to /reviews, /templates, /library, /calendar, /analytics

### Findings
- [OK] /reviews — loaded with 1 pending item, stats panel functional
- [WARNING] /templates — redirected with "No tienes permisos" alert.
  Journey definition lists this as accessible but templates are admin-only.
  This is correct app behavior — the journey definition needs updating.
- [OK] /library — loaded with tabs and sidebar filter
- [OK] /calendar — loaded with 12-month layout
- [WARNING] /analytics — page structure loaded but several API endpoints
  returned errors. Data sections empty.
- [SEC-OK] No admin data leaked on any page

And the summary:

## Summary

| Severity | Count |
|----------|-------|
| BLOCKER  | 0     |
| BUG      | 0     |
| WARNING  | 2     |
| OK       | 70+   |

## Top-5 Recommendations

1. Fix analytics API endpoints — multiple endpoints return errors
2. Update journey definition for reviewer /templates access
3. Role-based navigation is solid — each role sees appropriate nav items
4. Consistent redirect behavior by role — good pattern
5. User scoping by role is well-implemented

The key insight: findings that write themselves

The nice thing about /qa-run is that findings are already structured — I don’t have to interpret anything. When I see:

[WARNING] /analytics — several API endpoints return errors

That becomes a Linear issue:

Title: Fix analytics API — multiple endpoints returning errors
Description: /analytics page loads structure but data endpoints fail.
Listed endpoints and error details from the QA report.

Found by: QA Journey J07, Step 8

That issue goes into Backlog. I run /plan-issue on it, refine it into a spec, move it to Todo. /work-issue picks it up, implements the fix, commits. Next time I run /qa-run 07, those endpoints should return data.

That’s the loop. Plan → Work → QA → New Issues → Plan again.
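Because the finding format is regular, turning report lines into issue drafts is nearly mechanical. A sketch, assuming the `[SEVERITY] route — description` line shape from the reports above; in practice the AI does this handoff to Linear, not a script:

```python
import re

# Matches report lines like: - [WARNING] /analytics — description
FINDING = re.compile(r"\[(BLOCKER|BUG|WARNING)\]\s+(\S+)\s+—\s+(.*)")

def draft_issues(report_text, journey, step):
    """Turn WARNING-or-worse report lines into Linear-style issue drafts."""
    issues = []
    for line in report_text.splitlines():
        m = FINDING.search(line)
        if m:
            severity, route, desc = m.groups()
            issues.append({
                "title": f"Fix {route} — {desc}",
                "severity": severity,
                "found_by": f"QA Journey {journey}, Step {step}",
            })
    return issues

report = ("- [OK] /reviews — loaded\n"
          "- [WARNING] /analytics — several API endpoints return errors")
issues = draft_issues(report, "J07", 8)
```

Note that [OK] lines are deliberately skipped: only findings that need action become issues.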

Writing new journeys

When I add a new feature, I write a new journey (or extend an existing one). A journey is just a markdown file — no code, no framework, no test runner. The AI interprets the steps and drives the browser.

This makes journeys easy to write and easy to maintain. When the UI changes, I update the journey description in plain English. No selectors to fix, no test helpers to update.

The downside is that journeys are less precise than coded tests. The AI might read a step slightly differently each run. But for checking things across roles, permissions, and user flows, I’ll take the occasional inconsistency over not checking at all.

What I’ve learned running QA regularly

Run QA after every batch. After processing a batch of issues with /work-issue, run the relevant journeys. This catches interaction bugs that unit tests miss.

Start with J07 (navigation). It’s fast, covers all roles, and catches permission/routing issues immediately. It’s my smoke test.

Use --step=N to resume. If a journey blocks at step 8, fix the blocker, then run /qa-run 07 --step=8 to resume instead of re-running from the start.

Keep specialist evaluations focused. Not every step needs every specialist. A login step needs QA and security, not UI and performance. The journey definitions specify which specialists evaluate each step.

Read the warnings. BLOCKERs are obvious — the journey stops. But WARNINGs accumulate. Three warnings about inconsistent empty states across different pages means there’s a pattern to fix.

What this taught me about testing

/qa-run isn’t a replacement for traditional testing — I still have unit tests and Playwright E2E. What it adds is the specialist perspectives I wouldn’t think to check myself.