25 Issues Overnight: Batch AI That Doesn't Need You

The leap from “AI helps me code” to “AI codes while I sleep” wasn’t the AI. The model was already capable of writing code, running tests, and committing. What was missing was the orchestration — the loop that feeds it tasks one by one, monitors progress, handles failures, and moves on to the next issue without human intervention.

I wrote about the Plan, Work, QA loop in an earlier post. This post is about the mechanics of running that loop unattended — at scale, overnight, across dozens of issues.

Fresh context per task

The single most important design decision in batch AI processing: every task gets its own context. No shared state between issues. No accumulated conversation history from previous tasks. Clean slate.

I learned this the hard way. Early on, I tried processing issues in a single long-running session. Issue 3 would fail because the AI was still thinking about the refactoring it did in issue 2. Variable names from one task would bleed into another. A failed test in issue 5 would make the AI cautious for issues 6 through 10, producing unnecessarily conservative code.

// ProcessBatch runs a list of tasks sequentially, each in a fresh
// AI context. Returns results for all tasks, including failures.
func ProcessBatch(ctx context.Context, tasks []Task, config BatchConfig) []TaskResult {
    results := make([]TaskResult, 0, len(tasks))

    for i, task := range tasks {
        log.Printf("[%d/%d] Processing: %s", i+1, len(tasks), task.Title)

        // Each task gets a fresh context — no shared state
        taskCtx, cancel := context.WithTimeout(ctx, config.TaskTimeout)

        result := processOneTask(taskCtx, task, config)
        cancel()

        results = append(results, result)

        if result.Status == StatusFailed && config.StopOnError {
            log.Printf("Stopping batch: task %q failed", task.Title)
            break
        }

        // Notify progress
        if config.NotifyURL != "" {
            notifyProgress(config.NotifyURL, task.Title, result.Status, i+1, len(tasks))
        }
    }

    return results
}

Fresh context means each task gets exactly the information it needs: the task spec, the relevant code paths, and the project’s architecture docs. Nothing else. The AI doesn’t know what it worked on five minutes ago, and that’s a feature.

The batch loop

The orchestrator is a shell script that coordinates the full lifecycle: fetch tasks, launch AI agents, monitor progress, handle retries, report results.

#!/bin/bash
set -euo pipefail

LOCK_FILE="/tmp/batch-processor.lock"
MAX_ISSUES=${1:-10}
STOP_ON_ERROR=${2:-false}

# Filesystem lock prevents overlapping runs
if [ -f "$LOCK_FILE" ]; then
    pid=$(cat "$LOCK_FILE")
    if kill -0 "$pid" 2>/dev/null; then
        echo "Batch already running (PID $pid). Exiting."
        exit 1
    fi
    echo "Stale lock file found. Cleaning up."
    rm "$LOCK_FILE"
fi
echo $$ > "$LOCK_FILE"
trap 'rm -f "$LOCK_FILE"' EXIT

# Fetch queued tasks from the issue tracker
ISSUES=$(fetch-todo-issues --limit "$MAX_ISSUES" --format json)
TOTAL=$(echo "$ISSUES" | jq length)

echo "Processing $TOTAL issues..."

SUCCEEDED=0
FAILED=0

for i in $(seq 0 $((TOTAL - 1))); do
    ISSUE_ID=$(echo "$ISSUES" | jq -r ".[$i].id")
    ISSUE_TITLE=$(echo "$ISSUES" | jq -r ".[$i].title")

    echo ""
    echo "=== [$((i+1))/$TOTAL] $ISSUE_TITLE ==="

    # Launch AI agent with fresh context, capture exit code
    if process-issue --id "$ISSUE_ID" --timeout 30m; then
        SUCCEEDED=$((SUCCEEDED + 1))
        echo "✓ Completed: $ISSUE_TITLE"
    else
        FAILED=$((FAILED + 1))
        echo "✗ Failed: $ISSUE_TITLE"

        if [ "$STOP_ON_ERROR" = "true" ]; then
            echo "Stopping on error."
            break
        fi
    fi
done

echo ""
echo "=== Batch complete: $SUCCEEDED succeeded, $FAILED failed ==="

# Send summary notification
if [ -n "${NOTIFY_URL:-}" ]; then
    curl -s -d "Batch complete: $SUCCEEDED/$TOTAL succeeded, $FAILED failed" \
        "$NOTIFY_URL"
fi

The key elements:

  • Filesystem lock. A PID file in /tmp prevents two batch runs from overlapping. The trap ensures cleanup on exit, and the stale-lock check handles crashes.
  • Sequential processing. One issue at a time. Parallel processing sounds faster, but concurrent AI agents writing to the same codebase create merge conflicts and race conditions.
  • Failure isolation. A failed task doesn’t kill the batch (unless the stop-on-error argument is set). The AI comments on the issue explaining what went wrong, and the batch moves on.
  • Progress notifications. Each task completion sends a push notification. I get pings on my phone as issues complete overnight.
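The notifyProgress helper the Go loop calls can be nothing more than an HTTP POST. A sketch, assuming an ntfy-style service that accepts plain-text bodies (the message format is my own):

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// formatProgress builds the one-line status message for a task.
func formatProgress(title, status string, done, total int) string {
	return fmt.Sprintf("[%d/%d] %s: %s", done, total, status, title)
}

// notifyProgress POSTs the status line. Notification failures are
// logged and ignored: a dead endpoint must never abort the batch.
func notifyProgress(url, title, status string, done, total int) {
	resp, err := http.Post(url, "text/plain",
		strings.NewReader(formatProgress(title, status, done, total)))
	if err != nil {
		fmt.Printf("notify failed (ignored): %v\n", err)
		return
	}
	resp.Body.Close()
}

func main() {
	// prints "[3/25] succeeded: Fix pagination"
	fmt.Println(formatProgress("Fix pagination", "succeeded", 3, 25))
}
```

The only design decision that matters here is the error handling: the notification path is best-effort, so its failures never propagate into the batch result.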

Scheduled runs

The batch script runs on a cron schedule. I typically queue up issues in the afternoon (planning phase), then the batch processes them overnight.

# Process queued issues at 11pm, Monday through Friday
0 23 * * 1-5  cd /path/to/project && ./scripts/batch-process.sh 25 >> /var/log/batch.log 2>&1

The number (25) is the maximum issues per run. In practice, each issue takes 10–30 minutes depending on complexity, so 25 issues fills a full overnight window. If some fail, they stay in the queue for the next run — or I re-plan them with more detail the next afternoon.
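Re-queueing failed issues falls out of the data model for free if "queued" is just a label: the batch fetches by label, and only a successful run flips it. With GitHub Issues, for instance, the post-success step is a single gh call; the todo/done label scheme here is an assumption, not my actual tracker setup:

```shell
#!/bin/bash
set -euo pipefail

# mark_done flips the labels after a successful run. Failed issues
# keep the "todo" label, so the next scheduled batch picks them up
# again automatically. Assumes the gh CLI and a todo/done scheme.
mark_done() {
    local issue_id="$1"
    gh issue edit "$issue_id" --remove-label "todo" --add-label "done"
}
```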

Tenant-scoped processing

This is where the batch orchestrator starts shaping the product itself, not just my dev workflow. The app serves multiple organizations, so I’m building tenant isolation into the batch layer. Each batch job processes one tenant at a time, and the AI context only includes that tenant’s configuration, documents, and data.

// RunTenantBatch processes all queued tasks for a single tenant.
// Each tenant's tasks run in isolation — no cross-tenant data leaks.
func RunTenantBatch(ctx context.Context, tenantID string, config BatchConfig) (*BatchReport, error) {
    // Load tenant-specific configuration
    tenantConfig, err := loadTenantConfig(ctx, tenantID)
    if err != nil {
        return nil, fmt.Errorf("loading tenant config: %w", err)
    }

    // Fetch only this tenant's queued tasks
    tasks, err := fetchQueuedTasks(ctx, tenantID)
    if err != nil {
        return nil, fmt.Errorf("fetching tasks: %w", err)
    }

    // Select model based on tenant tier
    model := selectModel(tenantConfig.Tier, config.CostBudget)

    report := &BatchReport{
        TenantID:  tenantID,
        StartedAt: time.Now(),
    }

    for _, task := range tasks {
        // Fresh, time-boxed context per task, same rule as the main loop
        taskCtx, cancel := context.WithTimeout(ctx, config.TaskTimeout)
        result := processTenantTask(taskCtx, task, model, tenantConfig)
        cancel()

        report.Results = append(report.Results, result)
    }

    report.FinishedAt = time.Now()
    return report, nil
}

I caught this during testing: without tenant scoping, the AI was referencing documents from one organization while working on another’s tasks. In a multi-tenant system, that’s a data leak waiting to happen.
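After catching it, the fix was a guard at the context-loading boundary rather than trusting upstream queries to filter. A sketch, with an invented Doc type and loader shape:

```go
package main

import "fmt"

// Doc is a minimal stand-in for a stored document.
type Doc struct {
	ID       string
	TenantID string
}

// guardTenant fails the whole load if any document belongs to another
// tenant. Failing loudly here turns a silent cross-tenant leak into
// an immediate, visible batch error.
func guardTenant(docs []Doc, tenantID string) ([]Doc, error) {
	for _, d := range docs {
		if d.TenantID != tenantID {
			return nil, fmt.Errorf("doc %s belongs to tenant %s, not %s",
				d.ID, d.TenantID, tenantID)
		}
	}
	return docs, nil
}

func main() {
	docs := []Doc{{ID: "a", TenantID: "acme"}, {ID: "b", TenantID: "globex"}}
	if _, err := guardTenant(docs, "acme"); err != nil {
		fmt.Println("caught:", err)
	}
}
```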

Model routing

Another pattern I’m building into the product layer. Not all tasks need the same model. A simple text formatting fix doesn’t need the most capable (and expensive) model. A complex architectural change does. Model routing matches task complexity to model capability and cost.

// SelectModel chooses the appropriate AI model based on task
// priority and the remaining cost budget for this batch run.
func SelectModel(priority TaskPriority, remainingBudget float64) string {
    switch {
    case priority == PriorityCritical:
        // Critical tasks always get the best model
        return ModelHighCapability

    case remainingBudget < LowBudgetThreshold:
        // Running low on budget — use the efficient model
        return ModelCostEfficient

    case priority == PriorityHigh:
        return ModelHighCapability

    default:
        // Routine tasks use the cost-efficient model
        return ModelCostEfficient
    }
}

const (
    ModelHighCapability = "claude-sonnet-4-20250514"
    ModelCostEfficient  = "claude-haiku-4-5-20251001"
    LowBudgetThreshold  = 5.0 // dollars remaining
)

The routing is simple: critical and high-priority tasks get the capable model, everything else gets the efficient one. There’s also a budget guard — if the batch is running low on its cost budget, even high-priority tasks get the cheaper model. This prevents a bad batch from burning through your entire budget for the week.

In practice, about 70% of batch tasks use the cost-efficient model. The quality is good enough for routine implementations, and the cost savings add up. The capable model handles the remaining 30% where the complexity justifies the cost.
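The guard only bites if the loop actually decrements the budget as tasks complete. A sketch of that bookkeeping, reusing SelectModel from above; the flat per-task prices are made-up illustration numbers, not real API pricing:

```go
package main

import "fmt"

type TaskPriority int

const (
	PriorityRoutine TaskPriority = iota
	PriorityHigh
	PriorityCritical
)

const (
	ModelHighCapability = "claude-sonnet-4-20250514"
	ModelCostEfficient  = "claude-haiku-4-5-20251001"
	LowBudgetThreshold  = 5.0 // dollars remaining
)

// SelectModel as in the post: critical always gets the capable model,
// and the budget guard downgrades everything else.
func SelectModel(priority TaskPriority, remainingBudget float64) string {
	switch {
	case priority == PriorityCritical:
		return ModelHighCapability
	case remainingBudget < LowBudgetThreshold:
		return ModelCostEfficient
	case priority == PriorityHigh:
		return ModelHighCapability
	default:
		return ModelCostEfficient
	}
}

// estimatedCost is a hypothetical flat per-task price per model.
func estimatedCost(model string) float64 {
	if model == ModelHighCapability {
		return 2.00
	}
	return 0.25
}

func main() {
	budget := 10.0
	priorities := []TaskPriority{PriorityHigh, PriorityHigh, PriorityHigh, PriorityHigh}
	for i, p := range priorities {
		// Each task re-checks the budget, so a run of expensive tasks
		// degrades gracefully instead of overshooting.
		model := SelectModel(p, budget)
		budget -= estimatedCost(model)
		fmt.Printf("task %d -> %s (remaining $%.2f)\n", i+1, model, budget)
	}
}
```

With these numbers, the fourth high-priority task gets downgraded: after three capable-model tasks the budget has fallen below the threshold.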

If you want to go deeper

  • Agentic workflows — autonomous AI agents that complete multi-step tasks without human intervention. The batch loop is an agentic workflow coordinator.
  • Job scheduling — cron-based task execution with locking and failure handling. The same patterns run ETL pipelines, data processing, and now AI batch processing.
  • Batch orchestration — coordinating multiple independent jobs with monitoring, retry logic, and progress reporting. Airflow, Luigi, and every workflow engine does this. I just did it with a shell script.

Those are the keywords to search. The irony is that the most “advanced AI” part of this system — the batch orchestrator — uses the most old-school infrastructure patterns. Cron jobs, PID files, filesystem locks. It works.

What doesn’t work

  • Vague specs produce vague implementations. This is the biggest lesson. In interactive mode, you can course-correct mid-conversation. In batch mode, the AI gets one shot. If the issue spec says “improve the dashboard,” you get a random interpretation. Batch mode amplifies planning quality — garbage in, garbage out at scale.
  • Sequential processing is slow. 25 issues at 15 minutes each = 6+ hours. Parallel processing would be faster, but the AI agents would step on each other’s code. I’ve experimented with git worktrees for parallel processing, but the merge step introduces its own complexity.
  • Not every failure is retryable. Some issues fail because the spec is wrong, not because the AI had a bad run. Automatically retrying these wastes time and money. I’m still manually reviewing failed issues to decide whether to retry or re-plan.
  • Cost spikes are real. A batch of 25 complex issues using the capable model can cost $30–50. Without the budget guard, it’s easy to accidentally blow through your testing budget in one overnight run.
  • The cron job doesn’t know about holidays. I’ve woken up to 25 completed issues on a day when I had no time to review them. The issues pile up, reviews get rushed, and bugs slip through.
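For the holiday problem, a cheap mitigation I haven't fully adopted: a pause sentinel checked at the top of the batch script. Touch a file before going away, delete it when you're back. The file path is arbitrary:

```shell
#!/bin/bash
set -euo pipefail

# should_run succeeds unless the pause sentinel exists. Touch the
# file before a holiday; remove it when you're back to reviewing.
should_run() {
    local pause_file="${1:-$HOME/.batch-paused}"
    if [ -f "$pause_file" ]; then
        echo "Batch paused ($pause_file exists). Skipping run."
        return 1
    fi
}

# In the batch script, right after the lock check:
# should_run || exit 0
```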