Building with AI Agents
Quick Start
Copy-paste one of these into Claude Code or Codex. The agent reads this page and walks you through everything.
Start from scratch — the agent interviews you and creates the plan:
I want to build with AI agents just like Mikko does https://mikkosniemela.com/build-with-agents
Start the builder — plan already exists:
I want to build with AI agents just like Mikko does https://mikkosniemela.com/build-with-agents — I am the builder, my plan file is ______.md
Start the auditor — plan already exists:
I want to build with AI agents just like Mikko does https://mikkosniemela.com/build-with-agents — I am the auditor, my plan file is ______.md
This page works two ways.
If you're a human reading this — read on. Understand the methodology, then start a conversation with your agent and build the plan together.
If you're an AI agent and your human sent you here without reading this themselves — you run the show. Here's your playbook:
- Interview your human. Ask them what they want to build, who it's for, what the end state looks like. Help them think in capability questions, not feature lists. Keep asking until you have at least 20 strong questions. Write the goal document from the conversation.
- Figure out access. Based on the goal, work out what repos, services, APIs, databases, and credentials the builder and auditor will need. Ask your human to confirm what's available and what's missing. List blockers before writing a single line of code.
- Write the plan. Create plan.md with phases, acceptance criteria, and space for auditor comments. Put the access requirements at the top.
- Set up the hooks. Use the code examples below to create the hook scripts and configuration files in the project directory. Make the scripts executable.
- Start the loop. Tell your human to open two terminals — one for the builder, one for the auditor. Give them the exact commands to run. Then get to work.
Either way, the rest of this page explains the methodology and has all the code you need.
1. The Idea
A single AI agent asked to build and review its own code will rationalize its own output. This is the same cognitive failure that makes developer self-review unreliable. The fix is structural: separate the builder from the reviewer.
You need two things to make this work:
- A plan file — the shared understanding of what needs to be built and where things stand.
- A comments file — a one-way channel from the reviewer to the builder.
Each agent has strict write permissions. The builder writes the plan and the code. The auditor writes comments. Neither touches the other's file. This is the entire coordination mechanism. No orchestration framework, no message queue, no shared database. Two files.
Important: treat write boundaries as policy + guardrails. Role prompts define intent, but hooks should enforce it (for example: block builder writes to comments.md, block auditor writes to plan.md and code paths).
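Role prompts ask nicely; a guard hook can refuse. As a sketch (assuming a pre-tool-use style hook that receives the attempted tool call as JSON on stdin with a `.tool_input.file_path` field, which is how Claude Code's PreToolUse hooks report file edits), the builder's guard might look like this:

```shell
#!/bin/bash
# Hypothetical builder-side write guard: refuse any Edit/Write whose
# target path ends in comments.md. Assumes the hook payload arrives
# as JSON on stdin with .tool_input.file_path set.
guard() {
  local target
  target=$(jq -r '.tool_input.file_path // empty' 2>/dev/null)
  case "$target" in
    *comments.md)
      echo "[Guard] The builder may not write to comments.md" >&2
      return 2   # non-zero blocks the tool call; stderr goes back to the agent
      ;;
  esac
  return 0
}

# Quick demonstration with faked hook payloads:
echo '{"tool_input":{"file_path":"comments.md"}}' | guard
echo "comments.md blocked with status $?"
echo '{"tool_input":{"file_path":"src/app.py"}}' | guard
echo "src/app.py allowed with status $?"
```

The auditor's guard is the mirror image: block every write except comments.md.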
Hooks don't replace great thinking; they're for execution. Do the plan first: brainstorm it with Claude or Codex, and only when it's ready let the agents do their work.
2. Two Roles, Two Files
| Role | Reads | Writes | Responsibility |
|---|---|---|---|
| Builder | Plan, code, comments | Plan, code | Builds features, updates the plan, acts on auditor comments, runs tests |
| Auditor | Plan, code | Comments only | Reviews plan and code, writes feedback, decides when things stall, escalates to human when needed |
The builder can also write questions, notes, and status updates into the plan file. The plan is a living document — not a static spec.
The auditor is instructive, not passive. When things stall (no progress, blockers, confusion), the auditor decides: wait, give new instructions, or escalate to the human supervisor. If something requires human approval, the auditor says so in the comments and waits.
The auditor also tells the builder how to test. The builder executes the tests and reports results in the plan.
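For illustration only (the file contents, dates, and names here are invented, not a required format), a comments.md entry in this style might read:

```markdown
## Auditor review, 2026-03-31 14:00

- Phase 2 looks complete. Before I sign it off:
  1. Run the test suite and paste the summary into plan.md under Phase 2.
  2. The import step silently drops rows with empty dates. Log and count them.
- Blocker: verifying row counts needs read access to the staging database.
  That requires human supervisor approval. Waiting.
```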
3. Plan First, Execute Second
The most important part of this workflow happens before any agent starts building. You create the goal document — but you don't have to write it alone. The agent interviews you.
You sit down with Claude or Codex and talk about what you want to build. The agent asks questions, challenges your assumptions, and helps you think deeper. Together you produce a goal document — a vision written from the future, describing what life looks like after the software is delivered. No implementation details. No stack decisions. Just: what can a user do, and what questions can the system answer?
Write questions, not requirements
"Which contract renewals are coming in the next 6 months, and which are at risk?" implicitly demands a far richer system than "The system shall have a contract renewal dashboard with alerting."
A question tells the builder what the user needs to know and leaves the implementation open. Questions also set the quality bar implicitly: if the system cannot answer the question, it has failed. No interpretation required.
Write at least 20 questions. The depth of your questions drives the depth of the software. If a question only requires one data source to answer, it's too shallow. The best questions require three or more.
The process
- Interview — The agent interviews you. What are you building? Who is it for? What does the user's day look like after this exists? The agent pushes you to think in capability questions (at least 20) and writes the goal document from the conversation.
- Access check — The agent figures out what repos, services, APIs, databases, and credentials the builder and auditor will need. It asks you what's available and flags anything missing as a blocker before any code is written.
- Plan file — The agent reads the goal and writes a structured plan with phases, acceptance criteria, access requirements at the top, and space for auditor comments.
- Hook setup — The agent creates the hook scripts and config files in your project directory.
- Start the loop — Two terminals. Builder builds, auditor reviews. Both loop every 15 minutes. You intervene only when the auditor escalates.
Why access goes first
An agent that hits a permissions wall mid-build will either stall silently or invent a workaround you didn't ask for. Both are expensive. The agent should figure out what access is required and ask you about it before writing a single line of code.
- Repos and branches — which repositories, which branches, does the builder need to create new ones?
- Services and APIs — does the builder need running services, API keys, database access, deployment credentials?
- External tools — does the auditor need browser access for end-to-end testing? Does the builder need package registries, CI/CD pipelines?
- Permissions mode — is the builder running with `--dangerously-skip-permissions`, or will it need approval for each shell command?
If anything is missing, it goes in the plan as a blocker at step zero. Don't discover it at step five.
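A minimal plan.md skeleton following these conventions (the section names are suggestions, not a required format; angle-bracket placeholders are yours to fill) could look like:

```markdown
# Plan: <project name>

## Access requirements (step zero)
- [ ] Repo and branch: <which>
- [ ] API keys / credentials: <which services>
- Blockers: <anything missing goes here before any code is written>

## Phase 1: <name>
Status: not started | in progress | done
Acceptance criteria:
- <criterion tied to a capability question from GOAL.md>
Builder notes and questions:
- <the builder writes here; the auditor answers in comments.md>

## Phase 2: <name>
...
```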
4. The Loop
Both agents run concurrently. Every time an agent finishes its current work, a hook fires and reminds it to check the shared files. If there's nothing new, it waits 15 minutes and checks again.
The loop is enforced by hooks — shell scripts that fire automatically when the agent tries to stop. The hook blocks the stop and sends the agent a reminder to check the files; on every subsequent stop it sleeps 15 minutes first, so an idle agent polls on a 15-minute cadence.
No orchestration framework needed. The hook IS the loop.
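The pacing logic inside those hooks is tiny. Stripped to its essence (with a 1-second sleep and a throwaway marker file so it runs standalone), it is just:

```shell
# First check fires immediately; every later check waits first.
# A marker file records that the first check already happened.
MARKER="/tmp/loop-demo-$$"
rm -f "$MARKER"

check() {
  if [ -f "$MARKER" ]; then
    sleep 1             # stands in for the 900-second production wait
    echo "checked after waiting"
  else
    echo "checked immediately"
  fi
  touch "$MARKER"
}

check   # prints "checked immediately"
check   # prints "checked after waiting"
rm -f "$MARKER"
```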
5. The Goal Document
The goal document is the highest-leverage investment in the entire process. For a significant project, the interview that produces it might take hours spread over a day or two. It is worth every minute.
The agent interviews you and writes the goal document from the conversation. You don't need to be a writer. You need to know what you want. The agent's job is to pull that out of you and structure it.
Structure
- Walkthrough — A step-by-step narrative of the user's experience. Written as if the product already exists and is working. The agent drafts this from what you describe.
- Capability questions — At least 20 questions the finished software will answer. These are the specification. The agent helps you go deeper — the first 10 are easy, the next 10 are where the real value lives.
- Scope boundaries — Explicit list of out-of-scope capabilities. Without hard boundaries, an autonomous agent will keep expanding scope. The agent should ask you: "What should this NOT do?"
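Put together (headings here are illustrative; keep whatever structure reads naturally), a GOAL.md might be organized as:

```markdown
# GOAL: <project name>

## Walkthrough
It is six months from now. <The user> opens the tool and ...
(step-by-step narrative, written as if the product already works)

## Capability questions
1. Which contract renewals are coming in the next 6 months, and which are at risk?
2. ...
(at least 20; each should need several data sources to answer)

## Out of scope
- No <capability you are explicitly not building>
- ...
```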
Good vs. weak questions
Weak: "Can I see a list of my customers?" — produces a database and a list view.
Strong: "Which of my customers are close to their license capacity, have a renewal coming in the next 90 days, and have had no contact from our team in the last 6 weeks?" — requires usage data, contract data, activity tracking, time-based filtering, cross-referencing, and risk surfacing. It will produce all of those things because it has to.
The depth of questions drives the depth of the software. A useful self-check: read each question and count how many independent data sources, processes, or judgements are needed to answer it. If the answer is one, the question is too shallow.
6. Code: Claude Code Hooks
Claude Code hooks are shell scripts that fire on lifecycle events. We use the Stop event to intercept the agent before it finishes and remind it to check the shared files. The script sleeps 15 minutes between checks to create the polling loop.
Directory structure
```
your-project/
  .claude/
    settings.local.json      # hook configuration
    hooks/
      build-a-plan.sh        # builder hook
      audit-a-plan.sh        # auditor hook
  plan.md                    # the plan (builder writes, auditor reads)
  comments.md                # comments (auditor writes, builder reads)
  GOAL.md                    # your goal document
```
settings.local.json
This file configures which hooks fire and when. Use settings.local.json (gitignored) so each developer can run their own role without conflicts. The timeout is set to 960 seconds (16 minutes) to allow the 15-minute sleep between checks.
Builder:

```json
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/build-a-plan.sh",
            "timeout": 960
          }
        ]
      }
    ]
  }
}
```

Auditor:

```json
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/audit-a-plan.sh",
            "timeout": 960
          }
        ]
      }
    ]
  }
}
```
build-a-plan.sh (Builder hook)
```shell
#!/bin/bash
# Build-a-Plan hook — loops on Stop event
# First check is immediate. Subsequent checks wait 15 minutes.

INPUT=$(cat)
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "default"' 2>/dev/null)
STOP_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active // false' 2>/dev/null)
MARKER="/tmp/build-a-plan-${SESSION_ID}"
LOOP_GUARD="/tmp/build-a-plan-loops-${SESSION_ID}"

if [ "$STOP_ACTIVE" = "true" ]; then
  exit 0
fi

LOOPS=$(cat "$LOOP_GUARD" 2>/dev/null || echo 0)
LOOPS=$((LOOPS + 1))
echo "$LOOPS" > "$LOOP_GUARD"
if [ "$LOOPS" -gt 120 ]; then
  echo "[Build-a-Plan] Loop guard reached 120 stop-hook cycles; allowing stop." >&2
  rm -f "$LOOP_GUARD" "$MARKER"
  exit 0
fi

if [ -f "$MARKER" ]; then
  sleep 900
fi
touch "$MARKER"

cat >&2 <<'MSG'
[Build-a-Plan] You are the builder. Do NOT write to the comments
file — that belongs to the auditor. You can write to the plan file
(updates, questions, notes) and to code. Check:

1. Is the plan file up to date? Update it if needed. You can also
   add questions or notes in the plan for the auditor.
2. Are there new comments in the comments file? If yes, take action
   based on them in the plan and code. If no new comments, no
   worries — you'll check again in 15 minutes.
MSG

exit 2
```
audit-a-plan.sh (Auditor hook)
```shell
#!/bin/bash
# Audit-a-Plan hook — loops on Stop event
# First check is immediate. Subsequent checks wait 15 minutes.

INPUT=$(cat)
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "default"' 2>/dev/null)
STOP_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active // false' 2>/dev/null)
MARKER="/tmp/audit-a-plan-${SESSION_ID}"
LOOP_GUARD="/tmp/audit-a-plan-loops-${SESSION_ID}"

if [ "$STOP_ACTIVE" = "true" ]; then
  exit 0
fi

LOOPS=$(cat "$LOOP_GUARD" 2>/dev/null || echo 0)
LOOPS=$((LOOPS + 1))
echo "$LOOPS" > "$LOOP_GUARD"
if [ "$LOOPS" -gt 120 ]; then
  echo "[Audit-a-Plan] Loop guard reached 120 stop-hook cycles; allowing stop." >&2
  rm -f "$LOOP_GUARD" "$MARKER"
  exit 0
fi

if [ -f "$MARKER" ]; then
  sleep 900
fi
touch "$MARKER"

cat >&2 <<'MSG'
[Audit-a-Plan] You are the auditor. Your ONLY writable file is the
comments file. Do NOT write to the plan file, code, or anything else.

1. Check if the plan file or the code has changed since you last
   looked. If work is still in progress and it's not the right time
   to audit yet, no worries — you'll check again in 15 minutes.
2. If things look stable, review the plan and the relevant code,
   then write your feedback in the comments file. Be instructive —
   tell the builder clearly what to do.
3. If things are stalled (no progress from the builder, or a
   blocker), decide: should you both wait, or does something need
   to happen? If the decision requires human supervisor approval,
   say so in the comments and wait — do not proceed without it.
MSG

exit 2
```
Make both scripts executable: `chmod +x .claude/hooks/*.sh`

Loop safety: keep the stop guard in both scripts (the `stop_hook_active` check plus the max-cycle counter) so the Stop hook cannot run forever.
7. Code: OpenAI Codex
Codex CLI supports hooks and persistent instructions through AGENTS.md. The approach is the same: two roles, two files, a polling loop. The configuration is slightly different.
Important: Codex hooks are off by default. Enable them first:
```toml
[features]
codex_hooks = true
```
Directory structure
Run builder and auditor in separate worktrees or clones of the same repo — not two sessions in the same directory. Each gets its own AGENTS.md with the correct role. They share state through the plan and comments files via git.
```
your-project/
  .codex/
    hooks.json               # hook configuration
    hooks/
      build-a-plan.sh        # or audit-a-plan.sh
  AGENTS.md                  # role instructions for THIS instance
  plan.md
  comments.md
  GOAL.md
```
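Setting up those separate working directories can be sketched with `git worktree`. This demo uses a scratch repo in a temp directory so it runs standalone; the branch name and role-file contents are illustrative, and in a real project you would substitute your own repo:

```shell
set -e
# Scratch repo standing in for your project.
DEMO=$(mktemp -d)
cd "$DEMO"
git init -q builder
cd builder
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
echo "# Builder Role" > AGENTS.md                 # builder's role file

# Second working directory from the same repo for the auditor.
git worktree add -q -b auditor-work ../auditor
echo "# Auditor Role" > ../auditor/AGENTS.md      # auditor's role file

head -n1 AGENTS.md ../auditor/AGENTS.md           # show both role files
```

Keeping each AGENTS.md untracked lets the two directories hold different role instructions while sharing everything else through git.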
AGENTS.md — Builder worktree
Codex uses AGENTS.md (equivalent to Claude Code's CLAUDE.md) for persistent role instructions. Each worktree gets its own AGENTS.md matching its role.
```markdown
# Builder Role

You are the builder. Your job is to build according to the plan.

## Write permissions

- plan.md — update progress, add questions and notes
- All source code files

## Read permissions

- comments.md — the auditor writes feedback here
- GOAL.md — the original goal document

## Rules

- NEVER write to comments.md — that belongs to the auditor
- Update the plan after completing each task
- Check comments.md before starting new work
- Run tests when the auditor specifies how to test
```
AGENTS.md — Auditor worktree
```markdown
# Auditor Role

You are the auditor. You review the builder's work.

## Write permissions

- comments.md — this is your ONLY writable file

## Read permissions

- plan.md — track builder progress and status
- All source code files — review implementation quality
- GOAL.md — compare against the original goal

## Rules

- NEVER write to plan.md, code, or any other file
- Be instructive — tell the builder clearly what to do
- Include testing instructions in your comments
- If things stall, decide: wait, instruct, or escalate
- If human approval is needed, say so and wait
```
hooks.json
Codex hooks use the same shell scripts; the JSON structure is nearly identical to the Claude Code version.

Builder:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": ".codex/hooks/build-a-plan.sh",
            "timeout": 960
          }
        ]
      }
    ]
  }
}
```

Auditor:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": ".codex/hooks/audit-a-plan.sh",
            "timeout": 960
          }
        ]
      }
    ]
  }
}
```
The hook scripts are identical to the Claude Code versions — they're standard bash. Just copy them into .codex/hooks/ and make them executable.
8. Running It
If you used the one-liner from the top of this page, your agent already knows all of this. It will walk you through the interview, create the files, set up the hooks, and tell you what to run. You just answer its questions.
If you're setting things up manually, here's the sequence:
Step by step
- Start a conversation with Claude or Codex. Tell it what you want to build. Let it interview you and create the goal document.
- Confirm access. The agent will ask you what repos, services, and credentials are available. Answer honestly — missing access discovered later is expensive.
- Review the plan. The agent writes plan.md. Read it. Push back on anything that doesn't match your vision. This is your last easy chance to course-correct.
- The agent sets up hooks. It creates the scripts and config files from the code examples on this page.
- Open two terminals:
Terminal 1 (builder):

```shell
# Claude Code
claude "Read GOAL.md and plan.md. Start building."

# Codex
codex "Read GOAL.md and plan.md. Start building."
```

Terminal 2 (auditor):

```shell
# Claude Code
claude "Read GOAL.md and plan.md. You are the auditor. Begin reviewing."

# Codex
codex "Read GOAL.md and plan.md. You are the auditor. Begin reviewing."
```
Both agents will work, finish, get intercepted by the hook, check the files, and loop. The builder waits for comments. The auditor waits for progress. They coordinate through the files. You intervene only when the auditor escalates.
Tips
- `--dangerously-skip-permissions` makes the builder fully autonomous (Claude Code). It's powerful but means no confirmation before destructive commands. Use it — but control access at the OS user level so the agent can't touch things it shouldn't.
- Run the auditor in a read-heavy mode — it mostly reads and only writes to one file.
- If you're running both on the same machine, they'll share the filesystem naturally. If on different machines, use a shared git repo and have both agents pull/push.
- The 15-minute interval is a starting point. Adjust the `sleep` value in the hook scripts to match your project's pace.
- For large projects, you can run multiple builders on independent modules. Each gets its own plan section.
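Retuning the interval is a one-line edit. The example below works on a scratch copy so it runs anywhere; point the same `sed` command at `.claude/hooks/*.sh` (or `.codex/hooks/*.sh`) for the real scripts. `sed -i.bak` keeps a backup and works with both GNU and BSD sed:

```shell
# Drop the polling interval from 15 minutes to 5 on a scratch copy.
HOOK=$(mktemp)
printf 'sleep 900\n' > "$HOOK"
sed -i.bak 's/sleep 900/sleep 300/' "$HOOK"
cat "$HOOK"   # now reads: sleep 300
```

If you change the sleep, keep the hook `timeout` in the settings (960 seconds in the examples above) comfortably larger than the new interval.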
Credits
The core insight: structure replaces supervision. Invest in the plan, enforce role separation through file permissions, and let the agents loop.
Dr. Mikko S. Niemelä — 2026
Last updated: March 31, 2026