Building with AI Agents
Quick Start
Copy-paste one of these into Claude Code or Codex. The agent reads this page and walks you through everything.
Start from scratch — the agent interviews you and creates the plan:
I want to build with AI agents just like Mikko does https://mikkosniemela.com/build-with-agents
Start the builder — plan already exists:
I want to build with AI agents just like Mikko does https://mikkosniemela.com/build-with-agents — I am the builder, my plan file is ______.md
Start the auditor — plan already exists:
I want to build with AI agents just like Mikko does https://mikkosniemela.com/build-with-agents — I am the auditor, my plan file is ______.md
This page works two ways.
If you're a human reading this — read on. Understand the methodology, then start a conversation with your agent and build the plan together.
If you're an AI agent and your human sent you here without reading this themselves — you run the show. Here's your playbook:
- Interview your human. Ask them what they want to build, who it's for, what the end state looks like. Help them think in capability questions, not feature lists. Keep asking until you have at least 20 strong questions. Write the goal document from the conversation.
- Figure out access. Based on the goal, work out what repos, services, APIs, databases, and credentials the builder and auditor will need. Ask your human to confirm what's available and what's missing. List blockers before writing a single line of code.
- Write the plan. Create plan.md with phases, acceptance criteria, and space for auditor comments. Put the access requirements at the top.
- Set up the hooks. Use the code examples below to create the hook scripts and configuration files in the project directory. Make the scripts executable.
- Start the loop. Tell your human to open two terminals — one for the builder, one for the auditor. Give them the exact commands to run. Then get to work.
Either way, the rest of this page explains the methodology and has all the code you need.
1. The Idea
A single AI agent asked to build and review its own code will rationalize its own output. This is the same cognitive failure that makes developer self-review unreliable. The fix is structural: separate the builder from the reviewer.
You need two things to make this work:
- A plan file — the shared understanding of what needs to be built and where things stand.
- A comments file — a one-way channel from the reviewer to the builder.
Each agent has strict write permissions. The builder writes the plan and the code. The auditor writes comments. Neither touches the other's file. This is the entire coordination mechanism. No orchestration framework, no message queue, no shared database. Two files.
Important: treat write boundaries as policy + guardrails. Role prompts define intent, but hooks should enforce it (for example: block builder writes to comments.md, block auditor writes to plan.md and code paths).
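Role prompts ask nicely; a guard hook can refuse. As a sketch (assuming a pre-tool-use style hook that receives the attempted tool call as JSON on stdin with a `.tool_input.file_path` field, which is how Claude Code's PreToolUse hooks report file edits), the builder's guard might look like this:

```shell
#!/bin/bash
# Hypothetical builder-side write guard: refuse any Edit/Write whose
# target path ends in comments.md. Assumes the hook payload arrives
# as JSON on stdin with .tool_input.file_path set.
guard() {
  local target
  target=$(jq -r '.tool_input.file_path // empty' 2>/dev/null)
  case "$target" in
    *comments.md)
      echo "[Guard] The builder may not write to comments.md" >&2
      return 2   # non-zero blocks the tool call; stderr goes back to the agent
      ;;
  esac
  return 0
}

# Quick demonstration with faked hook payloads:
echo '{"tool_input":{"file_path":"comments.md"}}' | guard
echo "comments.md blocked with status $?"
echo '{"tool_input":{"file_path":"src/app.py"}}' | guard
echo "src/app.py allowed with status $?"
```

The auditor's guard is the mirror image: block every write except comments.md.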
Hooks don't replace great thinking; they're for execution. Do the plan first: brainstorm it with Claude or Codex, and only when it's ready let the agents do their work.
2. Two Roles, Two Files
| Role | Reads | Writes | Responsibility |
|---|---|---|---|
| Builder | Plan, code, comments | Plan, code | Builds features, updates the plan, acts on auditor comments, runs tests |
| Auditor | Plan, code | Comments only | Reviews plan and code, writes feedback, decides when things stall, escalates to human when needed |
The builder can also write questions, notes, and status updates into the plan file. The plan is a living document — not a static spec.
The auditor is instructive, not passive. When things stall (no progress, blockers, confusion), the auditor decides: wait, give new instructions, or escalate to the human supervisor. If something requires human approval, the auditor says so in the comments and waits.
The auditor also tells the builder how to test. The builder executes the tests and reports results in the plan.
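For illustration only (the file contents, dates, and names here are invented, not a required format), a comments.md entry in this style might read:

```markdown
## Auditor review, 2026-03-31 14:00

- Phase 2 looks complete. Before I sign it off:
  1. Run the test suite and paste the summary into plan.md under Phase 2.
  2. The import step silently drops rows with empty dates. Log and count them.
- Blocker: verifying row counts needs read access to the staging database.
  That requires human supervisor approval. Waiting.
```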
3. Plan First, Execute Second
The most important part of this workflow happens before any agent starts building. You create the goal document — but you don't have to write it alone. The agent interviews you.
You sit down with Claude or Codex and talk about what you want to build. The agent asks questions, challenges your assumptions, and helps you think deeper. Together you produce a goal document — a vision written from the future, describing what life looks like after the software is delivered. No implementation details. No stack decisions. Just: what can a user do, and what questions can the system answer?
Write questions, not requirements
"Which contract renewals are coming in the next 6 months, and which are at risk?" implicitly demands a far richer system than "The system shall have a contract renewal dashboard with alerting."
A question tells the builder what the user needs to know and leaves the implementation open. Questions also set the quality bar implicitly: if the system cannot answer the question, it has failed. No interpretation required.
Write at least 20 questions. The depth of your questions drives the depth of the software. If a question only requires one data source to answer, it's too shallow. The best questions require three or more.
The process
- Interview — The agent interviews you. What are you building? Who is it for? What does the user's day look like after this exists? The agent pushes you to think in capability questions (at least 20) and writes the goal document from the conversation.
- Access check — The agent figures out what repos, services, APIs, databases, and credentials the builder and auditor will need. It asks you what's available and flags anything missing as a blocker before any code is written.
- Plan file — The agent reads the goal and writes a structured plan with phases, acceptance criteria, access requirements at the top, and space for auditor comments.
- Hook setup — The agent creates the hook scripts and config files in your project directory.
- Start the loop — Two terminals. Builder builds, auditor reviews. Both loop every 15 minutes. You intervene only when the auditor escalates.
Why access goes first
An agent that hits a permissions wall mid-build will either stall silently or invent a workaround you didn't ask for. Both are expensive. The agent should figure out what access is required and ask you about it before writing a single line of code.
- Repos and branches — which repositories, which branches, does the builder need to create new ones?
- Services and APIs — does the builder need running services, API keys, database access, deployment credentials?
- External tools — does the auditor need browser access for end-to-end testing? Does the builder need package registries, CI/CD pipelines?
- Permissions mode — is the builder running with `--dangerously-skip-permissions`, or will it need approval for each shell command?
If anything is missing, it goes in the plan as a blocker at step zero. Don't discover it at step five.
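A minimal plan.md skeleton following these conventions (the section names are suggestions, not a required format; angle-bracket placeholders are yours to fill) could look like:

```markdown
# Plan: <project name>

## Access requirements (step zero)
- [ ] Repo and branch: <which>
- [ ] API keys / credentials: <which services>
- Blockers: <anything missing goes here before any code is written>

## Phase 1: <name>
Status: not started | in progress | done
Acceptance criteria:
- <criterion tied to a capability question from GOAL.md>
Builder notes and questions:
- <the builder writes here; the auditor answers in comments.md>

## Phase 2: <name>
...
```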
4. The Loop
Both agents run concurrently. Every time an agent finishes its current work, a hook fires and reminds it to check the shared files. If there's nothing new, it waits 15 minutes and checks again.
The loop is enforced by hooks — shell scripts that fire automatically when the agent tries to stop. The hook blocks the stop and sends the agent a reminder to check the files; on every subsequent stop it sleeps 15 minutes first, so an idle agent polls on a 15-minute cadence.
No orchestration framework needed. The hook IS the loop.
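The pacing logic inside those hooks is tiny. Stripped to its essence (with a 1-second sleep and a throwaway marker file so it runs standalone), it is just:

```shell
# First check fires immediately; every later check waits first.
# A marker file records that the first check already happened.
MARKER="/tmp/loop-demo-$$"
rm -f "$MARKER"

check() {
  if [ -f "$MARKER" ]; then
    sleep 1             # stands in for the 900-second production wait
    echo "checked after waiting"
  else
    echo "checked immediately"
  fi
  touch "$MARKER"
}

check   # prints "checked immediately"
check   # prints "checked after waiting"
rm -f "$MARKER"
```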
5. The Goal Document
The goal document is the highest-leverage investment in the entire process. For a significant project, the interview that produces it might take hours spread over a day or two. It is worth every minute.
The agent interviews you and writes the goal document from the conversation. You don't need to be a writer. You need to know what you want. The agent's job is to pull that out of you and structure it.
Structure
- Walkthrough — A step-by-step narrative of the user's experience. Written as if the product already exists and is working. The agent drafts this from what you describe.
- Capability questions — At least 20 questions the finished software will answer. These are the specification. The agent helps you go deeper — the first 10 are easy, the next 10 are where the real value lives.
- Scope boundaries — Explicit list of out-of-scope capabilities. Without hard boundaries, an autonomous agent will keep expanding scope. The agent should ask you: "What should this NOT do?"
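Put together (headings here are illustrative; keep whatever structure reads naturally), a GOAL.md might be organized as:

```markdown
# GOAL: <project name>

## Walkthrough
It is six months from now. <The user> opens the tool and ...
(step-by-step narrative, written as if the product already works)

## Capability questions
1. Which contract renewals are coming in the next 6 months, and which are at risk?
2. ...
(at least 20; each should need several data sources to answer)

## Out of scope
- No <capability you are explicitly not building>
- ...
```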
Good vs. weak questions
Weak: "Can I see a list of my customers?" — produces a database and a list view.
Strong: "Which of my customers are close to their license capacity, have a renewal coming in the next 90 days, and have had no contact from our team in the last 6 weeks?" — requires usage data, contract data, activity tracking, time-based filtering, cross-referencing, and risk surfacing. It will produce all of those things because it has to.
The depth of questions drives the depth of the software. A useful self-check: read each question and count how many independent data sources, processes, or judgements are needed to answer it. If the answer is one, the question is too shallow.
6. Code: Claude Code Hooks
Claude Code hooks are shell scripts that fire on lifecycle events. We use the Stop event to intercept the agent before it finishes and remind it to check the shared files. The script sleeps 15 minutes between checks to create the polling loop.
Directory structure
```
your-project/
  .claude/
    settings.local.json      # hook configuration
    hooks/
      build-a-plan.sh        # builder hook
      audit-a-plan.sh        # auditor hook
  plan.md                    # the plan (builder writes, auditor reads)
  comments.md                # comments (auditor writes, builder reads)
  GOAL.md                    # your goal document
```
settings.local.json
This file configures which hooks fire and when. Use settings.local.json (gitignored) so each developer can run their own role without conflicts. The timeout is set to 960 seconds (16 minutes) to allow the 15-minute sleep between checks.
Builder:

```json
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/build-a-plan.sh",
            "timeout": 960
          }
        ]
      }
    ]
  }
}
```

Auditor:

```json
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/audit-a-plan.sh",
            "timeout": 960
          }
        ]
      }
    ]
  }
}
```
build-a-plan.sh (Builder hook)
```shell
#!/bin/bash
# Build-a-Plan hook — loops on Stop event
# First check is immediate. Subsequent checks wait 15 minutes.

INPUT=$(cat)
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "default"' 2>/dev/null)
STOP_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active // false' 2>/dev/null)
MARKER="/tmp/build-a-plan-${SESSION_ID}"
LOOP_GUARD="/tmp/build-a-plan-loops-${SESSION_ID}"

if [ "$STOP_ACTIVE" = "true" ]; then
  exit 0
fi

LOOPS=$(cat "$LOOP_GUARD" 2>/dev/null || echo 0)
LOOPS=$((LOOPS + 1))
echo "$LOOPS" > "$LOOP_GUARD"
if [ "$LOOPS" -gt 120 ]; then
  echo "[Build-a-Plan] Loop guard reached 120 stop-hook cycles; allowing stop." >&2
  rm -f "$LOOP_GUARD" "$MARKER"
  exit 0
fi

if [ -f "$MARKER" ]; then
  sleep 900
fi
touch "$MARKER"

cat >&2 <<'MSG'
[Build-a-Plan] You are the builder. Do NOT write to the comments
file — that belongs to the auditor. You can write to the plan file
(updates, questions, notes) and to code. Check:

1. Is the plan file up to date? Update it if needed. You can also
   add questions or notes in the plan for the auditor.
2. Are there new comments in the comments file? If yes, take action
   based on them in the plan and code. If no new comments, no
   worries — you'll check again in 15 minutes.
MSG

exit 2
```
audit-a-plan.sh (Auditor hook)
```shell
#!/bin/bash
# Audit-a-Plan hook — loops on Stop event
# First check is immediate. Subsequent checks wait 15 minutes.

INPUT=$(cat)
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "default"' 2>/dev/null)
STOP_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active // false' 2>/dev/null)
MARKER="/tmp/audit-a-plan-${SESSION_ID}"
LOOP_GUARD="/tmp/audit-a-plan-loops-${SESSION_ID}"

if [ "$STOP_ACTIVE" = "true" ]; then
  exit 0
fi

LOOPS=$(cat "$LOOP_GUARD" 2>/dev/null || echo 0)
LOOPS=$((LOOPS + 1))
echo "$LOOPS" > "$LOOP_GUARD"
if [ "$LOOPS" -gt 120 ]; then
  echo "[Audit-a-Plan] Loop guard reached 120 stop-hook cycles; allowing stop." >&2
  rm -f "$LOOP_GUARD" "$MARKER"
  exit 0
fi

if [ -f "$MARKER" ]; then
  sleep 900
fi
touch "$MARKER"

cat >&2 <<'MSG'
[Audit-a-Plan] You are the auditor. Your ONLY writable file is the
comments file. Do NOT write to the plan file, code, or anything else.

1. Check if the plan file or the code has changed since you last
   looked. If work is still in progress and it's not the right time
   to audit yet, no worries — you'll check again in 15 minutes.
2. If things look stable, review the plan and the relevant code,
   then write your feedback in the comments file. Be instructive —
   tell the builder clearly what to do.
3. If things are stalled (no progress from the builder, or a
   blocker), decide: should you both wait, or does something need
   to happen? If the decision requires human supervisor approval,
   say so in the comments and wait — do not proceed without it.
MSG

exit 2
```
Make both scripts executable: `chmod +x .claude/hooks/*.sh`

Loop safety: keep the stop guard in both scripts (the `stop_hook_active` check plus the max-cycle counter) so the Stop hook cannot run forever.
7. Code: OpenAI Codex
Codex CLI supports hooks and persistent instructions through AGENTS.md. The approach is the same: two roles, two files, a polling loop. The configuration is slightly different.
Important: Codex hooks are off by default. Enable them first:
```toml
[features]
codex_hooks = true
```
Directory structure
Run builder and auditor in separate worktrees or clones of the same repo — not two sessions in the same directory. Each gets its own AGENTS.md with the correct role. They share state through the plan and comments files via git.
```
your-project/
  .codex/
    hooks.json               # hook configuration
    hooks/
      build-a-plan.sh        # or audit-a-plan.sh
  AGENTS.md                  # role instructions for THIS instance
  plan.md
  comments.md
  GOAL.md
```
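Setting up those separate working directories can be sketched with `git worktree`. This demo uses a scratch repo in a temp directory so it runs standalone; the branch name and role-file contents are illustrative, and in a real project you would substitute your own repo:

```shell
set -e
# Scratch repo standing in for your project.
DEMO=$(mktemp -d)
cd "$DEMO"
git init -q builder
cd builder
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
echo "# Builder Role" > AGENTS.md                 # builder's role file

# Second working directory from the same repo for the auditor.
git worktree add -q -b auditor-work ../auditor
echo "# Auditor Role" > ../auditor/AGENTS.md      # auditor's role file

head -n1 AGENTS.md ../auditor/AGENTS.md           # show both role files
```

Keeping each AGENTS.md untracked lets the two directories hold different role instructions while sharing everything else through git.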
AGENTS.md — Builder worktree
Codex uses AGENTS.md (equivalent to Claude Code's CLAUDE.md) for persistent role instructions. Each worktree gets its own AGENTS.md matching its role.
```markdown
# Builder Role

You are the builder. Your job is to build according to the plan.

## Write permissions

- plan.md — update progress, add questions and notes
- All source code files

## Read permissions

- comments.md — the auditor writes feedback here
- GOAL.md — the original goal document

## Rules

- NEVER write to comments.md — that belongs to the auditor
- Update the plan after completing each task
- Check comments.md before starting new work
- Run tests when the auditor specifies how to test
```
AGENTS.md — Auditor worktree
```markdown
# Auditor Role

You are the auditor. You review the builder's work.

## Write permissions

- comments.md — this is your ONLY writable file

## Read permissions

- plan.md — track builder progress and status
- All source code files — review implementation quality
- GOAL.md — compare against the original goal

## Rules

- NEVER write to plan.md, code, or any other file
- Be instructive — tell the builder clearly what to do
- Include testing instructions in your comments
- If things stall, decide: wait, instruct, or escalate
- If human approval is needed, say so and wait
```
hooks.json
Codex hooks use the same shell scripts; the JSON structure is nearly identical to the Claude Code version.

Builder:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": ".codex/hooks/build-a-plan.sh",
            "timeout": 960
          }
        ]
      }
    ]
  }
}
```

Auditor:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": ".codex/hooks/audit-a-plan.sh",
            "timeout": 960
          }
        ]
      }
    ]
  }
}
```
The hook scripts are identical to the Claude Code versions — they're standard bash. Just copy them into .codex/hooks/ and make them executable.
8. Running It
If you used the one-liner from the top of this page, your agent already knows all of this. It will walk you through the interview, create the files, set up the hooks, and tell you what to run. You just answer its questions.
If you're setting things up manually, here's the sequence:
Step by step
- Start a conversation with Claude or Codex. Tell it what you want to build. Let it interview you and create the goal document.
- Confirm access. The agent will ask you what repos, services, and credentials are available. Answer honestly — missing access discovered later is expensive.
- Review the plan. The agent writes plan.md. Read it. Push back on anything that doesn't match your vision. This is your last easy chance to course-correct.
- The agent sets up hooks. It creates the scripts and config files from the code examples on this page.
- Open two terminals:
Terminal 1 (builder):

```shell
# Claude Code
claude "Read GOAL.md and plan.md. Start building."

# Codex
codex "Read GOAL.md and plan.md. Start building."
```

Terminal 2 (auditor):

```shell
# Claude Code
claude "Read GOAL.md and plan.md. You are the auditor. Begin reviewing."

# Codex
codex "Read GOAL.md and plan.md. You are the auditor. Begin reviewing."
```
Both agents will work, finish, get intercepted by the hook, check the files, and loop. The builder waits for comments. The auditor waits for progress. They coordinate through the files. You intervene only when the auditor escalates.
Tips
- `--dangerously-skip-permissions` makes the builder fully autonomous (Claude Code). It's powerful but means no confirmation before destructive commands. Use it — but control access at the OS user level so the agent can't touch things it shouldn't.
- Run the auditor in a read-heavy mode — it mostly reads and only writes to one file.
- If you're running both on the same machine, they'll share the filesystem naturally. If on different machines, use a shared git repo and have both agents pull/push.
- The 15-minute interval is a starting point. Adjust the `sleep` value in the hook scripts to match your project's pace.
- For large projects, you can run multiple builders on independent modules. Each gets its own plan section.
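Retuning the interval is a one-line edit. The example below works on a scratch copy so it runs anywhere; point the same `sed` command at `.claude/hooks/*.sh` (or `.codex/hooks/*.sh`) for the real scripts. `sed -i.bak` keeps a backup and works with both GNU and BSD sed:

```shell
# Drop the polling interval from 15 minutes to 5 on a scratch copy.
HOOK=$(mktemp)
printf 'sleep 900\n' > "$HOOK"
sed -i.bak 's/sleep 900/sleep 300/' "$HOOK"
cat "$HOOK"   # now reads: sleep 300
```

If you change the sleep, keep the hook `timeout` in the settings (960 seconds in the examples above) comfortably larger than the new interval.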
Credits
The core insight: structure replaces supervision. Invest in the plan, enforce role separation through file permissions, and let the agents loop.
Dr. Mikko S. Niemelä — 2026
Last updated: March 31, 2026