Going Autonomous

Flow-next’s whole pipeline — spec → plan → review → work → review → PR — runs interactively today. Autonomy is the same pipeline with the human moved to the edges: you concentrate judgment in the spec and the readiness gate, and a loop executes the mechanical middle. Nothing about the quality bar changes — the same adversarial review gates fire, the same receipts get written — the loop just stops waiting for you between stages.

There are three loops, each owning a different slice of the lifecycle:

| Loop | Owns | Loop driver | Status |
| --- | --- | --- | --- |
| Pilot (the build loop) | ready spec → plan → plan-review → work → draft PR | your host’s /loop or /goal | shipped (1.13.0) |
| Land (the ship loop) | open PR → CI green → reviews resolved → merge → release | your host’s /loop cadence | shipped (1.14.0) |
| Ralph (the hardened harness) | a fully planned spec → work → reviews, at unattended scale (Ralph never plans; planning stays with you or pilot) | external ralph.sh shell loop | shipped |

Pilot + land are the default autonomy path — together they cover the whole lifecycle from blessed spec to merged release, driven entirely by your host’s loop primitives. Ralph is the hardened alternative for the work segment: it consumes specs that are already planned, trades in-session convenience for fresh-session isolation and enforced guard hooks, and is never nested with pilot (pilot refuses to run under FLOW_RALPH). Land picks up where either stops: at the open PR.

No loop selects work you haven’t blessed. The entry gate is the spec-level ready flag:

  • Local repos: flowctl spec ready fn-12 is the blessing; flowctl spec unready revokes it.
  • Tracker-connected repos: the board is the control plane. A Linear issue in your configured ready state (or a GitHub issue carrying the ready label) blesses the spec; moving it out starves the loop. See Readiness as the control plane.

A half-baked draft is never executed unattended, and a spec that stops advancing is automatically un-blessed (pilot’s two-strike guard) rather than retried forever.
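The two-strike guard can be pictured as a small ledger keyed by spec id. The sketch below is illustrative only: the `StrikeLedger` class, its method names, and the choice to reset the count when a spec advances are assumptions made for the example, not flow-next's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class StrikeLedger:
    """Hypothetical per-spec strike counter; not flow-next's real data model."""

    limit: int = 2  # "two-strike", per the text above
    strikes: dict[str, int] = field(default_factory=dict)
    ready: set[str] = field(default_factory=set)

    def bless(self, spec_id: str) -> None:
        # A human blesses the spec; old strikes are cleared (assumption).
        self.ready.add(spec_id)
        self.strikes.pop(spec_id, None)

    def record_tick(self, spec_id: str, advanced: bool) -> bool:
        """Record one tick's outcome; return True if the spec is still ready."""
        if advanced:
            # Assumed behavior: progress resets the strike count.
            self.strikes.pop(spec_id, None)
        else:
            self.strikes[spec_id] = self.strikes.get(spec_id, 0) + 1
            if self.strikes[spec_id] >= self.limit:
                # Un-bless instead of retrying forever.
                self.ready.discard(spec_id)
        return spec_id in self.ready
```

The point of the shape is that un-blessing is a state change a human has to reverse, so a stalled spec drops out of selection rather than consuming ticks.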

Pilot — drive the build loop from your session

One pilot invocation is one tick: select one ready spec, advance it one stage, end with a machine-greppable verdict line. Your host’s loop primitive owns repetition — pilot is the tick, not the runner.

Claude Code — /goal (drain the backlog, then stop)

Requires Claude Code v2.1.139+. /goal validators are transcript-blind, so the stop condition is phrased against the verdict grammar:

/goal keep running /flow-next:pilot until it prints PILOT_VERDICT=NO_WORK, or stop after 20 turns

Variants worth knowing:

/goal keep running /flow-next:pilot --review=codex until PILOT_VERDICT=NO_WORK or PILOT_VERDICT=NEEDS_HUMAN
/goal run /flow-next:pilot --spec fn-12 until it prints PILOT_VERDICT, then stop # one spec, one tick
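The stop clauses above amount to a bounded loop around a single-tick call. Here is a minimal sketch of that shape, assuming a hypothetical `run_tick` callable that stands in for one pilot invocation and returns the tick's last line. In practice the host's /goal owns the repetition; the `drain` helper and its fail-closed handling of unparseable output are assumptions for illustration.

```python
import re
from typing import Callable

VERDICT_RE = re.compile(r"^PILOT_VERDICT=([A-Z_]+)")


def drain(
    run_tick: Callable[[], str],
    stop_on: tuple[str, ...] = ("NO_WORK",),
    max_turns: int = 20,
) -> list[str]:
    """Call run_tick until a stop verdict appears or the turn cap is hit.

    run_tick is a hypothetical stand-in for one /flow-next:pilot invocation;
    it returns the last line of that tick's output.
    """
    seen: list[str] = []
    for _ in range(max_turns):
        match = VERDICT_RE.match(run_tick())
        verdict = match.group(1) if match else "UNPARSEABLE"
        seen.append(verdict)
        # Assumed design choice: stop rather than keep looping on output
        # that does not follow the verdict grammar.
        if verdict in stop_on or verdict == "UNPARSEABLE":
            break
    return seen
```

Passing `stop_on=("NO_WORK", "NEEDS_HUMAN")` mirrors the second variant above, and `max_turns` plays the role of the "stop after 20 turns" clause.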

Claude Code — /loop (cadence, keeps watch)

Requires Claude Code v2.1.72+ (loops expire after 7 days):

/loop 10m /flow-next:pilot

Every 10 minutes pilot takes one tick: if a spec is ready it advances one stage; if nothing qualifies it reports NO_WORK and the loop idles until you bless more work. That makes /loop plus the tracker board a standing pipeline: drag an issue to Todo in Linear, and the next tick picks it up.

On Codex, /goal is opt-in: add [features] goals = true to your Codex config (CLI ≥ 0.128.0). Codex has no $skill-in-goal syntax, so write a plain-text objective that names the behavior and the grammar:

/goal Run the flow-next pilot skill repeatedly: each run advances one ready spec by one
pipeline stage and ends with a PILOT_VERDICT line. Stop when it prints
PILOT_VERDICT=NO_WORK or PILOT_VERDICT=NEEDS_HUMAN.
  • Review backend: the rp backend runs headlessly via rp-cli — it just needs the Repo Prompt app running on the same Mac (cold start: open -ga "Repo Prompt", responsive within seconds; a stopped app fails fast with a clear error). On machines without the app — remote sessions, CI — use --review=codex, --review=copilot, or --review=none.
  • Budgets and caps live in the driver: /goal stop clauses, --tokens, /loop cadence. A pilot tick has no timeout machinery of its own.
  • Output contract: every tick ends with PILOT_VERDICT=<ADVANCED|NO_WORK|BLOCKED|NEEDS_HUMAN> spec=<id> stage=<stage> reason="…" as the last line, with the verification evidence (flowctl state transitions, the gh-confirmed PR URL) echoed above it — auditable from the transcript alone.
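Because the verdict is always the last line and follows a fixed key=value grammar, a wrapper script can read it without interpreting the rest of the transcript. The following is a minimal sketch of such a reader. The function name `parse_pilot_verdict` is hypothetical, and the sample values used with it (stage name, reason text) are made up for illustration.

```python
import shlex

PILOT_VERDICTS = {"ADVANCED", "NO_WORK", "BLOCKED", "NEEDS_HUMAN"}


def parse_pilot_verdict(transcript: str) -> dict[str, str]:
    """Parse the final non-empty line of a tick's transcript into its fields.

    Raises ValueError if that line is not a well-formed verdict line.
    """
    lines = [ln for ln in transcript.splitlines() if ln.strip()]
    if not lines or not lines[-1].startswith("PILOT_VERDICT="):
        raise ValueError("transcript does not end with a PILOT_VERDICT line")
    # shlex honours the double quotes around reason="…", so a reason
    # containing spaces stays a single token.
    fields = dict(token.split("=", 1) for token in shlex.split(lines[-1]))
    if fields["PILOT_VERDICT"] not in PILOT_VERDICTS:
        raise ValueError(f"unknown verdict: {fields['PILOT_VERDICT']}")
    return fields
```

For example, a transcript ending in `PILOT_VERDICT=ADVANCED spec=fn-12 stage=plan reason="plan written"` would yield a dict with the keys `PILOT_VERDICT`, `spec`, `stage`, and `reason`.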

Ralph predates pilot and remains the hardened option for the work segment: an external shell loop (ralph.sh) that spawns a fresh agent session per iteration, with PreToolUse guard hooks (enforced rails, not prose), receipt-based proof-of-work, and auto-block on stuck tasks. Two scope differences from pilot: Ralph consumes specs that are already planned (it iterates plan-review → work → impl-review → completion review; it never runs the planning fan-out), and it doesn’t depend on host loop primitives at all — it’s cron-able on a headless server. Where pilot lives inside your session and reports to the transcript, Ralph runs while you sleep and reports to disk.

Reach for Ralph when the run is long enough that fresh-session isolation matters (a multi-day backlog; /loop jobs also expire after 7 days), when you want hook-enforced guardrails rather than prose ones, or when there’s no interactive host to own the loop.

/flow-next:ralph-init # scaffold scripts/ralph/ once
./scripts/ralph/ralph.sh # loop until the spec ships or the cap hits

Pick by shape of the work:

| | Pilot | Ralph |
| --- | --- | --- |
| Scope | ready spec → plan → reviews → work → draft PR | fully planned spec → work → reviews (no planning) |
| Loop owner | Host /loop / /goal | External ralph.sh |
| Session | In-session ticks | Fresh per iteration |
| Proof-of-work | PILOT_VERDICT lines in the transcript | Receipts on disk |
| Guard hooks | None (FLOW_AUTONOMOUS, not FLOW_RALPH) | ralph-guard, DCG |
| Stuck handling | Two strikes → spec unready | Auto-block after N failures |
| Best for | In-session backlog draining, standing /loop pipelines | Overnight, unattended scale |

Full setup and guardrails: Ralph Overview, Autonomous Mode, Guardrails.

Pilot and Ralph stop at the draft PR — deliberately. Land is the third loop (shipped 1.14.0 — the first spec pilot drove end-to-end): a /loop-cadence babysitter for the PRs the build loop authored.

/loop 30m /flow-next:land

Per tick, for each open PR it owns:

  1. CI — red? Diagnose, fix, push (FIXING_CI). Bounded attempts (land.ciFixBudget); exhaustion durably labels the PR flow-next:needs-human and reports NEEDS_HUMAN.
  2. Reviews — wait out a patience window for automated reviewers (AWAITING_REVIEW), anchored to the last push.
  3. Resolve — new valid threads route through /flow-next:resolve-pr running autonomously, looping until no new reviews arrive.
  4. Merge — CI green + the configured review signal satisfied + threads addressed → flip the draft to ready and merge explicitly (--squash --match-head-commit, never --auto). No automated review and no signal configured? It never merges unreviewed.
  5. Close + release — close the spec, fire the opt-in tracker touchpoint, then follow the project’s own release instructions if they exist; otherwise stop at merge. No invented versioning, ever.
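The merge step in the list above reads as a conjunction of preconditions with one explicit refusal case. A minimal sketch of that decision follows, under stated assumptions: the `PrState` fields and the `may_merge` function are hypothetical names, and the branch that allows a merge when no review signal is configured but an automated review has happened is one reading of the text, not confirmed behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrState:
    """Hypothetical snapshot of one PR; field names are illustrative."""

    ci_green: bool
    review_signal_configured: bool
    review_signal_satisfied: bool
    automated_review_seen: bool
    open_threads: int


def may_merge(pr: PrState) -> bool:
    """Return True only when every merge precondition from the list holds."""
    if not pr.ci_green:
        return False
    if pr.open_threads > 0:
        return False
    if pr.review_signal_configured:
        return pr.review_signal_satisfied
    # No signal configured: require that an automated review at least
    # happened (assumed reading), so nothing is merged unreviewed.
    return pr.automated_review_seen
```

Keeping the gate a pure function of observed state matches the "never --auto" stance: the merge is an explicit action taken after the check, not something delegated to the host.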

Land is opt-in and isolated — it’s a separate skill, touches only PRs the build loop authored (branch match AND the make-pr breadcrumb — both signals required), and is the only place in flow-next licensed to auto-merge. Projects that don’t run it are unaffected. Like pilot, every tick ends with a terminal verdict line: LAND_VERDICT=<MERGED|RELEASED|FIXING_CI|AWAITING_REVIEW|RESOLVING|BLOCKED|NEEDS_HUMAN|NO_WORK> prs=<n> pr=<url|-> reason="…". Full gate tree, config keys, and the merge-gate license: Land — the Ship Loop.

Together the loops close the full lifecycle: board → pilot → draft PR → land → merged + released — with your judgment concentrated where it compounds: the spec and the blessing.

Pilot and land are designed to run concurrently — that’s the fully orchestrated pipeline: pilot builds spec N while land babysits spec N−1’s PR. Two topologies, with one rule that matters:

Same session, two loops — simplest, zero setup:

/loop 10m /flow-next:pilot --review=codex
/loop 30m /flow-next:land

Ticks serialize (a loop fires only while the session is idle), so a long pilot work-tick delays land’s cadence. Perfect for draining a small backlog in one sitting.

Two instances — the assembly line. Run pilot in one Claude Code / Codex instance and land in another, on a cadence, indefinitely. Each instance needs its own clone (or git worktree) of the repo — both loops mutate the working tree (pilot checks out spec branches; land checks out PR branches to fix CI), and two loops sharing one checkout would trip each other’s dirty-tree guards into NEEDS_HUMAN noise. With separate clones, GitHub is the shared state: land pushes the spec close after merging, pilot pulls the base branch before planning, and the strike ledgers are per-clone by design (they live under .git/, never committed).

clone A: /loop 10m /flow-next:pilot --review=codex # builds: ready spec → draft PR
clone B: /loop 30m /flow-next:land # ships: draft PR → merged + released
board: drag issues to your ready state to feed the front of the line

The loops never fight over work: land only touches PRs whose authoring spec has all tasks done (in-flight specs stay pilot’s), authorship needs both the branch match and the make-pr breadcrumb, and pilot skips specs that already have an open PR. The board (or flowctl spec ready) is the only throttle you need.
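The partition described above can be written as two predicates that can never both be true for the same spec. This is a sketch only: `SpecView` and its fields are a hypothetical joined view invented for the example, not a flowctl schema, and `expected_branch` stands in for however the branch match is actually computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpecView:
    """Hypothetical joined view of a spec and its PR; not a flowctl schema."""

    spec_id: str
    ready: bool
    all_tasks_done: bool
    open_pr_branch: Optional[str]  # None when no PR is open
    expected_branch: str           # branch the build loop would use (assumed)
    has_make_pr_breadcrumb: bool


def pilot_may_take(s: SpecView) -> bool:
    # Pilot needs the blessing and skips specs that already have an open PR.
    return s.ready and s.open_pr_branch is None


def land_may_take(s: SpecView) -> bool:
    # Land needs an open PR, finished tasks, and BOTH authorship signals.
    return (
        s.open_pr_branch is not None
        and s.all_tasks_done
        and s.open_pr_branch == s.expected_branch
        and s.has_make_pr_breadcrumb
    )
```

Since one predicate requires an open PR and the other requires its absence, the two loops cannot both claim the same spec, which is why no lock between the clones is needed.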

Hands-free is only useful if it can’t go off the rails. The same discipline applies across all three loops:

  • Readiness gate — loops select blessed work only; the human decision is structural, not skippable.
  • Same review gates — plan-review, impl-review, and spec-completion-review fire exactly as they do interactively; autonomy suppresses questions, never gates.
  • Draft-born PRs — autonomous runs always open PRs as drafts; flipping to ready is land’s gated job or yours.
  • Don’t-thrash — pilot’s two-strike spec unready, Ralph’s auto-block, land’s bounded CI fixes: every loop has a stop-digging reflex that hands the problem back instead of burning tokens.
  • Never nested — pilot hard-errors under the Ralph harness; the autonomy signal (mode:autonomous / FLOW_AUTONOMOUS) is deliberately distinct from FLOW_RALPH and activates none of Ralph’s hooks.
  • Evidence over narration — advancement is judged on observed state (flowctl fields, gh-confirmed PR URLs), echoed into the transcript or written as receipts. A loop never grades its own homework.
flowctl spec ready fn-12 # bless a spec (or move its issue to Todo on the board)
/loop 10m /flow-next:pilot --review=codex

Walk away. Come back to ADVANCED ticks in the transcript, a draft PR on the branch, and the board moved along — or an honest NEEDS_HUMAN telling you exactly where your judgment is needed.