Going Autonomous
Flow-next’s whole pipeline — spec → plan → review → work → review → PR — runs interactively today. Autonomy is the same pipeline with the human moved to the edges: you concentrate judgment in the spec and the readiness gate, and a loop executes the mechanical middle. Nothing about the quality bar changes — the same adversarial review gates fire, the same receipts get written — the loop just stops waiting for you between stages.
There are three loops, each owning a different slice of the lifecycle:
| Loop | Owns | Loop driver | Status |
|---|---|---|---|
| Pilot — the build loop | ready spec → plan → plan-review → work → draft PR | your host’s /loop or /goal | shipped (1.13.0) |
| Land — the ship loop | open PR → CI green → reviews resolved → merge → release | your host’s /loop cadence | shipped (1.14.0) |
| Ralph — the hardened harness | a fully planned spec → work → reviews, at unattended scale (Ralph never plans — planning stays with you or pilot) | external ralph.sh shell loop | shipped |
Pilot + land are the default autonomy path — together they cover the whole lifecycle from blessed spec to merged release, driven entirely by your host’s loop primitives. Ralph is the hardened alternative for the work segment: it consumes specs that are already planned, trades in-session convenience for fresh-session isolation and enforced guard hooks, and is never nested with pilot (pilot refuses to run under FLOW_RALPH). Land picks up where either stops: at the open PR.
The consent boundary
No loop selects work you haven’t blessed. The entry gate is the spec-level ready flag:
- Local repos: `flowctl spec ready fn-12` is the blessing; `flowctl spec unready` revokes it.
- Tracker-connected repos: the board is the control plane. A Linear issue in your configured ready state (or a GitHub issue carrying the ready label) blesses the spec; moving it out starves the loop. See Readiness as the control plane.
A half-baked draft is never executed unattended, and a spec that stops advancing is automatically un-blessed (pilot’s two-strike guard) rather than retried forever.
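The two-strike guard can be pictured as a small ledger. The sketch below is only an illustration: the `StrikeLedger` class, its field names, and the rule that progress clears earlier strikes are assumptions, not flow-next's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StrikeLedger:
    """Hypothetical two-strike guard; names and the reset rule are assumptions."""
    ready: Set[str] = field(default_factory=set)          # blessed spec ids
    strikes: Dict[str, int] = field(default_factory=dict)
    limit: int = 2

    def record(self, spec_id: str, advanced: bool) -> bool:
        """Record one tick's outcome; return True while the spec stays blessed."""
        if advanced:
            self.strikes[spec_id] = 0        # assumption: progress clears strikes
            return spec_id in self.ready
        self.strikes[spec_id] = self.strikes.get(spec_id, 0) + 1
        if self.strikes[spec_id] >= self.limit:
            self.ready.discard(spec_id)      # un-bless instead of retrying forever
        return spec_id in self.ready
```

With this shape, a spec that fails to advance on two consecutive ticks drops out of the ready set and stays out until a human blesses it again.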
Pilot — drive the build loop from your session
One pilot invocation is one tick: select one ready spec, advance it one stage, end with a machine-greppable verdict line. Your host’s loop primitive owns repetition: pilot is the tick, not the runner.
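To make the tick/runner split concrete, here is a rough sketch of what a runner does with a tick. It is not how any host implements `/goal` or `/loop`; the `drive` function, its `tick` callable, and the stop set are hypothetical, with the stop verdicts and the 20-turn cap borrowed from the `/goal` examples in this section.

```python
from typing import Callable, List

# Stop conditions borrowed from the /goal examples; the set itself is an assumption.
STOP_VERDICTS = {"NO_WORK", "NEEDS_HUMAN"}


def drive(tick: Callable[[], str], max_turns: int = 20) -> List[str]:
    """Call tick() until it reports a stop verdict or the turn cap is reached."""
    history: List[str] = []
    for _ in range(max_turns):
        status = tick()              # one tick: one spec, one stage, one verdict
        history.append(status)
        if status in STOP_VERDICTS:
            break
    return history


# A scripted stand-in for pilot: two advancing ticks, then an empty backlog.
scripted = iter(["ADVANCED", "ADVANCED", "NO_WORK"])
print(drive(lambda: next(scripted)))  # ['ADVANCED', 'ADVANCED', 'NO_WORK']
```

The tick never decides whether to run again; all budgets and stop clauses sit in the runner.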
Claude Code — /goal (drain the backlog, then stop)
Requires Claude Code v2.1.139+. /goal validators are transcript-blind, so the stop condition is phrased against the verdict grammar:
    /goal keep running /flow-next:pilot until it prints PILOT_VERDICT=NO_WORK, or stop after 20 turns

Variants worth knowing:

    /goal keep running /flow-next:pilot --review=codex until PILOT_VERDICT=NO_WORK or PILOT_VERDICT=NEEDS_HUMAN
    /goal run /flow-next:pilot --spec fn-12 until it prints PILOT_VERDICT, then stop   # one spec, one tick

Claude Code: /loop (cadence, keeps watch)
Requires Claude Code v2.1.72+ (loops expire after 7 days):
    /loop 10m /flow-next:pilot

Every 10 minutes pilot takes one tick: if a spec is ready it advances one stage; if nothing qualifies it reports NO_WORK and the loop idles until you bless more work. That makes /loop plus the tracker board a standing pipeline: drag an issue to Todo in Linear, and the next tick picks it up.
Codex — /goal
Opt-in: add [features] goals = true to your Codex config (CLI ≥ 0.128.0). Codex has no $skill-in-goal syntax, so write a plain-text objective that names the behavior and the grammar:
    /goal Run the flow-next pilot skill repeatedly: each run advances one ready spec by one
    pipeline stage and ends with a PILOT_VERDICT line. Stop when it prints
    PILOT_VERDICT=NO_WORK or PILOT_VERDICT=NEEDS_HUMAN.

Unattended runs
- Review backend: the `rp` backend runs headlessly via rp-cli. It just needs the Repo Prompt app running on the same Mac (cold start: `open -ga "Repo Prompt"`, responsive within seconds; a stopped app fails fast with a clear error). On machines without the app (remote sessions, CI), use `--review=codex`, `--review=copilot`, or `--review=none`.
- Budgets and caps live in the driver: `/goal` stop clauses, `--tokens`, `/loop` cadence. A pilot tick has no timeout machinery of its own.
- Output contract: every tick ends with `PILOT_VERDICT=<ADVANCED|NO_WORK|BLOCKED|NEEDS_HUMAN> spec=<id> stage=<stage> reason="…"` as the last line, with the verification evidence (flowctl state transitions, the gh-confirmed PR URL) echoed above it, so the run is auditable from the transcript alone.
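Because the verdict is always the last line, a wrapper script can read it without understanding the rest of the transcript. The following is a minimal sketch under stated assumptions: the regex is written from the grammar quoted above, `parse_pilot_verdict` is a hypothetical helper, and treating `spec`, `stage`, and `reason` as optional is a guess about lines such as a bare NO_WORK verdict.

```python
import re
from typing import Optional

# Written from the documented grammar; the optional fields are an assumption.
VERDICT_RE = re.compile(
    r'^PILOT_VERDICT=(?P<status>ADVANCED|NO_WORK|BLOCKED|NEEDS_HUMAN)'
    r'(?:\s+spec=(?P<spec>\S+))?'
    r'(?:\s+stage=(?P<stage>\S+))?'
    r'(?:\s+reason="(?P<reason>[^"]*)")?\s*$'
)


def parse_pilot_verdict(transcript: str) -> Optional[dict]:
    """Return the fields of the final verdict line, or None if the last line is not one."""
    lines = [ln for ln in transcript.splitlines() if ln.strip()]
    if not lines:
        return None
    match = VERDICT_RE.match(lines[-1].strip())
    return match.groupdict() if match else None
```

Only the last non-empty line is inspected, so evidence echoed higher up in the transcript cannot be mistaken for the verdict.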
Ralph — the hardened harness
Ralph predates pilot and remains the hardened option for the work segment: an external shell loop (ralph.sh) that spawns a fresh agent session per iteration, with PreToolUse guard hooks (enforced rails, not prose), receipt-based proof-of-work, and auto-block on stuck tasks. Two scope differences from pilot: Ralph consumes specs that are already planned (it iterates plan-review → work → impl-review → completion review; it never runs the planning fan-out), and it doesn’t depend on host loop primitives at all, so it’s cron-able on a headless server. Where pilot lives inside your session and reports to the transcript, Ralph runs while you sleep and reports to disk.
Reach for Ralph when the run is long enough that fresh-session isolation matters (a multi-day backlog; /loop jobs also expire after 7 days), when you want hook-enforced guardrails rather than prose ones, or when there’s no interactive host to own the loop.
    /flow-next:ralph-init          # scaffold scripts/ralph/ once
    ./scripts/ralph/ralph.sh       # loop until the spec ships or the cap hits

Pick by shape of the work:
| | Pilot | Ralph |
|---|---|---|
| Scope | ready spec → plan → reviews → work → draft PR | fully planned spec → work → reviews (no planning) |
| Loop owner | Host /loop / /goal | External ralph.sh |
| Session | In-session ticks | Fresh per iteration |
| Proof-of-work | PILOT_VERDICT lines in the transcript | Receipts on disk |
| Guard hooks | None (FLOW_AUTONOMOUS, not FLOW_RALPH) | ralph-guard, DCG |
| Stuck handling | Two strikes → spec unready | Auto-block after N failures |
| Best for | In-session backlog draining, standing /loop pipelines | Overnight, unattended scale |
Full setup and guardrails: Ralph Overview, Autonomous Mode, Guardrails.
Land — the ship loop
Pilot and Ralph stop at the draft PR, deliberately. Land is the third loop (shipped 1.14.0; the first spec pilot drove end-to-end): a /loop-cadence babysitter for the PRs the build loop authored.
    /loop 30m /flow-next:land

Per tick, for each open PR it owns:
- CI: red? Diagnose, fix, push (`FIXING_CI`). Bounded attempts (`land.ciFixBudget`); exhaustion durably labels the PR `flow-next:needs-human` and reports `NEEDS_HUMAN`.
- Reviews: wait out a patience window for automated reviewers (`AWAITING_REVIEW`), anchored to the last push.
- Resolve: new valid threads route through `/flow-next:resolve-pr` running autonomously, looping until no new reviews arrive.
- Merge: CI green + the configured review signal satisfied + threads addressed → flip the draft to ready and merge explicitly (`--squash --match-head-commit`, never `--auto`). No automated review and no signal configured? It never merges unreviewed.
- Close + release: close the spec, fire the opt-in tracker touchpoint, then follow the project’s own release instructions if they exist; otherwise stop at merge. No invented versioning, ever.
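Read top to bottom, the steps above behave like an ordered gate. The sketch below is one simplified reading of that ordering, not land's real gate tree: `PRState`, its field names, and `land_gate` are hypothetical, and escalating an unreviewed PR to NEEDS_HUMAN is an assumption (the text only says it is never merged).

```python
from dataclasses import dataclass


@dataclass
class PRState:
    """Hypothetical snapshot of one PR; every field name is an assumption."""
    ci_green: bool
    ci_fix_attempts: int
    ci_fix_budget: int             # stands in for land.ciFixBudget
    in_patience_window: bool       # still waiting on automated reviewers since the last push
    open_valid_threads: int
    review_signal_satisfied: bool


def land_gate(pr: PRState) -> str:
    """Return the verdict one tick would report for this PR under the simplified ordering."""
    if not pr.ci_green:
        if pr.ci_fix_attempts >= pr.ci_fix_budget:
            return "NEEDS_HUMAN"   # budget exhausted: hand the PR back
        return "FIXING_CI"
    if pr.in_patience_window:
        return "AWAITING_REVIEW"
    if pr.open_valid_threads > 0:
        return "RESOLVING"
    if not pr.review_signal_satisfied:
        return "NEEDS_HUMAN"       # assumption: unreviewed PRs are escalated, never merged
    return "MERGED"
```

The point of the ordering is that merge is the last branch reached: every earlier gate has to pass on the same tick before the draft is flipped.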
Land is opt-in and isolated — it’s a separate skill, touches only PRs the build loop authored (branch match AND the make-pr breadcrumb — both signals required), and is the only place in flow-next licensed to auto-merge. Projects that don’t run it are unaffected. Like pilot, every tick ends with a terminal verdict line: LAND_VERDICT=<MERGED|RELEASED|FIXING_CI|AWAITING_REVIEW|RESOLVING|BLOCKED|NEEDS_HUMAN|NO_WORK> prs=<n> pr=<url|-> reason="…". Full gate tree, config keys, and the merge-gate license: Land — the Ship Loop.
Together the loops close the full lifecycle: board → pilot → draft PR → land → merged + released — with your judgment concentrated where it compounds: the spec and the blessing.
Running the full pipeline
Pilot and land are designed to run concurrently. That’s the fully orchestrated pipeline: pilot builds spec N while land babysits spec N−1’s PR. Two topologies, with one rule that matters:
Same session, two loops — simplest, zero setup:
    /loop 10m /flow-next:pilot --review=codex
    /loop 30m /flow-next:land

Ticks serialize (a loop fires only while the session is idle), so a long pilot work-tick delays land’s cadence. Perfect for draining a small backlog in one sitting.
Two instances — the assembly line. Run pilot in one Claude Code / Codex instance and land in another, on a cadence, indefinitely. Each instance needs its own clone (or git worktree) of the repo — both loops mutate the working tree (pilot checks out spec branches; land checks out PR branches to fix CI), and two loops sharing one checkout would trip each other’s dirty-tree guards into NEEDS_HUMAN noise. With separate clones, GitHub is the shared state: land pushes the spec close after merging, pilot pulls the base branch before planning, and the strike ledgers are per-clone by design (they live under .git/, never committed).
    clone A:  /loop 10m /flow-next:pilot --review=codex   # builds: ready spec → draft PR
    clone B:  /loop 30m /flow-next:land                   # ships: draft PR → merged + released
    board:    drag issues to your ready state to feed the front of the line

The loops never fight over work: land only touches PRs whose authoring spec has all tasks done (in-flight specs stay pilot’s), authorship needs both the branch match and the make-pr breadcrumb, and pilot skips specs that already have an open PR. The board (or flowctl spec ready) is the only throttle you need.
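The ownership rules can be written as two predicates that never both hold for the same spec. This is a sketch of the rules as stated, with hypothetical `Spec` and `PullRequest` records; how flow-next actually detects a branch match or the make-pr breadcrumb is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    """Hypothetical spec record; field names are assumptions."""
    spec_id: str
    ready: bool
    all_tasks_done: bool
    has_open_pr: bool


@dataclass
class PullRequest:
    """Hypothetical PR record; both authorship signals are precomputed booleans."""
    spec: Spec
    branch_matches_spec: bool
    has_make_pr_breadcrumb: bool


def land_owns(pr: PullRequest) -> bool:
    """Land needs both authorship signals and a spec whose tasks are all done."""
    return pr.branch_matches_spec and pr.has_make_pr_breadcrumb and pr.spec.all_tasks_done


def pilot_may_select(spec: Spec) -> bool:
    """Pilot takes only blessed specs that do not already have an open PR."""
    return spec.ready and not spec.has_open_pr
```

Once a spec has an open PR, `pilot_may_select` is false for it, so the hand-off from pilot to land is one-way.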
The safety model
Hands-free is only useful if it can’t go off the rails. The same discipline applies across all three loops:
- Readiness gate: loops select blessed work only; the human decision is structural, not skippable.
- Same review gates: plan-review, impl-review, and spec-completion-review fire exactly as they do interactively; autonomy suppresses questions, never gates.
- Draft-born PRs: autonomous runs always open PRs as drafts; flipping to ready is land’s gated job or yours.
- Don’t-thrash: pilot’s two-strike `spec unready`, Ralph’s auto-block, and land’s bounded CI fixes give every loop a stop-digging reflex that hands the problem back instead of burning tokens.
- Never nested: pilot hard-errors under the Ralph harness; the autonomy signal (`mode:autonomous` / `FLOW_AUTONOMOUS`) is deliberately distinct from `FLOW_RALPH` and activates none of Ralph’s hooks.
- Evidence over narration: advancement is judged on observed state (flowctl fields, gh-confirmed PR URLs), echoed into the transcript or written as receipts. A loop never grades its own homework.
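The never-nested rule amounts to a check on two separate signals. The sketch below assumes both are environment variables and that any non-empty value counts as set; `check_not_nested` and `NestedLoopError` are hypothetical names, and the real pilot may detect the harness differently.

```python
from typing import Mapping


class NestedLoopError(RuntimeError):
    """Raised when a pilot-style tick is started under the Ralph harness."""


def check_not_nested(env: Mapping[str, str]) -> str:
    """Refuse to run under FLOW_RALPH; otherwise report which mode applies."""
    if env.get("FLOW_RALPH"):            # assumption: any non-empty value means "set"
        raise NestedLoopError("pilot does not run under the Ralph harness")
    # FLOW_AUTONOMOUS is a distinct signal and implies none of Ralph's hooks.
    return "autonomous" if env.get("FLOW_AUTONOMOUS") else "interactive"
```

In a real process the mapping would be `os.environ`; passing it in keeps the check easy to exercise.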
Start here
    flowctl spec ready fn-12                    # bless a spec (or move its issue to Todo on the board)
    /loop 10m /flow-next:pilot --review=codex

Walk away. Come back to ADVANCED ticks in the transcript, a draft PR on the branch, and the board moved along, or an honest NEEDS_HUMAN telling you exactly where your judgment is needed.