
Pilot

/flow-next:pilot is the in-session autonomous conductor: each invocation (a tick) advances exactly one ready spec by one pipeline stage — plan → plan-review → work → make-pr — and ends with a terminal PILOT_VERDICT line. Your agent host’s loop primitive (/loop or /goal) owns iteration; pilot is the tick, not the runner.

Human judgment lives before pilot: the spec content, its dependencies, and the human-owned ready flag are the consent boundary. Pilot executes the mechanical pipeline and reports ambiguity as NEEDS_HUMAN — it never asks questions.

/flow-next:pilot # one tick over the backlog
/flow-next:pilot --spec fn-12 # scope-lock to one spec
/flow-next:pilot --dry-run # selection + stage report, no dispatch
/flow-next:pilot --review=codex # passthroughs: --review, --research, --depth

Defaults: configured review backend, --research=grep, --depth=short. Branch handling is pilot-owned (it resolves the spec’s branch across ticks) — there is no branch flag.

  1. Select — first open + ready spec whose depends_on_epics are all done and that carries no other-actor task claims.
  2. Classify — derive one stage from flowctl state: no tasks → plan; plan not shipped → plan-review; ready/in-progress tasks → work; all done → probe PR state, then make-pr.
  3. Dispatch — invoke exactly one existing stage skill (plan, plan-review, work, make-pr) with mode:autonomous. Pilot never re-implements their logic.
  4. Verify — re-read flowctl review-status fields and task/spec transitions; for make-pr, a gh-confirmed new OPEN PR URL is the advancement evidence.
  5. Report — echo the evidence into the transcript and print the terminal verdict.
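The Select and Classify steps can be sketched as two pure functions. This is a minimal illustration under stated assumptions, not pilot's implementation: the `Spec` dataclass, its field names, and the status strings are hypothetical stand-ins for whatever flowctl actually stores, and the PR-state probe before make-pr is left out.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Spec:
    """Hypothetical snapshot of one spec; not flowctl's real schema."""
    id: str
    status: str = "open"
    ready: bool = False
    depends_on_epics: List[str] = field(default_factory=list)
    claimed_by_other: bool = False  # another actor holds a task claim
    task_statuses: List[str] = field(default_factory=list)
    plan_shipped: bool = False


def select(specs: Iterable[Spec], done_epics: Set[str]) -> Optional[Spec]:
    """Return the first open + ready spec with satisfied deps and no foreign claims."""
    for spec in specs:
        deps_done = all(dep in done_epics for dep in spec.depends_on_epics)
        if spec.status == "open" and spec.ready and deps_done and not spec.claimed_by_other:
            return spec
    return None  # nothing qualifies; the tick would report NO_WORK


def classify(spec: Spec) -> str:
    """Derive exactly one stage for this tick from the spec's state."""
    if not spec.task_statuses:
        return "plan"
    if not spec.plan_shipped:
        return "plan-review"
    if any(s in ("ready", "in_progress") for s in spec.task_statuses):
        return "work"
    if all(s == "done" for s in spec.task_statuses):
        return "make-pr"
    # Task states the text above does not describe are not guessed at here.
    raise ValueError(f"unclassifiable task states for {spec.id}")
```

Keeping both functions free of side effects mirrors what `--dry-run` reports: selection and stage, with no dispatch.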

Sub-skills run autonomously: the mode:autonomous token (plus FLOW_AUTONOMOUS=1 env for process-level drivers) suppresses questions and picks safe defaults — work branches deterministically, make-pr forces a draft PR and hard-errors instead of prompting. The signal is deliberately distinct from FLOW_RALPH, so none of the Ralph harness hooks activate.

Every tick ends with exactly one machine-greppable line — the last line of output, nothing after it:

PILOT_VERDICT=<ADVANCED|NO_WORK|BLOCKED|NEEDS_HUMAN> spec=<id> stage=<stage> reason="<one line>"
| Verdict | Meaning |
| --- | --- |
| ADVANCED | The selected spec moved one stage; evidence echoed above the line |
| NO_WORK | No ready spec qualifies; the driver should stop |
| BLOCKED | Healthy tick, but the stage did not advance (strike=1/2 or 2/2) |
| NEEDS_HUMAN | Dirty tree, closed-not-merged PR, gh failure, or genuine ambiguity; state left untouched |
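For a process-level driver that captures tick output, the grammar above can be parsed with one regular expression. This is a sketch, not part of flow-next: `parse_verdict` is a hypothetical helper, and it assumes the reason never contains an embedded double quote. `spec` and `stage` may be empty because the text does not say what they hold on a NO_WORK tick.

```python
import re
from typing import Dict, Optional

# Mirrors: PILOT_VERDICT=<...> spec=<id> stage=<stage> reason="<one line>"
VERDICT_RE = re.compile(
    r'^PILOT_VERDICT=(?P<verdict>ADVANCED|NO_WORK|BLOCKED|NEEDS_HUMAN)'
    r' spec=(?P<spec>\S*) stage=(?P<stage>\S*) reason="(?P<reason>[^"]*)"$'
)


def parse_verdict(transcript: str) -> Optional[Dict[str, str]]:
    """Parse the verdict from the last non-blank line, or return None."""
    lines = [ln for ln in transcript.splitlines() if ln.strip()]
    if not lines:
        return None
    # Only the final line counts: the verdict must have nothing after it.
    match = VERDICT_RE.match(lines[-1].strip())
    return match.groupdict() if match else None
```

Returning `None` when anything follows the verdict line enforces the "last line of output" rule instead of silently accepting a verdict buried mid-transcript.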

This shape exists because /goal validators are transcript-blind: they read conversation output only and never run tools. Pilot therefore echoes its verification evidence (flowctl fields, task counts, PR URL) into the transcript, and you phrase stop conditions against the grammar.

Claude Code /goal (v2.1.139+):

/goal keep running /flow-next:pilot until it prints PILOT_VERDICT=NO_WORK, or stop after 20 turns

Claude Code /loop (v2.1.72+; loops expire after 7 days):

/loop 10m /flow-next:pilot

Codex /goal — opt-in: add [features] goals = true to your Codex config (CLI ≥ 0.128.0). There is no $skill-in-goal syntax on Codex — write a plain-text objective that names the behavior and the grammar:

/goal Run the flow-next pilot skill repeatedly: each run advances one ready spec by one
pipeline stage and ends with a PILOT_VERDICT line. Stop when it prints
PILOT_VERDICT=NO_WORK or PILOT_VERDICT=NEEDS_HUMAN.

Iteration caps, budgets, and wall-clock limits belong to the driver (/goal stop clauses, --tokens, /loop cadence) — a tick has no timeout machinery.
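If the driver is a script rather than /goal or /loop, the same division of labor applies: the stop policy lives in the driver and reads only the verdict line. A minimal sketch, assuming a hypothetical `should_stop` helper; the 20-turn default simply echoes the /goal example above, and treating NEEDS_HUMAN as a stop follows the Codex objective.

```python
# Verdicts on which this sketch stops the loop.
STOP_VERDICTS = frozenset({"NO_WORK", "NEEDS_HUMAN"})


def should_stop(last_line: str, turn: int, max_turns: int = 20) -> bool:
    """Decide whether the driver should stop after this tick.

    last_line is the final line of the tick's output; turn counts
    completed ticks, starting at 1.
    """
    if turn >= max_turns:
        return True  # the iteration cap belongs to the driver, not the tick
    return any(last_line.startswith(f"PILOT_VERDICT={v}") for v in STOP_VERDICTS)
```

ADVANCED and BLOCKED keep the loop running in this sketch: a BLOCKED spec either advances on a later tick or is struck out of selection by pilot itself.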

A spec that fails to advance on two healthy ticks is taken out of selection: pilot clears its ready flag (flowctl spec unready) and the BLOCKED verdict carries the reason. Strikes live in a ledger under .git/ (shared across worktrees, never committable). Re-blessing the spec clears its strikes — an explicit human reset (flowctl spec ready, or the board move below on tracker-connected repos).
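The two-strike rule reduces to a small counter keyed by spec id. The sketch below keeps the ledger in a plain dict purely for illustration; the real ledger's file name and format under .git/ are not described here, and the function names are hypothetical.

```python
from typing import Dict, Tuple

MAX_STRIKES = 2  # two healthy ticks without advancement


def record_no_advance(ledger: Dict[str, int], spec_id: str) -> Tuple[int, bool]:
    """Add one strike; return (strike count, whether to unready the spec)."""
    strikes = ledger.get(spec_id, 0) + 1
    ledger[spec_id] = strikes
    return strikes, strikes >= MAX_STRIKES


def rebless(ledger: Dict[str, int], spec_id: str) -> None:
    """An explicit human re-bless clears the spec's strikes."""
    ledger.pop(spec_id, None)
```

A BLOCKED verdict at strike 1/2 leaves the spec selectable; at 2/2 the second return value tells the caller to clear the ready flag.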

Pilot consumes exactly one gate: the ready flag. Where that flag comes from depends on your setup, and it changes how you steer a running loop:

Local repos (no tracker, or tracker.readyState unset). flowctl spec ready / unready is authoritative. You bless work on the command line; pilot’s two-strike unready sticks until you re-bless the same way.

Tracker-connected repos (tracker.readyState configured). The board is the control plane. The readiness projection pulls one-way from the tracker: a Linear issue in the configured workflow state (or a GitHub issue carrying the configured label) means ready=true; anything else means ready=false. Move an issue into that state to feed pilot work; move it out to starve the loop. Local spec ready writes are overwritten on the next pull — bless on the board, not the CLI.

One interplay to know when driving pilot against a board: pilot’s two-strike spec unready is a local write, advisory until the board reflects it. If the issue still sits in the ready state, the next tracker pull re-readies the spec — and pilot reads a ready-again spec as human re-blessed, clearing its strikes and retrying. That’s the designed re-bless path when it’s deliberate; when it isn’t, it’s a slow retry loop. So when pilot strikes a spec out (BLOCKED … strike 2/2, spec unreadied), move the issue out of the ready state on the board. After fixing whatever blocked it, the re-bless is the reverse move — board back to ready, strikes clear on the next selection.
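The interplay is easier to see as a toy model. Everything here is assumed for illustration: the `Spec` fields, the `tracker_pull` function, and the state names in the usage note are hypothetical. The model also folds the strike reset into the pull, whereas the text above places it at the next selection.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    """Hypothetical local view of one tracker-linked spec."""
    ready: bool = True
    strikes: int = 0


def tracker_pull(spec: Spec, issue_state: str, ready_state: str) -> None:
    """One-way projection: the board state overwrites the local ready flag."""
    was_ready = spec.ready
    spec.ready = issue_state == ready_state
    if spec.ready and not was_ready:
        # A ready-again spec reads as human re-blessed, so strikes clear.
        spec.strikes = 0
```

With a struck-out spec (`ready=False, strikes=2`), a pull while the issue still sits in a made-up "Ready for Agent" state flips it back to ready with zero strikes, which is the slow retry loop. A pull after the issue moved to a made-up "Backlog" state leaves it unready with its strikes intact.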

This keeps a clean division of labor: the board decides what the loop may touch; pilot decides how far each tick advances it; verdicts report back in the transcript (and the tracker-sync lifecycle events mirror progress onto the issue, so the board view stays current while the loop runs).

Pilot and Ralph are alternative autonomous drivers — never nested (pilot refuses to run under FLOW_RALPH).

| | Ralph | Pilot |
| --- | --- | --- |
| Scope | fully planned spec → work → reviews (never plans) | ready spec → plan → reviews → work → draft PR |
| Loop owner | External shell script (ralph.sh) | Host /loop / /goal |
| Session | Fresh per iteration | In-session ticks |
| Proof-of-work | Receipts on disk | PILOT_VERDICT lines in the transcript |
| Guard hooks | ralph-guard, DCG | None (FLOW_AUTONOMOUS, not FLOW_RALPH) |
| Stuck handling | Auto-block after N failures | Two strikes → spec unready |
| Best for | Overnight, unattended scale | In-session backlog draining |

The rp review backend runs headlessly via rp-cli — it just needs the Repo Prompt app running on the same Mac (cold start: open -ga "Repo Prompt", responsive within seconds; a stopped app fails fast, never hangs). On machines without the app (remote/CI), use --review=codex, --review=copilot, or --review=none.

Pilot stops at the draft PR. From there, either let land (the ship loop) babysit it to merge and release (/loop 30m /flow-next:land; run both loops concurrently in separate clones for the full pipeline), or flip it to ready yourself and resolve feedback with:

/flow-next:resolve-pr