Pilot
/flow-next:pilot is the in-session autonomous conductor: each invocation (a tick) advances exactly one ready spec by one pipeline stage — plan → plan-review → work → make-pr — and ends with a terminal PILOT_VERDICT line. Your agent host’s loop primitive (/loop or /goal) owns iteration; pilot is the tick, not the runner.
Human judgment lives before pilot: the spec content, its dependencies, and the human-owned ready flag are the consent boundary. Pilot executes the mechanical pipeline and reports ambiguity as NEEDS_HUMAN — it never asks questions.
Invocation
    /flow-next:pilot                  # one tick over the backlog
    /flow-next:pilot --spec fn-12     # scope-lock to one spec
    /flow-next:pilot --dry-run        # selection + stage report, no dispatch
    /flow-next:pilot --review=codex   # passthroughs: --review, --research, --depth

Defaults: configured review backend, --research=grep, --depth=short. Branch handling is pilot-owned (it resolves the spec’s branch across ticks); there is no branch flag.
What a tick does
- Select: first open+ready spec whose depends_on_epics are all done and that carries no other-actor task claims.
- Classify: derive one stage from flowctl state (no tasks → plan; plan not shipped → plan-review; ready/in-progress tasks → work; all done → probe PR state, then make-pr).
- Dispatch: invoke exactly one existing stage skill (plan, plan-review, work, make-pr) with mode:autonomous. Pilot never re-implements their logic.
- Verify: re-read flowctl review-status fields and task/spec transitions; for make-pr, a gh-confirmed new OPEN PR URL is the advancement evidence.
- Report: echo the evidence into the transcript and print the terminal verdict.
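The Classify step above can be read as a small decision function. The following Python is a minimal sketch, not pilot's implementation: the SpecState fields (task_statuses, plan_shipped) and the status strings are hypothetical stand-ins for whatever flowctl actually reports, and the PR probe that precedes make-pr is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecState:
    """Hypothetical snapshot of one spec; field names are illustrative, not flowctl's."""
    task_statuses: List[str] = field(default_factory=list)  # e.g. "ready", "in_progress", "done"
    plan_shipped: bool = False  # assumed flag: plan review has shipped the plan


def classify(state: SpecState) -> str:
    """Map a spec snapshot to exactly one pipeline stage, in the order the list gives."""
    if not state.task_statuses:
        return "plan"          # no tasks yet
    if not state.plan_shipped:
        return "plan-review"   # tasks exist, plan not shipped
    if all(s == "done" for s in state.task_statuses):
        return "make-pr"       # all done (the real tick probes PR state first)
    return "work"              # anything not yet done is treated as work in this sketch
```

Because the checks run in a fixed order, a snapshot always yields exactly one stage, which matches the one-stage-per-tick contract.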
Sub-skills run autonomously: the mode:autonomous token (plus FLOW_AUTONOMOUS=1 env for process-level drivers) suppresses questions and picks safe defaults — work branches deterministically, make-pr forces a draft PR and hard-errors instead of prompting. The signal is deliberately distinct from FLOW_RALPH, so none of the Ralph harness hooks activate.
The verdict grammar
Every tick ends with exactly one machine-greppable line, the last line of output with nothing after it:

    PILOT_VERDICT=<ADVANCED|NO_WORK|BLOCKED|NEEDS_HUMAN> spec=<id> stage=<stage> reason="<one line>"

| Verdict | Meaning |
|---|---|
| ADVANCED | The selected spec moved one stage; evidence echoed above the line |
| NO_WORK | No ready spec qualifies; the driver should stop |
| BLOCKED | Healthy tick, but the stage did not advance (strike=1/2 or 2/2) |
| NEEDS_HUMAN | Dirty tree, closed-not-merged PR, gh failure, or genuine ambiguity; state left untouched |
This shape exists because /goal validators see only the transcript: they read conversation output and never run tools. Pilot therefore echoes its verification evidence (flowctl fields, task counts, PR URL) into the transcript, and you phrase stop conditions against the grammar.
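For tooling that reads the transcript, the grammar can be matched with a single regular expression. This is a sketch written against the grammar line in this section, not code shipped with pilot. It assumes spec and stage are space-free tokens (possibly empty, since what pilot prints there for NO_WORK is not specified here) and that reason contains no double quotes.

```python
import re
from typing import Optional

# Illustrative pattern derived from the documented grammar; not pilot's own code.
VERDICT_RE = re.compile(
    r'^PILOT_VERDICT=(?P<verdict>ADVANCED|NO_WORK|BLOCKED|NEEDS_HUMAN)'
    r' spec=(?P<spec>\S*) stage=(?P<stage>\S*) reason="(?P<reason>[^"]*)"$'
)


def parse_verdict(transcript: str) -> Optional[dict]:
    """Return the verdict fields if the last non-empty line matches the grammar, else None."""
    lines = [ln.strip() for ln in transcript.splitlines() if ln.strip()]
    if not lines:
        return None
    match = VERDICT_RE.match(lines[-1])
    return match.groupdict() if match else None
```

Checking only the final non-empty line enforces the "nothing after it" rule: a verdict followed by further output is treated as no verdict.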
Driving pilot
Claude Code /goal (v2.1.139+):

    /goal keep running /flow-next:pilot until it prints PILOT_VERDICT=NO_WORK, or stop after 20 turns

Claude Code /loop (v2.1.72+; loops expire after 7 days):

    /loop 10m /flow-next:pilot

Codex /goal (opt-in): add [features] goals = true to your Codex config (CLI ≥ 0.128.0). There is no $skill-in-goal syntax on Codex, so write a plain-text objective that names the behavior and the grammar:

    /goal Run the flow-next pilot skill repeatedly: each run advances one ready spec by one
    pipeline stage and ends with a PILOT_VERDICT line. Stop when it prints
    PILOT_VERDICT=NO_WORK or PILOT_VERDICT=NEEDS_HUMAN.

Iteration caps, budgets, and wall-clock limits belong to the driver (/goal stop clauses, --tokens, /loop cadence); a tick has no timeout machinery.
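A process-level driver would apply the same stop logic in code. The sketch below is only an illustration: run_tick is a hypothetical callable that returns one tick's transcript (how a script actually invokes pilot is not covered here), and the default cap of 20 simply echoes the /goal example.

```python
from typing import Callable

STOP_VERDICTS = ("NO_WORK", "NEEDS_HUMAN")  # same stop set as the Codex objective


def last_verdict(transcript: str) -> str:
    """Extract the verdict token from the final non-empty line, or "" if absent."""
    lines = [ln.strip() for ln in transcript.splitlines() if ln.strip()]
    prefix = "PILOT_VERDICT="
    if not lines or not lines[-1].startswith(prefix):
        return ""
    return lines[-1][len(prefix):].split(" ", 1)[0]


def drive(run_tick: Callable[[], str], max_turns: int = 20) -> str:
    """Call run_tick until a stop verdict, a missing verdict, or the turn cap."""
    for _ in range(max_turns):
        verdict = last_verdict(run_tick())
        if verdict == "":
            return "MISSING_VERDICT"  # treat a tick without a verdict as a reason to stop
        if verdict in STOP_VERDICTS:
            return verdict
    return "TURN_CAP"
```

The cap lives in the driver rather than in the tick, which mirrors the division described above.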
Don’t-thrash guard
A spec that fails to advance on two healthy ticks is taken out of selection: pilot clears its ready flag (flowctl spec unready) and the BLOCKED verdict carries the reason. Strikes live in a ledger under .git/ (shared across worktrees, never committable). Re-blessing the spec clears its strikes: an explicit human reset (flowctl spec ready, or the board move below on tracker-connected repos).
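The two-strike rule reduces to a small counter per spec. This sketch keeps the ledger in memory and uses hypothetical function names. The real ledger's format under .git/ is not described here, and the flowctl spec unready call is represented only by the returned flag.

```python
from typing import Dict

MAX_STRIKES = 2  # two healthy ticks without advancement


def record_tick(ledger: Dict[str, int], spec_id: str, advanced: bool) -> bool:
    """Update strikes after a healthy tick; return True when the spec should be unreadied."""
    if advanced:
        ledger.pop(spec_id, None)  # assumption: progress clears any earlier strike
        return False
    ledger[spec_id] = ledger.get(spec_id, 0) + 1
    return ledger[spec_id] >= MAX_STRIKES


def rebless(ledger: Dict[str, int], spec_id: str) -> None:
    """A human re-bless (spec marked ready again) clears the spec's strikes."""
    ledger.pop(spec_id, None)
```

Whether an intervening ADVANCED tick resets the count is an assumption of this sketch; the text only states that two non-advancing healthy ticks trigger the unready.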
Readiness as the control plane
Pilot consumes exactly one gate: the ready flag. Where that flag comes from depends on your setup, and it changes how you steer a running loop:
Local repos (no tracker, or tracker.readyState unset). flowctl spec ready / unready is authoritative. You bless work on the command line; pilot’s two-strike unready sticks until you re-bless the same way.
Tracker-connected repos (tracker.readyState configured). The board is the control plane. The readiness projection pulls one-way from the tracker: a Linear issue in the configured workflow state (or a GitHub issue carrying the configured label) means ready=true; anything else means ready=false. Move an issue into that state to feed pilot work; move it out to starve the loop. Local spec ready writes are overwritten on the next pull — bless on the board, not the CLI.
One interplay to know when driving pilot against a board: pilot’s two-strike spec unready is a local write, advisory until the board reflects it. If the issue still sits in the ready state, the next tracker pull re-readies the spec — and pilot reads a ready-again spec as human re-blessed, clearing its strikes and retrying. That’s the designed re-bless path when it’s deliberate; when it isn’t, it’s a slow retry loop. So when pilot strikes a spec out (BLOCKED … strike 2/2, spec unreadied), move the issue out of the ready state on the board. After fixing whatever blocked it, the re-bless is the reverse move — board back to ready, strikes clear on the next selection.
This keeps a clean division of labor: the board decides what the loop may touch; pilot decides how far each tick advances it; verdicts report back in the transcript (and the tracker-sync lifecycle events mirror progress onto the issue, so the board view stays current while the loop runs).
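The strike-out interplay on tracker-connected repos is easier to follow as state transitions. Every name in this sketch is hypothetical (pull_ready, after_pull, the "Ready" and "In Progress" state strings, the dict layout). It models only the one-way rule that the board state overwrites the local flag on each pull, and that a spec flipping back to ready is read as re-blessed.

```python
def pull_ready(issue_state: str, ready_state: str) -> bool:
    """One-way projection: ready only if the issue sits in the configured state."""
    return issue_state == ready_state


def after_pull(spec: dict, issue_state: str, ready_state: str = "Ready") -> dict:
    """Apply a tracker pull; a spec that flips back to ready counts as re-blessed."""
    ready = pull_ready(issue_state, ready_state)
    reblessed = ready and not spec["ready"]
    return {
        "ready": ready,
        "strikes": 0 if reblessed else spec["strikes"],  # re-bless clears strikes
    }


# Pilot struck the spec out locally, so the local flag is unready with two strikes.
struck_out = {"ready": False, "strikes": 2}
print(after_pull(struck_out, "Ready"))        # {'ready': True, 'strikes': 0}
print(after_pull(struck_out, "In Progress"))  # {'ready': False, 'strikes': 2}
```

The first call shows the slow retry loop when the issue is left in the ready state; the second shows the spec staying parked once the issue is moved out.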
Pilot vs Ralph
Pilot and Ralph are alternative autonomous drivers and are never nested (pilot refuses to run under FLOW_RALPH).

| | Ralph | Pilot |
|---|---|---|
| Scope | fully planned spec → work → reviews (never plans) | ready spec → plan → reviews → work → draft PR |
| Loop owner | External shell script (ralph.sh) | Host /loop / /goal |
| Session | Fresh per iteration | In-session ticks |
| Proof-of-work | Receipts on disk | PILOT_VERDICT lines in the transcript |
| Guard hooks | ralph-guard, DCG | None (FLOW_AUTONOMOUS, not FLOW_RALPH) |
| Stuck handling | Auto-block after N failures | Two strikes → spec unready |
| Best for | Overnight, unattended scale | In-session backlog draining |
Unattended runs
The rp review backend runs headlessly via rp-cli; it just needs the Repo Prompt app running on the same Mac (cold start: open -ga "Repo Prompt", responsive within seconds; a stopped app fails fast, never hangs). On machines without the app (remote/CI), use --review=codex, --review=copilot, or --review=none.
Next step
Pilot stops at the draft PR. From there, either let land (the ship loop) babysit it to merge and release (/loop 30m /flow-next:land; run both loops concurrently in separate clones for the full pipeline), or flip it to ready yourself and resolve feedback with:

    /flow-next:resolve-pr