Orchestration & Model Routing
flow-next is an orchestration layer, not a single-agent workflow. The host agent conducts: it fans work out to tiered subagents, routes reviews to a different model family than the writer, optionally delegates implementation to a second CLI agent, and runs autonomous build/ship loops. Which model does what is a routing decision — and every routing decision in flow-next is either a parameter or a sentence of intent away. The second kind carries judgment.
Already an orchestra
Section titled “Already an orchestra”The trend of mid-2026 — a frontier model orchestrating while cheaper/faster models do the token-heavy work — is flow-next’s default shape:
| What | Default routing |
|---|---|
| Orchestration, judgment, git | Host session model (your pick — e.g. Fable 5) |
| Planning scouts, gap analysis | Fast judgment tier (subagent tiers) |
| Readiness scanners | Cheapest tier — mechanical scan-and-report |
| Implementation workers | Inherit the session model |
| Plan/impl/completion reviews | A different model family via pluggable backends |
| Autonomous loops | Pilot + land ticks driven by /loop / /goal |
Two ways to route
Section titled “Two ways to route”Skills are prompts executed by the host agent, not compiled code. That gives you two genuinely different routing methodologies — use both:
| Deterministic — parameters | Prompted — agentic intelligence | |
|---|---|---|
| What it is | Config keys, flags, per-spec/per-task fields. Machine-resolved, same answer every time | Policy described in natural language. The host judges per item — conditionally, mid-run |
| Example | flowctl config set review.backend codex | ”Work the three ready specs — decide per spec, by complexity, whether implementation is delegated or stays on the session model” |
| Reach | Exactly the surfaces that ship | Anything the host can do — including capabilities that don’t exist as parameters |
| When it wins | Headless/Ralph runs, stable team defaults, reproducibility | Per-item complexity calls, conditional escalation, inventing a routing the registry doesn’t have |
The two compose: parameters set the floor, prompting steers above it. Either can be made durable in CLAUDE.md / AGENTS.md — the host reads your instruction files every session, and flow-next skills inherit them automatically.
Route the reviews
Section titled “Route the reviews”The review subsystem is the most routable surface — spec grammar backend[:model[:effort]] over rp | codex | copilot | cursor | none:
flowctl config set review.backend codex # project defaultflowctl config set review.backend cursor:composer-2.5 # Cursor-billed, folds effort into the modelflowctl config set review.backend codex:gpt-5.4:xhigh # explicit model + effortPrecedence, highest first: per-task review: / per-spec default_review → FLOW_REVIEW_BACKEND → config → backend env → default. A single task can pin its own reviewer and the override routes end-to-end. The cursor backend reaches reviewer models the others can’t (gpt-5.5-high at 1M context, composer-2.5, claude-opus-4-8-thinking-high) on your existing subscription. Details: review workflow, configuration.
Delegate the implementation
Section titled “Delegate the implementation”Opt-in: /flow-next:work offloads the token-heavy part (writing code) to a local codex exec while the host keeps every judgment call — gating, classification, git, review, commit:
flowctl config set work.delegate codex # or per-run: /flow-next:work fn-12 delegate:codexflowctl config set work.delegateModel gpt-5.5OFF by default, one-time consent-gated, circuit-breakered, and independently verified — a delegated diff is never trusted on the delegate’s own summary. See work.
Prompted orchestration — routing with judgment
Section titled “Prompted orchestration — routing with judgment”This is the mode parameters can’t reach: routing policy that’s conditional and per-item, decided against the actual work rather than fixed up front.
Per-item complexity routing — the host classifies, then routes:
Work through the three ready specs. Decide per spec, based on complexity,how the work stage runs: anything touching auth or the migration youimplement yourself on the session model; plain CRUD is delegated to codex(delegate:codex). Reviews come from codex either way.Focus and scope steering — instruction the skill never anticipated, read as intent:
/flow-next:plan fn-12 --depth=deep — focus the research on the migration path; I care about rollback/flow-next:interview fn-12 — push hard on failure modes and operational edges, skip UI polish/flow-next:work fn-12 — the UI tasks stay with you; delegate the API plumbing to codexConditional escalation — routing that reacts to outcomes:
Run /flow-next:work fn-12 with delegate:codex. If a task's review comes backNEEDS_WORK twice, stop delegating that task and implement it yourself on thesession model.Prompting a capability into existence — there is no fable review backend in the registry; describe it and it exists:
/flow-next:plan-review fn-12 — don't use the configured backend; spawn afresh-context subagent on the session model with the same review criteria,and feed its verdict into the fix loop like any other reviewer.Backends, reviewers, and delegates are prompts plus plumbing — when a rung you want is missing, the host builds the arrangement on the spot.
Field patterns, mapped to flow-next
Section titled “Field patterns, mapped to flow-next”The orchestration patterns that emerged in the wild through mid-2026 all have a direct flow-next expression — most need one config key or one sentence:
| Pattern from the field | The idea | flow-next expression |
|---|---|---|
| Orchestrator → executor | The frontier model plans and judges; a cheaper, highly steerable model (GPT-5.5 via the Codex CLI, on the sub you already pay for) writes the code | flowctl config set work.delegate codex — or per-run delegate:codex. Host keeps gating/git/review |
| Orchestrator → reader | Token-hungry, low-judgment reads run on fast models that report summaries back | Already the default: scouts run on fast tiers and return digests |
| Cross-family reviewer | The model that writes is never the model that reviews | review.backend codex / cursor:composer-2.5; per-task review: pins exceptions |
| Effort discipline | Orchestrator at high, not max — top effort tiers are token furnaces on routine work | Session effort is yours; work.delegateEffort floors the delegate |
| Token-hungry offload | Computer use and live-app verification go to other agents; results come back as evidence | /flow-next:qa drives the app in its own context; workers run fresh-context and return receipts |
A model table in CLAUDE.md
Section titled “A model table in CLAUDE.md”The emergent durable pattern: a standing “which model for what” section in your agent instructions — a ranking of the models you can reach plus routing rules. This is prompted orchestration made durable: the table is interpreted by intelligence, not parsed by a config loader. The host applies it with judgment whenever it dispatches subagents, picks reviewers, or decides to delegate — which is exactly why the rules grant standing permission to escalate. A complete, copy-paste starting point, adapted to the flow-next pipeline:
## Picking models for flow-next workflows and subagents
Rankings, higher = better. Cost reflects what I actually pay (existingsubscriptions), not list price. Intelligence is how hard a problem you canhand the model unsupervised. Taste covers UI/UX, code quality, API design, copy.
| model | cost | intelligence | taste ||---------------------------|------|--------------|-------|| gpt-5.5 (codex CLI) | 9 | 8 | 5 || composer-2.5 (cursor CLI) | 9 | 6 | 6 || sonnet-5 | 5 | 7 | 7 || fable-5 (session model) | 2 | 10 | 9 |
How to apply:- These are defaults, not limits. Standing permission to override: if a cheaper model's output doesn't meet the bar, rerun or redo with a smarter model without asking. Judge the output, not the price tag — escalating costs less than shipping mediocre work.- Cost is a tie-breaker only; for anything that ships, intelligence > taste > cost.- Orchestration, planning, review verdicts, anything ambiguous: session model. /flow-next:plan, /flow-next:interview, and pilot/land driving stay here — never delegate judgment.- Bulk/mechanical implementation (clear spec, low ambiguity): delegate to gpt-5.5 — /flow-next:work <id> delegate:codex. Config: work.delegateModel=gpt-5.5, work.delegateEffort=medium.- Anything user-facing (UI, copy, API design) needs taste >= 7 — keep those tasks on the session model even when they look mechanical.- Reviews route to a different family than the writer: review.backend=codex (or cursor:composer-2.5 for speed). Escalate NEEDS_WORK disagreements between reviewer and worker to the session model.- Token-hungry, low-judgment work (codebase analysis, live-app QA driving): subagents and flow-next scouts — summaries come back, the orchestrator never holds the raw tokens.- Mechanics: gpt-5.5 is reached through the Codex CLI (delegate:codex spawns codex exec); composer-2.5 through cursor-agent (review backend cursor:composer-2.5). Claude-family models run natively as subagent tiers.Role labels are durable; model IDs are volatile. Write the table in terms of roles, re-rank as the frontier moves, and the routing rules survive every model generation.
Chain the loops
Section titled “Chain the loops”Pilot and land end every tick with machine-readable verdicts so a driver can compose them — a multi-model spec-to-merged-PR pipeline in one prompt:
/loop 30m — one tick: run /flow-next:pilot --review=codex --depth=deep. If PILOT_VERDICT=DEFERRED_TO_LAND, run /flow-next:land in the same tick. Delegation is on (work.delegate=codex): mechanical tasks go to gpt-5.5, UI tasks stay on the session model, reviews come from codex. Stop when pilot prints NO_WORK and land prints LAND_VERDICT=NO_WORK, or on any NEEDS_HUMAN.DEFERRED_TO_LAND exists exactly for this hand-off. For the hardened overnight harness, see Ralph.
In your repo
Section titled “In your repo”Newer flow-next versions ship this page’s substance into the repo you’re working in, so the host agent finds it at use time without visiting these docs:
.flow/usage.md(installed by setup, read every session) carries an## Orchestration & model steeringsection: dogfooded headless-bridge recipes forcodex exec,cursor-agent, and the reverse direction —claude -p, plus the flow-next shortcuts (delegate:codex,review.backend, per-taskreview:) and prompted-orchestration examples. The bridges run in every direction: from Claude Code the bridges arecodex exec/cursor-agent; from Codex or Cursor they areclaude -p/the other CLI. Any harness that can run Bash can be the conductor.CLAUDE.md/AGENTS.mdcan hold the durable model-routing table above:/flow-next:setupoffers, as an optional ceremony step, to scaffold it live — probe-annotated for the CLIs you actually have installed, shown in full before writing, yours to edit after, marker-fenced so/flow-next:uninstallremoves it cleanly.
On older installs, copy the table from this page instead — the behavior is identical; the scaffold is a convenience.
What never routes away
Section titled “What never routes away”- Judgment stays with the host — a delegated
codex execwrites code; it never owns git, task state, or decisions. - Consent gates don’t route away — delegation stays opt-in; blast radius surfaced before first use.
- Merge is human-gated everywhere except the explicitly opted-in land loop.
- Verification is independent — tests re-run on every delegated diff.
- Escalation beats thrift — when you downgrade a role, watch the first outputs and revert on the first quality miss.