Orchestration & Model Routing

flow-next is an orchestration layer, not a single-agent workflow. The host agent conducts: it fans work out to tiered subagents, routes reviews to a different model family than the writer, optionally delegates implementation to a second CLI agent, and runs autonomous build/ship loops. Which model does what is a routing decision, and every routing decision in flow-next is either one parameter or one sentence of intent away. Parameters resolve mechanically; a sentence of intent is applied with the host’s judgment.

Already an orchestra

The trend of mid-2026 — a frontier model orchestrating while cheaper/faster models do the token-heavy work — is flow-next’s default shape:

What	Default routing
Orchestration, judgment, git	Host session model (your pick — e.g. Fable 5)
Planning scouts, gap analysis	Fast judgment tier (subagent tiers)
Readiness scanners	Cheapest tier — mechanical scan-and-report
Implementation workers	Inherit the session model
Plan/impl/completion reviews	A different model family via pluggable backends
Autonomous loops	Pilot + land ticks driven by `/loop` / `/goal`

Two ways to route

Skills are prompts executed by the host agent, not compiled code. That gives you two genuinely different routing methodologies — use both:

	Deterministic — parameters	Prompted — agentic intelligence
What it is	Config keys, flags, per-spec/per-task fields. Machine-resolved, same answer every time	Policy described in natural language. The host judges per item — conditionally, mid-run
Example	`flowctl config set review.backend codex`	“Work the three ready specs; decide per spec, by complexity, whether implementation is delegated or stays on the session model”
Reach	Exactly the surfaces that ship	Anything the host can do — including capabilities that don’t exist as parameters
When it wins	Headless/Ralph runs, stable team defaults, reproducibility	Per-item complexity calls, conditional escalation, inventing a routing the registry doesn’t have

The two compose: parameters set the floor, prompting steers above it. Either can be made durable in CLAUDE.md / AGENTS.md — the host reads your instruction files every session, and flow-next skills inherit them automatically.

Route the reviews

The review subsystem is the most routable surface. A reviewer is named with the spec grammar backend[:model[:effort]], where backend is one of rp | codex | copilot | cursor | none:

flowctl config set review.backend codex                  # project default
flowctl config set review.backend cursor:composer-2.5   # Cursor-billed, folds effort into the model
flowctl config set review.backend codex:gpt-5.4:xhigh   # explicit model + effort
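
To make the backend[:model[:effort]] grammar concrete, here is a minimal Python sketch of how such a spec string could be split. It is not flow-next's actual parser: `ReviewSpec` and `parse_review_spec` are hypothetical names, and the only facts taken from this page are the grammar and the five backend names.

```python
from typing import NamedTuple, Optional

# Backend names as listed on this page.
BACKENDS = {"rp", "codex", "copilot", "cursor", "none"}


class ReviewSpec(NamedTuple):
    """Hypothetical container for a parsed review spec."""
    backend: str
    model: Optional[str] = None
    effort: Optional[str] = None


def parse_review_spec(spec: str) -> ReviewSpec:
    """Split backend[:model[:effort]] into its parts (illustrative only)."""
    parts = spec.strip().split(":")
    # One to three non-empty fields: backend, optional model, optional effort.
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError(f"expected backend[:model[:effort]], got {spec!r}")
    if parts[0] not in BACKENDS:
        raise ValueError(f"unknown backend {parts[0]!r}")
    return ReviewSpec(*parts)


print(parse_review_spec("codex"))
print(parse_review_spec("cursor:composer-2.5"))
print(parse_review_spec("codex:gpt-5.4:xhigh"))
```

Model and effort stay `None` when omitted, which matches the cursor example above where the effort is folded into the model name rather than given as a third field.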

Precedence, highest first: per-task review: / per-spec default_review → FLOW_REVIEW_BACKEND → config → backend env → default. A single task can pin its own reviewer and the override routes end-to-end. The cursor backend reaches reviewer models the others can’t (gpt-5.5-high at 1M context, composer-2.5, claude-opus-4-8-thinking-high) on your existing subscription. Details: review workflow, configuration.
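
The precedence chain can be read as "first source that is set wins". The Python sketch below shows that reading under stated assumptions: the function and its parameter names are hypothetical, per-task is checked before per-spec (the text lists them on the same rung), and the final default is a placeholder passed in by the caller, since this page does not name it.

```python
from typing import Mapping, Optional


def resolve_review_backend(
    task_review: Optional[str],
    spec_default_review: Optional[str],
    env: Mapping[str, str],
    config_value: Optional[str],
    backend_env_value: Optional[str],
    default: str,
) -> str:
    """Return the first review spec that is set, highest precedence first."""
    candidates = [
        task_review,                     # per-task review:
        spec_default_review,             # per-spec default_review
        env.get("FLOW_REVIEW_BACKEND"),  # environment override
        config_value,                    # review.backend from config
        backend_env_value,               # backend env (assumed: one value)
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    return default


print(resolve_review_backend(
    task_review=None,
    spec_default_review=None,
    env={"FLOW_REVIEW_BACKEND": "cursor:composer-2.5"},
    config_value="codex",
    backend_env_value=None,
    default="none",  # placeholder default, not confirmed by this page
))  # cursor:composer-2.5
```

With a per-task value present, the same call would return that value regardless of the environment or config, which is the "pin its own reviewer" behavior described above.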

Delegate the implementation

Opt-in: /flow-next:work offloads the token-heavy part (writing code) to a local codex exec while the host keeps every judgment call — gating, classification, git, review, commit:

flowctl config set work.delegate codex        # or per-run: /flow-next:work fn-12 delegate:codex
flowctl config set work.delegateModel gpt-5.5

Delegation is off by default, gated by a one-time consent, protected by a circuit breaker, and independently verified: a delegated diff is never trusted on the delegate’s own summary. See work.

Prompted orchestration — routing with judgment

This is the mode parameters can’t reach: routing policy that’s conditional and per-item, decided against the actual work rather than fixed up front.

Per-item complexity routing — the host classifies, then routes:

Work through the three ready specs. Decide per spec, based on complexity,
how the work stage runs: anything touching auth or the migration you
implement yourself on the session model; plain CRUD is delegated to codex
(delegate:codex). Reviews come from codex either way.

Focus and scope steering — instruction the skill never anticipated, read as intent:

/flow-next:plan fn-12 --depth=deep — focus the research on the migration path; I care about rollback
/flow-next:interview fn-12 — push hard on failure modes and operational edges, skip UI polish
/flow-next:work fn-12 — the UI tasks stay with you; delegate the API plumbing to codex

Conditional escalation — routing that reacts to outcomes:

Run /flow-next:work fn-12 with delegate:codex. If a task's review comes back
NEEDS_WORK twice, stop delegating that task and implement it yourself on the
session model.
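
The host applies this rule as judgment, not as code, but the bookkeeping behind it is simple. As a rough Python sketch of the rule in the prompt above: count NEEDS_WORK verdicts per task and stop delegating a task once it has two. `DelegationTracker`, the task IDs, and the `SHIP` verdict string are hypothetical; only NEEDS_WORK and the threshold of two come from the prompt.

```python
from collections import Counter

ESCALATE_AFTER = 2  # "NEEDS_WORK twice", from the prompt above


class DelegationTracker:
    """Hypothetical per-task counter for the escalation rule."""

    def __init__(self, limit: int = ESCALATE_AFTER) -> None:
        self.limit = limit
        self.needs_work: Counter = Counter()

    def record(self, task_id: str, verdict: str) -> None:
        # Only NEEDS_WORK verdicts count toward escalation.
        if verdict == "NEEDS_WORK":
            self.needs_work[task_id] += 1

    def route(self, task_id: str) -> str:
        # Tasks at or over the limit return to the session model.
        if self.needs_work[task_id] >= self.limit:
            return "session"
        return "delegate:codex"


tracker = DelegationTracker()
tracker.record("fn-12.3", "NEEDS_WORK")
tracker.record("fn-12.3", "SHIP")        # hypothetical passing verdict
print(tracker.route("fn-12.3"))          # delegate:codex
tracker.record("fn-12.3", "NEEDS_WORK")
print(tracker.route("fn-12.3"))          # session
print(tracker.route("fn-12.4"))          # delegate:codex
```

The escalation is per task: one task going back to the session model leaves delegation on for the rest of the spec.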

Prompting a capability into existence — there is no fable review backend in the registry; describe it and it exists:

/flow-next:plan-review fn-12 — don't use the configured backend; spawn a
fresh-context subagent on the session model with the same review criteria,
and feed its verdict into the fix loop like any other reviewer.

Backends, reviewers, and delegates are prompts plus plumbing — when a rung you want is missing, the host builds the arrangement on the spot.

Field patterns, mapped to flow-next

The orchestration patterns that emerged in the wild through mid-2026 all have a direct flow-next expression — most need one config key or one sentence:

Pattern from the field	The idea	flow-next expression
Orchestrator → executor	The frontier model plans and judges; a cheaper, highly steerable model (GPT-5.5 via the Codex CLI, on the subscription you already pay for) writes the code	`flowctl config set work.delegate codex`, or per-run `delegate:codex`. Host keeps gating/git/review
Orchestrator → reader	Token-hungry, low-judgment reads run on fast models that report summaries back	Already the default: scouts run on fast tiers and return digests
Cross-family reviewer	The model that writes is never the model that reviews	`review.backend codex` / `cursor:composer-2.5`; per-task `review:` pins exceptions
Effort discipline	Orchestrator at high, not max — top effort tiers are token furnaces on routine work	Session effort is yours; `work.delegateEffort` floors the delegate
Token-hungry offload	Computer use and live-app verification go to other agents; results come back as evidence	`/flow-next:qa` drives the app in its own context; workers run fresh-context and return receipts

A model table in CLAUDE.md

The emergent durable pattern: a standing “which model for what” section in your agent instructions — a ranking of the models you can reach plus routing rules. This is prompted orchestration made durable: the table is interpreted by intelligence, not parsed by a config loader. The host applies it with judgment whenever it dispatches subagents, picks reviewers, or decides to delegate — which is exactly why the rules grant standing permission to escalate. A complete, copy-paste starting point, adapted to the flow-next pipeline:

## Picking models for flow-next workflows and subagents

Rankings, higher = better. Cost reflects what I actually pay (existing
subscriptions), not list price. Intelligence is how hard a problem you can
hand the model unsupervised. Taste covers UI/UX, code quality, API design, copy.

| model                     | cost | intelligence | taste |
|---------------------------|------|--------------|-------|
| gpt-5.5 (codex CLI)       | 9    | 8            | 5     |
| composer-2.5 (cursor CLI) | 9    | 6            | 6     |
| sonnet-5                  | 5    | 7            | 7     |
| fable-5 (session model)   | 2    | 10           | 9     |

How to apply:
- These are defaults, not limits. Standing permission to override: if a
  cheaper model's output doesn't meet the bar, rerun or redo with a smarter
  model without asking. Judge the output, not the price tag — escalating
  costs less than shipping mediocre work.
- Cost is a tie-breaker only; for anything that ships, intelligence > taste > cost.
- Orchestration, planning, review verdicts, anything ambiguous: session
  model. /flow-next:plan, /flow-next:interview, and pilot/land driving stay
  here — never delegate judgment.
- Bulk/mechanical implementation (clear spec, low ambiguity): delegate to
  gpt-5.5 — /flow-next:work <id> delegate:codex. Config:
  work.delegateModel=gpt-5.5, work.delegateEffort=medium.
- Anything user-facing (UI, copy, API design) needs taste >= 7 — keep those
  tasks on the session model even when they look mechanical.
- Reviews route to a different family than the writer:
  review.backend=codex (or cursor:composer-2.5 for speed). Escalate
  NEEDS_WORK disagreements between reviewer and worker to the session model.
- Token-hungry, low-judgment work (codebase analysis, live-app QA driving):
  subagents and flow-next scouts — summaries come back, the orchestrator
  never holds the raw tokens.
- Mechanics: gpt-5.5 is reached through the Codex CLI (delegate:codex spawns
  codex exec); composer-2.5 through cursor-agent (review backend
  cursor:composer-2.5). Claude-family models run natively as subagent tiers.

Role labels are durable; model IDs are volatile. Write the table in terms of roles, re-rank as the frontier moves, and the routing rules survive every model generation.
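
The table is meant to be read with judgment, not parsed. Still, two of its rules are mechanical enough to spell out: "intelligence > taste > cost" for anything that ships, and "taste >= 7" for user-facing work. The Python sketch below encodes only those two rules, using the example rankings from the table. `Model` and `rank_for_shipping` are hypothetical names, and the separate rule that sends bulk mechanical work to the delegate is deliberately not modeled.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    cost: int          # higher = cheaper for you, as in the table
    intelligence: int
    taste: int


# Example rankings copied from the table above.
TABLE = [
    Model("gpt-5.5", 9, 8, 5),
    Model("composer-2.5", 9, 6, 6),
    Model("sonnet-5", 5, 7, 7),
    Model("fable-5", 2, 10, 9),
]


def rank_for_shipping(models: List[Model], min_taste: int = 0) -> List[Model]:
    """Filter by a taste floor, then order by intelligence, taste, cost."""
    eligible = [m for m in models if m.taste >= min_taste]
    # Cost comes last in the key, so it only breaks ties.
    return sorted(
        eligible,
        key=lambda m: (m.intelligence, m.taste, m.cost),
        reverse=True,
    )


print([m.name for m in rank_for_shipping(TABLE)])
# ['fable-5', 'gpt-5.5', 'sonnet-5', 'composer-2.5']
print([m.name for m in rank_for_shipping(TABLE, min_taste=7)])
# ['fable-5', 'sonnet-5']
```

With the taste floor applied, both delegate candidates drop out, which is why the table keeps user-facing tasks on the session model even when they look mechanical.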

Chain the loops

Pilot and land end every tick with machine-readable verdicts so a driver can compose them — a multi-model spec-to-merged-PR pipeline in one prompt:

/loop 30m — one tick: run /flow-next:pilot --review=codex --depth=deep.
  If PILOT_VERDICT=DEFERRED_TO_LAND, run /flow-next:land in the same tick.
  Delegation is on (work.delegate=codex): mechanical tasks go to gpt-5.5,
  UI tasks stay on the session model, reviews come from codex.
  Stop when pilot prints NO_WORK and land prints LAND_VERDICT=NO_WORK,
  or on any NEEDS_HUMAN.

DEFERRED_TO_LAND exists exactly for this hand-off. For the hardened overnight harness, see Ralph.
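
In the prompt above the host itself reads the verdicts. For a scripted driver, the same stop and hand-off logic could look like the Python sketch below. It assumes each verdict appears on its own `KEY=VALUE` line and that pilot's "NO_WORK" is printed as `PILOT_VERDICT=NO_WORK`; neither the exact output format nor the function names are confirmed by this page.

```python
import re
from typing import Optional

# Assumed format: one KEY=VALUE verdict per line in the command output.
VERDICT_RE = re.compile(
    r"^[ \t]*(PILOT_VERDICT|LAND_VERDICT)=([A-Z_]+)[ \t]*$", re.MULTILINE
)


def read_verdict(output: str, key: str) -> Optional[str]:
    """Return the last verdict printed for `key`, or None if absent."""
    matches = [value for name, value in VERDICT_RE.findall(output) if name == key]
    return matches[-1] if matches else None


def run_land_this_tick(pilot_verdict: Optional[str]) -> bool:
    """Pilot hands off to land within the same tick."""
    return pilot_verdict == "DEFERRED_TO_LAND"


def should_stop(pilot_verdict: Optional[str], land_verdict: Optional[str]) -> bool:
    """Stop on any NEEDS_HUMAN, or when both loops report NO_WORK."""
    if "NEEDS_HUMAN" in (pilot_verdict, land_verdict):
        return True
    return pilot_verdict == "NO_WORK" and land_verdict == "NO_WORK"


pilot_log = "planning fn-12...\nPILOT_VERDICT=DEFERRED_TO_LAND\n"
verdict = read_verdict(pilot_log, "PILOT_VERDICT")
print(verdict, run_land_this_tick(verdict), should_stop(verdict, None))
# DEFERRED_TO_LAND True False
```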

In your repo

Newer flow-next versions ship this page’s substance into the repo you’re working in, so the host agent finds it at use time without visiting these docs:

.flow/usage.md (installed by setup, read every session) carries an ## Orchestration & model steering section. It holds dogfooded headless-bridge recipes for codex exec, cursor-agent, and the reverse direction (claude -p), plus the flow-next shortcuts (delegate:codex, review.backend, per-task review:) and prompted-orchestration examples. The bridges run in every direction: from Claude Code they are codex exec and cursor-agent; from Codex or Cursor they are claude -p and the other CLI. Any harness that can run Bash can be the conductor.
CLAUDE.md / AGENTS.md can hold the durable model-routing table above: /flow-next:setup offers, as an optional ceremony step, to scaffold it live — probe-annotated for the CLIs you actually have installed, shown in full before writing, yours to edit after, marker-fenced so /flow-next:uninstall removes it cleanly.

On older installs, copy the table from this page instead — the behavior is identical; the scaffold is a convenience.

What never routes away

Judgment stays with the host — a delegated codex exec writes code; it never owns git, task state, or decisions.
Consent gates don’t route away: delegation stays opt-in, and the blast radius is surfaced before first use.
Merge is human-gated everywhere except the explicitly opted-in land loop.
Verification is independent — tests re-run on every delegated diff.
Escalation beats thrift — when you downgrade a role, watch the first outputs and revert on the first quality miss.
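
The independent-verification rule is the easiest of these to show in code. A minimal Python sketch, assuming a hypothetical `accept_delegated_diff` helper and a stand-in test command: the delegate's summary is accepted as an argument but never consulted, and only the exit code of a fresh test run decides.

```python
import subprocess
import sys
from typing import Sequence


def accept_delegated_diff(delegate_summary: str, test_cmd: Sequence[str]) -> bool:
    """Accept a delegated diff only if an independent test run exits 0.

    `delegate_summary` is deliberately ignored: the delegate's own claim
    about its work is not evidence.
    """
    result = subprocess.run(list(test_cmd), capture_output=True, text=True)
    return result.returncode == 0


# Stand-in for a real test command: a Python process that exits non-zero.
failing_tests = [sys.executable, "-c", "raise SystemExit(1)"]
print(accept_delegated_diff("all tests pass", failing_tests))  # False
```

In a real setup the command would be the project's own test runner; the point of the shape is that no code path reads the summary.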