
Flow-Next Drive

The flow-next-drive skill drives any UI surface the way a real user would — a web app, a Chromium-backed desktop app (Electron / Windows WebView2), or a genuinely native app (macOS AppKit/SwiftUI, or a webview exposing no CDP). It detects the surface, picks the highest available driver on a ladder, and degrades gracefully when a richer driver is absent.

It is a router, not a single driver. The default rung, Vercel’s agent-browser CLI, is the only driver assumed present; every other rung is detected and optional. A pass succeeds with whatever the environment actually has. Most cloud VMs, Linux hosts, and CI runners have no Computer Use, so Computer Use is never a hard dependency and is never selected on a headless or no-display path.

  • Verifying a deployed UI change matches the spec.
  • Driving or testing a web app, an Electron / WebView2 desktop app, or a native desktop app.
  • Reading documentation that has no clean text version.
  • Capturing baseline screenshots before a redesign.
  • Logging into a service and pulling structured data.
  • Light e2e probes that do not warrant a full test framework.

It orchestrates drivers; it does not reimplement them. The full native-desktop QA workflow (scenario authoring, bug filing, verdict) is a downstream /flow-next:qa concern; this skill provides only the driver actuation and the surface-detection branch.

Step 1 — Detect the surface, then branch

| Surface | What it is | Path |
| --- | --- | --- |
| Web app | A URL in a browser (localhost dev server, staging, production) | Web ladder |
| Chromium-backed desktop app | Electron / Windows WebView2: Chromium under the hood, exposes a CDP debug port | Web ladder, attaching over CDP to the app’s remote-debugging port |
| True-native / non-CDP surface | macOS AppKit/SwiftUI, Catalyst, or a webview exposing no CDP (macOS WKWebView, which Tauri uses on macOS) | Native rung → Computer Use |

Per-platform caveat: Windows WebView2 is CDP-drivable (web ladder); macOS WKWebView generally is not (native rung). When unsure whether a desktop app exposes CDP, probe for the web ladder first (try to launch/attach with a debug port); if no port is reachable, fall to the native rung.
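The probe-then-fall-through rule above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names, the surface labels, and the idea of testing the port with a plain TCP connect are all assumptions made for the example. An open TCP port only suggests that a debug endpoint is listening; a real probe would go on to attempt the attach.

```python
import socket
from typing import Callable, Optional


def cdp_port_reachable(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on the given port.

    Assumption: a reachable port is treated as "CDP may be available".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_path(
    surface: str,
    debug_port: Optional[int] = None,
    probe: Callable[[int], bool] = cdp_port_reachable,
) -> str:
    """Map a detected surface to 'web-ladder' or 'native-rung' (hypothetical labels)."""
    if surface == "web":
        return "web-ladder"
    # Desktop app: try the web ladder first, fall to the native rung
    # when no debug port is known or the port does not answer.
    if debug_port is not None and probe(debug_port):
        return "web-ladder"
    return "native-rung"
```

Passing the probe in as a parameter keeps the routing decision testable without a running app.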

Step 2 — The universal flow (all surfaces)

Whatever driver the environment has, the work is the same shape:

observe / navigate to the target
snapshot → fresh element refs (REQUIRED before each act)
act → click / fill / type / press / scroll
verify → confirm the expected text / state appeared
capture → screenshot + console/errors (also on any failure)
release → close the tab / end the session when fully done

Refs (@e1, @e2, …) go stale after any navigation, click, or form submit — always re-snapshot. “ref not found” or “pointer-events: none” almost always means a stale snapshot, not a real bug.
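To make the stale-ref rule concrete, here is a small simulation. `FakeDriver` is a hypothetical stand-in rather than any real driver's API, and the `#generation` suffix on refs is purely a modelling device so that staleness can be detected in the example. The point is the loop shape: snapshot immediately before every act.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDriver:
    """Hypothetical driver whose refs are valid for one page state only."""

    generation: int = 0
    log: List[str] = field(default_factory=list)

    def snapshot(self) -> Dict[str, str]:
        # Refs are tagged with the current page generation.
        return {"submit": f"@e1#{self.generation}"}

    def click(self, ref: str) -> None:
        if not ref.endswith(f"#{self.generation}"):
            raise LookupError("ref not found (stale snapshot)")
        self.log.append(f"click {ref}")
        self.generation += 1  # any click invalidates earlier refs


def run_steps(driver: FakeDriver, labels: List[str]) -> None:
    """Act on each label, taking a fresh snapshot before every act."""
    for label in labels:
        refs = driver.snapshot()
        driver.click(refs[label])
```

Reusing a ref from an earlier snapshot raises `LookupError` in this model, which mirrors the “ref not found” symptom described above.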

Step 3 — Web ladder (web apps + Chromium-backed desktop apps)

Probe availability top-down and use the highest rung that passes. If a rung is unavailable or fails, fall through softly to the next one; the terminal rung is manual.

| Rung | Driver | Use when |
| --- | --- | --- |
| 1 (default) | agent-browser CLI | Always assumed present. CDP-based, headless-safe, no extra install. Drives web apps; drives Electron / WebView2 over CDP (--cdp <port> / --auto-connect). |
| 2 | chrome-devtools-mcp | You want built-in auto-wait, DevTools-grade network/console inspection, Lighthouse, or to attach to your real signed-in Chrome (--browser-url) so bot defenses don’t challenge an automated profile. |
| 3 | Playwright | The repo already has Playwright configured, or you need a headless CI-style / cross-browser regression run. |
| 4 | cursor-ide-browser MCP | Running inside Cursor with this MCP installed and you want its snapshot YAML + browser_cdp control. |
| 5 (terminal) | Manual + screenshot relay | No browser driver available: drive yourself, paste console errors and screenshots into chat. |
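The top-down probing with soft failure might look something like the sketch below. The names `pick_rung` and `default_probes` are hypothetical. The only concrete probe shown checks for an `agent-browser` executable on PATH, and that executable name is itself an assumption. Probes for the other rungs depend on the host and are left to the caller.

```python
import shutil
from typing import Callable, List, Tuple

Probe = Tuple[str, Callable[[], bool]]


def pick_rung(probes: List[Probe], terminal: str = "manual") -> str:
    """Return the first rung whose probe passes, else the terminal rung."""
    for name, is_available in probes:
        try:
            if is_available():
                return name
        except Exception:
            # Fail soft: a crashing probe counts as "rung absent".
            continue
    return terminal


def default_probes() -> List[Probe]:
    # Assumption: the CLI is installed as an `agent-browser` executable on PATH.
    return [("agent-browser", lambda: shutil.which("agent-browser") is not None)]
```

The caller supplies the probes in whatever order it wants them tried, so the same function also serves a task that needs a specific rung first.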

The same ladder drives Electron / WebView2 apps by attaching to the app’s remote-debugging port. Launch the app with a dedicated debug port and user-data-dir; treat the open debug port as a security exposure (any local app can drive that session).
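As an illustration of launching with a dedicated debug port and profile directory, the helper below builds an argument list using Chromium's `--remote-debugging-port` and `--user-data-dir` switches. The helper name and the port-range check are assumptions for the example. It covers only an Electron-style launch where the app binary accepts these switches directly; a WebView2 host may need its browser arguments passed another way.

```python
from typing import List


def build_debug_launch(app_binary: str, port: int, profile_dir: str) -> List[str]:
    """Build an argv list for launching a Chromium-backed app with CDP enabled.

    Hypothetical helper: it only assembles arguments and launches nothing.
    """
    if not (1024 <= port <= 65535):
        raise ValueError(f"choose an unprivileged port, got {port}")
    return [
        app_binary,
        f"--remote-debugging-port={port}",   # exposes the CDP endpoint
        f"--user-data-dir={profile_dir}",    # keeps the session in a dedicated profile
    ]
```

The resulting list can be handed to `subprocess.Popen`. Because the open port is a security exposure, close the app and discard the profile directory once the session ends.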

A genuinely native app (or a non-CDP webview) has no browser tab to attach to, so the only way to drive it is Computer Use: the model looks at the screen, moves a cursor, clicks, and types. This rung is driver-agnostic and uses whichever Computer Use implementation the host offers:

  • Codex Computer Use (macOS / Windows).
  • Anthropic “Claude” Computer Use — the API computer tool, run via its own harness (a controlled display/sandbox or an MCP wrapper).

Detect availability and use whichever the environment provides; verify the tool and beta-header version at build time, because it drifts. The actuation differs from the web ladder, but the universal flow is identical.

When no Computer Use is present:

  • A Chromium-backed app still drives via the web-ladder CDP attach, or by driving its local dev-server URL in a browser. (Shell-level integration — system tray, native menus, OS dialogs — can’t be reached this way; surface that limitation.)
  • A genuinely native app with no Computer Use → document the limitation rather than fail.
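One way to express "document the limitation rather than fail" is to return a plan that carries its limitations instead of raising an error. The sketch below assumes a hypothetical `DrivePlan` record and hypothetical driver labels; it is not the skill's real data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrivePlan:
    """Hypothetical result: which driver to use and what it cannot reach."""

    driver: str
    limitations: List[str]


def plan_without_computer_use(chromium_backed: bool) -> DrivePlan:
    """Degrade gracefully when no Computer Use driver is present."""
    if chromium_backed:
        return DrivePlan(
            driver="web-ladder",
            limitations=["system tray, native menus, and OS dialogs are unreachable"],
        )
    # Genuinely native app: nothing can drive it, so report that instead of failing.
    return DrivePlan(
        driver="none",
        limitations=["native surface cannot be driven without Computer Use"],
    )
```

A caller can surface `limitations` in its report so that the pass still completes with an honest record of what was not exercised.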

agent-browser stays the only assumed-present driver. No MCP server or Computer Use is ever a hard install dependency.

  • Not a test framework. There is no assertion DSL, no parallel runner, no flake retry.
  • Not a scraper for restricted sites. Respect robots.txt and the target’s terms of service.
  • It orchestrates drivers — it does not reimplement Playwright or Computer Use.
  • Forgetting to re-snapshot after a click or form submission — refs become stale.
  • Routing an Electron / WebView2 app to Computer Use — it’s Chromium, drive it over CDP via the web ladder.
  • Treating Computer Use as a default — most environments lack it; it’s the native-rung fallback, not the common case.

Driver ladder and universal-flow structure inspired by Ray Fernando’s running-bug-review-board skill (Apache-2.0).