feat: mobile QA via Revyl cloud devices + browse-mobile Appium driver #604
Open
TenzinDhonyoe wants to merge 34 commits into garrytan:main
New module that implements the same HTTP command protocol as browse/ but backed by Appium WebDriver for mobile app automation. Enables /qa to test Expo/React Native apps on iOS Simulator.

Key components:
- ref-system.ts: Parse Appium XML accessibility tree into @e refs
- mobile-driver.ts: WebDriverIO wrapper with click, fill, screenshot, snapshot
- server.ts: HTTP server (same protocol as browse: bearer auth, state file)
- cli.ts: CLI entry point + setup-check for dependency validation
- platform/ios.ts: iOS Simulator boot, device listing, app management

Tested against real Expo app (Gluco): snapshot, click, fill, screenshot all working. 43 tests passing, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
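The ref-system.ts entry describes turning the accessibility tree into sequential @e refs. A minimal sketch of that idea follows, assuming elements expose `label` or `name` attributes as in typical XCUITest page source. The `assignRefs` name, the `ElementRef` shape, and the regex scan (a stand-in for a real XML parser) are illustrative assumptions, not the PR's code.

```typescript
// Hypothetical shape for one addressable element; not taken from the PR.
interface ElementRef {
  ref: string; // sequential handle such as "@e1"
  type: string; // XML tag name, e.g. "XCUIElementTypeButton"
  label: string; // accessibility label (falls back to the name attribute)
}

// Scan opening and self-closing tags and give every labelled element a ref.
// Assumption: attribute values are double-quoted and contain no raw quotes.
function assignRefs(xml: string): ElementRef[] {
  const refs: ElementRef[] = [];
  const tagPattern = /<([A-Za-z][\w.]*)((?:\s+[\w:.-]+="[^"]*")*)\s*\/?>/g;
  let tag: RegExpExecArray | null;
  while ((tag = tagPattern.exec(xml)) !== null) {
    const type = tag[1] ?? "";
    const rawAttrs = tag[2] ?? "";
    const attrs: Record<string, string> = {};
    const attrPattern = /([\w:.-]+)="([^"]*)"/g;
    let attr: RegExpExecArray | null;
    while ((attr = attrPattern.exec(rawAttrs)) !== null) {
      attrs[attr[1] ?? ""] = attr[2] ?? "";
    }
    const label = attrs["label"] || attrs["name"] || "";
    if (label === "") continue; // unlabelled containers get no ref
    refs.push({ ref: `@e${refs.length + 1}`, type, label });
  }
  return refs;
}

// Made-up page source for illustration only.
const sampleSource = `<?xml version="1.0" encoding="UTF-8"?>
<XCUIElementTypeApplication name="Gluco">
  <XCUIElementTypeOther>
    <XCUIElementTypeButton name="Sign In" label="Sign In"/>
    <XCUIElementTypeTextField label="Email"/>
  </XCUIElementTypeOther>
</XCUIElementTypeApplication>`;

for (const el of assignRefs(sampleSource)) {
  console.log(`${el.ref} [${el.type}] "${el.label}"`);
}
```

Skipping unlabelled containers keeps the ref list short, at the cost of hiding elements that lack accessibility metadata; a real implementation would likely need a proper XML parser and entity decoding.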
QA skills now auto-detect Expo/React Native projects and switch to mobile mode.

When app.json is found and browse-mobile is available:
- Automatically starts Appium if not running
- Boots iOS Simulator if needed
- Builds/installs app if not on simulator
- Navigates through Expo dev launcher to actual app
- Uses $BM instead of $B for all browse commands
- Falls back to ~"Label" selector for RN components missing accessibilityRole
- Flags missing accessibility props as QA findings

Web QA behavior is completely unchanged; mobile branches are gated on detection.

Files changed:
- scripts/gen-skill-docs.ts: BROWSE_MOBILE_SETUP placeholder + mobile detection in QA methodology + Expo/RN framework guidance
- qa/SKILL.md.tmpl: mobile setup block + platform parameter
- qa-only/SKILL.md.tmpl: same mobile additions (report-only)
- SKILL.md.tmpl: Mobile Testing section with $BM command reference
- TODOS.md: 3 new items from eng review

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The compiled binary is 58MB (bundles entire Bun runtime + webdriverio). Same pattern as browse/dist/ which is already gitignored. Users build it locally via:

    bun build --compile browse-mobile/src/cli.ts --outfile browse-mobile/dist/browse-mobile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The compiled binary couldn't find server.ts when deployed outside the gstack repo. Now the CLI spawns itself with --server flag to run the server in-process, same pattern as browse/. Works both in dev mode (bun run cli.ts) and as compiled binary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iled binary

Three fixes:
1. Switch from bun --compile (can't resolve webdriverio transitive deps) to bun build (JS bundle) + shell launcher script. 3.2MB bundle vs 58MB binary, and all npm deps resolve correctly at runtime.
2. Filter --server from process.argv in server.ts so bundle ID isn't clobbered when CLI spawns itself in server mode.
3. CLI finds the bundled cli.js relative to itself, works from any directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 1: handleCommand() threw immediately if not connected. Now it auto-reconnects to Appium when the first command arrives, handling the common case where WDA takes 30-60s to compile on first session.

Bug 2: CLI didn't pass BROWSE_MOBILE_BUNDLE_ID env var when spawning the server subprocess. Now extracts bundle ID from goto app://... and forwards it so the Appium session is created with the correct app.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
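The Bug 2 fix can be pictured as two small steps: read the bundle ID out of the `goto app://...` arguments, then add it to the subprocess environment. The sketch below assumes the bundle ID is everything after the `app://` prefix; `bundleIdFromArgs` and `serverEnv` are hypothetical helper names, and only `BROWSE_MOBILE_BUNDLE_ID` comes from the commit message.

```typescript
// Hypothetical helper: pull a bundle ID out of `goto app://<bundle-id>` arguments.
function bundleIdFromArgs(args: string[]): string | null {
  const [command, target] = args;
  const prefix = "app://";
  if (command !== "goto" || target === undefined || !target.startsWith(prefix)) {
    return null;
  }
  const bundleId = target.slice(prefix.length);
  return bundleId.length > 0 ? bundleId : null;
}

// Hypothetical helper: copy the base environment and forward the bundle ID
// to the server subprocess only when the command actually carries one.
function serverEnv(
  base: Record<string, string | undefined>,
  args: string[],
): Record<string, string | undefined> {
  const bundleId = bundleIdFromArgs(args);
  return bundleId === null
    ? { ...base }
    : { ...base, BROWSE_MOBILE_BUNDLE_ID: bundleId };
}
```

Returning a copy rather than mutating the base environment keeps the parent process's own variables untouched when the CLI spawns itself in server mode.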
Rewrote mobile-driver.ts to use raw fetch() for all Appium WebDriver protocol calls instead of webdriverio. This eliminates the transitive dependency bundling problem permanently.

Results:
- Bundle: 119KB (was 3.2MB with webdriverio)
- Dependencies: 0 npm packages (was webdriverio + 230 transitive deps)
- All Appium commands work via W3C WebDriver REST protocol over HTTP

Also fixed:
- CLI timeout: 180s for goto (Appium connect), 60s for other commands
- Removed webdriverio from package.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/execute returns 404 on Appium — the correct W3C route is /execute/sync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
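For reference, the W3C Execute Script command is a POST to `/session/{session id}/execute/sync` with a JSON body containing `script` and `args`. A minimal sketch of building that request is below; `buildExecuteRequest`, the `WebDriverRequest` shape, the local Appium URL, and the example script are illustrative assumptions rather than the PR's code.

```typescript
// Hypothetical request description; plain data so it can be handed to fetch().
interface WebDriverRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a W3C "Execute Script" request. Note the /sync suffix:
// a bare /execute is not a W3C route.
function buildExecuteRequest(
  baseUrl: string,
  sessionId: string,
  script: string,
  args: unknown[] = [],
): WebDriverRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${root}/session/${encodeURIComponent(sessionId)}/execute/sync`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ script, args }),
  };
}

// Assumed values for illustration: a local Appium server and a made-up session ID.
const exampleRequest = buildExecuteRequest(
  "http://127.0.0.1:4723/",
  "abc123",
  "mobile: activateApp",
  [{ bundleId: "com.example.app" }],
);
console.log(exampleRequest.url);
```

The result can be passed to `fetch(exampleRequest.url, exampleRequest)` in a runtime that provides `fetch`.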
When /qa detects a mobile project for the first time, it checks if browse-mobile bash permissions exist in the user's settings.json. If not, offers to add them — one-time setup that enables fully automated mobile QA without per-command approval prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Expanded permission patterns to cover inline bash (SID=..., curl -X POST, JAVA_HOME=...) that the QA skill generates. Previous patterns only matched commands starting with $BM.
2. Added speed guidance: batch multiple $BM commands in single bash calls using && instead of separate tool calls. Take screenshots at milestones only, not after every tap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
browseDir is ~/.claude/skills/gstack/browse/dist — need ../../ to reach the gstack root, not ../ which only goes up to browse/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p command

Three fixes:
1. Changed ~"Label" to label:Label syntax: the ~ was being interpreted by zsh as home directory expansion, breaking accessibility label clicks.
2. Added tap <x> <y> command for coordinate-based tapping when elements can't be found by ref or label.
3. Updated all skill templates and help text to use new label: syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
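A small sketch of how the two selector forms mentioned in this PR (`@e` refs and `label:` selectors) could be told apart. The `Selector` type and `parseSelector` function are hypothetical names for illustration; the actual parsing in browse-mobile may differ.

```typescript
// Hypothetical discriminated union covering the two selector forms.
type Selector =
  | { kind: "ref"; ref: string }
  | { kind: "label"; label: string };

// Classify a selector string. The `label:` prefix needs no shell quoting
// tricks, unlike the earlier `~` prefix that zsh expanded.
function parseSelector(input: string): Selector {
  if (/^@e\d+$/.test(input)) {
    return { kind: "ref", ref: input };
  }
  const prefix = "label:";
  if (input.startsWith(prefix) && input.length > prefix.length) {
    return { kind: "label", label: input.slice(prefix.length) };
  }
  throw new Error(`Unrecognized selector: ${input}`);
}
```

For example, `parseSelector("label:Sign In")` yields a label selector with the text `Sign In`, while `parseSelector("@e3")` yields a ref selector.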
Adds Revyl as a second mobile QA backend alongside browse-mobile (Appium). When Revyl is authenticated, /qa and /qa-only prefer cloud devices over local simulator, with no Xcode/Appium/Java setup needed.

Changes:
- Revyl auth detection in browse-mobile setup
- Full Revyl QA path: init → app detection → dev loop (with tunnel verification + 30s timeout) → static fallback → build caching → device provisioning → command mapping
- YAML validation + auto-fix after revyl init (known CLI bug)
- App-id auto-detection with AskUserQuestion for ambiguous matches
- Mobile auth strategy (sign-up attempt, credential request, Apple Sign-In scope limitation)
- Mobile exploration checklist (8 items: transitions, scroll, keyboard, back nav, empty/loading states, orientation, accessibility)
- Fix Rule 5 contradiction: scoped "never read source" to testing phases
- Batch re-verification for mobile fixes (rebuild once after all fixes)
- Mobile QA timing expectations in setup section
- 3 new TODOs: Revyl E2E test, /browse Revyl integration, Android support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revyl is available as MCP tools (start_device_session, screenshot, device_tap, etc.), not a CLI binary. The bash-based `revyl auth status` check always failed because there's no `revyl` in PATH. Now the skill tells Claude to check for Revyl MCP tool availability directly — if the tools exist in the conversation context, always use Revyl for mobile QA. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The revyl CLI is installed on the user's machine — detection should check `command -v revyl` in bash. Previous commit wrongly switched to MCP tool detection which doesn't work in bash context. Now: if `revyl` CLI exists in PATH → REVYL_READY, always preferred over Appium. Auth status printed for diagnostics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a mobile project is detected but revyl CLI isn't installed, AskUserQuestion now tells the user how to install it and offers three options: install now, use local Appium, or skip mobile QA. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The skill templates used `revyl screenshot` but the actual CLI command is `revyl device screenshot --out <path>`. All device interaction lives under the `device` subcommand. Also adds --out flag for explicit output path control. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Static mode fallback works perfectly — this is a DX improvement for reusing an existing Metro process instead of starting a conflicting one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bundle IDs and simulator UDIDs are passed to shell commands via string interpolation. Validate they don't contain shell metacharacters to prevent command injection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
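One way to do this kind of validation is an allowlist: accept only the characters a legitimate value can contain and reject everything else. The sketch below assumes bundle IDs use letters, digits, dots, and hyphens, and that simulator UDIDs are UUID-style hex strings; the pattern and function names are hypothetical and the PR's actual checks may be stricter or looser.

```typescript
// Assumed allowlists (illustrative, not taken from the PR).
const BUNDLE_ID_PATTERN = /^[A-Za-z0-9][A-Za-z0-9.-]*$/;
const UDID_PATTERN = /^[0-9A-Fa-f-]+$/;

// Return the value unchanged if it is safe to interpolate, otherwise throw.
function assertSafeBundleId(bundleId: string): string {
  if (!BUNDLE_ID_PATTERN.test(bundleId)) {
    throw new Error(`Invalid bundle ID: ${JSON.stringify(bundleId)}`);
  }
  return bundleId;
}

function assertSafeUdid(udid: string): string {
  if (!UDID_PATTERN.test(udid)) {
    throw new Error(`Invalid simulator UDID: ${JSON.stringify(udid)}`);
  }
  return udid;
}
```

An allowlist is used here instead of a denylist of metacharacters because it fails closed: any character not explicitly expected (spaces, `;`, `$`, backticks, quotes) is rejected without having to enumerate it.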
…llowing

- DRY: pointer action construction was duplicated 4x (performClick, tapCoordinates, fill coordinate fallback, scroll). Extract tapAction() and swipeAction() helpers.
- findElement() now distinguishes "no such element" (returns null) from actual errors like timeouts and network failures (rethrows).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
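A sketch of what shared pointer-action helpers can look like, using the W3C WebDriver Actions payload shape (a `pointer` input source with `pointerMove`, `pointerDown`, `pause`, and `pointerUp` steps). Only the names `tapAction` and `swipeAction` come from the commit message; the signatures, the `finger1` ID, and the default durations are assumptions.

```typescript
// One step in a W3C pointer action sequence.
interface PointerStep {
  type: "pointerMove" | "pointerDown" | "pointerUp" | "pause";
  duration?: number;
  x?: number;
  y?: number;
  button?: number;
}

// Body for POST /session/{session id}/actions.
interface ActionsPayload {
  actions: Array<{
    type: "pointer";
    id: string;
    parameters: { pointerType: "touch" };
    actions: PointerStep[];
  }>;
}

// Wrap a list of steps in a single touch input source.
function touchPayload(steps: PointerStep[]): ActionsPayload {
  return {
    actions: [
      { type: "pointer", id: "finger1", parameters: { pointerType: "touch" }, actions: steps },
    ],
  };
}

// Tap: move to the point, press, hold briefly, release.
function tapAction(x: number, y: number): ActionsPayload {
  return touchPayload([
    { type: "pointerMove", duration: 0, x, y },
    { type: "pointerDown", button: 0 },
    { type: "pause", duration: 100 },
    { type: "pointerUp", button: 0 },
  ]);
}

// Swipe: press at the start point, drag to the end point, release.
function swipeAction(
  fromX: number,
  fromY: number,
  toX: number,
  toY: number,
  durationMs = 300,
): ActionsPayload {
  return touchPayload([
    { type: "pointerMove", duration: 0, x: fromX, y: fromY },
    { type: "pointerDown", button: 0 },
    { type: "pointerMove", duration: durationMs, x: toX, y: toY },
    { type: "pointerUp", button: 0 },
  ]);
}
```

With helpers like these, click, coordinate tap, fill fallback, and scroll can all serialize the same payload with `JSON.stringify` instead of rebuilding it inline.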
…Alive

- server.ts: tap command now validates args are valid numbers before passing to tapCoordinates, preventing silent NaN propagation.
- cli.ts: isPidAlive now returns true for EPERM (process exists but different user), false only for ESRCH (process doesn't exist).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
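Both fixes are small enough to sketch. The version below is an illustration under assumptions: `parseTapArgs` is a hypothetical name, and `isPidAlive` takes the kill function as a parameter (an assumption made so the example has no runtime dependency); the PR's code presumably calls `process.kill` directly.

```typescript
// Parse `tap <x> <y>` arguments, rejecting anything that is not a finite number.
// Number("") is 0, so blank strings are rejected explicitly.
function parseTapArgs(args: string[]): { x: number; y: number } {
  const [rawX, rawY] = args;
  const x = Number(rawX);
  const y = Number(rawY);
  const blank =
    rawX === undefined || rawY === undefined || rawX.trim() === "" || rawY.trim() === "";
  if (blank || !Number.isFinite(x) || !Number.isFinite(y)) {
    throw new Error(`tap expects two numeric coordinates, got: ${args.join(" ")}`);
  }
  return { x, y };
}

// Signature compatible with Node's process.kill(pid, signal).
type KillFn = (pid: number, signal: number) => unknown;

// Signal 0 performs the existence and permission check without delivering a signal.
function isPidAlive(pid: number, kill: KillFn): boolean {
  try {
    kill(pid, 0);
    return true;
  } catch (err) {
    const code = (err as { code?: string } | null)?.code;
    if (code === "EPERM") return true; // exists, but owned by another user
    if (code === "ESRCH") return false; // no such process
    throw err; // anything else is unexpected
  }
}
```

In Node or Bun the second argument would be `(pid, signal) => process.kill(pid, signal)`.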
browse-mobile source changes now trigger QA evals and the new browse-mobile-basic test category. Rebuilt dist with all fixes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- swipe: add --x 220 --y 500 (required start coordinates)
- type: add --target param (single command, no separate tap needed)
- dev loop: detect existing Metro on :8081, verify it's node/metro before killing to avoid port conflict with Revyl
- Update all command references across gen-skill-docs.ts and both qa/qa-only templates for consistency
- Add TODO for Revyl command table validation test (P2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-detect Revyl auth status and run `revyl auth login` if needed instead of passive prose instruction
- Add Revyl permissions to Claude Code settings (Step 0) so commands don't trigger 30-50 permission prompts per QA session
- Detect Xcode before attempting local build; try EAS cloud build as fallback; give clear guidance if neither is available
- Add cost/billing note for Revyl cloud device sessions
- Add TODO for headless/CI auth environments (P3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l QA

Real-world QA testing revealed 6 issues:
1. revyl dev start reports "ready" with broken tunnel: now parse HMR diagnostics and fall back to static mode if all checks fail
2. App loads from cached build with no hot reload: now detect and warn
3. Background process polling was undocumented: add explicit 5s poll loop
4. revyl dev stop doesn't exist: document kill procedure
5. Session times out during fix phases: add keepalive guidance
6. Permission check was weak (grep count): now checks specific patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… cache

When HMR diagnostics fail but the app still launches, compare the on-device build's git SHA against HEAD. If they differ, explicitly warn that testing is on stale code and force static mode rebuild. This catches the most dangerous failure mode: app appears to work but recent changes are invisible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cloudflare tunnel DNS is inherently racy — first attempt often fails. Now the skill retries once (kill → wait 5s → restart) before falling back to static mode. Also adds direct DNS resolution check via nslookup before HTTP polling, which catches the root cause faster than waiting for curl timeouts. The flow is now: attempt 1 → verify HMR + DNS → if broken, retry → attempt 2 → if still broken, stale build check → static fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace AskUserQuestion permission prompts with automatic setup. Both /qa and /qa-only now auto-add a comprehensive set of allow rules to ~/.claude/settings.json on first run, covering browse, revyl, appium, git, curl, and all other commands used during QA. Uses a marker comment to only run once. Also expanded the Revyl permission list to include nslookup, xcode-select, npx eas, and other commands added in recent fixes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes from live mobile QA testing:
1. Priority flip: check local simulator first (0s setup), then DerivedData Debug build (~30s), then Revyl cloud devices. Solo devs with the app already running skip Revyl entirely.
2. Fast-fail tunnel DNS: single 15s DNS check instead of 120s x2 retry loop. If tunnel is dead, fall back immediately instead of burning 4+ minutes.
3. Debug builds instead of Release: much faster to build, likely already cached in DerivedData from normal dev work. Release builds are unnecessary for QA testing.

Net effect: mobile QA setup drops from ~10 min to ~30s for devs with local tooling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reverts priority flip (local sim first): Revyl's AI-grounded targeting is too valuable to skip. Keeps fast-fail DNS (15s) and Debug builds.

Also fixes ~/.claude/ path leaking into Codex-generated SKILL.md files:
- Settings path now transformed to ~/.codex/ during codex generation
- Browse-mobile permission uses ctx.paths.skillRoot
- Single host-aware cat permission entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The polling grep matched "failed" in HMR diagnostic lines like "[hmr] Metro health: FAILED" and treated them as fatal errors, killing a working dev loop that was still provisioning the device. Now only fatal errors (panic, process died, ENOSPC) trigger DEV_LOOP_FAILED. HMR warnings emit DEV_LOOP_HMR_WARNING instead — the device continues provisioning and loads from the cached build. Hot reload is degraded but QA testing can proceed immediately. This was the root cause of the 10-minute wasted setup: the skill killed the process twice over non-fatal warnings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
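The distinction described here can be sketched as a per-line classifier: HMR diagnostics are checked first and never treated as fatal, and only a narrow set of patterns marks the dev loop as failed. The marker names come from the commit message; `classifyDevLoopLine` and the exact regular expressions are assumptions (the skill itself uses grep in bash).

```typescript
type DevLoopSignal = "DEV_LOOP_FAILED" | "DEV_LOOP_HMR_WARNING" | null;

// Assumed fatal patterns; note that a bare "failed" is deliberately absent.
const FATAL_PATTERN = /\b(fatal|panic)\b|exited with|ENOSPC/i;

// Classify one line of dev-loop output.
function classifyDevLoopLine(line: string): DevLoopSignal {
  // HMR diagnostics degrade hot reload but must not kill the dev loop.
  if (/^\s*\[hmr\]/i.test(line)) {
    return /failed/i.test(line) ? "DEV_LOOP_HMR_WARNING" : null;
  }
  return FATAL_PATTERN.test(line) ? "DEV_LOOP_FAILED" : null;
}

// Made-up log lines for illustration.
const sampleLines = [
  "[hmr] Metro health: FAILED",
  "provisioning device...",
  "panic: runtime error",
];
for (const line of sampleLines) {
  console.log(`${classifyDevLoopLine(line) ?? "ok"} <- ${line}`);
}
```

Checking the `[hmr]` prefix before the fatal pattern is what prevents a diagnostic line from being misread as a crash.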
The QA skill's auto-configure step was missing permissions for variable assignments (METRO_PID=, TUNNEL_URL=, etc.), shell constructs (for, if, [), and common tools (echo, ps, sed, head, etc.). Commands starting with these prefixes would prompt for approval, breaking automation. Added ~60 new permission patterns covering all commands used in the QA and Revyl mobile flows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…solver system

Upstream refactored gen-skill-docs.ts into scripts/resolvers/ modules. Port our mobile QA code into the new architecture:
- Create scripts/resolvers/mobile-qa.ts (BROWSE_MOBILE_SETUP + mobile QA sections)
- Inject mobile sections into generateQAMethodology via generateMobileQASections()
- Register BROWSE_MOBILE_SETUP in resolver index
- Fix codex path leak: add catch-all ~/.claude/ → ~/.codex/ replacement
- Fix zsh glob safety: use find instead of ls for variant-*.png
- Sync package.json version to 0.13.2.0 matching VERSION file
- Add browse-mobile-basic to E2E_TIERS
- Resolve .gitignore, package.json, touchfiles.ts conflicts (both sides)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
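The catch-all path fix amounts to a global string replacement applied only when generating for Codex. A minimal sketch, assuming a two-value host switch; `Host` and `rewriteHostPaths` are hypothetical names and the generator's real plumbing is not shown in this PR text.

```typescript
type Host = "claude" | "codex";

// Rewrite every ~/.claude/ path to ~/.codex/ for Codex output.
// The global flag matters: without it only the first occurrence is replaced.
function rewriteHostPaths(content: string, host: Host): string {
  if (host === "claude") return content;
  return content.replace(/~\/\.claude\//g, "~/.codex/");
}

const generated = "cat ~/.claude/settings.json && ls ~/.claude/skills/gstack/";
console.log(rewriteHostPaths(generated, "codex"));
```

Running the replacement as a final pass over the generated text catches paths introduced by any resolver, which is the point of a catch-all.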
Conflicts:
- design-shotgun/SKILL.md.tmpl: both sides fixed the same zsh glob safety issue differently (we used `find`, upstream used `setopt` guard). Kept `find`, which avoids needing the setopt workaround entirely.
- design-shotgun/SKILL.md: generated file, same resolution as template.
- package.json: version 0.13.2.0 (ours) vs 0.13.3.0 (upstream). Took upstream's 0.13.3.0 since it's the newer release.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary

Adds full mobile QA support to the `/qa` and `/qa-only` skills. Claude Code can now test iOS and Android apps on real cloud-hosted devices via Revyl, with a local Appium + iOS Simulator fallback.

What's new for users:
- … `/qa` on any React Native / mobile project and it automatically detects mobile, provisions a cloud device, builds your app, uploads it, and runs a full QA pass, with zero manual setup
- … `--target "the Sign In button"`) instead of brittle element selectors

Key Components

- `browse-mobile/`: New Appium-backed mobile driver
  - … `/execute/sync` endpoint)
  - … `@e1`, `@e2`, etc.)
- `scripts/gen-skill-docs.ts`: QA template generator updates
  - `generateBrowseMobileSetup()`: detects mobile projects, checks Revyl CLI, provisions devices
  - `generateQAMethodology()`: full Revyl interaction flow with AI-grounded targeting
  - … `[hmr] Metro health: FAILED` warnings)

Template changes

- `qa/SKILL.md.tmpl` + `qa-only/SKILL.md.tmpl`: mobile QA integration
- `SKILL.md.tmpl` (root): documents mobile QA capabilities
- … `SKILL.md` files regenerated

Bug Fixes (from 7 rounds of live QA testing)

- … `swipe`, `type`)
- … `revyl device swipe/type`
- … `fatal|panic|exited with`)

Also Included (from merged PRs on this branch)

- `/cso` v2: infrastructure-first security audit (secrets archaeology, supply chain, CI/CD)
- `/review` → `/ship` handoff fix

Files Changed

- `browse-mobile/` (5 source files, 5 test files, built binary)
- `scripts/gen-skill-docs.ts` (+600 lines), all `SKILL.md` files regenerated
- `test/skill-e2e-cso.test.ts`, `test/gen-skill-docs.test.ts` additions

How to Try It

Requires: Revyl CLI for cloud devices, or Xcode + Appium for local fallback.

Test Plan

- `bun test` passes (skill validation, gen-skill-docs quality, browse integration)
- … `~/.claude/` leaks in Codex output)
- `bun run test:evals` (LLM judge + E2E)

🤖 Generated with Claude Code