feat: unattended auto-update with trust/control UX (Phase 1 + 2) #57
songcarver merged 8 commits into main
Document the end-to-end Cadence CLI update flow and identify operational edge cases around detection, installation, and user visibility. Phase 1 details how release discovery works (GitHub releases/latest redirect), semver comparison, manual install orchestration (download, checksum verification, extraction, self-replace), passive update checks, and the true semantics of auto_update (prompt bypass only). Phase 2 enumerates happy paths and failure paths, including non-TTY suppression, interval-driven delay, cancellation behavior, config parse failures, artifact/checksum mismatch, extraction and replacement failures, and hook-triggered visibility limitations. Also calls out the specific risk for GUI/embedded-agent users (for example users who never see CLI output), where update notifications can be missed indefinitely without an explicit cadence update command.
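The semver comparison step described above can be sketched as follows. This is a minimal illustration, not the Cadence implementation: `parse_stable` and `is_newer_stable` are hypothetical names, and the sketch assumes release tags of the form `vMAJOR.MINOR.PATCH`. Treating any tag with a prerelease suffix as not installable is also an assumption, chosen to line up with the stable-only policy introduced later in this PR.

```rust
/// Hypothetical helper: parse a tag such as "v1.3.0" or "1.3.0" into
/// (major, minor, patch). Returns None for anything else, including
/// prerelease tags like "v1.4.0-rc.1".
pub fn parse_stable(tag: &str) -> Option<(u64, u64, u64)> {
    let trimmed = tag.strip_prefix('v').unwrap_or(tag);
    let mut parts = trimmed.split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    // A fourth component means this is not a plain MAJOR.MINOR.PATCH tag.
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Hypothetical helper: true only when `remote` is a stable version that is
/// strictly newer than `local`. Unparseable input never triggers an update.
pub fn is_newer_stable(local: &str, remote: &str) -> bool {
    match (parse_stable(local), parse_stable(remote)) {
        // Tuples compare lexicographically: major, then minor, then patch.
        (Some(l), Some(r)) => r > l,
        _ => false,
    }
}
```

Comparing parsed integer tuples rather than strings matters for cases such as `v1.9.0` versus `v1.10.0`, where a plain string comparison would order the versions incorrectly.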
Add a standalone implementation plan for unattended auto-update with hook-safe locking, scheduler provisioning, retry/backoff state, and status visibility. Also update the investigation doc with a concrete update-path diagram and explicit clarification of in-app update triggers versus installer-script reinstall behavior.
Implement the auto-update v1 plan with a hidden scheduler-facing updater entrypoint, shared global activity locking, persistent updater state, scheduler provisioning during install, and health reporting in status/doctor.

Highlights:
- Add hidden internal command path cadence hook auto-update and wire it to a silent unattended update runner.
- Reuse the existing self-update pipeline while enforcing stable-only installs and non-interactive behavior for background runs.
- Introduce a shared cross-process activity lock under ~/.cadence/cli/locks/global-activity.lock and use it in post-commit, pre-push, deferred-sync worker, and updater replace path (updater uses fast non-blocking acquire).
- Persist updater state atomically in ~/.cadence/cli/updater-state.json with fields for check/attempt/success timestamps, seen/installed versions, failure count, next retry, and last error.
- Add capped exponential retry backoff with jitter and lock-contention reschedule behavior for reliability-first retries.
- Provision OS-native scheduler setup from cadence install (LaunchAgent on macOS, systemd user timer on Linux, Task Scheduler on Windows) and keep it idempotent and executable-path aware.
- Redefine auto_update semantics: auto_update=true now enables unattended background updates; manual cadence update no longer auto-confirms from config.
- Extend cadence status and cadence doctor surfaces to report updater health state, next retry time, and last error.
- Skip passive update checks for hook commands to keep hook/internal paths isolated and quiet.

Testing updates:
- Add CLI parsing coverage for hidden hook auto-update command.
- Add updater helper tests for stable-only filtering, retry scheduling, health-state rendering, and activity-lock contention behavior.
- Add platform scheduler content tests (macOS LaunchAgent / Linux systemd templates).
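As an illustration of the capped exponential retry backoff with jitter mentioned in this commit, here is a minimal std-only sketch. The function name `retry_delay`, the "equal jitter" strategy (half fixed, half randomized), and the base/cap values used in the usage note are all assumptions; the PR does not state which jitter scheme or limits the updater uses. To avoid depending on a random-number crate, the caller supplies the random fraction.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before the next attempt after `failures`
/// consecutive failures. `jitter_unit` is a caller-supplied random value
/// in [0.0, 1.0]; values outside that range are clamped.
pub fn retry_delay(failures: u32, base: Duration, cap: Duration, jitter_unit: f64) -> Duration {
    if failures == 0 {
        return Duration::ZERO;
    }
    // Exponent is bounded so the shift below cannot overflow a u32.
    let exponent = (failures - 1).min(20);
    // base * 2^(failures - 1), saturating instead of panicking on overflow.
    let raw = base.saturating_mul(1u32 << exponent);
    let capped = raw.min(cap);
    // Equal jitter: keep half of the delay fixed and randomize the other half,
    // so retries spread out without ever collapsing to zero.
    let half = capped / 2;
    half + half.mul_f64(jitter_unit.clamp(0.0, 1.0))
}
```

With an assumed base of 60 seconds and a cap of one hour, the third consecutive failure yields a delay between 120 and 240 seconds, and any failure count past the cap yields between 30 and 60 minutes.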
Implement Auto-Update Phase 2 from docs/auto-update-v2-plan.md, preserving Phase 1 safety behavior while adding first-class user controls, scheduler lifecycle management, and transparent remediation surfaces.

Phase 2 trust/control changes:
- Add new cadence auto-update command group with status, enable, disable, and uninstall subcommands.
- disable sets auto_update=false and clearly reports that scheduled updater runs no-op immediately.
- uninstall is a one-command scheduler cleanup path that is idempotent across supported OS paths and also disables background auto-update intent.
- enable sets auto_update=true and reconciles scheduler artifacts in one step.
- Add scheduler status model (installed|missing|broken|unsupported) with platform-aware details and remediation strings.
- Add explicit policy visibility (stable channel only) and expose it through status/doctor/auto-update status.
- Extend updater health model with last_attempt_at for clearer operational transparency.
- Add reconciliation API to preserve user intent: when enabled, provision scheduler; when disabled, clean scheduler artifacts.
- Wire install flow to reconcile scheduler only after consent/config resolution, instead of unconditionally provisioning.
- Improve install-time disclosure with explicit background behavior, stable-only policy, and one-command control guidance (disable / uninstall).
- Add cadence doctor --repair to reconcile scheduler artifacts based on current config intent and surface actionable remediation commands.
- Extend cadence status and cadence doctor to show updater state, scheduler health, policy clarity, retry/error metadata, and repair hints.

Safety invariants preserved:
- Hook-safe locking and non-interference behavior remain intact.
- Retry-first reliability behavior remains intact.
- Stable-only unattended install policy remains intact.
- Disabled mode keeps scheduled updater as no-op and remains non-destructive.

Test coverage updates:
- CLI parsing tests for doctor --repair and auto-update command surfaces.
- Scheduler uninstall idempotence test.
- Scheduler missing/installed reconciliation tests.
- Existing updater and hook safety tests remain passing.
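The reconciliation rule described above ("when enabled, provision scheduler; when disabled, clean scheduler artifacts") can be expressed as a small pure function over the scheduler status model. This is a sketch under assumptions: the type and function names are hypothetical, and the handling of the `Broken` and `Unsupported` states (re-provision when enabled, remove when disabled, never touch unsupported platforms) is a guess at reasonable behavior rather than something the commit message spells out.

```rust
/// Mirrors the status model named in the commit message:
/// installed | missing | broken | unsupported.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SchedulerStatus {
    Installed,
    Missing,
    Broken,
    Unsupported,
}

/// Hypothetical result type: what reconciliation should do next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReconcileAction {
    Provision,
    Remove,
    Nothing,
}

/// Hypothetical planner: map user intent plus observed state to one action.
/// Keeping this pure makes the idempotence property easy to test.
pub fn plan_reconcile(auto_update_enabled: bool, status: SchedulerStatus) -> ReconcileAction {
    match (auto_update_enabled, status) {
        // Assumption: nothing can be provisioned or removed on unsupported platforms.
        (_, SchedulerStatus::Unsupported) => ReconcileAction::Nothing,
        (true, SchedulerStatus::Installed) => ReconcileAction::Nothing,
        // Assumption: a broken scheduler entry is repaired by re-provisioning.
        (true, SchedulerStatus::Missing | SchedulerStatus::Broken) => ReconcileAction::Provision,
        (false, SchedulerStatus::Installed | SchedulerStatus::Broken) => ReconcileAction::Remove,
        (false, SchedulerStatus::Missing) => ReconcileAction::Nothing,
    }
}
```

Because the planner returns `Nothing` once the observed state matches the intent, running it repeatedly converges after at most one action, which is the idempotence the uninstall and repair paths rely on.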
Document the current update model clearly in README, including:
- Manual update commands (cadence v1.3.0 is up to date, cadence v1.3.0 is already up to date (latest: v1.3.0)).
- Background auto-update behavior (scheduler-driven, stable-only, hook-safe locking, retry/backoff, disabled no-op semantics).
- First-class trust/control commands.
- Status/doctor visibility and repair flow.
- Uninstall guidance that includes one-step scheduler cleanup.
|
Testing is the next step. Apparently … `Use a fully local canary flow so nothing touches real releases. Build two local binaries |
Cadence Session Review
| Score | |
| Models | Codex (codex-5.3) |
| Sessions | 8 |
| Phases | 30% planning/investigation, 55% implementation, 10% docs, 5% debugging |
Multiple overlapping Codex sessions implemented a two-phase auto-update system from a detailed plan doc. The model followed the plan structure closely — scheduler provisioning, activity lock, updater state, trust/control UX — but made several architectural shortcuts that weaken the plan's safety guarantees.
- Activity lock acquired too late in the install pipeline (only at binary replace), not at the full operation scope as the plan specified
- Blocking lock acquisition has no timeout, risking infinite spin in edge cases
- Heavy sysinfo::refresh_processes() used for simple PID-alive check in a hook-critical path
- Manual update path lost auto_update=true as a confirmation bypass (regression from plan's "preserve existing semantics")
- Sessions show significant redundancy: at least 4-5 sessions with near-identical opening prompts and exploration patterns
Recommendations
Prompting — Add explicit safety-invariant verification checkpoints to implementation prompts
The handoff prompt told the model to treat the plan as source of truth but didn't call out specific invariants that must hold. The model implemented the structure but violated the lock-scope and confirmation-bypass semantics. Adding explicit verification checkpoints in the prompt forces the model to self-audit against the plan's safety properties.
Before
Implement the plan in: docs/auto-update-v1-plan.md\n\nImportant:\n- Treat that file as the source of truth.\n- Build unattended background auto-update exactly per plan constraints
Reframe
Implement the plan in: docs/auto-update-v1-plan.md\n\nCritical constraints to verify before committing:\n1. Activity lock must guard the ENTIRE install pipeline (download through replace), not just the replace step\n2. Blocking lock acquire must have a bounded timeout (e.g. 30s) to prevent hook stalls\n3. Manual cadence update must preserve existing auto_update=true confirmation bypass behavior\n4. All scheduler provisioning paths must be tested on the target platform before push
Tip
When a plan doc has safety-critical constraints, list them as explicit acceptance checks in the prompt rather than relying on the model to infer them from the plan.
Agent instructions — Add hook-path performance constraints to AGENTS.md
The AGENTS.md says 'Use Tokio for production I/O' and 'Do not block the Tokio runtime' but doesn't mention lock timeout requirements or heavyweight-dependency avoidance in hook paths. Adding a rule like 'Hook paths must complete within a bounded time budget — never use unbounded loops or heavy process-table scans in code reachable from hooks' would have prevented two issues in this PR.
Tip
Agent instruction files should encode performance invariants for latency-sensitive code paths, not just general async hygiene.
Prompting — Reduce session redundancy with structured handoff state
At least 4 sessions used near-identical opening prompts and re-discovered the same codebase state before making incremental changes. Each session re-ran rg --files, re-read the plan, and re-explored src/update.rs. Structuring the handoff prompt to include a summary of what was already implemented (with commit SHAs) and what remains would reduce redundant exploration and increase the useful work per session.
Tip
Include a 'completed so far' section in handoff prompts: list committed files, functions added, and remaining plan items. This prevents the model from re-reading everything.
src/update.rs
Outdated
// Step 9: Replace running binary
self_replace_binary(&new_binary)?;
if matches!(mode, InstallMode::SilentUnattended) {
The model acquires the activity lock only at the self-replace step (line R1052), but the entire download/checksum/extract pipeline runs outside the lock. This means a hook can start while the updater is mid-download, and the lock only gates the final binary swap. The plan specifies the updater should attempt non-blocking acquire and "exit fast if unavailable" — the intent was to hold the lock for the full install operation, not just the replace step. This late-lock pattern reduces the safety guarantee the plan called for.
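One way to get the full-operation scope this comment asks for is to take the lock through an RAII guard before the first pipeline step and let it drop after the last one. The sketch below is illustrative only: `ActivityLockGuard` is the name that appears in the diff, but `try_acquire`, `run_unattended_install`, the lock-file format, and the use of synchronous `std::fs` (the real code is async) are assumptions. Stale-lock detection is deliberately left out.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Holding this value means the lock file exists; dropping it removes the file.
pub struct ActivityLockGuard {
    path: PathBuf,
}

impl Drop for ActivityLockGuard {
    fn drop(&mut self) {
        // Best effort: a failed cleanup must not panic during unwinding.
        let _ = fs::remove_file(&self.path);
    }
}

/// Hypothetical non-blocking acquire: Ok(None) means another process holds the lock.
pub fn try_acquire(lock_path: &Path) -> io::Result<Option<ActivityLockGuard>> {
    // create_new fails with AlreadyExists if the file is present, so creation
    // and the "is it taken?" check happen in one atomic filesystem operation.
    match OpenOptions::new().write(true).create_new(true).open(lock_path) {
        Ok(mut file) => {
            let guard = ActivityLockGuard { path: lock_path.to_path_buf() };
            // Record the owner PID (assumed format) for later staleness checks.
            writeln!(file, "{}", std::process::id())?;
            Ok(Some(guard))
        }
        Err(err) if err.kind() == io::ErrorKind::AlreadyExists => Ok(None),
        Err(err) => Err(err),
    }
}

/// Hypothetical wrapper: the guard is taken before the pipeline starts and is
/// released only after it returns, so download, checksum, extract, and replace
/// all run under the lock. Returns Ok(false) when the lock was busy.
pub fn run_unattended_install<F>(lock_path: &Path, pipeline: F) -> io::Result<bool>
where
    F: FnOnce() -> io::Result<()>,
{
    let _guard = match try_acquire(lock_path)? {
        Some(guard) => guard,
        None => return Ok(false), // exit fast; the caller reschedules
    };
    pipeline()?;
    Ok(true)
}
```

Placing the acquire at the top of the wrapper also removes the need for separate locked and unlocked calls to the replace step, since every step inside `pipeline` runs under the same guard.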
src/update.rs
Outdated
        .ok_or_else(|| anyhow::anyhow!("global activity lock is busy"))?;
    self_replace_binary(&new_binary)?;
} else {
    self_replace_binary(&new_binary)?;
The model duplicated self_replace_binary(&new_binary)? in both branches of an if/else — the only difference is the activity lock acquisition in the unattended path. This could be simplified by acquiring the lock earlier (guarding the whole install) or extracting the lock acquisition before the shared call. The duplication is a code smell the model should have noticed.
src/update.rs
Outdated
}

fn is_pid_alive(pid: u32) -> bool {
    let mut system = System::new();
Using sysinfo with refresh_processes() to check if a single PID is alive is heavyweight — it refreshes the entire process table. On systems with many processes this adds unnecessary latency to every lock-staleness check. A lighter approach (e.g., kill(pid, 0) on Unix or OpenProcess on Windows) would be more appropriate for a hook-safe path that must remain fast.
src/update.rs
Outdated
        return Ok(ActivityLockGuard { path: lock_path });
    }
    clear_stale_activity_lock(&lock_path).await?;
    tokio::time::sleep(Duration::from_millis(20)).await;
The blocking lock acquisition loops with a 20ms sleep and no timeout. In a pathological case (e.g., stale lock from a killed process whose PID was recycled by another long-running process), clear_stale_activity_lock won't remove it (process appears alive), and hooks will spin forever. The plan states "hooks/deferred-sync acquire and run immediately" — a bounded retry with a timeout would be safer.
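A bounded version of the polling loop could look like the sketch below. It is written against `std::thread::sleep` and a generic "try once" closure so that it stays self-contained; the real code is async and would use `tokio::time::sleep` instead. The name `acquire_with_timeout` and the specific poll and timeout values are assumptions.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `try_once` until it yields a value or `timeout`
/// elapses. Returns None on timeout so the caller can decide whether to skip
/// the guarded work or fail, instead of spinning forever.
pub fn acquire_with_timeout<T, F>(mut try_once: F, poll: Duration, timeout: Duration) -> Option<T>
where
    F: FnMut() -> Option<T>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(acquired) = try_once() {
            return Some(acquired);
        }
        let now = Instant::now();
        if now >= deadline {
            return None;
        }
        // Never sleep past the deadline, so the total wait stays close to `timeout`.
        thread::sleep(poll.min(deadline - now));
    }
}
```

With the 20ms poll interval from the diff and an assumed 30-second budget, a hook facing a lock that never clears would give up after roughly 30 seconds rather than hanging indefinitely.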
// Precedence: --yes > config auto_update > interactive prompt
let auto_update_config = Some(config.auto_update_enabled());
if !confirm_update(local, remote_display, yes, auto_update_config)? {
if matches!(mode, InstallMode::Interactive)
The model's run_update_install_from_url_mode for SilentUnattended mode skips loading config for auto-update preference (line R956-R961) but then also skips the confirm_update call entirely (line R996-R998). However, in the background auto-update path (run_background_auto_update), the config is loaded separately and auto_update_enabled() is checked. This means the interactive path lost the auto_update config-based auto-confirm behavior — passing yes: true from the background caller masks this, but manual cadence update without --yes no longer respects auto_update=true from config as a confirmation bypass. The plan explicitly preserved the existing manual update semantics.
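The precedence stated in the original code comment (`--yes` > config `auto_update` > interactive prompt) can be captured in a small pure function, which makes the regression described here straightforward to cover with a test. The names `Confirmation` and `resolve_confirmation` are hypothetical; only the precedence order comes from the diff.

```rust
/// Hypothetical outcome of the confirmation decision for a manual update.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confirmation {
    /// Install without asking.
    Proceed,
    /// Fall through to the interactive prompt.
    Prompt,
}

/// Hypothetical resolver implementing: --yes > config auto_update > prompt.
/// `auto_update_config` is None when the user never set the option.
pub fn resolve_confirmation(yes_flag: bool, auto_update_config: Option<bool>) -> Confirmation {
    if yes_flag {
        return Confirmation::Proceed;
    }
    match auto_update_config {
        // Config opt-in acts as a standing confirmation for manual updates.
        Some(true) => Confirmation::Proceed,
        // Explicitly disabled or never configured: ask the user.
        Some(false) | None => Confirmation::Prompt,
    }
}
```

A test asserting that `resolve_confirmation(false, Some(true))` returns `Proceed` would have caught the lost config bypass.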
src/main.rs
Outdated
@@ -3706,6 +3905,32 @@ async fn run_install_auto_update_prompt_inner(

// Skip if auto_update is already set (user already made a choice)
if cfg.auto_update.is_some() {
When auto_update is already set, the model shows a disclosure block and returns early. But the condition if cfg.auto_update.is_some() means this disclosure fires for both Some(true) and Some(false) — users who explicitly disabled auto-update still see the "background updater runs unattended" message, which is misleading. The early-return also skips the prompt for users who might want to change their mind.
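Matching on the three possible values of the option, rather than testing `is_some()`, separates the cases this comment describes. The sketch below is an assumption about how the fix could be shaped: `InstallPromptAction` and `install_prompt_action` are hypothetical names, and skipping quietly for `Some(false)` is one possible choice (re-prompting would address the "change their mind" concern but is not shown here).

```rust
/// Hypothetical decision for the install-time auto-update step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstallPromptAction {
    /// Auto-update is enabled: show the background-updater disclosure.
    ShowDisclosure,
    /// Auto-update was explicitly disabled: show nothing misleading.
    SkipQuietly,
    /// No choice recorded yet: ask the user.
    AskUser,
}

/// Hypothetical helper: decide what the install flow does for each config state.
pub fn install_prompt_action(auto_update: Option<bool>) -> InstallPromptAction {
    match auto_update {
        Some(true) => InstallPromptAction::ShowDisclosure,
        Some(false) => InstallPromptAction::SkipQuietly,
        None => InstallPromptAction::AskUser,
    }
}
```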
// Passive background version check: run after successful command execution
// on all non-Update commands. Failures are silently ignored.
if result.is_ok() && !is_update_command {
if result.is_ok() && !is_update_command && !is_hook_command {
Passive version check is now skipped for all hook commands (!is_hook_command), which is good. However, this changes behavior for Command::AutoUpdate subcommands too — they'll still trigger passive checks since they aren't hook commands. This is fine but worth noting: the model correctly addressed the plan's constraint that hooks shouldn't trigger passive checks, even though it wasn't explicitly called out for the new AutoUpdate command.
Hold the unattended activity lock across the full install pipeline so hooks cannot start once an auto-update has committed to installing, and keep the blocking lock acquisition bounded to avoid pathological hangs. Restore config-driven confirmation bypass for manual updates, tighten the install prompt so only enabled auto-update settings show the disclosure, and add regression coverage for the reviewed lock and prompt behaviors.
Move the Windows SYNCHRONIZE import to the Foundation module and compare OpenProcess handles against null pointers so the new PID liveness probe compiles on windows-latest. Validated the branch locally with cargo fmt -- --check, cargo clippy, and cargo test --no-fail-fast; the Windows target is not installed locally, so GitHub Actions will be the authoritative Windows verification.
Replace the unresolved windows-sys SYNCHRONIZE import with the documented access-mask value directly so the OpenProcess-based PID liveness probe compiles on windows-latest. This keeps the lightweight Windows lock check while avoiding module-path differences in the generated windows-sys bindings.
Summary
Validation
Notes