Skip to content

fix: concurrency/correctness fixes in refresh pipeline + auto-refresh improvements#95

Merged
devnullvoid merged 7 commits intodevnullvoid:developfrom
al-bashkir:master
Mar 8, 2026
Merged

fix: concurrency/correctness fixes in refresh pipeline + auto-refresh improvements#95
devnullvoid merged 7 commits intodevnullvoid:developfrom
al-bashkir:master

Conversation

@al-bashkir
Copy link
Contributor

@al-bashkir al-bashkir commented Mar 6, 2026

Changes

Interactive startup profile chooser

  • Added an interactive TUI chooser shown at startup when no default profile is configured
  • Added the ability to toggle off a previously set default profile from within the profile menu
  • New internal/bootstrap/bootstrap.go package encapsulates startup profile resolution logic

Refresh pipeline rework

  • Replaced atomic.Bool refreshInProgress with atomic.Uint64 refreshGen generation token; added startRefresh / forceNewRefresh / endRefresh helpers
  • Added sync.RWMutex connMu protecting client, groupManager, isGroupMode, isClusterMode; all background goroutines use snapConn() for race-free reads
  • enrichGroupNodesParallel now accumulates results in a local slice and swaps into GlobalState in a single QueueUpdateDraw callback
  • autoRefreshDataWithFooter captures UI selection state (captureSelections() / selSnap) on the UI goroutine before spawning the background goroutine
  • VM enrichment callback stale guard changed to current != 0 && current != token so it aborts only on a newer refresh, not on normal completion
  • Retry errors from the guest-agent retry loop in GetFastFreshClusterStatus are now aggregated into enrichErr and passed to onEnrichmentComplete
  • Profile and cluster-group switches now use forceNewRefresh + doFastRefresh instead of manualRefresh
  • doManualRefresh and doFastRefresh split into separate functions; doEnrichNodes and enrichGroupNodesParallel accept a generation token

Auto-refresh

  • Added preserveVMEnrichmentData() helper that carries guest agent data (IPs, network interfaces, filesystems) and config details (boot order, CPU cores, storage devices, OS type, description) forward from GlobalState onto freshly-fetched VMs on each auto-refresh cycle
  • Removed enrichGroupNodesParallel from the group mode auto-refresh path; auto-refresh is now lightweight in both modes and releases the refresh guard synchronously
  • Auto-refresh countdown ticker replaced with time.NewTicker; spinner goroutine listens on autoRefreshCountdownStop instead of a separate channel

When no default_profile is configured and no --profile flag or
PVETUI_PROFILE env var is provided, the bootstrap now prompts the user
interactively instead of hard-coding a fallback to the literal name
'default' (which would error if no such profile existed).

- Remove the implicit 'default' name fallback in ResolveProfile; empty
  string is returned to signal that interactive selection is needed.
- Add promptProfileSelection: numbered terminal list (groups first, then
  individual profiles) with re-prompt on bad input and clean exit on
  EOF/Ctrl+C. Single profile with no groups is auto-selected silently.
- Pre-check in bootstrap: if default_profile resolves to a name that no
  longer exists (stale after rename/delete), warn and fall back to the
  interactive chooser instead of hard-erroring.
- setDefaultProfile now toggles: pressing s on the already-default entry
  clears default_profile to '' so the chooser runs on next startup.
- Connection Profiles dialog re-opens after s so the updated star
  indicator is immediately visible.
- Help bar updated: 's:toggle default'.
profiles: add interactive startup chooser and toggle-off default
Replace atomic.Bool refresh guard with generation token (atomic.Uint64)
to prevent stale callbacks from clearing another refresh's guard.
Add sync.RWMutex (connMu) protecting connection-state fields (client,
groupManager, isGroupMode, isClusterMode) that were read by background
goroutines without synchronization.

Specific issues fixed:

- WP1: refreshInProgress atomic.Bool replaced by refreshGen atomic.Uint64
  with startRefresh/forceNewRefresh/endRefresh helpers; generation token
  ensures only the owning callback can release the guard.

- WP2: connMu RWMutex guards all connection-state field writes; background
  goroutines use snapConn() to get a consistent lock-safe snapshot.
  deactivateGroupModes restructured to close resources outside the lock.

- WP3: enrichGroupNodesParallel accumulates enriched nodes in a local
  slice; single swap into GlobalState inside QueueUpdateDraw eliminates
  the concurrent write-to-shared-slice data race.

- WP4a: autoRefreshDataWithFooter captures UI selections (GetSelectedVM,
  GetSelectedNode, GetItemCount) on the UI goroutine before spawning the
  background goroutine; passed as selSnap struct to avoid tview reads
  off-thread.

- WP4b: doFastRefresh VM enrichment callback stale guard changed from
  'refreshGen != token' to 'current != 0 && current != token' so it
  aborts only on a newer refresh, not on normal completion; this fixes
  the header loading spinner never clearing and VM details not updating
  for the selected item after profile switch.

- WP4c: Retry errors in GetFastFreshClusterStatus guest agent retry loop
  are now aggregated into enrichErr and surfaced to the
  onEnrichmentComplete callback instead of being silently dropped.
Auto-refresh called GetLightClusterStatus() / GetGroupClusterResources()
which return VMs with basic metrics only. The resulting VM objects were
written directly into GlobalState, silently zeroing guest agent data
(IPs, network interfaces, filesystems) and config details (boot order,
CPU cores, storage devices, OS type, description) on every cycle.

Add preserveVMEnrichmentData() helper that builds an O(N) lookup map
from existing GlobalState VMs and carries all enrichment fields forward
onto freshly-fetched VMs. Applied in both single-profile and group mode
paths, matching the node-detail preservation pattern already in place.

Group mode auto-refresh called enrichGroupNodesParallel(), which makes
individual API calls to every node (version, disks, load avg, etc.) and
takes 10-20+ seconds on multi-node setups. The refresh guard was held
for the full duration, so the countdown timer always saw isRefreshActive
== true at expiry, reset to 10, and skipped — then on the next expiry
triggered another full enrichment. Result: perpetual spinner, countdown
never completed, auto-refresh effectively broken.

Remove enrichGroupNodesParallel from the auto-refresh path. Auto-refresh
is now lightweight in both modes: metrics update, enriched data is
preserved, UI lists and details update immediately, and the guard is
released synchronously inside QueueUpdateDraw. Full node enrichment
continues to run on manual refresh (R key) and profile switches only.
@devnullvoid devnullvoid self-assigned this Mar 7, 2026
@devnullvoid devnullvoid self-requested a review March 7, 2026 23:27
@devnullvoid devnullvoid changed the base branch from master to develop March 8, 2026 06:17
@devnullvoid
Copy link
Owner

Reviewed and tested locally against develop.

Local result:

Follow-up fixes applied locally:

  • fixed startup profile chooser behavior when default_profile is unset and a profile is literally named default
  • changed non-interactive startup chooser failure to return an error instead of exiting successfully
  • removed the dead fastRefresh wrapper that was failing lint
  • kept the mock VM config form body-size fix in place

Validation run locally on the merged develop result:

  • make code-quality
  • make test
  • make build

Local commits on top of origin/develop:

This PR should target develop, not master. It has been retargeted accordingly.

@devnullvoid devnullvoid merged commit cb274ab into devnullvoid:develop Mar 8, 2026
4 of 5 checks passed
@devnullvoid
Copy link
Owner

develop has now been pushed and includes both the PR merge and the follow-up fix commit.

Included commits on develop:

This will be included in the next release. Thanks for the contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants