
Downgrade SQL Server AOAG failover logs from error to warn #35868

Open
bosconi wants to merge 1 commit into main from claude/fix-issue-9986-s9qbe


@bosconi
Member

@bosconi bosconi commented Apr 4, 2026

Summary

  • Downgrade tracing::error!("found uninitialized LSN!") to tracing::warn! in src/sql-server-util/src/cdc.rs
  • Replace mz_ore::soft_panic_or_log!("upstream SQL Server went backwards in time...") with tracing::warn! in src/storage/src/source/sql_server/progress.rs

During SQL Server Always On Availability Group (AOAG) failovers, these two conditions can occur and are expected and recoverable. They were previously reported at error severity (the first via tracing::error!, the second via mz_ore::soft_panic_or_log!), which triggered unnecessary Sentry alerts. Both code paths already handle the conditions gracefully (skip and continue).
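
As a rough illustration of the "warn, skip, and continue" handling described above, here is a minimal std-only sketch. It is not the code in cdc.rs or progress.rs: `ProgressTracker`, `Level`, `observe`, and the integer `Lsn` alias are hypothetical names introduced only for this example, and log events are collected in a `Vec` rather than emitted through `tracing`.

```rust
// Hypothetical, std-only model of the "warn, skip, and continue" handling.
// None of these names come from the Materialize codebase.

/// An LSN modeled as a plain integer (an assumption made for brevity).
type Lsn = u64;

/// Severity of a recorded log event.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Warn,
    Error,
}

/// Tracks the highest upstream LSN seen so far and collects log events.
struct ProgressTracker {
    max_seen: Option<Lsn>,
    events: Vec<(Level, String)>,
}

impl ProgressTracker {
    fn new() -> Self {
        ProgressTracker { max_seen: None, events: Vec::new() }
    }

    /// Feeds one upstream max-LSN reading; `None` stands for an
    /// uninitialized LSN. Returns `true` if the reading was accepted and
    /// `false` if it was skipped.
    fn observe(&mut self, reading: Option<Lsn>) -> bool {
        let lsn = match reading {
            Some(lsn) => lsn,
            None => {
                // Condition 1: no LSN available yet. Warn and skip.
                self.events
                    .push((Level::Warn, "found uninitialized LSN".to_string()));
                return false;
            }
        };
        if let Some(prev) = self.max_seen {
            if lsn < prev {
                // Condition 2: upstream reports a lower max LSN. Warn and skip.
                self.events.push((
                    Level::Warn,
                    format!("upstream went backwards: {lsn} < {prev}"),
                ));
                return false;
            }
        }
        self.max_seen = Some(lsn);
        true
    }
}

fn main() {
    let mut tracker = ProgressTracker::new();
    for reading in [Some(100), None, Some(90), Some(120)] {
        let accepted = tracker.observe(reading);
        println!("{reading:?} -> accepted: {accepted}");
    }
    for (level, msg) in &tracker.events {
        println!("{level:?}: {msg}");
    }
}
```

In this sketch the two anomalous readings are skipped without moving `max_seen`, and each leaves only a warn-level event behind, which mirrors the intent of the change: the condition stays visible in logs without being treated as an error.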

Fixes MaterializeInc/database-issues#9986

Test plan

  • cargo check -p mz-sql-server-util -p mz-storage compiles cleanly
  • Verify Sentry alerts stop firing for AOAG failover events after deployment
  • Existing SQL Server source tests continue to pass (no behavioral change; only the log level changed)

https://claude.ai/code/session_015rRQ5puqtBufPEQ5RoUtjt

During SQL Server Always On Availability Group failovers, two conditions
can occur that are expected and recoverable but were previously logged at
error level, triggering unnecessary Sentry alerts:

1. Uninitialized LSN in CDC stream - when a capture instance temporarily
   has no LSN after failover
2. Upstream LSN going backwards - when the new primary reports a lower
   max LSN than previously seen

Both cases are already handled gracefully (skip and continue), so
downgrade from error/soft_panic_or_log to warn level.

Fixes MaterializeInc/database-issues#9986

https://claude.ai/code/session_015rRQ5puqtBufPEQ5RoUtjt
@bosconi bosconi requested a review from a team as a code owner April 4, 2026 02:14
@github-actions
Contributor

github-actions bot commented Apr 4, 2026

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
