Fix classification of disposal-during-reconnect as cancelled, not failed #592

Open
jpablo2002 wants to merge 3 commits into main from dev/acostajuan/reconnect-cancelled-logging


@jpablo2002

Problem

We currently see multiple host_reconnect_failed errors with ObjectDisposedException: RelayTunnelConnector is disposed. These are logged at error severity and pollute dashboards, but they are not actual reconnect failures — the host application called DisposeAsync() while the SDK's reconnect retry loop was in flight, cancelling it.

The causal chain is:

  1. Relay pod restarts → WebSocket drops → SSH Closed event fires
  2. SDK starts ReconnectAsync(DisposeToken) as a background task
  3. MaybeStartReconnecting sets ConnectionStatus = Connecting, which fires ConnectionStatusChanged synchronously to the host app
  4. Host app (e.g. Codespaces Agent) reacts to the status change by cancelling a token or exiting an await using scope → DisposeAsync() is called
  5. DisposeAsync() cancels disposeCts → DisposeToken cancelled → reconnect retry loop aborted
  6. OperationCanceledException is wrapped as ObjectDisposedException by TryAdjustCancellation
  7. ReconnectAsync catches it and reports host_reconnect_failed at error severity — indistinguishable from a genuine failure
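To make steps 5 through 7 concrete, here is a minimal C# sketch under stated assumptions. ReconnectRaceSketch, RetryLoopAsync, and the 50 ms delay are hypothetical stand-ins, and the catch block only imitates the wrapping this PR attributes to TryAdjustCancellation; it is not the SDK's actual code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified model of steps 5-7; not the SDK's actual code.
public sealed class ReconnectRaceSketch : IDisposable
{
    private readonly CancellationTokenSource disposeCts = new CancellationTokenSource();

    public CancellationToken DisposeToken => disposeCts.Token;

    // Stand-in for DisposeAsync(): cancelling disposeCts aborts any in-flight retry.
    public void Dispose() => disposeCts.Cancel();

    // Stand-in for the reconnect retry loop, which waits between attempts on DisposeToken.
    public async Task RetryLoopAsync()
    {
        try
        {
            while (true)
            {
                // A real loop would attempt to reconnect here; the sketch only waits.
                await Task.Delay(TimeSpan.FromMilliseconds(50), DisposeToken);
            }
        }
        catch (OperationCanceledException ex) when (DisposeToken.IsCancellationRequested)
        {
            // Step 6: cancellation caused by disposal is rethrown as ObjectDisposedException,
            // imitating the wrapping the PR attributes to TryAdjustCancellation.
            throw new ObjectDisposedException("RelayTunnelConnector is disposed.", ex);
        }
    }
}
```

In this sketch the loop can only end through cancellation, so the caller always observes an ObjectDisposedException carrying the original OperationCanceledException as its inner exception. That is the shape of exception ReconnectAsync was reporting as host_reconnect_failed.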

Fix

In ReconnectAsync, check whether the exception was caused by disposal-cancellation before choosing the event name and severity:

  • DisposeToken.IsCancellationRequested AND ex is ObjectDisposedException or OperationCanceledException → report host_reconnect_cancelled at warning severity
  • Otherwise → report host_reconnect_failed at error severity (unchanged)
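The decision rule above can be expressed as a small pure function. This is a sketch, not the code in TunnelRelayConnection.cs: ReconnectTelemetry, Classify, and the Severity enum are hypothetical names; only the two event names and the two-part condition come from this PR.

```csharp
using System;
using System.Threading;

// Hypothetical severity levels; the SDK's real telemetry types may differ.
public enum Severity { Warning, Error }

public static class ReconnectTelemetry
{
    // Hypothetical helper: picks the event name and severity for a reconnect exception.
    public static (string EventName, Severity Severity) Classify(Exception ex, CancellationToken disposeToken)
    {
        // Both must hold: disposal was requested AND the exception is a cancellation/disposal type.
        // TaskCanceledException derives from OperationCanceledException, so it is covered too.
        bool cancelledByDispose =
            disposeToken.IsCancellationRequested &&
            (ex is ObjectDisposedException || ex is OperationCanceledException);

        return cancelledByDispose
            ? ("host_reconnect_cancelled", Severity.Warning)
            : ("host_reconnect_failed", Severity.Error);
    }
}
```

Keeping the check in one pure function makes the two-condition rule easy to test in isolation: an UnauthorizedAccessException raised after disposal still maps to host_reconnect_failed, and so does an OperationCanceledException observed while DisposeToken has not been cancelled.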

Both conditions must be true to classify as cancelled. If DisposeAsync() was called but the reconnect had already failed with a real error (e.g. WebSocketException, UnauthorizedAccessException), the exception type won't match and it's correctly reported as a failure.

Files Changed

  • cs/src/Connections/TunnelRelayConnection.cs: the ReconnectAsync catch block now distinguishes disposal-cancelled from genuine failure when reporting telemetry events
  • cs/test/TunnelsSDK.Test/Mocks/MockTunnelManagementClient.cs: ReportEvent now captures events in a ReportedEvents list for test assertions
  • cs/test/TunnelsSDK.Test/TunnelHostAndClientTests.cs adds two new tests:
    • DisposeDuringReconnectReportsCancelledEvent: verifies that disposal during reconnect reports _reconnect_cancelled (warning)
    • ReconnectFailureReportsFailedEvent: verifies that a genuine reconnect failure reports _reconnect_failed (error)
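The ReportedEvents capture in the mock can be as simple as a locked list. The sketch below is illustrative only: ReportedEvent and CapturingEventSink are hypothetical, and the real ReportEvent signature on MockTunnelManagementClient may take different parameters.

```csharp
using System.Collections.Generic;

// Hypothetical event shape; the SDK's real event type likely carries more fields.
public sealed record ReportedEvent(string Name, string Severity);

public sealed class CapturingEventSink
{
    private readonly object gate = new object();
    private readonly List<ReportedEvent> reportedEvents = new List<ReportedEvent>();

    // Events may arrive from the background reconnect task, so writes are locked.
    public void ReportEvent(ReportedEvent e)
    {
        lock (gate) { reportedEvents.Add(e); }
    }

    // Returns a snapshot copy so assertions never enumerate a list that is still growing.
    public IReadOnlyList<ReportedEvent> ReportedEvents
    {
        get { lock (gate) { return reportedEvents.ToArray(); } }
    }
}
```

The lock matters because ReconnectAsync runs as a background task and may report while the test thread is reading; returning a copy keeps each assertion working against a stable view.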

juanpacostaaa added 2 commits March 27, 2026 14:30
…and fixed event logging of reconnect failed vs cancelled to warn correctly.
@jpablo2002 jpablo2002 self-assigned this Mar 27, 2026
@jpablo2002
Author

@microsoft-github-policy-service agree company="Microsoft"
