Skip to content

iam: iamclient: fix iamclient reconnection timeout#185

Open
mykola-kobets-epam wants to merge 1 commit intoaosedge:developfrom
mykola-kobets-epam:fix-iam-client-reconnection-timeout
Open

iam: iamclient: fix iamclient reconnection timeout#185
mykola-kobets-epam wants to merge 1 commit intoaosedge:developfrom
mykola-kobets-epam:fix-iam-client-reconnection-timeout

Conversation

@mykola-kobets-epam
Copy link
Collaborator

No description provided.

@mykola-kobets-epam mykola-kobets-epam force-pushed the fix-iam-client-reconnection-timeout branch from 479b8fc to 0a62582 Compare March 20, 2026 08:47

namespace {

constexpr auto cKeepAliveTime = std::chrono::seconds(10);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are default parameters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added commentds with default arguments

Copy link
Member

@mlohvynenko mlohvynenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed-by: Mykhailo Lohvynenko <mykhailo_lohvynenko@epam.com>

constexpr auto cKeepAliveTime = std::chrono::seconds(10);
constexpr auto cKeepAliveTimeout = std::chrono::seconds(3);
constexpr auto cInitialReconnectBackoffMs = 1000;
constexpr auto cMaxReconnectBackoffMs = 3000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using aos::Duration for these plain consts

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@codecov
Copy link

codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 92.68293% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@de726fb). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/common/pbconvert/sm.cpp 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop     #185   +/-   ##
==========================================
  Coverage           ?   82.55%           
==========================================
  Files              ?      325           
  Lines              ?    31098           
  Branches           ?     3221           
==========================================
  Hits               ?    25673           
  Misses             ?     5425           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS,
std::chrono::duration_cast<std::chrono::milliseconds>(cKeepAliveTimeout).count());
args.SetInt(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);
args.SetInt(GRPC_ARG_INITIAL_RECONNECT_BACKOFF_MS, cInitialReconnectBackoffMs);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set INITIAL_RECONNECT_BACKOFF_MS and MAX_RECONNECT_BACKOFF_MS, but MIN_RECONNECT_BACKOFF_MS is not set. In gRPC, MIN_RECONNECT_BACKOFF_MS maps to MIN_CONNECT_TIMEOUT (default ~20s), so reconnect attempts can still take much longer than the intended 1-3s window. Could we also set GRPC_ARG_MIN_RECONNECT_BACKOFF_MS (e.g. aligned with 1000-3000ms) to make reconnect behavior predictable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


args.SetInt(
GRPC_ARG_KEEPALIVE_TIME_MS, std::chrono::duration_cast<std::chrono::milliseconds>(cKeepAliveTime).count());
args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keepalive tuning here is aggressive (10s/3s + PERMIT_WITHOUT_CALLS=1), but GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA is not configured (default is 2). On idle channels this can effectively stop keepalive after 2 pings, and if server-side ping policy is stricter it may close the transport with GOAWAY/ENHANCE_YOUR_CALM. Could we set GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA explicitly (often 0) and confirm server-side keepalive settings are compatible?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mykola-kobets-epam mykola-kobets-epam force-pushed the fix-iam-client-reconnection-timeout branch 2 times, most recently from 87963f2 to e1af06d Compare March 20, 2026 10:11
Copy link
Collaborator

@al1img al1img left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>

Copy link

@MykolaSuperman MykolaSuperman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed-by: Mykola Solianko <mykola_solianko@epam.com>

@mykola-kobets-epam mykola-kobets-epam force-pushed the fix-iam-client-reconnection-timeout branch from e1af06d to 9c82f9a Compare March 20, 2026 10:32
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
50.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

KEEPALIVE_TIME_MS: Send a ping every 10 seconds to detect broken
connections faster.
KEEPALIVE_TIMEOUT_MS: Treat the connection as dead after 3 seconds without
a ping response.
KEEPALIVE_PERMIT_WITHOUT_CALLS: Keep sending keepalive pings even when
there are no gRPC messages.
INITIAL_RECONNECT_BACKOFF_MS: After a disconnect, keep the first reconnect
delay short (1 second).
MAX_RECONNECT_BACKOFF_MS: Even after many failures, cap reconnect delay at
3 seconds so retries remain frequent.

Signed-off-by: Mykola Kobets <mykola_kobets@epam.com>
Reviewed-by: Mykhailo Lohvynenko <mykhailo_lohvynenko@epam.com>
Reviewed-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Reviewed-by: Mykola Solianko <mykola_solianko@epam.com>
@mykola-kobets-epam mykola-kobets-epam force-pushed the fix-iam-client-reconnection-timeout branch from 9c82f9a to 57a0e8c Compare March 20, 2026 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants