Fix all metrics by aditya1702 · Pull Request #545 · stellar/wallet-backend

aditya1702 · 2026-03-19T14:55:48Z

What

[TODO: Short statement about what is changing.]

Why

[TODO: Why this change is being made. Include any context required to understand the why.]

Known limitations

[TODO or N/A]

Issue that this PR addresses

[TODO: Attach the link to the GitHub issue or task. Include the priority of the task here in addition to the link.]

Checklist

PR Structure

It is not possible to break this PR down into smaller PRs.
This PR does not mix refactoring changes with feature changes.
This PR's title starts with name of package that is most changed in the PR, or all if the changes are broad or impact many packages.

Thoroughness

This PR adds tests for the new functionality or fixes.
All updated queries have been tested (refer to this check if the data set returned by the updated query is expected to be same as the original one).

Release

This is not a breaking change.
This is ready to be tested in development.
The new functionality is gated with a feature flag if this is not ready for production.

Introduce operation-level Prometheus collectors (operation duration histogram, operations counter, in-flight gauge, response size histogram) and rename the constructor to NewGraphQLMetrics. Replace heavy per-field timing/counters with a lightweight deprecated-field counter and complexity/response histograms to reduce cardinality and provide SLO-friendly metrics. Add GraphQLOperationMetrics middleware to record duration, throughput, errors and response size; add tests for operation and field middleware and update existing tests and registrations. Wire the new operation and field middlewares into the server handler.
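To illustrate why fixed-bucket histograms are described as SLO-friendly, here is a minimal stdlib-only sketch. It is not the code in this PR and does not use the Prometheus client; the `histogram` type, its methods, and the bucket bounds are hypothetical. The point is that with fixed bounds, the share of operations faster than a threshold is a ratio of counts.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram is a hypothetical stand-in for a Prometheus histogram.
// bounds are inclusive upper bounds in seconds; counts[i] holds the number of
// observations that landed in bucket i, and the last slot acts as +Inf.
// It is not safe for concurrent use.
type histogram struct {
	bounds []float64
	counts []uint64
	total  uint64
}

func newHistogram(bounds []float64) *histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &histogram{bounds: b, counts: make([]uint64, len(b)+1)}
}

// observe records v in the first bucket whose upper bound is >= v.
func (h *histogram) observe(v float64) {
	i := sort.SearchFloat64s(h.bounds, v)
	h.counts[i]++
	h.total++
}

// fractionWithin returns the share of observations in buckets whose upper
// bound is <= bound. It is exact only when bound is a configured bucket bound.
func (h *histogram) fractionWithin(bound float64) float64 {
	if h.total == 0 {
		return 0
	}
	var n uint64
	for i, b := range h.bounds {
		if b > bound {
			break
		}
		n += h.counts[i]
	}
	return float64(n) / float64(h.total)
}

func main() {
	// Bucket bounds chosen for illustration only.
	h := newHistogram([]float64{0.05, 0.1, 0.5, 1})
	for _, v := range []float64{0.02, 0.08, 0.3, 2.0} {
		h.observe(v)
	}
	fmt.Printf("share of operations within 100ms: %.2f\n", h.fractionWithin(0.1))
}
```

A per-field timer would need one such series per field, which is where the cardinality cost comes from; one histogram per operation keeps the series count small.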

Refactor Prometheus ingestion metrics and update instrumentation across the ingestion code. Change Duration from a HistogramVec to a Histogram (call sites updated), rename several metrics (ledgers/transactions/operations totals), remove BatchSize, and add new metrics: LagLedgers, LedgerFetchDuration, RetriesTotal, RetryExhaustionsTotal, and ErrorsTotal. Also adjust the Participants metric name and buckets. Instrumentation now observes ledger fetch duration, increments retry and exhaustion counters in the fetch/flush/persist paths, reports errors on live ingestion failures, and updates lag when available. Update tests to match the new metric types and bucket counts, and add unit tests for the new metrics.
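The retry and exhaustion accounting can be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions, not the PR's implementation: `retryStats` stands in for the RetriesTotal and RetryExhaustionsTotal counters, `withRetry` and `errTransient` are hypothetical names, and the rule "count a retry only when another attempt follows" is an assumption about the intended semantics.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a hypothetical error used only for this illustration.
var errTransient = errors.New("transient failure")

// retryStats is a hypothetical stand-in for two Prometheus counters.
type retryStats struct {
	retries     int // failed attempts that were followed by another attempt
	exhaustions int // operations that failed on every attempt
}

// withRetry calls op up to maxAttempts times and records retries and
// exhaustions in stats. It returns nil on the first successful attempt.
func withRetry(maxAttempts int, stats *retryStats, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			stats.retries++
		}
	}
	stats.exhaustions++
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	stats := &retryStats{}

	// Fails twice, then succeeds: two retries, no exhaustion.
	calls := 0
	err := withRetry(3, stats, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println(err, stats.retries, stats.exhaustions)

	// Always fails: two more retries and one exhaustion.
	err = withRetry(3, stats, func() error { return errTransient })
	fmt.Println(err, stats.retries, stats.exhaustions)
}
```

Keeping retries and exhaustions as separate counters lets an alert distinguish "flaky but recovering" from "giving up".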

Refactor and expand RPC Prometheus instrumentation for better SLOs and observability.

- Replace per-endpoint summary metrics and separate success/failure counters with:
  - wallet_rpc_request_duration_seconds (HistogramVec by method)
  - wallet_rpc_request_duration_seconds and wallet_rpc_method_duration_seconds use explicit rpcDurationBuckets
  - wallet_rpc_requests_total now has (method,status) labels for success/failure
- Add wallet_rpc_in_flight_requests (Gauge) and wallet_rpc_response_size_bytes (HistogramVec)
- Convert MethodDuration to a histogram and keep MethodErrorsTotal and MethodCallsTotal counters
- Update registration to include new collectors and remove deprecated ones.
- Update tests to assert new metrics, add histogram and bucket checks, and adjust transport counter tests to use (method,status) labels.
- RPC service changes:
  - Remove heartbeat channel accessor from the interface and implementation
  - GetHealth now sets ServiceHealth and LatestLedger based on response and marks health=0 on errors
  - sendRPCRequest now tracks InFlightRequests, observes RequestDuration, records ResponseSizeBytes, and increments RequestsTotal with success/failure labels instead of old endpoint counters

These changes improve latency and size visibility, simplify error/success accounting, and provide gauges useful for detecting RPC node stalls or connection exhaustion.
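The bookkeeping described for sendRPCRequest might look roughly like the sketch below. It is stdlib-only and does not use the Prometheus client; `rpcStats`, `track`, `inFlightNow`, and `errRPC` are hypothetical names, plain maps stand in for the labelled collectors, and recording response size only on success is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errRPC is a hypothetical error used only for this illustration.
var errRPC = errors.New("rpc failed")

// requestKey mirrors the (method,status) label pair.
type requestKey struct{ method, status string }

// rpcStats is a hypothetical stand-in for the RPC collectors.
type rpcStats struct {
	mu            sync.Mutex
	inFlight      int                      // gauge: requests currently running
	requests      map[requestKey]int       // counter by (method,status)
	totalDuration map[string]time.Duration // summed latency by method
	responseBytes map[string]int           // summed response size by method
}

func newRPCStats() *rpcStats {
	return &rpcStats{
		requests:      make(map[requestKey]int),
		totalDuration: make(map[string]time.Duration),
		responseBytes: make(map[string]int),
	}
}

func (s *rpcStats) inFlightNow() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.inFlight
}

// track wraps one RPC call: in-flight up, run, then record duration,
// response size (success only), and the (method,status) count.
func (s *rpcStats) track(method string, call func() ([]byte, error)) ([]byte, error) {
	s.mu.Lock()
	s.inFlight++
	s.mu.Unlock()

	start := time.Now()
	body, err := call()
	elapsed := time.Since(start)

	s.mu.Lock()
	defer s.mu.Unlock()
	s.inFlight--
	s.totalDuration[method] += elapsed
	status := "success"
	if err != nil {
		status = "failure"
	} else {
		s.responseBytes[method] += len(body)
	}
	s.requests[requestKey{method, status}]++
	return body, err
}

func main() {
	s := newRPCStats()
	_, _ = s.track("getHealth", func() ([]byte, error) { return []byte("ok"), nil })
	_, _ = s.track("getHealth", func() ([]byte, error) { return nil, errRPC })
	fmt.Println(s.requests, s.responseBytes, s.inFlightNow())
}
```

A single counter keyed by (method,status) replaces the separate success and failure counters, so an error ratio is one division over the same metric.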

Replace the pond pool "channel" label with a clearer "pool_name" label and rename the RegisterPoolMetrics parameter accordingly. Update pool metrics (use wallet_pool_tasks_dropped_total instead of tasks_completed) and tests to reflect the label/name changes. Add extensive documentation comments and new Prometheus metrics for pgxpool (constructing_conns gauge, acquire/empty-acquire counters, wait time counters, new_conns/canceled/max_lifetime/max_idle destroy counters) and improve help text for several metrics to provide better observability of pool and DB connection behavior.

Expose pgx.QueryExecMode on PoolConfig and apply it when opening the connection pool. If non-zero, the value is copied into cfg.ConnConfig.DefaultQueryExecMode so callers can override pgx's default (cached prepared statements). The serve config now sets QueryExecMode to Exec to avoid server-side prepared statement caching which conflicts with PgBouncer in transaction pooling mode (SQLSTATE 42P05), and imports github.com/jackc/pgx/v5.
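The "apply only if non-zero" rule can be sketched without the pgx dependency. Everything below is a local stand-in for illustration: `queryExecMode`, `poolConfig`, `connConfig`, `applyQueryExecMode`, and the constants are hypothetical and are not the pgx types or the PR's code. The sketch only assumes, as the description states, that a zero value means "not set".

```go
package main

import "fmt"

// queryExecMode is a local stand-in for pgx.QueryExecMode.
// The zero value is treated as "not set".
type queryExecMode int

const (
	execModeUnset          queryExecMode = iota // leave the driver default alone
	execModeCacheStatement                      // stand-in for cached prepared statements
	execModeExec                                // stand-in for the Exec mode
)

// poolConfig is a hypothetical mirror of the PoolConfig field added here.
type poolConfig struct {
	QueryExecMode queryExecMode
}

// connConfig is a hypothetical mirror of the driver's connection config.
type connConfig struct {
	DefaultQueryExecMode queryExecMode
}

// applyQueryExecMode copies the caller's mode into the connection config
// only when the caller set one, so the driver default survives otherwise.
func applyQueryExecMode(pc poolConfig, cc *connConfig) {
	if pc.QueryExecMode != execModeUnset {
		cc.DefaultQueryExecMode = pc.QueryExecMode
	}
}

func main() {
	cc := connConfig{DefaultQueryExecMode: execModeCacheStatement}

	// No override requested: the default stays in place.
	applyQueryExecMode(poolConfig{}, &cc)
	fmt.Println(cc.DefaultQueryExecMode == execModeCacheStatement)

	// Override requested, as the serve config does with Exec.
	applyQueryExecMode(poolConfig{QueryExecMode: execModeExec}, &cc)
	fmt.Println(cc.DefaultQueryExecMode == execModeExec)
}
```

Treating zero as "unset" keeps existing callers of PoolConfig unchanged, while the serve path opts into Exec for PgBouncer compatibility.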

aditya1702 added 12 commits March 19, 2026 09:36

refactor db metrics

3d65ee8

Merge branch 'feature/remove-metrics-interface' into feature/db-metrics

4c6d429

fix db test

14661b9

Create graphql_field_metrics_test.go

bc117ef

make check

12f661a

Add comments for DB metrics

40655b2

Update rpc.go

9908e41

aditya1702 wants to merge 12 commits into feature/remove-metrics-interface from feature/fix-all-metrics
