To improve the observability of the NRI framework and its plugins, we need to provide basic runtime metrics. This is a requirement for v1.0. Currently, this data is not collected, which makes troubleshooting difficult.
This is a tracking issue to discuss the design and implementation of basic runtime metrics for NRI plugins.
Specifically, the NRI framework needs to collect and provide data such as:
- Plugin counts (registered, active, etc.)
- Success and failure counts for plugin invocations/adjustments
- General plugin health
The v1.0 requirements doc mentions that the NRI framework itself should not expose these metrics directly. Instead, they should be passed to the container runtime, which can then expose them using its existing mechanisms (Prometheus, OpenTelemetry, etc.) for consumption.