sameerank
commented
Mar 24, 2026
tests/ffe/test_flag_eval_metrics.py (outdated) — comment on lines +542 to +548:
# INVALID_CONTEXT behavioral differences:
# - Python: Returns INVALID_CONTEXT for nested dict/list attributes (PyO3 conversion failure)
# - Go: Flattens nested objects to dot notation instead
# - Ruby: Silently skips unsupported attribute types
# - Java: Returns INVALID_CONTEXT only for null context, not nested attributes
# - .NET: Relies on native library; not yet standardized
# - JS: Does not use INVALID_CONTEXT at all
Author
I am unclear on how to reconcile the variety of ways the SDKs handle invalid evaluation contexts. For now I'm just noting it down in the system tests code, and hopefully we can chip away at the @irrelevant decorators as we keep working on the SDKs.
I like the Python approach of returning the default value with reason "error" and code "invalid context", but my hunch is that the varying ways of binding with the Rust evaluator mean this isn't straightforward in other languages.
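The Python behavior the comment prefers (default value with reason "error" and code "invalid context") can be sketched as a pre-evaluation context check. This is a minimal illustration only; `resolve_bool` and the result tuple shape are hypothetical, not the dd-trace-py API:

```python
# Minimal sketch of the Python-style INVALID_CONTEXT handling described
# above: if the evaluation context contains nested dict/list attributes
# (which the PyO3 conversion cannot handle), return the default value
# with reason ERROR and code INVALID_CONTEXT. The helper name and tuple
# shape are hypothetical, not the actual dd-trace-py API.

def resolve_bool(flag_key, default, context):
    for value in context.values():
        if isinstance(value, (dict, list)):
            # Nested attributes cannot be converted for the Rust evaluator
            return default, "ERROR", "INVALID_CONTEXT"
    # Real evaluation would happen here; the sketch just falls back
    return default, "STATIC", None

value, reason, error_code = resolve_bool(
    "my-flag", False, {"targeting_key": "u1", "attrs": {"nested": True}}
)
```

The appeal of this shape is that the caller always gets a usable value plus machine-readable reason/code tags, which is exactly what the metrics hook needs to record.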
gh-worker-dd-mergequeue-cf854d bot pushed a commit to DataDog/dd-trace-py that referenced this pull request on Mar 26, 2026:
## Description

Adds `feature_flag.evaluations` OTel counter metric emitted on every flag evaluation, following the Go implementation pattern.

**Requirements:**
- `DD_METRICS_OTEL_ENABLED=true` must be set for metrics to emit
- `openfeature-sdk>=0.8.0` (required for `finally_after` hook to receive evaluation details)

**Changes:**
- Implements `FlagEvalMetrics` class and `FlagEvalHook` for metrics tracking
- Fixes flag-not-found behavior to return `Reason.ERROR` with `ErrorCode.FLAG_NOT_FOUND` when the flag is not in an existing config (aligns with Go/iOS SDKs)
- Returns `Reason.DEFAULT` only when no configuration is loaded (preserving existing behavior)

## Testing

### Unit test parity with Go SDK

The following tests mirror the Go SDK `flageval_metrics_test.go`:

| Go Test | Python Test |
|---------|-------------|
| `TestRecord` (targeting match) | `test_record_basic_attributes` |
| `TestRecord` (allocation key) | `test_record_with_allocation_key` |
| `TestRecord` (empty allocation key) | `test_record_empty_allocation_key_not_included` |
| `TestRecord` (error flag not found) | `test_record_with_error` |
| `TestRecord` (disabled flag) | `test_record_disabled_reason` |
| `TestRecordMultipleEvaluations` | `test_record_multiple_evaluations` |
| `TestRecordDifferentFlags` | `test_record_different_flags` |
| `TestRecordAllErrorTypes` | `test_record_all_error_types` |
| `TestIntegrationEvaluate` (type mismatch) | `test_type_conversion_error_records_type_mismatch` |

**Note:** Go tests use a real OTel test meter provider. Python unit tests (`TestFlagEvalMetrics`) use mocks for faster isolated testing, while `TestMetricsWithRealOTel` validates behavior with the real OTel runtime.

### Python-specific tests (not in Go)

- **`TestFlagEvalMetrics`**: OTel initialization tests (graceful handling when OTel is not available), metrics disabled when `DD_METRICS_OTEL_ENABLED=false`, shutdown behavior
- **`TestFlagEvalHook`**: Tests the hook mechanism (`finally_after` calls `metrics.record` with correct arguments)
- **`TestProviderHooksIntegration`**: Tests provider hook registration, `get_provider_hooks()` returns correct hooks, cleanup on shutdown
- **`TestMetricsWithRealOTel`**: Integration tests with the real OTel runtime

### System tests

- System tests PR: DataDog/system-tests#6545 (Python tests passing)

## Risks

- **API behavior change**: Flag evaluations for non-existent flags now return `Reason.ERROR` instead of `Reason.DEFAULT` when configuration is available. Release note added.
- **Dependency upgrade**: Minimum `openfeature-sdk` version increased from 0.6.0 to 0.8.0. Users on older versions will need to upgrade. Release note added.

## Additional Notes

Reference files in Go SDK: [flageval_metrics.go](https://github.com/DataDog/dd-trace-go/blob/main/openfeature/flageval_metrics.go), [flageval_metrics_test.go](https://github.com/DataDog/dd-trace-go/blob/main/openfeature/flageval_metrics_test.go), [provider.go](https://github.com/DataDog/dd-trace-go/blob/main/openfeature/provider.go), [provider_test.go](https://github.com/DataDog/dd-trace-go/blob/main/openfeature/provider_test.go)

Co-authored-by: sameeran.kunche <sameeran.kunche@datadoghq.com>
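The counter semantics described above (one increment per evaluation, aggregated by tag set, with lowercase tag values) can be sketched with a simplified in-memory aggregator. The class and tag names here are illustrative assumptions; the real implementation records through an OTel meter:

```python
# Simplified sketch of how a feature_flag.evaluations counter aggregates
# by tag set. The class name, method signature, and tag keys are
# illustrative, not dd-trace-py's actual API; a real implementation
# calls Counter.add() on an OTel meter instead of using a dict.
from collections import Counter

class FlagEvalMetricsSketch:
    def __init__(self):
        self._counter = Counter()

    def record(self, flag_key, reason, variant=None, error_code=None):
        # Tag values are lowercased for cross-SDK consistency
        tags = (("flag.key", flag_key), ("reason", reason.lower()))
        if variant:
            tags += (("variant", variant),)
        if error_code:
            tags += (("error_code", error_code.lower()),)
        # Evaluations with identical tags aggregate into one series
        self._counter[tags] += 1

metrics = FlagEvalMetricsSketch()
metrics.record("checkout-flag", "TARGETING_MATCH", variant="on")
metrics.record("checkout-flag", "TARGETING_MATCH", variant="on")
metrics.record("missing-flag", "ERROR", error_code="FLAG_NOT_FOUND")
```

After the three calls, the two identical `checkout-flag` evaluations collapse into a single series with count 2, while the error evaluation produces a separate series, which is the aggregation behavior the `test_record_multiple_evaluations` and `test_record_different_flags` tests check for.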
Add comprehensive system tests for FFE (Feature Flagging and Experimentation) flag evaluation metrics. These tests verify that tracers emit correct `feature_flag.evaluations` OTel metrics with proper tags for:

- Basic flag evaluation (flag key, variant, reason, allocation_key)
- Multiple evaluations (correct count aggregation)
- Different flags (separate metric series)
- All resolution reasons (static, targeting_match, split, default, disabled)
- Error codes (flag_not_found, type_mismatch, parse_error, provider_not_ready)
- Lowercase consistency for tag values

Also adds the `feature_flags_eval_metrics` feature declaration for tracer compatibility tracking.
…pe_mismatch

NUMERIC and INTEGER are distinct types; evaluating a NUMERIC flag as INTEGER should return `type_mismatch` (not `parse_error`) to align with libdatadog FFE.
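The distinction this commit makes can be sketched as a simple requested-type check. The helper and type names below are illustrative only, not libdatadog code:

```python
# Sketch: NUMERIC and INTEGER are distinct flag value types, so requesting
# an INTEGER evaluation of a NUMERIC flag is a TYPE_MISMATCH, while a flag
# whose stored value cannot be parsed at all is a PARSE_ERROR. The helper
# name, parameters, and type strings are illustrative assumptions.
def classify_eval_error(flag_type, requested_type, value_parses=True):
    if not value_parses:
        # The flag's value itself is malformed
        return "PARSE_ERROR"
    if flag_type != requested_type:
        # The value is fine, but the caller asked for the wrong type
        return "TYPE_MISMATCH"
    return None

assert classify_eval_error("NUMERIC", "INTEGER") == "TYPE_MISMATCH"
assert classify_eval_error("BOOLEAN", "BOOLEAN") is None
```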
Motivation

Add FFE (Feature Flagging & Experimentation) evaluation metrics tests to verify that the `feature_flag.evaluations` OTel counter metric works correctly across SDKs.

Related:
Changes

Test Infrastructure

- `tests/ffe/test_flag_eval_metrics.py` for Python and Go in manifest
- `OTEL_EXPORTER_OTLP_METRICS_PROTOCOL: "http/protobuf"` added to FFE scenario config
- `opentelemetry-exporter-otlp-proto-http==1.40.0` added to Python weblogs

Test Coverage
Tests for OpenFeature evaluation reasons:

- `STATIC` - catch-all allocation with no rules/shards
- `TARGETING_MATCH` - rules match the context
- `SPLIT` - shards determine variant
- `DEFAULT` - rules don't match, fallback used
- `DISABLED` - flag is disabled

Tests for OpenFeature error codes:

- `FLAG_NOT_FOUND` - config exists but flag missing
- `TYPE_MISMATCH` - STRING→BOOLEAN, NUMERIC→INTEGER conversions
- `PARSE_ERROR` - invalid regex pattern (Python only; Go validates at config load)
- `PROVIDER_NOT_READY` - no config loaded
- `INVALID_CONTEXT` - nested attributes (Python only)
- `TARGETING_KEY_MISSING` - verifies it's NOT returned (JS excluded)
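As a sketch of what these tests assert, the reason and error-code values above can be collected into lowercase expectation lists and checked against normalized tag values. The lists mirror the bullets above; the `to_tag` helper is illustrative, not system-tests code:

```python
# Sketch of the reason / error-code expectations the tests verify.
# Tag values are lowercased, matching the lowercase-consistency tests.
# The lists are assembled from the coverage bullets above; the helper
# is illustrative only.
EXPECTED_REASONS = ["static", "targeting_match", "split", "default", "disabled"]
EXPECTED_ERROR_CODES = [
    "flag_not_found", "type_mismatch", "parse_error",
    "provider_not_ready", "invalid_context",
]

def to_tag(value):
    """Normalize an SDK enum name to a lowercase metric tag value."""
    return value.strip().lower()

assert to_tag("TARGETING_MATCH") in EXPECTED_REASONS
assert to_tag("FLAG_NOT_FOUND") in EXPECTED_ERROR_CODES
# TARGETING_KEY_MISSING must NOT appear as an emitted error code
assert to_tag("TARGETING_KEY_MISSING") not in EXPECTED_ERROR_CODES
```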
Cross-SDK Consistency

- `@irrelevant` decorators where behavior intentionally differs

Reviewer checklist
- `tests/` or `manifests/` is modified? I have the approval from the R&P team
- `build-XXX-image` label is present