
[FFL-1945] Add flag evaluation metrics via OpenFeature hook#5599

Merged
sameerank merged 32 commits into master from sameerank/FFL-1945/add-flag-eval-metrics
Apr 30, 2026
Conversation

@sameerank
Contributor

@sameerank sameerank commented Apr 16, 2026

What does this PR do?
Adds a `feature_flag.evaluations` counter metric via OpenTelemetry to track feature flag evaluations.

Motivation:
Enable observability of feature flag usage through standardized OTel metrics.

Change log entry
Yes. Add flag evaluation metrics (feature_flag.evaluations) via OpenTelemetry for OpenFeature provider.

Additional Notes:

  • Uses OpenFeature hooks pattern (consistent with Python/Go SDKs)
  • Metric attributes: feature_flag.key, feature_flag.result.variant, feature_flag.result.reason
  • Conditional attributes: feature_flag.result.allocation_key, error.type
  • Gracefully degrades when OTel SDK is unavailable
  • Requires DD_METRICS_OTEL_ENABLED=true to enable metrics (consistent with the other dd-trace-* SDKs)
  • Requires openfeature-sdk >= 0.5.1 for provider hooks support
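
As a rough illustration of the hooks pattern and attribute set described above, here is a minimal, self-contained Ruby sketch. The attribute names follow the list above and the `finally` signature mirrors the RBS shown later in this thread, but the class names and the stub counter are hypothetical; the real implementation obtains a counter instrument from the OpenTelemetry Meter API.

```ruby
# Stand-in for an OTel counter instrument (hypothetical; the real code
# would obtain one from OpenTelemetry's meter provider).
class StubCounter
  attr_reader :recorded

  def initialize
    @recorded = []
  end

  def add(value, attributes: {})
    @recorded << [value, attributes]
  end
end

# Hook that records one count per flag evaluation.
class FlagEvalHook
  def initialize(counter, enabled: true)
    # Enabled state is cached once at construction (DD_METRICS_OTEL_ENABLED
    # in the real SDK) rather than re-checked on every evaluation.
    @counter = counter
    @enabled = enabled
  end

  # Called after every evaluation, success or failure.
  def finally(hook_context:, evaluation_details:, **_opts)
    return unless @enabled

    attributes = {
      'feature_flag.key' => hook_context[:flag_key],
      'feature_flag.result.variant' => evaluation_details[:variant],
      'feature_flag.result.reason' => evaluation_details[:reason],
    }
    # Conditional attribute: only present on errors.
    attributes['error.type'] = evaluation_details[:error_type] if evaluation_details[:error_type]

    @counter.add(1, attributes: attributes)
  end
end

counter = StubCounter.new
hook = FlagEvalHook.new(counter)
hook.finally(
  hook_context: { flag_key: 'new-checkout' },
  evaluation_details: { variant: 'on', reason: 'TARGETING_MATCH' }
)
counter.recorded.length # => 1
```

Passing `enabled: false` (or omitting the OTel SDK entirely in the real code) turns `finally` into a cheap no-op, which is the graceful degradation mentioned above.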

How to test the change?

% EXTRA_DOCKER_ARGS="--no-cache" ./build.sh ruby --weblog-variant rails72
# Debug logs match commit hash: DEBUG -- datadog: [datadog] (/usr/local/bundle/bundler/gems/dd-trace-rb-60aa06ee27f8/

% TEST_LIBRARY=ruby WEBLOG_VARIANT=rails72 ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION
========================================================== test context ===========================================================
Scenario: FEATURE_FLAGGING_AND_EXPERIMENTATION
Logs folder: ./logs_feature_flagging_and_experimentation
Starting containers...
Agent: 7.77.2
Backend: datad0g.com
Library: ruby@2.32.0-dev
Weblog variant: rails72
Weblog system: Linux weblog 6.12.76-linuxkit #1 SMP Fri Mar  6 10:10:19 UTC 2026 aarch64 GNU/Linux

======================================================= test session starts =======================================================
collected 2454 items / 2422 deselected / 32 selected                                                                              
----------------------------------------------------------- tests setup -----------------------------------------------------------

tests/ffe/test_dynamic_evaluation.py ..
tests/ffe/test_exposures.py ...........
tests/ffe/test_flag_eval_metrics.py .................

------------------------------------------------- Wait for library interface (0s) -------------------------------------------------
------------------------------------------------- Wait for agent interface (30s) --------------------------------------------------
------------------------------------------------- Wait for backend interface (0s) -------------------------------------------------

tests/ffe/test_dynamic_evaluation.py ..                                                                                     [  6%]
tests/ffe/test_exposures.py ...........                                                                                     [ 40%]
tests/ffe/test_flag_eval_metrics.py .................                                                                       [ 93%]
tests/schemas/test_schemas.py ..                                                                                            [100%]

========================================= 32 passed, 2422 deselected in 221.51s (0:03:41) =========================================

Environment variables:

DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=true  # Enable FFE
DD_METRICS_OTEL_ENABLED=true                     # Enable OTel metrics (required for flag eval metrics)

@sameerank sameerank added the AI Generated label on Apr 16, 2026
@github-actions

github-actions Bot commented Apr 16, 2026

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 2 partially typed methods, and clears 1 partially typed method. It increases the percentage of typed methods from 61.38% to 61.64% (+0.26%).

Partially typed methods (+2-1)Introduced:
sig/datadog/open_feature/hooks/flag_eval_hook.rbs:13
└── def finally: (
          hook_context: ::OpenFeature::SDK::Hooks::HookContext,
          evaluation_details: ::OpenFeature::SDK::EvaluationDetails,
          **untyped _opts
        ) -> void
sig/datadog/open_feature/provider.rbs:46
└── def fetch_object_value: (
        flag_key: ::String,
        default_value: ::Array[untyped] | ::Hash[untyped, untyped],
        ?evaluation_context: ::OpenFeature::SDK::EvaluationContext?
      ) -> ::OpenFeature::SDK::Provider::ResolutionDetails
Cleared:
sig/datadog/open_feature/provider.rbs:44
└── def fetch_object_value: (
        flag_key: ::String,
        default_value: ::Array[untyped] | ::Hash[untyped, untyped],
        ?evaluation_context: ::OpenFeature::SDK::EvaluationContext?
      ) -> ::OpenFeature::SDK::Provider::ResolutionDetails

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept on the line before the definition to remove it from the stats.

@datadog-prod-us1-6

datadog-prod-us1-6 Bot commented Apr 16, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 96.38%
Overall Coverage: 97.20% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: e269408 | Docs | Datadog PR Page | Give us feedback!

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 2 times, most recently from 8a1f290 to 46c5c56 on April 16, 2026 at 17:47
@pr-commenter

pr-commenter Bot commented Apr 16, 2026

Benchmarks

Benchmark execution time: 2026-04-30 21:20:50

Comparing candidate commit e269408 in PR branch sameerank/FFL-1945/add-flag-eval-metrics with baseline commit 56a5be1 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, with 1 unstable metric.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
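
The decision rule illustrated above (a change is significant only when the entire CI clears the threshold) can be sketched as follows. The 1% threshold here is an example value, not the platform's configured SIGNIFICANT_IMPACT_THRESHOLD:

```ruby
# Classify a confidence interval over the relative difference of means
# (candidate vs. baseline) for an execution-time metric. The change is
# significant only when the entire interval lies outside the threshold
# band; an interval that overlaps the band is reported as "same".
def classify(ci_lower, ci_upper, threshold: 0.01)
  if ci_lower > threshold
    :worse   # whole CI above +threshold: significantly slower
  elsif ci_upper < -threshold
    :better  # whole CI below -threshold: significantly faster
  else
    :same    # CI overlaps the threshold band
  end
end

classify(0.013, 0.031)  # => :worse (second diagram above: CI 1.3%..3.1%)
classify(-0.006, 0.012) # => :same  (first diagram above: CI -0.6%..+1.2%)
```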

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 9 times, most recently from a0d96ae to dca0c3d on April 21, 2026 at 07:17
@github-actions github-actions Bot added the core label on Apr 21, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from dca0c3d to 576c7e4 on April 21, 2026 at 08:35
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 8 times, most recently from 58fbef9 to ae9c63f on April 22, 2026 at 17:22
Comment thread lib/datadog/open_feature/hooks/flag_eval_hook.rb Outdated
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from ae9c63f to 60aa06e on April 22, 2026 at 17:31
@sameerank sameerank added the otel label on Apr 22, 2026
Cache the DD_METRICS_OTEL_ENABLED config value once at initialization
rather than checking it on every record() call.
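
The change described by this commit can be sketched roughly as follows; the class and method names are illustrative, not the PR's exact code:

```ruby
# Read DD_METRICS_OTEL_ENABLED once when the metrics object is constructed,
# so the per-evaluation hot path only checks a cached boolean instead of
# performing an ENV lookup on every record() call.
class FlagEvalMetrics
  def initialize(env = ENV)
    # Cached at initialization; a process restart is needed to pick up a
    # change, which is the usual contract for DD_* configuration.
    @enabled = env['DD_METRICS_OTEL_ENABLED'] == 'true'
  end

  def record(flag_key)
    return unless @enabled # cheap boolean check on the hot path

    # ... increment the feature_flag.evaluations counter here ...
    true
  end
end

metrics = FlagEvalMetrics.new('DD_METRICS_OTEL_ENABLED' => 'true')
metrics.record('my-flag') # => true
```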
@sameerank sameerank requested review from Strech, vpellan and y9v April 29, 2026 05:35
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/component.rb Outdated
Comment thread lib/datadog/open_feature/component.rb Outdated
Comment thread lib/datadog/open_feature/hooks/flag_eval_hook.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Contributor

@vpellan vpellan left a comment


LGTM after applying the suggestions from other reviewers

Comment thread lib/datadog/open_feature/provider.rb Outdated
Comment thread lib/datadog/open_feature/provider.rb
Comment on lines +29 to +30
'PROVIDER_FATAL' => DEFAULT_ERROR_TYPE,
'UNKNOWN_TYPE' => DEFAULT_ERROR_TYPE,
Member


minor: is there any reason we don't handle these two?

Contributor Author


This map serves as an allowlist of known error codes and lowercases them per the spec:

The error.type and feature_flag.result.reason enumerations use a lowercase 'snake_case' convention (see OpenTelemetry feature-flag event records).

PROVIDER_FATAL is actually among the known error codes, so it can be handled. Mapping UNKNOWN_TYPE to 'general' makes sense because it's not from the OpenFeature spec and acts as a catch-all. For comparison, in dd-trace-go we could lowercase the error code directly (no map required) because we used the OpenFeature error codes directly.

0126525

Member


oh, makes sense. I confused reason and error code (UNKNOWN is a pre-defined reason but not error code).

Shall we remove it from the map here as it only causes confusion? (We still have defaulting to DEFAULT_ERROR_TYPE in normalize_error_type.)

Contributor Author


Sure, I'm open to that change: #5658

PROVIDER_FATAL is a standard OpenFeature error code and should map to
its lowercase form 'provider_fatal' per the telemetry spec, not 'general'.

Cross-SDK comparison:
- Go: strings.ToLower() → 'provider_fatal'
- Python: .lower() → 'provider_fatal'
- .NET: explicit switch → 'provider_fatal'
- Ruby (before): ERROR_TYPE_MAP → 'general' (incorrect)
- Ruby (after): ERROR_TYPE_MAP → 'provider_fatal' (correct)

UNKNOWN_TYPE remains mapped to 'general' since it's a Datadog-specific
error code (not in OpenFeature spec) used for unknown flag types.
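
A minimal sketch of the mapping discussed in this thread, with the PROVIDER_FATAL fix applied. The constant and method names are illustrative rather than the PR's exact identifiers; the keys are the standard OpenFeature error codes plus Datadog's UNKNOWN_TYPE catch-all:

```ruby
# Allowlist of known error codes, lowercased per the OTel feature-flag
# semantic conventions, with 'general' as the catch-all default.
DEFAULT_ERROR_TYPE = 'general'

ERROR_TYPE_MAP = {
  'FLAG_NOT_FOUND'        => 'flag_not_found',
  'TYPE_MISMATCH'         => 'type_mismatch',
  'PARSE_ERROR'           => 'parse_error',
  'TARGETING_KEY_MISSING' => 'targeting_key_missing',
  'INVALID_CONTEXT'       => 'invalid_context',
  'PROVIDER_NOT_READY'    => 'provider_not_ready',
  'PROVIDER_FATAL'        => 'provider_fatal', # the fix: no longer 'general'
  'GENERAL'               => 'general',
  # Datadog-specific catch-all, not part of the OpenFeature spec:
  'UNKNOWN_TYPE'          => DEFAULT_ERROR_TYPE,
}.freeze

def normalize_error_type(code)
  ERROR_TYPE_MAP.fetch(code.to_s, DEFAULT_ERROR_TYPE)
end

normalize_error_type('PROVIDER_FATAL') # => "provider_fatal"
normalize_error_type('SOMETHING_NEW')  # => "general"
```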
@sameerank sameerank merged commit 6cf558f into master Apr 30, 2026
598 of 600 checks passed
@sameerank sameerank deleted the sameerank/FFL-1945/add-flag-eval-metrics branch April 30, 2026 21:27
@dd-octo-sts dd-octo-sts Bot added this to the 2.32.0 milestone Apr 30, 2026
gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/libdatadog that referenced this pull request May 1, 2026
# What does this PR do?

Add `__dd_allocation_key` metadata to `ResolutionDetails`.

# Motivation

Follow-up from DataDog/dd-trace-rb#5599 (comment)

# Additional Notes

# How to test the change?



Co-authored-by: oleksii.shmalko <oleksii.shmalko@datadoghq.com>

Labels

  • AI Generated: Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos
  • core: Involves Datadog core libraries
  • openfeature: A new component that provides the ability to configure feature flags
  • otel: OpenTelemetry-related changes


5 participants