
[FFL-1945] Add flag evaluation metrics via OpenFeature hook#5599

Merged
sameerank merged 32 commits into master from sameerank/FFL-1945/add-flag-eval-metrics
Apr 30, 2026
Conversation

@sameerank
Contributor

@sameerank sameerank commented Apr 16, 2026

What does this PR do?
Adds a `feature_flag.evaluations` counter metric via OpenTelemetry to track feature flag evaluations.

Motivation:
Enable observability of feature flag usage through standardized OTel metrics.

Change log entry
Yes. Add flag evaluation metrics (feature_flag.evaluations) via OpenTelemetry for OpenFeature provider.

Additional Notes:

  • Uses OpenFeature hooks pattern (consistent with Python/Go SDKs)
  • Metric attributes: feature_flag.key, feature_flag.result.variant, feature_flag.result.reason
  • Conditional attributes: feature_flag.result.allocation_key, error.type
  • Gracefully degrades when OTel SDK is unavailable
  • Requires DD_METRICS_OTEL_ENABLED=true to enable metrics (consistent with the other dd-trace-* SDKs)
  • Requires openfeature-sdk >= 0.5.1 for provider hooks support
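
As a rough illustration of the hooks pattern and attribute set described above, here is a minimal, self-contained Ruby sketch. The attribute names follow the list above and the `finally` signature mirrors the RBS shown later in this thread, but the class names and the stub counter are hypothetical; the real implementation obtains a counter instrument from the OpenTelemetry Meter API.

```ruby
# Stand-in for an OTel counter instrument (hypothetical; the real code
# would obtain one from OpenTelemetry's meter provider).
class StubCounter
  attr_reader :recorded

  def initialize
    @recorded = []
  end

  def add(value, attributes: {})
    @recorded << [value, attributes]
  end
end

# Hook that records one count per flag evaluation.
class FlagEvalHook
  def initialize(counter, enabled: true)
    # Enabled state is cached once at construction (DD_METRICS_OTEL_ENABLED
    # in the real SDK) rather than re-checked on every evaluation.
    @counter = counter
    @enabled = enabled
  end

  # Called after every evaluation, success or failure.
  def finally(hook_context:, evaluation_details:, **_opts)
    return unless @enabled

    attributes = {
      'feature_flag.key' => hook_context[:flag_key],
      'feature_flag.result.variant' => evaluation_details[:variant],
      'feature_flag.result.reason' => evaluation_details[:reason],
    }
    # Conditional attribute: only present on errors.
    attributes['error.type'] = evaluation_details[:error_type] if evaluation_details[:error_type]

    @counter.add(1, attributes: attributes)
  end
end

counter = StubCounter.new
hook = FlagEvalHook.new(counter)
hook.finally(
  hook_context: { flag_key: 'new-checkout' },
  evaluation_details: { variant: 'on', reason: 'TARGETING_MATCH' }
)
counter.recorded.length # => 1
```

Passing `enabled: false` (or omitting the OTel SDK entirely in the real code) turns `finally` into a cheap no-op, which is the graceful degradation mentioned above.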

How to test the change?

% EXTRA_DOCKER_ARGS="--no-cache" ./build.sh ruby --weblog-variant rails72
# Debug logs match commit hash: DEBUG -- datadog: [datadog] (/usr/local/bundle/bundler/gems/dd-trace-rb-60aa06ee27f8/

% TEST_LIBRARY=ruby WEBLOG_VARIANT=rails72 ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION
========================================================== test context ===========================================================
Scenario: FEATURE_FLAGGING_AND_EXPERIMENTATION
Logs folder: ./logs_feature_flagging_and_experimentation
Starting containers...
Agent: 7.77.2
Backend: datad0g.com
Library: ruby@2.32.0-dev
Weblog variant: rails72
Weblog system: Linux weblog 6.12.76-linuxkit #1 SMP Fri Mar  6 10:10:19 UTC 2026 aarch64 GNU/Linux

======================================================= test session starts =======================================================
collected 2454 items / 2422 deselected / 32 selected                                                                              
----------------------------------------------------------- tests setup -----------------------------------------------------------

tests/ffe/test_dynamic_evaluation.py ..
tests/ffe/test_exposures.py ...........
tests/ffe/test_flag_eval_metrics.py .................

------------------------------------------------- Wait for library interface (0s) -------------------------------------------------
------------------------------------------------- Wait for agent interface (30s) --------------------------------------------------
------------------------------------------------- Wait for backend interface (0s) -------------------------------------------------

tests/ffe/test_dynamic_evaluation.py ..                                                                                     [  6%]
tests/ffe/test_exposures.py ...........                                                                                     [ 40%]
tests/ffe/test_flag_eval_metrics.py .................                                                                       [ 93%]
tests/schemas/test_schemas.py ..                                                                                            [100%]

========================================= 32 passed, 2422 deselected in 221.51s (0:03:41) =========================================

Environment variables:

DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=true  # Enable FFE
DD_METRICS_OTEL_ENABLED=true                     # Enable OTel metrics (required for flag eval metrics)

@sameerank sameerank added the AI Generated label on Apr 16, 2026
@github-actions

github-actions Bot commented Apr 16, 2026

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 2 partially typed methods, and clears 1 partially typed method. It increases the percentage of typed methods from 61.38% to 61.64% (+0.26%).

Partially typed methods (+2-1)Introduced:
sig/datadog/open_feature/hooks/flag_eval_hook.rbs:13
└── def finally: (
          hook_context: ::OpenFeature::SDK::Hooks::HookContext,
          evaluation_details: ::OpenFeature::SDK::EvaluationDetails,
          **untyped _opts
        ) -> void
sig/datadog/open_feature/provider.rbs:46
└── def fetch_object_value: (
        flag_key: ::String,
        default_value: ::Array[untyped] | ::Hash[untyped, untyped],
        ?evaluation_context: ::OpenFeature::SDK::EvaluationContext?
      ) -> ::OpenFeature::SDK::Provider::ResolutionDetails
Cleared:
sig/datadog/open_feature/provider.rbs:44
└── def fetch_object_value: (
        flag_key: ::String,
        default_value: ::Array[untyped] | ::Hash[untyped, untyped],
        ?evaluation_context: ::OpenFeature::SDK::EvaluationContext?
      ) -> ::OpenFeature::SDK::Provider::ResolutionDetails

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept on the line before the definition to remove it from the stats.

@datadog-prod-us1-6

datadog-prod-us1-6 Bot commented Apr 16, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 96.38%
Overall Coverage: 97.20% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: e269408 | Docs | Datadog PR Page | Give us feedback!

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 2 times, most recently from 8a1f290 to 46c5c56 on April 16, 2026 at 17:47
@pr-commenter

pr-commenter Bot commented Apr 16, 2026

Benchmarks

Benchmark execution time: 2026-04-30 21:20:50

Comparing candidate commit e269408 in PR branch sameerank/FFL-1945/add-flag-eval-metrics with baseline commit 56a5be1 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, with 1 unstable metric.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
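
The decision rule illustrated above (a change is significant only when the entire CI clears the threshold) can be sketched as follows. The 1% threshold here is an example value, not the platform's configured SIGNIFICANT_IMPACT_THRESHOLD:

```ruby
# Classify a confidence interval over the relative difference of means
# (candidate vs. baseline) for an execution-time metric. The change is
# significant only when the entire interval lies outside the threshold
# band; an interval that overlaps the band is reported as "same".
def classify(ci_lower, ci_upper, threshold: 0.01)
  if ci_lower > threshold
    :worse   # whole CI above +threshold: significantly slower
  elsif ci_upper < -threshold
    :better  # whole CI below -threshold: significantly faster
  else
    :same    # CI overlaps the threshold band
  end
end

classify(0.013, 0.031)  # => :worse (second diagram above: CI 1.3%..3.1%)
classify(-0.006, 0.012) # => :same  (first diagram above: CI -0.6%..+1.2%)
```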

@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 9 times, most recently from a0d96ae to dca0c3d on April 21, 2026 at 07:17
@github-actions github-actions Bot added the core label on Apr 21, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from dca0c3d to 576c7e4 on April 21, 2026 at 08:35
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch 8 times, most recently from 58fbef9 to ae9c63f on April 22, 2026 at 17:22
Comment thread lib/datadog/open_feature/hooks/flag_eval_hook.rb Outdated
@sameerank sameerank force-pushed the sameerank/FFL-1945/add-flag-eval-metrics branch from ae9c63f to 60aa06e on April 22, 2026 at 17:31
@sameerank sameerank added the otel label on Apr 22, 2026
Cache the DD_METRICS_OTEL_ENABLED config value once at initialization
rather than checking it on every record() call.
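
The change described by this commit can be sketched roughly as follows; the class and method names are illustrative, not the PR's exact code:

```ruby
# Read DD_METRICS_OTEL_ENABLED once when the metrics object is constructed,
# so the per-evaluation hot path only checks a cached boolean instead of
# performing an ENV lookup on every record() call.
class FlagEvalMetrics
  def initialize(env = ENV)
    # Cached at initialization; a process restart is needed to pick up a
    # change, which is the usual contract for DD_* configuration.
    @enabled = env['DD_METRICS_OTEL_ENABLED'] == 'true'
  end

  def record(flag_key)
    return unless @enabled # cheap boolean check on the hot path

    # ... increment the feature_flag.evaluations counter here ...
    true
  end
end

metrics = FlagEvalMetrics.new('DD_METRICS_OTEL_ENABLED' => 'true')
metrics.record('my-flag') # => true
```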
@sameerank sameerank requested review from Strech, vpellan and y9v April 29, 2026 05:35
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/component.rb Outdated
Comment thread lib/datadog/open_feature/component.rb Outdated
Comment thread lib/datadog/open_feature/hooks/flag_eval_hook.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Comment thread lib/datadog/open_feature/metrics/flag_eval_metrics.rb Outdated
Contributor

@vpellan vpellan left a comment


LGTM after applying the suggestions from other reviewers

Comment thread lib/datadog/open_feature/provider.rb Outdated
Comment thread lib/datadog/open_feature/provider.rb
Comment on lines +29 to +30
'PROVIDER_FATAL' => DEFAULT_ERROR_TYPE,
'UNKNOWN_TYPE' => DEFAULT_ERROR_TYPE,
Member


minor: is there any reason we don't handle these two?

Contributor Author


This map serves as an allowlist of known error codes and lowercases them per the spec:

The error.type and feature_flag.result.reason enumerations use a lowercase 'snake_case' convention (see OpenTelemetry feature-flag event records).

PROVIDER_FATAL is actually among the known error codes, so it can be handled. Mapping UNKNOWN_TYPE to 'general' makes sense because it's not from the OpenFeature spec and acts as a catch-all. For comparison, in dd-trace-go we could lowercase the error code directly (no map required) because we used the OpenFeature error codes directly.

0126525

Member


oh, makes sense. I confused reason and error code (UNKNOWN is a pre-defined reason but not error code).

Shall we remove it from the map here as it only causes confusion? (We still have defaulting to DEFAULT_ERROR_TYPE in normalize_error_type.)

Contributor Author


Sure, I'm open to that change: #5658

PROVIDER_FATAL is a standard OpenFeature error code and should map to
its lowercase form 'provider_fatal' per the telemetry spec, not 'general'.

Cross-SDK comparison:
- Go: strings.ToLower() → 'provider_fatal'
- Python: .lower() → 'provider_fatal'
- .NET: explicit switch → 'provider_fatal'
- Ruby (before): ERROR_TYPE_MAP → 'general' (incorrect)
- Ruby (after): ERROR_TYPE_MAP → 'provider_fatal' (correct)

UNKNOWN_TYPE remains mapped to 'general' since it's a Datadog-specific
error code (not in OpenFeature spec) used for unknown flag types.
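
A minimal sketch of the mapping discussed in this thread, with the PROVIDER_FATAL fix applied. The constant and method names are illustrative rather than the PR's exact identifiers; the keys are the standard OpenFeature error codes plus Datadog's UNKNOWN_TYPE catch-all:

```ruby
# Allowlist of known error codes, lowercased per the OTel feature-flag
# semantic conventions, with 'general' as the catch-all default.
DEFAULT_ERROR_TYPE = 'general'

ERROR_TYPE_MAP = {
  'FLAG_NOT_FOUND'        => 'flag_not_found',
  'TYPE_MISMATCH'         => 'type_mismatch',
  'PARSE_ERROR'           => 'parse_error',
  'TARGETING_KEY_MISSING' => 'targeting_key_missing',
  'INVALID_CONTEXT'       => 'invalid_context',
  'PROVIDER_NOT_READY'    => 'provider_not_ready',
  'PROVIDER_FATAL'        => 'provider_fatal', # the fix: no longer 'general'
  'GENERAL'               => 'general',
  # Datadog-specific catch-all, not part of the OpenFeature spec:
  'UNKNOWN_TYPE'          => DEFAULT_ERROR_TYPE,
}.freeze

def normalize_error_type(code)
  ERROR_TYPE_MAP.fetch(code.to_s, DEFAULT_ERROR_TYPE)
end

normalize_error_type('PROVIDER_FATAL') # => "provider_fatal"
normalize_error_type('SOMETHING_NEW')  # => "general"
```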
@sameerank sameerank merged commit 6cf558f into master Apr 30, 2026
598 of 600 checks passed
@sameerank sameerank deleted the sameerank/FFL-1945/add-flag-eval-metrics branch April 30, 2026 21:27
@dd-octo-sts dd-octo-sts Bot added this to the 2.32.0 milestone Apr 30, 2026
gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/libdatadog that referenced this pull request May 1, 2026
# What does this PR do?

Add `__dd_allocation_key` metadata to `ResolutionDetails`.

# Motivation

Follow-up from DataDog/dd-trace-rb#5599 (comment)

# Additional Notes

# How to test the change?



Co-authored-by: oleksii.shmalko <oleksii.shmalko@datadoghq.com>

Labels

  • AI Generated: Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos
  • core: Involves Datadog core libraries
  • openfeature: A new component that provides the ability to configure feature flags
  • otel: OpenTelemetry-related changes


5 participants