feat: RFC Appendix B evaluation-reason system tests for Java FFE [java@typo/full-spec-evaluator] by typotter · Pull Request #6793 · DataDog/system-tests

typotter · 2026-04-22T13:56:26Z

Motivation

RFC Appendix B defines 26 test cases covering all valid (return value type, reason, error code) combinations for Feature Flag Evaluation. This adds system tests validating that SDK implementations emit the correct evaluation reason and error details in span tags — work that was previously untested at this level.

Changes

tests/ffe/test_flag_eval_reasons.py (new, 1466 lines)

- Covers REASON-2 through REASON-26 from RFC Appendix B
- REASON-1, 3 (missing flag), 4, 5, 6, 12, and 15 are already covered in test_flag_eval_metrics.py (cross-referenced in the file header)
- 8 tests pass against Java ≥ v1.61.0
- 12 tests are marked missing_feature pending Java RFC compliance gaps (see the manifest comment block for details)
- Uses REASON-## naming throughout (self-documenting, unlike the opaque B-## references)

manifests/java.yml

- Replaces a file-level missing_feature entry (which defeated all per-test version pins due to additive manifest semantics) with 20 individual per-test entries
- Fixes stale class names carried over from pre-rename state
- Upgrades several test_flag_eval_metrics.py entries from bug (FFL-1972) → v1.61.0 where the underlying Java issues are now resolved
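
To illustrate why a file-level entry can defeat per-test pins, here is a toy model. It assumes that "additive" means every manifest key matching a test contributes a declaration, and that a single missing_feature among them deactivates the test. This is not the actual system-tests manifest loader; the function names and the Test_Example_A / Test_Example_B class names are hypothetical, and version comparison is left out entirely.

```python
MISSING_FEATURE = "missing_feature"


def matching_declarations(manifest: dict[str, str], node_id: str) -> list[str]:
    """Collect declarations keyed by the node id itself or by any enclosing file/class."""
    return [
        declaration
        for key, declaration in manifest.items()
        if node_id == key or node_id.startswith(key + "::")
    ]


def is_active(manifest: dict[str, str], node_id: str) -> bool:
    """Toy rule (assumption): one missing_feature among the matches deactivates the test."""
    declarations = matching_declarations(manifest, node_id)
    return bool(declarations) and MISSING_FEATURE not in declarations


FILE = "tests/ffe/test_flag_eval_reasons.py"

# Before: the file-level entry also matches every test in the file.
before = {
    FILE: MISSING_FEATURE,
    f"{FILE}::Test_Example_A": "v1.61.0",
}

# After: only per-test entries, so each test is governed by its own declaration.
after = {
    f"{FILE}::Test_Example_A": "v1.61.0",
    f"{FILE}::Test_Example_B": MISSING_FEATURE,
}
```

In this model, Test_Example_A is inactive under `before` despite its version pin, and becomes active under `after` while Test_Example_B stays deactivated.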

Workflow

⚠️ Create your PR as draft ⚠️
Work on your PR until the CI passes
Mark it as ready for review
- Test logic is modified? -> Get a review from RFC owner.
- Framework is modified, or non-obvious usage of it -> Get a review from R&P team

🚀 Once your PR is reviewed and the CI green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

Anything but tests/ or manifests/ is modified? I have the approval from R&P team
A docker base image is modified?
- the relevant build-XXX-image label is present
A scenario is added, removed or renamed?
- Get a review from R&P team

…va FFE Adds test_flag_eval_reasons.py covering REASON-2 through REASON-26 from RFC Appendix B (evaluation reason + error code correctness). 8 tests pass against Java ≥ v1.61.0; 12 are marked missing_feature pending Java RFC compliance work. Updates java.yml: replaces file-level missing_feature (which defeated per-test version pins) with 20 individual entries. Fixes stale class names from pre-rename. Upgrades several test_flag_eval_metrics.py entries from bug→v1.61.0 now that the underlying Java issues are resolved.

github-actions · 2026-04-22T13:56:57Z

CODEOWNERS have been resolved as:

ffe-reason-ufc-examples.md                                              @DataDog/system-tests-core
tests/ffe/test_flag_eval_reasons.py                                     @DataDog/feature-flagging-and-experimentation-sdk @DataDog/system-tests-core
manifests/java.yml                                                      @DataDog/asm-java @DataDog/apm-java

Fixes in dd-trace-java master (PRs #11036, #11037, #11071) resolve the FFL-1972 bugs. Use v1.62.0-SNAPSHOT so these tests run as active (not xfail) in CI against Java master. Also removes the file-level missing_feature entry and fixes two YAML explicit-key entries for the parse-error tests.

- Add feature_flag.key tag assertion to all 20 test methods (H-1)
- Add variant=on assertion to REASON-11 test (H-3)
- Move rc.tracer_rc_state.reset().apply() inside try block in REASON-2 setup so it is covered by the finally cleanup (H-4)
- Wrap REASON-2 setup body in try/finally for exception-safe mock restoration (M-1); wrap finally cleanup in try/except so cleanup failures do not mask the primary exception (M-2)
- Fix ruff formatting on long assert in REASON-22

'No matching split' means the allocation has splits with shards but the targeting key falls outside every shard range — allocation skipped → DEFAULT. The empty splits:[] form is an unexpected structural edge case covered by a separate spec point. Fixture: salt 'b11-static-no-match' + range [3921,10000) guarantees user-1 (shard 3920) deterministically misses. Manifest reset to missing_feature pending a verified test run against Java master.

"No split" means no matching split — the split entry exists but the subject's hash lands outside all shard ranges. Allocation skipped, waterfall exhausted → coded default / DEFAULT. Distinct from REASON-12 (vacuous split → STATIC) and from the empty-splits-array edge case (separate spec point). Rename fixture to make_shard_miss_static_fixture, class to Test_FFE_REASON_11_StaticNoMatchingSplit. Update manifest and UFC examples doc.
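
The shard-miss case can be sketched as a half-open range check. This is a minimal sketch, not the SDK's evaluator: the shard value 3920 for user-1 and the range [3921, 10000) are taken from the commit message above rather than recomputed, the hash that produces the shard is not modeled, and the function names and the "coded-default" value are hypothetical.

```python
from typing import Optional


def shard_in_ranges(shard: int, ranges: list[tuple[int, int]]) -> bool:
    """True if the shard falls in any half-open [start, end) range."""
    return any(start <= shard < end for start, end in ranges)


def evaluate(
    shard: int,
    ranges: list[tuple[int, int]],
    variant: str,
    coded_default: str,
) -> tuple[str, Optional[str]]:
    """Return (value, reason) for a flag with a single allocation and a single split.

    On a shard miss the allocation is skipped, the waterfall is exhausted, and the
    caller's coded default is returned with reason DEFAULT. The reason for the
    matched case is outside the scope of this sketch, so None is returned there.
    """
    if shard_in_ranges(shard, ranges):
        return variant, None
    return coded_default, "DEFAULT"


# Values from the commit message: user-1 lands on shard 3920, one below the range start.
USER_1_SHARD = 3920
RANGES = [(3921, 10_000)]

value, reason = evaluate(USER_1_SHARD, RANGES, variant="on", coded_default="coded-default")
```

Because the range starts at 3921 and the start bound is inclusive, shard 3920 is the closest possible miss, which is what makes the fixture deterministic for that targeting key.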

typotter added 2 commits April 22, 2026 00:28

feat: new ffe tests for reason/evaluation details correctness

88ad284

typotter changed the title ~~feat: RFC Appendix B evaluation-reason system tests for Java FFE~~ feat: RFC Appendix B evaluation-reason system tests for Java FFE [java@master] Apr 22, 2026

typotter added 11 commits April 22, 2026 16:04

fix(ffe): sort REASON manifest entries in alphabetical key order

011c174

Manifest validator requires string-sorted keys. REASON_10 < REASON_2_ etc. because '0' (48) < '_' (95) in ASCII. Reorders all 20 REASON entries.
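
A short sketch of the ordering the validator expects, assuming it compares keys as plain strings by code point. The class-name suffixes are placeholders, not the real manifest keys.

```python
# Placeholder keys; only the REASON_<n>_ prefix matters for the ordering.
keys = [
    "Test_FFE_REASON_2_Example",
    "Test_FFE_REASON_10_Example",
    "Test_FFE_REASON_11_Example",
    "Test_FFE_REASON_20_Example",
]

# Plain string sort: characters are compared one at a time by code point.
string_sorted = sorted(keys)


def is_string_sorted(items: list[str]) -> bool:
    """True if the items are already in plain string order."""
    return items == sorted(items)


# '0' is 48 and '_' is 95, so "REASON_20" sorts before "REASON_2_".
assert ord("0") == 48 and ord("_") == 95
assert string_sorted == [
    "Test_FFE_REASON_10_Example",
    "Test_FFE_REASON_11_Example",
    "Test_FFE_REASON_20_Example",
    "Test_FFE_REASON_2_Example",
]
```

Numeric order (2, 10, 11, 20) therefore fails the check, which is why the entries had to be reordered.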

fix(lint): remove unused noqa directive (BLE001 not enabled in ruff c…

e6e46ab

…onfig)

fix(lint): use contextlib.suppress for cleanup; activate all reason t…

5409ee6

…ests

- Replace try/except/pass with contextlib.suppress (SIM105, S110)
- Activate all 12 remaining missing_feature reason tests at v1.62.0-SNAPSHOT
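
The cleanup pattern can be sketched roughly as follows. FakeRemoteConfig and run_setup are hypothetical stand-ins, not the helpers used in test_flag_eval_reasons.py; the point is only that cleanup in a finally block, wrapped in contextlib.suppress, lets the primary exception propagate even when cleanup itself fails.

```python
import contextlib


class FakeRemoteConfig:
    """Hypothetical stand-in for the mocked remote-config state driven by a test setup."""

    def __init__(self) -> None:
        self.applied = False
        self.restore_attempted = False

    def apply(self) -> None:
        self.applied = True

    def restore(self) -> None:
        self.restore_attempted = True
        # Simulate a cleanup failure to show that it does not mask the primary error.
        raise RuntimeError("cleanup failed")


def run_setup(rc: FakeRemoteConfig, body) -> None:
    """Apply config and run the setup body; always attempt restoration afterwards."""
    try:
        rc.apply()  # inside the try block so the finally clause covers it
        body()
    finally:
        # Equivalent to try/except/pass, but in the form ruff's SIM105 asks for.
        with contextlib.suppress(Exception):
            rc.restore()


def failing_body() -> None:
    raise ValueError("primary failure")


rc = FakeRemoteConfig()
caught = None
try:
    run_setup(rc, failing_body)
except Exception as exc:  # capture whichever exception escapes
    caught = exc
```

Here `caught` is the ValueError from the setup body, not the RuntimeError raised during restoration, and restoration was still attempted.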

chore(ffe): update reason test manifest from Java 1.62.0-SNAPSHOT run

77941d1

REASON-11 (StaticNoSplit) passes at v1.62.0-SNAPSHOT. 11 tests remain missing_feature with failure-mode comments.

fix(ffe): use RFC canonical splits:[] in REASON-11 fixture

4fc6b95

Surface SDKs that fail to parse or evaluate an empty splits array. Previously used vacuous-split form which masked this class of bug.

fix(ffe): REASON-11 splits:[] → allocation skipped → DEFAULT

7aeae2c

An allocation with splits:[] cannot produce a variant. Correct expected result is coded default / DEFAULT (not STATIC). Update test assertions, fixture docstring, and UFC examples doc accordingly.

chore(ffe): update manifest comment for REASON-11 semantics

0ec9c6d

chore(ffe): activate REASON-11 at v1.62.0-SNAPSHOT

1e776e5

Shard-miss fixture (no matching split → DEFAULT) verified passing against Java 1.62.0-SNAPSHOT. 9 REASON tests now active.

typotter changed the title ~~feat: RFC Appendix B evaluation-reason system tests for Java FFE [java@master]~~ feat: RFC Appendix B evaluation-reason system tests for Java FFE [java@typo/full-spec-evaluator] Apr 24, 2026

typotter wants to merge 14 commits into main from typo/ffe-reason-tests
