feat: RFC Appendix B evaluation-reason system tests for Java FFE [java@typo/full-spec-evaluator]#6793
Draft
feat: RFC Appendix B evaluation-reason system tests for Java FFE [java@typo/full-spec-evaluator]#6793
Conversation
…va FFE Adds test_flag_eval_reasons.py covering REASON-2 through REASON-26 from RFC Appendix B (evaluation reason + error code correctness). 8 tests pass against Java ≥ v1.61.0; 12 are marked missing_feature pending Java RFC compliance work. Updates java.yml: replaces file-level missing_feature (which defeated per-test version pins) with 20 individual entries. Fixes stale class names from pre-rename. Upgrades several test_flag_eval_metrics.py entries from bug→v1.61.0 now that the underlying Java issues are resolved.
Contributor
|
|
Fixes in dd-trace-java master (PRs #11036, #11037, #11071) resolve the FFL-1972 bugs. Use v1.62.0-SNAPSHOT so these tests run as active (not xfail) in CI against Java master. Also removes the file-level missing_feature entry and fixes two YAML explicit-key entries for the parse-error tests.
Manifest validator requires string-sorted keys. REASON_10 < REASON_2_ etc. because '0' (48) < '_' (95) in ASCII. Reorders all 20 REASON entries.
- Add feature_flag.key tag assertion to all 20 test methods (H-1) - Add variant=on assertion to REASON-11 test (H-3) - Move rc.tracer_rc_state.reset().apply() inside try block in REASON-2 setup so it is covered by the finally cleanup (H-4) - Wrap REASON-2 setup body in try/finally for exception-safe mock restoration (M-1); wrap finally cleanup in try/except so cleanup failures do not mask the primary exception (M-2) - Fix ruff formatting on long assert in REASON-22
…ests - Replace try/except/pass with contextlib.suppress (SIM105, S110) - Activate all 12 remaining missing_feature reason tests at v1.62.0-SNAPSHOT
REASON-11 (StaticNoSplit) passes at v1.62.0-SNAPSHOT. 11 tests remain missing_feature with failure-mode comments.
Surface SDKs that fail to parse or evaluate an empty splits array. Previously used vacuous-split form which masked this class of bug.
An allocation with splits:[] cannot produce a variant. Correct expected result is coded default / DEFAULT (not STATIC). Update test assertions, fixture docstring, and UFC examples doc accordingly.
'No matching split' means the allocation has splits with shards but the targeting key falls outside every shard range — allocation skipped → DEFAULT. The empty splits:[] form is an unexpected structural edge case covered by a separate spec point. Fixture: salt 'b11-static-no-match' + range [3921,10000) guarantees user-1 (shard 3920) deterministically misses. Manifest reset to missing_feature pending a verified test run against Java master.
"No split" means no matching split — the split entry exists but the subject's hash lands outside all shard ranges. Allocation skipped, waterfall exhausted → coded default / DEFAULT. Distinct from REASON-12 (vacuous split → STATIC) and from the empty-splits-array edge case (separate spec point). Rename fixture to make_shard_miss_static_fixture, class to Test_FFE_REASON_11_StaticNoMatchingSplit. Update manifest and UFC examples doc.
Shard-miss fixture (no matching split → DEFAULT) verified passing against Java 1.62.0-SNAPSHOT. 9 REASON tests now active.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
RFC Appendix B defines 26 test cases covering all valid (return value type, reason, error code) combinations for Feature Flag Evaluation. This adds system tests validating that SDK implementations emit the correct evaluation reason and error details in span tags — work that was previously untested at this level.
Changes
tests/ffe/test_flag_eval_reasons.py(new, 1466 lines)test_flag_eval_metrics.py(cross-referenced in the file header)missing_featurepending Java RFC compliance gaps (see manifest comment block for details)REASON-##naming throughout (self-documenting vs opaqueB-##references)manifests/java.ymlmissing_featureentry (which defeated all per-test version pins due to additive manifest semantics) with 20 individual per-test entriestest_flag_eval_metrics.pyentries frombug (FFL-1972)→v1.61.0where the underlying Java issues are now resolvedWorkflow
🚀 Once your PR is reviewed and the CI green, you can merge it!
🛟 #apm-shared-testing 🛟
Reviewer checklist
tests/ormanifests/is modified ? I have the approval from R&P teambuild-XXX-imagelabel is present