Skip to content

Fix/tree perf/subtree traversal optimization#589

Merged
brieflynn merged 9 commits intomainfrom
fix/TreePerf/subtree-traversal-optimization
Apr 10, 2026
Merged

Fix/tree perf/subtree traversal optimization#589
brieflynn merged 9 commits intomainfrom
fix/TreePerf/subtree-traversal-optimization

Conversation

@brieflynn
Copy link
Copy Markdown
Contributor

$O(subtree \times N_{launchers})$ traversal in tree_perf.py

get_kernel_launchers computed subtree GPU time by calling _compute_subtree_kernel_time_us(event), which called loop_and_aggregate_kernels (a full recursive subtree traversal) for every launcher. Since add_gpu_ops_to_tree already propagates all GPU kernel UIDs up to every ancestor via event["gpu_events"], this is now an $O(1)$ field lookup.

Split PR at request of @ajassani #577 (comment)

Pull Request Template

Note to AMDers:
This is a public repository. Please do not upload any confidential or customer data. Make sure all such data has been anonymized or removed before making this PR. If you need to attach any private files or links, please insert a Internal OneDrive Link or a Jira Ticket Link instead.

@brieflynn brieflynn requested a review from ajassani April 10, 2026 17:41
Copy link
Copy Markdown
Collaborator

@ajassani ajassani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the core optimization is correct and well-motivated — the recursive _compute_subtree_kernel_time_us traversal was redundant since gpu_events is already propagated by add_gpu_ops_to_tree. CI passes and the logic matches patterns used throughout the rest of the file. Nice work.

However, I'd like to avoid changing the 53 reference CSVs. The sorting additions for determinism are good, but we can make the test comparison order-insensitive instead of regenerating all the golden files. I verified this locally — all 13 regression tests pass with main's reference CSVs after this one change.

Requested changes

1. Add order-insensitive comparison for kernel detail columns in tests/conftest.py

In normalize_value, change:

    elif isinstance(val, list):
        return [normalize_value(v) for v in val]

to:

    elif isinstance(val, list):
        normalized = [normalize_value(v) for v in val]
        if normalized and all(isinstance(v, dict) for v in normalized):
            normalized.sort(key=lambda d: str(sorted(d.items())))
        return normalized

This canonically sorts lists of dicts (kernel_details, trunc_kernel_details, etc.) before comparison, making the tests insensitive to kernel ordering within those columns. Only affects lists-of-dicts — lists of strings/numbers/lists are left alone.

2. Revert all reference CSV changes back to main

git checkout origin/main -- tests/traces/

With change #1, the tests pass without any reference file modifications.

…n, making kernel ordering irrelevant, reference csv files reverted to main
@brieflynn
Copy link
Copy Markdown
Contributor Author

Hi @ajassani thank you for your review and comments, I have made the requested changes and reverted the reference CSV changes back to main

@ajassani ajassani self-requested a review April 10, 2026 21:04
@brieflynn brieflynn merged commit 1be6fc7 into main Apr 10, 2026
2 checks passed
@brieflynn brieflynn deleted the fix/TreePerf/subtree-traversal-optimization branch April 10, 2026 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants