Fix/tree perf/subtree traversal optimization #589
Conversation
… gpu_events lookup
Overall the core optimization is correct and well-motivated — the recursive _compute_subtree_kernel_time_us traversal was redundant since gpu_events is already propagated by add_gpu_ops_to_tree. CI passes and the logic matches patterns used throughout the rest of the file. Nice work.
However, I'd like to avoid changing the 53 reference CSVs. The sorting additions for determinism are good, but we can make the test comparison order-insensitive instead of regenerating all the golden files. I verified this locally — all 13 regression tests pass with main's reference CSVs after this one change.
Requested changes
1. Add order-insensitive comparison for kernel detail columns in tests/conftest.py
In `normalize_value`, change:

```python
elif isinstance(val, list):
    return [normalize_value(v) for v in val]
```

to:

```python
elif isinstance(val, list):
    normalized = [normalize_value(v) for v in val]
    if normalized and all(isinstance(v, dict) for v in normalized):
        normalized.sort(key=lambda d: str(sorted(d.items())))
    return normalized
```

This canonically sorts lists of dicts (`kernel_details`, `trunc_kernel_details`, etc.) before comparison, making the tests insensitive to kernel ordering within those columns. It only affects lists of dicts — lists of strings/numbers/lists are left alone.
2. Revert all reference CSV changes back to main

```
git checkout origin/main -- tests/traces/
```

With change #1, the tests pass without any reference file modifications.
…n, making kernel ordering irrelevant, reference csv files reverted to main
Hi @ajassani, thank you for your review and comments. I have made the requested changes and reverted the reference CSVs back to main.
In `tree_perf.py`, `get_kernel_launchers` computed subtree GPU time by calling `_compute_subtree_kernel_time_us(event)`, which called `loop_and_aggregate_kernels` (a full recursive subtree traversal) for every launcher. Since `add_gpu_ops_to_tree` already propagates all GPU kernel UIDs up to every ancestor via `event["gpu_events"]`, this is now a direct `gpu_events` lookup.

Split PR at request of @ajassani #577 (comment)
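A minimal sketch of the before/after described above — the field names other than `gpu_events` (`kernel_uids`, `children`, `dur_us`) are illustrative stand-ins, not the actual `tree_perf.py` schema:

```python
def subtree_kernel_time_slow(event, kernels_by_uid):
    # Old approach: recursively walk the whole subtree for every launcher.
    total = sum(kernels_by_uid[uid]["dur_us"] for uid in event.get("kernel_uids", []))
    for child in event.get("children", []):
        total += subtree_kernel_time_slow(child, kernels_by_uid)
    return total

def subtree_kernel_time_fast(event, kernels_by_uid):
    # New approach: gpu_events was already propagated to every ancestor,
    # so the subtree's kernel UIDs can be read off the event directly.
    return sum(kernels_by_uid[uid]["dur_us"] for uid in event.get("gpu_events", []))

# Tiny two-node tree: the root launches kernel 1, its child launches kernel 2,
# and gpu_events on the root already includes both.
kernels = {1: {"dur_us": 5.0}, 2: {"dur_us": 2.0}}
tree = {
    "kernel_uids": [1],
    "gpu_events": [1, 2],
    "children": [{"kernel_uids": [2], "gpu_events": [2], "children": []}],
}
assert subtree_kernel_time_slow(tree, kernels) == subtree_kernel_time_fast(tree, kernels) == 7.0
```

Both functions agree on the total; the fast path simply avoids re-traversing children that `add_gpu_ops_to_tree` has already summarized.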
Pull Request Template