Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions TraceLens/TreePerf/tree_perf.py
Original file line number Diff line number Diff line change
Expand Up @@ -1478,6 +1478,49 @@ def traverse(event_uid):
for root_uid in self.tree.cpu_root_nodes:
traverse(root_uid)

# Collect GPU kernels that have no cpu_op in their parent hierarchy.
# These are missed by the cpu_root_nodes traversal above.
collected_gpu_uids = set()
for evt in collected:
collected_gpu_uids.update(evt.get("gpu_events", []))

orphan_kernels = []
for evt in self.tree.events:
if self.event_to_category(evt) not in {"kernel", "gpu_memcpy", "gpu_memset"}:
continue
if evt["UID"] in collected_gpu_uids:
continue
if not include_nccl and "nccl" in evt.get("name", "").lower():
continue

has_cpu_op = False
parent = self.tree.get_parent_event(evt)
while parent is not None:
if self.event_to_category(parent) == "cpu_op":
has_cpu_op = True
break
parent = self.tree.get_parent_event(parent)

if not has_cpu_op:
orphan_kernels.append(evt)

Comment on lines +1481 to +1506
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second-pass logic introduces new behavior (including unlinked runtime/kernel paths) but there are no unit tests exercising collect_unified_perf_events() for the "runtime has no cpu_op ancestor" case. Adding a focused test (similar to tests/test_kernel_launchers.py::TestUnlinkedRuntimeEvents) would help prevent regressions and validate NCCL filtering + synthetic event creation.

Copilot uses AI. Check for mistakes.
# Group orphan kernels by their immediate parent (typically a runtime event)
parent_to_orphans = defaultdict(list)
for kernel in orphan_kernels:
parent = self.tree.get_parent_event(kernel)
parent_uid = parent["UID"] if parent else None
parent_to_orphans[parent_uid].append(kernel)

for parent_uid, kernels in parent_to_orphans.items():
if parent_uid is None:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orphan kernels whose get_parent_event(kernel) is None are currently dropped (if parent_uid is None: continue). That contradicts the goal of ensuring unlinked GPU kernels are included. Consider emitting a synthetic top-level event for these kernels (or attaching them to a dedicated "orphan" bucket) so they still appear in collected/the unified perf table.

Suggested change
if parent_uid is None:
if parent_uid is None:
# Kernels with no parent at all: emit a synthetic top-level event
# so they are still visible in the unified perf table.
for kernel in kernels:
synthetic = dict(kernel)
# Treat this kernel as its own "root" unified event, anchored
# by its own UID in gpu_events.
synthetic["gpu_events"] = [kernel["UID"]]
collected.append(synthetic)

Copilot uses AI. Check for mistakes.
continue
parent_evt = self.tree.get_UID2event(parent_uid)
for kernel in kernels:
synthetic = dict(parent_evt)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synthetic = dict(parent_evt) copies the parent event's UID, so if a runtime parent has multiple orphan kernels this will append multiple collected rows with the same UID. This makes UID-based identification/aggregation ambiguous in downstream tables. Consider generating a unique UID for synthetic events (e.g., based on the kernel UID) and/or adding an explicit flag/field indicating the row is synthetic.

Suggested change
synthetic = dict(parent_evt)
synthetic = dict(parent_evt)
# Assign a unique UID for the synthetic event to avoid clashes
synthetic["UID"] = f"{parent_uid}_synthetic_{kernel['UID']}"
# Explicitly mark this row as synthetic for downstream consumers
synthetic["is_synthetic"] = True

Copilot uses AI. Check for mistakes.
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict(parent_evt) is a shallow copy, so mutable fields like args and children will be shared between parent_evt and synthetic. If any later code mutates those nested structures (even just for formatting/enrichment), it can accidentally affect the original tree event. Prefer constructing a minimal synthetic dict with only the required fields, or use a deep copy and then strip fields that shouldn't carry over (e.g., children).

Suggested change
synthetic = dict(parent_evt)
# Create a deep copy to avoid sharing mutable nested structures (e.g., args).
synthetic = copy.deepcopy(parent_evt)
# Synthetic events should not carry over the original children hierarchy.
synthetic.pop("children", None)

Copilot uses AI. Check for mistakes.
synthetic["name"] = f"{parent_evt['name']}::{kernel['name']}"
synthetic["gpu_events"] = [kernel["UID"]]
collected.append(synthetic)

return collected

def build_df_unified_perf_table(
Expand Down
Loading