-
Notifications
You must be signed in to change notification settings - Fork 11
Perform a second pass in unified table creation to find GPU events wi… #511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -1478,6 +1478,49 @@ def traverse(event_uid): | |||||||||||||||||||||||
| for root_uid in self.tree.cpu_root_nodes: | ||||||||||||||||||||||||
| traverse(root_uid) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| # Collect GPU kernels that have no cpu_op in their parent hierarchy. | ||||||||||||||||||||||||
| # These are missed by the cpu_root_nodes traversal above. | ||||||||||||||||||||||||
| collected_gpu_uids = set() | ||||||||||||||||||||||||
| for evt in collected: | ||||||||||||||||||||||||
| collected_gpu_uids.update(evt.get("gpu_events", [])) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| orphan_kernels = [] | ||||||||||||||||||||||||
| for evt in self.tree.events: | ||||||||||||||||||||||||
| if self.event_to_category(evt) not in {"kernel", "gpu_memcpy", "gpu_memset"}: | ||||||||||||||||||||||||
| continue | ||||||||||||||||||||||||
| if evt["UID"] in collected_gpu_uids: | ||||||||||||||||||||||||
| continue | ||||||||||||||||||||||||
| if not include_nccl and "nccl" in evt.get("name", "").lower(): | ||||||||||||||||||||||||
| continue | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| has_cpu_op = False | ||||||||||||||||||||||||
| parent = self.tree.get_parent_event(evt) | ||||||||||||||||||||||||
| while parent is not None: | ||||||||||||||||||||||||
| if self.event_to_category(parent) == "cpu_op": | ||||||||||||||||||||||||
| has_cpu_op = True | ||||||||||||||||||||||||
| break | ||||||||||||||||||||||||
| parent = self.tree.get_parent_event(parent) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| if not has_cpu_op: | ||||||||||||||||||||||||
| orphan_kernels.append(evt) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| # Group orphan kernels by their immediate parent (typically a runtime event) | ||||||||||||||||||||||||
| parent_to_orphans = defaultdict(list) | ||||||||||||||||||||||||
| for kernel in orphan_kernels: | ||||||||||||||||||||||||
| parent = self.tree.get_parent_event(kernel) | ||||||||||||||||||||||||
| parent_uid = parent["UID"] if parent else None | ||||||||||||||||||||||||
| parent_to_orphans[parent_uid].append(kernel) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| for parent_uid, kernels in parent_to_orphans.items(): | ||||||||||||||||||||||||
| if parent_uid is None: | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
| if parent_uid is None: | |
| if parent_uid is None: | |
| # Kernels with no parent at all: emit a synthetic top-level event | |
| # so they are still visible in the unified perf table. | |
| for kernel in kernels: | |
| synthetic = dict(kernel) | |
| # Treat this kernel as its own "root" unified event, anchored | |
| # by its own UID in gpu_events. | |
| synthetic["gpu_events"] = [kernel["UID"]] | |
| collected.append(synthetic) |
Copilot
AI
Mar 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synthetic = dict(parent_evt) copies the parent event's UID, so if a runtime parent has multiple orphan kernels this will append multiple collected rows with the same UID. This makes UID-based identification/aggregation ambiguous in downstream tables. Consider generating a unique UID for synthetic events (e.g., based on the kernel UID) and/or adding an explicit flag/field indicating the row is synthetic.
| synthetic = dict(parent_evt) | |
| synthetic = dict(parent_evt) | |
| # Assign a unique UID for the synthetic event to avoid clashes | |
| synthetic["UID"] = f"{parent_uid}_synthetic_{kernel['UID']}" | |
| # Explicitly mark this row as synthetic for downstream consumers | |
| synthetic["is_synthetic"] = True |
Copilot
AI
Mar 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict(parent_evt) is a shallow copy, so mutable fields like args and children will be shared between parent_evt and synthetic. If any later code mutates those nested structures (even just for formatting/enrichment), it can accidentally affect the original tree event. Prefer constructing a minimal synthetic dict with only the required fields, or use a deep copy and then strip fields that shouldn't carry over (e.g., children).
| synthetic = dict(parent_evt) | |
| # Create a deep copy to avoid sharing mutable nested structures (e.g., args). | |
| synthetic = copy.deepcopy(parent_evt) | |
| # Synthetic events should not carry over the original children hierarchy. | |
| synthetic.pop("children", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This second-pass logic introduces new behavior (including unlinked runtime/kernel paths) but there are no unit tests exercising
collect_unified_perf_events()for the "runtime has no cpu_op ancestor" case. Adding a focused test (similar totests/test_kernel_launchers.py::TestUnlinkedRuntimeEvents) would help prevent regressions and validate NCCL filtering + synthetic event creation.