Skip to content

Fix tile_dot for scalar float64 tiles#1565

Closed
kiwigitops wants to merge 2 commits into
NVIDIA:mainfrom
kiwigitops:fix-tile-dot-float64-scalar
Closed

Fix tile_dot for scalar float64 tiles#1565
kiwigitops wants to merge 2 commits into
NVIDIA:mainfrom
kiwigitops:fix-tile-dot-float64-scalar

Conversation

@kiwigitops

@kiwigitops kiwigitops commented Jun 16, 2026

Copy link
Copy Markdown

Summary

  • add the missing scalar float64 native tensordot() overload
  • cover wp.tile_dot() on scalar wp.float64 tiles with a CPU regression test

Validation

  • python build_lib.py --no-cuda --no-use-libmathdx -j 4
  • python -m unittest warp.tests.tile.test_tile_fused_ops.TestTileFusedOps.test_tile_dot_basic_cpu warp.tests.tile.test_tile_fused_ops.TestTileFusedOps.test_tile_dot_float64_scalar_cpu -v

Closes #1563

Summary by CodeRabbit

  • New Features

    • Extended tensor dot support by adding a float64-specific overload, ensuring float64 scalar dot/tensordot code paths compile and dispatch correctly.
  • Tests

    • Added a new fused-tile dot test for float64 scalar outputs, comparing tiled tile_dot results against NumPy with float64-appropriate tolerance.

@copy-pr-bot

copy-pr-bot Bot commented Jun 16, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 11c63458-b93b-409c-b981-511462b726c7

📥 Commits

Reviewing files that changed from the base of the PR and between 43eb63f and e32d6d4.

📒 Files selected for processing (1)
  • warp/tests/tile/test_tile_fused_ops.py
💤 Files with no reviewable changes (1)
  • warp/tests/tile/test_tile_fused_ops.py

📝 Walkthrough

Walkthrough

Adds a missing wp::tensordot overload for float64 in builtin.h so that scalar float64 tile operands resolve to a float64 result type, fixing a compilation failure in wp.tile_dot. A new test verifies the fix end-to-end.

Changes

float64 scalar tile_dot fix

Layer / File(s) Summary
float64 tensordot overload
warp/native/builtin.h
Adds inline CUDA_CALLABLE float64 tensordot(float64 a, float64 b) to namespace wp, ensuring decltype(tensordot(T{}, T{})) resolves to float64 for scalar float64 tile operands instead of float.
Test: test_tile_dot_float64_scalar
warp/tests/tile/test_tile_fused_ops.py
Adds and registers test_tile_dot_float64_scalar, which launches a tiled kernel using float64 register tiles, calls wp.tile_dot, and asserts the result matches NumPy's dot product at float64 precision.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main change: fixing tile_dot for scalar float64 tiles.
Linked Issues check ✅ Passed The PR fully addresses issue #1563 by adding the missing float64 tensordot overload and including a regression test for scalar float64 tiles.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the scalar float64 tile_dot issue: adding tensordot overload and test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a compilation failure in wp.tile_dot when used with scalar wp.float64 tiles by adding the missing tensordot(float64, float64) overload in builtin.h, and adds a focused regression test to prevent recurrence.

  • warp/native/builtin.h: One-line addition of inline CUDA_CALLABLE float64 tensordot(float64 a, float64 b) that mirrors the existing float overload. The tile_dot template resolves its scalar type via decltype(tensordot(T{}, T{})), so this single overload unblocks both the forward pass and adjoint compilation for float64 scalar tiles.
  • warp/tests/tile/test_tile_fused_ops.py: New test_tile_dot_float64_scalar test using non-trivial inputs (arange(64) and arange(64) * 0.5), validated against np.dot at places=8 precision, registered for all configured devices.

Confidence Score: 5/5

Safe to merge — the change is a single-line targeted fix with a matching regression test and no side effects on existing code paths.

The new tensordot(float64, float64) overload is a direct, minimal mirror of the existing float overload and delegates to mul(float64, float64) which already exists via DECLARE_FLOAT_OPS. There is no risk of breaking other overloads, and the template dispatch in tile_dot / adj_tile_dot now resolves cleanly for float64. The test validates correctness end-to-end at float64 precision.

No files require special attention.

Important Files Changed

Filename Overview
warp/native/builtin.h Adds tensordot(float64, float64) overload, mirroring the existing float overload; delegates to the already-present mul(float64, float64) from DECLARE_FLOAT_OPS — the fix is minimal, correct, and complete for the forward path.
warp/tests/tile/test_tile_fused_ops.py Adds test_tile_dot_float64_scalar covering the new code path; uses non-trivial input (arange * 0.5), validates against NumPy with places=8, and is registered for all configured devices.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["wp.tile_dot(a, b)\n(TileA::Type = float64)"] --> B["tile_dot template\nScalarT = decltype(tensordot(float64{}, float64{}))"] 
    B --> C{"tensordot overload\nresolution"}
    C -->|"Before PR\n(missing overload)"| D["❌ Compilation failure"]
    C -->|"After PR\n(new overload added)"| E["tensordot(float64, float64)\n→ mul(a, b) → a * b"]
    E --> F["thread_sum += tensordot(...)"]
    F --> G["Cross-thread reduction\n(warp shuffle / atomic)"]
    G --> H["Single-element float64\noutput tile"]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["wp.tile_dot(a, b)\n(TileA::Type = float64)"] --> B["tile_dot template\nScalarT = decltype(tensordot(float64{}, float64{}))"] 
    B --> C{"tensordot overload\nresolution"}
    C -->|"Before PR\n(missing overload)"| D["❌ Compilation failure"]
    C -->|"After PR\n(new overload added)"| E["tensordot(float64, float64)\n→ mul(a, b) → a * b"]
    E --> F["thread_sum += tensordot(...)"]
    F --> G["Cross-thread reduction\n(warp shuffle / atomic)"]
    G --> H["Single-element float64\noutput tile"]
Loading

Reviews (4): Last reviewed commit: "Remove unused tile test variable" | Re-trigger Greptile

Comment thread warp/tests/tile/test_tile_fused_ops.py
Signed-off-by: kiwigitops <kiwisclubco@gmail.com>
@kiwigitops kiwigitops force-pushed the fix-tile-dot-float64-scalar branch from 4061315 to c6269e6 Compare June 17, 2026 00:13
Signed-off-by: kiwigitops <kiwisclubco@gmail.com>
@kiwigitops kiwigitops force-pushed the fix-tile-dot-float64-scalar branch from 43eb63f to e32d6d4 Compare June 17, 2026 22:12

Copy link
Copy Markdown
Author

Quick bump here. I addressed the review note and the visible checks are green on my end. Happy to adjust anything else if needed.

@shi-eric

Copy link
Copy Markdown
Contributor

Hi @kiwigitops, we’re closing this pull request because #1563 has now been fixed on main by ebcce32.

The merged fix addresses the same underlying issue with a type-preserving scalar tensordot() implementation, covers both float64 and float16, and includes forward and gradient coverage across CPU and CUDA. This PR has therefore been superseded and is no longer needed.

We should have let you know sooner that a broader fix was in progress. We appreciate the work and testing you contributed here. For future work on assigned issues, please coordinate in the issue thread first so we can avoid parallel implementations.

@shi-eric shi-eric closed this Jun 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wp.tile_dot() fails to compile for scalar wp.float64 tiles.

2 participants