Proposal: GPU primitives for product-manifold optimization (batched small-matrix kernels) #1232

zazabap · 2026-05-18T05:32:00Z

Updating the original post: manifold optimization sits outside cuOpt's scope (LP / MIP / VRP), so I'm reframing this as a proposal for GPU primitives rather than a feature request.

I'm one of the authors of ManifoldsGPU.jl. On product manifolds, retraction and projection (the operations that enforce the non-convex constraints) require many small dense linear-algebra operations executed in parallel, and that's where the bottleneck sits.
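
For concreteness, these are the standard Stiefel retractions (textbook definitions, shown as an illustration rather than as the ManifoldsGPU.jl code). Each is a small dense factorization evaluated independently at every point of the batch; here qf denotes the Q factor of a QR decomposition normalized so that R has a positive diagonal:

```latex
\begin{aligned}
\mathrm{St}(n,p) &= \{\, X \in \mathbb{R}^{n\times p} : X^\top X = I_p \,\},\\
T_X\mathrm{St}(n,p) &= \{\, \xi \in \mathbb{R}^{n\times p} : X^\top \xi + \xi^\top X = 0 \,\},\\
(X+\xi)^\top (X+\xi) &= I_p + \xi^\top \xi \qquad (\xi \in T_X\mathrm{St}(n,p)),\\
R^{\mathrm{polar}}_X(\xi) &= (X+\xi)\,(I_p + \xi^\top \xi)^{-1/2},\\
R^{\mathrm{QR}}_X(\xi) &= \mathrm{qf}(X+\xi).
\end{aligned}
```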

Primitives that are missing or under-tuned at small sizes (matrix dimension ≲ 64, batch size ≥ 10⁴):

Batched QR / polar decomposition — for Stiefel / Grassmann retractions.
Batched matrix exponential / logarithm — for SO(n) / SE(n) retractions.
Batched symmetric eigendecomposition — for SPD projections and fixed-rank truncations.
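
As an illustration of the SO(n) item, here is a minimal NumPy sketch of what a batched retraction R ← R · expm(Ω) computes for a stack of skew-symmetric Ω, using plain scaling and squaring with a fixed-degree Taylor series. The function names and the Taylor degree are illustrative assumptions, not a proposed cuSOLVER API or the ManifoldsGPU.jl implementation; production expm routines typically use Padé approximants with a norm-dependent degree instead.

```python
import numpy as np

def batched_expm(A, taylor_degree=12):
    """Matrix exponential of every matrix in a stack A of shape (batch, n, n).

    Scaling and squaring: expm(A) = expm(A / 2**s) ** (2**s), with s chosen per
    batch element so that ||A / 2**s||_1 <= 0.5, and a truncated Taylor series
    for the scaled exponential.
    """
    A = np.asarray(A, dtype=float)
    batch, n, _ = A.shape
    # Per-matrix 1-norm (max absolute column sum).
    norms = np.abs(A).sum(axis=1).max(axis=1)
    # Number of squarings needed for each matrix (0 if the norm is already <= 0.5).
    s = np.maximum(0, np.ceil(np.log2(np.maximum(norms, 1e-300) / 0.5))).astype(int)
    scaled = A / (2.0 ** s)[:, None, None]

    # Truncated Taylor series sum_{k=0}^{d} scaled**k / k!, for the whole batch at once.
    eye = np.broadcast_to(np.eye(n), (batch, n, n))
    result = eye.copy()
    term = eye.copy()
    for k in range(1, taylor_degree + 1):
        term = term @ scaled / k
        result = result + term

    # Undo the scaling by squaring each matrix s[i] times. The squaring count
    # differs per batch element, so only the still-active elements are updated.
    for step in range(s.max(initial=0)):
        active = step < s
        result[active] = result[active] @ result[active]
    return result

def so_n_retraction(R, Omega):
    """Retract rotations R (batch, n, n) along skew-symmetric Omega: R @ expm(Omega)."""
    return R @ batched_expm(Omega)
```

The masked squaring loop is where the work becomes data-dependent (each matrix needs a different number of squarings), which is one reason a well-tuned batched kernel for this is not just a loop over GEMMs.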

cuBLAS / cuSOLVER batched APIs cover larger matrix sizes well, but small-matrix, large-batch retraction kernels are the gap. The same primitives would also serve orthogonal NN layers, geometric deep learning, and robotics state estimation.

A pointer to the right repo (cuBLAS / cuSOLVER discussion channels?) would be welcome. Happy to collaborate from the ManifoldsGPU.jl side with concrete kernels and benchmarks.

zazabap (Author) · 2026-05-18T05:47:25Z

Follow-up — quick landscape check.

After posting I did some homework to make sure this proposal isn't redundant. Sharing in case it's useful for scoping.

No CUDA-native manifold-optimization library exists

Every "GPU manifold opt" project I could locate delegates to a higher-level framework rather than shipping custom kernels:

McTorch — PyTorch ops; last release Apr 2021, appears unmaintained.
Geoopt — PyTorch ops via ManifoldTensor.to(device).
geotorch — PyTorch parametrizations.
TensorFlow-RiemOpt — TF ops.
RiemannAX / Rieoptax — JAX / XLA.
Manopt (MATLAB) — gpuArray, partial coverage (sphere / Stiefel / complex-circle).
ROPTLIB, Manopt.jl — CPU only.
ManifoldsGPU.jl — our own; same delegation pattern, hitting the same wall.

The gap is library-agnostic — a primitive layer would underpin all of these without competing with any one.

Against current cuBLAS / cuSOLVER coverage

| Primitive | Status in cuBLAS / cuSOLVER |
|---|---|
| Batched GEMM | ✅ `gemmBatched`, `gemmStridedBatched` |
| Batched LU / Cholesky | ✅ `getrfBatched`, `potrfBatched` |
| Batched QR | ⚠️ `geqrfBatched` exists; no column-pivoted / retraction-stable variant |
| Batched SVD | ✅ `gesvdjBatched` (Jacobi), `gesvdaStridedBatched` (approximate) |
| Batched symmetric eigendecomposition | ✅ `syevjBatched` |
| Batched polar decomposition | ❌ no kernel, no published library |
| Batched matrix exp / log | ❌ no batched kernel (research only) |
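
The ✅ `syevjBatched` row already covers the SPD projection from the original list. For reference, here is a NumPy sketch of the per-matrix computation such a projection batches: symmetrize, eigendecompose, clamp eigenvalues from below, reassemble. `np.linalg.eigh` on a stacked array stands in for a batched symmetric eigensolver; the function name and the `eps` floor are illustrative assumptions.

```python
import numpy as np

def project_spd(A, eps=1e-8):
    """Project each matrix in a stack A (batch, n, n) onto {S = S^T, eigenvalues >= eps}.

    Returns the Frobenius-nearest such matrix: take the symmetric part of A,
    keep its eigenvectors, and clamp its eigenvalues from below at eps.
    """
    A = np.asarray(A, dtype=float)
    sym = 0.5 * (A + np.swapaxes(A, -1, -2))   # nearest symmetric matrix
    w, V = np.linalg.eigh(sym)                  # batched: w (batch, n), V (batch, n, n)
    w_clamped = np.maximum(w, eps)
    # V @ diag(w_clamped) @ V^T, using broadcasting instead of building diag matrices.
    return (V * w_clamped[..., None, :]) @ np.swapaxes(V, -1, -2)
```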

The coverage table sharpens the original proposal. The two clearest gaps, batched polar decomposition and batched matrix exponential / logarithm at small sizes (matrix dimension ≲ 64, batch ≥ 10⁴), are exactly the operations that retractions on Stiefel / Grassmann / SO(n) / SE(n) rely on most. Polar in particular is the natural stable retraction on Stiefel, and downstream libraries reimplement it either via SVD (heavy at small sizes) or via Newton iterations that don't vectorize cleanly across batches.
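
For reference, the SVD route looks roughly like this in NumPy: the polar factor of X + ξ is U Vᵀ from its thin SVD, which for a tangent ξ equals (X + ξ)(I_p + ξᵀξ)^(-1/2). This is a sketch of the generic technique with hypothetical function names, not the ManifoldsGPU.jl code; `np.linalg.svd` on a stacked array plays the role that a dedicated batched polar kernel would replace.

```python
import numpy as np

def polar_factor(A):
    """Orthonormal-column polar factor Q of each A = Q H in a stack A (batch, n, p), n >= p.

    From the thin SVD A = U diag(s) V^T, the polar factor is Q = U V^T.
    """
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def stiefel_polar_retraction(X, xi):
    """Polar retraction on St(n, p), batched: R_X(xi) = polar_factor(X + xi)."""
    return polar_factor(X + xi)

def project_to_tangent(X, Z):
    """Project Z onto the tangent space at X: Z - X sym(X^T Z)."""
    XtZ = np.swapaxes(X, -1, -2) @ Z
    return Z - X @ (0.5 * (XtZ + np.swapaxes(XtZ, -1, -2)))
```

Going through a full SVD only to discard the singular values is the overhead the post calls heavy at small sizes.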

Happy to file these against cuSOLVER / cuBLAS directly if that's the right venue — a pointer to the right repo would be welcome.

rgsl888prabhu (Maintainer) · 2026-05-18T15:06:47Z

@mlubin @chris-maes @rg20 for viz