Replies: 2 comments
Follow-up: quick landscape check. After posting I did some homework to make sure this proposal isn't redundant. Sharing in case it's useful for scoping.
No CUDA-native manifold-optimization library exists
Every "GPU manifold opt" project I could locate delegates to a higher-level framework rather than shipping custom kernels:
The gap is library-agnostic: a primitive layer would underpin all of these without competing with any one.
Against current cuBLAS / cuSOLVER coverage
This sharpens the original proposal: the two clearest gaps, batched polar decomposition and batched matrix exponential / logarithm at small sizes (matrix dimension ≲ 64, batch ≥ 10⁴), are exactly the operations that retractions on Stiefel / Grassmann / SO(n) / SE(n) rely on most. Polar in particular is the natural stable retraction on Stiefel, and downstream libraries reimplement it either via SVD (heavy at small sizes) or via Newton iterations that don't vectorize cleanly across batches.
Happy to file these against cuSOLVER / cuBLAS directly if that's the right venue; a pointer to the right repo would be welcome.
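For concreteness, here is a minimal CPU-side NumPy sketch of the two polar-retraction routes mentioned above. It is a reference for the math only, not code from any of the libraries discussed, and the function names are hypothetical. The Newton-Schulz variant assumes a fixed iteration count shared by the whole batch, which sidesteps per-matrix convergence checks at the cost of wasted work on matrices that converge early.

```python
import numpy as np

def polar_retract_svd(X, V):
    """Polar retraction on Stiefel: orthogonal polar factor of X + V via batched SVD."""
    U, _, Wt = np.linalg.svd(X + V, full_matrices=False)
    return U @ Wt

def polar_retract_newton_schulz(X, V, iters=30):
    """Same polar factor via Newton-Schulz, using only batched matmuls.

    Assumption: a fixed `iters` for every matrix in the batch (no early exit).
    """
    Y = X + V
    # Frobenius scaling puts every singular value in (0, 1], inside the
    # convergence region of the iteration; it does not change the polar factor.
    Y = Y / np.linalg.norm(Y, axis=(-2, -1), keepdims=True)
    I = np.eye(Y.shape[-1])
    for _ in range(iters):
        Y = 0.5 * Y @ (3.0 * I - np.swapaxes(Y, -1, -2) @ Y)
    return Y
```

Both functions take stacks of shape (batch, n, p). The SVD route is the one that gets heavy at small sizes. The Newton-Schulz route is matmul-only, but its iteration count depends on conditioning, which is what makes a well-tuned batched kernel non-trivial.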
@mlubin @chris-maes @rg20 for visibility
Updating the original post — manifold optimization sits outside cuOpt's scope (LP / MIP / VRP). Reframing this as a primitive proposal rather than a feature ask.
I'm one of the authors of ManifoldsGPU.jl. On product manifolds, retraction and projection, the operations that enforce the non-convex constraints, require many small dense linear-algebra operations executed in parallel, and that is where the bottleneck sits.
Primitives that are missing or under-tuned at small sizes (≲ 64, batch ≥ 10⁴):
cuBLAS / cuSOLVER batched APIs cover larger sizes well, but the small-matrix, large-batch retraction kernels are the gap. The same primitives would also serve orthogonal NN layers, geometric deep learning, and robotics state estimation.
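As one example of such a primitive, below is a rough NumPy sketch of a batched matrix exponential using scaling and squaring with a truncated Taylor series. The function name, the Taylor order, and the single shared scaling exponent are all assumptions made for illustration. A tuned kernel would more likely use Padé approximants and per-matrix scaling.

```python
import numpy as np

def batched_expm(A, taylor_terms=12):
    """exp(A) for a stack of small square matrices via scaling and squaring."""
    # One shared exponent s so that every scaled matrix has Frobenius norm <= 0.5.
    norms = np.linalg.norm(A, axis=(-2, -1))
    s = max(0, int(np.ceil(np.log2(max(norms.max(), 1e-300) / 0.5))))
    B = A / (2.0 ** s)

    # Truncated Taylor series of exp(B), accumulated term by term.
    I = np.broadcast_to(np.eye(A.shape[-1]), A.shape)
    E = I.copy()
    term = I.copy()
    for k in range(1, taylor_terms + 1):
        term = term @ B / k
        E = E + term

    # Undo the scaling: exp(A) = exp(B)^(2^s).
    for _ in range(s):
        E = E @ E
    return E
```

Applied to a stack of 3×3 skew-symmetric matrices, this returns rotation matrices in SO(3), the case relevant to robotics state estimation. The whole routine reduces to batched small matmuls.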
A pointer to the right repo (cuBLAS / cuSOLVER discussion channels?) would be welcome. Happy to collaborate from the ManifoldsGPU.jl side with concrete kernels and benchmarks.