hi there! this repo is a hands-on companion to this great article: https://siboehm.com/articles/22/CUDA-MMM, and is also inspired by https://github.com/srush/GPU-Puzzles
alongside the article above, this notebook will guide you through writing progressively fancier CUDA matrix multiplication kernels, with explanations and skeleton code to help you along.
so far, 6 kernels have been implemented, and by the last one you'll reach about 80% of the performance of cuBLAS (NVIDIA's official GPU linear algebra library, which provides its own matrix multiplication kernels)
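to make "80% of cuBLAS" concrete: matmul performance is usually reported as throughput, counting 2·M·N·K floating-point operations (one multiply and one add per inner-product term) for an M×K by K×N product. a minimal sketch of that arithmetic, assuming both kernels are timed on the same problem size; the helper names (`matmul_flops`, `gflops`, `percent_of_cublas`) are hypothetical and not part of this repo, and the timings in the usage are made-up placeholders.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A (m x k) and B (k x n): one multiply + one add per term."""
    return 2 * m * n * k


def gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in GFLOP/s for a single matmul that took `seconds`."""
    return matmul_flops(m, n, k) / seconds / 1e9


def percent_of_cublas(my_seconds: float, cublas_seconds: float) -> float:
    """Hypothetical helper: for the same problem size, the throughput ratio
    is simply the inverse ratio of the run times."""
    return 100.0 * cublas_seconds / my_seconds


if __name__ == "__main__":
    # made-up placeholder timings for a 4096 x 4096 x 4096 SGEMM, NOT real measurements
    size = 4096
    mine, ref = 0.020, 0.015
    print(f"my kernel: {gflops(size, size, size, mine):.1f} GFLOP/s")
    print(f"{percent_of_cublas(mine, ref):.0f}% of cuBLAS")
```

since the FLOP count is fixed by the problem size, a faster kernel only changes the denominator, which is why comparing run times directly gives the same percentage as comparing GFLOP/s.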
click 'Open in Colab' to run this notebook. you'll be working on a personal copy and any changes you make won't affect the original notebook.
here's what you'll find in this repo:
- notebooks/
  - optimizing_cuda_matmul.ipynb: interactive notebook
- solutions/ (try to give each kernel an honest effort first, and let me know where you got stuck)
  - kernel_1_solution.cpp to kernel_6_solution.cpp: solutions for each kernel
  - roofline_solutions.py: solution for the roofline model exercise
- src/: utility stuff
  - helper_functions.py: testing functions and such
  - test_sgemm.cu, test_sgemm2.cu, test_sgemm3.cu: test files for our kernels
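the roofline model exercise rests on one formula: attainable throughput = min(peak compute, arithmetic intensity × memory bandwidth), where arithmetic intensity is FLOPs per byte moved to and from DRAM. a minimal sketch, assuming FP32 operands and the idealized case where A and B are each read once and C is written once (if C is also read, as with a nonzero beta, add another M·N elements of traffic). this is not the code in roofline_solutions.py, the function names are hypothetical, and the GPU numbers in the usage are placeholders to swap for your card's spec sheet.

```python
def arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 4) -> float:
    """FLOP/byte for C = A @ B under idealized traffic (read A, read B, write C once)."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def roofline(ai: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Attainable GFLOP/s: capped by compute, or by bandwidth (GB/s * FLOP/byte = GFLOP/s)."""
    return min(peak_gflops, ai * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Intensity above which a kernel can become compute-bound instead of memory-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # placeholder GPU specs, NOT a real card: replace with your own numbers
    peak, bw = 20_000.0, 800.0
    ai = arithmetic_intensity(4096, 4096, 4096)
    bound = "compute" if ai >= ridge_point(peak, bw) else "memory"
    print(f"AI = {ai:.1f} FLOP/B, attainable = {roofline(ai, peak, bw):.0f} GFLOP/s ({bound}-bound)")
```

for square matrices the idealized intensity works out to n/6 FLOP/byte, so large SGEMMs sit far to the right of the ridge point in principle; the catch is that a naive kernel re-reads A and B many times, which drags its real intensity far below that ideal.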
what you'll need:
- basic programming skills (if you can write a for loop in c++/python, you're prob good)
- curiosity about making gpus go zoom 🏎️