
SuperKitten

This is SuperKitty, and she loves to MEOW (Maximizing Efficient Operations per Watt). SuperKittens is inspired by ThunderKittens.

Why?

Consumer hardware, especially Apple silicon, is immensely power-dense, yet it is underutilized by today's open-source inference and training libraries. SK is here to change that.

Writing deep learning Metal kernels should be easy. This library aims to make it so without sacrificing performance for abstraction: it targets the fastest compute achievable in practice (not just in theory), so you can squeeze out maximum performance. It consists of .metal shaders, headers, and .c sources, and was designed with ease of use in mind. It ships an assortment of Metal kernels so your chips don't starve!

Bridging the gap between bleeding-edge intelligence and consumer hardware!


It is:

Simple

SuperKittens is straightforward to write and is designed to work out of the box with your existing Apple silicon code on M1 through M5 chips (see Supported Chips for what works today).

Fast

The aim was never to sacrifice performance for easier abstractions, and we didn't! On the contrary, we aim to provide kernels that are both simpler and faster.

Supported Chips

We currently support only M1 and M2, and are in the process of adding support for M3 and newer chips.
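If a script should refuse to run on unsupported hardware, the chip family can be read from the CPU brand string. The sketch below is a hypothetical helper, not part of SuperKittens: it assumes that `sysctl -n machdep.cpu.brand_string` reports something like `Apple M2 Max` on Apple silicon, and it hard-codes the M1/M2 support set stated above.

```python
import platform
import re
import subprocess
from typing import Optional

# Assumption: mirrors the support statement in this section; update it as
# support for newer chips lands.
SUPPORTED_FAMILIES = {"M1", "M2"}


def chip_family(brand: str) -> Optional[str]:
    """Extract "M1", "M2", ... from a brand string such as "Apple M2 Max"."""
    match = re.search(r"\bApple (M\d+)\b", brand)
    return match.group(1) if match else None


def detect_chip() -> Optional[str]:
    """Best-effort detection via sysctl on macOS; None elsewhere or on failure."""
    if platform.system() != "Darwin":
        return None
    try:
        out = subprocess.run(
            ["sysctl", "-n", "machdep.cpu.brand_string"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return chip_family(out.strip())


if __name__ == "__main__":
    family = detect_chip()
    if family is None:
        print("Not an Apple M-series Mac, or detection failed.")
    elif family in SUPPORTED_FAMILIES:
        print(f"{family}: supported")
    else:
        print(f"{family}: not supported yet")
```

Parsing is kept separate from detection so the brand-string logic can be tested on any machine.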

Quickstart

Install (team, private wheel)

Wheels are published as GitHub Release assets on the private repo. Authenticate via gh (preferred) or a fine-grained GH_TOKEN with read access to the repository's contents.

```
# preferred: uses your gh auth, no token plumbing
gh release download dev-latest -p '*.whl' -R Lazarus-931/SuperKittens && pip install superkittens-*.whl

# or with a token
pip install "https://${GH_TOKEN}@github.com/Lazarus-931/SuperKittens/releases/download/dev-latest/superkittens-<version>-cp312-cp312-macosx_<ver>_arm64.whl"
```

Pinned versions live under tags (v0.1.0, ...) once cut; dev-latest floats on main.
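The wheel filename encodes which interpreter and platform it was built for; that is what the `<version>` and `macosx_<ver>_arm64` placeholders in the token URL stand for. As a rough sketch (the helper names and the `0.1.0` / `macosx_14_0` values are hypothetical), you can split a filename into its PEP 427 fields and compare it with the running interpreter before installing:

```python
import platform
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 fields (build tag ignored)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[:-4].split("-")
    # name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    python, abi, plat = parts[-3:]
    return WheelTags(parts[0], parts[1], python, abi, plat)


def matches_this_interpreter(tags: WheelTags) -> bool:
    """Rough check: CPython tag equals the running interpreter and both the
    wheel and this machine are arm64 macOS."""
    expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return (
        tags.python == expected
        and tags.platform.startswith("macosx_")
        and tags.platform.endswith("_arm64")
        and sys.platform == "darwin"
        and platform.machine() == "arm64"
    )


if __name__ == "__main__":
    t = parse_wheel_name("superkittens-0.1.0-cp312-cp312-macosx_14_0_arm64.whl")
    print(t, matches_this_interpreter(t))
```

pip performs the authoritative compatibility check at install time; for anything beyond a quick sanity check, `packaging.utils.parse_wheel_filename` from the `packaging` library implements the full naming rules.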

Build from source

```
git clone https://github.com/Lazarus-931/SuperKittens.git
cd SuperKittens
./build.sh                          # compiles Metal kernels → build/libsk.metallib + libsk.dylib
```

Then, from Python:

```
import numpy as np
from sk.src.py import activation

x = np.random.randn(512, 1024).astype(np.float16)
y = activation.gelu(x)              # dispatches Metal kernel via ctypes → libsk.dylib
```
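Because `activation.gelu` runs in float16 on the GPU, a CPU reference is handy for spot-checking its output. The README does not say which GELU formulation the kernel uses, so this sketch (plain `math`, hypothetical helper names) provides both the exact erf form and the common tanh approximation, plus a loose tolerance check suited to float16:

```python
import math
from typing import Sequence


def gelu_erf(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Widely used tanh approximation of GELU."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def allclose_fp16(got: Sequence[float], want: Sequence[float],
                  rtol: float = 1e-2, atol: float = 1e-3) -> bool:
    """Elementwise |got - want| <= atol + rtol * |want|, loose enough for fp16."""
    if len(got) != len(want):
        return False
    return all(abs(g - w) <= atol + rtol * abs(w) for g, w in zip(got, want))


if __name__ == "__main__":
    xs = [-3.0, -1.0, 0.0, 0.5, 2.0]
    exact = [gelu_erf(x) for x in xs]
    approx = [gelu_tanh(x) for x in xs]
    # The two formulations differ by well under fp16-level tolerance here.
    print(allclose_fp16(approx, exact))
```

Assuming `activation.gelu` returns a NumPy array, you could compare `y.astype(np.float32).ravel().tolist()` against `[gelu_erf(v) for v in x.astype(np.float32).ravel().tolist()]`. At fp16 tolerance both references will usually pass, so this checks correctness rather than which variant the kernel implements.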

Prerequisites

You need the Metal toolchain (metal, metallib). Command Line Tools alone are not enough — they ship metal but not metallib. Two options:

Full Xcode (App Store) — install Xcode, then:

```
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
sudo xcodebuild -license accept
```

Metal toolchain only (smaller, requires Xcode already installed to invoke xcodebuild):
```
xcodebuild -downloadComponent MetalToolchain
```

Verify:

```
xcrun -f metallib    # should print a path
xcrun -f metal
```
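For a build script or CI job, the same verification can be scripted. A minimal sketch, assuming only the `xcrun -f` behavior shown above (it prints the tool's path and exits non-zero when the tool cannot be found); `find_metal_tool` is a hypothetical helper:

```python
import shutil
import subprocess
from typing import Optional


def find_metal_tool(name: str) -> Optional[str]:
    """Return the path `xcrun -f <name>` resolves to, or None if xcrun is
    missing (e.g. not macOS) or the tool is not installed."""
    if shutil.which("xcrun") is None:
        return None
    result = subprocess.run(["xcrun", "-f", name], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    path = result.stdout.strip()
    return path or None


if __name__ == "__main__":
    for tool in ("metal", "metallib"):
        path = find_metal_tool(tool)
        print(f"{tool:9s} {path if path else 'MISSING'}")
```

A missing `metallib` with `metal` present matches the Command Line Tools-only situation described above.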

Python (3.10+, Homebrew recommended):

```
python3 -m venv ~/sk-venv
source ~/sk-venv/bin/activate
pip install -U "huggingface_hub[cli]" numpy sentencepiece tokenizers
```
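To confirm the environment before building, a short check can verify the interpreter version and that the packages installed above are importable. This is a hypothetical helper, not something SuperKittens ships; note that `huggingface_hub[cli]` installs the `huggingface_hub` module:

```python
import importlib.util
import sys

# Import names of the packages installed above.
REQUIRED_MODULES = ("numpy", "sentencepiece", "tokenizers", "huggingface_hub")
MIN_PYTHON = (3, 10)


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on sys.path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """True if the (major, minor) version meets the README's 3.10+ requirement."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("python", "OK" if python_ok() else f"too old, need {MIN_PYTHON}")
    missing = missing_modules()
    print("packages", "OK" if not missing else f"missing: {', '.join(missing)}")
```

`find_spec` only locates modules without importing them, so the check stays fast even for heavy packages.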

Benchmarking

Kernels and their respective benchmarks:

[INSERT TABLE HERE, ROWS ARE KERNELS, CHIPS ARE COLS]
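Until the table is filled in, here is one way to produce comparable numbers. This is a minimal timing sketch, not the project's own benchmark tooling, and all helper names are hypothetical. It reports the median of several timed runs after warm-up, converts time into effective bandwidth for elementwise kernels like `gelu`, and turns throughput plus an externally measured average power (for example from macOS's `powermetrics`, which needs sudo) into operations per second per watt, i.e. operations per joule:

```python
import statistics
import time
from typing import Callable


def bench(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Median wall-clock time of fn() in seconds.

    Warm-up calls absorb one-time costs such as pipeline creation. For GPU
    kernels, fn must block until the work finishes, or only the enqueue is timed.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def elementwise_gbps(numel: int, bytes_per_elem: int, seconds: float) -> float:
    """Effective bandwidth (1 GB = 1e9 bytes) of an op that reads and writes
    each element once, such as an activation."""
    return 2 * numel * bytes_per_elem / seconds / 1e9


def ops_per_joule(ops: float, seconds: float, avg_watts: float) -> float:
    """(ops / s) / W == ops / J: the efficiency metric behind MEOW."""
    return ops / (seconds * avg_watts)


if __name__ == "__main__":
    t = bench(lambda: sum(range(100_000)))
    print(f"median {t * 1e3:.3f} ms")
    # e.g. a float16 512x1024 activation taking t seconds:
    print(f"{elementwise_gbps(512 * 1024, 2, t):.2f} GB/s")
```

The median is used instead of the mean so that occasional scheduling hiccups do not dominate a row of the table.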

What's coming

The whole point of SuperKittens is giving you fast, composable Metal primitives you can drop into any project — a Swift app, a C++ inference engine, whatever. No framework lock-in, just headers and shaders.

Here's where we're headed:

Templated attention — support any head dim (64, 96, 128, 256) and sequence length out of the box, not just hardcoded configs
Causal masking — fused into the attention kernel, not bolted on after
Multi-head and GQA — batched heads with grouped-query attention so you can run real models
GEMM for common inference shapes — not trying to be a general BLAS, just the shapes that actually show up in transformer inference
One include, everything works — #include "superkittens.h" gives you BlockMMA, Tile, Frag, loaders, and every fused kernel. Compose them into your own stuff or use the ready-made ones
Docs that actually help — examples showing how to build a custom kernel from the primitives, not just API reference
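Since causal masking and GQA are still on the roadmap, a slow but obvious reference is useful for testing the fused kernels once they land. The sketch below is plain Python, not SuperKittens code; helper names are hypothetical, and it assumes self-attention where queries and keys have the same length. It computes softmax(Q K^T / sqrt(d)) V per head, applies the causal mask by scoring only keys up to each query's position, and maps query head h to KV head h // (n_q / n_kv) for GQA:

```python
import math
from typing import List

Matrix = List[List[float]]  # [seq_len][head_dim]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q: Matrix, k: Matrix, v: Matrix, causal: bool) -> Matrix:
    """softmax(q k^T / sqrt(d)) v for one head; causal=True lets query i
    attend only to keys 0..i."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        n_keys = i + 1 if causal else len(k)
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(n_keys)]
        probs = softmax(scores)
        out.append([sum(p * v[j][c] for j, p in enumerate(probs))
                    for c in range(len(v[0]))])
    return out


def gqa_attention(qs: List[Matrix], ks: List[Matrix], vs: List[Matrix],
                  causal: bool = True) -> List[Matrix]:
    """Grouped-query attention: len(qs) query heads share len(ks) KV heads."""
    n_q, n_kv = len(qs), len(ks)
    if n_q % n_kv != 0:
        raise ValueError("number of query heads must be a multiple of KV heads")
    group = n_q // n_kv
    return [attention_head(qs[h], ks[h // group], vs[h // group], causal)
            for h in range(n_q)]
```

Standard multi-head attention is the special case n_q == n_kv, so the same oracle covers both roadmap items.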
