This project provides a minimal implementation of a GPT-like model using pure C++, reproducing the structure and inference of the TinyStories-33M model.
It is organized into two main parts:
A reference implementation using NumPy, designed to help verify the model structure and logic before translating to C++.
Includes:
- A simple transformer model built with NumPy
- A minimal Byte-Pair Encoding (BPE) tokenizer
The C++ version contains three main modules:
A lightweight tensor library that supports:
- Tensors with arbitrary dimensions
- Auto broadcasting
- Basic operations required for Transformer models
- numpy style api
A neural network framework built on MTB, implementing:
- Embedding layer
- Linear (fully connected) layer
- Layer Normalization
- Activation functions (e.g. GELU)
- Self-Attention layer
Implements the GPT-Neo style blocks used in TinyStories-33M with cachecing machanism. Also includes a simple tokenizer for mapping between characters and embedding IDs.
Download the pre-converted model parameters from the original TinyStories-33M checkpoint:
Unzip the downloaded file into the root directory. You should get the following folder: "model_state_dic/"
cd cpp
mkdir build
make runYou should see output like the following:
python python/test_model_v2.py GPTNeoModel::forward: 15531 us
Embedding Layer: 10 us
Transformer Layers: 9109 us
GPTNeoBlock::forward: 2186 us
Attention Layer: 794 us
GPTNeoBlock::forward: 2349 us
Attention Layer: 993 us
GPTNeoBlock::forward: 2330 us
Attention Layer: 936 us
GPTNeoBlock::forward: 2233 us
Attention Layer: 877 us
LN&LM: 6411 us GPTNeoModel(
(wte) : MEmbed(50257, 768)
(wpe) : MEmbed(2048, 768)
(layers) : MList size:4 (
(0) : GPTNeoBlock(
(ln_1) : MLayerNorm(768, 1e-05)
(attn) : GPTNeoAttention(
(attention) : MSelfAT(
(k_proj) : MLinear (in_features: 768, out_features: 768, has_bias: false)
(v_proj) : MLinear (in_features: 768, out_features: 768, has_bias: false)
(q_proj) : MLinear (in_features: 768, out_features: 768, has_bias: false)
(out_proj) : MLinear (in_features: 768, out_features: 768, has_bias: true)
)
)
(ln_2) : MLayerNorm(768, 1e-05)
(mlp) : GPTNeoMLP(
(c_fc) : MLinear (in_features: 768, out_features: 3072, has_bias: true)
(c_proj) : MLinear (in_features: 3072, out_features: 768, has_bias: true)
(act) :MGELUActivation
)
)
)
(ln_f) : MLayerNorm(768, 1e-05)
(lm_head) : MLinear (in_features: 768, out_features: 50257, has_bias: false)
)- Add configuration
