This project is part of a personal learning journey to deepen foundational knowledge in neural networks before moving on to attention mechanisms and transformers. The approach taken was to study Andrej Karpathy's micrograd repository (https://github.com/karpathy/micrograd) as an educational reference, understand its internals thoroughly, then implement a personal version from scratch without looking at the source code.
Micrograd is a tiny scalar-valued autograd engine that implements backpropagation over a dynamically built Directed Acyclic Graph (DAG), with a PyTorch-like neural network library on top of it. Credit goes to Andrej Karpathy for the original project: https://github.com/karpathy/micrograd.
The implementation process consisted of two phases:
- Study Phase: Carefully studying `engine.py` in the original repository to understand the `Value` class, how operations (add, mul, pow, relu) construct the computation graph, and how the `backward()` method uses a topological sort to propagate gradients.
- Re-implementation Phase: Closing the reference and re-implementing `engine.py` from scratch from memory.
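The ideas from the study phase can be illustrated with a minimal sketch. It is not this project's `engine.py`: the attribute names (`data`, `grad`, `_prev`, `_backward`) are assumptions chosen for illustration, and only addition and multiplication are shown. Each operation records its inputs and a small closure holding its local gradient rule, and `backward()` visits the nodes in reverse topological order so that a node's gradient is complete before it is passed on to its inputs.

```python
class Value:
    """Minimal scalar autograd node (illustrative sketch only)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)    # input nodes that produced this value
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(out)/d(self) = other, d(out)/d(other) = self
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        topo, visited = [], set()

        def build(node):
            # Depth-first post-order: a node is appended after all its inputs.
            if node not in visited:
                visited.add(node)
                for child in node._prev:
                    build(child)
                topo.append(node)

        build(self)
        self.grad = 1.0  # d(self)/d(self)
        for node in reversed(topo):
            node._backward()


x = Value(2.0)
y = Value(3.0)
z = x * y + x  # forward: 2*3 + 2 = 8
z.backward()
print(z.data, x.grad, y.grad)  # 8.0 4.0 2.0  (dz/dx = y + 1, dz/dy = x)
```

Gradients are accumulated with `+=` rather than assigned, because a node such as `x` above can feed into the result along more than one path.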
During the first attempt, a few errors were identified and corrected:
- Incorrect method signatures.
- A typo in the `__add__` method where `self` was wrapped instead of `other`.
- Naming the ReLU operation `__relu__`, which is not a recognized Python dunder method, and correcting it to `relu`.
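The last two corrections can be sketched as follows. This is a forward-only illustration under stated assumptions, not the project's actual code: gradient tracking is left out, and the `isinstance` check is one common way to do the wrapping. The operand that may arrive as a plain number is `other`, so that is the one to wrap; `self` is already a `Value`. ReLU is an ordinary method, since no Python operator or built-in would ever call a `__relu__` hook.

```python
class Value:
    """Forward-only sketch; gradient bookkeeping is intentionally omitted."""

    def __init__(self, data):
        self.data = data

    def __add__(self, other):
        # Wrap `other`, which may be a plain int/float, never `self`.
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data)

    def __radd__(self, other):
        # Python falls back to this for `1 + Value(...)`; addition commutes.
        return self + other

    def relu(self):
        # Plain method name: there is no `__relu__` protocol in Python.
        return Value(max(0.0, self.data))


left = Value(2.0) + 1        # uses __add__
right = 1 + Value(2.0)       # uses __radd__
clipped = Value(-4.0).relu()
print(left.data, right.data, clipped.data)  # 3.0 3.0 0.0
```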
The core engine, `engine.py`, contains the `Value` class, which implements scalar autograd. It supports the following operations:
- Addition: `__add__`, `__radd__`
- Subtraction: `__sub__`, `__rsub__`
- Multiplication: `__mul__`, `__rmul__`
- Division: `__truediv__`, `__rtruediv__` (Python 3 dispatches `/` to these; `__div__` and `__rdiv__` are the Python 2 names)
- Power: `__pow__`
- Negation: `__neg__`
- ReLU Activation: `relu`
- Backpropagation: `backward()` using a topological sort of the computation graph.
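One common way to support the full operator list is to give only addition, multiplication, and power their own gradient rules, then express negation, subtraction, and division in terms of them. The sketch below assumes that design; it is not necessarily how this project's `engine.py` is written. The `_wrap` helper and the attribute names are hypothetical, and `relu` is left out for brevity.

```python
class Value:
    """Illustrative sketch: three primitives, everything else derived."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._backward = lambda: None

    @staticmethod
    def _wrap(x):
        # Hypothetical helper: promote plain numbers to Value.
        return x if isinstance(x, Value) else Value(x)

    # Primitives: each defines its own local gradient rule.
    def __add__(self, other):
        other = Value._wrap(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = Value._wrap(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def __pow__(self, exponent):
        # `exponent` is assumed to be a plain int/float constant.
        out = Value(self.data ** exponent, (self,))
        def _backward():
            self.grad += exponent * self.data ** (exponent - 1) * out.grad
        out._backward = _backward
        return out

    # Derived operations: gradients come for free from the primitives.
    def __neg__(self): return self * -1
    def __sub__(self, other): return self + (-Value._wrap(other))
    def __truediv__(self, other): return self * Value._wrap(other) ** -1
    def __radd__(self, other): return self + other
    def __rmul__(self, other): return self * other
    def __rsub__(self, other): return Value._wrap(other) + (-self)
    def __rtruediv__(self, other): return Value._wrap(other) * self ** -1

    def backward(self):
        topo, visited = [], set()
        def build(node):
            if node not in visited:
                visited.add(node)
                for child in node._prev:
                    build(child)
                topo.append(node)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()


a = Value(4.0)
b = Value(2.0)
q = a / b - 1  # forward: 4/2 - 1 = 1
q.backward()
print(q.data, a.grad, b.grad)  # 1.0 0.5 -1.0  (dq/da = 1/b, dq/db = -a/b**2)
```

The reflected methods (`__radd__`, `__rsub__`, `__rmul__`, `__rtruediv__`) are what make expressions such as `5 - Value(2.0)` or `8 / Value(2.0)` work when the left operand is a plain number.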
A simple test script verifies the correctness of the forward and backward passes:
from engine import Value
a = Value(2.0)
b = Value(3.0)
c = a * b + a**2 # forward result: 10.0
c.backward()
print(f"a.grad: {a.grad}") # Expected: 7.0 (dc/da = b + 2a = 3 + 4)
print(f"b.grad: {b.grad}") # Expected: 2.0 (dc/db = a = 2)

This project is a step in a broader learning path: mastering neural network fundamentals, including Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Spiking Neural Networks (SNNs), before advancing to the study of attention mechanisms and transformer architectures.
Credit to Andrej Karpathy for the original micrograd project: https://github.com/karpathy/micrograd