The curriculum is a continuous nine-part series (Parts 0 through 8), released sequentially. It is organized into three major phases: 2D pixel understanding, then 3D spatial awareness, and finally physical simulation and generative synthesis.
We start at the absolute basics of image data, build classical intuition, and modernize it with deep learning.
- Part 0: The Engineer’s Primer on Image Data
- The First Step: Before we run algorithms, we need to understand the data. We cover linear algebra for images, matrix operations on pixels, basic image transformations, and how to set up your computational environment.
- Part 1: Core Classical Vision
- Building Intuition: Extracting meaning from 2D images without neural networks. We dive into image filtering, convolution, edge detection, feature descriptors, and optical flow.
- Part 2: Deep Learning for Computer Vision
- Modernizing Perception: Taking the foundational tasks from Part 1 and solving them with modern neural architectures. We cover CNNs, object detection, semantic segmentation, and representation learning.
Moving from flat screens into the real, 3D world. We explore how neural networks interpret space, the rigorous classical math behind it, and how robots use this to navigate.
- Part 3: 3D Computer Vision
- Entering the Third Dimension: Applying modern learning to spatial data. We process point clouds and 3D structures, and explore how algorithms understand depth and volume.
- Part 4: Geometric Methods for Vision
- The Mathematical Backbone: Building on the introductory concepts to understand the strict geometric laws of the world. We cover camera calibration, epipolar geometry, triangulation, and Structure from Motion (SfM).
- Part 5: Robot Localization and Mapping
- Active Perception: Deploying 3D geometry onto a moving agent. We cover Visual SLAM (Simultaneous Localization and Mapping), visual odometry, state estimation, and loop closure.
Once a robot can navigate the world, we need to understand how light forms images of that world, which lets us synthesize new data and build robust simulation engines.
- Part 6: Physics-Based Vision
- Inverse Rendering: Deducing physical scene properties from captured light. We explore radiometry, surface reflectance, photometric stereo, and how materials interact with photons.
- Part 7: Learning-Based Image Synthesis
- Generative AI for Robotics: Using deep learning to create novel, physically plausible visual data. We cover generative models, neural rendering, and novel view synthesis.
- Part 8: Computer Graphics & Forward Rendering
- Building the Matrix: The culmination of the curriculum. We run the inverse-rendering process of Part 6 in the forward direction: starting from known scene properties, we simulate light accurately and render entire environments for robotic testing.