Open-Vocabulary Top-Down Semantic Mapping from RGB-D Sequences
Generates top-down semantic maps from RGB-D sequences using CLIPSeg for zero-shot, open-vocabulary segmentation.
Installation:

```bash
# Clone
git clone https://github.com/cukurovaai/ClipSegMap.git
cd ClipSegMap

# Create conda environment
conda env create -f environment.yml
conda activate clipseg-map

# Install package
pip install -e .
```

Command-line usage:

```bash
# Edit config/default.yaml with your scene path, then run:
clipseg-map

# Or specify scene directly:
clipseg-map --scene /path/to/scene --output results/
```

Python API:

```python
from clipseg_map import SemanticMapper, MapConfig

# Load configuration
config = MapConfig.from_yaml("config/default.yaml")

# Create mapper and build map
mapper = SemanticMapper(config.data_dir, config)
result = mapper.build_map()

# Access outputs
predictions = result["predictions"]      # (H, W) class indices
color_topdown = result["color_topdown"]  # (H, W, 3) RGB top-down

# Save results
mapper.save_results(result, config.output_path)
mapper.visualize(result, f"{config.output_path}/semantic_map.png")
```

Edit `config/default.yaml`:
```yaml
# Scene configuration
scene_data_root: "/path/to/dataset"
scene_name: "scene_001"
output_dir: "output"

# Map parameters
grid_size: 500        # cells
cell_size: 0.05       # meters per cell
camera_height: 1.5    # meters
fov: 90.0             # degrees

# Processing
frame_skip: 10
depth_sample_rate: 100

# Semantic labels
labels:
- floor
- wall
- chair
- table
- door
```

Override via CLI:
```bash
clipseg-map --grid-size 800 --cell-size 0.03 --frame-skip 5
```

Expected scene layout:

```
scene_dir/
├── rgb/     # RGB images: 000000.png, 000001.png, ...
├── depth/   # Depth maps: 000000.npy, 000001.npy, ... (meters, float32)
└── pose/    # Camera poses: 000000.txt, 000001.txt, ...
```
Pose format: each pose file holds either 7 values, `x y z qx qy qz qw` (translation followed by a quaternion), or 16 values (a 4×4 matrix in row-major order).
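Both pose formats can be normalized to one 4×4 matrix before use. The sketch below is a minimal, standard-library-only illustration, not ClipSegMap's loader: `parse_pose` and `quat_to_rotation` are hypothetical names, and it assumes that the 7-value form is a camera-to-world transform (the value order itself comes from the format description above).

```python
import math
from typing import List

Matrix = List[List[float]]


def quat_to_rotation(qx: float, qy: float, qz: float, qw: float) -> Matrix:
    """Convert a quaternion given in (x, y, z, w) order into a 3x3 rotation matrix."""
    # Normalize defensively in case the stored quaternion is not exactly unit length.
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    x, y, z, w = qx / n, qy / n, qz / n, qw / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def parse_pose(text: str) -> Matrix:
    """Parse pose text with 7 values (x y z qx qy qz qw) or 16 values (row-major 4x4)."""
    values = [float(tok) for tok in text.split()]
    if len(values) == 16:
        # Row-major: each consecutive group of 4 values is one matrix row.
        return [values[row * 4:(row + 1) * 4] for row in range(4)]
    if len(values) == 7:
        x, y, z, qx, qy, qz, qw = values
        rot = quat_to_rotation(qx, qy, qz, qw)
        # Assumption: translation + rotation form a camera-to-world transform.
        return [rot[0] + [x], rot[1] + [y], rot[2] + [z], [0.0, 0.0, 0.0, 1.0]]
    raise ValueError(f"expected 7 or 16 values, got {len(values)}")
```

For a file on disk, `parse_pose(Path("scene_dir/pose/000000.txt").read_text())` (with `from pathlib import Path`) yields the same nested-list matrix regardless of which of the two formats the file uses.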
Output files:

| File | Description |
|---|---|
| `predictions.npy` | Class indices (H, W) |
| `scores.npy` | Class scores (H, W, N) |
| `obstacles.npy` | Obstacle map (H, W) |
| `color_topdown.npy` | RGB projection (H, W, 3) |
| `semantic_map.png` | Visualization |
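Because every cell of the top-down map is `cell_size` meters on a side, `predictions.npy` can be turned into a per-label floor area by counting cells. This is a minimal sketch with a hypothetical helper name, `class_areas`; it assumes that index `i` refers to `labels[i]` in the config order and that negative indices mark unobserved cells. Neither assumption is confirmed by this README.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def class_areas(predictions: Iterable[Iterable[int]], labels: Sequence[str],
                cell_size: float) -> Dict[str, float]:
    """Return the top-down area (square meters) covered by each label.

    Assumes predictions[row][col] indexes into `labels`; negative or
    out-of-range indices (e.g. unobserved cells) are skipped.
    """
    # Count how many cells carry each class index; int() also accepts NumPy integers.
    counts = Counter(int(idx) for row in predictions for idx in row)
    cell_area = cell_size * cell_size
    return {
        labels[idx]: count * cell_area
        for idx, count in sorted(counts.items())
        if 0 <= idx < len(labels)
    }
```

The function iterates row by row, so it works on a nested list or directly on a 2-D NumPy array, e.g. `class_areas(np.load(path_to_predictions), config_labels, 0.05)`, where `path_to_predictions` is wherever `save_results` wrote the file. With the default `cell_size: 0.05`, each cell contributes 0.0025 m².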
Project structure:

```
ClipSegMap/
├── clipseg_map/
│   ├── __init__.py          # Package exports
│   ├── cli.py               # CLI entry point
│   ├── config.py            # MapConfig dataclass
│   ├── model.py             # CLIPSeg model wrapper
│   ├── mapper.py            # SemanticMapper class
│   ├── data/
│   │   └── loader.py        # Data loading utilities
│   └── utils/
│       ├── geometry.py      # Point cloud utilities
│       └── visualization.py # Visualization functions
├── config/
│   └── default.yaml         # Default configuration
├── scripts/
│   └── run_mapping.py       # Example script
└── tests/
```
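The map parameters in `config/default.yaml` (`grid_size`, `cell_size`, `fov`) describe the geometry behind a top-down map: each depth pixel is lifted into 3D, moved into the world frame with its pose, and dropped into a grid cell. The sketch below shows only that geometric step, under stated assumptions, and is not the code in `geometry.py`. The helper names `backproject`, `transform` and `world_to_cell` are hypothetical. The sketch assumes a pinhole camera whose `fov` is horizontal, a principal point at the image center, OpenCV camera axes (x right, y down, z forward), a camera-to-world 4×4 pose, a vertical world y axis, and a grid centered on the world origin.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float, width: int, height: int,
                fov_deg: float) -> Point:
    """Lift pixel (u, v) with metric depth into the camera frame (assumed pinhole model)."""
    # Focal length in pixels from the (assumed horizontal) field of view.
    f = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    cx, cy = width / 2.0, height / 2.0
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)


def transform(pose: Matrix, p: Point) -> Point:
    """Apply a 4x4 camera-to-world pose (row-major nested lists) to a 3D point."""
    x, y, z = p
    wx, wy, wz = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in pose[:3])
    return (wx, wy, wz)


def world_to_cell(x: float, z: float, grid_size: int,
                  cell_size: float) -> Optional[Tuple[int, int]]:
    """Map a ground-plane position (x, z) to a (row, col) cell, or None if off the map.

    Assumes y is the vertical world axis and the grid is centered on the origin.
    """
    half = grid_size // 2
    col = math.floor(x / cell_size) + half
    row = math.floor(z / cell_size) + half
    if 0 <= row < grid_size and 0 <= col < grid_size:
        return row, col
    return None
```

With the defaults `grid_size: 500` and `cell_size: 0.05`, the grid spans 500 × 0.05 m = 25 m per side, so under these assumptions points more than roughly 12.5 m from the origin along x or z fall outside the map. The sketch ignores `camera_height` and the per-cell fusion of CLIPSeg scores across frames; it only shows where a pixel lands.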