Add VideoSegmentationSam3Boxes node #48

Open
demoulinv wants to merge 6 commits into main from dev/sam3VideoBoxes

Conversation

@demoulinv
Collaborator

This pull request introduces a new segmentation node and makes several improvements and bug fixes to the video segmentation pipeline. The main addition is the new VideoSegmentationSam3Boxes node, which segments video frames using bounding boxes from a JSON file. Additionally, several changes in VideoSegmentationSam3Text.py improve the consistency of mask and bounding box handling.

New Node Addition:

  • Added a new node VideoSegmentationSam3Boxes for segmenting video frames based on bounding boxes from a JSON file, supporting multiple input resolutions, GPU usage, mask inversion, and flexible output options. This node integrates with the SAM3 video predictor and handles mask generation, file management, and metadata.

Improvements in VideoSegmentationSam3Text:

  • Fixed mapping and indexing for mask and bounding box dictionaries to use absolute frame IDs instead of local indices, ensuring correct association across the video sequence.
  • Updated mask assignment to encode object IDs in the mask values for better downstream processing and visualization.
  • Simplified the sam3Utils.mapIds signature by removing the width and height arguments; the ROI is now scaled using the mask dimensions instead.
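
The mask encoding mentioned in the list above can be illustrated with a minimal sketch. It assumes NumPy and per-object boolean masks; the function name `encode_object_ids`, the `uint8` dtype, and the convention that 0 means background are assumptions made for illustration, not taken from the PR.

```python
import numpy as np


def encode_object_ids(object_masks: dict[int, np.ndarray], shape: tuple[int, int]) -> np.ndarray:
    """Combine per-object boolean masks into a single label image.

    Pixels covered by object `obj_id` receive the value `obj_id`.
    0 is treated as background (assumed convention). When masks overlap,
    the object with the highest ID wins because IDs are processed in order.
    """
    labels = np.zeros(shape, dtype=np.uint8)  # uint8 limits IDs to 1..255
    for obj_id, mask in sorted(object_masks.items()):
        labels[mask.astype(bool)] = obj_id
    return labels
```

A downstream step can then recover the pixels of one object with `labels == obj_id`, or a plain foreground mask with `labels > 0`.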

These changes collectively improve the flexibility, correctness, and usability of the video segmentation pipeline, especially for workflows involving bounding box-based segmentation and multi-resolution inputs.

Contributor

Copilot AI left a comment

Pull request overview

This PR extends the video segmentation pipeline by adding a new Meshroom node (VideoSegmentationSam3Boxes) that generates masks from tracked bounding boxes stored in a JSON file, and aligns parts of the existing SAM3 text-based video segmentation to use absolute frame IDs and updated ID mapping.

Changes:

  • Added VideoSegmentationSam3Boxes node to segment video frames using per-frame bounding boxes (with multi-resolution inputs and mask inversion support).
  • Added segmentationRDS/bboxUtils.py to parse/merge/expand boxes and split them into consecutive-frame chunks.
  • Updated SAM3 utilities and text node to use the new mapIds signature and to key box dictionaries by absolute frame IDs.
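
As a rough illustration of keying by absolute frame IDs, the sketch below re-keys per-frame results that a predictor returned under chunk-local indices (0, 1, 2, ...). The function name `to_absolute_frame_ids` and the shape of the inputs are hypothetical; the actual node may organize its data differently.

```python
def to_absolute_frame_ids(local_results: dict[int, list], chunk_frame_ids: list[int]) -> dict[int, list]:
    """Re-key per-frame results from chunk-local indices to absolute frame IDs.

    `chunk_frame_ids[i]` is the absolute frame ID of the i-th frame of the chunk.
    """
    return {chunk_frame_ids[local_idx]: boxes for local_idx, boxes in local_results.items()}


# Usage: a chunk covering frames 120..122 of the sequence.
chunk_frame_ids = [120, 121, 122]
local_results = {0: [[10, 20, 50, 80]], 2: [[12, 22, 52, 82]]}
boxes_by_frame = to_absolute_frame_ids(local_results, chunk_frame_ids)
```

With absolute keys, results from different chunks can be merged into one dictionary without index collisions.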

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| segmentationRDS/sam3Utils.py | Changed mapIds signature and scaled ROI using mask dimensions instead of passed-in w/h. |
| segmentationRDS/bboxUtils.py | New helper module for reading/merging/expanding boxes and creating tracking chunks. |
| meshroom/imageSegmentation/VideoSegmentationSam3Text.py | Updated mapIds calls and changed box dictionary indexing to absolute frame IDs; adjusted mask filling values. |
| meshroom/imageSegmentation/VideoSegmentationSam3Boxes.py | New node implementation for box-driven video segmentation with SAM3 video predictor and multi-resolution crop handling. |


def merge_boxes(box1: list, box2: list, iou_threshold: float = 0.5) -> tuple[list, str]:
    """
    Merge 2 boxes xyxy by taking the bounding boxe, if their IoU is higher than the threshold.

Suggested change
- Merge 2 boxes xyxy by taking the bounding boxe, if their IoU is higher than the threshold.
+ Merge 2 boxes xyxy by taking the bounding box, if their IoU is higher than the threshold.

    ]
    return merged, f"bounding (IoU={iou:.2f})"
else:
    return box1, f"forward (IoU={iou:.2f} < seuil={iou_threshold})"

Suggested change
- return box1, f"forward (IoU={iou:.2f} < seuil={iou_threshold})"
+ return box1, f"forward (IoU={iou:.2f} < threshold={iou_threshold})"
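
For readers without the full file at hand, the behaviour described by the `merge_boxes` docstring can be sketched as follows. This is a reconstruction from the excerpts quoted in the two comments above, not the code from the PR: the helper `iou_xyxy`, the name `merge_boxes_sketch`, and the strict `>` comparison are assumptions.

```python
def iou_xyxy(box1: list, box2: list) -> float:
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0


def merge_boxes_sketch(box1: list, box2: list, iou_threshold: float = 0.5) -> tuple[list, str]:
    """Return the bounding box of both inputs if they overlap enough, else box1."""
    iou = iou_xyxy(box1, box2)
    if iou > iou_threshold:  # assumed strict comparison
        merged = [
            min(box1[0], box2[0]),
            min(box1[1], box2[1]),
            max(box1[2], box2[2]),
            max(box1[3], box2[3]),
        ]
        return merged, f"bounding (IoU={iou:.2f})"
    return box1, f"forward (IoU={iou:.2f} < threshold={iou_threshold})"
```

The second element of the returned tuple is a short human-readable reason, which is why the wording of that string matters for log output.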


expanded_display = [int(new_x1), int(new_y1), int(new_x2), int(new_y2)]

# 3. Back conversion to source space

Suggested change
- # 3. Back conversion to source space
+ # Back conversion to source space
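
The excerpt above only shows the end of the expansion step. One plausible shape for the whole operation is sketched below, assuming that display space is obtained by multiplying x coordinates by the pixel aspect ratio, that the box is grown around its center, and that it is clamped to the image. All of these are assumptions, as are the function name `expand_box_sketch` and the `scale` parameter.

```python
def expand_box_sketch(box_src: list, par: float, width_src: int, height_src: int,
                      scale: float = 1.5) -> tuple[list, list]:
    """Expand a source-space xyxy box in display space, then convert it back.

    Assumption: display_x = source_x * par, y is unchanged.
    """
    x1, y1, x2, y2 = box_src

    # Source space -> display space (apply the pixel aspect ratio on x).
    dx1, dx2 = x1 * par, x2 * par
    width_disp = width_src * par

    # Expand around the box center and clamp to the display image.
    cx, cy = (dx1 + dx2) / 2, (y1 + y2) / 2
    half_w = (dx2 - dx1) * scale / 2
    half_h = (y2 - y1) * scale / 2
    new_x1, new_x2 = max(0.0, cx - half_w), min(width_disp, cx + half_w)
    new_y1, new_y2 = max(0.0, cy - half_h), min(float(height_src), cy + half_h)
    expanded_display = [int(new_x1), int(new_y1), int(new_x2), int(new_y2)]

    # Back conversion to source space (undo the pixel aspect ratio on x).
    expanded_source = [int(new_x1 / par), int(new_y1), int(new_x2 / par), int(new_y2)]
    return expanded_display, expanded_source
```

Expanding in display space keeps the margin visually uniform in both directions when the pixels are not square.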

) -> dict:
    """
    Extract bounding boxes per object and organize them in chunck of consecutive frames.
    Coordinates in the json file are supposed to be in the original source space, with the pixel aspect ratio not applicated.

Suggested change
- Coordinates in the json file are supposed to be in the original source space, with the pixel aspect ratio not applicated.
+ Coordinates in the json file are supposed to be in the original source space, with the pixel aspect ratio not applied.
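
The docstring mentions organizing boxes in chunks of consecutive frames. A minimal sketch of that grouping, working on frame IDs only, could look like this; the function name `split_into_chunks` is hypothetical and the real helper in bboxUtils.py also carries the boxes and object IDs.

```python
def split_into_chunks(frame_ids: list[int]) -> list[list[int]]:
    """Group frame IDs into runs of consecutive values.

    A new chunk starts whenever the next frame ID is not exactly
    the previous one plus 1. Duplicates are ignored.
    """
    chunks: list[list[int]] = []
    for fid in sorted(set(frame_ids)):
        if chunks and fid == chunks[-1][-1] + 1:
            chunks[-1].append(fid)
        else:
            chunks.append([fid])
    return chunks
```

Each chunk can then be tracked independently, since an object that disappears for some frames has no box to propagate across the gap.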

import json
from dataclasses import dataclass, field

THRESHOLDS = [252, 504, 1008]

Suggested change
- THRESHOLDS = [252, 504, 1008]
+ SIZE_THRESHOLDS = [252, 504, 1008]

Comment on lines +247 to +250
if target_size < 504 and not x4_ok:
    target_size = 504
if target_size < 1008 and not x2_ok:
    target_size = 1008

Suggested change
- if target_size < 504 and not x4_ok:
-     target_size = 504
- if target_size < 1008 and not x2_ok:
-     target_size = 1008
+ if target_size < SIZE_THRESHOLDS[1] and not x4_ok:
+     target_size = SIZE_THRESHOLDS[1]
+ if target_size < SIZE_THRESHOLDS[2] and not x2_ok:
+     target_size = SIZE_THRESHOLDS[2]
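
To see the fallback logic of this suggestion in context, here is a small self-contained sketch. The two `if` statements are the ones from the suggested change; the initial choice of `target_size` (smallest threshold that fits the box, capped at the largest) and the function name `pick_target_size` are assumptions, since the PR excerpt does not show how `target_size` is first computed.

```python
SIZE_THRESHOLDS = [252, 504, 1008]


def pick_target_size(box_size: int, x2_ok: bool, x4_ok: bool) -> int:
    """Pick a target size, raising it when an upscaled input is unavailable."""
    # Hypothetical first step: smallest threshold that contains the box.
    target_size = next((t for t in SIZE_THRESHOLDS if box_size <= t), SIZE_THRESHOLDS[-1])

    # Fallbacks from the suggested change, written with named thresholds.
    if target_size < SIZE_THRESHOLDS[1] and not x4_ok:
        target_size = SIZE_THRESHOLDS[1]
    if target_size < SIZE_THRESHOLDS[2] and not x2_ok:
        target_size = SIZE_THRESHOLDS[2]
    return target_size
```

Referring to `SIZE_THRESHOLDS[1]` and `SIZE_THRESHOLDS[2]` keeps the fallbacks in sync with the list if the sizes are ever changed.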

desc.File(
    name="inputx2",
    label="Inputx2",
    description="Folder containing source images upscale by 2.",

Suggested change
- description="Folder containing source images upscale by 2.",
+ description="Folder containing source images upscaled by 2.",

desc.File(
    name="inputx4",
    label="Inputx4",
    description="Folder containing source images upscale by 4.",

Suggested change
- description="Folder containing source images upscale by 4.",
+ description="Folder containing source images upscaled by 4.",


    image_paths.sort(key=lambda x: x[0])
else:
    raise ValueError(f"Input path '{input_path}' is not a valid path (folder or sfmData file).")

Suggested change
- raise ValueError(f"Input path '{input_path}' is not a valid path (folder or sfmData file).")
+ raise ValueError(f"Input path '{input_path}' is not a valid sfmData file.")

Comment on lines +278 to +293
for id, v in views.items():
    image_x1_path = Path(v.getImage().getImagePath())
    image_x1_name = image_x1_path.name
    image_x2_path = None
    if os.path.isfile(os.path.join(path_folder_x2, image_x1_name)):
        image_x2_path = os.path.join(path_folder_x2, image_x1_name)
    image_x4_path = None
    if os.path.isfile(os.path.join(path_folder_x4, image_x1_name)):
        image_x4_path = os.path.join(path_folder_x4, image_x1_name)
    intrinsic = dataAV.getIntrinsicSharedPtr(v.getIntrinsicId())
    pinhole = camera.Pinhole.cast(intrinsic)
    par = 1.0
    if pinhole is not None:
        par = pinhole.getPixelAspectRatio()
    image_paths.append((image_x1_path, str(id), v.getFrameId(), v.getImage().getWidth(),
                        v.getImage().getHeight(), par, image_x2_path, image_x4_path))

Suggested change
- for id, v in views.items():
-     image_x1_path = Path(v.getImage().getImagePath())
-     image_x1_name = image_x1_path.name
-     image_x2_path = None
-     if os.path.isfile(os.path.join(path_folder_x2, image_x1_name)):
-         image_x2_path = os.path.join(path_folder_x2, image_x1_name)
-     image_x4_path = None
-     if os.path.isfile(os.path.join(path_folder_x4, image_x1_name)):
-         image_x4_path = os.path.join(path_folder_x4, image_x1_name)
-     intrinsic = dataAV.getIntrinsicSharedPtr(v.getIntrinsicId())
-     pinhole = camera.Pinhole.cast(intrinsic)
-     par = 1.0
-     if pinhole is not None:
-         par = pinhole.getPixelAspectRatio()
-     image_paths.append((image_x1_path, str(id), v.getFrameId(), v.getImage().getWidth(),
-                         v.getImage().getHeight(), par, image_x2_path, image_x4_path))
+ commonParams = None
+ for id, v in views.items():
+     image_x1_path = Path(v.getImage().getImagePath())
+     image_x1_name = image_x1_path.name
+     image_x2_path = None
+     if os.path.isfile(os.path.join(path_folder_x2, image_x1_name)):
+         image_x2_path = os.path.join(path_folder_x2, image_x1_name)
+     image_x4_path = None
+     if os.path.isfile(os.path.join(path_folder_x4, image_x1_name)):
+         image_x4_path = os.path.join(path_folder_x4, image_x1_name)
+     intrinsic = dataAV.getIntrinsicSharedPtr(v.getIntrinsicId())
+     pinhole = camera.Pinhole.cast(intrinsic)
+     par = 1.0
+     if pinhole is not None:
+         par = pinhole.getPixelAspectRatio()
+     if commonParams is None:
+         commonParams = [v.getImage().getWidth(), v.getImage().getHeight(), par, image_x2_path is None, image_x4_path is None]
+     if commonParams != [v.getImage().getWidth(), v.getImage().getHeight(), par, image_x2_path is None, image_x4_path is None]:
+         raise ValueError("All images do not have same dimensions or one image is missing its upscaled version.")
+     image_paths.append((image_x1_path, str(id), v.getFrameId(), v.getImage().getWidth(),
+                         v.getImage().getHeight(), par, image_x2_path, image_x4_path))
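
The consistency check proposed in this suggestion can be tried in isolation, without Meshroom or an sfmData file. The sketch below mirrors the `commonParams` comparison on plain records; `FrameRecord` and `check_common_params` are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple, Optional


class FrameRecord(NamedTuple):
    width: int
    height: int
    par: float
    x2_path: Optional[str]  # None when no x2 upscaled image was found
    x4_path: Optional[str]  # None when no x4 upscaled image was found


def check_common_params(records: list[FrameRecord]) -> Optional[tuple]:
    """Raise if frames differ in size, pixel aspect ratio, or upscale availability."""
    common = None
    for r in records:
        params = (r.width, r.height, r.par, r.x2_path is None, r.x4_path is None)
        if common is None:
            common = params  # first frame sets the reference
        elif params != common:
            raise ValueError(
                "Images do not all share the same dimensions, "
                "or an image is missing its upscaled version."
            )
    return common
```

Failing early here avoids discovering halfway through a sequence that one frame has no upscaled counterpart.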
