DataFlow-CV

🌊 Where Vibe Coding meets CV data. Convert, visualize & evaluate datasets, built with the flow of Claude Code.

PyPI Python 3.8+ CI License
Linux Windows macOS

A computer vision dataset processing library for seamless format conversion, visualization, and evaluation between YOLO, LabelMe, and COCO annotation formats. Designed for researchers and developers working with multi-format annotation pipelines.

graph LR
    A[YOLO<br/>.txt] -->|convert| D[DataFlow-CV]
    B[LabelMe<br/>.json] -->|convert| D
    C[COCO<br/>.json] -->|convert| D
    D -->|visualize| E[🎨 Rendered<br/>Images]
    D -->|evaluate| F[📊 mAP / AR<br/>Metrics]

✨ Features

🔄 Format Conversion: Convert between YOLO, LabelMe, and COCO in any direction (6 conversion paths), plus prediction file support (outputs standard list-format COCO predictions)
🎯 Detection & Segmentation: Handle both object detection (bbox) and instance segmentation (polygon/RLE) annotations
🎨 Visualization: Render annotations with OpenCV, with color-coded classes, semi-transparent masks, and display & save modes
📊 Evaluation: COCO-standard 12-metric output (mAP, AP50, AP75, AR) via pycocotools, with per-class breakdowns
💻 Command-line Interface: Intuitive CLI with convert, visualize, and evaluate subcommands, positional args, and rich --help
🐍 Python API: Programmatic access for integration into larger ML pipelines
📝 Verbose Logging: File-based debug logging with timestamps; toggle with --verbose
🖥️ Headless Mode: Server/Docker-friendly; use --no-display + --save for off-screen rendering
🛡️ Flexible Error Handling: Strict mode (abort on error) by default, or lenient mode (skip & continue with warnings) via --no-strict

📦 Installation

From PyPI

pip install dataflow-cv

From Source

git clone https://github.com/zjykzj/DataFlow-CV.git
cd DataFlow-CV

# Regular installation
pip install .

# Editable installation (for development)
pip install -e .

💡 Tip: When installed in editable mode, use python -m dataflow.cli instead of the dataflow-cv command.

Optional Dependencies

| Dependency | Purpose | Install |
|---|---|---|
| pycocotools | COCO RLE segmentation + evaluation | pip install pycocotools |
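
For readers unfamiliar with the RLE format mentioned above, the following is a minimal pure-Python sketch of COCO's uncompressed run-length encoding: pixels are read in column-major order and the counts alternate between runs of 0s and 1s, always starting with a run of 0s. The function name `mask_to_uncompressed_rle` is hypothetical and is not part of DataFlow-CV or pycocotools; in practice pycocotools produces the compressed string form of this encoding.

```python
from typing import Dict, List


def mask_to_uncompressed_rle(mask: List[List[int]]) -> Dict[str, List[int]]:
    """Encode a binary mask (list of rows) as COCO-style uncompressed RLE.

    Hypothetical helper for illustration only.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0

    counts: List[int] = []
    previous = 0  # COCO RLE always starts with a run of zeros
    run = 0
    # Column-major traversal: walk down each column, left to right.
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value != previous:
                counts.append(run)
                run = 0
                previous = value
            run += 1
    counts.append(run)

    return {"size": [height, width], "counts": counts}


if __name__ == "__main__":
    # Right-hand column is foreground: two zeros, then two ones.
    print(mask_to_uncompressed_rle([[0, 1], [0, 1]]))
```

If the first pixel is foreground, the leading zero-run has length 0, which is why the counts list can begin with 0.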

🚀 Quick Start

Command-line Interface

All required parameters (image directories, label directories, class files, output paths) are positional arguments for better usability. Use --help on any subcommand for detailed usage.

🔄 Format Conversion

# YOLO → COCO
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt output.json

# YOLO → COCO (with RLE encoding)
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt output.json --do-rle

# YOLO → LabelMe
dataflow-cv convert yolo2labelme images/ yolo_labels/ classes.txt labelme_json/

# LabelMe → YOLO
dataflow-cv convert labelme2yolo labelme_json/ classes.txt yolo_labels/

# LabelMe → COCO
dataflow-cv convert labelme2coco labelme_json/ classes.txt output.json

# COCO → YOLO
dataflow-cv convert coco2yolo input.json yolo_labels/

# COCO → LabelMe
dataflow-cv convert coco2labelme input.json labelme_json/

# YOLO predictions → COCO (output: plain JSON list, the prediction format)
dataflow-cv convert yolo2coco --prediction images/ yolo_preds/ classes.txt pred.json

# Options
dataflow-cv convert yolo2coco --verbose images/ labels/ classes.txt output.json
dataflow-cv convert yolo2coco --no-strict images/ labels/ classes.txt output.json
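
To make concrete what a YOLO → COCO bounding-box conversion involves, here is a small sketch of the coordinate change, assuming the usual conventions: YOLO boxes are normalized to [0, 1] and center-based, while COCO boxes are absolute pixels in [x, y, width, height] form with a top-left origin. The function names are hypothetical and this is not DataFlow-CV's internal implementation.

```python
from typing import List, Tuple


def yolo_bbox_to_coco(cx: float, cy: float, w: float, h: float,
                      img_w: int, img_h: int) -> List[float]:
    """Normalized center-based box -> absolute top-left [x, y, w, h]."""
    abs_w = w * img_w
    abs_h = h * img_h
    x = cx * img_w - abs_w / 2.0
    y = cy * img_h - abs_h / 2.0
    return [x, y, abs_w, abs_h]


def coco_bbox_to_yolo(x: float, y: float, w: float, h: float,
                      img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Absolute top-left [x, y, w, h] -> normalized center-based box."""
    return ((x + w / 2.0) / img_w, (y + h / 2.0) / img_h, w / img_w, h / img_h)


if __name__ == "__main__":
    box = yolo_bbox_to_coco(0.5, 0.5, 0.25, 0.5, img_w=640, img_h=480)
    print(box)
    print(coco_bbox_to_yolo(*box, img_w=640, img_h=480))
```

Note that the image size is needed in both directions, which is why the YOLO-to-COCO commands take an image directory as an argument.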

🎨 Visualization

# Visualize YOLO annotations
dataflow-cv visualize yolo images/ yolo_labels/ classes.txt --save visualized/

# Visualize LabelMe annotations
dataflow-cv visualize labelme images/ labelme_json/ --save visualized/

# Visualize COCO annotations
dataflow-cv visualize coco images/ coco_annotations.json --save visualized/

# Verbose logging + headless mode
dataflow-cv visualize yolo --verbose --no-display images/ yolo_labels/ classes.txt --save visualized/

📊 Evaluation

Evaluate object detection and instance segmentation model outputs using COCO-standard metrics. Two COCO-format JSON files are required:

| File | Role | Format | Source |
|---|---|---|---|
| anno.json | Ground Truth (GT): reference annotations | Full COCO dict (images, annotations, categories) | yolo2coco (label mode) |
| pred.json | Detection (DT): model predictions | Plain JSON list of annotation dicts (with score) | yolo2coco --prediction, Detectron2, MMDetection |

① Preparing Evaluation Data

If your annotations and predictions are in YOLO format, convert them to COCO JSON first:

# Step 1: YOLO ground truth labels → COCO GT (anno.json)
#   Label format:   class_id cx cy w h               ← 5 tokens (detection)
#                   class_id x1 y1 ... xn yn          ← odd tokens (segmentation)
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt anno.json

# Step 2: YOLO predictions → COCO DT (pred.json)
#   Prediction fmt: class_id cx cy w h confidence     ← 6 tokens (detection)
#                   class_id x1 y1 ... xn yn confidence ← even tokens (segmentation)
dataflow-cv convert yolo2coco --prediction images/ yolo_preds/ classes.txt pred.json

⚠️ Important: YOLO label files (GT) use odd token counts, while prediction files (DT) use even token counts with a trailing confidence. The --prediction flag is required for DT: it outputs a plain JSON list of annotation dicts (not a full COCO dict with images/categories). Mixed label/prediction files in the same directory are not supported.

ℹ️ Note: The --prediction flag is only available for yolo2coco. labelme2coco does not support prediction conversion: LabelMe files (.json) have no label vs prediction format distinction, so there is no equivalent prediction source format to convert from.
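
The token-count rule above can be sketched as a small classifier. This is an illustration of the rule as stated in this README, not DataFlow-CV's actual parser; the function name `classify_yolo_line` is hypothetical, and the minimum of three polygon points (7 tokens for a label, 8 for a prediction) is an assumption.

```python
def classify_yolo_line(line: str) -> str:
    """Classify one YOLO text line by its whitespace-separated token count.

    Hypothetical helper; assumes a polygon has at least 3 points.
    """
    n = len(line.split())
    if n == 5:
        return "label-detection"          # class_id cx cy w h
    if n == 6:
        return "prediction-detection"     # class_id cx cy w h confidence
    if n >= 7 and n % 2 == 1:
        return "label-segmentation"       # class_id x1 y1 ... xn yn
    if n >= 8 and n % 2 == 0:
        return "prediction-segmentation"  # class_id x1 y1 ... xn yn confidence
    raise ValueError(f"unexpected token count: {n}")


if __name__ == "__main__":
    print(classify_yolo_line("0 0.5 0.5 0.2 0.3"))
    print(classify_yolo_line("0 0.5 0.5 0.2 0.3 0.91"))
```

Because the distinction rests on parity alone, a label file passed in prediction mode (or the reverse) cannot be detected reliably from a single line, which is consistent with the requirement to keep the two kinds of files in separate directories.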

② Detection vs Segmentation: Format Requirements

| Field | Detection GT | Detection DT | Segmentation GT | Segmentation DT |
|---|---|---|---|---|
| bbox | ✅ Required | ✅ Required | ✅ Required (for area) | ✅ Required (for area) |
| score | - | ✅ Required | - | ✅ Required |
| segmentation | ❌ Not required | ❌ Not required | ✅ Required | ✅ Required |
| area | ⚪ Recommended | ⚪ Recommended | ✅ Required | ✅ Required |
| iscrowd | ⚪ Optional | - | ⚪ Optional | - |

  • Object Detection (iouType='bbox'): Bounding box overlap evaluation. Only bbox + score mandatory in DT.
  • Instance Segmentation (iouType='segm'): Mask overlap evaluation. GT and DT must include segmentation (polygon or RLE), area, and bbox.
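
As a rough illustration of the requirements above, the sketch below checks a single annotation dict against the "Required" cells of the table. The `REQUIRED_FIELDS` mapping and `missing_fields` helper are hypothetical names, not part of DataFlow-CV. The `image_id` and `category_id` keys in the sample record come from the standard COCO results format rather than from the table.

```python
from typing import Any, Dict, List

# Required fields per (iouType, role), transcribed from the table above.
REQUIRED_FIELDS = {
    ("bbox", "gt"): {"bbox"},
    ("bbox", "dt"): {"bbox", "score"},
    ("segm", "gt"): {"bbox", "segmentation", "area"},
    ("segm", "dt"): {"bbox", "score", "segmentation", "area"},
}


def missing_fields(record: Dict[str, Any], iou_type: str, role: str) -> List[str]:
    """Return the required fields absent from one annotation dict."""
    return sorted(REQUIRED_FIELDS[(iou_type, role)] - set(record))


if __name__ == "__main__":
    # One entry of a pred.json-style list (values are made up).
    prediction = {
        "image_id": 1,
        "category_id": 3,
        "bbox": [240.0, 120.0, 160.0, 240.0],
        "score": 0.87,
    }
    print(missing_fields(prediction, "bbox", "dt"))
    print(missing_fields(prediction, "segm", "dt"))
```

A record that passes the detection check can still fail the segmentation check, so it helps to validate against the iouType you intend to evaluate.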
③ CLI Commands
# Object detection evaluation (bbox IoU)
dataflow-cv evaluate detection anno.json pred.json

# Verbose per-class breakdown
dataflow-cv evaluate detection --verbose anno.json pred.json

# With P/R/F1 at IoU=0.5
dataflow-cv evaluate detection --prf1 --prf1-iou 0.5 anno.json pred.json

# Instance segmentation evaluation (mask IoU)
dataflow-cv evaluate segmentation anno.json pred.json

# Save results as JSON
dataflow-cv evaluate detection --output results.json anno.json pred.json
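
For orientation, the 12 COCO summary metrics are the values that pycocotools' COCOeval stores in its `stats` array after `summarize()`, in a fixed order. The sketch below attaches readable labels to such an array. The label strings and the `label_coco_stats` helper are hypothetical, and the field names DataFlow-CV uses in its own results may differ.

```python
from typing import Dict, List, Sequence

# Order of COCOeval.stats for bbox/segm evaluation.
COCO_STAT_NAMES: List[str] = [
    "AP",         # AP @ IoU=0.50:0.95, all areas, maxDets=100
    "AP50",       # AP @ IoU=0.50
    "AP75",       # AP @ IoU=0.75
    "AP_small",   # AP for small objects
    "AP_medium",  # AP for medium objects
    "AP_large",   # AP for large objects
    "AR_max1",    # AR with at most 1 detection per image
    "AR_max10",   # AR with at most 10 detections per image
    "AR_max100",  # AR with at most 100 detections per image
    "AR_small",   # AR for small objects
    "AR_medium",  # AR for medium objects
    "AR_large",   # AR for large objects
]


def label_coco_stats(stats: Sequence[float]) -> Dict[str, float]:
    """Pair a 12-element stats sequence with metric names."""
    if len(stats) != len(COCO_STAT_NAMES):
        raise ValueError(f"expected {len(COCO_STAT_NAMES)} values, got {len(stats)}")
    return dict(zip(COCO_STAT_NAMES, stats))


if __name__ == "__main__":
    # Placeholder values, not real results.
    for name, value in label_coco_stats([0.0] * 12).items():
        print(f"{name:10s} {value:.3f}")
```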
④ End-to-End Workflow
# Complete pipeline: YOLO → COCO → Evaluation
dataflow-cv convert yolo2coco images/ yolo_labels/ classes.txt anno.json
dataflow-cv convert yolo2coco --prediction images/ yolo_preds/ classes.txt pred.json
dataflow-cv evaluate detection --verbose --prf1 anno.json pred.json
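
To clarify what bbox IoU and P/R/F1 at a fixed IoU threshold mean, here is a sketch of the underlying arithmetic: IoU for two COCO-style [x, y, w, h] boxes, and precision/recall/F1 from true-positive, false-positive, and false-negative counts. How predictions are matched to ground truth (and therefore how those counts are obtained) is not shown, and this is not DataFlow-CV's implementation of --prf1; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def bbox_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two boxes given as [x, y, w, h] with a top-left origin."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from match counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two 10x10 boxes overlapping by half their width.
    print(bbox_iou([0, 0, 10, 10], [5, 0, 10, 10]))
    print(precision_recall_f1(tp=8, fp=2, fn=2))
```

A prediction typically counts as a true positive when its IoU with an unmatched ground-truth box of the same class meets the threshold (0.5 in the command above).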

🐍 Python API

from dataflow.convert import YoloAndCocoConverter
from dataflow.visualize import YOLOVisualizer
from dataflow.evaluate import DetectionEvaluator, compute_pr_f1

# ── Convert ──────────────────────────────────────────
# YOLO labels → COCO (label mode)
converter = YoloAndCocoConverter(source_to_target=True, verbose=True, strict_mode=True)
result = converter.convert(
    source_path="yolo_labels/", target_path="anno.json",
    class_file="classes.txt", image_dir="images/",
)

# YOLO predictions → COCO (prediction mode)
converter = YoloAndCocoConverter(source_to_target=True, prediction=True)
result = converter.convert(
    source_path="yolo_preds/", target_path="pred.json",
    class_file="classes.txt", image_dir="images/",
)

# ── Visualize ────────────────────────────────────────
visualizer = YOLOVisualizer(
    label_dir="yolo_labels/", image_dir="images/",
    class_file="classes.txt", is_show=True, is_save=True,
    output_dir="visualized/", verbose=True, strict_mode=True,
)
result = visualizer.visualize()

# ── Evaluate ─────────────────────────────────────────
evaluator = DetectionEvaluator(verbose=True)
result = evaluator.evaluate("anno.json", "pred.json")
print(f"AP: {result.metrics.ap:.3f}, AP50: {result.metrics.ap50:.3f}")

# Quick P/R/F1 at IoU=0.5
prf1 = compute_pr_f1("anno.json", "pred.json", iou_threshold=0.5)
print(f"F1: {prf1.overall.f1_score:.3f}")

📂 See the samples/ directory for complete examples: samples/visualize/ (YOLO, LabelMe, COCO demos), samples/convert/ (conversion examples).


📖 Documentation

| Resource | Description |
|---|---|
| CLAUDE.md | Architecture overview, development guide, and known gotchas |
| CHANGELOG.md | Version history and breaking changes |
| specs/evaluate/ | Evaluation metric contracts: IoU, matching, AP/mAP/AR |
| specs/formats/ | External format contracts: YOLO, LabelMe, COCO, conversion rules |
| specs/modules/ | Internal module architecture, interface contracts, dependency constraints |

💡 Key Concepts

  • Format-Native Coordinates: Coordinates are stored in each format's native representation: YOLO uses normalized [0,1] center-based values, LabelMe/COCO use absolute pixels with a top-left origin. Check DatasetAnnotations.format to determine semantics.
  • Explicit Coordinate Transforms: Converters handle all coordinate transformations between formats, with no hidden normalization.
  • Strict Mode: Validation errors raise exceptions by default. Disable with --no-strict (CLI) or strict_mode=False (API).
  • Verbose Logging: Detailed debug logs saved to files when --verbose is used. The CLI prints the log file path after each operation.
  • Headless Support: Use --no-display for servers/Docker; pair with --save to output visualization images without a window.
  • Keyboard Shortcuts: During visualization, press q/ESC to exit, Enter/Space to advance, any other key to continue.
  • Color Management: Each class ID gets a unique color from an HSV-based palette (up to 1000 classes) for consistent visualization.
  • Evaluation Metrics: COCO-standard 12-metric output with optional per-class breakdown and P/R/F1 computation.
  • Prediction Files: YOLO prediction files use 6 tokens (detection) or even tokens (segmentation) vs 5/odd for labels. --prediction outputs a plain JSON list of annotation dicts, the standard prediction exchange format compatible with pycocotools loadRes().
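
As an illustration of the HSV-based color idea in the list above, the sketch below maps a class ID to a deterministic color by spacing hues evenly around the color wheel. This is an assumed scheme for demonstration; DataFlow-CV's actual palette may be generated differently. The `class_color` name and `palette_size` parameter are hypothetical, and colors are returned in BGR order because that is what OpenCV drawing functions expect.

```python
import colorsys
from typing import Tuple


def class_color(class_id: int, palette_size: int = 1000) -> Tuple[int, int, int]:
    """Map a class ID to a BGR color using evenly spaced hues.

    Hypothetical helper; full saturation and value are assumed.
    """
    hue = (class_id % palette_size) / palette_size
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    # OpenCV uses BGR channel order with 0-255 integer values.
    return (int(round(b * 255)), int(round(g * 255)), int(round(r * 255)))


if __name__ == "__main__":
    for cid in (0, 250, 500, 750):
        print(cid, class_color(cid))
```

Because the mapping depends only on the class ID, the same class gets the same color across images and runs. With evenly spaced hues, neighboring IDs receive similar colors, so a real palette may reorder or jitter hues to improve contrast.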

🔧 Development

For detailed developer guidance including advanced test commands, debugging, and architecture overview, see CLAUDE.md.

🧪 Testing

370 tests, 75% code coverage (3912 statements).

pytest                                    # All tests
pytest --cov=dataflow --cov-report=term   # With coverage
pytest tests/convert/test_yolo_and_coco.py  # Single module
pytest tests/evaluate/test_evaluator.py     # Single module

📊 Coverage by module

| Module | Coverage | Highlights |
|---|---|---|
| dataflow/label/ | 68% | models (87%), coco_handler (75%), labelme_handler (70%), yolo_handler (58%) |
| dataflow/convert/ | 87% | yolo_and_coco (90%), labelme_and_yolo (86%), coco_and_labelme (87%), rle (80%), base (83%), utils (92%) |
| dataflow/visualize/ | 81% | yolo_vis (100%), labelme_vis (100%), coco_vis (97%), base (74%) |
| dataflow/evaluate/ | 88% | evaluator (100%), metrics (96%), result (99%), base (91%), utils (69%) |
| dataflow/cli/ | 59% | main (96%), convert cmd (48%), evaluate cmd (24%), visualize cmd (84%), utils (86%) |
| dataflow/util/ | 93% | logging (98%), file_util (84%) |

🎨 Code Quality

pip install -e ".[dev]"      # Install dev dependencies (quoted so shells like zsh don't expand the brackets)
black dataflow tests samples  # Format
isort dataflow tests samples  # Sort imports
mypy dataflow                 # Type check
flake8 dataflow tests samples # Lint

🔗 Pre-commit Hooks (Optional)

pip install pre-commit
pre-commit install            # Install git hooks (run once)

# After this, every `git commit` auto-runs:
#   black → isort → flake8 → whitespace checks

pre-commit run --all-files    # Manual run against all files

πŸ“ Project Structure

dataflow/
β”œβ”€β”€ label/           # Annotation handlers + data models
β”œβ”€β”€ convert/         # Format converters + RLE utility
β”œβ”€β”€ visualize/       # OpenCV-based rendering
β”œβ”€β”€ evaluate/        # pycocotools-based metrics
β”œβ”€β”€ util/            # Logging & file utilities
└── cli/             # CLI entry point, commands, validation
tests/               # Unit & integration tests
samples/             # Python API usage examples
assets/              # Test data (det/seg by format)
specs/               # Canonical specifications (evaluate/ + formats/ + modules/)

🤝 Contributing

Contributions are welcome! Please review CLAUDE.md for architecture and development patterns before contributing.

  1. 🍴 Fork the repository
  2. 🌿 Create a feature branch
  3. ✏️ Make your changes
  4. 🧪 Add or update tests as needed
  5. ✅ Ensure code passes formatting and linting checks
  6. 📬 Submit a pull request

📄 License

This project is licensed under the MIT License; see LICENSE for details.


πŸ™ Acknowledgments

  • Thanks to the creators of YOLO, LabelMe, and COCO formats for establishing these annotation standards
  • Built with OpenCV, NumPy, Click, and pycocotools
  • Inspired by the need for seamless format conversion in multi-tool CV pipelines