feat: classification dataset support (v1.2) by ortizeg · Pull Request #2 · ortizeg/data-visor

ortizeg · 2026-02-16T21:41:24Z

Summary

- Add class filter checkboxes to statistics tab for slicing data by label
- First-class single-label classification dataset support with full feature parity to detection workflows
- Classification JSONL parser with auto-detection, multi-split ingestion, and sentinel bbox pattern
- Evaluation metrics: accuracy, macro/weighted F1, per-class P/R/F1, clickable confusion matrix
- Error analysis categorizing images as correct, misclassified, or missing prediction
- Confusion matrix polish with threshold filtering and overflow scroll for 43+ classes
- Embedding scatter color modes: GT class, predicted class, correct/incorrect (Tableau 20 palette)
- Most-confused class pairs summary and F1 bars with color-coded thresholds
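
For readers unfamiliar with the metrics listed above, here is a minimal sketch of how accuracy, macro/weighted F1, per-class P/R/F1, and the three error-analysis categories could be derived from label dictionaries. It is not the code in this PR: the function name `evaluate_classification`, the input shapes, and the choice to count a missing prediction as a miss for accuracy and recall are all assumptions made for illustration.

```python
from collections import Counter


def evaluate_classification(gt, pred):
    """Score single-label predictions (illustrative sketch, not the PR's code).

    gt:   dict of image id -> ground-truth label
    pred: dict of image id -> predicted label (images may be absent)
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    status = {}
    for image_id, true_label in gt.items():
        predicted = pred.get(image_id)
        if predicted is None:
            status[image_id] = "missing_prediction"
            fn[true_label] += 1  # assumption: a missing prediction counts as a miss
        elif predicted == true_label:
            status[image_id] = "correct"
            tp[true_label] += 1
        else:
            status[image_id] = "misclassified"
            fn[true_label] += 1
            fp[predicted] += 1

    support = Counter(gt.values())
    per_class = {}
    for label in sorted(set(support) | set(fp)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        per_class[label] = {"precision": precision, "recall": recall,
                            "f1": f1, "support": support[label]}

    total = len(gt)
    f1_values = [m["f1"] for m in per_class.values()]
    return {
        "accuracy": sum(tp.values()) / total if total else 0.0,
        # Macro F1: unweighted mean over classes.
        "macro_f1": sum(f1_values) / len(f1_values) if f1_values else 0.0,
        # Weighted F1: mean over classes weighted by ground-truth support.
        "weighted_f1": (sum(m["f1"] * m["support"] for m in per_class.values()) / total
                        if total else 0.0),
        "per_class": per_class,
        "status": status,
    }
```

The `status` mapping corresponds to the correct / misclassified / missing prediction categories; how the actual implementation treats missing predictions in the aggregate metrics is not stated in this description.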

Milestone v1.2 — 3 phases, 6 plans, 34 commits

Phase	Scope
15	Classification ingestion & display (parser, grid badges, modal editor, stats)
16	Classification evaluation (predictions, metrics, confusion matrix, error analysis)
17	Classification polish (threshold filtering, most-confused pairs, embedding colors)

Stats: 70 files changed, +8,035 / -3,883 lines

Test plan

🤖 Generated with Claude Code

Archive Deployment, Workflow & Competitive Parity milestone:
- 7 phases (8-14), 20 plans, 26/26 requirements complete
- Roadmap and requirements archived to .planning/milestones/
- PROJECT.md evolved with validated v1.1 requirements
- ROADMAP.md collapsed v1.1 into details tag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Allow users to exclude specific classes from statistics computation to analyze data slices by label. Collapsible checkbox panel with select-all/deselect-all controls filters class distribution chart and recomputes summary stats client-side.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
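
The recomputation described in this commit happens client-side in the frontend; the sketch below only mirrors the idea in Python for consistency with the other examples here. The function name `filter_class_distribution` and the summary keys are hypothetical.

```python
def filter_class_distribution(distribution, excluded):
    """Drop excluded classes and recompute summary stats from what remains.

    distribution: dict of class name -> annotation count
    excluded:     set of class names the user unchecked
    """
    kept = {name: count for name, count in distribution.items()
            if name not in excluded}
    return {
        "classes": kept,                      # feeds the class distribution chart
        "num_classes": len(kept),
        "total_annotations": sum(kept.values()),
    }
```

For example, excluding `"dog"` from `{"cat": 3, "dog": 5}` leaves one class with three annotations.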

Lift ClassFilter above sub-tab navigation so it's shared across all tabs. EvaluationPanel now receives excludedClasses and uses a new useFilteredEvaluation hook that:
- Filters PR curves, per-class metrics, and confusion matrix rows/cols
- Recomputes mAP as mean of filtered per-class AP values
- Synthesizes a new "all" PR curve from included classes (COCO 101-pt)
- Caches results in a Map keyed by serialized excluded set, so revisiting the same combination is O(1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
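
The mAP recomputation and caching steps can be sketched as follows. This is a Python analogue of the idea, not the hook itself: `make_filtered_map` is a hypothetical name, and serializing the excluded set as a sorted JSON array is one assumed way to build a stable cache key.

```python
import json


def make_filtered_map(per_class_ap):
    """Return a function computing mAP over non-excluded classes, with a cache.

    per_class_ap: dict of class name -> average precision
    """
    cache = {}

    def filtered_map(excluded):
        # Sort before serializing so {"a", "b"} and {"b", "a"} share one key.
        key = json.dumps(sorted(excluded))
        if key not in cache:
            kept = [ap for name, ap in per_class_ap.items()
                    if name not in excluded]
            # mAP is the plain mean of the remaining per-class AP values.
            cache[key] = sum(kept) / len(kept) if kept else 0.0
        return cache[key]

    return filtered_map, cache
```

Revisiting a previously seen combination of excluded classes is then a single dictionary lookup rather than a recomputation.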

…tion, and model updates
- Create ClassificationJSONLParser with sentinel bbox values (0.0) for classification annotations
- Add dataset_type column migration to DuckDB schema (default 'detection')
- Add dataset_type field to DatasetResponse Pydantic model
- Update BaseParser.build_image_batches signature with image_dir parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
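
To illustrate the sentinel bbox pattern: a classification label has no box, so each annotation row carries 0.0 in every bbox field and can share the detection annotation schema. The sketch below is an assumption-laden illustration, not the actual ClassificationJSONLParser; the JSONL keys (`image`, `label`) and the output column names are hypothetical.

```python
import json

# Classification labels have no geometry; 0.0 marks "no real box".
SENTINEL_BBOX = (0.0, 0.0, 0.0, 0.0)


def parse_classification_jsonl(lines, image_key="image", label_key="label"):
    """Turn JSONL records into annotation rows with sentinel bbox values.

    lines: iterable of strings, one JSON object per line
    image_key / label_key: hypothetical field names, adjust to the real data
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        x, y, w, h = SENTINEL_BBOX
        rows.append({
            "file_name": record[image_key],
            "category_name": record[label_key],
            "bbox_x": x, "bbox_y": y, "bbox_w": w, "bbox_h": h,
        })
    return rows
```

Reusing the detection row shape this way is what lets downstream code such as statistics and evaluation treat both dataset types uniformly, provided it checks the dataset type before interpreting the box.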

…, and category update endpoint
- Add classification JSONL layout detectors (D: split dirs, E: flat) to FolderScanner
- Add GCS classification detection support
- Dispatch to ClassificationJSONLParser in IngestionService based on format
- Store dataset_type (classification/detection) on dataset INSERT
- Thread format through ImportRequest -> ingest_splits_with_progress -> ingest_with_progress
- Add dataset_type to GET /datasets and GET /datasets/{id} responses
- Add PATCH /annotations/{id}/category endpoint for classification label editing
- Make statistics gt_annotations classification-aware (distinct labeled images)
- Show .jsonl files in browse endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
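
A rough sketch of what distinguishing the two layouts might look like follows. The commit only names them "D: split dirs" and "E: flat"; the interpretation here (D means split-named subdirectories that each contain a `.jsonl` file, E means `.jsonl` files directly in the root), the split names, and the function name `detect_classification_layout` are all assumptions and not taken from FolderScanner.

```python
from pathlib import Path

# Hypothetical set of directory names treated as dataset splits.
SPLIT_NAMES = ("train", "val", "valid", "test")


def detect_classification_layout(root):
    """Return "D" for split directories, "E" for a flat folder, else None."""
    root = Path(root)
    # Layout D: at least one split-named subdirectory holding a .jsonl file.
    for child in sorted(root.iterdir()):
        if child.is_dir() and child.name in SPLIT_NAMES and any(child.glob("*.jsonl")):
            return "D"
    # Layout E: .jsonl files sitting directly in the root folder.
    if any(root.glob("*.jsonl")):
        return "E"
    return None
```

Checking the split layout first means a folder containing both kinds of files is treated as multi-split, which is a design choice of this sketch rather than documented behavior.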