Open
38 commits
cbae356
chore: archive v1.1 milestone
ortizeg Feb 14, 2026
928cfb2
feat: add class filter checkboxes to statistics overview tab
ortizeg Feb 16, 2026
a468c62
feat: apply class filter to evaluation tab with cached recomputation
ortizeg Feb 16, 2026
aec9d6b
docs: start milestone v1.2 Classification Dataset Support
ortizeg Feb 18, 2026
7f9043d
docs: complete project research
ortizeg Feb 18, 2026
927afab
docs: define milestone v1.2 requirements
ortizeg Feb 18, 2026
0b196c1
docs: create milestone v1.2 roadmap (3 phases)
ortizeg Feb 19, 2026
8754346
docs(15): research phase domain
ortizeg Feb 19, 2026
6868104
docs(15): create phase plan
ortizeg Feb 19, 2026
9286e8e
fix(15): revise plans based on checker feedback
ortizeg Feb 19, 2026
5264e51
feat(15-01): add ClassificationJSONLParser, dataset_type schema migra…
ortizeg Feb 19, 2026
8af8a11
feat(15-01): add scanner detection, parser dispatch, dataset_type API…
ortizeg Feb 19, 2026
0f93248
docs(15-01): complete classification ingestion backend plan
ortizeg Feb 19, 2026
b96ce5e
feat(15-02): add dataset_type threading, grid class badges, and scan …
ortizeg Feb 19, 2026
e7ad776
feat(15-02): classification-aware modal, annotation list, and statistics
ortizeg Feb 19, 2026
bfac740
docs(15-02): complete classification frontend display plan
ortizeg Feb 19, 2026
f522fda
docs(phase-15): complete phase execution
ortizeg Feb 19, 2026
e148018
docs(16): research phase domain
ortizeg Feb 19, 2026
d6eef42
docs(16): create phase plan
ortizeg Feb 19, 2026
76bda2a
feat(16-01): add classification prediction parser and import endpoint
ortizeg Feb 19, 2026
9f38741
feat(16-01): add classification evaluation, confusion cell, and error…
ortizeg Feb 19, 2026
5d6c2ee
docs(16-01): complete classification evaluation backend plan
ortizeg Feb 19, 2026
7dca67e
feat(16-02): add classification types, prediction import format, and …
ortizeg Feb 19, 2026
5d1433e
feat(16-02): classification evaluation panel and error analysis panel
ortizeg Feb 19, 2026
1bce172
docs(16-02): complete classification evaluation frontend plan
ortizeg Feb 19, 2026
3bbeed2
fix(16-02): use misclassified key for classification error samples grid
ortizeg Feb 19, 2026
79ff323
docs(phase-16): complete phase execution
ortizeg Feb 19, 2026
a5f9460
docs(classification-polish): research phase domain
ortizeg Feb 19, 2026
2893b39
docs(17): create phase plan
ortizeg Feb 19, 2026
10a3230
feat(17-01): add threshold filtering and overflow scroll to confusion…
ortizeg Feb 19, 2026
4ff366a
feat(17-02): enrich coordinates endpoint with GT/pred labels
ortizeg Feb 19, 2026
660d287
feat(17-01): add most-confused pairs and F1 bars to classification eval
ortizeg Feb 19, 2026
1f4c858
feat(17-02): add color mode dropdown and categorical coloring to embe…
ortizeg Feb 19, 2026
af2607e
docs(17-01): complete confusion matrix polish plan
ortizeg Feb 19, 2026
fc0e2d1
docs(17-02): complete embedding color modes plan
ortizeg Feb 19, 2026
67a7a9c
docs(phase-17): complete phase execution — milestone v1.2 complete
ortizeg Feb 19, 2026
09da917
chore: archive v1.2 Classification Dataset Support milestone
ortizeg Feb 19, 2026
bc50e0a
feat: add delete predictions endpoint and persist stats selection state
ortizeg Feb 25, 2026
57 changes: 57 additions & 0 deletions .planning/MILESTONES.md
@@ -1,5 +1,33 @@
# Project Milestones: DataVisor

## v1.1 Deployment, Workflow & Competitive Parity (Shipped: 2026-02-13)

**Delivered:** Production-ready Docker deployment, smart dataset ingestion UI, annotation editing, error triage workflows, interactive visualizations with grid filtering, keyboard shortcuts, and per-annotation TP/FP/FN classification.

**Phases completed:** 8-14 (20 plans total)

**Key accomplishments:**

- Production-ready Docker stack (Caddy + FastAPI + Next.js) with single-user auth, GCP deployment scripts, and comprehensive documentation
- Smart dataset ingestion wizard with auto-detection of COCO layouts (Roboflow/Standard/Flat) and multi-split support
- Annotation editing via react-konva canvas (move, resize, draw, delete bounding boxes) with DuckDB persistence
- Error triage workflow: per-sample tagging, per-annotation TP/FP/FN auto-classification via IoU matching, worst-images ranking, and highlight mode
- Interactive data discovery: clickable confusion matrix, near-duplicate detection, histogram filtering, and find-similar — all piping results to the grid
- Full keyboard navigation with 16 shortcuts across grid, modal, triage, and editing contexts

**Stats:**

- 171 files created/modified
- ~19,460 lines of code added (9,306 Python + 10,154 TypeScript)
- 7 phases, 20 plans, 97 commits
- 2 days (Feb 12-13, 2026)

**Git range:** `a83d6cf` → `1bed6cf`

**What's next:** Format expansion (YOLO/VOC), PR curves, per-class AP metrics

---

## v1.0 MVP (Shipped: 2026-02-12)

**Delivered:** A unified CV dataset introspection tool with visual browsing, annotation overlays, model comparison, embedding visualization, error analysis, and AI-powered pattern detection.
@@ -28,3 +56,32 @@
**What's next:** Interactive model evaluation dashboard (PR curves, confusion matrix, per-class AP metrics)

---

## v1.2 Classification Dataset Support (Shipped: 2026-02-19)

**Delivered:** First-class single-label classification dataset support with full feature parity to detection workflows — from JSONL ingestion through evaluation metrics to production-ready polish for high-cardinality datasets.

**Phases completed:** 15-17 (6 plans total)

**Key accomplishments:**

- Classification JSONL parser with auto-detection of dataset type, multi-split ingestion, and sentinel bbox pattern for unified schema
- Grid browsing with class label badges and detail modal with dropdown class editor (PATCH mutation)
- Classification evaluation: accuracy, macro/weighted F1, per-class precision/recall/F1, and clickable confusion matrix
- Error analysis categorizing each image as correct, misclassified, or missing prediction
- Confusion matrix polish with threshold filtering and overflow scroll for 43+ classes, most-confused pairs summary
- Embedding scatter color modes: GT class, predicted class, and correct/incorrect with Tableau 20 categorical palette
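
The evaluation metrics and error categories listed above can be sketched as follows. This is a minimal illustration, not DataVisor's actual service code: the function name `evaluate_classification`, the dict-based inputs, and the choice to count a missing prediction as a false negative (and to keep it in the accuracy denominator) are all assumptions made for the example.

```python
from collections import Counter


def evaluate_classification(gt, pred):
    """Hypothetical sketch of single-label classification evaluation.

    gt:   dict mapping image id -> ground-truth label
    pred: dict mapping image id -> predicted label (ids may be absent)
    """
    tp, fp, fn, support = Counter(), Counter(), Counter(), Counter()
    categories = {}

    for image_id, true_label in gt.items():
        support[true_label] += 1
        predicted = pred.get(image_id)
        if predicted is None:
            # Assumption: a missing prediction counts against recall.
            categories[image_id] = "missing_prediction"
            fn[true_label] += 1
        elif predicted == true_label:
            categories[image_id] = "correct"
            tp[true_label] += 1
        else:
            categories[image_id] = "misclassified"
            fn[true_label] += 1
            fp[predicted] += 1

    per_class = {}
    for label in sorted(set(gt.values()) | set(pred.values())):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        per_class[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support[label],
        }

    total = len(gt)
    n_classes = len(per_class)
    return {
        # Assumption: accuracy is computed over all ground-truth images.
        "accuracy": sum(tp.values()) / total if total else 0.0,
        # Macro F1: unweighted mean of per-class F1.
        "macro_f1": (
            sum(m["f1"] for m in per_class.values()) / n_classes
            if n_classes else 0.0
        ),
        # Weighted F1: per-class F1 weighted by ground-truth support.
        "weighted_f1": (
            sum(m["f1"] * m["support"] for m in per_class.values()) / total
            if total else 0.0
        ),
        "per_class": per_class,
        "categories": categories,
    }
```

For example, `evaluate_classification({"a": "cat", "b": "dog"}, {"a": "cat"})` labels image `a` as `correct` and image `b` as `missing_prediction`.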

**Stats:**

- 61 files created/modified
- ~6,052 lines of code added
- 3 phases, 6 plans, 27 commits
- 1 day (Feb 18, 2026)

**Git range:** `5264e51` → `67a7a9c`

**What's next:** TBD — next milestone planning

---

124 changes: 55 additions & 69 deletions .planning/PROJECT.md
@@ -2,103 +2,89 @@

## What This Is

DataVisor is an open-source dataset introspection tool for computer vision — an alternative to Voxel51. It combines a high-performance visual browser with VLM-powered agentic workflows to automatically discover dataset blind spots (poor lighting, rare occlusions, label errors). Built as a personal tool for exploring 100K+ image datasets with COCO format annotations.
DataVisor is an open-source dataset introspection tool for computer vision — an alternative to Voxel51. It combines a high-performance visual browser with VLM-powered agentic workflows to automatically discover dataset blind spots (poor lighting, rare occlusions, label errors). Built as a personal tool for exploring 100K+ image datasets with COCO detection or JSONL classification annotations.

## Core Value

A single tool that replaces scattered one-off scripts: load any CV dataset, visually browse with annotation overlays, compare ground truth against predictions, cluster via embeddings, and surface mistakes — all in one workflow.

## Current State

**Shipped:** v1.2 (2026-02-19)
**Codebase:** ~38K LOC (16,256+ Python + 15,924+ TypeScript) across 17 phases
**Architecture:** FastAPI + DuckDB + Qdrant (backend), Next.js + Tailwind + deck.gl + Recharts (frontend), Pydantic AI (agents), Moondream2 (VLM)

## Requirements

### Validated

- ✓ Multi-format ingestion (COCO) with streaming parser architecture — v1.0
- ✓ DuckDB-backed metadata storage for fast analytical queries over 100K+ samples — v1.0
- ✓ Virtualized infinite-scroll grid view with overlaid bounding box annotations — v1.0
- ✓ Ground Truth vs Model Predictions comparison toggle (solid vs dashed lines) — v1.0
- ✓ Deterministic class-to-color hashing (same class = same color across sessions) — v1.0
- ✓ t-SNE embedding generation from images (DINOv2-base) — v1.0
- ✓ deck.gl-powered 2D embedding scatterplot with zoom, pan, and lasso selection — v1.0
- ✓ Lasso-to-grid filtering (select cluster points → filter grid to those images) — v1.0
- ✓ Hover thumbnails on embedding map points — v1.0
- ✓ Qdrant vector storage for embedding similarity search — v1.0
- ✓ Error categorization: Hard False Positives, Label Errors, False Negatives — v1.0
- ✓ Pydantic AI agent that monitors error distribution and recommends actions — v1.0
- ✓ Pattern detection (e.g., "90% of False Negatives occur in low-light images") — v1.0
- ✓ Import pre-computed predictions (JSON) — v1.0
- ✓ BasePlugin class for Python extensibility — v1.0
- ✓ Local disk and GCS image source support — v1.0
- ✓ Dynamic metadata filtering (sidebar filters on any metadata field) — v1.0
- ✓ VLM auto-tagging (Moondream2) for scene attribute tags — v1.0
- ✓ Search by filename and sort by metadata — v1.0
- ✓ Save and load filter configurations (saved views) — v1.0
- ✓ Add/remove tags (individual + bulk) — v1.0
- ✓ Sample detail modal with full-resolution image — v1.0
- ✓ Dataset statistics dashboard (class distribution, annotation counts) — v1.0
- Streaming COCO ingestion with ijson at 100K+ scale, local + GCS sources — v1.0
- DuckDB metadata storage with fast analytical queries — v1.0
- Virtualized grid with SVG annotation overlays, deterministic color hashing — v1.0
- GT vs Predictions comparison toggle — v1.0
- t-SNE embeddings with deck.gl scatter plot, lasso-to-grid filtering — v1.0
- Error categorization (TP/FP/FN/Label Error) + Qdrant similarity search — v1.0
- Pydantic AI agent for error patterns + Moondream2 VLM auto-tagging — v1.0
- Metadata filtering, search, saved views, bulk tagging — v1.0
- Docker 3-service stack with Caddy auth, GCP deployment scripts — v1.1
- Smart ingestion UI with auto-detection of COCO layouts and multi-split support — v1.1
- Annotation editing via react-konva (move, resize, draw, delete) — v1.1
- Error triage: sample tagging, per-annotation TP/FP/FN via IoU, worst-images ranking, highlight mode — v1.1
- Interactive discovery: confusion matrix, near-duplicates, histogram filtering, find-similar — v1.1
- Keyboard shortcuts: 16 shortcuts across grid, modal, triage, editing — v1.1
- Auto-detect dataset type (detection vs classification) from annotation format — v1.2
- JSONL classification ingestion with multi-split support — v1.2
- Grid browsing with class label badges for classification datasets — v1.2
- Classification prediction import and GT vs predicted comparison — v1.2
- Classification stats: accuracy, F1, per-class precision/recall, confusion matrix — v1.2
- Embedding color modes (GT class, predicted class, correct/incorrect) — v1.2
- Confusion matrix scaling to 43+ classes with threshold filtering — v1.2
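
One way to get the deterministic color hashing mentioned in the list above is to derive a hue from a stable digest of the class name. The sketch below is an assumption about how such a mapping could work, not the project's implementation: the name `class_color`, the SHA-256 digest, and the fixed lightness and saturation values are all illustrative. A stable digest is used because Python's built-in `hash()` is salted per process for strings and would not give the same color across sessions.

```python
import colorsys
import hashlib


def class_color(class_name: str) -> str:
    """Map a class name to a hex color that is stable across runs.

    Hypothetical helper: the digest choice and the fixed
    lightness/saturation values are illustrative assumptions.
    """
    digest = hashlib.sha256(class_name.encode("utf-8")).digest()
    # Use the first two bytes of the digest as a hue in [0, 1).
    hue = int.from_bytes(digest[:2], "big") / 65536.0
    red, green, blue = colorsys.hls_to_rgb(hue, 0.5, 0.65)
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )
```

Calling `class_color("person")` twice, in the same or in different processes, returns the same hex string, so no manual palette has to be stored.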

### Active

- [ ] Dockerized deployment with single-user auth for secure cloud VM access
- [ ] GCP deployment script + local run script with setup instructions
- [ ] Smart dataset ingestion UI (point at folder → auto-detect train/val/test splits → import)
- [ ] Annotation editing in the UI (move, resize, delete bounding boxes — depth TBD)
- [ ] Error triage workflow (tag FP/TP/FN/mistake, highlight errors, dim non-errors)
- [ ] Smart "worst images" ranking (combined score: errors + confidence + uniqueness)
- [ ] Keyboard shortcuts for navigation
- [ ] Competitive feature parity with FiftyOne/Encord (gaps TBD after research)
(None — planning next milestone)

### Out of Scope

- Multi-user collaboration — personal tool, single-user auth only for VM security
- Video annotation support — image-only for now
- Training pipeline integration — DataVisor inspects data, doesn't train models
- Multi-user collaboration — personal tool, single-user auth only
- Video annotation support — image-only
- Training pipeline integration — DataVisor inspects data, doesn't train
- Mobile/tablet interface — desktop browser only
- Real-time streaming inference — batch-oriented analysis
- Full annotation editor (draw new boxes, complex labeling workflows) — quick corrections only, not CVAT replacement

## Current Milestone: v1.1 Deployment, Workflow & Competitive Parity

**Goal:** Make DataVisor deployable (Docker + GCP), secure for cloud access, and close key workflow gaps vs FiftyOne/Encord — smart ingestion, error triage, annotation corrections, and keyboard-driven navigation.

**Target features:**
- Dockerized project with single-user auth (basic auth for cloud VM security)
- GCP deployment script + local run script
- Smart dataset ingestion UI (auto-detect folder structure, train/val/test splits)
- Annotation management (organize + quick edit: move/resize/delete bboxes)
- Error triage & data curation workflow (tag, highlight, rank worst images)
- Keyboard shortcuts for navigation
- Competitive gaps from FiftyOne/Encord analysis

## Context

Shipped v1.0 with 12,720 LOC (6,950 Python + 5,770 TypeScript) across 7 phases and 21 plans.
Tech stack: FastAPI + DuckDB + Qdrant (backend), Next.js + Tailwind + deck.gl + Recharts (frontend), Pydantic AI (agents), Moondream2 (VLM).
59 backend tests passing. TypeScript compiles with 0 errors.
Architecture: 3 Zustand stores, FastAPI DI, source discriminator for GT/prediction separation, 4 SSE progress streams, lazy model loading.
- Full annotation editor (polygons, segmentation) — bounding box only
- Multi-label classification — single-label per image only for now

## Constraints

- **Tech Stack**: FastAPI + DuckDB + Qdrant (backend), Next.js + Tailwind + deck.gl (frontend), Pydantic AI (agents) — established
- **Performance**: Must handle 100K+ images without UI lag; DuckDB for metadata queries, deck.gl for WebGL rendering, virtualized scrolling
- **Storage**: Supports both local filesystem and GCS bucket sources
- **GPU**: VLM inference (Moondream2) supports MPS/CUDA/CPU auto-detection; DINOv2 embeddings likewise
- **GPU**: VLM inference (Moondream2) supports MPS/CUDA/CPU auto-detection; SigLIP embeddings likewise
- **Extensibility**: BasePlugin architecture exists; hooks system ready for expansion
- **Python**: 3.14+ (numba/umap-learn incompatible; using scikit-learn t-SNE)

## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| DuckDB over SQLite | Analytical queries on metadata at scale; columnar storage for filtering 100K+ rows | ✓ Good |
| Qdrant over FAISS | Payload filtering support; Rust-based performance; local deployment | ✓ Good |
| deck.gl for embedding viz | WebGL-powered; handles millions of points; lasso/interaction built-in | ✓ Good |
| Pydantic AI for agents | Type-safe agent definitions; native FastAPI/Pydantic integration | ✓ Good |
| Deterministic color hashing | Class names hash to consistent colors across sessions; no manual palette | ✓ Good |
| Plugin hooks over monolith | Ingestion/UI/transformation hooks enable domain-specific extensions without forking | ✓ Good |
| Source discriminator column | Clean GT/prediction separation in annotations table via source field | ✓ Good |
| Lazy model loading | VLM and Qdrant loaded on-demand, not at startup, to avoid memory pressure | ✓ Good |
| t-SNE over UMAP | umap-learn blocked by Python 3.14 numba incompatibility; t-SNE via scikit-learn | ⚠️ Revisit when numba supports 3.14 |
| Moondream2 via transformers | trust_remote_code with all_tied_weights_keys patch for transformers 5.x compat | ✓ Good (fragile — monitor updates) |
| DuckDB over SQLite | Analytical queries on metadata at scale; columnar storage for filtering 100K+ rows | Good |
| Qdrant over FAISS | Payload filtering support; Rust-based performance; local deployment | Good |
| deck.gl for embedding viz | WebGL-powered; handles millions of points; lasso/interaction built-in | Good |
| Pydantic AI for agents | Type-safe agent definitions; native FastAPI/Pydantic integration | Good |
| Deterministic color hashing | Class names hash to consistent colors across sessions; no manual palette | Good |
| Source discriminator column | Clean GT/prediction separation in annotations table via source field | Good |
| Caddy over nginx | Auto-HTTPS, built-in basic_auth, simpler config | Good |
| react-konva for editing | Canvas-based editing in modal; SVG stays for grid overlays | Good |
| Gemini 2.0 Flash for agent | Fast, cheap, good structured output; replaced GPT-4o | Good |
| Pre-computed agent prompt | All data in prompt, no tool calls; avoids Pydantic AI request_limit issues | Good |
| t-SNE over UMAP | umap-learn blocked by Python 3.14 numba incompatibility | Revisit when numba supports 3.14 |
| Moondream2 via transformers | trust_remote_code with all_tied_weights_keys patch for transformers 5.x | Fragile — monitor updates |
| Sentinel bbox values (0.0) for classification | Avoids 30+ null guards; unified schema for detection and classification | Good |
| Separate classification evaluation service | ~50-line function vs modifying 560-line detection eval; clean separation | Good |
| Dataset-type routing at endpoint level | Keep classification/detection services separate; route in router layer | Good |
| Parser registry in IngestionService | Format-based dispatch to COCOParser or ClassificationJSONLParser | Good |
| Threshold slider for confusion matrix | Hide noisy off-diagonal cells at high cardinality (0-50%, default 1%) | Good |
| Client-side most-confused pairs | Derived from confusion matrix data; no new API endpoint needed | Good |
| Tableau 20 palette for embeddings | Stable categorical coloring for class-based scatter modes | Good |
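
The threshold slider and the most-confused pairs from the table above can both be derived from the confusion matrix alone. The sketch below is illustrative only: the function names are hypothetical, the matrix is assumed to be a list of rows indexed by ground-truth class, and the threshold is assumed to apply to each cell's share of its row. It is written in Python for consistency with the other examples here, although the table describes the pairs as derived client-side.

```python
def filter_confusion_matrix(matrix, threshold=0.01):
    """Zero out off-diagonal cells below a row-normalized threshold.

    Assumption: rows are ground-truth classes, columns are predictions,
    and `threshold` is a fraction of the row total (0.01 means 1%).
    """
    filtered = []
    for i, row in enumerate(matrix):
        row_total = sum(row)
        new_row = []
        for j, count in enumerate(row):
            share = count / row_total if row_total else 0.0
            # Diagonal cells (correct predictions) are always kept.
            keep = i == j or share >= threshold
            new_row.append(count if keep else 0)
        filtered.append(new_row)
    return filtered


def most_confused_pairs(matrix, labels, top_k=5):
    """Return the top_k (gt_label, pred_label, count) off-diagonal cells."""
    pairs = [
        (labels[i], labels[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count > 0
    ]
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs[:top_k]
```

Because both functions read only the matrix and its labels, no additional API endpoint is needed to compute them, which matches the rationale given in the table.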

---
*Last updated: 2026-02-12 after v1.1 scope redefinition*
*Last updated: 2026-02-19 after v1.2 milestone*