diff --git a/.planning/MILESTONES.md b/.planning/MILESTONES.md
index 23c5086..e3db4ca 100644
--- a/.planning/MILESTONES.md
+++ b/.planning/MILESTONES.md
@@ -1,5 +1,33 @@
# Project Milestones: DataVisor
+## v1.1 Deployment, Workflow & Competitive Parity (Shipped: 2026-02-13)
+
+**Delivered:** Production-ready Docker deployment, smart dataset ingestion UI, annotation editing, error triage workflows, interactive visualizations with grid filtering, keyboard shortcuts, and per-annotation TP/FP/FN classification.
+
+**Phases completed:** 8-14 (20 plans total)
+
+**Key accomplishments:**
+
+- Production-ready Docker stack (Caddy + FastAPI + Next.js) with single-user auth, GCP deployment scripts, and comprehensive documentation
+- Smart dataset ingestion wizard with auto-detection of COCO layouts (Roboflow/Standard/Flat) and multi-split support
+- Annotation editing via react-konva canvas (move, resize, draw, delete bounding boxes) with DuckDB persistence
+- Error triage workflow: per-sample tagging, per-annotation TP/FP/FN auto-classification via IoU matching, worst-images ranking, and highlight mode
+- Interactive data discovery: clickable confusion matrix, near-duplicate detection, histogram filtering, and find-similar — all piping results to the grid
+- Full keyboard navigation with 16 shortcuts across grid, modal, triage, and editing contexts
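+
+The per-annotation TP/FP/FN auto-classification via IoU matching can be sketched as follows. This is a minimal illustration of the idea, assuming `[x, y, w, h]` boxes and greedy matching above a fixed threshold; the function names are illustrative, not DataVisor's actual API:
+
+```python
+def iou(a, b):
+    """IoU of two axis-aligned [x, y, w, h] boxes."""
+    ax2, ay2 = a[0] + a[2], a[1] + a[3]
+    bx2, by2 = b[0] + b[2], b[1] + b[3]
+    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
+    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
+    inter = iw * ih
+    union = a[2] * a[3] + b[2] * b[3] - inter
+    return inter / union if union > 0 else 0.0
+
+def classify(gt_boxes, pred_boxes, threshold=0.5):
+    """Greedy matching: each prediction claims the best unmatched GT box.
+    Matched predictions are TP, unmatched predictions FP, leftover GT FN.
+    (A real system would visit predictions in descending confidence order.)"""
+    matched_gt = set()
+    labels = {}
+    for pi, pred in enumerate(pred_boxes):
+        best_gi, best_iou = None, threshold
+        for gi, gt in enumerate(gt_boxes):
+            if gi in matched_gt:
+                continue
+            score = iou(pred, gt)
+            if score >= best_iou:
+                best_gi, best_iou = gi, score
+        if best_gi is not None:
+            matched_gt.add(best_gi)
+            labels[("pred", pi)] = "TP"
+        else:
+            labels[("pred", pi)] = "FP"
+    for gi in range(len(gt_boxes)):
+        if gi not in matched_gt:
+            labels[("gt", gi)] = "FN"
+    return labels
+```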
+
+**Stats:**
+
+- 171 files created/modified
+- ~19,460 lines of code added (9,306 Python + 10,154 TypeScript)
+- 7 phases, 20 plans, 97 commits
+- 2 days (Feb 12-13, 2026)
+
+**Git range:** `a83d6cf` → `1bed6cf`
+
+**What's next:** Format expansion (YOLO/VOC), PR curves, per-class AP metrics
+
+---
+
## v1.0 MVP (Shipped: 2026-02-12)
**Delivered:** A unified CV dataset introspection tool with visual browsing, annotation overlays, model comparison, embedding visualization, error analysis, and AI-powered pattern detection.
@@ -28,3 +56,32 @@
**What's next:** Interactive model evaluation dashboard (PR curves, confusion matrix, per-class AP metrics)
---
+
+## v1.2 Classification Dataset Support (Shipped: 2026-02-19)
+
+**Delivered:** First-class single-label classification dataset support with full feature parity with detection workflows — from JSONL ingestion through evaluation metrics to production-ready polish for high-cardinality datasets.
+
+**Phases completed:** 15-17 (6 plans total)
+
+**Key accomplishments:**
+
+- Classification JSONL parser with auto-detection of dataset type, multi-split ingestion, and sentinel bbox pattern for unified schema
+- Grid browsing with class label badges and detail modal with dropdown class editor (PATCH mutation)
+- Classification evaluation: accuracy, macro/weighted F1, per-class precision/recall/F1, and clickable confusion matrix
+- Error analysis categorizing each image as correct, misclassified, or missing prediction
+- Confusion matrix polish: threshold filtering and overflow scroll for 43+ classes, plus a most-confused-pairs summary
+- Embedding scatter color modes: GT class, predicted class, and correct/incorrect with Tableau 20 categorical palette
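+
+The most-confused-pairs summary and threshold filtering are both derived purely from confusion matrix counts, with no extra API call. A minimal sketch of that derivation (the shipped feature is client-side TypeScript; this Python version with illustrative names shows the same logic):
+
+```python
+def most_confused_pairs(matrix, classes, top_k=3, min_frac=0.01):
+    """Rank off-diagonal confusion matrix cells (rows = GT, cols = predicted)
+    by count, hiding cells below a fraction-of-row threshold."""
+    pairs = []
+    for i, row in enumerate(matrix):
+        row_total = sum(row)
+        for j, count in enumerate(row):
+            if i == j or row_total == 0:
+                continue  # skip correct predictions and empty rows
+            if count / row_total < min_frac:
+                continue  # threshold slider: suppress noisy off-diagonal cells
+            pairs.append((classes[i], classes[j], count))
+    pairs.sort(key=lambda p: p[2], reverse=True)
+    return pairs[:top_k]
+```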
+
+**Stats:**
+
+- 61 files created/modified
+- ~6,052 lines of code added
+- 3 phases, 6 plans, 27 commits
+- 1 day (Feb 18, 2026)
+
+**Git range:** `5264e51` → `67a7a9c`
+
+**What's next:** TBD — next milestone planning
+
+---
+
diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md
index 61ab957..e62013d 100644
--- a/.planning/PROJECT.md
+++ b/.planning/PROJECT.md
@@ -2,86 +2,63 @@
## What This Is
-DataVisor is an open-source dataset introspection tool for computer vision — an alternative to Voxel51. It combines a high-performance visual browser with VLM-powered agentic workflows to automatically discover dataset blind spots (poor lighting, rare occlusions, label errors). Built as a personal tool for exploring 100K+ image datasets with COCO format annotations.
+DataVisor is an open-source dataset introspection tool for computer vision — an alternative to Voxel51. It combines a high-performance visual browser with VLM-powered agentic workflows to automatically discover dataset blind spots (poor lighting, rare occlusions, label errors). Built as a personal tool for exploring 100K+ image datasets with COCO detection or JSONL classification annotations.
## Core Value
A single tool that replaces scattered one-off scripts: load any CV dataset, visually browse with annotation overlays, compare ground truth against predictions, cluster via embeddings, and surface mistakes — all in one workflow.
+## Current State
+
+**Shipped:** v1.2 (2026-02-19)
+**Codebase:** ~38K LOC across 17 phases (16,256+ Python, 15,924+ TypeScript; v1.2's ~6K not split by language)
+**Architecture:** FastAPI + DuckDB + Qdrant (backend), Next.js + Tailwind + deck.gl + Recharts (frontend), Pydantic AI (agents), Moondream2 (VLM)
+
## Requirements
### Validated
-- ✓ Multi-format ingestion (COCO) with streaming parser architecture — v1.0
-- ✓ DuckDB-backed metadata storage for fast analytical queries over 100K+ samples — v1.0
-- ✓ Virtualized infinite-scroll grid view with overlaid bounding box annotations — v1.0
-- ✓ Ground Truth vs Model Predictions comparison toggle (solid vs dashed lines) — v1.0
-- ✓ Deterministic class-to-color hashing (same class = same color across sessions) — v1.0
-- ✓ t-SNE embedding generation from images (DINOv2-base) — v1.0
-- ✓ deck.gl-powered 2D embedding scatterplot with zoom, pan, and lasso selection — v1.0
-- ✓ Lasso-to-grid filtering (select cluster points → filter grid to those images) — v1.0
-- ✓ Hover thumbnails on embedding map points — v1.0
-- ✓ Qdrant vector storage for embedding similarity search — v1.0
-- ✓ Error categorization: Hard False Positives, Label Errors, False Negatives — v1.0
-- ✓ Pydantic AI agent that monitors error distribution and recommends actions — v1.0
-- ✓ Pattern detection (e.g., "90% of False Negatives occur in low-light images") — v1.0
-- ✓ Import pre-computed predictions (JSON) — v1.0
-- ✓ BasePlugin class for Python extensibility — v1.0
-- ✓ Local disk and GCS image source support — v1.0
-- ✓ Dynamic metadata filtering (sidebar filters on any metadata field) — v1.0
-- ✓ VLM auto-tagging (Moondream2) for scene attribute tags — v1.0
-- ✓ Search by filename and sort by metadata — v1.0
-- ✓ Save and load filter configurations (saved views) — v1.0
-- ✓ Add/remove tags (individual + bulk) — v1.0
-- ✓ Sample detail modal with full-resolution image — v1.0
-- ✓ Dataset statistics dashboard (class distribution, annotation counts) — v1.0
+- Streaming COCO ingestion with ijson at 100K+ scale, local + GCS sources — v1.0
+- DuckDB metadata storage with fast analytical queries — v1.0
+- Virtualized grid with SVG annotation overlays, deterministic color hashing — v1.0
+- GT vs Predictions comparison toggle — v1.0
+- t-SNE embeddings with deck.gl scatter plot, lasso-to-grid filtering — v1.0
+- Error categorization (TP/FP/FN/Label Error) + Qdrant similarity search — v1.0
+- Pydantic AI agent for error patterns + Moondream2 VLM auto-tagging — v1.0
+- Metadata filtering, search, saved views, bulk tagging — v1.0
+- Docker 3-service stack with Caddy auth, GCP deployment scripts — v1.1
+- Smart ingestion UI with auto-detection of COCO layouts and multi-split support — v1.1
+- Annotation editing via react-konva (move, resize, draw, delete) — v1.1
+- Error triage: sample tagging, per-annotation TP/FP/FN via IoU, worst-images ranking, highlight mode — v1.1
+- Interactive discovery: confusion matrix, near-duplicates, histogram filtering, find-similar — v1.1
+- Keyboard shortcuts: 16 shortcuts across grid, modal, triage, editing — v1.1
+- Auto-detect dataset type (detection vs classification) from annotation format — v1.2
+- JSONL classification ingestion with multi-split support — v1.2
+- Grid browsing with class label badges for classification datasets — v1.2
+- Classification prediction import and GT vs predicted comparison — v1.2
+- Classification stats: accuracy, F1, per-class precision/recall, confusion matrix — v1.2
+- Embedding color modes (GT class, predicted class, correct/incorrect) — v1.2
+- Confusion matrix scaling to 43+ classes with threshold filtering — v1.2
### Active
-- [ ] Dockerized deployment with single-user auth for secure cloud VM access
-- [ ] GCP deployment script + local run script with setup instructions
-- [ ] Smart dataset ingestion UI (point at folder → auto-detect train/val/test splits → import)
-- [ ] Annotation editing in the UI (move, resize, delete bounding boxes — depth TBD)
-- [ ] Error triage workflow (tag FP/TP/FN/mistake, highlight errors, dim non-errors)
-- [ ] Smart "worst images" ranking (combined score: errors + confidence + uniqueness)
-- [ ] Keyboard shortcuts for navigation
-- [ ] Competitive feature parity with FiftyOne/Encord (gaps TBD after research)
+(None — planning next milestone)
### Out of Scope
-- Multi-user collaboration — personal tool, single-user auth only for VM security
-- Video annotation support — image-only for now
-- Training pipeline integration — DataVisor inspects data, doesn't train models
+- Multi-user collaboration — personal tool, single-user auth only
+- Video annotation support — image-only
+- Training pipeline integration — DataVisor inspects data, doesn't train
- Mobile/tablet interface — desktop browser only
-- Real-time streaming inference — batch-oriented analysis
-- Full annotation editor (draw new boxes, complex labeling workflows) — quick corrections only, not CVAT replacement
-
-## Current Milestone: v1.1 Deployment, Workflow & Competitive Parity
-
-**Goal:** Make DataVisor deployable (Docker + GCP), secure for cloud access, and close key workflow gaps vs FiftyOne/Encord — smart ingestion, error triage, annotation corrections, and keyboard-driven navigation.
-
-**Target features:**
-- Dockerized project with single-user auth (basic auth for cloud VM security)
-- GCP deployment script + local run script
-- Smart dataset ingestion UI (auto-detect folder structure, train/val/test splits)
-- Annotation management (organize + quick edit: move/resize/delete bboxes)
-- Error triage & data curation workflow (tag, highlight, rank worst images)
-- Keyboard shortcuts for navigation
-- Competitive gaps from FiftyOne/Encord analysis
-
-## Context
-
-Shipped v1.0 with 12,720 LOC (6,950 Python + 5,770 TypeScript) across 7 phases and 21 plans.
-Tech stack: FastAPI + DuckDB + Qdrant (backend), Next.js + Tailwind + deck.gl + Recharts (frontend), Pydantic AI (agents), Moondream2 (VLM).
-59 backend tests passing. TypeScript compiles with 0 errors.
-Architecture: 3 Zustand stores, FastAPI DI, source discriminator for GT/prediction separation, 4 SSE progress streams, lazy model loading.
+- Full annotation editor (polygons, segmentation) — bounding box only
+- Multi-label classification — single-label per image only for now
## Constraints
- **Tech Stack**: FastAPI + DuckDB + Qdrant (backend), Next.js + Tailwind + deck.gl (frontend), Pydantic AI (agents) — established
- **Performance**: Must handle 100K+ images without UI lag; DuckDB for metadata queries, deck.gl for WebGL rendering, virtualized scrolling
- **Storage**: Supports both local filesystem and GCS bucket sources
-- **GPU**: VLM inference (Moondream2) supports MPS/CUDA/CPU auto-detection; DINOv2 embeddings likewise
+- **GPU**: VLM inference (Moondream2) supports MPS/CUDA/CPU auto-detection; SigLIP embeddings likewise
- **Extensibility**: BasePlugin architecture exists; hooks system ready for expansion
- **Python**: 3.14+ (numba/umap-learn incompatible; using scikit-learn t-SNE)
@@ -89,16 +66,25 @@ Architecture: 3 Zustand stores, FastAPI DI, source discriminator for GT/predicti
| Decision | Rationale | Outcome |
|----------|-----------|---------|
-| DuckDB over SQLite | Analytical queries on metadata at scale; columnar storage for filtering 100K+ rows | ✓ Good |
-| Qdrant over FAISS | Payload filtering support; Rust-based performance; local deployment | ✓ Good |
-| deck.gl for embedding viz | WebGL-powered; handles millions of points; lasso/interaction built-in | ✓ Good |
-| Pydantic AI for agents | Type-safe agent definitions; native FastAPI/Pydantic integration | ✓ Good |
-| Deterministic color hashing | Class names hash to consistent colors across sessions; no manual palette | ✓ Good |
-| Plugin hooks over monolith | Ingestion/UI/transformation hooks enable domain-specific extensions without forking | ✓ Good |
-| Source discriminator column | Clean GT/prediction separation in annotations table via source field | ✓ Good |
-| Lazy model loading | VLM and Qdrant loaded on-demand, not at startup, to avoid memory pressure | ✓ Good |
-| t-SNE over UMAP | umap-learn blocked by Python 3.14 numba incompatibility; t-SNE via scikit-learn | ⚠️ Revisit when numba supports 3.14 |
-| Moondream2 via transformers | trust_remote_code with all_tied_weights_keys patch for transformers 5.x compat | ✓ Good (fragile — monitor updates) |
+| DuckDB over SQLite | Analytical queries on metadata at scale; columnar storage for filtering 100K+ rows | Good |
+| Qdrant over FAISS | Payload filtering support; Rust-based performance; local deployment | Good |
+| deck.gl for embedding viz | WebGL-powered; handles millions of points; lasso/interaction built-in | Good |
+| Pydantic AI for agents | Type-safe agent definitions; native FastAPI/Pydantic integration | Good |
+| Deterministic color hashing | Class names hash to consistent colors across sessions; no manual palette | Good |
+| Source discriminator column | Clean GT/prediction separation in annotations table via source field | Good |
+| Caddy over nginx | Auto-HTTPS, built-in basic_auth, simpler config | Good |
+| react-konva for editing | Canvas-based editing in modal; SVG stays for grid overlays | Good |
+| Gemini 2.0 Flash for agent | Fast, cheap, good structured output; replaced GPT-4o | Good |
+| Pre-computed agent prompt | All data in prompt, no tool calls; avoids Pydantic AI request_limit issues | Good |
+| t-SNE over UMAP | umap-learn blocked by Python 3.14 numba incompatibility | Revisit when numba supports 3.14 |
+| Moondream2 via transformers | trust_remote_code with all_tied_weights_keys patch for transformers 5.x | Fragile — monitor updates |
+| Sentinel bbox values (0.0) for classification | Avoids 30+ null guards; unified schema for detection and classification | Good |
+| Separate classification evaluation service | ~50-line function vs modifying 560-line detection eval; clean separation | Good |
+| Dataset-type routing at endpoint level | Keep classification/detection services separate; route in router layer | Good |
+| Parser registry in IngestionService | Format-based dispatch to COCOParser or ClassificationJSONLParser | Good |
+| Threshold slider for confusion matrix | Hide noisy off-diagonal cells at high cardinality (0-50%, default 1%) | Good |
+| Client-side most-confused pairs | Derived from confusion matrix data; no new API endpoint needed | Good |
+| Tableau 20 palette for embeddings | Stable categorical coloring for class-based scatter modes | Good |
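+
+The deterministic color hashing decision above maps each class name to a stable color with no stored palette, so the same class renders identically across sessions. A minimal sketch of the idea, assuming an MD5-based scheme; DataVisor's actual hash function may differ:
+
+```python
+import hashlib
+
+def class_color(name: str) -> str:
+    """Hash a class name and use the first 3 digest bytes as an RGB hex color.
+    Deterministic: the same name always yields the same color."""
+    digest = hashlib.md5(name.encode("utf-8")).digest()
+    return f"#{digest[0]:02x}{digest[1]:02x}{digest[2]:02x}"
+```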
---
-*Last updated: 2026-02-12 after v1.1 scope redefinition*
+*Last updated: 2026-02-19 after v1.2 milestone*
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 6073dd0..36db1ce 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -2,8 +2,9 @@
## Milestones
-- v1.0 MVP - Phases 1-7 (shipped 2026-02-12)
-- **v1.1 Deployment, Workflow & Competitive Parity** - Phases 8-14
+- v1.0 MVP - Phases 1-7 (shipped 2026-02-12) — [archive](.planning/milestones/v1.0-ROADMAP.md)
+- v1.1 Deployment, Workflow & Competitive Parity - Phases 8-14 (shipped 2026-02-13) — [archive](.planning/milestones/v1.1-ROADMAP.md)
+- v1.2 Classification Dataset Support - Phases 15-17 (shipped 2026-02-19) — [archive](.planning/milestones/v1.2-ROADMAP.md)
## Phases
@@ -40,143 +41,57 @@
-### v1.1 Deployment, Workflow & Competitive Parity
-
-**Milestone Goal:** Make DataVisor deployable (Docker + GCP), secure for cloud access, and close key workflow gaps vs FiftyOne/Encord -- smart ingestion, annotation editing, error triage, interactive visualizations, and keyboard-driven navigation.
-
-**Phase Numbering:**
-- Integer phases (8, 9, 10, ...): Planned milestone work
-- Decimal phases (9.1, 9.2): Urgent insertions (marked with INSERTED)
-
-Decimal phases appear between their surrounding integers in numeric order.
-
-- [x] **Phase 8: Docker Deployment & Auth** - Dockerized 3-service stack with Caddy reverse proxy, basic auth, and deployment scripts
-- [x] **Phase 9: Smart Ingestion** - No-code dataset import from folder path with auto-detection and confirmation
-- [x] **Phase 10: Annotation Editing** - Move, resize, delete, and draw bounding boxes via react-konva in sample detail modal
-- [x] **Phase 11: Error Triage** - Tag errors, highlight mode, and worst-images ranking with DuckDB persistence
-- [x] **Phase 12: Interactive Viz & Discovery** - Confusion matrix, near-duplicates, interactive histograms, and find-similar
-- [x] **Phase 13: Keyboard Shortcuts** - Keyboard navigation, triage hotkeys, edit shortcuts, and help overlay
-- [x] **Phase 14: Per-Annotation Triage** - Auto-discover TP/FP/FN per bounding box via IoU overlap, color-coded boxes in detail modal, click to override classifications
-
-## Phase Details
+
+### v1.1 Deployment, Workflow & Competitive Parity (Phases 8-14) - SHIPPED 2026-02-13
### Phase 8: Docker Deployment & Auth
-**Goal**: DataVisor runs as a deployable Docker stack with single-user auth, accessible securely on a cloud VM or locally with a single command
-**Depends on**: Phase 7 (v1.0 complete)
-**Requirements**: DEPLOY-01, DEPLOY-02, DEPLOY-03, DEPLOY-04, DEPLOY-05
-**Success Criteria** (what must be TRUE):
- 1. User can run `docker compose up` and access DataVisor at `http://localhost` with all features working (grid, embeddings, error analysis)
- 2. User is prompted for username/password before accessing any page or API endpoint, and unauthenticated requests are rejected
- 3. User can run a deployment script that provisions a GCP VM with persistent disk and starts DataVisor accessible at a public IP with HTTPS
- 4. User can follow deployment documentation to configure environment variables, deploy to GCP, and set up a custom domain
- 5. DuckDB data, Qdrant vectors, and thumbnail cache persist across container restarts without data loss
-**Plans**: 5 plans
-
-Plans:
-- [x] 08-01-PLAN.md -- Backend Dockerfile + config fixes (CORS, DuckDB CHECKPOINT)
-- [x] 08-02-PLAN.md -- Frontend Dockerfile + Caddyfile reverse proxy with auth
-- [x] 08-03-PLAN.md -- Docker Compose orchestration + .dockerignore + env config
-- [x] 08-04-PLAN.md -- Local run script + GCP deployment scripts
-- [x] 08-05-PLAN.md -- Deployment documentation + full stack verification
+**Goal**: Deployable Docker stack with single-user auth, accessible on cloud VM or locally
+**Plans**: 5 plans (complete)
### Phase 9: Smart Ingestion
-**Goal**: Users can import datasets from the UI by pointing at a folder, reviewing auto-detected structure, and confirming import -- no CLI or config files needed
-**Depends on**: Phase 8 (auth protects new endpoints)
-**Requirements**: INGEST-01, INGEST-02, INGEST-03, INGEST-04, INGEST-05
-**Success Criteria** (what must be TRUE):
- 1. User can enter a folder path in the UI and trigger a scan that returns detected dataset structure
- 2. Scanner correctly identifies COCO annotation files and image directories within the folder
- 3. Scanner detects train/val/test split subdirectories and presents them as separate importable splits
- 4. User sees the detected structure as a confirmation step and can approve or adjust before import begins
- 5. Import progress displays per-split status via real-time SSE updates until completion
-**Plans**: 2 plans
-
-Plans:
-- [x] 09-01-PLAN.md -- Backend FolderScanner service, scan/import API endpoints, split-aware ingestion pipeline
-- [x] 09-02-PLAN.md -- Frontend ingestion wizard (path input, scan results, import progress) + landing page link
+**Goal**: No-code dataset import from folder path with auto-detection and confirmation
+**Plans**: 2 plans (complete)
### Phase 10: Annotation Editing
-**Goal**: Users can make quick bounding box corrections directly in the sample detail modal without leaving DataVisor
-**Depends on**: Phase 8 (auth protects mutation endpoints)
-**Requirements**: ANNOT-01, ANNOT-02, ANNOT-03, ANNOT-04, ANNOT-05
-**Success Criteria** (what must be TRUE):
- 1. User can enter edit mode in the sample detail modal and drag a bounding box to a new position
- 2. User can grab resize handles on a bounding box and change its dimensions
- 3. User can delete a bounding box and the deletion persists after closing the modal
- 4. User can draw a new bounding box and assign it a class label
- 5. Only ground truth annotations show edit controls; prediction annotations remain read-only and non-interactive
-**Plans**: 3 plans
-
-Plans:
-- [x] 10-01-PLAN.md -- Backend annotation CRUD endpoints + frontend mutation hooks and types
-- [x] 10-02-PLAN.md -- Konva building blocks: coord-utils, EditableRect, DrawLayer, ClassPicker
-- [x] 10-03-PLAN.md -- AnnotationEditor composition, sample modal integration, annotation list delete
+**Goal**: Move, resize, delete, and draw bounding boxes via react-konva in sample detail modal
+**Plans**: 3 plans (complete)
### Phase 11: Error Triage
-**Goal**: Users can systematically review and tag errors with a focused triage workflow that persists decisions and surfaces the worst samples first
-**Depends on**: Phase 8 (extends v1.0 error analysis)
-**Requirements**: TRIAGE-01, TRIAGE-02, TRIAGE-03
-**Success Criteria** (what must be TRUE):
- 1. User can tag any sample or annotation as FP, TP, FN, or mistake, and the tag persists across page refreshes
- 2. User can activate highlight mode to dim non-error samples in the grid, making errors visually prominent
- 3. User can view a "worst images" ranking that surfaces samples with the highest combined error score (error count + confidence spread + uniqueness)
-**Plans**: 2 plans
-
-Plans:
-- [x] 11-01-PLAN.md -- Backend triage endpoints (set-triage-tag, worst-images scoring) + frontend hooks and types
-- [x] 11-02-PLAN.md -- Triage tag buttons in detail modal, highlight mode grid dimming, worst-images stats panel
+**Goal**: Tag errors, highlight mode, and worst-images ranking with DuckDB persistence
+**Plans**: 2 plans (complete)
### Phase 12: Interactive Viz & Discovery
-**Goal**: Users can explore dataset quality interactively -- clicking visualization elements filters the grid, finding similar samples and near-duplicates is one click away
-**Depends on**: Phase 11 (triage data informs confusion matrix), Phase 8 (auth protects endpoints)
-**Requirements**: ANNOT-06, TRIAGE-04, TRIAGE-05, TRIAGE-06
-**Success Criteria** (what must be TRUE):
- 1. User can click "Find Similar" on any sample to see nearest neighbors from Qdrant displayed in the grid
- 2. User can view a confusion matrix and click any cell to filter the grid to samples matching that GT/prediction pair
- 3. User can trigger near-duplicate detection and browse groups of visually similar images
- 4. User can click a bar in any statistics dashboard histogram to filter the grid to samples in that bucket
-**Plans**: 3 plans
-
-Plans:
-- [x] 12-01-PLAN.md -- Discovery filter foundation + Find Similar grid filtering + interactive histogram bars
-- [x] 12-02-PLAN.md -- Clickable confusion matrix cells with backend sample ID resolution
-- [x] 12-03-PLAN.md -- Near-duplicate detection via Qdrant pairwise search with SSE progress
+**Goal**: Confusion matrix, near-duplicates, interactive histograms, and find-similar
+**Plans**: 3 plans (complete)
### Phase 13: Keyboard Shortcuts
-**Goal**: Power users can navigate, triage, and edit entirely from the keyboard without reaching for the mouse
-**Depends on**: Phase 10 (annotation edit shortcuts), Phase 11 (triage shortcuts), Phase 12 (all UI features exist)
-**Requirements**: UX-01, UX-02, UX-03, UX-04
-**Success Criteria** (what must be TRUE):
- 1. User can navigate between samples in the grid and modal using arrow keys, j/k, Enter, and Escape
- 2. User can quick-tag errors during triage using number keys and toggle highlight mode with h
- 3. User can delete annotations and undo edits with keyboard shortcuts while in annotation edit mode
- 4. User can press ? to open a shortcut help overlay listing all available keyboard shortcuts
-**Plans**: 2 plans
-
-Plans:
-- [x] 13-01-PLAN.md -- Foundation (react-hotkeys-hook, shortcut registry, ui-store) + grid keyboard navigation
-- [x] 13-02-PLAN.md -- Modal shortcuts (navigation, triage, editing, undo) + help overlay
+**Goal**: Keyboard navigation, triage hotkeys, edit shortcuts, and help overlay
+**Plans**: 2 plans (complete)
### Phase 14: Per-Annotation Triage
-**Goal**: Users can see auto-discovered TP/FP/FN classifications per bounding box based on IoU overlap, with color-coded visualization in the detail modal and the ability to click individual annotations to override their classification
-**Depends on**: Phase 11 (extends triage system), Phase 6 (error analysis IoU matching)
-**Success Criteria** (what must be TRUE):
- 1. User opens a sample with GT and predictions and sees each bounding box color-coded as TP (green), FP (red), or FN (orange) based on automatic IoU matching
- 2. User can click an individual bounding box to override its auto-assigned classification (e.g. mark an auto-TP as a mistake)
- 3. Per-annotation triage decisions persist across page refreshes and are stored in DuckDB
- 4. Highlight mode dims samples that have no triage annotations, making triaged samples visually prominent
-**Plans**: 3 plans
-
-Plans:
-- [x] 14-01-PLAN.md -- Backend schema, IoU matching service, and annotation triage API endpoints
-- [x] 14-02-PLAN.md -- Frontend types, hooks, and clickable TriageOverlay SVG component
-- [x] 14-03-PLAN.md -- Wire TriageOverlay into sample modal + highlight mode integration
+**Goal**: Auto-discover TP/FP/FN per bounding box via IoU overlap, color-coded boxes in detail modal, click to override classifications
+**Plans**: 3 plans (complete)
-## Progress
+
+### v1.2 Classification Dataset Support (Phases 15-17) - SHIPPED 2026-02-19
+
+### Phase 15: Classification Ingestion & Display
+**Goal**: Users can import, browse, and inspect classification datasets with the same ease as detection datasets
+**Plans**: 2 plans (complete)
+
+### Phase 16: Classification Evaluation
+**Goal**: Users can import predictions and analyze classification model performance with accuracy, F1, confusion matrix, and error categorization
+**Plans**: 2 plans (complete)
+
+### Phase 17: Classification Polish
+**Goal**: Classification workflows are production-ready for high-cardinality datasets (43+ classes) with visual aids that surface actionable insights
+**Plans**: 2 plans (complete)
-**Execution Order:**
-Phases execute in numeric order: 8 -> 9 -> 10 -> 11 -> 12 -> 13 -> 14
-(Note: Phases 9, 10, 11 are independent after Phase 8. Execution is sequential but no inter-dependency exists between 9/10/11.)
+
+
+## Progress
| Phase | Milestone | Plans Complete | Status | Completed |
|-------|-----------|----------------|--------|-----------|
@@ -194,3 +109,6 @@ Phases execute in numeric order: 8 -> 9 -> 10 -> 11 -> 12 -> 13 -> 14
| 12. Interactive Viz & Discovery | v1.1 | 3/3 | Complete | 2026-02-13 |
| 13. Keyboard Shortcuts | v1.1 | 2/2 | Complete | 2026-02-13 |
| 14. Per-Annotation Triage | v1.1 | 3/3 | Complete | 2026-02-13 |
+| 15. Classification Ingestion & Display | v1.2 | 2/2 | Complete | 2026-02-18 |
+| 16. Classification Evaluation | v1.2 | 2/2 | Complete | 2026-02-18 |
+| 17. Classification Polish | v1.2 | 2/2 | Complete | 2026-02-18 |
diff --git a/.planning/STATE.md b/.planning/STATE.md
index 1526a0a..ab9e078 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -2,19 +2,18 @@
## Project Reference
-See: .planning/PROJECT.md (updated 2026-02-12)
+See: .planning/PROJECT.md (updated 2026-02-19)
**Core value:** A single tool that replaces scattered scripts: load any CV dataset, visually browse with annotation overlays, compare GT vs predictions, cluster via embeddings, and surface mistakes -- all in one workflow.
-**Current focus:** v1.1 complete. All 14 phases delivered.
+**Current focus:** Planning next milestone
## Current Position
-Phase: 14 of 14 (Per-Annotation Triage)
-Plan: 3 of 3 in current phase
-Status: Complete
-Last activity: 2026-02-13 -- Phase 14 verified and complete
+Phase: 17 of 17 (all milestones complete)
+Status: v1.2 archived, ready for next milestone
+Last activity: 2026-02-19 -- Completed v1.2 milestone archival
-Progress: [████████████████████████████████████████████████████████████] v1.1: 41/41 plans complete
+Progress: [################################] 100% (v1.0 + v1.1 + v1.2 complete)
## Performance Metrics
@@ -23,96 +22,20 @@ Progress: [███████████████████████
- Average duration: 3.9 min
- Total execution time: 82 min
-**By Phase (v1.0):**
-
-| Phase | Plans | Total | Avg/Plan |
-|-------|-------|-------|----------|
-| 1. Data Foundation | 4/4 | 14 min | 3.5 min |
-| 2. Visual Grid | 3/3 | 15 min | 5.0 min |
-| 3. Filtering & Search | 2/2 | 10 min | 5.0 min |
-| 4. Predictions & Comparison | 3/3 | 9 min | 3.0 min |
-| 5. Embeddings & Visualization | 4/4 | 16 min | 4.0 min |
-| 6. Error Analysis & Similarity | 2/2 | 9 min | 4.5 min |
-| 7. Intelligence & Agents | 3/3 | 9 min | 3.0 min |
-
-**By Phase (v1.1):**
-
-| Phase | Plans | Total | Avg/Plan |
-|-------|-------|-------|----------|
-| 8. Docker Deployment & Auth | 5/5 | 25 min | 5.0 min |
-| 9. Smart Ingestion | 2/2 | 10 min | 5.0 min |
-| 10. Annotation Editing | 3/3 | 9 min | 3.0 min |
-| 11. Error Triage | 2/2 | 6 min | 3.0 min |
-| 12. Interactive Viz & Discovery | 3/3 | 10 min | 3.3 min |
-| 13. Keyboard Shortcuts | 2/2 | 6 min | 3.0 min |
-| 14. Per-Annotation Triage | 3/3 | 7 min | 2.3 min |
+**Velocity (v1.1):**
+- Total plans completed: 20
+- Average duration: 3.7 min
+- Total execution time: 73 min
+
+**Velocity (v1.2):**
+- Total plans completed: 6
+- Timeline: 1 day (2026-02-18)
## Accumulated Context
### Decisions
Decisions are logged in PROJECT.md Key Decisions table.
-Recent decisions affecting current work:
-
-- [v1.1 Roadmap]: Keep Qdrant in local mode for Docker (single-user <1M vectors)
-- [v1.1 Roadmap]: Caddy over nginx for reverse proxy (auto-HTTPS, built-in basic_auth)
-- [v1.1 Roadmap]: react-konva for annotation editing in detail modal only (SVG stays for grid)
-- [v1.1 Roadmap]: FastAPI HTTPBasic DI over middleware (testable, composable)
-- [08-01]: CPU-only PyTorch via post-sync replacement in Dockerfile (uv sync then uv pip install from CPU index)
-- [08-01]: CORS restricted to localhost:3000 in dev, disabled entirely behind proxy (DATAVISOR_BEHIND_PROXY=true)
-- [08-02]: NEXT_PUBLIC_API_URL=/api baked at build time for same-origin API via Caddy
-- [08-02]: Caddy handles all auth at proxy layer -- zero application code changes
-- [08-03]: Directory bind mount ./data:/app/data for DuckDB WAL + Qdrant + thumbnails persistence
-- [08-03]: AUTH_PASSWORD_HASH has no default -- forces explicit auth configuration before deployment
-- [08-03]: Only Caddy exposes ports 80/443 -- backend and frontend are Docker-internal only
-- [08-04]: VM startup script does NOT auto-start docker compose -- requires manual .env setup first
-- [08-04]: GCP config via env vars with defaults (only GCP_PROJECT_ID required)
-- [08-05]: 10-section deployment docs covering local Docker, GCP, custom domain HTTPS, data persistence, troubleshooting
-- [08-05]: opencv-python-headless replaces opencv-python in Docker builder stage (no X11/GUI libs in slim images)
-- [09-01]: Three-layout priority detection: Roboflow > Standard COCO > Flat
-- [09-01]: ijson peek at top-level keys for COCO detection (max 10 keys, files >500MB skipped)
-- [09-01]: Optional dataset_id param on ingest_with_progress for multi-split ID sharing
-- [09-01]: INSERT-or-UPDATE pattern for dataset record across multi-split imports
-- [09-02]: POST SSE streaming via fetch + ReadableStream (not EventSource, which is GET-only)
-- [09-02]: FolderScanner refactored to accept StorageBackend for GCS support
-- [09-02]: Split-prefixed IDs for collision avoidance in multi-split import
-- [10-01]: get_cursor DI for annotation router (auto-close cursor)
-- [10-01]: source='ground_truth' enforced in SQL WHERE clauses for PUT/DELETE safety
-- [10-01]: Dataset counts refreshed via subquery UPDATE (no race conditions)
-- [10-02]: useDrawLayer hook pattern (handlers + ReactNode) instead of separate component
-- [10-02]: Transformer scale reset to 1 on transformEnd (Konva best practice)
-- [10-03]: AnnotationEditor loaded via next/dynamic with ssr:false (prevents Konva SSR errors)
-- [10-03]: Draw completion shows ClassPicker before creating annotation (requires category selection)
-- [10-03]: Delete buttons only appear on ground_truth rows when edit mode is active
-- [11-01]: Dual router pattern (samples_router + datasets_router) from single triage module
-- [11-01]: Atomic triage tag replacement via list_filter + list_append single SQL
-- [11-01]: get_db DI pattern for triage router (matching statistics.py style)
-- [11-02]: Triage buttons always visible in detail modal (not gated by edit mode)
-- [11-02]: Highlight toggle uses yellow-500 active styling to distinguish from edit buttons
-- [11-02]: Triage tag badges show short label (TP/FP/FN/MISTAKE) instead of full prefix
-- [12-01]: Lasso selection takes priority over discovery filter (effectiveIds = lassoSelectedIds ?? sampleIdFilter)
-- [12-01]: "Show in Grid" button only appears after similarity results load (progressive disclosure)
-- [12-01]: getState() pattern for store access in Recharts onClick handlers (non-reactive)
-- [12-01]: DiscoveryFilterChip in dataset header for cross-tab visibility
-- [12-02]: Imperative fetch function (not hook) for one-shot confusion cell sample lookups
-- [12-02]: Greedy IoU matching replayed per sample for consistent CM cell membership
-- [12-02]: getState() pattern for Zustand store writes in async callbacks
-- [12-03]: Tab bar always visible so Near Duplicates is accessible without predictions
-- [12-03]: Union-find with path compression for O(alpha(n)) grouping of pairwise matches
-- [12-03]: Progress updates throttled to every 10 points to avoid excessive state updates
-- [13-01]: isFocused passed as prop from ImageGrid (avoids N store subscriptions per GridCell)
-- [13-01]: Central shortcut registry pattern: all shortcuts as data in lib/shortcuts.ts
-- [13-02]: Single useHotkeys('1, 2, 3, 4') with event.key dispatch (avoids rules-of-hooks violation)
-- [13-02]: Single-level undo stack via React state for annotation delete undo
-- [13-02]: Triage number keys disabled during edit mode (prevents Konva focus confusion)
-- [13-02]: groupByCategory via reduce instead of Object.groupBy (avoids es2024 lib dep)
-- [14-01]: Reuse _compute_iou_matrix from evaluation.py (no duplicate IoU code)
-- [14-01]: Auto-computed labels ephemeral (computed on GET, not stored); overrides persist in annotation_triage table
-- [14-01]: triage:annotated sample tag bridges per-annotation triage to highlight mode
-- [14-02]: TriageOverlay is separate from AnnotationOverlay (interactive vs non-interactive SVG)
-- [14-02]: Click handler delegates to parent via callback (overlay does not manage mutations)
-- [14-02]: Annotations not in triageMap skipped (handles GT-only samples gracefully)
-- [14-03]: GT boxes show category name only, predictions show category + confidence% (color conveys triage type)
### Pending Todos
@@ -120,10 +43,16 @@ None.
### Blockers/Concerns
-- [RESOLVED] SVG-to-Canvas coordinate mismatch resolved by coord-utils.ts (10-02)
+- Confirm Roboflow JSONL format against actual export before finalizing parser
+
+### Roadmap Evolution
+
+- v1.0: 7 phases (1-7), 21 plans -- shipped 2026-02-12
+- v1.1: 7 phases (8-14), 20 plans -- shipped 2026-02-13
+- v1.2: 3 phases (15-17), 6 plans -- shipped 2026-02-19
## Session Continuity
-Last session: 2026-02-13
-Stopped at: Phase 14 complete, v1.1 milestone complete
+Last session: 2026-02-19
+Stopped at: Completed v1.2 milestone archival
Resume file: None
diff --git a/.planning/REQUIREMENTS.md b/.planning/milestones/v1.1-REQUIREMENTS.md
similarity index 68%
rename from .planning/REQUIREMENTS.md
rename to .planning/milestones/v1.1-REQUIREMENTS.md
index a8c8887..12b090b 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/milestones/v1.1-REQUIREMENTS.md
@@ -1,11 +1,10 @@
-# Requirements: DataVisor v1.1
+# Requirements Archive: DataVisor v1.1
**Defined:** 2026-02-12
-**Core Value:** A single tool that replaces scattered scripts: load any CV dataset, visually browse with annotation overlays, compare GT vs predictions, cluster via embeddings, and surface mistakes — all in one workflow.
+**Completed:** 2026-02-13
+**Core Value:** A single tool that replaces scattered scripts: load any CV dataset, visually browse with annotation overlays, compare GT vs predictions, cluster via embeddings, and surface mistakes -- all in one workflow.
-## v1.1 Requirements
-
-Requirements for Deployment, Workflow & Competitive Parity milestone.
+## v1.1 Requirements (All Complete)
### Deployment & Infrastructure
@@ -39,7 +38,7 @@ Requirements for Deployment, Workflow & Competitive Parity milestone.
- [x] **TRIAGE-03**: "Worst images" ranking surfaces samples with highest combined error score (error count + confidence spread + uniqueness)
- [x] **TRIAGE-04**: Interactive confusion matrix that filters grid when a cell is clicked
- [x] **TRIAGE-05**: Near-duplicate detection surfaces visually similar images in the dataset
-- [x] **TRIAGE-06**: Interactive histograms on the statistics dashboard — clicking a bar filters the grid
+- [x] **TRIAGE-06**: Interactive histograms on the statistics dashboard -- clicking a bar filters the grid
### UX
@@ -48,46 +47,8 @@ Requirements for Deployment, Workflow & Competitive Parity milestone.
- [x] **UX-03**: Keyboard shortcuts for annotation editing (Delete, Ctrl+Z, e for edit mode)
- [x] **UX-04**: Shortcut help overlay triggered by ? key
-## v1.2 Requirements
-
-Deferred to future milestone. Tracked but not in current roadmap.
-
-### Format Expansion
-
-- **FMT-01**: YOLO format parser (.txt annotation files with class_id + normalized xywh)
-- **FMT-02**: Pascal VOC format parser (XML annotation files)
-- **FMT-03**: Dataset export in COCO and YOLO formats
-
-### Evaluation
-
-- **EVAL-01**: PR curves per class
-- **EVAL-02**: Per-class AP metrics dashboard
-
-### Advanced
-
-- **ADV-01**: Model zoo / in-app inference (ONNX/TorchScript)
-- **ADV-02**: Custom workspaces / panel layouts
-- **ADV-03**: Customizable keyboard shortcut remapping
-- **ADV-04**: CVAT/Label Studio integration for complex annotation workflows
-
-## Out of Scope
-
-Explicitly excluded. Documented to prevent scope creep.
-
-| Feature | Reason |
-|---------|--------|
-| Multi-user collaboration / RBAC | Personal tool — single-user auth for VM security only |
-| Video annotation support | Image-only for now; multiplies complexity |
-| Training pipeline integration | DataVisor inspects data, doesn't train models |
-| Mobile/tablet interface | Desktop browser only |
-| Real-time streaming inference | Batch-oriented analysis |
-| 3D point cloud visualization | Different rendering pipeline entirely |
-| Full annotation editor (polygon, segmentation) | Bounding box CRUD only for v1.1 |
-
## Traceability
-Which phases cover which requirements. Updated during roadmap creation.
-
| Requirement | Phase | Status |
|-------------|-------|--------|
| DEPLOY-01 | Phase 8 | Complete |
@@ -117,11 +78,7 @@ Which phases cover which requirements. Updated during roadmap creation.
| UX-03 | Phase 13 | Complete |
| UX-04 | Phase 13 | Complete |
-**Coverage:**
-- v1.1 requirements: 26 total
-- Mapped to phases: 26
-- Unmapped: 0
+**Coverage:** 26/26 requirements complete (100%)
---
-*Requirements defined: 2026-02-12*
-*Last updated: 2026-02-13 — Phase 13 requirements marked Complete (v1.1 milestone complete)*
+*Archived: 2026-02-13*
diff --git a/.planning/milestones/v1.1-ROADMAP.md b/.planning/milestones/v1.1-ROADMAP.md
new file mode 100644
index 0000000..ad66394
--- /dev/null
+++ b/.planning/milestones/v1.1-ROADMAP.md
@@ -0,0 +1,131 @@
+# Milestone v1.1: Deployment, Workflow & Competitive Parity
+
+**Status:** SHIPPED 2026-02-13
+**Phases:** 8-14
+**Total Plans:** 20
+
+## Overview
+
+Make DataVisor deployable (Docker + GCP), secure for cloud access, and close key workflow gaps vs FiftyOne/Encord -- smart ingestion, annotation editing, error triage, interactive visualizations, and keyboard-driven navigation.
+
+## Phases
+
+### Phase 8: Docker Deployment & Auth
+
+**Goal**: DataVisor runs as a deployable Docker stack with single-user auth, accessible securely on a cloud VM or locally with a single command
+**Depends on**: Phase 7 (v1.0 complete)
+**Requirements**: DEPLOY-01, DEPLOY-02, DEPLOY-03, DEPLOY-04, DEPLOY-05
+**Plans**: 5 plans
+
+Plans:
+- [x] 08-01: Backend Dockerfile + config fixes (CORS, DuckDB CHECKPOINT)
+- [x] 08-02: Frontend Dockerfile + Caddyfile reverse proxy with auth
+- [x] 08-03: Docker Compose orchestration + .dockerignore + env config
+- [x] 08-04: Local run script + GCP deployment scripts
+- [x] 08-05: Deployment documentation + full stack verification
+
+### Phase 9: Smart Ingestion
+
+**Goal**: Users can import datasets from the UI by pointing at a folder, reviewing auto-detected structure, and confirming import -- no CLI or config files needed
+**Depends on**: Phase 8 (auth protects new endpoints)
+**Requirements**: INGEST-01, INGEST-02, INGEST-03, INGEST-04, INGEST-05
+**Plans**: 2 plans
+
+Plans:
+- [x] 09-01: Backend FolderScanner service, scan/import API endpoints, split-aware ingestion pipeline
+- [x] 09-02: Frontend ingestion wizard (path input, scan results, import progress) + landing page link
+
+### Phase 10: Annotation Editing
+
+**Goal**: Users can make quick bounding box corrections directly in the sample detail modal without leaving DataVisor
+**Depends on**: Phase 8 (auth protects mutation endpoints)
+**Requirements**: ANNOT-01, ANNOT-02, ANNOT-03, ANNOT-04, ANNOT-05
+**Plans**: 3 plans
+
+Plans:
+- [x] 10-01: Backend annotation CRUD endpoints + frontend mutation hooks and types
+- [x] 10-02: Konva building blocks: coord-utils, EditableRect, DrawLayer, ClassPicker
+- [x] 10-03: AnnotationEditor composition, sample modal integration, annotation list delete
+
+### Phase 11: Error Triage
+
+**Goal**: Users can systematically review and tag errors with a focused triage workflow that persists decisions and surfaces the worst samples first
+**Depends on**: Phase 8 (extends v1.0 error analysis)
+**Requirements**: TRIAGE-01, TRIAGE-02, TRIAGE-03
+**Plans**: 2 plans
+
+Plans:
+- [x] 11-01: Backend triage endpoints (set-triage-tag, worst-images scoring) + frontend hooks and types
+- [x] 11-02: Triage tag buttons in detail modal, highlight mode grid dimming, worst-images stats panel
+
+### Phase 12: Interactive Viz & Discovery
+
+**Goal**: Users can explore dataset quality interactively -- clicking visualization elements filters the grid, finding similar samples and near-duplicates is one click away
+**Depends on**: Phase 11 (triage data informs confusion matrix), Phase 8 (auth protects endpoints)
+**Requirements**: ANNOT-06, TRIAGE-04, TRIAGE-05, TRIAGE-06
+**Plans**: 3 plans
+
+Plans:
+- [x] 12-01: Discovery filter foundation + Find Similar grid filtering + interactive histogram bars
+- [x] 12-02: Clickable confusion matrix cells with backend sample ID resolution
+- [x] 12-03: Near-duplicate detection via Qdrant pairwise search with SSE progress
+
+### Phase 13: Keyboard Shortcuts
+
+**Goal**: Power users can navigate, triage, and edit entirely from the keyboard without reaching for the mouse
+**Depends on**: Phase 10 (annotation edit shortcuts), Phase 11 (triage shortcuts), Phase 12 (all UI features exist)
+**Requirements**: UX-01, UX-02, UX-03, UX-04
+**Plans**: 2 plans
+
+Plans:
+- [x] 13-01: Foundation (react-hotkeys-hook, shortcut registry, ui-store) + grid keyboard navigation
+- [x] 13-02: Modal shortcuts (navigation, triage, editing, undo) + help overlay
+
+### Phase 14: Per-Annotation Triage
+
+**Goal**: Users can see auto-discovered TP/FP/FN classifications per bounding box based on IoU overlap, with color-coded visualization in the detail modal and the ability to click individual annotations to override their classification
+**Depends on**: Phase 11 (extends triage system), Phase 6 (error analysis IoU matching)
+**Plans**: 3 plans
+
+Plans:
+- [x] 14-01: Backend schema, IoU matching service, and annotation triage API endpoints
+- [x] 14-02: Frontend types, hooks, and clickable TriageOverlay SVG component
+- [x] 14-03: Wire TriageOverlay into sample modal + highlight mode integration
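The TP/FP/FN auto-classification in this phase rests on greedy IoU matching between ground-truth and predicted boxes. A minimal sketch of the idea, where the box format `(x, y, w, h)`, the 0.5 threshold, and the sample data are illustrative assumptions rather than the project's actual code:

```python
# Greedy IoU matching sketch: each prediction (highest confidence first)
# claims the best unmatched GT box with IoU above a threshold.
# Matched predictions are TP, unmatched predictions FP, unmatched GT FN.
def iou(a, b):
    # Boxes are (x, y, w, h); compute intersection-over-union.
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def classify(gt_boxes, preds, thresh=0.5):
    # preds: list of (box, confidence); returns per-prediction labels + FN count.
    labels, matched = [], set()
    for box, _conf in sorted(preds, key=lambda p: -p[1]):
        best, best_iou = None, thresh
        for i, g in enumerate(gt_boxes):
            if i not in matched and iou(box, g) >= best_iou:
                best, best_iou = i, iou(box, g)
        if best is None:
            labels.append("FP")
        else:
            matched.add(best)
            labels.append("TP")
    fn = len(gt_boxes) - len(matched)
    return labels, fn

gt = [(0, 0, 10, 10), (50, 50, 10, 10)]
preds = [((1, 1, 10, 10), 0.9), ((100, 100, 5, 5), 0.8)]
print(classify(gt, preds))  # → (['TP', 'FP'], 1)
```

Greedy matching (rather than optimal assignment) keeps per-sample replay cheap and deterministic, which is what makes the labels safe to recompute on every GET.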
+
+## Milestone Summary
+
+**Key Decisions:**
+
+- Caddy over nginx for reverse proxy (auto-HTTPS, built-in basic_auth)
+- CPU-only PyTorch via post-sync replacement in Dockerfile
+- react-konva for annotation editing (SVG stays for grid overlays)
+- FastAPI HTTPBasic DI over middleware (testable, composable)
+- Atomic triage tag replacement via list_filter + list_append single SQL
+- Union-find with path compression for near-duplicate grouping
+- Central shortcut registry pattern (all shortcuts as data)
+- Auto-computed triage labels ephemeral (computed on GET); overrides persist in annotation_triage table
+- Switched AI agent from OpenAI GPT-4o to Google Gemini 2.0 Flash
+- Pre-compute all data for AI agent prompt (no tool calls needed)
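The union-find decision above can be sketched in a few lines; this is an illustration of the grouping step under assumed inputs (the pair list and IDs are hypothetical), not the DataVisor implementation:

```python
# Minimal union-find with path compression: groups sample IDs connected
# by pairwise near-duplicate matches into clusters.
def find(parent, x):
    # Walk up to the root, flattening the path as we go (path compression).
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # point x at its grandparent
        x = parent[x]
    return x

def group_pairs(pairs):
    parent = {}
    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb  # union the two clusters
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), []).append(node)
    return list(groups.values())

pairs = [("img_1", "img_2"), ("img_2", "img_3"), ("img_7", "img_8")]
print(group_pairs(pairs))  # → [['img_1', 'img_2', 'img_3'], ['img_7', 'img_8']]
```

Path compression keeps amortized find cost near-constant, so grouping scales to the pairwise match lists produced by Qdrant search.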
+
+**Issues Resolved:**
+
+- opencv-python-headless for Docker slim images (no X11 libs needed)
+- DuckDB WAL stale file recovery via CHECKPOINT on shutdown
+- PyTorch CPU install order (uv sync first, then replace with CPU wheel)
+- Pydantic AI request_limit exceeded by Gemini tool-call loop (eliminated tools)
+- GEMINI_API_KEY not loading (load_dotenv for third-party libs)
+- pyvips missing for Moondream2 auto-tag (added dependency)
+
+**Issues Deferred:**
+
+- UMAP blocked by Python 3.14 numba incompatibility (using t-SNE)
+- Moondream2 trust_remote_code fragile with transformers updates
+
+**Technical Debt Incurred:**
+
+- Module-level cache for Intelligence panel results (should use React Query cache)
+- Old triage tags filtered client-side (OBSOLETE_TRIAGE_TAGS set in grid-cell.tsx)
+
+---
+
+_For current project status, see .planning/ROADMAP.md_
diff --git a/.planning/milestones/v1.2-REQUIREMENTS.md b/.planning/milestones/v1.2-REQUIREMENTS.md
new file mode 100644
index 0000000..28bec2b
--- /dev/null
+++ b/.planning/milestones/v1.2-REQUIREMENTS.md
@@ -0,0 +1,103 @@
+# Requirements Archive: v1.2 Classification Dataset Support
+
+**Archived:** 2026-02-19
+**Status:** SHIPPED
+
+For current requirements, see `.planning/REQUIREMENTS.md`.
+
+---
+
+# Requirements: DataVisor
+
+**Defined:** 2026-02-18
+**Core Value:** A single tool that replaces scattered scripts: load any CV dataset, visually browse with annotation overlays, compare GT vs predictions, cluster via embeddings, and surface mistakes -- all in one workflow.
+
+## v1.2 Requirements
+
+Requirements for classification dataset support. Each maps to roadmap phases.
+
+### Ingestion
+
+- [x] **INGEST-01**: User can import a classification dataset from a directory containing JSONL annotations and images
+- [x] **INGEST-02**: System auto-detects dataset type (detection vs classification) from annotation format during import
+- [x] **INGEST-03**: User can import multi-split classification datasets (train/valid/test) in a single operation
+- [x] **INGEST-04**: Schema stores dataset_type on the datasets table and handles classification annotations without bbox values
+
+### Display
+
+- [x] **DISP-01**: User sees class label badges on grid thumbnails for classification datasets
+- [x] **DISP-02**: User sees class label (GT and prediction) prominently in the sample detail modal
+- [x] **DISP-03**: User can edit the GT class label via dropdown in the detail modal
+- [x] **DISP-04**: Statistics dashboard shows classification-appropriate metrics (labeled images, class distribution) and hides detection-only elements (bbox area, IoU slider)
+
+### Evaluation
+
+- [x] **EVAL-01**: User can import classification predictions in JSONL format with confidence scores
+- [x] **EVAL-02**: User sees accuracy, macro F1, weighted F1, and per-class precision/recall/F1 metrics
+- [x] **EVAL-03**: User sees a confusion matrix for classification with click-to-filter support
+- [x] **EVAL-04**: User sees error analysis categorizing each image as correct, misclassified, or missing prediction
+- [x] **EVAL-05**: User sees GT vs predicted label comparison on grid thumbnails and in the modal
+
+### Polish
+
+- [x] **POLISH-01**: Confusion matrix scales to 43+ classes with readable rendering
+- [x] **POLISH-02**: User can color embedding scatter by GT class, predicted class, or correct/incorrect status
+- [x] **POLISH-03**: User sees most-confused class pairs summary from the confusion matrix
+- [x] **POLISH-04**: User sees per-class performance sparklines with color-coded thresholds
+
+## Future Requirements
+
+### Multi-label Classification
+
+- **MLABEL-01**: User can import multi-label classification datasets (multiple labels per image)
+- **MLABEL-02**: User sees multi-label metrics (hamming loss, subset accuracy)
+
+### Advanced Evaluation
+
+- **ADVEVAL-01**: User can import top-K predictions with full probability distributions
+- **ADVEVAL-02**: User sees confidence calibration plot (reliability diagram)
+- **ADVEVAL-03**: User can compare performance across train/valid/test splits side-by-side
+
+## Out of Scope
+
+| Feature | Reason |
+|---------|--------|
+| Multi-label classification | Different data model, metrics, and UI; scope explosion for v1.2 |
+| Top-K evaluation | Requires importing full probability distributions; complicates schema |
+| PR curves for classification | Less informative than confusion matrix + per-class metrics for multi-class |
+| mAP for classification | Detection metric, not applicable to classification |
+| Bbox editing for classification | No bounding boxes in classification datasets |
+| IoU threshold controls for classification | No spatial matching in classification |
+
+## Traceability
+
+Which phases cover which requirements. Updated during roadmap creation.
+
+| Requirement | Phase | Status |
+|-------------|-------|--------|
+| INGEST-01 | Phase 15 | Done |
+| INGEST-02 | Phase 15 | Done |
+| INGEST-03 | Phase 15 | Done |
+| INGEST-04 | Phase 15 | Done |
+| DISP-01 | Phase 15 | Done |
+| DISP-02 | Phase 15 | Done |
+| DISP-03 | Phase 15 | Done |
+| DISP-04 | Phase 15 | Done |
+| EVAL-01 | Phase 16 | Done |
+| EVAL-02 | Phase 16 | Done |
+| EVAL-03 | Phase 16 | Done |
+| EVAL-04 | Phase 16 | Done |
+| EVAL-05 | Phase 16 | Done |
+| POLISH-01 | Phase 17 | Done |
+| POLISH-02 | Phase 17 | Done |
+| POLISH-03 | Phase 17 | Done |
+| POLISH-04 | Phase 17 | Done |
+
+**Coverage:**
+- v1.2 requirements: 17 total
+- Mapped to phases: 17
+- Unmapped: 0
+
+---
+*Requirements defined: 2026-02-18*
+*Last updated: 2026-02-18 after roadmap creation*
diff --git a/.planning/milestones/v1.2-ROADMAP.md b/.planning/milestones/v1.2-ROADMAP.md
new file mode 100644
index 0000000..cb34b45
--- /dev/null
+++ b/.planning/milestones/v1.2-ROADMAP.md
@@ -0,0 +1,145 @@
+# Roadmap: DataVisor
+
+## Milestones
+
+- v1.0 MVP - Phases 1-7 (shipped 2026-02-12) -- [archive](.planning/milestones/v1.0-ROADMAP.md)
+- v1.1 Deployment, Workflow & Competitive Parity - Phases 8-14 (shipped 2026-02-13) -- [archive](.planning/milestones/v1.1-ROADMAP.md)
+- v1.2 Classification Dataset Support - Phases 15-17 (shipped 2026-02-19)
+
+## Phases
+
+
+v1.0 MVP (Phases 1-7) - SHIPPED 2026-02-12
+
+### Phase 1: Data Foundation
+**Goal**: DuckDB-backed streaming ingestion pipeline for COCO datasets at 100K+ scale
+**Plans**: 4 plans (complete)
+
+### Phase 2: Visual Grid
+**Goal**: Virtualized infinite-scroll grid with SVG annotation overlays
+**Plans**: 3 plans (complete)
+
+### Phase 3: Filtering & Search
+**Goal**: Full metadata filtering, search, saved views, and bulk tagging
+**Plans**: 2 plans (complete)
+
+### Phase 4: Predictions & Comparison
+**Goal**: Model prediction import with GT vs Predictions comparison
+**Plans**: 3 plans (complete)
+
+### Phase 5: Embeddings & Visualization
+**Goal**: DINOv2 embeddings with t-SNE reduction and deck.gl scatter plot
+**Plans**: 4 plans (complete)
+
+### Phase 6: Error Analysis & Similarity
+**Goal**: Error categorization pipeline and Qdrant-powered similarity search
+**Plans**: 2 plans (complete)
+
+### Phase 7: Intelligence & Agents
+**Goal**: Pydantic AI agent for error patterns and Moondream2 VLM auto-tagging
+**Plans**: 3 plans (complete)
+
+
+v1.1 Deployment, Workflow & Competitive Parity (Phases 8-14) - SHIPPED 2026-02-13
+
+### Phase 8: Docker Deployment & Auth
+**Goal**: Deployable Docker stack with single-user auth, accessible on cloud VM or locally
+**Plans**: 5 plans (complete)
+
+### Phase 9: Smart Ingestion
+**Goal**: No-code dataset import from folder path with auto-detection and confirmation
+**Plans**: 2 plans (complete)
+
+### Phase 10: Annotation Editing
+**Goal**: Move, resize, delete, and draw bounding boxes via react-konva in sample detail modal
+**Plans**: 3 plans (complete)
+
+### Phase 11: Error Triage
+**Goal**: Tag errors, highlight mode, and worst-images ranking with DuckDB persistence
+**Plans**: 2 plans (complete)
+
+### Phase 12: Interactive Viz & Discovery
+**Goal**: Confusion matrix, near-duplicates, interactive histograms, and find-similar
+**Plans**: 3 plans (complete)
+
+### Phase 13: Keyboard Shortcuts
+**Goal**: Keyboard navigation, triage hotkeys, edit shortcuts, and help overlay
+**Plans**: 2 plans (complete)
+
+### Phase 14: Per-Annotation Triage
+**Goal**: Auto-discover TP/FP/FN per bounding box via IoU overlap, color-coded boxes in detail modal, click to override classifications
+**Plans**: 3 plans (complete)
+
+
+
+### v1.2 Classification Dataset Support (Shipped 2026-02-19)
+
+**Milestone Goal:** First-class single-label classification dataset support at full feature parity with detection workflows -- from ingestion through evaluation to polish.
+
+#### Phase 15: Classification Ingestion & Display
+**Goal**: Users can import, browse, and inspect classification datasets with the same ease as detection datasets
+**Depends on**: Phase 14 (existing codebase)
+**Requirements**: INGEST-01, INGEST-02, INGEST-03, INGEST-04, DISP-01, DISP-02, DISP-03, DISP-04
+**Success Criteria** (what must be TRUE):
+ 1. User can point the ingestion wizard at a folder with JSONL annotations and images, and the system auto-detects it as a classification dataset
+ 2. User can import multi-split classification datasets (train/valid/test) in a single operation, just like detection datasets
+ 3. User sees class label badges on grid thumbnails instead of bounding box overlays when browsing a classification dataset
+ 4. User sees GT class label prominently in the sample detail modal and can change it via a dropdown
+ 5. Statistics dashboard shows classification-appropriate metrics (labeled images count, class distribution) with no detection-only elements visible (no bbox area histogram, no IoU slider)
+**Plans**: 2 plans (complete)
+Plans:
+- [x] 15-01-PLAN.md -- Backend: schema migration, ClassificationJSONLParser, FolderScanner detection, IngestionService dispatch, API endpoints
+- [x] 15-02-PLAN.md -- Frontend: type updates, grid class badges, detail modal class label/dropdown, classification-aware statistics
+
+#### Phase 16: Classification Evaluation
+**Goal**: Users can import predictions and analyze classification model performance with accuracy, F1, confusion matrix, and error categorization
+**Depends on**: Phase 15
+**Requirements**: EVAL-01, EVAL-02, EVAL-03, EVAL-04, EVAL-05
+**Success Criteria** (what must be TRUE):
+ 1. User can import classification predictions in JSONL format with confidence scores and see them alongside ground truth
+ 2. User sees accuracy, macro F1, weighted F1, and per-class precision/recall/F1 metrics in the evaluation panel
+ 3. User sees a confusion matrix and can click any cell to filter the grid to images with that GT/predicted class pair
+ 4. User sees each image categorized as correct, misclassified, or missing prediction in the error analysis view
+ 5. User sees GT vs predicted label comparison on grid thumbnails and in the detail modal
+**Plans**: 2 plans (complete)
+Plans:
+- [x] 16-01-PLAN.md -- Backend: classification prediction parser, evaluation service, error analysis service, endpoint routing
+- [x] 16-02-PLAN.md -- Frontend: types, hooks, prediction import dialog, evaluation panel, error analysis panel, grid badges
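The metrics named in the success criteria can be computed directly from the GT and predicted label lists. A pure-Python sketch with hypothetical labels; the project's evaluation service may differ in structure:

```python
from collections import Counter

# Per-class precision/recall/F1 plus accuracy, macro F1, and weighted F1
# from parallel lists of ground-truth and predicted class labels.
def evaluate(gt, pred):
    classes = sorted(set(gt) | set(pred))
    support = Counter(gt)  # GT count per class, used for weighted F1
    per_class, n = {}, len(gt)
    for c in classes:
        tp = sum(g == p == c for g, p in zip(gt, pred))
        fp = sum(p == c and g != c for g, p in zip(gt, pred))
        fn = sum(g == c and p != c for g, p in zip(gt, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = {"precision": prec, "recall": rec, "f1": f1}
    accuracy = sum(g == p for g, p in zip(gt, pred)) / n
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(classes)
    weighted_f1 = sum(per_class[c]["f1"] * support[c] / n for c in classes)
    return accuracy, macro_f1, weighted_f1, per_class

gt = ["cat", "cat", "dog", "dog"]
pred = ["cat", "dog", "dog", "dog"]
acc, macro, weighted, per_class = evaluate(gt, pred)
print(round(acc, 2), round(macro, 2), round(weighted, 2))  # → 0.75 0.73 0.73
```

Macro F1 averages classes equally (sensitive to rare-class performance), while weighted F1 scales by support; showing both is what makes imbalanced datasets legible.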
+
+#### Phase 17: Classification Polish
+**Goal**: Classification workflows are production-ready for high-cardinality datasets (43+ classes) with visual aids that surface actionable insights
+**Depends on**: Phase 16
+**Requirements**: POLISH-01, POLISH-02, POLISH-03, POLISH-04
+**Success Criteria** (what must be TRUE):
+ 1. Confusion matrix renders readably at 43+ classes with threshold filtering and overflow handling
+ 2. User can color the embedding scatter plot by GT class, predicted class, or correct/incorrect status
+ 3. User sees a ranked list of most-confused class pairs derived from the confusion matrix
+ 4. User sees per-class performance sparklines with color-coded thresholds (green/yellow/red) in the metrics table
+**Plans**: 2 plans (complete)
+Plans:
+- [x] 17-01-PLAN.md -- Confusion matrix threshold/overflow, most-confused pairs, F1 bars in per-class table
+- [x] 17-02-PLAN.md -- Embedding scatter color modes (GT class, predicted class, correct/incorrect)
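The most-confused pairs summary falls out of the off-diagonal confusion matrix counts; a minimal sketch with hypothetical labels:

```python
from collections import Counter

# Rank off-diagonal confusion matrix cells to surface the most-confused
# (GT class, predicted class) pairs.
def most_confused(gt, pred, top_k=3):
    counts = Counter((g, p) for g, p in zip(gt, pred) if g != p)
    return counts.most_common(top_k)

gt = ["cat", "cat", "cat", "dog", "bird", "bird"]
pred = ["dog", "dog", "cat", "cat", "cat", "dog"]
print(most_confused(gt, pred))
```

Note the pairs are directional: (cat, dog) counts cats mislabeled as dogs separately from (dog, cat), which matters at 43+ classes where confusion is rarely symmetric.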
+
+## Progress
+
+| Phase | Milestone | Plans Complete | Status | Completed |
+|-------|-----------|----------------|--------|-----------|
+| 1. Data Foundation | v1.0 | 4/4 | Complete | 2026-02-10 |
+| 2. Visual Grid | v1.0 | 3/3 | Complete | 2026-02-10 |
+| 3. Filtering & Search | v1.0 | 2/2 | Complete | 2026-02-11 |
+| 4. Predictions & Comparison | v1.0 | 3/3 | Complete | 2026-02-11 |
+| 5. Embeddings & Visualization | v1.0 | 4/4 | Complete | 2026-02-11 |
+| 6. Error Analysis & Similarity | v1.0 | 2/2 | Complete | 2026-02-12 |
+| 7. Intelligence & Agents | v1.0 | 3/3 | Complete | 2026-02-12 |
+| 8. Docker Deployment & Auth | v1.1 | 5/5 | Complete | 2026-02-12 |
+| 9. Smart Ingestion | v1.1 | 2/2 | Complete | 2026-02-12 |
+| 10. Annotation Editing | v1.1 | 3/3 | Complete | 2026-02-12 |
+| 11. Error Triage | v1.1 | 2/2 | Complete | 2026-02-12 |
+| 12. Interactive Viz & Discovery | v1.1 | 3/3 | Complete | 2026-02-13 |
+| 13. Keyboard Shortcuts | v1.1 | 2/2 | Complete | 2026-02-13 |
+| 14. Per-Annotation Triage | v1.1 | 3/3 | Complete | 2026-02-13 |
+| 15. Classification Ingestion & Display | v1.2 | 2/2 | Complete | 2026-02-18 |
+| 16. Classification Evaluation | v1.2 | 2/2 | Complete | 2026-02-18 |
+| 17. Classification Polish | v1.2 | 2/2 | Complete | 2026-02-18 |
diff --git a/.planning/phases/15-classification-ingestion-display/15-01-PLAN.md b/.planning/phases/15-classification-ingestion-display/15-01-PLAN.md
new file mode 100644
index 0000000..be35187
--- /dev/null
+++ b/.planning/phases/15-classification-ingestion-display/15-01-PLAN.md
@@ -0,0 +1,256 @@
+---
+phase: 15-classification-ingestion-display
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+ - app/repositories/duckdb_repo.py
+ - app/models/dataset.py
+ - app/models/scan.py
+ - app/ingestion/base_parser.py
+ - app/ingestion/classification_jsonl_parser.py
+ - app/services/folder_scanner.py
+ - app/services/ingestion.py
+ - app/routers/ingestion.py
+ - app/routers/datasets.py
+ - app/routers/annotations.py
+ - app/routers/statistics.py
+autonomous: true
+
+must_haves:
+ truths:
+ - "POST /ingestion/scan on a folder with JSONL + images returns format='classification_jsonl' with correct splits"
+ - "POST /ingestion/import with classification_jsonl splits creates dataset with dataset_type='classification' and annotations with sentinel bbox values (0.0)"
+ - "GET /datasets/{id} returns dataset_type field"
+ - "PATCH /annotations/{id}/category updates category_name for classification label editing"
+ - "Classification annotations have bbox_x=0, bbox_y=0, bbox_w=0, bbox_h=0, area=0 as sentinel values"
+ artifacts:
+ - path: "app/ingestion/classification_jsonl_parser.py"
+ provides: "ClassificationJSONLParser extending BaseParser"
+ contains: "class ClassificationJSONLParser"
+ - path: "app/repositories/duckdb_repo.py"
+ provides: "dataset_type column migration"
+ contains: "dataset_type"
+ - path: "app/services/folder_scanner.py"
+ provides: "Classification JSONL layout detection"
+ contains: "classification_jsonl"
+ key_links:
+ - from: "app/services/folder_scanner.py"
+ to: "app/models/scan.py"
+ via: "ScanResult with format='classification_jsonl'"
+ pattern: "classification_jsonl"
+ - from: "app/services/ingestion.py"
+ to: "app/ingestion/classification_jsonl_parser.py"
+ via: "parser dispatch by format string"
+ pattern: "ClassificationJSONLParser"
+ - from: "app/services/ingestion.py"
+ to: "app/repositories/duckdb_repo.py"
+ via: "stores dataset_type on dataset record"
+ pattern: "dataset_type"
+---
+
+
+Add classification dataset ingestion support to the backend: schema migration, JSONL parser, folder scanner detection, parser dispatch, and annotation category update endpoint.
+
+Purpose: Enable the system to auto-detect, parse, and store classification datasets using the existing ingestion pipeline with sentinel bbox values. This is the backend foundation for all classification display work.
+Output: ClassificationJSONLParser, extended FolderScanner, updated IngestionService with parser dispatch, dataset_type column, and category update endpoint.
+
+
+
+@/Users/ortizeg/.claude/get-shit-done/workflows/execute-plan.md
+@/Users/ortizeg/.claude/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/phases/15-classification-ingestion-display/15-RESEARCH.md
+@app/ingestion/base_parser.py
+@app/ingestion/coco_parser.py
+@app/services/folder_scanner.py
+@app/services/ingestion.py
+@app/repositories/duckdb_repo.py
+@app/models/dataset.py
+@app/models/scan.py
+@app/routers/ingestion.py
+@app/routers/datasets.py
+@app/routers/annotations.py
+
+
+
+
+
+ Task 1: Schema migration, Pydantic models, and ClassificationJSONLParser
+
+ app/repositories/duckdb_repo.py
+ app/models/dataset.py
+ app/models/scan.py
+ app/ingestion/base_parser.py
+ app/ingestion/classification_jsonl_parser.py
+
+
+ **1. Schema migration** (`app/repositories/duckdb_repo.py`):
+ Add after existing ALTER TABLE statements in `initialize_schema()`:
+ ```python
+ self.connection.execute(
+ "ALTER TABLE datasets ADD COLUMN IF NOT EXISTS dataset_type VARCHAR DEFAULT 'detection'"
+ )
+ ```
+
+ **2. Pydantic models**:
+ - `app/models/dataset.py`: Add `dataset_type: str = "detection"` field to `DatasetResponse`.
+ - `app/models/scan.py`: No change needed -- `ScanResult.format` already accepts any string. The format field will carry `"classification_jsonl"` for classification datasets.
+
+ **3. BaseParser update** (`app/ingestion/base_parser.py`):
+ Add `image_dir: str = ""` parameter to `build_image_batches` abstract method signature if not already present (check COCOParser -- it already has it in the concrete method, ensure the ABC matches).
+
+ **4. ClassificationJSONLParser** (`app/ingestion/classification_jsonl_parser.py` -- NEW FILE):
+ Create parser extending BaseParser with:
+
+ - `format_name` property returns `"classification_jsonl"`
+ - `parse_categories(file_path)`: Single pass over JSONL, collect unique labels from flexible keys (`label`, `class`, `category`, `class_name`). Return `{i: name for i, name in enumerate(sorted(labels))}`.
+ - `build_image_batches(file_path, dataset_id, split, image_dir)`: Read JSONL line by line. For each line, extract filename from flexible keys (`filename`, `file_name`, `image`, `path`). Generate sample_id as `f"{split}_{i}"` if split else `str(i)`. Yield DataFrames with columns matching samples table: `id, dataset_id, file_name, width, height, thumbnail_path, split, metadata, image_dir`. Set width=0, height=0 (resolved during thumbnail generation). Use `self.batch_size` for batching.
+ - `build_annotation_batches(file_path, dataset_id, categories, split)`: Read JSONL again. For each line, extract label using same flexible keys. Create annotation row with sentinel bbox values: `bbox_x=0.0, bbox_y=0.0, bbox_w=0.0, bbox_h=0.0, area=0.0, is_crowd=False, source="ground_truth", confidence=None, metadata=None`. Sample IDs must match those from `build_image_batches` (same `f"{split}_{i}"` pattern). Annotation IDs: `f"{split}_ann_{i}"` or `f"ann_{i}"`.
+
+ Handle edge cases:
+ - Skip empty lines
+ - If `label` is an array, emit one annotation row per label (forward-compatible for multi-label)
+ - Use `"unknown"` as fallback label if no label key found
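The flexible-key handling above can be sketched as follows. This is a minimal illustration, not the real parser: the key aliases come from this plan, while the helper names and the list-of-lines input (instead of a file path) are assumptions made for easy testing.

```python
import json

# Key aliases from the plan; helper names are hypothetical.
FILENAME_KEYS = ("filename", "file_name", "image", "path")
LABEL_KEYS = ("label", "class", "category", "class_name")

def extract_labels(record: dict) -> list[str]:
    """Return labels from the first matching key; an array yields one label each."""
    for key in LABEL_KEYS:
        if key in record:
            value = record[key]
            return [str(v) for v in value] if isinstance(value, list) else [str(value)]
    return ["unknown"]  # fallback when no label key is present

def parse_categories(lines: list[str]) -> dict[int, str]:
    """Single pass over JSONL lines: unique labels -> sorted {id: name} mapping."""
    labels: set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        labels.update(extract_labels(json.loads(line)))
    return {i: name for i, name in enumerate(sorted(labels))}

print(parse_categories([
    '{"filename": "a.jpg", "label": "cat"}',
    '{"filename": "b.jpg", "label": ["dog", "cat"]}',
]))  # {0: 'cat', 1: 'dog'}
```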
+
+
+ - `python -c "from app.ingestion.classification_jsonl_parser import ClassificationJSONLParser; p = ClassificationJSONLParser(); print(p.format_name)"` prints `classification_jsonl`
+ - `python -c "from app.models.dataset import DatasetResponse; print(DatasetResponse.model_fields.keys())"` includes `dataset_type`
+ - All existing tests pass: `cd app && python -m pytest tests/ -x -q`
+
+ ClassificationJSONLParser exists with parse_categories, build_image_batches, build_annotation_batches producing sentinel bbox annotations. DatasetResponse includes dataset_type. Schema migration adds dataset_type column.
+
+
+
+ Task 2: FolderScanner detection, IngestionService dispatch, and API endpoints
+
+ app/services/folder_scanner.py
+ app/services/ingestion.py
+ app/routers/ingestion.py
+ app/routers/datasets.py
+ app/routers/annotations.py
+ app/routers/statistics.py
+
+
+ **1. FolderScanner** (`app/services/folder_scanner.py`):
+ Extend `scan()` to detect classification JSONL layouts BEFORE trying COCO layouts (classification is more specific -- a JSONL file is never COCO):
+
+ In the local scan path, add before `_try_layout_b`:
+ ```python
+ splits = self._try_layout_d(Path(resolved), warnings)
+ if not splits:
+ splits = self._try_layout_e(Path(resolved), warnings)
+ if splits:
+ return ScanResult(
+ root_path=resolved,
+ dataset_name=_basename(resolved),
+ format="classification_jsonl",
+ splits=splits,
+ warnings=warnings,
+ )
+ ```
+
+ Add two new layout detectors:
+
+ `_try_layout_d(root, warnings)` -- **Split directories with JSONL + images**:
+ - Use existing `_detect_split_dirs()` to find split dirs
+ - In each split dir, look for `.jsonl` files
+ - For each `.jsonl` file, call `_is_classification_jsonl(file_path)` (new static method)
+ - If valid, count images in the split dir, create DetectedSplit
+ - Return list of splits
+
+ `_try_layout_e(root, warnings)` -- **Flat JSONL at root**:
+ - Look for `.jsonl` files in root (no recursion)
+ - Check if any are classification JSONL via `_is_classification_jsonl()`
+ - Image dir: prefer `images/` subdir, else root itself
+ - Return single-element split list with name=root.name
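A rough sketch of the Layout E detector under the rules above. Names and the dict-shaped `DetectedSplit` are assumptions, and the classification probe is abbreviated to one line (the full `_is_classification_jsonl` described below reads five):

```python
import json
from pathlib import Path

def _looks_like_classification(path: Path) -> bool:
    # Abbreviated probe: first non-empty line has a filename key and a label
    # key, and none of the detection-specific keys.
    try:
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            return bool(
                {"filename", "file_name", "image", "path"} & record.keys()
                and {"label", "class", "category", "class_name"} & record.keys()
                and not {"bbox", "annotations"} & record.keys()
            )
        return False
    except Exception:
        return False

def try_layout_e(root: Path, warnings: list[str]) -> list[dict]:
    """Flat JSONL at root: no recursion; prefer an images/ subdir if present."""
    jsonl_files = list(root.glob("*.jsonl"))
    if not any(_looks_like_classification(p) for p in jsonl_files):
        return []
    image_dir = root / "images" if (root / "images").is_dir() else root
    return [{"name": root.name, "image_dir": str(image_dir)}]
```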
+
+ `_is_classification_jsonl(file_path)` -- **Static method**:
+ - Open file, read first 5 non-empty lines
+ - Parse each as JSON
+ - Return True if line has (`filename` or `file_name` or `image` or `path`) AND (`label` or `class` or `category` or `class_name`) AND NOT (`bbox` or `annotations`)
+ - Catch all exceptions, return False
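As a sketch, with file I/O replaced by an iterable of lines so the heuristic is easy to test (the method name matches the plan; treating all sampled lines as needing to pass the check is one reasonable reading of the rule above):

```python
import json

FILENAME_KEYS = {"filename", "file_name", "image", "path"}
LABEL_KEYS = {"label", "class", "category", "class_name"}
DETECTION_KEYS = {"bbox", "annotations"}

def is_classification_jsonl(lines, max_lines: int = 5) -> bool:
    """True if the first non-empty lines all look like classification rows."""
    try:
        checked = 0
        for raw in lines:
            if not raw.strip():
                continue
            record = json.loads(raw)
            keys = record.keys()  # non-dict rows raise AttributeError -> False
            if not (FILENAME_KEYS & keys and LABEL_KEYS & keys) or DETECTION_KEYS & keys:
                return False
            checked += 1
            if checked >= max_lines:
                break
        return checked > 0
    except Exception:
        return False
```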
+
+ For GCS: Add similar classification detection in `_scan_gcs()` -- check for `.jsonl` files before `.json` files. Use `_is_classification_jsonl_remote()` that reads via `self.storage.open()`.
+
+ **2. IngestionService** (`app/services/ingestion.py`):
+ - Add import for ClassificationJSONLParser at top
+ - In `ingest_with_progress()`, replace hardcoded `COCOParser(batch_size=1000)` with format-based dispatch:
+ ```python
+ if format == "coco":
+ parser = COCOParser(batch_size=1000)
+ elif format == "classification_jsonl":
+ parser = ClassificationJSONLParser(batch_size=1000)
+ else:
+ raise ValueError(f"Unsupported format: {format}")
+ ```
+ - After the dataset INSERT (step 4), for new datasets set dataset_type:
+ ```python
+ dataset_type = "classification" if format == "classification_jsonl" else "detection"
+ ```
+    Include `dataset_type` in the `INSERT INTO datasets` statement. The INSERT uses positional VALUES, so add `dataset_type` after `prediction_count` (or adjust the column list) and keep the column order and value order in sync. The UPDATE path (existing dataset) needs no change -- dataset_type is set on first insert.
+
+ **3. Ingestion router** (`app/routers/ingestion.py`):
+    - The `/ingestion/import` endpoint must pass `format` through to `ingest_with_progress`, which it may not do today. Thread the format from `ImportRequest` or the stored `ScanResult`. The simplest approach: add a `format: str = "coco"` field to the `ImportRequest` model in `app/models/scan.py`.
+ - The router calls `ingest_splits_with_progress()` (not `ingest_with_progress` directly), so the full threading chain is:
+ 1. Add `format: str = "coco"` param to `ingest_splits_with_progress()` signature in `app/services/ingestion.py`
+ 2. Inside `ingest_splits_with_progress()`, pass `format=format` to each `self.ingest_with_progress(...)` call in the loop (replacing the hardcoded `format="coco"` default)
+ 3. In the router's import endpoint, pass `request.format` (or `scan_result.format`) to `ingest_splits_with_progress(format=...)`
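The three-step chain can be pictured with a toy service. Real signatures differ; this only shows the `format` parameter being threaded through instead of hardcoded:

```python
class IngestionService:
    """Toy model of format threading; the real service yields progress events."""

    def ingest_with_progress(self, split: str, format: str = "coco"):
        yield f"{split}:{format}"  # stand-in for actual ingestion work

    def ingest_splits_with_progress(self, splits, format: str = "coco"):
        for split in splits:
            # previously hardcoded format="coco"; now threaded from the router
            yield from self.ingest_with_progress(split=split, format=format)

events = list(IngestionService().ingest_splits_with_progress(
    ["train", "val"], format="classification_jsonl"))
print(events)  # ['train:classification_jsonl', 'val:classification_jsonl']
```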
+
+ **4. Datasets router** (`app/routers/datasets.py`):
+    - Ensure the `GET /datasets` and `GET /datasets/{id}` queries include `dataset_type` in the SELECT (they currently use `SELECT *` or an explicit column list), and map it into `DatasetResponse` in the result handling.
+
+ **5. Annotations router** (`app/routers/annotations.py`):
+ - Add a new endpoint: `PATCH /annotations/{annotation_id}/category` accepting `{"category_name": "new_label"}`. It should UPDATE the annotation's `category_name` in DuckDB. Return 200 with the updated annotation. Use a simple Pydantic model `CategoryUpdateRequest(BaseModel): category_name: str`.
+
+ **6. Statistics router** (`app/routers/statistics.py`):
+ - For classification datasets, the `gt_annotations` stat should reflect "labeled images" (count of distinct sample_ids with GT annotations) rather than raw annotation count. Check `dataset_type` from the datasets table, and if `"classification"`, adjust the query. This is a minor conditional in the existing statistics aggregation.
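The conditional amounts to swapping one aggregate. Query text here is illustrative; table and column names should match the repository:

```python
def gt_annotations_sql(dataset_type: str) -> str:
    """Classification counts labeled images; detection counts raw annotations."""
    count_expr = (
        "COUNT(DISTINCT sample_id)" if dataset_type == "classification" else "COUNT(*)"
    )
    return (
        f"SELECT {count_expr} FROM annotations "
        "WHERE dataset_id = ? AND source = 'ground_truth'"
    )

print(gt_annotations_sql("classification"))
```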
+
+
+ - Create a test JSONL file and verify scanner detection:
+ ```bash
+ mkdir -p /tmp/test_cls/train && echo '{"filename": "a.jpg", "label": "cat"}' > /tmp/test_cls/train/annotations.jsonl && touch /tmp/test_cls/train/a.jpg
+ python -c "
+ from app.services.folder_scanner import FolderScanner
+ s = FolderScanner()
+ r = s.scan('/tmp/test_cls')
+ print(f'format={r.format}, splits={len(r.splits)}, split_name={r.splits[0].name if r.splits else None}')
+ assert r.format == 'classification_jsonl'
+ print('PASS')
+ "
+ ```
+ - All existing tests pass: `cd app && python -m pytest tests/ -x -q`
+ - Server starts without errors: `cd app && timeout 5 python -c "from app.main import app; print('OK')" 2>&1 || true`
+
+ FolderScanner detects classification JSONL layouts (D and E). IngestionService dispatches to ClassificationJSONLParser for classification_jsonl format and stores dataset_type. ImportRequest carries format. PATCH /annotations/{id}/category endpoint exists. Statistics endpoint is classification-aware. GET /datasets returns dataset_type. All existing tests pass.
+
+
+
+
+
+1. Scanner returns format="classification_jsonl" for split-dir JSONL layout
+2. Scanner returns format="classification_jsonl" for flat JSONL layout
+3. Scanner still returns format="coco" for existing COCO layouts (no regression)
+4. Dataset INSERT includes dataset_type="classification" for classification imports
+5. DatasetResponse includes dataset_type field
+6. PATCH /annotations/{id}/category updates category_name
+7. All existing tests pass
+
+
+
+- Classification JSONL folders are auto-detected by the scanner
+- Parser produces correct annotations with sentinel bbox values
+- dataset_type is stored and returned via API
+- Category update endpoint works for classification label editing
+- Zero regressions in existing detection workflow
+
+
+
diff --git a/.planning/phases/15-classification-ingestion-display/15-01-SUMMARY.md b/.planning/phases/15-classification-ingestion-display/15-01-SUMMARY.md
new file mode 100644
index 0000000..e131558
--- /dev/null
+++ b/.planning/phases/15-classification-ingestion-display/15-01-SUMMARY.md
@@ -0,0 +1,131 @@
+---
+phase: 15-classification-ingestion-display
+plan: 01
+subsystem: api, ingestion, database
+tags: [classification, jsonl, parser, duckdb, fastapi, sentinel-bbox]
+
+requires:
+ - phase: 07-evaluation
+ provides: "statistics router and evaluation service"
+ - phase: 02-ingestion
+ provides: "BaseParser, COCOParser, FolderScanner, IngestionService"
+provides:
+ - ClassificationJSONLParser with sentinel bbox values
+ - FolderScanner classification JSONL detection (layouts D and E)
+ - dataset_type column and API field
+ - PATCH /annotations/{id}/category endpoint
+ - Format-based parser dispatch in IngestionService
+ - Classification-aware statistics
+affects: [15-02, 16-classification-evaluation, frontend-classification-display]
+
+tech-stack:
+ added: []
+ patterns: [sentinel-bbox-for-classification, format-based-parser-dispatch, layout-detection-priority]
+
+key-files:
+ created:
+ - app/ingestion/classification_jsonl_parser.py
+ modified:
+ - app/repositories/duckdb_repo.py
+ - app/models/dataset.py
+ - app/models/scan.py
+ - app/models/annotation.py
+ - app/ingestion/base_parser.py
+ - app/services/folder_scanner.py
+ - app/services/ingestion.py
+ - app/routers/ingestion.py
+ - app/routers/datasets.py
+ - app/routers/annotations.py
+ - app/routers/statistics.py
+
+key-decisions:
+ - "Classification JSONL layouts checked before COCO layouts since JSONL is never COCO"
+ - "Sentinel bbox values (all 0.0) for classification annotations to avoid nullable columns"
+ - "Format string threaded through ImportRequest -> ingest_splits_with_progress -> ingest_with_progress"
+ - "Classification gt_annotations stat uses COUNT(DISTINCT sample_id) instead of COUNT(*)"
+
+patterns-established:
+ - "Format dispatch: IngestionService selects parser by format string, extensible for future formats"
+ - "Layout priority: classification-specific layouts tested before generic COCO layouts"
+
+duration: 5min
+completed: 2026-02-18
+---
+
+# Phase 15 Plan 01: Classification Ingestion & Backend Summary
+
+**ClassificationJSONLParser with sentinel bbox values, FolderScanner auto-detection of JSONL layouts, format-based parser dispatch, and category update endpoint**
+
+## Performance
+
+- **Duration:** 5 min
+- **Started:** 2026-02-19T02:13:50Z
+- **Completed:** 2026-02-19T02:18:51Z
+- **Tasks:** 2
+- **Files modified:** 12
+
+## Accomplishments
+- ClassificationJSONLParser that produces annotations with sentinel bbox values (0.0) and supports multi-label via array labels
+- FolderScanner detects classification JSONL in split dirs (Layout D) and flat (Layout E) with GCS support
+- Format-based parser dispatch in IngestionService with dataset_type stored on dataset record
+- PATCH /annotations/{id}/category endpoint for classification label editing
+- Classification-aware statistics (gt_annotations = distinct labeled images)
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Schema migration, Pydantic models, and ClassificationJSONLParser** - `5264e51` (feat)
+2. **Task 2: FolderScanner detection, IngestionService dispatch, and API endpoints** - `8af8a11` (feat)
+
+## Files Created/Modified
+- `app/ingestion/classification_jsonl_parser.py` - New parser extending BaseParser with sentinel bbox annotations
+- `app/repositories/duckdb_repo.py` - dataset_type column migration
+- `app/models/dataset.py` - dataset_type field on DatasetResponse
+- `app/models/scan.py` - format field on ImportRequest
+- `app/models/annotation.py` - CategoryUpdateRequest model
+- `app/ingestion/base_parser.py` - image_dir parameter on build_image_batches ABC
+- `app/services/folder_scanner.py` - Layout D/E detectors, GCS classification detection, _is_classification_jsonl
+- `app/services/ingestion.py` - Format dispatch, dataset_type on INSERT, format threading
+- `app/routers/ingestion.py` - Format passthrough, .jsonl in browse, updated error message
+- `app/routers/datasets.py` - dataset_type in SELECT and DatasetResponse mapping
+- `app/routers/annotations.py` - PATCH /annotations/{id}/category endpoint
+- `app/routers/statistics.py` - Classification-aware gt_annotations aggregation
+
+## Decisions Made
+- Classification JSONL layouts checked before COCO layouts since JSONL files are never COCO (more specific detection first)
+- Used sentinel bbox values (all 0.0) for classification annotations, matching the project decision to avoid nullable columns
+- gt_annotations stat for classification uses COUNT(DISTINCT sample_id) to represent "labeled images" rather than raw annotation count
+- Added .jsonl to browse endpoint extensions for file navigation
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 2 - Missing Critical] Added .jsonl to browse endpoint file extensions**
+- **Found during:** Task 2 (API endpoints)
+- **Issue:** Browse endpoint only showed .json files, users couldn't see .jsonl files when navigating
+- **Fix:** Added ".jsonl" to _BROWSE_EXTENSIONS set
+- **Files modified:** app/routers/ingestion.py
+- **Verification:** Import and app start verified
+- **Committed in:** 8af8a11 (Task 2 commit)
+
+---
+
+**Total deviations:** 1 auto-fixed (1 missing critical)
+**Impact on plan:** Minor addition necessary for classification JSONL usability. No scope creep.
+
+## Issues Encountered
+None
+
+## User Setup Required
+None - no external service configuration required.
+
+## Next Phase Readiness
+- Backend fully supports classification dataset ingestion, ready for frontend display work in Plan 02
+- Parser dispatch is extensible for future formats (YOLO, VOC, etc.)
+- dataset_type field available for frontend to branch display logic
+
+---
+*Phase: 15-classification-ingestion-display*
+*Completed: 2026-02-18*
diff --git a/.planning/phases/15-classification-ingestion-display/15-02-PLAN.md b/.planning/phases/15-classification-ingestion-display/15-02-PLAN.md
new file mode 100644
index 0000000..3c9997c
--- /dev/null
+++ b/.planning/phases/15-classification-ingestion-display/15-02-PLAN.md
@@ -0,0 +1,232 @@
+---
+phase: 15-classification-ingestion-display
+plan: 02
+type: execute
+wave: 2
+depends_on: ["15-01"]
+files_modified:
+ - frontend/src/types/dataset.ts
+ - frontend/src/types/scan.ts
+ - frontend/src/app/datasets/[datasetId]/page.tsx
+ - frontend/src/components/grid/grid-cell.tsx
+ - frontend/src/components/grid/image-grid.tsx
+ - frontend/src/components/detail/sample-modal.tsx
+ - frontend/src/components/detail/annotation-list.tsx
+ - frontend/src/components/stats/stats-dashboard.tsx
+ - frontend/src/components/stats/annotation-summary.tsx
+ - frontend/src/components/ingest/scan-results.tsx
+autonomous: true
+
+must_haves:
+ truths:
+ - "User sees class label badges on grid thumbnails for classification datasets instead of bbox overlays"
+ - "User sees GT class label prominently in sample detail modal with a dropdown to change it"
+ - "Statistics dashboard shows 'Labeled Images' and 'Classes' instead of 'GT Annotations' and 'Categories' for classification datasets"
+ - "Detection-only elements (bbox area histogram, IoU slider) are hidden for classification datasets"
+ - "Scan results page shows 'Classification JSONL' format badge for classification datasets"
+ artifacts:
+ - path: "frontend/src/types/dataset.ts"
+ provides: "Dataset type with dataset_type field"
+ contains: "dataset_type"
+ - path: "frontend/src/components/grid/grid-cell.tsx"
+ provides: "ClassBadge rendering for classification datasets"
+ contains: "ClassBadge"
+ - path: "frontend/src/components/detail/sample-modal.tsx"
+ provides: "Class label display and dropdown editor"
+ contains: "classification"
+ - path: "frontend/src/components/stats/stats-dashboard.tsx"
+ provides: "Detection-only tab hiding for classification"
+ contains: "datasetType"
+ key_links:
+ - from: "frontend/src/app/datasets/[datasetId]/page.tsx"
+ to: "frontend/src/components/grid/image-grid.tsx"
+ via: "datasetType prop threading"
+ pattern: "datasetType"
+ - from: "frontend/src/components/grid/grid-cell.tsx"
+ to: "frontend/src/types/dataset.ts"
+ via: "dataset_type determines badge vs overlay"
+ pattern: "classification"
+ - from: "frontend/src/components/detail/sample-modal.tsx"
+ to: "PATCH /annotations/{id}/category"
+ via: "category update mutation"
+ pattern: "category"
+---
+
+
+Adapt the frontend to display classification datasets appropriately: class label badges on grid, class label with dropdown in detail modal, classification-aware statistics, and format badge in scan results.
+
+Purpose: Users browsing classification datasets see class-appropriate UI instead of detection-oriented displays (no bbox overlays, no area histograms). Classification labels are the primary annotation visual.
+Output: Updated grid, modal, stats, and scan results components with datasetType-aware branching.
+
+
+
+@/Users/ortizeg/.claude/get-shit-done/workflows/execute-plan.md
+@/Users/ortizeg/.claude/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/phases/15-classification-ingestion-display/15-RESEARCH.md
+@.planning/phases/15-classification-ingestion-display/15-01-SUMMARY.md
+@frontend/src/types/dataset.ts
+@frontend/src/types/scan.ts
+@frontend/src/app/datasets/[datasetId]/page.tsx
+@frontend/src/components/grid/grid-cell.tsx
+@frontend/src/components/grid/image-grid.tsx
+@frontend/src/components/detail/sample-modal.tsx
+@frontend/src/components/detail/annotation-list.tsx
+@frontend/src/components/stats/stats-dashboard.tsx
+@frontend/src/components/stats/annotation-summary.tsx
+@frontend/src/components/ingest/scan-results.tsx
+
+
+
+
+
+ Task 1: Types, page threading, grid class badges, and scan results format badge
+
+ frontend/src/types/dataset.ts
+ frontend/src/types/scan.ts
+ frontend/src/app/datasets/[datasetId]/page.tsx
+ frontend/src/components/grid/grid-cell.tsx
+ frontend/src/components/grid/image-grid.tsx
+ frontend/src/components/ingest/scan-results.tsx
+
+
+ **1. TypeScript types**:
+ - `frontend/src/types/dataset.ts`: Add `dataset_type: string;` to the `Dataset` interface (after `prediction_count`). Default is `"detection"`.
+ - `frontend/src/types/scan.ts`: No structural change needed -- `ScanResult.format` is already a string and will carry `"classification_jsonl"`.
+
+ **2. Dataset page prop threading** (`frontend/src/app/datasets/[datasetId]/page.tsx`):
+ - The page fetches the dataset object which now includes `dataset_type`.
+    - Thread `datasetType={dataset.dataset_type}` as a prop to `<ImageGrid>`, `<SampleModal>`, and `<StatsDashboard>` (and any stats sub-components that need it).
+ - Also thread it to any component that renders differently for classification vs detection.
+
+ **3. Grid class badges** (`frontend/src/components/grid/grid-cell.tsx`):
+ - Add `datasetType?: string` prop to GridCell.
+ - When `datasetType === "classification"`:
+      - Do NOT render the annotation overlay (skip bbox rendering entirely)
+ - Instead render a `ClassBadge` inline component:
+        ```tsx
+        function ClassBadge({ label }: { label?: string }) {
+          if (!label) return null;
+          return (
+            <div className="class-badge-wrapper">
+              <span className="class-badge-label">
+                {label}
+              </span>
+            </div>
+          );
+        }
+        ```
+        (Wrapper markup and class names are placeholders; style to match the existing grid overlay badges.)
+ - Extract the GT annotation's `category_name` from the annotations map for this sample: `const gtAnnotation = annotations?.find(a => a.source === "ground_truth");`
+      - Render `<ClassBadge label={gtAnnotation?.category_name} />`
+    - When `datasetType !== "classification"` (or undefined): render the existing annotation overlay as before (no change).
+
+ **4. ImageGrid prop threading** (`frontend/src/components/grid/image-grid.tsx`):
+ - Add `datasetType?: string` prop to ImageGrid.
+    - Pass it through to each `<GridCell>`.
+
+ **5. Scan results format badge** (`frontend/src/components/ingest/scan-results.tsx`):
+ - Where the format is displayed, show "Classification JSONL" when `format === "classification_jsonl"` and "COCO" when `format === "coco"`.
+ - Use the existing badge/styling pattern (likely a colored span). Example: a small badge showing the format type near the dataset name.
+
+
+ - `cd frontend && npx tsc --noEmit` passes without errors
+ - `cd frontend && npm run build` succeeds
+ - Grep confirms: `grep -r "ClassBadge" frontend/src/components/grid/grid-cell.tsx`
+ - Grep confirms: `grep -r "datasetType" frontend/src/app/datasets/*/page.tsx`
+
+ Dataset type flows from API through page to grid. Classification datasets show class label badges instead of bbox overlays. Scan results show format badge. TypeScript compiles cleanly.
+
+
+
+ Task 2: Detail modal class label display/edit and classification-aware statistics
+
+ frontend/src/components/detail/sample-modal.tsx
+ frontend/src/components/detail/annotation-list.tsx
+ frontend/src/components/stats/stats-dashboard.tsx
+ frontend/src/components/stats/annotation-summary.tsx
+
+
+ **1. Sample modal** (`frontend/src/components/detail/sample-modal.tsx`):
+ - Add `datasetType?: string` prop.
+ - Pass `datasetType` down to child components: `` in the render.
+ - When `datasetType === "classification"`:
+ - Show a prominent class label section above or instead of the annotation overlay. Display format:
+ ```
+ Class: [dropdown with all categories]
+ ```
+ - Extract GT annotation: `const gtAnnotation = annotations?.find(a => a.source === "ground_truth");`
+ - If predictions exist, also show: `Predicted: [predicted class label]` with confidence if available.
+ - The class dropdown uses the categories list (from `useFilterFacets` or a categories fetch). On change, call `PATCH /annotations/{gtAnnotation.id}/category` with the new `category_name`.
+      - Create a TanStack Query mutation hook inline or in a hooks file: `usePatchCategory`, which calls ``apiPatch(`/annotations/${annotationId}/category`, { category_name })`` and invalidates the annotation queries on success.
+ - Do NOT render the annotation overlay / bounding box editor for classification datasets. Hide the bbox editing canvas (react-konva editor). The image should display without any overlay.
+ - When `datasetType !== "classification"`: render everything as before (no change).
+
+ **2. Annotation list** (`frontend/src/components/detail/annotation-list.tsx`):
+ - Add `datasetType?: string` prop.
+ - When `datasetType === "classification"`:
+ - Hide the Bounding Box columns (bbox_x, bbox_y, bbox_w, bbox_h) and Area column from the table.
+ - Show: Class, Source, Confidence columns only.
+ - When detection: show all columns as before.
+
+ **3. Stats dashboard** (`frontend/src/components/stats/stats-dashboard.tsx`):
+ - Add `datasetType?: string` prop.
+ - When `datasetType === "classification"`:
+ - Hide the "Evaluation" tab entirely (no IoU-based evaluation for classification in this phase).
+ - Hide the "Error Analysis" sub-panel (detection-specific error categories: TP/FP/FN based on IoU).
+ - Keep: Class Distribution chart, Split Breakdown chart, Summary cards (with relabeled metrics).
+ - Hide: Any bbox area histogram or IoU-related controls.
+ - When detection: show all tabs/panels as before.
+
+ **4. Annotation summary** (`frontend/src/components/stats/annotation-summary.tsx`):
+ - Add `datasetType?: string` prop.
+ - When `datasetType === "classification"`:
+ - Swap summary card labels:
+ - "GT Annotations" -> "Labeled Images"
+ - "Categories" -> "Classes"
+ - Keep "Total Images" and "Predictions" labels as-is.
+ - When detection: show original labels.
+ - Use a conditional card definitions array pattern:
+ ```tsx
+ const cards = datasetType === "classification"
+ ? CLASSIFICATION_CARDS
+ : DETECTION_CARDS;
+ ```
+
+
+ - `cd frontend && npx tsc --noEmit` passes without errors
+ - `cd frontend && npm run build` succeeds
+ - Grep confirms classification branching: `grep -r "classification" frontend/src/components/detail/sample-modal.tsx`
+ - Grep confirms stats adaptation: `grep -r "classification" frontend/src/components/stats/stats-dashboard.tsx`
+ - Grep confirms annotation-summary adaptation: `grep -r "Labeled Images" frontend/src/components/stats/annotation-summary.tsx`
+
+ Detail modal shows class label with editable dropdown for classification datasets. Annotation list hides bbox columns for classification. Stats dashboard hides detection-only tabs/panels. Summary cards use classification-appropriate labels. PATCH mutation for category update wired. All TypeScript compiles cleanly.
+
+
+
+
+
+1. Classification dataset grid shows class label badges (no bbox overlays)
+2. Detail modal shows "Class: [dropdown]" for classification, with working category edit
+3. Stats dashboard hides Evaluation tab and Error Analysis for classification
+4. Summary cards show "Labeled Images" and "Classes" for classification
+5. Annotation list hides bbox/area columns for classification
+6. Scan results show "Classification JSONL" format badge
+7. Detection datasets are completely unaffected (no regression)
+8. TypeScript compiles and Next.js builds succeed
+
+
+
+- Classification datasets display class badges on grid thumbnails
+- Detail modal has class label display with dropdown editor that persists changes
+- Statistics dashboard shows only classification-relevant metrics
+- Detection workflow is unchanged
+- Frontend builds without errors
+
+
+
diff --git a/.planning/phases/15-classification-ingestion-display/15-02-SUMMARY.md b/.planning/phases/15-classification-ingestion-display/15-02-SUMMARY.md
new file mode 100644
index 0000000..3127e00
--- /dev/null
+++ b/.planning/phases/15-classification-ingestion-display/15-02-SUMMARY.md
@@ -0,0 +1,111 @@
+---
+phase: 15-classification-ingestion-display
+plan: 02
+subsystem: frontend, ui
+tags: [classification, react, tanstack-query, dataset-type, class-badge, dropdown-editor]
+
+requires:
+ - phase: 15-classification-ingestion-display
+ plan: 01
+ provides: "dataset_type field, PATCH /annotations/{id}/category, classification-aware statistics"
+provides:
+ - ClassBadge grid overlay for classification datasets
+ - Class label dropdown editor in detail modal with PATCH mutation
+ - Classification-aware statistics dashboard (hidden detection tabs)
+ - Classification-appropriate summary card labels
+ - Format badge in scan results for classification JSONL
+affects: [16-classification-evaluation, frontend-polish]
+
+tech-stack:
+ added: []
+ patterns: [datasetType-prop-threading, isClassification-branching-at-component-boundaries]
+
+key-files:
+ created: []
+ modified:
+ - frontend/src/types/dataset.ts
+ - frontend/src/app/datasets/[datasetId]/page.tsx
+ - frontend/src/components/grid/grid-cell.tsx
+ - frontend/src/components/grid/image-grid.tsx
+ - frontend/src/components/ingest/scan-results.tsx
+ - frontend/src/components/detail/sample-modal.tsx
+ - frontend/src/components/detail/annotation-list.tsx
+ - frontend/src/components/stats/stats-dashboard.tsx
+ - frontend/src/components/stats/annotation-summary.tsx
+
+key-decisions:
+ - "Thread datasetType from page level, branch at component boundaries with isClassification flag"
+ - "Hide entire edit toolbar and annotation editor for classification (no bbox editing needed)"
+ - "Hide Evaluation, Error Analysis, Worst Images, and Intelligence tabs for classification (IoU-based)"
+ - "Keep Near Duplicates tab visible for classification (embedding-based, not IoU-dependent)"
+
+patterns-established:
+ - "datasetType prop threading: page fetches dataset, threads type to all children"
+ - "isClassification branching: components check datasetType === 'classification' to show/hide detection UI"
+
+duration: 5min
+completed: 2026-02-18
+---
+
+# Phase 15 Plan 02: Classification Frontend Display Summary
+
+**Classification-aware grid badges, modal class dropdown editor, and detection-tab hiding via datasetType prop threading**
+
+## Performance
+
+- **Duration:** 5 min
+- **Started:** 2026-02-19T02:20:50Z
+- **Completed:** 2026-02-19T02:25:44Z
+- **Tasks:** 2
+- **Files modified:** 9
+
+## Accomplishments
+- Grid shows class label badges instead of bbox overlays for classification datasets
+- Detail modal displays class dropdown editor with PATCH category mutation and predicted class with confidence
+- Statistics dashboard hides detection-only tabs (Evaluation, Error Analysis, Worst Images, Intelligence)
+- Summary cards show "Labeled Images" and "Classes" labels for classification datasets
+- Annotation list hides bbox and area columns for classification
+- Scan results show "Classification JSONL" format badge
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Types, page threading, grid class badges, and scan results format badge** - `b96ce5e` (feat)
+2. **Task 2: Detail modal class label display/edit and classification-aware statistics** - `e7ad776` (feat)
+
+## Files Created/Modified
+- `frontend/src/types/dataset.ts` - Added dataset_type field to Dataset interface
+- `frontend/src/app/datasets/[datasetId]/page.tsx` - Thread datasetType prop to ImageGrid, SampleModal, StatsDashboard
+- `frontend/src/components/grid/grid-cell.tsx` - ClassBadge component, classification branching in overlay
+- `frontend/src/components/grid/image-grid.tsx` - datasetType prop acceptance and passthrough
+- `frontend/src/components/ingest/scan-results.tsx` - "Classification JSONL" friendly format badge
+- `frontend/src/components/detail/sample-modal.tsx` - Class dropdown editor, PATCH mutation, hide bbox editor/toolbar
+- `frontend/src/components/detail/annotation-list.tsx` - Hide bbox/area columns for classification
+- `frontend/src/components/stats/stats-dashboard.tsx` - Hide detection-only tabs for classification
+- `frontend/src/components/stats/annotation-summary.tsx` - Classification card labels (Labeled Images, Classes)
+
+## Decisions Made
+- Thread datasetType from page level, branch at component boundaries -- consistent pattern, easy to test
+- Hide entire edit toolbar and annotation editor for classification (no bounding boxes to edit)
+- Hide Evaluation/Error Analysis/Worst Images/Intelligence tabs for classification (all IoU-based detection features)
+- Keep Near Duplicates tab visible for classification since it uses embeddings, not IoU
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+None
+
+## User Setup Required
+None - no external service configuration required.
+
+## Next Phase Readiness
+- Frontend fully supports classification dataset display, ready for classification evaluation in Phase 16
+- datasetType prop threading pattern established for any future dataset-type-specific UI
+- PATCH /annotations/{id}/category wired end-to-end for label editing
+
+---
+*Phase: 15-classification-ingestion-display*
+*Completed: 2026-02-18*
diff --git a/.planning/phases/15-classification-ingestion-display/15-RESEARCH.md b/.planning/phases/15-classification-ingestion-display/15-RESEARCH.md
new file mode 100644
index 0000000..3d60dc5
--- /dev/null
+++ b/.planning/phases/15-classification-ingestion-display/15-RESEARCH.md
@@ -0,0 +1,555 @@
+# Phase 15: Classification Ingestion & Display - Research
+
+**Researched:** 2026-02-18
+**Domain:** Classification dataset ingestion, schema extension, frontend display adaptation
+**Confidence:** HIGH (this is internal codebase extension, not new technology)
+
+## Summary
+
+Phase 15 adds classification dataset support to a codebase currently built exclusively for object detection. The work spans four layers: (1) a new JSONL annotation parser and format auto-detection in the ingestion pipeline, (2) schema changes to track dataset type and store classification annotations using sentinel bbox values, (3) frontend grid/modal display changes to show class labels instead of bounding boxes, and (4) statistics dashboard adaptation to hide detection-only metrics.
+
+The codebase is well-structured with clear separation of concerns -- parsers in `app/ingestion/`, Pydantic models in `app/models/`, services in `app/services/`, and component-per-feature in `frontend/src/components/`. The existing `BaseParser` ABC and streaming batch pattern provide a natural extension point for a classification JSONL parser. The sentinel bbox approach (bbox values = 0.0) means the annotations table schema is untouched, avoiding null guards in 30+ SQL queries and frontend components.
+
+**Primary recommendation:** Extend the existing parser registry pattern with a `ClassificationJSONLParser` that produces annotation rows with sentinel bbox values (0.0), add `dataset_type VARCHAR DEFAULT 'detection'` to the datasets table, and use the `datasetType` prop threaded from the page level to branch rendering at component boundaries (grid cell, sample modal, stats dashboard).
+
+## Standard Stack
+
+### Core (already in use -- no new dependencies)
+
+| Library | Purpose | Status |
+|---------|---------|--------|
+| DuckDB | Schema storage, SQL queries | In use |
+| FastAPI | API layer | In use |
+| Pydantic | Request/response models | In use |
+| ijson | Streaming JSON parsing | In use (COCO parser) |
+| pandas | DataFrame batch construction | In use |
+| Next.js + React | Frontend framework | In use |
+| Zustand | State management | In use |
+| TanStack Query | Data fetching/caching | In use |
+| Recharts | Charts (class distribution) | In use |
+
+### Supporting (no new libraries needed)
+
+This phase requires zero new dependencies. Classification JSONL files are simple enough to parse with Python's built-in `json` module line-by-line, or with the existing `ijson` dependency if streaming is desired. The frontend changes are pure React component branching.
+
+### Alternatives Considered
+
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| Sentinel bbox (0.0) | Nullable bbox columns | Nullable requires 30+ null guards in SQL queries, filter builder, evaluation, frontend annotation types. Sentinel avoids this entirely. |
+| Separate classification_annotations table | Shared annotations table with sentinels | Separate table would require duplicating all annotation queries, filter logic, statistics queries. Shared table is simpler. |
+| Dynamic format detection at query time | Stored `dataset_type` column | Stored column is a single lookup; dynamic detection requires scanning annotations for non-zero bboxes every time. |
+
+## Architecture Patterns
+
+### Recommended Change Map
+
+```
+Backend:
+ app/ingestion/
+ classification_jsonl_parser.py # NEW: ClassificationJSONLParser
+ app/services/
+ folder_scanner.py # MODIFY: detect JSONL + images layout
+ ingestion.py # MODIFY: dispatch to parser by format
+ evaluation.py # LEAVE (classification eval is Phase 16+)
+ app/repositories/
+ duckdb_repo.py # MODIFY: add dataset_type column
+ app/models/
+ dataset.py # MODIFY: add dataset_type field
+ scan.py # MODIFY: format can be "classification_jsonl"
+ annotation.py # NO CHANGE (sentinel bbox values fit existing schema)
+ statistics.py # POSSIBLY MODIFY: add labeled_images_count
+ app/routers/
+ ingestion.py # MODIFY: error message wording
+ statistics.py # MODIFY: classification-aware summary stats
+
+Frontend:
+ types/dataset.ts # MODIFY: add dataset_type field
+ types/scan.ts # MODIFY: format can include "classification_jsonl"
+ app/datasets/[datasetId]/page.tsx # MODIFY: thread datasetType prop
+ components/grid/grid-cell.tsx # MODIFY: show class badge instead of bbox overlay
+ components/grid/annotation-overlay.tsx # NO CHANGE (just not rendered for classification)
+ components/detail/sample-modal.tsx # MODIFY: show class label + dropdown
+ components/detail/annotation-list.tsx # MODIFY: hide bbox columns for classification
+ components/stats/stats-dashboard.tsx # MODIFY: hide detection-only tabs
+ components/stats/annotation-summary.tsx # MODIFY: classification-appropriate labels
+ components/ingest/scan-results.tsx # MODIFY: show format badge for classification
+```
+
+### Pattern 1: Sentinel BBox Values for Classification
+
+**What:** Classification annotations use bbox_x=0, bbox_y=0, bbox_w=0, bbox_h=0, area=0 as sentinel values. The `category_name` field carries the class label. One annotation per sample (for single-label classification).
+
+**When to use:** When inserting classification annotations into the shared annotations table.
+
+**Example:**
+```python
+# Classification annotation row (sentinel bboxes)
+{
+ "id": str(uuid.uuid4()),
+ "dataset_id": dataset_id,
+ "sample_id": sample_id,
+ "category_name": "dog", # The class label
+ "bbox_x": 0.0, # Sentinel
+ "bbox_y": 0.0, # Sentinel
+ "bbox_w": 0.0, # Sentinel
+ "bbox_h": 0.0, # Sentinel
+ "area": 0.0, # Sentinel
+ "is_crowd": False,
+ "source": "ground_truth",
+ "confidence": None,
+ "metadata": None,
+}
+```
+
+### Pattern 2: Parser Dispatch by Format
+
+**What:** The IngestionService currently hardcodes `COCOParser()`. Extend to dispatch by format string.
+
+**When to use:** When `ingest_with_progress` is called.
+
+**Example:**
+```python
+# In IngestionService.ingest_with_progress():
+if format == "coco":
+ parser = COCOParser(batch_size=1000)
+elif format == "classification_jsonl":
+ parser = ClassificationJSONLParser(batch_size=1000)
+else:
+ raise ValueError(f"Unsupported format: {format}")
+```
+
+### Pattern 3: Format Auto-Detection in FolderScanner
+
+**What:** The FolderScanner currently only detects COCO JSON files. Extend to detect classification JSONL files.
+
+**When to use:** During `FolderScanner.scan()`.
+
+**Detection heuristic:** Look for `.jsonl` files in the directory tree. A classification JSONL file contains lines like:
+```json
+{"filename": "image001.jpg", "label": "dog"}
+```
+Peek at the first few lines: if they parse as JSON with `filename` and `label` keys (no `bbox`/`annotations` key), classify as `classification_jsonl`.
+
+**Example:**
+```python
+@staticmethod
+def _is_classification_jsonl(file_path: Path) -> bool:
+    """Check whether a file looks like a classification JSONL annotation file.
+
+    Peeks at up to the first 5 non-blank lines; each must parse as JSON with
+    a label key and a filename key, and no detection-style keys.
+    """
+    try:
+        saw_valid_line = False
+        with open(file_path) as f:
+            for i, line in enumerate(f):
+                if i >= 5:
+                    break
+                line = line.strip()
+                if not line:
+                    continue
+                obj = json.loads(line)
+                if "label" not in obj or ("filename" not in obj and "file_name" not in obj):
+                    return False
+                if "bbox" in obj or "annotations" in obj:
+                    return False
+                saw_valid_line = True
+        return saw_valid_line
+    except Exception:
+        return False
+```
+
+### Pattern 4: datasetType Prop Threading
+
+**What:** The dataset page fetches `dataset.dataset_type` and threads it as a prop to child components. Components branch at their boundary rather than deep inside.
+
+**When to use:** Any component whose rendering differs between detection and classification.
+
+**Example:**
+```tsx
+// page.tsx -- thread the dataset type down to the grid
+<ImageGrid datasetType={dataset.dataset_type} />
+
+// grid-cell.tsx -- branch at the component boundary
+if (datasetType === "classification") {
+  // Show class label badge instead of AnnotationOverlay
+  const gtAnnotation = annotations.find(a => a.source === "ground_truth");
+  return <ClassBadge label={gtAnnotation?.category_name} />;
+} else {
+  return <AnnotationOverlay annotations={annotations} />;
+}
+```
+
+### Anti-Patterns to Avoid
+
+- **Checking dataset_type deep inside components:** Branch at component boundaries (GridCell, SampleModal, StatsDashboard), not inside utility functions or hooks that are shared across both types.
+- **Adding nullable bbox columns:** The sentinel approach was a prior decision. Do not add nullable bbox columns to the annotations table.
+- **Modifying the existing 560-line evaluation.py:** Classification evaluation is separate (~50 lines, Phase 16+). Do not touch `evaluation.py` in this phase.
+- **Storing dataset_type on samples:** It belongs on the datasets table -- one type per dataset, not per sample.
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| JSONL parsing | Custom streaming parser | Python `json.loads()` per line | JSONL files are small enough (one line per image), no need for ijson streaming |
+| Image dimension reading | Manual PIL/cv2 calls | Existing `ImageService` | Already handles dimension extraction during thumbnail generation |
+| SQL schema migration | Migration framework | `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` | Already established pattern (see `duckdb_repo.py` lines 84-103) |
+| Frontend format badge | Custom badge component | Tailwind utility classes inline | Consistent with existing scan-results.tsx `splitColor()` pattern |
+
+**Key insight:** This phase is mostly wiring -- connecting an existing architecture to a new data shape. The risk is not technical difficulty but completeness: ensuring every SQL query, every frontend component, and every display path handles the classification case.
+
+## Common Pitfalls
+
+### Pitfall 1: JSONL Format Ambiguity
+
+**What goes wrong:** Different classification tools produce different JSONL schemas. Some use `"label"`, others `"class"`, `"category"`, or `"class_name"`. Some use `"filename"`, others `"file_name"`, `"image"`, or `"path"`.
+
+**Why it happens:** No industry standard for classification JSONL format.
+
+**How to avoid:** Support the most common key variants in the parser. Normalize on read:
+```python
+filename = obj.get("filename") or obj.get("file_name") or obj.get("image") or obj.get("path", "")
+label = obj.get("label") or obj.get("class") or obj.get("category") or obj.get("class_name", "unknown")
+```
+
+**Warning signs:** Parser silently produces zero annotations because key names don't match.
+
+### Pitfall 2: Classification Samples Without Annotations
+
+**What goes wrong:** If an image file exists in the directory but has no line in the JSONL, it gets inserted as a sample with zero annotations. The grid shows it with no badge, confusingly.
+
+**Why it happens:** JSONL may not list every image (unlabeled images are common in classification datasets).
+
+**How to avoid:** During ingestion, only insert samples that appear in the JSONL file. Or, insert all images but mark unlabeled ones clearly in the UI. Decision: follow the COCO parser pattern -- only insert samples listed in the annotation file.
+
+**Warning signs:** Image count in dataset doesn't match directory image count.
+
+### Pitfall 3: Detection-Only UI Elements Leaking Through
+
+**What goes wrong:** Classification datasets show bbox area histograms, IoU sliders, or empty bounding box overlays with sentinel values rendered as tiny dots at (0,0).
+
+**Why it happens:** Forgetting to gate UI elements on `datasetType`.
+
+**How to avoid:** Audit every component that references bbox values or detection-specific concepts:
+- `AnnotationOverlay` -- skip rendering when `datasetType === "classification"`
+- `annotation-list.tsx` -- hide Bounding Box and Area columns
+- `evaluation-panel.tsx` -- hide IoU slider, use accuracy instead of mAP
+- `stats-dashboard.tsx` -- rename "GT Annotations" to "Labeled Images"
+- `annotation-summary.tsx` -- swap card labels
+
+**Warning signs:** Sentinel bbox values (0,0,0,0) rendered visually anywhere.
+
+### Pitfall 4: Category Ingestion for Classification
+
+**What goes wrong:** The COCO parser extracts categories from a dedicated `categories` array. Classification JSONL files don't have one -- categories are implicitly defined by the set of unique labels.
+
+**Why it happens:** Different format, different category discovery mechanism.
+
+**How to avoid:** The ClassificationJSONLParser must do a first pass to collect unique labels and assign sequential category IDs, then a second pass to emit annotation batches. Alternatively, a single pass can register labels as they are encountered, at the cost of category IDs reflecting file order rather than sorted label names.
+
+**Warning signs:** Empty categories table for classification datasets, breaking filter facets.
+
+### Pitfall 5: Multi-Label Classification Collision
+
+**What goes wrong:** If a future dataset has multiple labels per image, the single-annotation-per-sample assumption breaks.
+
+**Why it happens:** Single-label is the common case, but multi-label exists.
+
+**How to avoid:** Design the JSONL parser to handle `"label": ["dog", "outdoor"]` by emitting multiple annotation rows per sample. The sentinel bbox approach supports this naturally (each annotation row has its own category_name). But for Phase 15, scope to single-label only and document the multi-label extension path.
+
+**Warning signs:** JSONL lines with array-valued `label` fields.
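
If array labels do appear, a tiny stdlib-only normalizer (name hypothetical) lets the same annotation-emission loop handle both shapes, implementing the one-row-per-label extension path described above:

```python
def normalize_labels(raw: object) -> list[str]:
    """Normalize a JSONL label field: a bare string becomes a one-element
    list; an array yields one entry per label (multi-label support)."""
    if isinstance(raw, list):
        return [str(label) for label in raw]
    return [str(raw)]
```

With this, `"label": "dog"` and `"label": ["dog", "outdoor"]` both flow through the same loop, and single-label remains the degenerate case.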
+
+## Code Examples
+
+### Classification JSONL Parser Structure
+
+```python
+class ClassificationJSONLParser(BaseParser):
+ """Parse a JSONL file where each line maps filename -> class label."""
+
+ @property
+ def format_name(self) -> str:
+ return "classification_jsonl"
+
+ def parse_categories(self, file_path: Path) -> dict[int, str]:
+ """First pass: collect unique labels -> sequential IDs."""
+ labels: set[str] = set()
+ with open(file_path) as f:
+ for line in f:
+ line = line.strip()
+ if not line:
+ continue
+ obj = json.loads(line)
+                label = obj.get("label") or obj.get("class") or obj.get("category") or obj.get("class_name", "unknown")
+ labels.add(label)
+ return {i: name for i, name in enumerate(sorted(labels))}
+
+ def build_image_batches(
+ self, file_path: Path, dataset_id: str, split: str | None = None, image_dir: str = ""
+ ) -> Iterator[pd.DataFrame]:
+ """Yield sample rows from JSONL. Each line = one image."""
+        batch: list[dict] = []
+        with open(file_path) as f:
+            for i, line in enumerate(f):
+                line = line.strip()
+                if not line:
+                    continue
+                obj = json.loads(line)
+                filename = obj.get("filename") or obj.get("file_name", "")
+                sample_id = f"{split}_{i}" if split else str(i)
+                batch.append({
+                    "id": sample_id,
+                    "dataset_id": dataset_id,
+                    "file_name": filename,
+                    "width": obj.get("width", 0),
+                    "height": obj.get("height", 0),
+                    "thumbnail_path": None,
+                    "split": split,
+                    "metadata": None,
+                    "image_dir": image_dir,
+                })
+                if len(batch) >= self.batch_size:
+                    yield pd.DataFrame(batch)
+                    batch = []
+        if batch:
+            yield pd.DataFrame(batch)
+
+ def build_annotation_batches(
+ self, file_path: Path, dataset_id: str, categories: dict[int, str], split: str | None = None
+ ) -> Iterator[pd.DataFrame]:
+ """Yield annotation rows with sentinel bbox values."""
+        batch: list[dict] = []
+        with open(file_path) as f:
+            for i, line in enumerate(f):
+                line = line.strip()
+                if not line:
+                    continue
+                obj = json.loads(line)
+                label = obj.get("label") or obj.get("class") or obj.get("category") or obj.get("class_name", "unknown")
+                sample_id = f"{split}_{i}" if split else str(i)
+                ann_id = f"{split}_ann_{i}" if split else f"ann_{i}"
+                batch.append({
+                    "id": ann_id,
+                    "dataset_id": dataset_id,
+                    "sample_id": sample_id,
+                    "category_name": label,
+                    "bbox_x": 0.0,
+                    "bbox_y": 0.0,
+                    "bbox_w": 0.0,
+                    "bbox_h": 0.0,
+                    "area": 0.0,
+                    "is_crowd": False,
+                    "source": "ground_truth",
+                    "confidence": None,
+                    "metadata": None,
+                })
+                if len(batch) >= self.batch_size:
+                    yield pd.DataFrame(batch)
+                    batch = []
+        if batch:
+            yield pd.DataFrame(batch)
+```
+
+### Schema Migration (DuckDB)
+
+```python
+# In duckdb_repo.py initialize_schema():
+self.connection.execute(
+ "ALTER TABLE datasets ADD COLUMN IF NOT EXISTS dataset_type VARCHAR DEFAULT 'detection'"
+)
+```
+
+### Frontend Class Badge (Grid Cell)
+
+```tsx
+// Inside GridCell, replacing AnnotationOverlay for classification datasets.
+// (className values are illustrative)
+function ClassBadge({ label }: { label?: string }) {
+  if (!label) return null;
+  return (
+    <div className="absolute bottom-1 left-1">
+      <span className="rounded bg-black/70 px-1.5 py-0.5 text-xs text-white">
+        {label}
+      </span>
+    </div>
+  );
+}
+```
+
+### Frontend Class Label in Detail Modal
+
+```tsx
+// In SampleModal, for classification datasets:
+// Show GT class label prominently with a dropdown to change it.
+// (Markup is a sketch; handler names are illustrative.)
+<div className="flex items-center gap-2">
+  <span className="font-medium">Class:</span>
+  <select
+    value={gtAnnotation?.category_name}
+    onChange={(e) => updateCategoryMutation.mutate(e.target.value)}
+  >
+    {categories.map((c) => (
+      <option key={c} value={c}>{c}</option>
+    ))}
+  </select>
+</div>
+```
+
+### Classification-Aware Statistics Summary
+
+```tsx
+// In AnnotationSummary, swap card definitions based on datasetType:
+const DETECTION_CARDS = [
+ { key: "total_images", label: "Total Images" },
+ { key: "gt_annotations", label: "GT Annotations" },
+ { key: "pred_annotations", label: "Predictions" },
+ { key: "total_categories", label: "Categories" },
+];
+
+const CLASSIFICATION_CARDS = [
+ { key: "total_images", label: "Total Images" },
+ { key: "gt_annotations", label: "Labeled Images" },
+ { key: "pred_annotations", label: "Predictions" },
+ { key: "total_categories", label: "Classes" },
+];
+```
+
+## Existing Codebase Surface Area
+
+### Files That MUST Change
+
+| File | Change | Reason |
+|------|--------|--------|
+| `app/repositories/duckdb_repo.py` | Add `dataset_type` column | INGEST-04 |
+| `app/ingestion/classification_jsonl_parser.py` | NEW file | INGEST-01 |
+| `app/services/folder_scanner.py` | Detect JSONL layouts | INGEST-02 |
+| `app/services/ingestion.py` | Parser dispatch, store dataset_type | INGEST-01, INGEST-02 |
+| `app/models/dataset.py` | Add `dataset_type` to response | INGEST-04 |
+| `app/models/scan.py` | Format can be `classification_jsonl` | INGEST-02 |
+| `app/routers/datasets.py` | Return dataset_type in responses | INGEST-04 |
+| `frontend/src/types/dataset.ts` | Add `dataset_type` field | INGEST-04 |
+| `frontend/src/types/scan.ts` | Format type update | INGEST-02 |
+| `frontend/src/app/datasets/[datasetId]/page.tsx` | Thread `datasetType` prop | DISP-01 through DISP-04 |
+| `frontend/src/components/grid/grid-cell.tsx` | Show class badge for classification | DISP-01 |
+| `frontend/src/components/detail/sample-modal.tsx` | Show class label + dropdown | DISP-02, DISP-03 |
+| `frontend/src/components/detail/annotation-list.tsx` | Hide bbox columns for classification | DISP-02 |
+| `frontend/src/components/stats/stats-dashboard.tsx` | Hide detection-only tabs | DISP-04 |
+| `frontend/src/components/stats/annotation-summary.tsx` | Classification-appropriate labels | DISP-04 |
+| `frontend/src/components/ingest/scan-results.tsx` | Format badge for classification | INGEST-02 |
+
+### Files That SHOULD NOT Change
+
+| File | Reason |
+|------|--------|
+| `app/services/evaluation.py` | Detection evaluation untouched; classification eval is separate (future phase) |
+| `app/ingestion/coco_parser.py` | COCO format unchanged |
+| `app/ingestion/prediction_parser.py` | Detection predictions unchanged |
+| `app/services/error_analysis.py` | Detection-specific error categories |
+| `app/ingestion/detection_annotation_parser.py` | Detection predictions unchanged |
+
+### Backend API Changes Needed
+
+1. **New annotation update endpoint for category_name** (DISP-03): Currently `PUT /annotations/{id}` only updates bbox. Need to add `PATCH /annotations/{id}/category` or extend the existing PUT to accept `category_name`.
+
+2. **Statistics endpoint** (DISP-04): The `GET /datasets/{id}/statistics` endpoint returns detection-centric summary stats. For classification datasets, `gt_annotations` should reflect "labeled images" (distinct sample_ids with GT annotations) rather than raw annotation count.
+
+3. **Dataset response**: `GET /datasets/{id}` needs to include `dataset_type`.
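
For point 2, "labeled images" means distinct samples carrying a ground-truth annotation. A pure-Python sketch of the intended semantics (in DuckDB this would be `COUNT(DISTINCT sample_id)` with a `source = 'ground_truth'` filter; the helper name is hypothetical):

```python
def labeled_image_count(annotations: list[dict]) -> int:
    """Count distinct samples with at least one ground-truth annotation.

    Mirrors SELECT COUNT(DISTINCT sample_id) ... WHERE source = 'ground_truth'.
    """
    return len({a["sample_id"] for a in annotations if a["source"] == "ground_truth"})
```

For single-label classification this equals the GT annotation count, but computing it as a distinct-sample count stays correct if multi-label support lands later.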
+
+### Classification JSONL Expected Format
+
+```jsonl
+{"filename": "img001.jpg", "label": "cat"}
+{"filename": "img002.jpg", "label": "dog"}
+{"filename": "img003.jpg", "label": "cat"}
+```
+
+Alternative accepted keys:
+- `filename` / `file_name` / `image` / `path`
+- `label` / `class` / `category` / `class_name`
+- Optional: `width`, `height`, `confidence`, `split`
+
+### Folder Layouts to Detect
+
+**Layout D (Classification JSONL):** Split directories with JSONL + images:
+```
+dataset/
+ train/
+ annotations.jsonl
+ img001.jpg
+ img002.jpg
+ val/
+ annotations.jsonl
+ img003.jpg
+```
+
+**Layout E (Flat Classification):** Single JSONL at root:
+```
+dataset/
+ labels.jsonl
+ images/
+ img001.jpg
+ img002.jpg
+```
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| Hard-coded COCO parser | Parser dispatch by format string | Phase 15 | Enables multi-format support |
+| No dataset_type tracking | `dataset_type` column on datasets | Phase 15 | Frontend can branch rendering |
+| Detection-only statistics | Type-aware statistics | Phase 15 | Classification users see relevant metrics |
+
+## Open Questions
+
+1. **Image dimensions for classification JSONL**
+ - What we know: COCO JSON includes width/height per image. Classification JSONL typically doesn't.
+ - What's unclear: Should the parser read image dimensions from disk during ingestion, or store 0/0 and resolve later during thumbnail generation?
+ - Recommendation: Read dimensions during thumbnail generation (existing `ImageService` path). Store 0/0 initially if not present in JSONL. The grid cell uses `object-cover` which doesn't need dimensions. The annotation overlay (not used for classification) needs dimensions. Detail modal image loads at full-res naturally.
+
+2. **Multi-label classification**
+ - What we know: Phase 15 scopes to single-label. Multi-label is a future extension.
+ - What's unclear: Should the JSONL parser error on array labels or silently take the first?
+ - Recommendation: If `label` is an array, emit one annotation row per label. This is forward-compatible and costs nothing with the sentinel bbox approach.
+
+3. **Classification prediction import**
+ - What we know: Detection predictions use `DetectionAnnotationParser` or `PredictionParser`. Classification predictions would be a different format.
+ - What's unclear: Is classification prediction import in scope for Phase 15?
+ - Recommendation: Out of scope. Phase 15 focuses on GT ingestion and display. Classification prediction import + evaluation are natural follow-ups.
+
+4. **Annotation update for category_name change (DISP-03)**
+ - What we know: Current `AnnotationUpdate` model only has bbox fields. Current `PUT /annotations/{id}` only updates bbox.
+ - What's unclear: Should we extend the existing endpoint or create a new one?
+ - Recommendation: Add a new `PATCH /annotations/{id}/category` endpoint or extend `AnnotationUpdate` to include optional `category_name`. Extending is simpler since the existing pattern already handles updates. A new field `category_name: str | None = None` on AnnotationUpdate, applied when present, is clean.
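
The "applied when present" semantics can be sketched with plain dicts (helper name hypothetical; the real logic would sit in the annotation update path, with `AnnotationUpdate` providing the validated fields):

```python
UPDATABLE_FIELDS = {"bbox_x", "bbox_y", "bbox_w", "bbox_h", "category_name"}

def apply_annotation_update(row: dict, update: dict) -> dict:
    """Merge an update into an annotation row, skipping unset (None) fields
    and ignoring anything outside the updatable whitelist."""
    changes = {k: v for k, v in update.items() if k in UPDATABLE_FIELDS and v is not None}
    return {**row, **changes}
```

This keeps a classification label edit (`category_name` only) and a detection bbox edit flowing through one code path.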
+
+## Sources
+
+### Primary (HIGH confidence)
+- **Codebase analysis** -- direct file reads of all affected files listed above
+ - `app/ingestion/base_parser.py` -- BaseParser ABC interface
+ - `app/ingestion/coco_parser.py` -- reference parser implementation
+ - `app/repositories/duckdb_repo.py` -- schema and migration pattern
+ - `app/services/ingestion.py` -- ingestion orchestration
+ - `app/services/folder_scanner.py` -- format detection heuristics
+ - `app/services/evaluation.py` -- evaluation pipeline (560 lines, leave alone)
+ - `app/models/` -- all Pydantic models
+ - `app/routers/` -- all API endpoints
+ - `frontend/src/components/` -- all display components
+ - `frontend/src/types/` -- all TypeScript type definitions
+ - `frontend/src/stores/` -- Zustand stores (filter, UI, ingest)
+
+### Secondary (MEDIUM confidence)
+- Prior decisions from phase description: sentinel bbox values, separate classification eval function, datasetType prop threading, parser registry
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH -- no new dependencies, all existing libraries
+- Architecture: HIGH -- extending well-established patterns in the codebase
+- Pitfalls: HIGH -- derived from direct codebase analysis, not external sources
+- Code examples: HIGH -- based on actual codebase patterns and verified file contents
+
+**Research date:** 2026-02-18
+**Valid until:** 2026-03-18 (stable -- internal codebase patterns, no external dependency risk)
diff --git a/.planning/phases/15-classification-ingestion-display/15-VERIFICATION.md b/.planning/phases/15-classification-ingestion-display/15-VERIFICATION.md
new file mode 100644
index 0000000..58773d9
--- /dev/null
+++ b/.planning/phases/15-classification-ingestion-display/15-VERIFICATION.md
@@ -0,0 +1,98 @@
+---
+phase: 15-classification-ingestion-display
+verified: 2026-02-19T02:31:00Z
+status: passed
+score: 5/5 must-haves verified
+re_verification: false
+---
+
+# Phase 15: Classification Ingestion & Display Verification Report
+
+**Phase Goal:** Users can import, browse, and inspect classification datasets with the same ease as detection datasets
+**Verified:** 2026-02-19T02:31:00Z
+**Status:** passed
+**Re-verification:** No — initial verification
+
+## Goal Achievement
+
+### Observable Truths
+
+| # | Truth | Status | Evidence |
+|----|-------|--------|----------|
+| 1 | User can point the ingestion wizard at a folder with JSONL annotations and images, and the system auto-detects it as a classification dataset | VERIFIED | `FolderScanner._try_layout_d` and `_try_layout_e` detect JSONL layouts before COCO; `_is_classification_jsonl` heuristic reads first 5 lines for filename+label keys. GCS path also supported via `_scan_gcs_classification`. `ScanResult.format="classification_jsonl"` returned. |
+| 2 | User can import multi-split classification datasets (train/valid/test) in a single operation, just like detection datasets | VERIFIED | `ImportRequest.format` field added (default `"coco"`, accepts `"classification_jsonl"`). `ingest_splits_with_progress(format=request.format)` threads format into per-split calls. `IngestionService` dispatches to `ClassificationJSONLParser` by format string. `dataset_type="classification"` stored in INSERT. |
+| 3 | User sees class label badges on grid thumbnails instead of bounding box overlays when browsing a classification dataset | VERIFIED | `GridCell` accepts `datasetType?: string`; when `"classification"` it renders `ClassBadge` with the GT `category_name` instead of `AnnotationOverlay`. `ImageGrid` threads `datasetType` through. Page threads `dataset.dataset_type` to `ImageGrid`. |
+| 4 | User sees GT class label prominently in the sample detail modal and can change it via a dropdown | VERIFIED | `SampleModal` shows `{isClassification &&