Welcome to the TextToSQL Studio workspace on branch stav/profiling-and-advanced-tables-managing.
This document is the required reading for any AI coding agent or developer joining this project. Read it completely before writing any code.
- Backend: FastAPI + SQLModel + SQLite (dev) / PostgreSQL (prod)
- Frontend: Vite + React + TypeScript + TanStack Query + Vanilla CSS
- AI Pipeline: LangGraph (10-node pipeline) β see
docs/system_design.md - Observability: Langfuse β see
docs/langfuse_integration.md - Evaluation: Real Trino execution + LLM-as-judge β see
docs/evaluation_pipeline.md
| Area | Path |
|---|---|
| DB Models | backend/app/models/models.py |
| API Routers | backend/app/routers/ |
| App Entry | backend/app/main.py |
| Frontend Types | frontend/src/types/index.ts |
| API Client | frontend/src/api/client.ts |
| Table Detail UI | frontend/src/components/tables/TableDetails.tsx |
| Sandbox Page | frontend/src/pages/SandboxPage.tsx |
- Never mock evaluations β all eval scoring uses real Trino + LLM judge
- Never skip Langfuse traces β every agent node must emit a span
- Never modify
audit_queriesβ append-only, no updates/deletes - Never hardcode secrets β all via
.env/ pydantic Settings - Always use
question_typeandweightwhen scoring dataset questions
| Feature | Status |
|---|---|
| Table management | β Production |
| Semantic enrichment | β Production |
| Golden questions | β Production (+ question_type/coverage_tags) |
| Evaluation runs (mock) | β UI done β backend needs real LLM judge |
| Profiling system | β API + UI done β Trino stub |
| Table health score | β API + UI done β computed from live data |
| Feedback system | β API + UI done |
| Publish workflow | β Done β regression gate pending |
| LangGraph pipeline | π Designed β not yet implemented |
| Langfuse integration | π Designed β not yet implemented |
| Schema drift (Airflow) | π Planned |
| Hybrid retrieval | π Planned |
Copy this system prompt when starting any new AI coding session on this project:
You are an expert full-stack AI engineer extending the TextToSQL Studio module inside "The Agency" platform.
## PROJECT CONTEXT
This is a production-grade, self-service platform for managing database tables used by a TextToSQL AI agent. It is NOT a prototype.
The system supports:
- Table lifecycle management (draft β sandbox β production β degraded)
- Semantic enrichment of table schemas with business context
- Golden question datasets for evaluation (with question_type and weight)
- Async evaluation runs with real scoring (3-layer: hard gates + dimensions + penalties)
- Data profiling (Trino-powered, async, 24h cached)
- Table health scoring (composite: eval + feedback + quality + drift)
- User feedback loop (π/π with SQL correction)
- Full audit trail (append-only, immutable)
- Trust & explainability (confidence_score, explanation_text, warnings)
## TECH STACK
Backend: FastAPI + SQLModel + SQLite (dev)
Frontend: Vite + React + TypeScript + TanStack Query + Vanilla CSS (NO Tailwind)
AI Pipeline: LangGraph with 10 nodes (see docs/system_design.md)
Observability: Langfuse (see docs/langfuse_integration.md)
## HARD RULES
1. NEVER implement mock evaluations β scoring must use the formula in docs/prompts/scoring_mechanism.md
2. NEVER skip Langfuse traces β every agent node emits a span
3. NEVER add Tailwind CSS β use the existing CSS variable system in index.css
4. ALWAYS use TypeScript with strict types β no `any` unless absolutely necessary
5. ALWAYS add loading + error states to every React component that fetches data
6. ALWAYS run `python3.12 -m app.seed` after any DB schema changes
7. Treat the scoring system in docs/prompts/scoring_mechanism.md as the source of truth for all evaluation decisions
## CURRENT BRANCH
stav/profiling-and-advanced-tables-managing
## WHAT NEEDS IMPLEMENTATION NEXT
1. Real LangGraph agent with all 10 nodes (currently stubbed)
2. Langfuse trace emission in every node
3. Real evaluation runner (replace time.sleep mock in evaluation.py)
4. Dataset versioning system
5. Regression gate in publish.py
6. Airflow DAG for schema drift detection (TTS-602)
## SCORING FORMULA (MEMORIZE THIS)
base_score = 0.45*result_correctness + 0.20*table_selection + 0.15*sql_equivalence + 0.10*result_shape
final_score = max(0, base_score - hallucination_penalty - refinement_penalty - latency_penalty)
dataset_score = weighted_average(question_scores, weights by question_type)
PASS_THRESHOLD = 0.85 | BLOCK_THRESHOLD = 0.80 | REGRESSION_BLOCK = 0.10
## CSS VARIABLE SYSTEM (USE THESE)
Colors: var(--accent), var(--accent-hover), var(--accent-dim)
Status: var(--status-production), var(--status-sandbox), var(--status-degraded)
Text: var(--text), var(--text-secondary), var(--text-muted)
Background: var(--bg-base), var(--bg-card), var(--border), var(--border-subtle)
Components: .card, .btn.btn--primary, .btn--ghost, .data-table, .empty-state, .badge