NL2SQL

NL2SQL is an enterprise-grade, agentic Natural Language to SQL pipeline built on LangGraph.

Unlike simple prompt-to-query solutions, this project employs a Map-Reduce Supervisor Architecture to handle complex, real-world database interactions. It features dynamic routing across multiple database engines (Postgres, MySQL, MSSQL, SQLite), vector-based schema retrieval for scalability, and a rigorous Plan-Validate-Execute loop to ensure query correctness and safety.

Designed for observability and reliability, it provides detailed performance metrics, step-by-step reasoning logs, and deterministic SQL generation, making it suitable for production environments where accuracy is paramount.

Feature Catalog

The system implements a sophisticated Agentic Architecture designed for enterprise scale.

Core Architecture

  • LangGraph Orchestration: Uses a stateful graph architecture (StateGraph) to manage complex agentic workflows, ensuring predictable control flow and state validation (see the sketch after this list).
  • Hybrid Execution Lanes:
    • Fast Lane (Direct SQL): Optimizes for low latency (< 2s) on simple queries (e.g., "List 10 machines") by skipping complex planning.
    • Slow Lane (Agentic Loop): Engages a robust Plan-Validate-Execute-Refine loop for complex analytical queries requiring reasoning.
  • Map-Reduce Supervisor:
    • Supervisor (Decomposer): Intelligently breaks down complex user queries into sub-tasks (Map).
    • Parallel Execution: Executes sub-tasks dynamically across distributed workers (LangGraph branches).
    • Reducer (Aggregator): Synthesizes results from multiple branches into a coherent final answer.
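
The actual graph construction lives in the pipeline module; purely as an illustration of the hybrid-lane idea, a LangGraph StateGraph with a fast and a slow lane can be wired roughly like this (the state fields and node functions below are simplified stand-ins, not the project's real nodes):

# Illustrative sketch only: hybrid fast/slow lanes as a LangGraph StateGraph.
# Node functions and state fields are simplified stand-ins, not the real nodes.
from typing import TypedDict, Optional
from langgraph.graph import StateGraph, END

class PipelineState(TypedDict, total=False):
    query: str
    intent: str            # e.g. TABULAR, KPI, SUMMARY
    sql: Optional[str]
    result: Optional[list]

def classify_intent(state: PipelineState) -> dict:
    # A real Intent node would call an LLM; here we use a trivial heuristic.
    return {"intent": "TABULAR" if state["query"].lower().startswith("list") else "SUMMARY"}

def direct_sql(state: PipelineState) -> dict:               # Fast Lane
    return {"sql": "SELECT * FROM machines LIMIT 10"}

def plan_validate_generate(state: PipelineState) -> dict:   # Slow Lane (stubbed)
    return {"sql": "SELECT ..."}

def execute(state: PipelineState) -> dict:
    return {"result": []}   # would run the SQL against the routed datasource

graph = StateGraph(PipelineState)
graph.add_node("intent", classify_intent)
graph.add_node("fast_lane", direct_sql)
graph.add_node("slow_lane", plan_validate_generate)
graph.add_node("executor", execute)
graph.set_entry_point("intent")
# Simple tabular questions skip planning; everything else takes the agentic loop.
graph.add_conditional_edges(
    "intent", lambda s: "fast_lane" if s["intent"] == "TABULAR" else "slow_lane"
)
graph.add_edge("fast_lane", "executor")
graph.add_edge("slow_lane", "executor")
graph.add_edge("executor", END)

app = graph.compile()
print(app.invoke({"query": "List 10 machines"}))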

Intelligent Routing & Retrieval

  • Vector-Based Routing (RAG):
    • Datasource Discovery: Uses vector search to match user queries to the correct database(s) from a registry of available connections (see the sketch after this list).
    • Schema Retrieval: Dynamically fetches only relevant table schemas for the LLM context window, enabling support for databases with thousands of tables.
  • Intent Classification:
    • Determines query intent (TABULAR, KPI, SUMMARY) to select the optimal execution lane.
    • Enriches user queries with domain-specific synonyms and entity extraction.
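
For intuition only, Layer-1 datasource routing against descriptions and sample questions can be expressed with a small Chroma collection; the collection name, documents, and metadata below are invented for this example:

# Illustration only: match a question to a datasource via vector search (Chroma).
# The collection name, documents, and metadata here are invented for the example.
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
routing = client.get_or_create_collection("datasource_routing")

# Index each datasource's description plus a few sample questions (done once).
routing.add(
    ids=["manufacturing_ops", "manufacturing_supply"],
    documents=[
        "Operations DB: machines, maintenance, employees. e.g. 'Which machines are offline?'",
        "Supply chain DB: products, inventory, suppliers. e.g. 'Show top products by price'",
    ],
    metadatas=[{"engine": "postgres"}, {"engine": "mysql"}],
)

# At query time, the closest match decides which datasource handles the question.
hits = routing.query(query_texts=["List all machines"], n_results=1)
print(hits["ids"][0])   # -> ['manufacturing_ops']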

Resilience & Reliability

  • Self-Correcting Agents: The "Slow Lane" implements a Reflection pattern. If generated SQL fails validation or execution, the system captures the error, summarizes it, and feeds it back to the Planner for correction (Retries).
  • Structured Output Enforcement: Uses OpenAI 'Tools' (function calling) to enforce strict JSON schemas for critical nodes (Planner, Decomposer), preventing parsing errors (see the sketch after this list).
  • Pre-Execution Validation: A dedicated ValidatorNode performs static analysis on generated SQL to catch schema violations (e.g., non-existent columns) before execution.
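
One common way to get this kind of schema enforcement with OpenAI tool calling is a Pydantic model bound through langchain's structured-output helper; the QueryPlan model below is a made-up example, not the project's actual Planner schema:

# Sketch: enforce a JSON schema for a planning step via OpenAI tool calling.
# QueryPlan and its fields are hypothetical, not the project's actual schema.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class QueryPlan(BaseModel):
    tables: list[str] = Field(description="Tables required by the query")
    columns: list[str] = Field(description="Columns to select")
    filters: list[str] = Field(default_factory=list, description="WHERE conditions")

llm = ChatOpenAI(model="gpt-4o-mini")
planner = llm.with_structured_output(QueryPlan)  # enforced via function calling
plan = planner.invoke("List 5 machines with their serial numbers")
print(plan.tables, plan.columns)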

Quick Start

Follow this guide to set up the environment and run your first query.

Prerequisites

  • Python 3.10+
  • Docker & Docker Compose
  • OpenAI API Key

1. Installation

git clone https://github.com/nadeem4/nl2sql.git
cd nl2sql
pip install -r requirements.txt

2. Infrastructure Setup

Start the database containers (Postgres, MySQL, MSSQL) and seed them with synthetic manufacturing data:

# Start containers
docker-compose up -d

# Seed data (wait ~10s for DBs to initialize)
python scripts/seed_databases.py --wait

3. Configuration

Create a .env file in the root directory:

OPENAI_API_KEY="sk-..."

4. Verify Setup

Run a query against the Postgres database to verify the pipeline:

python -m src.nl2sql.cli --id manufacturing_ops --query "List 5 machines"

Expected Output: A structured log of the AI's reasoning followed by a table of 5 machines.


Configuration

Datasources (configs/datasources.yaml)

Define connection strings and settings for each database.

manufacturing_sqlite:
  engine: sqlite
  connection_string: sqlite:///data/manufacturing.db
  tables: [] # Empty list = include all

manufacturing_ops:
  engine: postgres
  connection_string: postgresql+psycopg2://user:password@localhost:5432/manufacturing_ops
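
Under the hood an entry like this has to become a live connection; a minimal sketch of that step, assuming only the keys shown above (the project's datasource_config module may do more), is:

# Sketch: turn a datasources.yaml entry into a SQLAlchemy engine.
# Assumes only the keys shown above; the real loader (datasource_config) may differ.
import yaml
from sqlalchemy import create_engine, text

with open("configs/datasources.yaml") as fh:
    datasources = yaml.safe_load(fh)

cfg = datasources["manufacturing_ops"]
engine = create_engine(cfg["connection_string"])

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())   # connectivity check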

LLM Settings (configs/llm.yaml)

Map specific agents to different LLM providers/models.

default:
  provider: openai
  model: gpt-4o

agents:
  planner:
    provider: openai
    model: gpt-4o-mini # Use a cheaper model for planning
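
A resolver for this layout only needs to overlay an agent's overrides on the defaults; as a sketch (the project's actual llm_registry may differ):

# Sketch: resolve an agent's provider/model from llm.yaml, falling back to `default`.
# The project's actual llm_registry may differ.
import yaml

with open("configs/llm.yaml") as fh:
    llm_cfg = yaml.safe_load(fh)

def resolve(agent: str) -> dict:
    return {**llm_cfg["default"], **llm_cfg.get("agents", {}).get(agent, {})}

print(resolve("planner"))    # {'provider': 'openai', 'model': 'gpt-4o-mini'}
print(resolve("generator"))  # {'provider': 'openai', 'model': 'gpt-4o'}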

Routing Examples (configs/sample_questions.yaml)

Map datasource IDs to lists of sample questions to improve routing accuracy (Layer 1).

manufacturing_ops:
  - "List all machines"
  - "Which machines are offline?"
manufacturing_supply:
  - "Show top products by price"
  - "List all products"

Usage Guide

CLI Basics

The CLI (src.nl2sql.cli) is the main entry point.

  • --query "...": The natural language question.
  • --verbose: Display step-by-step AI reasoning.
  • --show-perf: Display detailed performance metrics (latency, tokens).
  • --vector-store <PATH>: Use vector search for schema selection (requires indexing).
  • --id <ID>: Optional. Force a specific datasource, bypassing the router (e.g., manufacturing_ops).
  • --no-exec: Generate and validate SQL without executing it.
  • --debug: Enable debug-level logging.
  • --include-ids <ID>...: Benchmark only specific test case IDs.
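
For example, to see the full reasoning for a query and inspect the generated SQL without running it against the database (flags as documented above):

python -m src.nl2sql.cli --id manufacturing_ops --query "List 5 machines" --verbose --no-exec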

Multi-Database Support

The system simulates a manufacturing enterprise distributed across 4 databases:

| ID | Engine | Content | Example Query |
|---|---|---|---|
| manufacturing_ops | Postgres | Machines, Maintenance, Employees | List 5 machines with their serial numbers |
| manufacturing_supply | MySQL | Products, Inventory, Suppliers | Show me top 3 products by price |
| manufacturing_history | MSSQL | Production Runs, Sales, Defects | Count total production runs |
| manufacturing_ref | SQLite | Factories, Shifts, Machine Types | List all factories and their locations |

Sample Commands

Run these commands to test each database:

1. Postgres (Operations)

python -m src.nl2sql.cli --id manufacturing_ops --query "List 5 machines with their serial numbers"

2. MySQL (Supply Chain)

python -m src.nl2sql.cli --id manufacturing_supply --query "Show me top 3 products by price"

3. MSSQL (History)

python -m src.nl2sql.cli --id manufacturing_history --query "Count total production runs"

4. SQLite (Reference)

python -m src.nl2sql.cli --id manufacturing_ref --query "List all factories and their locations"

5. Cross-Database Query (Map-Reduce)

The system can automatically decompose complex queries into sub-queries, execute them in parallel, and aggregate the results.

Example: Sales vs Inventory (MSSQL + MySQL)

python -m src.nl2sql.cli --query "Compare sales from manufacturing_history and inventory from manufacturing_supply"

Vector Search (RAG)

For large schemas, use vector search to dynamically select relevant tables.

  1. Index the Schema:

    python -m src.nl2sql.cli --index --vector-store ./chroma_db
  2. Query with Context:

    python -m src.nl2sql.cli --query "Show top 5 products" --vector-store ./chroma_db

Observability & Logging

  • Stream Reasoning: Use --verbose to see the Intent, Planner, and Generator steps.
  • Debug Mode: Use --debug for verbose output.

Architecture

The Pipeline (LangGraph)

We use a hybrid Intent-Driven + Map-Reduce Architecture:

  1. Intent & Routing:
    • Intent Node: Classifies user goal and standardizes the query.
    • Decomposer (Supervisor): Uses vector search to find the right data source(s). If multiple are needed, it triggers Map-Reduce.
  2. Execution Lanes:
    • Fast Lane: Direct SQL generation (DirectSQLNode) for simple tabular data.
    • Slow Lane: Reasoning loop (Planner -> Validator -> Generator) for complex requests.
  3. Aggregation:
    • The AggregatorNode stitches results together, using an LLM only if necessary (Summary Mode).

For a deep dive into the Map-Reduce pattern, see docs/ARCHITECTURE_MAP_REDUCE.md.

flowchart TD
  %% Nodes
  user["User Query"] --> intent["Intent Node"]
  intent --> decomposer["Decomposer (Supervisor)"]
  
  subgraph Routing Layer
    decomposer -- Lookups --> vector_store[("Vector Store")]
    vector_store -. Context .-> decomposer
  end

  decomposer -- Single DB --> branch["Execution Branch"]
  decomposer -- "Map (Multi-DB)" --> parallel["Parallel Execution Branches"]
  parallel --> branch

  subgraph Worker Lane ["Worker Lane (Per Datasource)"]
    branch --> schema["Schema Node"]
    schema --> lane_logic{Fast or Slow?}
    
    %% Fast Lane
    lane_logic -- Fast --> directSQL["DirectSQL (AI)"]
    directSQL --> executor["Executor"]

    %% Slow Lane (ReAct Loop)
    lane_logic -- Slow --> planner["Planner (AI)"]
    planner --> validator["Validator"]
    
    validator -- "Pass" --> generator["Generator"]
    validator -- "Fail" --> summarizer["Summarizer (Reflect)"]
    
    generator --> executor
    executor -- "Error" --> summarizer
    
    summarizer -. "Feedback" .-> planner
  end
  
  executor -- "Success" --> aggregator["Aggregator (Reducer)"]
  aggregator --> answer["Final Answer"]
  
  %% Styling
  style user fill:#f6f8fa,stroke:#333
  style decomposer fill:#e1f5fe,stroke:#01579b
  style aggregator fill:#e1f5fe,stroke:#01579b
  style answer fill:#f6f8fa,stroke:#333
  style vector_store fill:#fff3e0,stroke:#e65100
  style summarizer fill:#fff9c4,stroke:#fbc02d

Vectorization Strategy

To support efficient querying across large or multiple databases, we use a two-tiered vectorization approach:

  1. Datasource Routing:

    • What: Indexes the description of each database.
    • Why: Determines which database contains the relevant data (e.g., "Sales" vs. "Inventory").
    • Strategy:
      • Layer 1 (Fast): Vector search against database descriptions and 200+ sample questions.
      • Layer 2 (Robust): Decomposer LLM uses retrieved context to form sub-queries.
  2. Schema Selection:

    • What: Indexes table metadata (columns, foreign keys, comments).
    • Why: Determines which tables are needed for the query within the selected database (see the sketch below).

This allows the system to scale to hundreds of tables without overwhelming the LLM's context window.
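
As a sketch of the second tier (table names and DDL snippets below are invented), per-table metadata can be indexed and queried the same way, so only the matching tables' definitions reach the prompt:

# Sketch of tier 2 (schema selection): index per-table metadata, retrieve only
# the tables relevant to the question. Table names/DDL here are invented.
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
schema = client.get_or_create_collection("schema_manufacturing_ops")

schema.add(
    ids=["machines", "maintenance_logs"],
    documents=[
        "Table machines(id, serial_number, status, machine_type_id, factory_id)",
        "Table maintenance_logs(id, machine_id -> machines.id, performed_at, notes)",
    ],
)

hits = schema.query(query_texts=["List 5 machines with their serial numbers"], n_results=2)
print(hits["ids"][0])   # only these tables' DDL is placed in the LLM context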

Core Agents

  • Intent (AI): Entry point. Classifies intent (Table vs Summary) and standardizes query.
  • Decomposer (Supervisor): Orchestrates routing and query splitting.
  • DirectSQL (AI): Fast Lane. Generates SQL for simple queries without planning.
  • Planner (AI): Slow Lane. Generates a structured plan.
  • Validator (Code): Slow Lane. Verifies the plan against schema.
  • Generator (Code): Compiles plan to SQL.
  • Executor (Code): Runs the SQL.
  • Aggregator (AI): Synthesizes final results. Skips AI for simple table outputs.

Performance Breakdown

The CLI provides a detailed breakdown of time and token usage, including a top-level matrix and per-datasource details:

Enable this detailed view by adding the --show-perf flag.

Sample Performance Output

Top Level Performance

| Metric | Decomposer | Aggregator | Exec (manufacturing_ops) | Total |
|---|---|---|---|---|
| Latency (s) | 0.0000 | 0.0000 | 7.2700 | 7.2700 |
| Token Usage | 0 | 0 | 3029 | 3029 |

Performance: manufacturing_ops

| Node | Type | Model | Latency (s) | Tokens |
|---|---|---|---|---|
| Intent | AI | gpt-4o-mini | 1.9300 | 569 |
| Planner | AI | gpt-4o-mini | 4.9900 | 2456 |
| Generator | Non-AI | - | 0.0000 | - |
| Executor | Non-AI | - | 0.3500 | - |

Project Structure

  • src/: Core modules (nodes, langgraph_pipeline, datasource_config, llm_registry).
  • configs/: YAML configurations for datasources and LLMs.
  • scripts/: Utilities (e.g., seed_databases.py).
  • tests/: Unit and integration tests.

Evaluation Framework

We employ a rigorous evaluation framework to ensure the pipeline's correctness and stability.

For detailed documentation on metrics, architecture, and usage, see EVALUATION.md.

Key Metrics

  • Execution Accuracy: Verifies that generated SQL returns the correct data (not just string matching); see the sketch after this list.
  • Stability (Pass@K): Measures pipeline reliability and routing consistency over multiple iterations.
  • Routing Accuracy: Tracks the performance of the 3-layer routing system.
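
Concretely, execution accuracy compares result sets rather than SQL text; a minimal version of that check (not the project's actual evaluator) looks like:

# Sketch: execution accuracy = generated SQL returns the same rows as the gold SQL,
# ignoring row order. Not the project's actual evaluator.
from collections import Counter

def execution_match(predicted_rows: list[tuple], gold_rows: list[tuple]) -> bool:
    return Counter(predicted_rows) == Counter(gold_rows)

assert execution_match([(1, "CNC"), (2, "Lathe")], [(2, "Lathe"), (1, "CNC")])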

Quick Benchmark

Run the full test suite with parallel execution:

python -m src.nl2sql.cli --benchmark --dataset tests/golden_dataset.yaml

To test stability (run each question 5 times):

python -m src.nl2sql.cli --benchmark --dataset tests/golden_dataset.yaml --iterations 5

Development

Running Tests

Run the test suite using pytest:

python -m pytest tests/

Adding New Engines

  1. Add the driver to requirements.txt.
  2. Add a connection profile to configs/datasources.yaml.
  3. (Optional) Add specific DDL/Seeding logic to scripts/seed_databases.py.
