SQL Query Generator with Self-Improving AI

Author: Markus van Kempen (markus.van.kempen@gmail.com)
Last Updated: January 24, 2026
Version: 2.0

BeeAI Agent Demo

A Streamlit application that converts natural language queries into SQL using IBM WatsonX LLMs with self-improving capabilities powered by embeddings.

Features

🧠 Self-Improving System

  • Embedding-based learning: Uses IBM Slate embeddings to find semantically similar past queries
  • Few-shot prompting: Automatically includes successful examples in prompts
  • Error avoidance: Learns from common mistakes to improve future queries
  • User feedback: 👍/👎 buttons help the system learn what works
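
The pattern lookup behind this can be sketched as a cosine-similarity search over stored (question, SQL, embedding) records. The function names and dict layout below are illustrative, not the actual learning_store.py API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_patterns(query_vec, stored_patterns, k=3):
    """Return the k stored patterns most similar to the new query."""
    return sorted(
        stored_patterns,
        key=lambda p: cosine_similarity(query_vec, p["embedding"]),
        reverse=True,
    )[:k]

# Toy 2-d embeddings; real Slate vectors have hundreds of dimensions.
patterns = [
    {"question": "sales by country",
     "sql": "SELECT country, SUM(total_amount) FROM sales GROUP BY country",
     "embedding": [1.0, 0.0]},
    {"question": "list products",
     "sql": "SELECT * FROM products",
     "embedding": [0.0, 1.0]},
]
print(top_k_patterns([0.9, 0.1], patterns, k=1)[0]["question"])  # sales by country
```

The top-k matches become the few-shot examples injected into the prompt.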

🤖 Four Query Generation Modes

| Mode | Description | Best For |
|------|-------------|----------|
| Direct SQL | Prompt engineering + SQLGlot validation | Speed-critical queries |
| LangChain Agent | LangChain SQL chain with schema injection | Complex queries |
| BeeAI Agent | IBM BeeAI Framework with native WatsonX | Highest reliability |
| Self-Improving | Full learning system with pattern storage | Continuous improvement |

Across all modes, SQL validation with SQLGlot (AST parsing) provides robustness and safety, and the LangChain and BeeAI agent frameworks provide flexibility and extensibility.

💬 Hybrid Query Classification

Not all questions need SQL! The system uses a 4-layer hybrid approach:

| Layer | Status | Description |
|-------|--------|-------------|
| 1. Semantic | ✅ | IBM Slate embeddings (79 examples) - understands "per" = "by" |
| 2. Keywords | ✅ | Database/general/help keyword matching |
| 3. Schema-Aware | ✅ | Validates against actual DB content |
| 4. Empty Analysis | ✅ | Explains why queries return no results |

Query Types Handled:

  • Database queries → Generate SQL (e.g., "show all products")
  • General questions → Direct response (e.g., "what time is it")
  • Help questions → System guidance (e.g., "what can you do")
  • Unknown entities → Helpful feedback (e.g., "show BMW sales" → "BMW not in database")

The semantic classifier understands that "show total revenue per country" = "revenue by country" = "sales grouped by country" without keyword rules!
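
The keyword layer (layer 2) can be sketched in a few lines; the keyword sets below are illustrative, not the ones query_classifier.py actually ships:

```python
# Hypothetical keyword sets for the layer-2 fallback classifier.
DB_KEYWORDS = {"show", "list", "total", "revenue", "sales", "count",
               "top", "products", "customers", "orders"}
HELP_PHRASES = {"help", "can you do", "how do i"}

def classify_by_keywords(question: str) -> str:
    """Route a question by keyword overlap. In the real system this runs
    only when the semantic layer is not confident."""
    q = question.lower()
    if any(phrase in q for phrase in HELP_PHRASES):
        return "help"
    if any(word in DB_KEYWORDS for word in q.split()):
        return "database"
    return "general"

print(classify_by_keywords("show all products"))  # database
print(classify_by_keywords("what can you do"))    # help
print(classify_by_keywords("hello"))              # general
```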

See COMPARISON_RESULTS.md for full implementation details and what's missing.

πŸ” Product Disambiguation

When a search matches multiple products (e.g., "Wireless Mouse" matches both "Wireless Mouse" and "Wireless Mouse Pro"), the system shows a helpful disambiguation message.
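
At its core, disambiguation reduces to a substring match over the catalog; a minimal sketch (product_mapper.py's real matching may be fuzzier):

```python
def match_products(search_term, product_names):
    """Return every catalog product whose name contains the search term."""
    term = search_term.lower()
    return [name for name in product_names if term in name.lower()]

catalog = ["Wireless Mouse", "Wireless Mouse Pro", "USB Keyboard"]
hits = match_products("wireless mouse", catalog)
if len(hits) > 1:
    # More than one hit: ask the user instead of guessing.
    print(f"Multiple products match: {', '.join(hits)}. Which did you mean?")
```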

⚡ Semantic Caching (NEW)

  • Memory + disk caching of embedding vectors
  • First query: API call (~450ms) → Cached for reuse (<1ms)
  • Reduces WatsonX API calls by 80%+ for repeated patterns
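
A minimal sketch of such a two-tier cache; the class name and file layout are hypothetical, not the semantic_classifier.py internals:

```python
import hashlib
import json
import tempfile
from pathlib import Path

class EmbeddingCache:
    """Two-tier cache: in-process dict first, JSON files on disk second.
    Only on a double miss is the (slow) embedding API actually called."""

    def __init__(self, cache_dir, embed_fn):
        self.memory = {}
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.embed_fn = embed_fn  # e.g. a WatsonX embeddings call

    def get(self, text):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key in self.memory:            # tier 1: memory hit (<1ms)
            return self.memory[key]
        path = self.dir / f"{key}.json"
        if path.exists():                 # tier 2: disk hit (survives restarts)
            vec = json.loads(path.read_text())
            self.memory[key] = vec
            return vec
        vec = self.embed_fn(text)         # double miss: API call (~450ms)
        self.memory[key] = vec
        path.write_text(json.dumps(vec))
        return vec

calls = []
def fake_embed(text):
    calls.append(text)
    return [0.1, 0.2]

cache = EmbeddingCache(tempfile.mkdtemp(), fake_embed)
cache.get("show sales by country")
cache.get("show sales by country")  # served from memory, no second API call
print(len(calls))  # 1
```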

💬 Context Awareness (NEW)

Understands follow-up queries in conversation:

  • "show more" β†’ expands previous results
  • "same for USA" β†’ applies new filter to previous query
  • "top 5" β†’ limits previous results
  • "those products" β†’ resolves references to previous data

πŸ›‘οΈ Robust Validation (NEW)

  • SQLGlot-powered: AST-based parsing ensures only valid SQL is executed
  • Safety checks: Prevents SELECT * without tables, implicit cross-joins (ON TRUE), and missing join conditions
  • Fallback mechanism: Gracefully degrades if strict validation fails
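
The shipped validator walks a SQLGlot AST; as a rough, regex-only illustration of the kinds of rules it enforces (not the real implementation):

```python
import re

def basic_sql_safety_check(sql: str) -> bool:
    """Regex approximation of the checks an AST-based validator performs.
    The real validator parses the statement with SQLGlot; this sketch only
    illustrates the categories of rules."""
    s = sql.strip().rstrip(";")
    if not re.match(r"(?i)^select\b", s):
        return False  # only read-only queries are allowed
    if re.search(r"(?i)\bjoin\b(?!.*\bon\b)", s):
        return False  # every JOIN must carry an ON condition
    if re.search(r"(?i)\bon\s+true\b", s):
        return False  # implicit cross-join (ON TRUE)
    return True

print(basic_sql_safety_check("SELECT * FROM products"))          # True
print(basic_sql_safety_check("DROP TABLE products"))             # False
print(basic_sql_safety_check("SELECT * FROM a JOIN b ON TRUE"))  # False
```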

📊 Schema Enrichment (NEW)

  • Semantic Context: Columns have descriptions (e.g., "status: 1=New")
  • Dynamic Sampling: Injects real valid values into prompt (e.g., country: 'USA', 'Germany')
  • Reduced Hallucinations: LLM knows exactly which values exist in the DB
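
Dynamic sampling is straightforward with sqlite3: pull a few distinct values per column and splice them into the schema description. A sketch (the function name is hypothetical):

```python
import os
import sqlite3
import tempfile

def enriched_schema_snippet(db_path, table, column, limit=5):
    """Sample real distinct values from a column so the prompt can list them,
    keeping the LLM from hallucinating values that don't exist in the DB."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM {table} LIMIT {limit}"
    ).fetchall()
    conn.close()
    values = ", ".join(repr(r[0]) for r in rows)
    return f"{table}.{column} -- sample values: {values}"

# Demo against a throwaway database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE customers (country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)",
                 [("USA",), ("Germany",), ("USA",)])
conn.commit()
conn.close()

snippet = enriched_schema_snippet(path, "customers", "country")
print(snippet)
```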

Quick Start

1. Setup Environment

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

2. Configure Credentials

Create a .env file:

```
WATSONX_API_KEY=your_api_key
WATSONX_URL=https://us-south.ml.cloud.ibm.com
WATSONX_PROJECT_ID=your_project_id
DATABASE_PATH=./data/database.db
```

3. Initialize Database

```bash
python scripts/init_database.py
```

4. Run the App

```bash
streamlit run app.py
```

Project Structure

```
├── app.py                    # Main Streamlit UI (entry point)
├── watsonx_client.py         # Direct SQL mode with learning
├── langchain_agent.py        # LangChain mode with learning
├── beeai_agent.py            # BeeAI mode with learning
├── self_improving_agent.py   # Full self-improving agent
├── learning_store.py         # Embedding-based pattern storage
├── query_classifier.py       # Hybrid query classification
├── semantic_classifier.py    # Embedding-based intent detection + caching
├── context_manager.py        # Conversation context awareness
├── product_mapper.py         # Product disambiguation
├── schema_loader.py          # Database schema utilities
├── logging_config.py         # Centralized logging
│
├── scripts/                  # Utility scripts
│   └── init_database.py      # Database initialization
│
├── tests/                    # Test suite
│   ├── test_comparison.py    # SQLGlot comparison tests
│   ├── test_self_improving.py    # 100-query test suite
│   ├── test_model_comparison.py  # LLM model benchmark
│   ├── test_mode_comparison.py   # Mode comparison tests
│   └── test_model_mode_comparison.py  # Combined comparison
│
├── docs/                     # Documentation
│   ├── COMPARISON_RESULTS.md # Detailed mode & model comparison
│   └── test_results_summary.md   # Test findings
│
├── data/                     # Data storage
│   ├── database.db           # SQLite database
│   ├── learning.db           # Learning store (patterns, errors)
│   ├── cache/                # Embedding cache
│   └── results/              # Test comparison results (JSON)
│
└── .cursor/rules/            # Cursor AI rules
```

Database Schema

Tables

  • customers: customer_id, customer_name, email, city, country, region
  • products: product_id, product_code, product_name, category, price
  • orders: order_id, customer_id, product_id, order_date, quantity, total_amount

Views

  • sales: Pre-joined view combining all tables for simplified queries
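
The view's definition isn't shown here; a sketch of what the pre-joined sales view could look like (the actual DDL lives in scripts/init_database.py and may differ):

```python
import sqlite3

# Hypothetical definition of the pre-joined `sales` view.
SALES_VIEW = """
CREATE VIEW IF NOT EXISTS sales AS
SELECT o.order_id, o.order_date, o.quantity, o.total_amount,
       c.customer_name, c.city, c.country, c.region,
       p.product_name, p.category, p.price
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN products p  ON p.product_id  = o.product_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                        email TEXT, city TEXT, country TEXT, region TEXT);
CREATE TABLE products  (product_id INTEGER PRIMARY KEY, product_code TEXT,
                        product_name TEXT, category TEXT, price REAL);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER, order_date TEXT, quantity INTEGER,
                        total_amount REAL);
""")
conn.execute(SALES_VIEW)
conn.execute("INSERT INTO customers VALUES (1,'Acme','a@x.com','Berlin','Germany','EU')")
conn.execute("INSERT INTO products VALUES (1,'P-1','Laptop','Electronics',999.0)")
conn.execute("INSERT INTO orders VALUES (1,1,1,'2025-01-01',2,1998.0)")

# Queries against the view need no explicit joins.
row = conn.execute("SELECT country, product_name, total_amount FROM sales").fetchone()
print(row)  # ('Germany', 'Laptop', 1998.0)
```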

Self-Improving Architecture

```
User Query
    ↓
🔍 Query Classifier ──────────────────┐
    │                                  │ (general/help)
    │ (database)                       ↓
    ↓                              📝 Direct Response
🧠 Embedding Search
    │
    ↓
📚 Find Similar Patterns (top 3)
    │
    ↓
⚠️ Find Common Errors to Avoid
    │
    ↓
📝 Build Enhanced Prompt
    │ (includes few-shot examples + error avoidance)
    ↓
🤖 Generate SQL → Execute → Display Results
    │                           │
    ↓                           ↓
💾 Store Success Pattern    ⚠️ Store Error Pattern
    │
    ↓
👍/👎 User Feedback → Updates Pattern Rating
```

Test Results (100 Queries)

Results using Direct SQL Mode with Self-Improving Learning enabled.

| Metric | Value |
|--------|-------|
| Success Rate | 81% |
| Patterns Learned | 81 |
| Errors Logged | 20 |
| Embedding Utilization | 99% |

By Category

| Category | Success Rate |
|----------|--------------|
| Product Filtering | 100% |
| Basic Sales | 100% |
| Basic Products | 95% |
| Aggregation | 90% |
| Complex Queries | 80% |

Model + Mode Comparison

Best Combinations (After Prompt Optimization)

| Rank | Combination | Accuracy | Speed | Notes |
|------|-------------|----------|-------|-------|
| 🥇 | Granite 4 Small + LangChain | 100% | 604ms | Best overall |
| 🥇 | Granite 4 Small + BeeAI | 100% | ~650ms | After prompt fix |
| 🥇 | Mistral Small + BeeAI | 100% | 512ms | Fastest 100% combo |
| 🥈 | Llama 70B + Any Mode | 100% | 1200ms+ | Most accurate model |

Accuracy by Mode (7 test queries)

| Mode | Accuracy | Notes |
|------|----------|-------|
| LangChain | 100% | Retry logic compensates for errors |
| BeeAI | 100% | After prompt optimization |
| Direct SQL | 86% | Fast but struggles with complex queries |

LLM Model Benchmark (6 queries across modes)

| Model | Direct SQL | LangChain | BeeAI |
|-------|------------|-----------|-------|
| Llama 3.3 70B | 100% | 100% | 100% |
| Mistral Small 24B | 67% | 100% | 100% |
| Granite 4 Small | 83% | 100% | 100% |
| Granite 3.3 8B | 50% | 100% | 83% |

Key Finding

LangChain achieves 100% accuracy with ALL models because it handles retries and schema injection automatically.
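
The mechanism is easy to sketch: validate the generated SQL, and on failure regenerate with the error fed back into the prompt. The helper below is illustrative, not LangChain's actual API:

```python
def generate_with_retry(question, llm_generate, validate, max_attempts=3):
    """Regenerate SQL until it validates, feeding each error back into the
    next generation call -- the recovery loop that lets retry-based chains
    reach high accuracy even with weaker models."""
    error = None
    for _ in range(max_attempts):
        sql = llm_generate(question, previous_error=error)
        ok, error = validate(sql)
        if ok:
            return sql
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts: {error}")

# Toy demo: the first attempt is broken, the retry succeeds.
attempts = iter(["SELEC * FROM sales", "SELECT * FROM sales"])
sql = generate_with_retry(
    "show sales",
    llm_generate=lambda q, previous_error: next(attempts),
    validate=lambda s: (s.upper().startswith("SELECT"), "not a SELECT"),
)
print(sql)  # SELECT * FROM sales
```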

Success by Query Type

| Model | Simple | Filter | Aggregation | Complex |
|-------|--------|--------|-------------|---------|
| Granite 4 Small | 100% | 100% | 100% | 75% |
| Mistral Small 24B | 100% | 100% | 75% | 75% |
| Llama 3.3 70B | 100% | 100% | 75% | 50% |

All Supported Models

  • ibm/granite-4-h-small - Recommended - Best accuracy
  • ibm/granite-3-3-8b-instruct - Fast but struggles with complex queries
  • meta-llama/llama-3-3-70b-instruct - Powerful but slower
  • meta-llama/llama-3-405b-instruct - Largest available
  • mistralai/mistral-small-3-1-24b-instruct-2503 - Fastest
  • mistralai/mistral-medium-2505 - Balanced

API Keys & Services

  • IBM WatsonX: LLM inference (Granite, Llama, Mistral models)
  • IBM Slate Embeddings: ibm/slate-125m-english-rtrvr-v2 for semantic search

Example Queries

Product Queries:

  • "show all laptop products"
  • "list products under $100"
  • "what wireless products do we have"

Sales Queries:

  • "show sales by country"
  • "total revenue from laptops in USA"
  • "top 5 products by sales"

Customer Queries:

  • "how many customers are in Europe"
  • "list customers who ordered smartphones"

General Questions (no SQL):

  • "what time is it"
  • "hello"
  • "what can you do"

Running Tests

```bash
# Run all tests with pytest
pytest

# Run specific test file
pytest tests/test_model_comparison.py -v
```

LLM Model Comparison

```bash
# Quick test (5 queries, 3 models)
python tests/test_model_comparison.py --quick

# Full benchmark (12 queries, 4 models)
python tests/test_model_comparison.py

# Specific models
python tests/test_model_comparison.py --models ibm/granite-4-h-small meta-llama/llama-3-3-70b-instruct
```

Self-Improving Test

```bash
# Run 100-query test
python tests/test_self_improving.py --mode direct_sql --queries 100

# Quick test (10 queries)
python tests/test_self_improving.py --quick
```

Documentation

| Document | Description |
|----------|-------------|
| ARCHITECTURE.md | System architecture, data flow, components |
| API.md | API reference for all modules |
| USAGE.md | Usage guide with examples |
| COMPARISON_RESULTS.md | Detailed mode & model comparison |
| test_results_summary.md | Test findings |
| CONTRIBUTING.md | Contribution guidelines |
| SHOWCASE.md | Self-Improving Agent Showcase |

License

MIT License
