Follow this document from top to bottom and you will work through everything: installation → API keys → sample DB → basic pipeline → Hook tracing → customization.
- Installation
- API key setup
- Sample DB setup
- Connecting a DB with SQLAlchemyDB
- Example catalog
- Basic usage — BaselineNL2SQL
- 6-A. Anthropic Claude + SQLite
- 6-B. OpenAI GPT + SQLite
- 6-C. Connecting to PostgreSQL
- 6-D. Supported DB dialect list
- Tracing execution with Hooks
- Advanced usage — composing components directly
- Customization
- 9-A. Swapping the system prompt
- 9-B. Connecting your own LLM
- 9-C. Connecting your own DB
- 9-D. Building a custom component
- 9-E. Building a custom flow
- Error handling
- Full feature checklist
pip install lang2sql

If you are using uv in a development environment:
uv sync

anthropic and sqlalchemy are included in the core dependencies, so no separate install is needed.
After a package update: if pyproject.toml has changed, be sure to re-run uv sync.
The OpenAI and Anthropic SDKs read environment variables automatically, so you do not need to put api_key directly in your code.
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# OpenAI
export OPENAI_API_KEY="sk-..."

Or create a .env file in the project root:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Then load it in Python code:
from dotenv import load_dotenv
load_dotenv()

If your environment variable has a different name (e.g. OPEN_AI_KEY), either pass it as api_key directly or copy it to the standard name:
import os
# Option 1 — pass api_key explicitly
llm = OpenAILLM(model="gpt-4o", api_key=os.environ["OPEN_AI_KEY"])
# Option 2 — copy to the standard name the SDK reads (api_key can then be omitted)
os.environ["OPENAI_API_KEY"] = os.environ["OPEN_AI_KEY"]
llm = OpenAILLM(model="gpt-4o")

Sample data is provided so you can run the tutorial code against a real DB right away: 10 customers, 12 products, 44 orders, and 83 order items.
# Run from the project root — sample.db is created in the current directory
python scripts/setup_sample_db.py

Connecting: sqlite:///sample.db
Tables created: customers, products, orders, order_items
Customers: 10
Products: 12
Orders: 44
Order items: 83
─── Verification query results ──────────────────
Total orders: 44
gold-grade customers: 3
Products with stock under 10: 5
- 무선 마우스: 3
- 후드 집업: 4
...
─────────────────────────────────────────────────
Done! Follow quickstart.md using the URL below:
sqlite:///sample.db
sample.db is created in whatever directory you run the script from. Your Python code must also run from that same directory for sqlite:///sample.db to resolve to the right file.
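If you want the connection to be independent of the current working directory, one option is to build an absolute sqlite URL with pathlib (a small sketch; it assumes sample.db sits in the directory the path is resolved from):

```python
from pathlib import Path

# An absolute URL removes the dependency on the current working directory.
db_path = Path("sample.db").resolve()   # e.g. /home/me/project/sample.db
db_url = f"sqlite:///{db_path}"
print(db_url)
```

You can then pass db_url anywhere the tutorial uses "sqlite:///sample.db".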
# 1. Start the container (first run only)
docker compose -f docker/docker-compose-postgres.yml up -d
# 2. Insert the sample data
python scripts/setup_sample_db.py --db postgres

To use a custom URL:
python scripts/setup_sample_db.py --url "postgresql://myuser:mypass@myhost:5432/mydb"
SQLAlchemyDB is the DB adapter lang2sql uses when it executes SQL against a real database. Pass a SQLAlchemy URL and the connection is ready.
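The contract such an adapter fulfills is small: run SQL, return rows as list[dict]. A minimal stand-in built on the stdlib sqlite3 module (purely illustrative — this is not lang2sql code) shows the shape:

```python
import sqlite3

class MiniSQLiteDB:
    """Illustrative adapter: executes SQL and returns rows as list[dict]."""
    def __init__(self, url: str):
        # Accepts a sqlite:/// URL, mirroring how SQLAlchemyDB is constructed.
        self._conn = sqlite3.connect(url.removeprefix("sqlite:///"))
        self._conn.row_factory = sqlite3.Row  # rows become dict-convertible

    def execute(self, sql: str) -> list[dict]:
        return [dict(row) for row in self._conn.execute(sql).fetchall()]

db = MiniSQLiteDB("sqlite:///:memory:")
print(db.execute("SELECT 1 AS one"))  # [{'one': 1}]
```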
from lang2sql.integrations.db import SQLAlchemyDB

# SQLite — connect directly to the sample.db created in step 3
db = SQLAlchemyDB("sqlite:///sample.db")
# PostgreSQL
db = SQLAlchemyDB("postgresql://postgres:postgres@localhost:5432/postgres")
# MySQL
db = SQLAlchemyDB("mysql+mysqlconnector://user:pass@localhost:3306/dbname")
# DuckDB (in-memory)
db = SQLAlchemyDB("duckdb:///:memory:")

# Call execute() directly to verify
rows = db.execute("SELECT name, grade FROM customers WHERE grade = 'gold'")
print(rows)
# [{'name': '김철수', 'grade': 'gold'}, {'name': '박영희', 'grade': 'gold'}, ...]
# List the tables (SQLite)
print(db.execute("SELECT name FROM sqlite_master WHERE type='table'"))
# [{'name': 'customers'}, {'name': 'products'}, {'name': 'orders'}, {'name': 'order_items'}]

The catalog is a metadata list that tells lang2sql which tables exist. KeywordRetriever searches this catalog with BM25 to find the tables relevant to a question. Define it as below so the columns match the DB created by setup_sample_db.py.
from lang2sql import CatalogEntry

CATALOG: list[CatalogEntry] = [
    {
        "name": "orders",
        "description": "Customer order table. Used to look up order counts, amounts, and dates.",
        "columns": {
            "order_id": "Order ID (PK)",
            "customer_id": "ID of the ordering customer (FK → customers)",
            "order_date": "Order timestamp (TIMESTAMP)",
            "amount": "Order amount (DECIMAL)",
            "status": "Order status: pending / confirmed / shipped / cancelled",
        },
    },
    {
        "name": "customers",
        "description": "Customer master data. Used to look up customer names, signup dates, and grades.",
        "columns": {
            "customer_id": "Customer ID (PK)",
            "name": "Customer name",
            "email": "Email address",
            "joined_at": "Signup timestamp (TIMESTAMP)",
            "grade": "Customer grade: bronze / silver / gold",
        },
    },
    {
        "name": "products",
        "description": "Product table. Used to look up product names, categories, and prices.",
        "columns": {
            "product_id": "Product ID (PK)",
            "name": "Product name",
            "category": "Category: electronics / clothing / food",
            "price": "Sale price (DECIMAL)",
            "stock": "Current stock quantity (INTEGER)",
        },
    },
    {
        "name": "order_items",
        "description": "Line items per order. Used to look up the products and quantities in an order.",
        "columns": {
            "item_id": "Item ID (PK)",
            "order_id": "Order ID (FK → orders)",
            "product_id": "Product ID (FK → products)",
            "quantity": "Ordered quantity (INTEGER)",
            "unit_price": "Unit price at order time (DECIMAL)",
        },
    },
]

Method A — read the DB schema with SQLAlchemy inspect
# (not implemented yet — an illustrative idea)
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite:///sample.db")
insp = inspect(engine)
catalog = []
for table_name in insp.get_table_names():
    columns = {
        col["name"]: str(col["type"])
        for col in insp.get_columns(table_name)
    }
    catalog.append({"name": table_name, "columns": columns})

Method B — integrate a metadata platform such as DataHub
- DataHub integration code already exists under utils/data/.
- An adapter that converts it to the CatalogEntry format is planned.
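Whichever way the catalog is built, KeywordRetriever scores each entry against the question's keywords. A much-simplified overlap scorer (not the actual BM25 implementation, which also weighs term rarity and document length) conveys the idea:

```python
def score(query: str, entry: dict) -> int:
    """Toy keyword-overlap score over a catalog entry's text."""
    text = (entry["description"] + " " + " ".join(entry["columns"])).lower()
    return sum(1 for tok in query.lower().split() if tok in text)

catalog = [
    {"name": "orders", "description": "customer order table",
     "columns": {"order_id": "", "amount": ""}},
    {"name": "products", "description": "product table",
     "columns": {"product_id": "", "stock": ""}},
]
# "order" and "amount" both hit the orders entry, so it ranks first.
best = max(catalog, key=lambda e: score("total order amount", e))
print(best["name"])  # orders
```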
This is the fastest way to use lang2sql. Connect an LLM and a DB and you get natural language → SQL → execution results.
Always set the db_dialect parameter. SQLite does not support MySQL/PostgreSQL functions such as MONTH() and YEAR(); when db_dialect is set, a prompt that uses SQL functions appropriate for that DB is applied automatically.
from lang2sql import BaselineNL2SQL
from lang2sql.integrations.llm import AnthropicLLM
from lang2sql.integrations.db import SQLAlchemyDB

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",  # ← specify the DB dialect
)
result = pipeline.run("Show the names and order amounts of customers who ordered 100,000 won or more this month")
print(result)
# Example output: [{'name': '김철수', 'amount': Decimal('320000')}, ...]
result2 = pipeline.run("List gold-grade customers sorted by name")
print(result2)
# Example output: [{'name': '김철수'}, {'name': '박영희'}, {'name': '이민준'}]

from lang2sql import BaselineNL2SQL
from lang2sql.integrations.llm import OpenAILLM
from lang2sql.integrations.db import SQLAlchemyDB

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=OpenAILLM(model="gpt-4o"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",
)
result = pipeline.run("List products with stock under 10")
print(result)
# Example output: [{'name': '무선 마우스', 'stock': 3}, {'name': '후드 집업', 'stock': 4}, ...]

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("postgresql://postgres:postgres@localhost:5432/postgres"),
    db_dialect="postgresql",
)
result = pipeline.run("Total order amount of gold-grade customers")

| db_dialect | Target DB | What it applies |
|---|---|---|
| "sqlite" | SQLite | uses strftime(); avoids MONTH()/YEAR() |
| "postgresql" | PostgreSQL | DATE_TRUNC, EXTRACT, INTERVAL |
| "mysql" | MySQL | MONTH(), YEAR(), DATE_FORMAT() |
| "bigquery" | Google BigQuery | DATE_TRUNC, EXTRACT, FORMAT_DATE |
| "duckdb" | DuckDB | DATE_TRUNC, EXTRACT, INTERVAL |
| None / omitted | dialect-agnostic | default prompt (no date-function guidance) |
Attach a MemoryHook and everything is recorded: when each component ran, how long it took, and what it received and returned.
from lang2sql import BaselineNL2SQL, MemoryHook
from lang2sql.integrations.llm import AnthropicLLM
from lang2sql.integrations.db import SQLAlchemyDB

hook = MemoryHook()
pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",
    hook=hook,
)
result = pipeline.run("How many orders are there?")
for event in hook.snapshot():
    dur = f" {event.duration_ms:.1f}ms" if event.duration_ms else ""
    print(f"[{event.name}] {event.component:20s} phase={event.phase}{dur}")

Example output:
[flow.run] BaselineNL2SQL phase=start
[component.run] KeywordRetriever phase=start
[component.run] KeywordRetriever phase=end 1.2ms
[component.run] SQLGenerator phase=start
[component.run] SQLGenerator phase=end 843.5ms
[component.run] SQLExecutor phase=start
[component.run] SQLExecutor phase=end 12.3ms
[flow.run] BaselineNL2SQL phase=end 857.0ms
events = hook.snapshot()
gen_events = [e for e in events if e.component == "SQLGenerator"]
for e in gen_events:
    print(f"  phase       : {e.phase}")
    print(f"  duration_ms : {e.duration_ms}")
    print(f"  input       : {e.input_summary}")
    print(f"  output      : {e.output_summary}")

hook.clear()
try:
    pipeline.run("...")
except Exception:
    pass
error_events = [e for e in hook.snapshot() if e.phase == "error"]
for e in error_events:
    print(f"Component: {e.component}")
    print(f"Error: {e.error}")

To write your own hook, implement just one method: on_event(event).
class PrintHook:
    def on_event(self, event):
        if event.phase == "start":
            print(f"▶ {event.component} started")
        elif event.phase == "end":
            print(f"✔ {event.component} done ({event.duration_ms:.0f}ms)")
        elif event.phase == "error":
            print(f"✘ {event.component} error: {event.error}")

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",
    hook=PrintHook(),
)
result = pipeline.run("List low-stock products")

Example output:
▶ KeywordRetriever started
✔ KeywordRetriever done (1ms)
▶ SQLGenerator started
✔ SQLGenerator done (921ms)
▶ SQLExecutor started
✔ SQLExecutor done (8ms)
You can also use the components directly, without the pipeline. This is useful when you want to inspect each step's result or add conditional branching in between.
from lang2sql import KeywordRetriever, SQLGenerator, SQLExecutor
from lang2sql.integrations.llm import AnthropicLLM
from lang2sql.integrations.db import SQLAlchemyDB

retriever = KeywordRetriever(catalog=CATALOG, top_n=3)
generator = SQLGenerator(
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db_dialect="sqlite",
)
executor = SQLExecutor(db=SQLAlchemyDB("sqlite:///sample.db"))

query = "Total order amount of gold-grade customers this month"
# Step 1 — retrieve relevant schemas
schemas = retriever.run(query)
print("Retrieved schemas:")
for s in schemas:
    print(f"  - {s['name']}: {s.get('description', '')}")
# Step 2 — generate SQL
sql = generator.run(query, schemas)
print(f"\nGenerated SQL:\n{sql}")
# Step 3 — execute
rows = executor.run(sql)
print(f"\nResult: {rows}")

For a DB not covered by db_dialect (Snowflake, Trino, etc.), or when you need a particular SQL style, set system_prompt directly. system_prompt takes precedence over db_dialect.
from lang2sql import SQLGenerator
from lang2sql.integrations.llm import AnthropicLLM

MY_PROMPT = """
You are a Snowflake SQL expert.
- Use DATEADD, DATEDIFF for date arithmetic
- Use TO_DATE() for date casting
- Use CURRENT_DATE() for today
- Return ONLY the SQL inside a ```sql ... ``` block
"""
generator = SQLGenerator(
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    system_prompt=MY_PROMPT,  # set directly instead of db_dialect
)
sql = generator.run("Number of orders this month", schemas)

The prompt files live under src/lang2sql/components/generation/prompts/.
To add a new dialect, create a {dialect}.md file at that path.
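A dialect file is just a markdown prompt. A hypothetical prompts/snowflake.md might look like the sketch below (illustrative only — the exact structure of the bundled prompt files is not shown in this document, so treat the layout as an assumption):

```markdown
You are an expert Snowflake SQL writer.

- Use DATEADD / DATEDIFF for date arithmetic
- Use DATE_TRUNC and CURRENT_DATE() for period filters
- Return ONLY the SQL inside a fenced sql code block
```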
Implement a single method, invoke(messages) -> str, and any LLM can be plugged in.
# Example: use a LangChain model as-is
from langchain_openai import ChatOpenAI

class LangChainLLM:
    def __init__(self, model: str):
        self._llm = ChatOpenAI(model=model)

    def invoke(self, messages: list[dict]) -> str:
        from langchain_core.messages import HumanMessage, SystemMessage
        lc_msgs = []
        for m in messages:
            if m["role"] == "system":
                lc_msgs.append(SystemMessage(content=m["content"]))
            else:
                lc_msgs.append(HumanMessage(content=m["content"]))
        return self._llm.invoke(lc_msgs).content

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=LangChainLLM("gpt-4o"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",
)

Implement a single method, execute(sql) -> list[dict], and any DB can be plugged in.
# Example: query pandas DataFrames through DuckDB
class PandasDB:
    def __init__(self, dataframes: dict):
        import duckdb
        self._conn = duckdb.connect()
        for name, df in dataframes.items():
            self._conn.register(name, df)

    def execute(self, sql: str) -> list[dict]:
        result = self._conn.execute(sql).fetchdf()
        return result.to_dict(orient="records")

import pandas as pd
pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=PandasDB({"orders": pd.read_csv("orders.csv")}),
    db_dialect="duckdb",
)

Subclass BaseComponent and implement only _run(). Hooks, error handling, and timing are taken care of automatically.
from lang2sql.core.base import BaseComponent
from lang2sql.core.exceptions import ComponentError

class SQLValidator(BaseComponent):
    """Raises ComponentError if the generated SQL contains a dangerous keyword."""
    FORBIDDEN = {"DROP", "DELETE", "TRUNCATE", "UPDATE", "INSERT", "ALTER"}

    def _run(self, sql: str) -> str:
        tokens = set(sql.upper().split())
        bad = tokens & self.FORBIDDEN
        if bad:
            raise ComponentError(self.name, f"Dangerous keyword detected: {bad}")
        return sql

validator = SQLValidator()
safe_sql = validator.run("SELECT COUNT(*) FROM orders")  # OK
# validator.run("DROP TABLE orders")  # raises ComponentError

Subclass BaseFlow and compose components with plain Python control flow inside _run().
from lang2sql.core.base import BaseFlow
from lang2sql.core.exceptions import ComponentError

class SafeNL2SQL(BaseFlow):
    """A pipeline with an added validation step."""
    def __init__(self, *, catalog, llm, db, hook=None):
        super().__init__(name="SafeNL2SQL", hook=hook)
        self._retriever = KeywordRetriever(catalog=catalog, hook=hook)
        self._generator = SQLGenerator(llm=llm, db_dialect="sqlite", hook=hook)
        self._validator = SQLValidator(hook=hook)
        self._executor = SQLExecutor(db=db, hook=hook)

    def _run(self, query: str):
        schemas = self._retriever.run(query)
        sql = self._generator.run(query, schemas)
        sql = self._validator.run(sql)  # execute only if validation passes
        return self._executor.run(sql)

pipeline = SafeNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
)
result = pipeline.run("How many orders are there?")

class RetryNL2SQL(BaseFlow):
"""SQL μμ± μ€ν¨ μ μ΅λ 3λ² μ¬μλν©λλ€."""
def __init__(self, *, catalog, llm, db, hook=None):
super().__init__(name="RetryNL2SQL", hook=hook)
self._retriever = KeywordRetriever(catalog=catalog, hook=hook)
self._generator = SQLGenerator(llm=llm, db_dialect="sqlite", hook=hook)
self._executor = SQLExecutor(db=db, hook=hook)
def _run(self, query: str):
schemas = self._retriever.run(query)
last_error = None
for attempt in range(3):
try:
sql = self._generator.run(query, schemas)
return self._executor.run(sql)
except ComponentError as e:
last_error = e
print(f" μλ {attempt + 1} μ€ν¨: {e}")
raise last_errorfrom lang2sql import ComponentError, IntegrationMissingError, Lang2SQLError
try:
    result = pipeline.run("How many orders are there?")
except ComponentError as e:
    # error raised inside a specific component
    print(f"Component error: {e.component}")
    print(f"Message: {e}")
    if e.cause:
        print(f"Cause: {e.cause}")
except IntegrationMissingError as e:
    # optional package not installed (anthropic, sqlalchemy, ...)
    print(f"Package not installed: {e}")
    # → re-run uv sync or pip install lang2sql
except Lang2SQLError as e:
    # any other lang2sql domain error
    print(f"Domain error: {e}")

Error hierarchy:
Lang2SQLError
├── ComponentError — component execution failed (component, cause attributes)
├── IntegrationMissingError — optional package not installed
└── ValidationError — validation failed

Common errors:
| Error | Cause | Fix |
|---|---|---|
| IntegrationMissingError: anthropic | anthropic not installed | uv sync |
| OpenAIError: api_key must be set | environment variable not set | export OPENAI_API_KEY=... |
| no such table: products | wrong DB URL, or sample.db not created | run python scripts/setup_sample_db.py |
| no such function: MONTH | MySQL function used on SQLite | set db_dialect="sqlite" |
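The hierarchy above can be sketched in plain Python to show the component/cause attribute pattern. This mirrors only the attributes documented here, not the library's actual source:

```python
class Lang2SQLError(Exception):
    """Base class for all lang2sql domain errors (sketch)."""

class ComponentError(Lang2SQLError):
    """Component failure: remembers which component failed and why."""
    def __init__(self, component, message, cause=None):
        super().__init__(message)
        self.component = component
        self.cause = cause

class IntegrationMissingError(Lang2SQLError):
    """Optional package not installed (sketch)."""

# An `except Lang2SQLError` clause catches both subclasses:
try:
    raise ComponentError("SQLGenerator", "no SQL block found")
except Lang2SQLError as e:
    print(type(e).__name__, e.component)  # ComponentError SQLGenerator
```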
Even without a real API key or DB, you can verify the whole flow using FakeLLM/FakeDB.
"""
lang2sql full feature checklist
Run the code below in order to exercise every feature.
"""
# ── 0. Dependency-free fakes ─────────────────────────────────────────────────
class FakeLLM:
    """A fake LLM for testing without a real API key."""
    def invoke(self, messages):
        user_msg = next(m["content"] for m in reversed(messages) if m["role"] == "user")
        if "order" in user_msg:
            return "```sql\nSELECT COUNT(*) AS cnt FROM orders\n```"
        if "customer" in user_msg:
            return "```sql\nSELECT name FROM customers ORDER BY name\n```"
        return "```sql\nSELECT 1\n```"

class FakeDB:
    """A fake DB for testing without a real database."""
    _data = {
        "SELECT COUNT(*) AS cnt FROM orders": [{"cnt": 44}],
        "SELECT name FROM customers ORDER BY name": [{"name": "김철수"}, {"name": "박영희"}],
    }
    def execute(self, sql):
        return self._data.get(sql, [{"result": "ok"}])
# ── 1. Define a catalog ──────────────────────────────────────────────────────
from lang2sql import CatalogEntry
catalog: list[CatalogEntry] = [
    {
        "name": "orders",
        "description": "Order information table",
        "columns": {"order_id": "PK", "customer_id": "FK", "amount": "amount"},
    },
    {
        "name": "customers",
        "description": "Customer master data",
        "columns": {"customer_id": "PK", "name": "name", "grade": "grade"},
    },
]
# ── 2. KeywordRetriever standalone test ──────────────────────────────────────
from lang2sql import KeywordRetriever
retriever = KeywordRetriever(catalog=catalog, top_n=2)
schemas = retriever.run("look up the order count")
print("✔ KeywordRetriever")
print(f"  retrieved: {[s['name'] for s in schemas]}")
# expected: ['orders']
# ── 3. SQLGenerator standalone test ──────────────────────────────────────────
from lang2sql import SQLGenerator
generator = SQLGenerator(llm=FakeLLM())
sql = generator.run("order count", schemas)
print("\n✔ SQLGenerator")
print(f"  generated SQL: {sql}")
# expected: SELECT COUNT(*) AS cnt FROM orders
# ── 4. SQLExecutor standalone test ───────────────────────────────────────────
from lang2sql import SQLExecutor
executor = SQLExecutor(db=FakeDB())
rows = executor.run(sql)
print("\n✔ SQLExecutor")
print(f"  result: {rows}")
# expected: [{'cnt': 44}]
# ── 5. BaselineNL2SQL (default pipeline) test ────────────────────────────────
from lang2sql import BaselineNL2SQL
pipeline = BaselineNL2SQL(catalog=catalog, llm=FakeLLM(), db=FakeDB())
result = pipeline.run("order count")
print("\n✔ BaselineNL2SQL")
print(f"  result: {result}")
# expected: [{'cnt': 44}]
# ── 6. db_dialect test ───────────────────────────────────────────────────────
pipeline_sqlite = BaselineNL2SQL(
    catalog=catalog, llm=FakeLLM(), db=FakeDB(), db_dialect="sqlite"
)
result = pipeline_sqlite.run("order count")
print("\n✔ db_dialect='sqlite'")
print(f"  result: {result}")
# ── 7. MemoryHook tracing test ───────────────────────────────────────────────
from lang2sql import MemoryHook
hook = MemoryHook()
pipeline_traced = BaselineNL2SQL(catalog=catalog, llm=FakeLLM(), db=FakeDB(), hook=hook)
pipeline_traced.run("order count")
events = hook.snapshot()
print("\n✔ MemoryHook events")
for e in events:
    dur = f" {e.duration_ms:.1f}ms" if e.duration_ms else ""
    print(f"  [{e.name}] {e.component:20s} phase={e.phase}{dur}")
component_starts = [e for e in events if e.name == "component.run" and e.phase == "start"]
print(f"\n  component start events: {len(component_starts)} (expected: 3)")
# ── 8. Error handling test ───────────────────────────────────────────────────
from lang2sql import ComponentError
class BrokenLLM:
    def invoke(self, messages):
        return "plain text with no SQL"  # no code block
try:
    bad_pipeline = BaselineNL2SQL(catalog=catalog, llm=BrokenLLM(), db=FakeDB())
    bad_pipeline.run("order")
except ComponentError as e:
    print(f"\n✔ ComponentError raised as expected")
    print(f"  component: {e.component}")
    print(f"  message: {e}")
# ── 9. Custom Hook test ──────────────────────────────────────────────────────
class PrintHook:
    def on_event(self, event):
        if event.phase == "start":
            print(f"  ▶ {event.component}")
        elif event.phase == "end":
            print(f"  ✔ {event.component} ({event.duration_ms:.0f}ms)")
print("\n✔ custom PrintHook")
BaselineNL2SQL(catalog=catalog, llm=FakeLLM(), db=FakeDB(), hook=PrintHook()).run("customer list")
# ── 10. Custom component test ────────────────────────────────────────────────
from lang2sql.core.base import BaseComponent
class UpperCaseSQL(BaseComponent):
    """A post-processing component that uppercases SQL."""
    def _run(self, sql: str) -> str:
        return sql.upper()
upper = UpperCaseSQL()
print(f"\n✔ custom BaseComponent")
print(f"  result: {upper.run('select 1')}")
# expected: SELECT 1
# ── 11. Public API import check ──────────────────────────────────────────────
from lang2sql import (
    CatalogEntry, LLMPort, DBPort,
    KeywordRetriever, SQLGenerator, SQLExecutor,
    BaselineNL2SQL,
    TraceHook, MemoryHook, NullHook,
    Lang2SQLError, ComponentError, IntegrationMissingError,
)
print("\n✔ all public imports succeeded")
print("\n" + "="*50)
print("All checklist items passed! lang2sql is ready.")
print("="*50)

BaselineNL2SQL.run("natural-language question")
 │
 ├── KeywordRetriever.run(query)
 │     └── BM25 keyword search → list[CatalogEntry]
 │
 ├── SQLGenerator.run(query, schemas)
 │     ├── _load_prompt(db_dialect) → load prompts/{dialect}.md
 │     ├── _build_context(schemas) → build the schema text
 │     ├── llm.invoke(messages) → call the LLM
 │     └── _extract_sql(response) → parse the ```sql ... ``` block
 │
 └── SQLExecutor.run(sql)
       └── db.execute(sql) → list[dict]

At every step, the Hook records start / end / error events.
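The _extract_sql step in the diagram boils down to pulling the fenced block out of the LLM response. A regex sketch of the idea (illustrative, not the library's exact implementation; the backtick string is built up so this example stays fence-safe):

```python
import re

FENCE = "`" * 3  # three backticks

def extract_sql(response: str) -> str:
    """Return the contents of the first fenced sql block, or raise."""
    match = re.search(FENCE + r"sql\s*(.*?)" + FENCE, response, re.DOTALL)
    if match is None:
        raise ValueError("no SQL code block in LLM response")
    return match.group(1).strip()

reply = f"Here you go:\n{FENCE}sql\nSELECT COUNT(*) FROM orders\n{FENCE}"
print(extract_sql(reply))  # SELECT COUNT(*) FROM orders
```

A response without a fenced block raises, which is the kind of failure a ComponentError would surface through the pipeline.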
Component extension points:

| Interface | Method to implement | Purpose |
|---|---|---|
| LLMPort | invoke(messages) -> str | swap the LLM backend |
| DBPort | execute(sql) -> list[dict] | swap the DB backend |
| BaseComponent | _run(*args) -> Any | add a new component |
| BaseFlow | _run(*args) -> Any | compose a new pipeline |
| TraceHook | on_event(event) -> None | custom monitoring |
Prompt file locations:
src/lang2sql/components/generation/prompts/
├── default.md   ← used when db_dialect is not specified
├── sqlite.md
├── postgresql.md
├── mysql.md
├── bigquery.md
└── duckdb.md