Skip to content

Latest commit

Β 

History

History
938 lines (706 loc) Β· 29.3 KB

File metadata and controls

938 lines (706 loc) Β· 29.3 KB

lang2sql Quickstart β€” 처음 μ‚¬μš©μžλ₯Ό μœ„ν•œ νŠœν† λ¦¬μ–Ό

이 λ¬Έμ„œλ₯Ό μœ„μ—μ„œ μ•„λž˜λ‘œ 따라가면 μ„€μΉ˜ β†’ API ν‚€ β†’ μƒ˜ν”Œ DB β†’ κΈ°λ³Έ νŒŒμ΄ν”„λΌμΈ β†’ Hook νŠΈλ ˆμ΄μ‹± β†’ μ»€μŠ€ν„°λ§ˆμ΄μ§•κΉŒμ§€ λͺ¨λ‘ μ²΄ν—˜ν•  수 μžˆμŠ΅λ‹ˆλ‹€.


λͺ©μ°¨

  1. μ„€μΉ˜
  2. API ν‚€ μ„€μ •
  3. μƒ˜ν”Œ DB μ„ΈνŒ…
  4. SQLAlchemyDB μ—°κ²° μ„€μ •
  5. 예제 μΉ΄νƒˆλ‘œκ·Έ
  6. κΈ°λ³Έ μ‚¬μš© β€” BaselineNL2SQL
    • 6-A. Anthropic Claude + SQLite
    • 6-B. OpenAI GPT + SQLite
    • 6-C. PostgreSQL μ—°κ²°
    • 6-D. 지원 DB λ°©μ–Έ(dialect) λͺ©λ‘
  7. Hook으둜 μ‹€ν–‰ μΆ”μ ν•˜κΈ°
  8. κ³ κΈ‰ μ‚¬μš© β€” μ»΄ν¬λ„ŒνŠΈ 직접 μ‘°ν•©
  9. μ»€μŠ€ν„°λ§ˆμ΄μ§•
    • 9-A. μ‹œμŠ€ν…œ ν”„λ‘¬ν”„νŠΈ ꡐ체
    • 9-B. λ‚˜λ§Œμ˜ LLM μ—°κ²°
    • 9-C. λ‚˜λ§Œμ˜ DB μ—°κ²°
    • 9-D. μ»€μŠ€ν…€ μ»΄ν¬λ„ŒνŠΈ λ§Œλ“€κΈ°
    • 9-E. μ»€μŠ€ν…€ ν”Œλ‘œμš° λ§Œλ“€κΈ°
  10. μ—λŸ¬ 처리
  11. 전체 κΈ°λŠ₯ 체크리슀트

1. μ„€μΉ˜

pip install lang2sql

개발 ν™˜κ²½μ—μ„œ uvλ₯Ό μ‚¬μš©ν•˜λŠ” 경우:

uv sync

anthropic, sqlalchemyλŠ” κΈ°λ³Έ μ˜μ‘΄μ„±μ— ν¬ν•¨λ˜μ–΄ μžˆμ–΄ 별도 μ„€μΉ˜κ°€ ν•„μš” μ—†μŠ΅λ‹ˆλ‹€.

νŒ¨ν‚€μ§€ μ—…λ°μ΄νŠΈ ν›„ pyproject.toml이 λ³€κ²½λ˜μ—ˆλ‹€λ©΄ λ°˜λ“œμ‹œ uv syncλ₯Ό λ‹€μ‹œ μ‹€ν–‰ν•˜μ„Έμš”.


2. API ν‚€ μ„€μ •

OpenAI, Anthropic SDKλŠ” ν™˜κ²½λ³€μˆ˜λ₯Ό μžλ™μœΌλ‘œ μ½μŠ΅λ‹ˆλ‹€. api_keyλ₯Ό μ½”λ“œμ— 직접 μ“°μ§€ μ•Šμ•„λ„ λ©λ‹ˆλ‹€.

방법 A β€” ν™˜κ²½λ³€μˆ˜ (ꢌμž₯)

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI
export OPENAI_API_KEY="sk-..."

방법 B β€” .env 파일

ν”„λ‘œμ νŠΈ λ£¨νŠΈμ— .env νŒŒμΌμ„ λ§Œλ“€κ³ :

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Python μ½”λ“œμ—μ„œ λ‘œλ“œ:

from dotenv import load_dotenv
load_dotenv()

방법 C β€” ν‚€ 이름이 λ‹€λ₯Ό λ•Œ

ν™˜κ²½λ³€μˆ˜ 이름이 λ‹€λ₯Έ 경우(예: OPEN_AI_KEY) api_key둜 직접 μ „λ‹¬ν•˜κ±°λ‚˜ ν‘œμ€€ μ΄λ¦„μœΌλ‘œ λ³΅μ‚¬ν•©λ‹ˆλ‹€:

import os

# 방법 1 β€” api_key둜 직접 전달
llm = OpenAILLM(model="gpt-4o", api_key=os.environ["OPEN_AI_KEY"])

# 방법 2 β€” SDKκ°€ μ½λŠ” ν‘œμ€€ μ΄λ¦„μœΌλ‘œ 볡사 (이후 api_key μƒλž΅ κ°€λŠ₯)
os.environ["OPENAI_API_KEY"] = os.environ["OPEN_AI_KEY"]
llm = OpenAILLM(model="gpt-4o")

3. μƒ˜ν”Œ DB μ„ΈνŒ…

νŠœν† λ¦¬μ–Ό μ½”λ“œλ₯Ό μ‹€μ œ DB둜 λ°”λ‘œ μ‹€ν–‰ν•΄λ³Ό 수 μžˆλ„λ‘ μƒ˜ν”Œ 데이터λ₯Ό μ œκ³΅ν•©λ‹ˆλ‹€. 고객 10λͺ…, μƒν’ˆ 12개, μ£Όλ¬Έ 44건, μ£Όλ¬Έν•­λͺ© 83κ°œκ°€ ν¬ν•¨λ©λ‹ˆλ‹€.

SQLite (μ„œλ²„ λΆˆν•„μš” β€” λ°”λ‘œ μ‹œμž‘ κ°€λŠ₯)

# ν”„λ‘œμ νŠΈ λ£¨νŠΈμ—μ„œ μ‹€ν–‰ β€” sample.db 파일이 ν˜„μž¬ 디렉토리에 μƒμ„±λ©λ‹ˆλ‹€
python scripts/setup_sample_db.py
μ—°κ²° 쀑: sqlite:///sample.db
ν…Œμ΄λΈ” 생성 μ™„λ£Œ: customers, products, orders, order_items
  고객:       10λͺ…
  μƒν’ˆ:       12개
  주문:       44건
  μ£Όλ¬Έ ν•­λͺ©:  83개

─── 검증 쿼리 κ²°κ³Ό ───────────────────────────────
  전체 μ£Όλ¬Έ 수:          44건
  gold λ“±κΈ‰ 고객 수:     3λͺ…
  재고 10개 미만 μƒν’ˆ:   5개
    - 무선 마우슀: 3개
    - ν›„λ“œ μ§‘μ—…: 4개
    ...
─────────────────────────────────────────────────
μ™„λ£Œ! μ•„λž˜ URL둜 quickstart.mdλ₯Ό 따라해 λ³΄μ„Έμš”:
  sqlite:///sample.db

sample.dbλŠ” 슀크립트λ₯Ό μ‹€ν–‰ν•œ 디렉토리에 μƒμ„±λ©λ‹ˆλ‹€. Python μ½”λ“œλ„ 같은 λ””λ ‰ν† λ¦¬μ—μ„œ μ‹€ν–‰ν•΄μ•Ό sqlite:///sample.db둜 μ—°κ²°λ©λ‹ˆλ‹€.

PostgreSQL (Docker μ‚¬μš©)

# 1. μ»¨ν…Œμ΄λ„ˆ 기동 (처음 ν•œ 번만)
docker compose -f docker/docker-compose-postgres.yml up -d

# 2. μƒ˜ν”Œ 데이터 μ‚½μž…
python scripts/setup_sample_db.py --db postgres

μ»€μŠ€ν…€ URL을 μ‚¬μš©ν•˜λ €λ©΄:

python scripts/setup_sample_db.py --url "postgresql://myuser:mypass@myhost:5432/mydb"

4. SQLAlchemyDB μ—°κ²° μ„€μ •

SQLAlchemyDBλŠ” lang2sql이 SQL을 μ‹€μ œ DBμ—μ„œ μ‹€ν–‰ν•  λ•Œ μ‚¬μš©ν•˜λŠ” DB μ–΄λŒ‘ν„°μž…λ‹ˆλ‹€. SQLAlchemy URL만 λ„˜κΈ°λ©΄ 연결이 μ™„λ£Œλ©λ‹ˆλ‹€.

from lang2sql.integrations.db import SQLAlchemyDB

# SQLite β€” 3λ²ˆμ—μ„œ μƒμ„±ν•œ sample.db에 λ°”λ‘œ μ—°κ²°
db = SQLAlchemyDB("sqlite:///sample.db")

# PostgreSQL
db = SQLAlchemyDB("postgresql://postgres:postgres@localhost:5432/postgres")

# MySQL
db = SQLAlchemyDB("mysql+mysqlconnector://user:pass@localhost:3306/dbname")

# DuckDB (인메λͺ¨λ¦¬)
db = SQLAlchemyDB("duckdb:///:memory:")

μ—°κ²° 확인

# execute()λ₯Ό 직접 ν˜ΈμΆœν•΄μ„œ 확인
rows = db.execute("SELECT name, grade FROM customers WHERE grade = 'gold'")
print(rows)
# [{'name': 'κΉ€μ² μˆ˜', 'grade': 'gold'}, {'name': 'λ°•μ˜ν¬', 'grade': 'gold'}, ...]

# ν…Œμ΄λΈ” λͺ©λ‘ 확인 (SQLite)
print(db.execute("SELECT name FROM sqlite_master WHERE type='table'"))
# [{'name': 'customers'}, {'name': 'products'}, {'name': 'orders'}, {'name': 'order_items'}]

5. 예제 μΉ΄νƒˆλ‘œκ·Έ

μΉ΄νƒˆλ‘œκ·ΈλŠ” μ–΄λ–€ ν…Œμ΄λΈ”μ΄ μžˆλŠ”μ§€ μ•Œλ €μ£ΌλŠ” 메타데이터 λͺ©λ‘μž…λ‹ˆλ‹€. KeywordRetrieverκ°€ 이 μΉ΄νƒˆλ‘œκ·Έλ₯Ό BM25둜 검색해 질문과 κ΄€λ ¨λœ ν…Œμ΄λΈ”μ„ μ°ΎμŠ΅λ‹ˆλ‹€.

ν˜„μž¬ 방식 β€” 직접 μ •μ˜

setup_sample_db.py둜 λ§Œλ“  DB와 컬럼 ꡬ쑰가 μΌμΉ˜ν•˜λ„λ‘ μ•„λž˜μ™€ 같이 μ •μ˜ν•©λ‹ˆλ‹€.

from lang2sql import CatalogEntry

CATALOG: list[CatalogEntry] = [
    {
        "name": "orders",
        "description": "고객 μ£Όλ¬Έ 정보 ν…Œμ΄λΈ”. μ£Όλ¬Έ 건수, κΈˆμ•‘, λ‚ μ§œ μ‘°νšŒμ— μ‚¬μš©.",
        "columns": {
            "order_id":    "주문 고유 ID (PK)",
            "customer_id": "μ£Όλ¬Έν•œ 고객 ID (FK β†’ customers)",
            "order_date":  "μ£Όλ¬Έ μΌμ‹œ (TIMESTAMP)",
            "amount":      "μ£Όλ¬Έ κΈˆμ•‘ (DECIMAL)",
            "status":      "μ£Όλ¬Έ μƒνƒœ: pending / confirmed / shipped / cancelled",
        },
    },
    {
        "name": "customers",
        "description": "고객 λ§ˆμŠ€ν„° 데이터. 고객 이름, κ°€μž…μΌ, λ“±κΈ‰ μ‘°νšŒμ— μ‚¬μš©.",
        "columns": {
            "customer_id":  "고객 고유 ID (PK)",
            "name":         "고객 이름",
            "email":        "이메일 μ£Όμ†Œ",
            "joined_at":    "κ°€μž… μΌμ‹œ (TIMESTAMP)",
            "grade":        "고객 λ“±κΈ‰: bronze / silver / gold",
        },
    },
    {
        "name": "products",
        "description": "μƒν’ˆ 정보 ν…Œμ΄λΈ”. μƒν’ˆλͺ…, μΉ΄ν…Œκ³ λ¦¬, 가격 μ‘°νšŒμ— μ‚¬μš©.",
        "columns": {
            "product_id":  "μƒν’ˆ 고유 ID (PK)",
            "name":        "μƒν’ˆλͺ…",
            "category":    "μΉ΄ν…Œκ³ λ¦¬: electronics / clothing / food",
            "price":       "판맀 가격 (DECIMAL)",
            "stock":       "ν˜„μž¬ 재고 μˆ˜λŸ‰ (INTEGER)",
        },
    },
    {
        "name": "order_items",
        "description": "주문별 μƒν’ˆ ꡬ성 ν…Œμ΄λΈ”. 주문에 ν¬ν•¨λœ μƒν’ˆκ³Ό μˆ˜λŸ‰ μ‘°νšŒμ— μ‚¬μš©.",
        "columns": {
            "item_id":    "ν•­λͺ© 고유 ID (PK)",
            "order_id":   "μ£Όλ¬Έ ID (FK β†’ orders)",
            "product_id": "μƒν’ˆ ID (FK β†’ products)",
            "quantity":   "μ£Όλ¬Έ μˆ˜λŸ‰ (INTEGER)",
            "unit_price": "μ£Όλ¬Έ λ‹Ήμ‹œ 단가 (DECIMAL)",
        },
    },
]

ν–₯ν›„ λ°©ν–₯ β€” μΉ΄νƒˆλ‘œκ·Έ μžλ™ 생성 (λ―Έκ΅¬ν˜„, μ œμ•ˆ)

방법 A β€” SQLAlchemy inspect둜 DB μŠ€ν‚€λ§ˆ 읽기

# (ν˜„μž¬ λ―Έκ΅¬ν˜„, 아이디어 μ˜ˆμ‹œ)
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite:///sample.db")
insp = inspect(engine)

catalog = []
for table_name in insp.get_table_names():
    columns = {
        col["name"]: str(col["type"])
        for col in insp.get_columns(table_name)
    }
    catalog.append({"name": table_name, "columns": columns})

방법 B β€” DataHub λ“± 메타데이터 ν”Œλž«νΌ 연동

  • 이미 utils/data/ μ•„λž˜μ— DataHub 연동 μ½”λ“œκ°€ μžˆμŠ΅λ‹ˆλ‹€.
  • ν–₯ν›„ CatalogEntry ν˜•μ‹μœΌλ‘œ λ³€ν™˜ν•˜λŠ” μ–΄λŒ‘ν„°λ₯Ό μΆ”κ°€ν•  μ˜ˆμ •μž…λ‹ˆλ‹€.

6. κΈ°λ³Έ μ‚¬μš© β€” BaselineNL2SQL

κ°€μž₯ λΉ λ₯Έ μ‚¬μš©λ²•μž…λ‹ˆλ‹€. LLMκ³Ό DB만 μ—°κ²°ν•˜λ©΄ μžμ—°μ–΄ β†’ SQL β†’ μ‹€ν–‰ κ²°κ³Όλ₯Ό μ–»μŠ΅λ‹ˆλ‹€.

db_dialect νŒŒλΌλ―Έν„°λ₯Ό λ°˜λ“œμ‹œ μ§€μ •ν•˜μ„Έμš”. SQLiteλŠ” MONTH(), YEAR() 같은 MySQL/PostgreSQL ν•¨μˆ˜λ₯Ό μ§€μ›ν•˜μ§€ μ•ŠμŠ΅λ‹ˆλ‹€. db_dialectλ₯Ό μ§€μ •ν•˜λ©΄ ν•΄λ‹Ή DB에 λ§žλŠ” SQL ν•¨μˆ˜λ₯Ό μ‚¬μš©ν•˜λŠ” ν”„λ‘¬ν”„νŠΈκ°€ μžλ™μœΌλ‘œ μ μš©λ©λ‹ˆλ‹€.

6-A. Anthropic Claude + SQLite

from lang2sql import BaselineNL2SQL
from lang2sql.integrations.llm import AnthropicLLM
from lang2sql.integrations.db import SQLAlchemyDB

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",   # ← DB λ°©μ–Έ μ§€μ •
)

result = pipeline.run("μ΄λ²ˆλ‹¬μ— 10λ§Œμ› 이상 μ£Όλ¬Έν•œ 고객 이름과 μ£Όλ¬Έ κΈˆμ•‘μ„ μ•Œλ €μ€˜")
print(result)
# μ˜ˆμ‹œ 좜λ ₯: [{'name': 'κΉ€μ² μˆ˜', 'amount': Decimal('320000')}, ...]

result2 = pipeline.run("gold λ“±κΈ‰ 고객 λͺ©λ‘μ„ 이름 순으둜 λ³΄μ—¬μ€˜")
print(result2)
# μ˜ˆμ‹œ 좜λ ₯: [{'name': 'κΉ€μ² μˆ˜'}, {'name': 'λ°•μ˜ν¬'}, {'name': '이민쀀'}]

6-B. OpenAI GPT + SQLite

from lang2sql import BaselineNL2SQL
from lang2sql.integrations.llm import OpenAILLM
from lang2sql.integrations.db import SQLAlchemyDB

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=OpenAILLM(model="gpt-4o"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",
)

result = pipeline.run("μž¬κ³ κ°€ 10개 미만인 μƒν’ˆ λͺ©λ‘")
print(result)
# μ˜ˆμ‹œ 좜λ ₯: [{'name': '무선 마우슀', 'stock': 3}, {'name': 'ν›„λ“œ μ§‘μ—…', 'stock': 4}, ...]

6-C. PostgreSQL μ—°κ²°

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("postgresql://postgres:postgres@localhost:5432/postgres"),
    db_dialect="postgresql",
)

result = pipeline.run("gold λ“±κΈ‰ 고객의 총 μ£Όλ¬Έ κΈˆμ•‘")

6-D. 지원 DB λ°©μ–Έ(dialect) λͺ©λ‘

db_dialect λŒ€μƒ DB 적용 λ‚΄μš©
"sqlite" SQLite strftime() μ‚¬μš©, MONTH()/YEAR() μ‚¬μš© μ•ˆ 함
"postgresql" PostgreSQL DATE_TRUNC, EXTRACT, INTERVAL
"mysql" MySQL MONTH(), YEAR(), DATE_FORMAT()
"bigquery" Google BigQuery DATE_TRUNC, EXTRACT, FORMAT_DATE
"duckdb" DuckDB DATE_TRUNC, EXTRACT, INTERVAL
None / μƒλž΅ λ°©μ–Έ 무관 κΈ°λ³Έ ν”„λ‘¬ν”„νŠΈ (λ‚ μ§œ ν•¨μˆ˜ λ―Έμ§€μ •)

7. Hook으둜 μ‹€ν–‰ μΆ”μ ν•˜κΈ°

MemoryHook을 달면 각 μ»΄ν¬λ„ŒνŠΈκ°€ μ–Έμ œ, μ–Όλ§ˆλ‚˜ κ±Έλ ΈλŠ”μ§€, 무엇을 λ°›κ³  λ°˜ν™˜ν–ˆλŠ”μ§€ μ „λΆ€ κΈ°λ‘λ©λ‹ˆλ‹€.

from lang2sql import BaselineNL2SQL, MemoryHook
from lang2sql.integrations.llm import AnthropicLLM
from lang2sql.integrations.db import SQLAlchemyDB

hook = MemoryHook()

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",
    hook=hook,
)

result = pipeline.run("주문 건수")

for event in hook.snapshot():
    dur = f" {event.duration_ms:.1f}ms" if event.duration_ms else ""
    print(f"[{event.name}] {event.component:20s} phase={event.phase}{dur}")

μ˜ˆμƒ 좜λ ₯:

[flow.run]      BaselineNL2SQL       phase=start
[component.run] KeywordRetriever     phase=start
[component.run] KeywordRetriever     phase=end    1.2ms
[component.run] SQLGenerator         phase=start
[component.run] SQLGenerator         phase=end    843.5ms
[component.run] SQLExecutor          phase=start
[component.run] SQLExecutor          phase=end    12.3ms
[flow.run]      BaselineNL2SQL       phase=end    857.0ms

이벀트 상세 정보

events = hook.snapshot()

gen_events = [e for e in events if e.component == "SQLGenerator"]
for e in gen_events:
    print(f"  phase       : {e.phase}")
    print(f"  duration_ms : {e.duration_ms}")
    print(f"  input       : {e.input_summary}")
    print(f"  output      : {e.output_summary}")

μ—λŸ¬ 좔적

hook.clear()

try:
    pipeline.run("...")
except Exception:
    pass

error_events = [e for e in hook.snapshot() if e.phase == "error"]
for e in error_events:
    print(f"μ»΄ν¬λ„ŒνŠΈ: {e.component}")
    print(f"μ—λŸ¬:     {e.error}")

μ»€μŠ€ν…€ Hook λ§Œλ“€κΈ°

on_event(event) ν•˜λ‚˜λ§Œ κ΅¬ν˜„ν•˜λ©΄ λ©λ‹ˆλ‹€.

class PrintHook:
    def on_event(self, event):
        if event.phase == "start":
            print(f"β–Ά {event.component} μ‹œμž‘")
        elif event.phase == "end":
            print(f"βœ“ {event.component} μ™„λ£Œ ({event.duration_ms:.0f}ms)")
        elif event.phase == "error":
            print(f"βœ— {event.component} 였λ₯˜: {event.error}")


pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",
    hook=PrintHook(),
)

result = pipeline.run("재고 λΆ€μ‘± μƒν’ˆ λͺ©λ‘")

μ˜ˆμƒ 좜λ ₯:

β–Ά KeywordRetriever μ‹œμž‘
βœ“ KeywordRetriever μ™„λ£Œ (1ms)
β–Ά SQLGenerator μ‹œμž‘
βœ“ SQLGenerator μ™„λ£Œ (921ms)
β–Ά SQLExecutor μ‹œμž‘
βœ“ SQLExecutor μ™„λ£Œ (8ms)

8. κ³ κΈ‰ μ‚¬μš© β€” μ»΄ν¬λ„ŒνŠΈ 직접 μ‘°ν•©

νŒŒμ΄ν”„λΌμΈμ„ μ“°μ§€ μ•Šκ³  μ»΄ν¬λ„ŒνŠΈλ₯Ό 직접 μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€. 각 단계 κ²°κ³Όλ₯Ό 쀑간에 ν™•μΈν•˜κ±°λ‚˜ 쑰건 λΆ„κΈ°λ₯Ό λ„£κ³  싢을 λ•Œ μœ μš©ν•©λ‹ˆλ‹€.

from lang2sql import KeywordRetriever, SQLGenerator, SQLExecutor
from lang2sql.integrations.llm import AnthropicLLM
from lang2sql.integrations.db import SQLAlchemyDB

retriever = KeywordRetriever(catalog=CATALOG, top_n=3)
generator = SQLGenerator(
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db_dialect="sqlite",
)
executor  = SQLExecutor(db=SQLAlchemyDB("sqlite:///sample.db"))

query = "gold λ“±κΈ‰ 고객의 이번 달 μ£Όλ¬Έ 총앑"

# Step 1 β€” κ΄€λ ¨ μŠ€ν‚€λ§ˆ 검색
schemas = retriever.run(query)
print("κ²€μƒ‰λœ μŠ€ν‚€λ§ˆ:")
for s in schemas:
    print(f"  - {s['name']}: {s.get('description', '')}")

# Step 2 β€” SQL 생성
sql = generator.run(query, schemas)
print(f"\nμƒμ„±λœ SQL:\n{sql}")

# Step 3 β€” μ‹€ν–‰
rows = executor.run(sql)
print(f"\nκ²°κ³Ό: {rows}")

9. μ»€μŠ€ν„°λ§ˆμ΄μ§•

9-A. μ‹œμŠ€ν…œ ν”„λ‘¬ν”„νŠΈ ꡐ체

db_dialect에 μ—†λŠ” DB(Snowflake, Trino λ“±)λ‚˜ νŠΉλ³„ν•œ SQL μŠ€νƒ€μΌμ΄ ν•„μš”ν•  λ•Œ system_prompt둜 직접 μ§€μ •ν•©λ‹ˆλ‹€. system_promptλŠ” db_dialect보닀 μš°μ„ ν•©λ‹ˆλ‹€.

from lang2sql import SQLGenerator
from lang2sql.integrations.llm import AnthropicLLM

MY_PROMPT = """
You are a Snowflake SQL expert.
- Use DATEADD, DATEDIFF for date arithmetic
- Use TO_DATE() for date casting
- Use CURRENT_DATE() for today
- Return ONLY the SQL inside a ```sql ... ``` block
"""

generator = SQLGenerator(
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    system_prompt=MY_PROMPT,  # db_dialect λŒ€μ‹  직접 μ§€μ •
)

sql = generator.run("이번 달 μ£Όλ¬Έ 건수", schemas)

ν”„λ‘¬ν”„νŠΈ νŒŒμΌμ€ src/lang2sql/components/generation/prompts/ μ•„λž˜μ— μžˆμŠ΅λ‹ˆλ‹€. μƒˆλ‘œμš΄ dialectλ₯Ό μΆ”κ°€ν•˜λ €λ©΄ ν•΄λ‹Ή κ²½λ‘œμ— {dialect}.md νŒŒμΌμ„ λ§Œλ“€λ©΄ λ©λ‹ˆλ‹€.

9-B. λ‚˜λ§Œμ˜ LLM μ—°κ²°

invoke(messages) -> str ν•˜λ‚˜λ§Œ κ΅¬ν˜„ν•˜λ©΄ μ–΄λ–€ LLM이든 μ—°κ²°λ©λ‹ˆλ‹€.

# 예: LangChain λͺ¨λΈ κ·ΈλŒ€λ‘œ μ‚¬μš©
from langchain_openai import ChatOpenAI

class LangChainLLM:
    def __init__(self, model: str):
        self._llm = ChatOpenAI(model=model)

    def invoke(self, messages: list[dict]) -> str:
        from langchain_core.messages import HumanMessage, SystemMessage
        lc_msgs = []
        for m in messages:
            if m["role"] == "system":
                lc_msgs.append(SystemMessage(content=m["content"]))
            else:
                lc_msgs.append(HumanMessage(content=m["content"]))
        return self._llm.invoke(lc_msgs).content


pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=LangChainLLM("gpt-4o"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
    db_dialect="sqlite",
)

9-C. λ‚˜λ§Œμ˜ DB μ—°κ²°

execute(sql) -> list[dict] ν•˜λ‚˜λ§Œ κ΅¬ν˜„ν•˜λ©΄ λ©λ‹ˆλ‹€.

# 예: pandas DataFrame을 DuckDB둜 쿼리
class PandasDB:
    def __init__(self, dataframes: dict):
        import duckdb
        self._conn = duckdb.connect()
        for name, df in dataframes.items():
            self._conn.register(name, df)

    def execute(self, sql: str) -> list[dict]:
        result = self._conn.execute(sql).fetchdf()
        return result.to_dict(orient="records")


import pandas as pd

pipeline = BaselineNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=PandasDB({"orders": pd.read_csv("orders.csv")}),
    db_dialect="duckdb",
)

9-D. μ»€μŠ€ν…€ μ»΄ν¬λ„ŒνŠΈ λ§Œλ“€κΈ°

BaseComponentλ₯Ό μƒμ†ν•˜κ³  _run()만 κ΅¬ν˜„ν•©λ‹ˆλ‹€. Hook, μ—λŸ¬ 처리, 타이밍은 μžλ™μœΌλ‘œ μ²˜λ¦¬λ©λ‹ˆλ‹€.

from lang2sql.core.base import BaseComponent
from lang2sql.core.exceptions import ComponentError


class SQLValidator(BaseComponent):
    """μƒμ„±λœ SQL에 μœ„ν—˜ν•œ ν‚€μ›Œλ“œκ°€ 있으면 ComponentErrorλ₯Ό λ°œμƒμ‹œν‚΅λ‹ˆλ‹€."""

    FORBIDDEN = {"DROP", "DELETE", "TRUNCATE", "UPDATE", "INSERT", "ALTER"}

    def _run(self, sql: str) -> str:
        tokens = set(sql.upper().split())
        bad = tokens & self.FORBIDDEN
        if bad:
            raise ComponentError(self.name, f"μœ„ν—˜ν•œ ν‚€μ›Œλ“œ 감지: {bad}")
        return sql


validator = SQLValidator()
safe_sql = validator.run("SELECT COUNT(*) FROM orders")   # OK
# validator.run("DROP TABLE orders")                      # ComponentError λ°œμƒ

9-E. μ»€μŠ€ν…€ ν”Œλ‘œμš° λ§Œλ“€κΈ°

BaseFlowλ₯Ό μƒμ†ν•˜κ³  _run()μ—μ„œ 순수 Python μ œμ–΄νλ¦„μœΌλ‘œ μ»΄ν¬λ„ŒνŠΈλ₯Ό μ‘°ν•©ν•©λ‹ˆλ‹€.

from lang2sql.core.base import BaseFlow
from lang2sql.core.exceptions import ComponentError


class SafeNL2SQL(BaseFlow):
    """검증 단계λ₯Ό μΆ”κ°€ν•œ νŒŒμ΄ν”„λΌμΈ."""

    def __init__(self, *, catalog, llm, db, hook=None):
        super().__init__(name="SafeNL2SQL", hook=hook)
        self._retriever = KeywordRetriever(catalog=catalog, hook=hook)
        self._generator = SQLGenerator(llm=llm, db_dialect="sqlite", hook=hook)
        self._validator = SQLValidator(hook=hook)
        self._executor  = SQLExecutor(db=db, hook=hook)

    def _run(self, query: str):
        schemas = self._retriever.run(query)
        sql     = self._generator.run(query, schemas)
        sql     = self._validator.run(sql)   # 검증 톡과 μ‹œμ—λ§Œ μ‹€ν–‰
        return  self._executor.run(sql)


pipeline = SafeNL2SQL(
    catalog=CATALOG,
    llm=AnthropicLLM(model="claude-sonnet-4-6"),
    db=SQLAlchemyDB("sqlite:///sample.db"),
)
result = pipeline.run("주문 건수")

μž¬μ‹œλ„ 둜직 μ˜ˆμ‹œ

class RetryNL2SQL(BaseFlow):
    """SQL 생성 μ‹€νŒ¨ μ‹œ μ΅œλŒ€ 3번 μž¬μ‹œλ„ν•©λ‹ˆλ‹€."""

    def __init__(self, *, catalog, llm, db, hook=None):
        super().__init__(name="RetryNL2SQL", hook=hook)
        self._retriever = KeywordRetriever(catalog=catalog, hook=hook)
        self._generator = SQLGenerator(llm=llm, db_dialect="sqlite", hook=hook)
        self._executor  = SQLExecutor(db=db, hook=hook)

    def _run(self, query: str):
        schemas = self._retriever.run(query)

        last_error = None
        for attempt in range(3):
            try:
                sql = self._generator.run(query, schemas)
                return self._executor.run(sql)
            except ComponentError as e:
                last_error = e
                print(f"  μ‹œλ„ {attempt + 1} μ‹€νŒ¨: {e}")

        raise last_error

10. μ—λŸ¬ 처리

from lang2sql import ComponentError, IntegrationMissingError, Lang2SQLError

try:
    result = pipeline.run("주문 건수")
except ComponentError as e:
    # νŠΉμ • μ»΄ν¬λ„ŒνŠΈμ—μ„œ λ°œμƒν•œ μ—λŸ¬
    print(f"μ»΄ν¬λ„ŒνŠΈ 였λ₯˜: {e.component}")
    print(f"λ©”μ‹œμ§€: {e}")
    if e.cause:
        print(f"원인: {e.cause}")

except IntegrationMissingError as e:
    # νŒ¨ν‚€μ§€ λ―Έμ„€μΉ˜ (anthropic, sqlalchemy λ“±)
    print(f"νŒ¨ν‚€μ§€ λ―Έμ„€μΉ˜: {e}")
    # β†’ uv sync λ˜λŠ” pip install lang2sql μž¬μ‹€ν–‰

except Lang2SQLError as e:
    # κ·Έ μ™Έ lang2sql 도메인 μ—λŸ¬
    print(f"도메인 였λ₯˜: {e}")

μ—λŸ¬ 계측:

Lang2SQLError
β”œβ”€β”€ ComponentError          β€” μ»΄ν¬λ„ŒνŠΈ μ‹€ν–‰ μ‹€νŒ¨ (component, cause 속성)
β”œβ”€β”€ IntegrationMissingError β€” 선택적 νŒ¨ν‚€μ§€ λ―Έμ„€μΉ˜
└── ValidationError         β€” 검증 μ‹€νŒ¨

자주 λ°œμƒν•˜λŠ” μ—λŸ¬:

μ—λŸ¬ 원인 ν•΄κ²°
IntegrationMissingError: anthropic anthropic λ―Έμ„€μΉ˜ uv sync
OpenAIError: api_key must be set ν™˜κ²½λ³€μˆ˜ λ―Έμ„€μ • export OPENAI_API_KEY=...
no such table: products DB URL μ˜€νƒ€ λ˜λŠ” sample.db 미생성 python scripts/setup_sample_db.py μ‹€ν–‰
no such function: MONTH SQLiteμ—μ„œ MySQL ν•¨μˆ˜ μ‚¬μš© db_dialect="sqlite" μ§€μ •

11. 전체 κΈ°λŠ₯ 체크리슀트

μ‹€μ œ API ν‚€λ‚˜ DB 없이도 FakeLLM/FakeDB둜 전체 흐름을 확인할 수 μžˆμŠ΅λ‹ˆλ‹€.

"""
lang2sql 전체 κΈ°λŠ₯ 체크리슀트
μ•„λž˜ μ½”λ“œλ₯Ό μˆœμ„œλŒ€λ‘œ μ‹€ν–‰ν•˜λ©΄ λͺ¨λ“  κΈ°λŠ₯을 ν…ŒμŠ€νŠΈν•  수 μžˆμŠ΅λ‹ˆλ‹€.
"""

# ── 0. μ˜μ‘΄μ„± μ—†λŠ” Fake κ΅¬ν˜„ ──────────────────────────────────────────────────

class FakeLLM:
    """μ‹€μ œ API ν‚€ 없이 ν…ŒμŠ€νŠΈν•  수 μžˆλŠ” κ°€μ§œ LLM."""
    def invoke(self, messages):
        user_msg = next(m["content"] for m in reversed(messages) if m["role"] == "user")
        if "μ£Όλ¬Έ" in user_msg:
            return "```sql\nSELECT COUNT(*) AS cnt FROM orders\n```"
        if "고객" in user_msg:
            return "```sql\nSELECT name FROM customers ORDER BY name\n```"
        return "```sql\nSELECT 1\n```"

class FakeDB:
    """μ‹€μ œ DB 없이 ν…ŒμŠ€νŠΈν•  수 μžˆλŠ” κ°€μ§œ DB."""
    _data = {
        "SELECT COUNT(*) AS cnt FROM orders": [{"cnt": 44}],
        "SELECT name FROM customers ORDER BY name": [{"name": "κΉ€μ² μˆ˜"}, {"name": "λ°•μ˜ν¬"}],
    }
    def execute(self, sql):
        return self._data.get(sql, [{"result": "ok"}])


# ── 1. μΉ΄νƒˆλ‘œκ·Έ μ •μ˜ ──────────────────────────────────────────────────────────

from lang2sql import CatalogEntry

catalog: list[CatalogEntry] = [
    {
        "name": "orders",
        "description": "μ£Όλ¬Έ 정보 ν…Œμ΄λΈ”",
        "columns": {"order_id": "PK", "customer_id": "FK", "amount": "κΈˆμ•‘"},
    },
    {
        "name": "customers",
        "description": "고객 λ§ˆμŠ€ν„° 데이터",
        "columns": {"customer_id": "PK", "name": "이름", "grade": "λ“±κΈ‰"},
    },
]


# ── 2. KeywordRetriever 단독 ν…ŒμŠ€νŠΈ ───────────────────────────────────────────

from lang2sql import KeywordRetriever

retriever = KeywordRetriever(catalog=catalog, top_n=2)
schemas = retriever.run("주문 건수 쑰회")
print("βœ“ KeywordRetriever")
print(f"  검색 κ²°κ³Ό: {[s['name'] for s in schemas]}")
# μ˜ˆμƒ: ['orders']


# ── 3. SQLGenerator 단독 ν…ŒμŠ€νŠΈ ───────────────────────────────────────────────

from lang2sql import SQLGenerator

generator = SQLGenerator(llm=FakeLLM())
sql = generator.run("주문 건수", schemas)
print("\nβœ“ SQLGenerator")
print(f"  생성 SQL: {sql}")
# μ˜ˆμƒ: SELECT COUNT(*) AS cnt FROM orders


# ── 4. SQLExecutor 단독 ν…ŒμŠ€νŠΈ ────────────────────────────────────────────────

from lang2sql import SQLExecutor

executor = SQLExecutor(db=FakeDB())
rows = executor.run(sql)
print("\nβœ“ SQLExecutor")
print(f"  κ²°κ³Ό: {rows}")
# μ˜ˆμƒ: [{'cnt': 44}]


# ── 5. BaselineNL2SQL (κΈ°λ³Έ νŒŒμ΄ν”„λΌμΈ) ν…ŒμŠ€νŠΈ ────────────────────────────────

from lang2sql import BaselineNL2SQL

pipeline = BaselineNL2SQL(catalog=catalog, llm=FakeLLM(), db=FakeDB())
result = pipeline.run("주문 건수")
print("\nβœ“ BaselineNL2SQL")
print(f"  κ²°κ³Ό: {result}")
# μ˜ˆμƒ: [{'cnt': 44}]


# ── 6. db_dialect ν…ŒμŠ€νŠΈ ──────────────────────────────────────────────────────

pipeline_sqlite = BaselineNL2SQL(
    catalog=catalog, llm=FakeLLM(), db=FakeDB(), db_dialect="sqlite"
)
result = pipeline_sqlite.run("주문 건수")
print("\nβœ“ db_dialect='sqlite'")
print(f"  κ²°κ³Ό: {result}")


# ── 7. MemoryHook νŠΈλ ˆμ΄μ‹± ν…ŒμŠ€νŠΈ ─────────────────────────────────────────────

from lang2sql import MemoryHook

hook = MemoryHook()
pipeline_traced = BaselineNL2SQL(catalog=catalog, llm=FakeLLM(), db=FakeDB(), hook=hook)
pipeline_traced.run("주문 건수")

events = hook.snapshot()
print("\nβœ“ MemoryHook 이벀트")
for e in events:
    dur = f" {e.duration_ms:.1f}ms" if e.duration_ms else ""
    print(f"  [{e.name}] {e.component:20s} phase={e.phase}{dur}")

component_starts = [e for e in events if e.name == "component.run" and e.phase == "start"]
print(f"\n  component start 이벀트 수: {len(component_starts)}  (μ˜ˆμƒ: 3)")


# ── 8. μ—λŸ¬ 처리 ν…ŒμŠ€νŠΈ ────────────────────────────────────────────────────────

from lang2sql import ComponentError

class BrokenLLM:
    def invoke(self, messages):
        return "SQL 없이 일반 ν…μŠ€νŠΈλ§Œ λ°˜ν™˜"  # μ½”λ“œλΈ”λ‘ μ—†μŒ

try:
    bad_pipeline = BaselineNL2SQL(catalog=catalog, llm=BrokenLLM(), db=FakeDB())
    bad_pipeline.run("μ£Όλ¬Έ")
except ComponentError as e:
    print(f"\nβœ“ ComponentError 정상 λ°œμƒ")
    print(f"  μ»΄ν¬λ„ŒνŠΈ: {e.component}")
    print(f"  λ©”μ‹œμ§€:   {e}")


# ── 9. μ»€μŠ€ν…€ Hook ν…ŒμŠ€νŠΈ ─────────────────────────────────────────────────────

class PrintHook:
    def on_event(self, event):
        if event.phase == "start":
            print(f"  β–Ά {event.component}")
        elif event.phase == "end":
            print(f"  βœ“ {event.component} ({event.duration_ms:.0f}ms)")

print("\nβœ“ μ»€μŠ€ν…€ PrintHook")
BaselineNL2SQL(catalog=catalog, llm=FakeLLM(), db=FakeDB(), hook=PrintHook()).run("고객 λͺ©λ‘")


# ── 10. μ»€μŠ€ν…€ μ»΄ν¬λ„ŒνŠΈ ν…ŒμŠ€νŠΈ ─────────────────────────────────────────────────

from lang2sql.core.base import BaseComponent

class UpperCaseSQL(BaseComponent):
    """SQL을 λŒ€λ¬Έμžλ‘œ λ³€ν™˜ν•˜λŠ” ν›„μ²˜λ¦¬ μ»΄ν¬λ„ŒνŠΈ."""
    def _run(self, sql: str) -> str:
        return sql.upper()

upper = UpperCaseSQL()
print(f"\nβœ“ μ»€μŠ€ν…€ BaseComponent")
print(f"  κ²°κ³Ό: {upper.run('select 1')}")
# μ˜ˆμƒ: SELECT 1


# ── 11. public API import 확인 ────────────────────────────────────────────────

from lang2sql import (
    CatalogEntry, LLMPort, DBPort,
    KeywordRetriever, SQLGenerator, SQLExecutor,
    BaselineNL2SQL,
    TraceHook, MemoryHook, NullHook,
    Lang2SQLError, ComponentError, IntegrationMissingError,
)
print("\nβœ“ λͺ¨λ“  public import 성곡")

print("\n" + "="*50)
print("λͺ¨λ“  체크리슀트 톡과! lang2sql μ€€λΉ„ μ™„λ£Œ.")
print("="*50)

μ°Έκ³ : μ•„ν‚€ν…μ²˜ ν•œλˆˆμ— 보기

BaselineNL2SQL.run("μžμ—°μ–΄ 질문")
β”‚
β”œβ”€β”€ KeywordRetriever.run(query)
β”‚   └── BM25 ν‚€μ›Œλ“œ 검색 β†’ list[CatalogEntry]
β”‚
β”œβ”€β”€ SQLGenerator.run(query, schemas)
β”‚   β”œβ”€β”€ _load_prompt(db_dialect)  β†’ prompts/{dialect}.md λ‘œλ“œ
β”‚   β”œβ”€β”€ _build_context(schemas)   β†’ μŠ€ν‚€λ§ˆ ν…μŠ€νŠΈ ꡬ성
β”‚   β”œβ”€β”€ llm.invoke(messages)      β†’ LLM 호좜
β”‚   └── _extract_sql(response)    β†’ ```sql...``` νŒŒμ‹±
β”‚
└── SQLExecutor.run(sql)
    └── db.execute(sql)           β†’ list[dict]

λͺ¨λ“  λ‹¨κ³„μ—μ„œ Hook이 start / end / error 이벀트λ₯Ό κΈ°λ‘ν•©λ‹ˆλ‹€.

μ»΄ν¬λ„ŒνŠΈ ν™•μž₯ 포인트:

μΈν„°νŽ˜μ΄μŠ€ κ΅¬ν˜„ν•  λ©”μ„œλ“œ μš©λ„
LLMPort invoke(messages) -> str LLM λ°±μ—”λ“œ ꡐ체
DBPort execute(sql) -> list[dict] DB λ°±μ—”λ“œ ꡐ체
BaseComponent _run(*args) -> Any μƒˆ μ»΄ν¬λ„ŒνŠΈ μΆ”κ°€
BaseFlow _run(*args) -> Any μƒˆ νŒŒμ΄ν”„λΌμΈ μ‘°ν•©
TraceHook on_event(event) -> None μ»€μŠ€ν…€ λͺ¨λ‹ˆν„°λ§

ν”„λ‘¬ν”„νŠΈ 파일 μœ„μΉ˜:

src/lang2sql/components/generation/prompts/
β”œβ”€β”€ default.md      ← db_dialect λ―Έμ§€μ • μ‹œ
β”œβ”€β”€ sqlite.md
β”œβ”€β”€ postgresql.md
β”œβ”€β”€ mysql.md
β”œβ”€β”€ bigquery.md
└── duckdb.md