Skip to content

feat: Group 1 Critical Refactoring and Docker Setup Optimization#15

Open
stavp-star wants to merge 6 commits into
mainfrom
stav/critic-bug-fixes
Open

feat: Group 1 Critical Refactoring and Docker Setup Optimization#15
stavp-star wants to merge 6 commits into
mainfrom
stav/critic-bug-fixes

Conversation

@stavp-star

@stavp-star stavp-star commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

This PR resolves the 5 production-blocking bugs in Group 1 (TTS-G1-01 through TTS-G1-05):

  1. Parallelized schema profiling with concurrency limits.
  2. Eliminated os.environ mutations by creating a global Langfuse manager.
  3. Added a startup fail-fast check for the LLM judge.
  4. Hardened evaluation error handling and state tracking.
  5. Prevented schema hallucinations with a Redis cache layer.

It also optimizes Docker Compose settings to build and run out-of-the-box using HTTPS for public dependencies, removing any SSH deploy_key requirements.

Summary by CodeRabbit

Release Notes

  • New Features

    • Schema exploration now validates predicted tables (cached + live checks) and retries when hallucinations are suspected.
    • Added /health/judge and /health/esca endpoints.
    • Refinement now preserves detailed error history and improves handling of inline previews.
  • Configuration

    • Added Redis URL, profile fetch concurrency, optional OpenAI API key, and Langfuse credential/base URL fields.
  • Infrastructure / Behavior

    • Switched builds to HTTPS-only dependency syncing (no SSH deploy keys).
    • OpenAI key is required at startup; enhanced tracing integration and improved SQL handling (incl. parameter support).

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds table-hallucination detection to schema_explorer_node using Redis caching and Trino information_schema validation with conditional graph loop-back, centralizes Langfuse observability into core/langfuse.py and wires it via FastAPI DI, refactors refiner and finalizer nodes to use async context managers and an LLM factory, migrates both Dockerfiles from SSH deploy-key to HTTPS dependency sync, and adds OpenAI API key startup validation with health endpoints.

Changes

Runtime Contracts and Shared Utilities

Layer / File(s) Summary
State and settings extensions
agent/src/agent/state.py, agent/src/agent/config.py, backend/app/config.py, core/src/core/config.py
AgentState gains optional fields for last_error, hallucinated_tables, esca_write_failed, inline_result_rows, error_history, and schema_explorer_retry_count; AgentSettings adds PROFILE_FETCH_CONCURRENCY and REDIS_URL; backend Settings adds OPENAI_API_KEY; CoreSettings adds Langfuse credentials (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_BASE_URL).
LLM factory and Esca/SQL utilities
agent/src/agent/llm.py, agent/src/agent/utils/esca.py, agent/src/agent/utils/sql.py, core/src/core/trino.py
New get_llm() module provides factory for ChatOpenAI instances; new get_esca_client() async context manager ensures client lifecycle; new clean_sql() normalizes LLM SQL; execute_query_sync now accepts optional params for parameterized queries.

Langfuse Observability Centralization

Layer / File(s) Summary
Core Langfuse configuration and singleton handler
core/src/core/langfuse.py
Module-level CallbackHandler singleton initialized from settings with error handling; exports get_langfuse_handler() for FastAPI dependency injection.
Langfuse DI wiring in consumers
agent/src/agent/routers/chat.py, backend/app/services/llm_judge.py
Chat endpoint receives injected langfuse_handler via Depends(get_langfuse_handler) and conditionally includes it in callbacks; evaluate_with_llm decorated with @observe(name="evaluate_with_llm") and reads API key directly from settings.

Schema Hallucination Detection and Graph Routing

Layer / File(s) Summary
Graph routing and rejection router state
agent/src/agent/graph.py
rejection_router_node now initializes downstream fields (sql_query, schema_plan, raw_data_ref, trino_error) alongside feedback_route; new route_schema_explorer function loops back to schema_explorer when hallucinated_tables is present and retry count allows, otherwise advances to query_builder.
Schema explorer: concurrent profiling and hallucination detection
agent/src/agent/nodes/schema_explorer.py
SchemaExplorerOutput gains tables_used field; profile fetching refactored to semaphore-limited asyncio.gather; post-LLM validation checks each table via Redis cache then Trino information_schema.tables, accumulating hallucinated_tables and returning feedback/error state or clean state; Esca persistence switched to get_esca_client() context manager.
Query builder LLM adoption
agent/src/agent/nodes/query_builder.py
Switches to get_llm("query_builder") factory; preserves chain invocation and interactive approval flow.

Refiner and Finalizer Node Updates

Layer / File(s) Summary
Refiner node: async query execution and error tracking
agent/src/agent/nodes/refiner.py
Uses get_llm("refiner"); new build_refiner_schema_context helper derives schema context from table profiles; refiner_node executes Trino queries via asyncio.to_thread, maintains error_history across refinements, injects schema context and error history into LLM prompts, and uses get_esca_client() context manager for persistence with esca_write_failed flag tracking.
Finalizer node: preview selection and context managers
agent/src/agent/nodes/finalizer.py
Uses get_llm("finalizer") and get_esca_client() context manager; get_esca_preview fetches head via load_head and builds preview JSON; finalizer_node selects preview from inline results when ESCA write failed or no reference exists, otherwise from ESCA.

Agent Entry Points and Extraction

Layer / File(s) Summary
MCP server and extractor formatting
agent/src/agent/mcp_server.py, agent/src/agent/nodes/extractor.py
MCP server resume/new-session handling preserves logic with reformatting; extractor imports and node wiring adjusted while enrichment behavior unchanged.

Backend Infrastructure: Validation and Health Checks

Layer / File(s) Summary
OpenAI API key startup validation and health endpoints
backend/app/main.py, backend/app/routers/health.py
lifespan checks OPENAI_API_KEY presence and raises RuntimeError on startup if missing; new /health/judge endpoint exposes key presence status; new /health/esca endpoint resolves ESCA config, performs ping() or load_head() connectivity check with 5s timeout, returns HTTP 503 on failure (except 404), always closes client.
Backend service cleanup and removals
backend/app/routers/orchestration.py, backend/app/routers/publish.py, backend/app/routers/evaluation.py, backend/app/services/langfuse_client.py, backend/app/services/evaluator.py, backend/app/services/profiling_engine.py, core/src/core/models/models.py
Stubbed orchestration helpers and local publish regression checker removed; deprecated Langfuse client compatibility methods deleted; Langfuse context usage replaced with langfuse_client conditioned on trace presence; profiling/trino service imports simplified; core models drop EvalResultStatus enum and EvaluationHistoryMetricRead DTO.

Deployment Infrastructure

Layer / File(s) Summary
HTTPS dependency sync and compose cleanup
agent/Dockerfile, backend/Dockerfile, docker-compose.yml, frontend/.env, .gitignore
Both Dockerfiles install git without openssh-client and run plain HTTPS uv sync; docker-compose.yml removes deploy_key secret references and activates backend startup command for app seeding/infra init; frontend API URL env entry retained; .gitignore adds deployment/scratch artifacts.
Infrastructure seeding updates
backend/app/infra_init.py
Resolves MinIO/Trino/OpenMetadata from environment variables instead of hardcoded hostnames; removes _TRINO_CATALOG constant; updates several Trino seed SQL literals for formatting consistency; Iceberg catalog connection string prefers DATABASE_URL.

Development Tooling

Layer / File(s) Summary
Ruff linting and import normalization
agent/pyproject.toml, backend/pyproject.toml, core/pyproject.toml, backend/app/routers/*, backend/app/services/*
Adds ruff>=0.3.0 to agent dev dependencies, enables TID lint rules with project-specific banned-API guidance; suppresses TID251 in llm.py and esca.py; adds local path overrides for python-core-utils; backend routers and services apply corresponding import-order restructuring (core imports before FastAPI/SQLModel).

Sequence Diagram(s)

sequenceDiagram
  participant schema_explorer_node
  participant Redis
  participant Trino as Trino<br/>(information_schema)
  participant LLM
  participant route_schema_explorer
  
  schema_explorer_node->>schema_explorer_node: asyncio.gather concurrent profile fetches<br/>(semaphore-limited by PROFILE_FETCH_CONCURRENCY)
  schema_explorer_node->>LLM: invoke with aggregated profiles
  LLM-->>schema_explorer_node: structured plan with tables_used
  loop for each table in tables_used
    schema_explorer_node->>Redis: GET table_exists:{table}
    alt cache miss
      schema_explorer_node->>Trino: query information_schema.tables (asyncio.to_thread)
      Trino-->>schema_explorer_node: row found or empty result
      schema_explorer_node->>Redis: SET table_exists key
    end
  end
  alt hallucinated_tables found
    schema_explorer_node-->>route_schema_explorer: return hallucinated_tables + feedback + last_error
    route_schema_explorer-->>schema_explorer_node: loop back (if retry count allows)
  else no hallucinations
    schema_explorer_node-->>route_schema_explorer: return schema_plan + hallucinated_tables=None
    route_schema_explorer-->>route_schema_explorer: proceed to query_builder
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • StavPonte11/text2sql-onboarding#10: Further modifies the agent's graph routing and node logic (agent/src/agent/graph.py, refiner, schema_explorer), and also migrates dependency sync approach in agent/Dockerfile.
  • StavPonte11/text2sql-onboarding#13: Modifies agent/src/agent/nodes/schema_explorer.py with overlapping changes to table-candidate logic and state/output fields.

Poem

🐇 Tables checked twice by Redis and Trino's keen eye,
Hallucinations caught in the loop as it rounds by,
SSH keys fading—HTTPS stands tall,
Langfuse observes each token's call,
Async refiner threads spin with care,
The rabbit ensures quality everywhere! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title effectively summarizes the primary objective of a comprehensive refactoring and optimization effort addressing five production-blocking bugs and Docker setup improvements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch stav/critic-bug-fixes

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agent/src/agent/nodes/refiner.py (1)

58-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clear last_error on successful refinement execution.

After a successful run, stale last_error may remain in state and be misinterpreted by downstream consumers/telemetry.

Suggested fix
-        return {"trino_error": None, "raw_data_ref": raw_ref}
+        return {"trino_error": None, "last_error": None, "raw_data_ref": raw_ref}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/refiner.py` at line 58, The return statement in the
successful refinement execution path clears the `trino_error` by setting it to
None, but does not clear the `last_error` field. This can cause stale errors
from previous failed attempts to persist in the state and mislead downstream
consumers. Add `last_error: None` to the return dictionary alongside
`trino_error: None` and `raw_data_ref: raw_ref` to ensure all error state is
properly reset after successful execution.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent/Dockerfile`:
- Line 4: The apt-get install commands in both Dockerfiles increase image size
by installing unnecessary recommended packages and retaining package metadata.
In agent/Dockerfile at line 4, modify the RUN command that installs git to
include the --no-install-recommends flag and append a cleanup command to remove
/var/lib/apt/lists/* in the same layer to ensure the metadata is not cached.
Apply the identical pattern to backend/Dockerfile at line 5 for the apt-get
install command there to maintain consistency and achieve the same image size
reduction benefits across both images.

In `@agent/src/agent/config.py`:
- Line 16: The PROFILE_FETCH_CONCURRENCY field allows zero or negative values
which will cause asyncio.Semaphore to block indefinitely. Add validation to the
PROFILE_FETCH_CONCURRENCY field definition to enforce that it must be a positive
integer (greater than 0), either through a Pydantic validator, field constraint,
or similar mechanism depending on the configuration framework being used.

In `@agent/src/agent/graph.py`:
- Around line 79-85: The route_schema_explorer function creates an unbounded
loop by repeatedly routing back to "schema_explorer" when hallucinated_tables
persist. Add a retry budget mechanism to prevent infinite looping: introduce a
counter field in the AgentState (such as schema_explorer_retry_count) to track
how many times the schema explorer has looped, increment this counter each time
route_schema_explorer routes back to itself, and modify the
route_schema_explorer function to check if the retry count has exceeded a
defined maximum threshold (e.g., 3 attempts). When the threshold is exceeded,
return "query_builder" regardless of whether hallucinated_tables exist, ensuring
the workflow eventually progresses and respects runtime/request budgets.

In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 372-380: When the hallucination check passes (in the else block),
you are only resetting hallucinated_tables to None but leaving feedback and
last_error with stale values from a previous iteration. This causes downstream
routing to misroute based on old feedback. In the else block where
hallucinated_tables is set to None, also explicitly set feedback and last_error
to None so that stale error state does not persist and affect downstream routing
decisions.
- Line 355: The SQL query construction on line 355 of schema_explorer.py
directly interpolates the `cat`, `sch`, and `tbl` variables into the SQL string
using f-string formatting, creating a SQL injection vulnerability since these
values come from untrusted model output in `tables_used`. Replace the direct
string interpolation with parameterized query placeholders and pass the catalog,
schema, and table name values as separate bound parameters to the database query
execution method, ensuring they are treated as data values rather than
executable SQL code.
- Around line 363-371: The table verification code currently fails open when
validation infrastructure fails—it logs exceptions and continues without marking
problematic tables as hallucinated. In the exception handler for information
schema checks (the inner try-except catching Exception as e), append t_name to
the hallucinated list instead of just printing the error. In the outer exception
handler for the Redis/Trino verification (catching Exception as e), ensure
tables are marked as hallucinated or otherwise fail in a closed manner so that
invalid tables do not pass through when verification infrastructure is degraded.
Replace print() calls with appropriate logging to maintain production-grade
error reporting.

In `@backend/app/main.py`:
- Around line 43-44: Remove the redundant import of settings at line 43 since
settings is already imported at line 12 at the top of the file. Additionally,
replace the getattr call with direct attribute access to settings.OPENAI_API_KEY
since OPENAI_API_KEY is now a defined field in the Settings class, simplifying
the conditional check from getattr(settings, "OPENAI_API_KEY", None) to simply
settings.OPENAI_API_KEY.
- Around line 44-46: There is a logical contradiction between the startup guard
in main.py that unconditionally raises RuntimeError when OPENAI_API_KEY is
missing and the fallback logic in llm_judge.py that gracefully returns 0.0
scores when the key is absent. Decide whether OPENAI_API_KEY is truly required
for the application to function: if it is required, remove the fallback logic in
llm_judge.py (the section that returns 0.0 scores when the key is missing); if
graceful degradation is intended, remove or conditionally apply the startup
guard in main.py so the server can start without the key and let the fallback
logic in llm_judge.py handle the absence gracefully.

In `@backend/app/routers/health.py`:
- Around line 182-188: The error branch in the get_judge_health() function is
unreachable dead code because the startup guard in main.py prevents the server
from starting if OPENAI_API_KEY is missing. Remove the conditional check for
api_key and the error branch (the if not api_key block), and simplify the
function to always return the success response. Also remove the unnecessary
getattr call and the import statement since OPENAI_API_KEY is guaranteed to
exist if the server is running.

In `@core/src/core/langfuse.py`:
- Around line 4-12: In core/src/core/langfuse.py at lines 4-12, wrap the
_langfuse_handler CallbackHandler initialization in a try-except block and set
_langfuse_handler to None or a no-op handler if initialization fails, then
modify the get_langfuse_handler function to return None or the no-op handler
when creation fails. In agent/src/agent/routers/chat.py at lines 50-55, add a
conditional check to only include langfuse_handler in the callbacks list when it
is available and not None; if it is None or unavailable, invoke the graph
without including it in the callbacks parameter.

In `@frontend/.env`:
- Line 1: The VITE_API_URL environment variable in the frontend configuration
has been incorrectly set to http://localhost:3000, which is the frontend's own
port, creating a circular proxy that routes API requests back to the frontend
instead of to the backend. Revert VITE_API_URL back to http://localhost:8000 to
properly target the backend service, or if the backend port has changed, update
it to the correct backend URL and ensure all related configuration files are
updated consistently.

---

Outside diff comments:
In `@agent/src/agent/nodes/refiner.py`:
- Line 58: The return statement in the successful refinement execution path
clears the `trino_error` by setting it to None, but does not clear the
`last_error` field. This can cause stale errors from previous failed attempts to
persist in the state and mislead downstream consumers. Add `last_error: None` to
the return dictionary alongside `trino_error: None` and `raw_data_ref: raw_ref`
to ensure all error state is properly reset after successful execution.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 84a6d3a8-12f2-45a3-9598-8ab51bf20d51

📥 Commits

Reviewing files that changed from the base of the PR and between 3a80177 and 0928097.

📒 Files selected for processing (16)
  • agent/Dockerfile
  • agent/src/agent/config.py
  • agent/src/agent/graph.py
  • agent/src/agent/nodes/refiner.py
  • agent/src/agent/nodes/schema_explorer.py
  • agent/src/agent/routers/chat.py
  • agent/src/agent/state.py
  • backend/Dockerfile
  • backend/app/config.py
  • backend/app/main.py
  • backend/app/routers/health.py
  • backend/app/services/llm_judge.py
  • core/src/core/config.py
  • core/src/core/langfuse.py
  • docker-compose.yml
  • frontend/.env

Comment thread agent/Dockerfile Outdated
Comment thread agent/src/agent/config.py Outdated
Comment thread agent/src/agent/graph.py Outdated
Comment thread agent/src/agent/nodes/schema_explorer.py Outdated
Comment thread agent/src/agent/nodes/schema_explorer.py
Comment thread backend/app/main.py Outdated
Comment thread backend/app/main.py Outdated
Comment thread backend/app/routers/health.py Outdated
Comment on lines +182 to +188
@router.get("/judge/health")
def get_judge_health():
from app.config import settings
api_key = getattr(settings, "OPENAI_API_KEY", None)
if not api_key:
return {"status": "error", "message": "OPENAI_API_KEY is missing"}
return {"status": "ok", "message": "LLM Judge is configured"}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Error branch is unreachable dead code.

The startup guard in main.py (lines 44-46) prevents the server from starting if OPENAI_API_KEY is missing, which means this endpoint will never execute the error branch at lines 186-187. The server won't be running to serve this response. Additionally, the getattr call is unnecessary since OPENAI_API_KEY is now a defined field in Settings.

♻️ Proposed simplification (remove unreachable branch)
 `@router.get`("/judge/health")
 def get_judge_health():
-    from app.config import settings
-    api_key = getattr(settings, "OPENAI_API_KEY", None)
-    if not api_key:
-        return {"status": "error", "message": "OPENAI_API_KEY is missing"}
     return {"status": "ok", "message": "LLM Judge is configured"}

Alternatively, if you want to keep the health check for consistency with other services, you can simplify it to always return OK since the startup guard ensures the key is present:

 `@router.get`("/judge/health")
 def get_judge_health():
-    from app.config import settings
-    api_key = getattr(settings, "OPENAI_API_KEY", None)
-    if not api_key:
-        return {"status": "error", "message": "OPENAI_API_KEY is missing"}
-    return {"status": "ok", "message": "LLM Judge is configured"}
+    return {"status": "ok", "message": "LLM Judge is configured"}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@router.get("/judge/health")
def get_judge_health():
from app.config import settings
api_key = getattr(settings, "OPENAI_API_KEY", None)
if not api_key:
return {"status": "error", "message": "OPENAI_API_KEY is missing"}
return {"status": "ok", "message": "LLM Judge is configured"}
`@router.get`("/judge/health")
def get_judge_health():
return {"status": "ok", "message": "LLM Judge is configured"}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/health.py` around lines 182 - 188, The error branch in
the get_judge_health() function is unreachable dead code because the startup
guard in main.py prevents the server from starting if OPENAI_API_KEY is missing.
Remove the conditional check for api_key and the error branch (the if not
api_key block), and simplify the function to always return the success response.
Also remove the unnecessary getattr call and the import statement since
OPENAI_API_KEY is guaranteed to exist if the server is running.

Comment thread core/src/core/langfuse.py Outdated
Comment thread frontend/.env Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/core/trino.py (1)

51-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Signature reordering breaks existing positional callers.

Changing execute_query_sync to (sql, params, table_id) breaks existing positional call sites (for example, execute_query_sync(generated_sql, table_id)), which now pass table_id as query params. That can mis-execute queries or fail unexpectedly.

Backward-compatible fix
-def execute_query_sync(sql: str, params: tuple | dict | list | None = None, table_id: str = "") -> TrinoExecutionResult:
+def execute_query_sync(sql: str, table_id: str = "", params: tuple | dict | list | None = None) -> TrinoExecutionResult:
@@
-        if params:
+        if params is not None:
             cur.execute(sql, params)
         else:
             cur.execute(sql)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/trino.py` around lines 51 - 75, The function execute_query_sync
currently has parameters ordered as (sql, params, table_id), but existing call
sites pass arguments positionally as execute_query_sync(generated_sql,
table_id), causing table_id to be interpreted as params instead. Reorder the
function signature parameters to place table_id before params, changing the
signature to (sql, table_id, params) so that existing positional callers
continue to work correctly without requiring code changes at those call sites.
♻️ Duplicate comments (1)
agent/src/agent/nodes/schema_explorer.py (1)

358-360: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Catalog identifier is still interpolated from untrusted LLM output.

The query at Line 358 still injects cat directly into SQL. Since tables_used comes from model output, this remains an injection path in the validation step. Validate/whitelist identifier tokens (or quote validated identifiers) before composing the catalog-qualified table path.

Suggested hardening
+import re
+IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

                     if len(parts) == 3:
                         cat, sch, tbl = parts
-                        sql = f"SELECT 1 FROM {cat}.information_schema.tables WHERE table_schema = ? AND table_name = ?"
+                        if not IDENT_RE.fullmatch(cat):
+                            hallucinated.append(t_name)
+                            continue
+                        sql = (
+                            f'SELECT 1 FROM "{cat}".information_schema.tables '
+                            "WHERE table_schema = ? AND table_name = ?"
+                        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/schema_explorer.py` around lines 358 - 360, The SQL
query construction at line 358 in the schema_explorer.py file contains a SQL
injection vulnerability where the catalog identifier `cat` is directly
interpolated into the f-string without any validation or quoting. Since `cat`
originates from untrusted LLM output (via `tables_used`), this creates an
injection path in the validation step. Validate or whitelist the catalog
identifier token before composing the catalog-qualified table path, or
alternatively quote the validated identifier using appropriate SQL quoting
mechanisms (such as backticks or bracket notation depending on the database
system) to ensure special characters cannot break out of the identifier context.
Apply this fix before the SQL query is passed to the `execute_query_sync`
function call.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent/pyproject.toml`:
- Around line 31-35: The `uv sync` command in the Dockerfile is installing dev
dependencies into the production image because the `--no-dev` flag is not
specified. Modify the `uv sync` command to include the `--no-dev` flag to
exclude dev dependencies (pytest and ruff from the [dependency-groups].dev
section) from being installed in the production agent image, reducing image size
and attack surface.

In `@agent/src/agent/nodes/refiner.py`:
- Around line 81-90: The exception handling in the refiner function only
protects the client.save_data() call, but not the get_esca_client() acquisition
itself. If the async context manager fails to acquire the ESCA client, the
exception will propagate before reaching the try block, skipping the fallback
state (esca_write_failed flag and logging). Move the try-except block to wrap
the entire async with statement starting from get_esca_client(), so that both
client acquisition failures and save_data failures are caught and handled
consistently with the same fallback behavior.

In `@agent/src/agent/utils/sql.py`:
- Around line 11-14: The SQL query fence-stripping logic removes opening fences
before normalizing whitespace, which causes leading whitespace (like newlines)
to prevent proper fence removal and result in invalid SQL. To fix this in the
query cleaning section: first strip the input query to normalize whitespace,
then update the opening-fence regex pattern to match both `\`\`\`sql` and plain
`\`\`\`` variants in a single combined pattern (use an optional group for the
sql keyword), and update the closing-fence pattern to strip any whitespace
immediately before the closing `\`\`\`` as well.

In `@backend/app/routers/health.py`:
- Around line 194-200: The try-except block in the health check endpoint only
catches ImportError when importing agent.config, but importing can fail for
other reasons (validation errors, etc.) which would bypass the fallback and
break the endpoint. Additionally, defaulting to dummy values like "dummy" and
"http://localhost:8000" masks misconfiguration and returns misleading health
status. Broaden the exception handling to catch general exceptions (Exception or
a more specific base class) beyond just ImportError, and when ESCA settings are
absent or the import fails for any reason, return a config-error health response
indicating that ESCA is not properly configured, rather than injecting dummy
defaults and continuing as if the service is healthy.

In `@core/src/core/langfuse.py`:
- Around line 10-11: The exception handler that catches errors during Langfuse
handler initialization (in the except block at lines 10-11) silently sets
_langfuse_handler to None without logging the error details. Add logging within
this exception handler to capture and record the actual error before falling
back to _langfuse_handler = None. This will ensure initialization failures are
visible in production logs while maintaining the graceful fallback behavior.

---

Outside diff comments:
In `@core/src/core/trino.py`:
- Around line 51-75: The function execute_query_sync currently has parameters
ordered as (sql, params, table_id), but existing call sites pass arguments
positionally as execute_query_sync(generated_sql, table_id), causing table_id to
be interpreted as params instead. Reorder the function signature parameters to
place table_id before params, changing the signature to (sql, table_id, params)
so that existing positional callers continue to work correctly without requiring
code changes at those call sites.

---

Duplicate comments:
In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 358-360: The SQL query construction at line 358 in the
schema_explorer.py file contains a SQL injection vulnerability where the catalog
identifier `cat` is directly interpolated into the f-string without any
validation or quoting. Since `cat` originates from untrusted LLM output (via
`tables_used`), this creates an injection path in the validation step. Validate
or whitelist the catalog identifier token before composing the catalog-qualified
table path, or alternatively quote the validated identifier using appropriate
SQL quoting mechanisms (such as backticks or bracket notation depending on the
database system) to ensure special characters cannot break out of the identifier
context. Apply this fix before the SQL query is passed to the
`execute_query_sync` function call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 76d9bfe0-cb19-48b7-9f46-1f77824453bc

📥 Commits

Reviewing files that changed from the base of the PR and between 0928097 and 341808d.

📒 Files selected for processing (19)
  • agent/Dockerfile
  • agent/pyproject.toml
  • agent/src/agent/config.py
  • agent/src/agent/graph.py
  • agent/src/agent/llm.py
  • agent/src/agent/nodes/finalizer.py
  • agent/src/agent/nodes/refiner.py
  • agent/src/agent/nodes/schema_explorer.py
  • agent/src/agent/routers/chat.py
  • agent/src/agent/state.py
  • agent/src/agent/utils/esca.py
  • agent/src/agent/utils/sql.py
  • backend/Dockerfile
  • backend/app/main.py
  • backend/app/routers/health.py
  • backend/app/services/llm_judge.py
  • core/src/core/langfuse.py
  • core/src/core/trino.py
  • frontend/.env

Comment thread agent/pyproject.toml
Comment thread agent/src/agent/nodes/refiner.py Outdated
Comment thread agent/src/agent/utils/sql.py Outdated
Comment thread backend/app/routers/health.py Outdated
Comment thread core/src/core/langfuse.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
agent/src/agent/nodes/extractor.py (1)

105-130: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Move concurrent.futures import to module level.

The import concurrent.futures statement at line 110 is inside the extractor_node function body. Python convention places imports at the module level unless there's a specific reason (e.g., avoiding circular dependencies or lazy loading). Moving it to the top improves code organization and consistency.

♻️ Proposed fix

Add the import at the module level (after line 3):

 import abc
 import datetime
+import concurrent.futures
 import requests
 from typing import List

Remove the local import from the function:

 def extractor_node(state: AgentState):
     """Enrich the user query with additional context to help downstream phases."""
     user_query = state["user_query"]
     active_extractors = state.get("active_extractors") or []
-
-    import concurrent.futures

     extractors: List[BaseExtractor] = [TimeExtractor(), LLMExtractor()]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/extractor.py` around lines 105 - 130, The import
statement for concurrent.futures is currently located inside the extractor_node
function body, which violates Python convention. Move the import
concurrent.futures statement to the module level at the top of the file with
other imports, and remove it from inside the extractor_node function. This
improves code organization and consistency by following standard Python import
conventions.
backend/app/routers/publish.py (1)

50-60: 🧹 Nitpick | 🔵 Trivial

Golden question threshold discrepancy across endpoints is intentional.

The endpoint requires at least 3 golden questions (line 54), while evaluation.py:trigger_evaluation_run and orchestration.py require only 1 question. This stricter threshold for production publish is by design—higher quality gates for publishing vs. lighter validation for evaluation runs. Consider adding an inline comment clarifying this policy difference if not self-evident to future maintainers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/publish.py` around lines 50 - 60, Add an inline comment
in the golden questions validation check where len(questions) < 3 is evaluated
to clarify that this stricter threshold (3 questions required) is intentionally
higher than the requirement in other endpoints like evaluation.py and
orchestration.py (which require only 1 question) because publish gates require
higher quality standards for production. This will help future maintainers
understand the policy difference and why the threshold differs across endpoints.
♻️ Duplicate comments (1)
agent/src/agent/nodes/schema_explorer.py (1)

405-411: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Validate all three identifier parts, not just the catalog.

The IDENT_RE validation only applies to cat (catalog), but sch and tbl are passed directly to the parameterized query. While parameterized queries protect against injection in the WHERE clause values, the catalog name is still interpolated into the SQL string via f-string. The current validation is correct for the catalog, but for defense-in-depth, consider validating all three parts before constructing the query.

🛡️ Suggested defense-in-depth validation
                    if len(parts) == 3:
                        cat, sch, tbl = parts
-                       if not IDENT_RE.fullmatch(cat):
+                       if not all(IDENT_RE.fullmatch(p) for p in (cat, sch, tbl)):
                            hallucinated.append(t_name)
                            continue
                        sql = f'SELECT 1 FROM "{cat}".information_schema.tables WHERE table_schema = ? AND table_name = ?'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/schema_explorer.py` around lines 405 - 411, The
identifier validation using IDENT_RE is only applied to the catalog parameter
(cat) but not to the schema (sch) and table (tbl) parameters. For
defense-in-depth security, validate all three identifier components before
constructing the SQL query. Add IDENT_RE.fullmatch validation checks for both
sch and tbl similar to the existing validation for cat, ensuring that if any
identifier fails validation, the entry is marked as hallucinated and the loop
continues before the SQL query is constructed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent/src/agent/nodes/finalizer.py`:
- Around line 68-87: The code attempts to derive column information from the
first row of data in the inline fallback path, but Trino rows are lists of lists
(not dicts), so the isinstance check fails and produces empty columns. The
refiner must capture and pass the result.columns metadata alongside result.rows
when storing inline_result_rows, and the finalizer must use the passed
inline_result_columns directly instead of attempting to derive columns from the
data structure. Update the refiner to store inline_result_columns alongside
inline_result_rows in the success path, then update the finalizer to use
inline_result_columns directly when constructing the preview_info dict.

In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 53-66: The function `get_query_embedding` uses
`urllib.request.Request` and `urllib.request.urlopen` but the `urllib.request`
module is not imported at the top of the file. Add `import urllib.request` to
the import section at the beginning of the file to resolve the `NameError` that
will occur at runtime when this function is called.

In `@backend/app/routers/health.py`:
- Around line 211-234: The EscaClient instantiation is currently outside the try
block, which means if the constructor fails with an invalid URL or
initialization error, the exception will bypass error handling and the finally
block cleanup will not execute. Move the line that instantiates EscaClient
(client = EscaClient(api_key=api_key, base_url=base_url)) inside the try block
before the _ping_esca() function definition, so that any client initialization
errors are properly caught by the existing exception handlers and the finally
block guarantees cleanup via await client.close().

In `@core/src/core/langfuse.py`:
- Around line 13-14: In the exception handler for the Langfuse CallbackHandler
initialization in the except block, replace the logger.error() call with
logger.exception() to automatically capture the full stack trace. This provides
more complete debugging information than manually formatting the error message
with logger.error(). You can simplify the message itself since the traceback
will be included automatically.

---

Outside diff comments:
In `@agent/src/agent/nodes/extractor.py`:
- Around line 105-130: The import statement for concurrent.futures is currently
located inside the extractor_node function body, which violates Python
convention. Move the import concurrent.futures statement to the module level at
the top of the file with other imports, and remove it from inside the
extractor_node function. This improves code organization and consistency by
following standard Python import conventions.

In `@backend/app/routers/publish.py`:
- Around line 50-60: Add an inline comment in the golden questions validation
check where len(questions) < 3 is evaluated to clarify that this stricter
threshold (3 questions required) is intentionally higher than the requirement in
other endpoints like evaluation.py and orchestration.py (which require only 1
question) because publish gates require higher quality standards for production.
This will help future maintainers understand the policy difference and why the
threshold differs across endpoints.

---

Duplicate comments:
In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 405-411: The identifier validation using IDENT_RE is only applied
to the catalog parameter (cat) but not to the schema (sch) and table (tbl)
parameters. For defense-in-depth security, validate all three identifier
components before constructing the SQL query. Add IDENT_RE.fullmatch validation
checks for both sch and tbl similar to the existing validation for cat, ensuring
that if any identifier fails validation, the entry is marked as hallucinated and
the loop continues before the SQL query is constructed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: e597dae2-ab0a-4208-8a87-0d948fb32d24

📥 Commits

Reviewing files that changed from the base of the PR and between 341808d and 63ca448.

⛔ Files ignored due to path filters (1)
  • agent/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (37)
  • agent/Dockerfile
  • agent/src/agent/config.py
  • agent/src/agent/graph.py
  • agent/src/agent/llm.py
  • agent/src/agent/mcp_server.py
  • agent/src/agent/nodes/extractor.py
  • agent/src/agent/nodes/finalizer.py
  • agent/src/agent/nodes/query_builder.py
  • agent/src/agent/nodes/refiner.py
  • agent/src/agent/nodes/schema_explorer.py
  • agent/src/agent/routers/chat.py
  • agent/src/agent/state.py
  • agent/src/agent/utils/esca.py
  • agent/src/agent/utils/sql.py
  • backend/app/infra_init.py
  • backend/app/routers/admin_approval.py
  • backend/app/routers/admin_auth.py
  • backend/app/routers/audit.py
  • backend/app/routers/enrichment.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/feedback.py
  • backend/app/routers/health.py
  • backend/app/routers/orchestration.py
  • backend/app/routers/publish.py
  • backend/app/routers/questions.py
  • backend/app/routers/scopes.py
  • backend/app/services/auth.py
  • backend/app/services/evaluator.py
  • backend/app/services/join_detection.py
  • backend/app/services/langfuse_client.py
  • backend/app/services/llm_judge.py
  • backend/app/services/profiling_engine.py
  • backend/app/services/scoring.py
  • backend/app/services/trino_client.py
  • core/src/core/langfuse.py
  • core/src/core/models/models.py
  • core/src/core/trino.py
💤 Files with no reviewable changes (4)
  • backend/app/services/trino_client.py
  • backend/app/services/scoring.py
  • backend/app/routers/orchestration.py
  • backend/app/services/langfuse_client.py

Comment on lines +68 to 87
if esca_write_failed or not raw_data_ref:
if inline_result_rows is not None:
limit = 5
preview_rows = inline_result_rows[:limit]
columns = (
list(preview_rows[0].keys())
if preview_rows and isinstance(preview_rows[0], dict)
else []
)
preview_info = {
"columns": columns,
"preview_rows": preview_rows,
"preview_count": len(preview_rows),
"total_rows": len(inline_result_rows),
}
preview_str = json.dumps(preview_info, indent=2)
else:
preview_str = "No data reference found."
else:
preview_str = await get_esca_preview(raw_data_ref)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Inline fallback produces empty columns when ESCA write fails.

When esca_write_failed is true, the code attempts to derive columns from inline_result_rows by checking if the first row is a dict (line 74). However, result.rows from Trino are lists of lists (not dicts), so isinstance(preview_rows[0], dict) will be False and columns will be [].

The refiner stores inline_result_rows = result.rows but doesn't store result.columns. For the fallback preview to be useful, the refiner should also pass inline_result_columns or the finalizer needs another way to access column metadata.

🔧 Suggested fix in refiner.py

Store columns alongside rows in the success path:

# In refiner.py success path (around line 108)
 return {
     "trino_error": None,
     "last_error": None,
     "raw_data_ref": raw_ref,
     "esca_write_failed": esca_write_failed,
     "inline_result_rows": inline_result_rows,
+    "inline_result_columns": result.columns,
     "error_history": error_history,
 }

Then in finalizer.py:

     if esca_write_failed or not raw_data_ref:
         if inline_result_rows is not None:
             limit = 5
             preview_rows = inline_result_rows[:limit]
-            columns = (
-                list(preview_rows[0].keys())
-                if preview_rows and isinstance(preview_rows[0], dict)
-                else []
-            )
+            columns = state.get("inline_result_columns", [])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if esca_write_failed or not raw_data_ref:
if inline_result_rows is not None:
limit = 5
preview_rows = inline_result_rows[:limit]
columns = (
list(preview_rows[0].keys())
if preview_rows and isinstance(preview_rows[0], dict)
else []
)
preview_info = {
"columns": columns,
"preview_rows": preview_rows,
"preview_count": len(preview_rows),
"total_rows": len(inline_result_rows),
}
preview_str = json.dumps(preview_info, indent=2)
else:
preview_str = "No data reference found."
else:
preview_str = await get_esca_preview(raw_data_ref)
if esca_write_failed or not raw_data_ref:
if inline_result_rows is not None:
limit = 5
preview_rows = inline_result_rows[:limit]
columns = state.get("inline_result_columns", [])
preview_info = {
"columns": columns,
"preview_rows": preview_rows,
"preview_count": len(preview_rows),
"total_rows": len(inline_result_rows),
}
preview_str = json.dumps(preview_info, indent=2)
else:
preview_str = "No data reference found."
else:
preview_str = await get_esca_preview(raw_data_ref)
🧰 Tools
🪛 ast-grep (0.43.0)

[info] 82-82: use jsonify instead of json.dumps for JSON output
Context: json.dumps(preview_info, indent=2)
Note: Security best practice.

(use-jsonify)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/finalizer.py` around lines 68 - 87, The code attempts
to derive column information from the first row of data in the inline fallback
path, but Trino rows are lists of lists (not dicts), so the isinstance check
fails and produces empty columns. The refiner must capture and pass the
result.columns metadata alongside result.rows when storing inline_result_rows,
and the finalizer must use the passed inline_result_columns directly instead of
attempting to derive columns from the data structure. Update the refiner to
store inline_result_columns alongside inline_result_rows in the success path,
then update the finalizer to use inline_result_columns directly when
constructing the preview_info dict.

Comment thread agent/src/agent/nodes/schema_explorer.py
Comment on lines +211 to +234
from esca_sdk import EscaClient

client = EscaClient(api_key=api_key, base_url=base_url)

async def _ping_esca():
if hasattr(client, "ping"):
return await client.ping()
else:
return await client.load_head("health_check_dummy_id")

try:
await asyncio.wait_for(_ping_esca(), timeout=5.0)
return {"status": "ok", "message": "Esca is reachable"}
except TimeoutError:
raise HTTPException(
status_code=503, detail="Esca connection timed out after 5.0s"
)
except Exception as e:
err_str = str(e).lower()
if "404" in err_str or "not found" in err_str:
return {"status": "ok", "message": "Esca is reachable"}
raise HTTPException(status_code=503, detail=f"Esca health check failed: {e}")
finally:
await client.close()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

EscaClient instantiation outside try block leads to unhandled errors.

If EscaClient(...) constructor fails (invalid URL, initialization error), the exception bypasses the health check error handling and propagates as an unhandled 500 error. The finally block won't execute since the try hasn't been entered yet.

Move client instantiation inside the try block to ensure proper error handling and cleanup:

🐛 Proposed fix
     from esca_sdk import EscaClient

-    client = EscaClient(api_key=api_key, base_url=base_url)
+    client = None

     async def _ping_esca():
         if hasattr(client, "ping"):
             return await client.ping()
         else:
             return await client.load_head("health_check_dummy_id")

     try:
+        client = EscaClient(api_key=api_key, base_url=base_url)
         await asyncio.wait_for(_ping_esca(), timeout=5.0)
         return {"status": "ok", "message": "Esca is reachable"}
     except TimeoutError:
         raise HTTPException(
             status_code=503, detail="Esca connection timed out after 5.0s"
         )
     except Exception as e:
         err_str = str(e).lower()
         if "404" in err_str or "not found" in err_str:
             return {"status": "ok", "message": "Esca is reachable"}
         raise HTTPException(status_code=503, detail=f"Esca health check failed: {e}")
     finally:
-        await client.close()
+        if client is not None:
+            await client.close()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/health.py` around lines 211 - 234, The EscaClient
instantiation is currently outside the try block, which means if the constructor
fails with an invalid URL or initialization error, the exception will bypass
error handling and the finally block cleanup will not execute. Move the line
that instantiates EscaClient (client = EscaClient(api_key=api_key,
base_url=base_url)) inside the try block before the _ping_esca() function
definition, so that any client initialization errors are properly caught by the
existing exception handlers and the finally block guarantees cleanup via await
client.close().

Comment thread core/src/core/langfuse.py
Comment on lines +13 to +14
except Exception as e:
logger.error(f"Failed to initialize Langfuse CallbackHandler: {e}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use logger.exception() to automatically capture the traceback.

When logging within an exception handler, logger.exception() is preferred over logger.error() because it automatically includes the full stack trace, which significantly aids debugging initialization failures.

🔧 Recommended improvement
 except Exception as e:
-    logger.error(f"Failed to initialize Langfuse CallbackHandler: {e}")
+    logger.exception("Failed to initialize Langfuse CallbackHandler")
     _langfuse_handler = None
🧰 Tools
🪛 Ruff (0.15.17)

[warning] 13-13: Do not catch blind exception: Exception

(BLE001)


[warning] 14-14: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


[warning] 14-14: Logging statement uses f-string

(G004)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/langfuse.py` around lines 13 - 14, In the exception handler for
the Langfuse CallbackHandler initialization in the except block, replace the
logger.error() call with logger.exception() to automatically capture the full
stack trace. This provides more complete debugging information than manually
formatting the error message with logger.error(). You can simplify the message
itself since the traceback will be included automatically.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
agent/src/agent/nodes/schema_explorer.py (1)

230-240: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Duplicate import of langfuse_client.

langfuse_client is already imported at line 22 (from agent.langfuse_client import langfuse_client). This redundant import inside the exception handler is unnecessary.

♻️ Suggested removal
            except Exception as e:
                esca_id = None
-               from agent.langfuse_client import langfuse_client
-
                if langfuse_client and langfuse_client.get_current_observation_id():
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/schema_explorer.py` around lines 230 - 240, Remove the
redundant import statement `from agent.langfuse_client import langfuse_client`
from inside the exception handler block in the schema_explorer.py file. The
langfuse_client is already imported at the module level (line 22), so it is
already available for use throughout the file. Simply delete the import line and
use the already-imported langfuse_client variable directly in the exception
handling logic.
agent/src/agent/nodes/refiner.py (1)

101-106: ⚠️ Potential issue | 🟡 Minor

Add check for get_current_observation_id() result before using in span() call.

The condition at line 101 only checks langfuse_client.get_current_trace_id(), but get_current_observation_id() may return None even when a trace exists. The same error handling pattern in schema_explorer.py:232 correctly checks both conditions: if langfuse_client and langfuse_client.get_current_observation_id():. Apply the same defensive check here to avoid passing id=None to the span() call.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/refiner.py` around lines 101 - 106, The condition
checking whether to call langfuse_client.span() only verifies that
get_current_trace_id() returns a truthy value, but does not check whether
get_current_observation_id() returns None. Update the if condition to check both
langfuse_client.get_current_trace_id() AND
langfuse_client.get_current_observation_id() before proceeding to the span()
call, following the same defensive pattern used elsewhere in schema_explorer.py.
This prevents passing id=None to the span() method.
backend/app/services/langfuse_client.py (1)

100-103: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale documentation references removed method.

The docstring on lines 100-103 still references get_prompt_as_langchain(name) as an additional method, but this method was removed from the class. Update the docstring to remove this reference.

📝 Proposed fix
     Implements Connection: connect(), is_alive(), logout().

     Additional methods (present in the main app's LangfuseTracer):
         get_prompt(name)                  — fetch a prompt by name.
-        get_prompt_as_langchain(name)     — fetch and convert to a LangChain prompt.
     """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/langfuse_client.py` around lines 100 - 103, The
docstring for the LangfuseTracer class contains a stale reference to the
`get_prompt_as_langchain(name)` method in the "Additional methods" section, but
this method has been removed from the class. Remove the line referencing
`get_prompt_as_langchain(name)` from the docstring, keeping only the
documentation for the `get_prompt(name)` method that still exists in the class.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.gitignore:
- Line 42: The .gitignore file at line 42 contains an entry that ignores the
python-core-utils directory, but this directory is required by the build/runtime
(Docker COPY command and uv source mappings). Remove the python-core-utils/ line
from .gitignore to ensure the directory is tracked by git and available for
builds.

In `@agent/src/agent/nodes/refiner.py`:
- Around line 8-9: Remove the duplicate import statement of langfuse_client from
agent.langfuse_client in the refiner.py file. The import statement appears twice
consecutively on lines 8 and 9 - delete one of these duplicate import lines to
keep only a single import of langfuse_client from agent.langfuse_client.

In `@backend/app/infra_init.py`:
- Line 552: The fallback database URL in the `os.getenv()` call for
`DATABASE_URL` uses `localhost` as the hostname, which resolves to the container
itself rather than the Postgres service when running inside the backend
container. Replace `localhost` in the hardcoded default connection string with
the appropriate service hostname (typically `postgres` or `db`) that is
resolvable from within the container network, ensuring the connection can reach
the actual Postgres service.
- Around line 33-35: The new environment-driven Trino configuration variables
(_TRINO_HOST, _TRINO_PORT, _TRINO_USER) defined at the module level are not
being used in the _ensure_om_service function, which still hardcodes the Trino
connection details as "trino:8080" with user "trino". Replace the hardcoded
values in _ensure_om_service with the corresponding environment-driven
configuration variables (_TRINO_HOST, _TRINO_PORT, _TRINO_USER) to ensure
OpenMetadata service registration respects the new environment configuration
contract.
- Line 27: The `os` import on line 27 is not at the top of the file, which
violates Ruff's E402 rule and can cause CI failures. Move the `os` import to the
top of the file where other module imports are located, ensuring all import
statements are grouped together at the beginning of the file before any other
code.

In `@backend/app/routers/evaluation.py`:
- Around line 113-117: The Langfuse trace write operations at lines 113–117 and
185–188 in the evaluation.py file are not protected from failures, which means
any transient Langfuse observability issue will abort the entire evaluation
execution and prevent returning successful results. Wrap both the
langfuse_client.trace() calls in try-except blocks (or similar error handling)
to isolate them from the critical evaluation path, ensuring that Langfuse
failures are caught and logged but do not interrupt evaluation completion,
database commits, or function returns. This is especially critical at the second
site (lines 185–188) where the database has already been committed and scores
computed.

In `@backend/pyproject.toml`:
- Line 13: The langfuse dependency constraint at line 13 in
backend/pyproject.toml is specified as "langfuse>=2.0.0" which lacks an upper
bound, allowing any future major version to be installed and potentially
introducing breaking changes. Modify the version constraint to include an upper
bound by changing it to "langfuse>=2.0.0,<3.0.0" (or similar bounded range) to
ensure reproducible behavior and prevent silent breakage from major version
updates while still allowing compatible patch and minor version updates.

---

Outside diff comments:
In `@agent/src/agent/nodes/refiner.py`:
- Around line 101-106: The condition checking whether to call
langfuse_client.span() only verifies that get_current_trace_id() returns a
truthy value, but does not check whether get_current_observation_id() returns
None. Update the if condition to check both
langfuse_client.get_current_trace_id() AND
langfuse_client.get_current_observation_id() before proceeding to the span()
call, following the same defensive pattern used elsewhere in schema_explorer.py.
This prevents passing id=None to the span() method.

In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 230-240: Remove the redundant import statement `from
agent.langfuse_client import langfuse_client` from inside the exception handler
block in the schema_explorer.py file. The langfuse_client is already imported at
the module level (line 22), so it is already available for use throughout the
file. Simply delete the import line and use the already-imported langfuse_client
variable directly in the exception handling logic.

In `@backend/app/services/langfuse_client.py`:
- Around line 100-103: The docstring for the LangfuseTracer class contains a
stale reference to the `get_prompt_as_langchain(name)` method in the "Additional
methods" section, but this method has been removed from the class. Remove the
line referencing `get_prompt_as_langchain(name)` from the docstring, keeping
only the documentation for the `get_prompt(name)` method that still exists in
the class.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 8eae59ad-1ecc-4f29-8ba1-a19da085bd3d

📥 Commits

Reviewing files that changed from the base of the PR and between 63ca448 and 6a45199.

⛔ Files ignored due to path filters (2)
  • agent/uv.lock is excluded by !**/*.lock
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • .gitignore
  • agent/Dockerfile
  • agent/pyproject.toml
  • agent/src/agent/nodes/extractor.py
  • agent/src/agent/nodes/refiner.py
  • agent/src/agent/nodes/schema_explorer.py
  • backend/Dockerfile
  • backend/app/infra_init.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/orchestration.py
  • backend/app/services/evaluator.py
  • backend/app/services/langfuse_client.py
  • backend/app/services/llm_judge.py
  • backend/pyproject.toml
  • core/pyproject.toml

Comment thread .gitignore
agent/scratch/*
fix_agent_langfuse.py
fix_langfuse.py
python-core-utils/

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not ignore a directory that is required by build/runtime dependency resolution.

Line 42 ignores python-core-utils/, but this PR also requires it in Docker build context (COPY python-core-utils ...) and uv source mappings (../python-core-utils). This creates a high risk of out-of-the-box build failures.

Suggested patch
-python-core-utils/
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
python-core-utils/
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.gitignore at line 42, The .gitignore file at line 42 contains an entry that
ignores the python-core-utils directory, but this directory is required by the
build/runtime (Docker COPY command and uv source mappings). Remove the
python-core-utils/ line from .gitignore to ensure the directory is tracked by
git and available for builds.

Comment on lines 8 to +9
from agent.langfuse_client import langfuse_client
from agent.langfuse_client import langfuse_client

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove duplicate import.

The same import appears twice on consecutive lines:

from agent.langfuse_client import langfuse_client
from agent.langfuse_client import langfuse_client
🐛 Proposed fix
 from agent.config import settings
 from agent.langfuse_client import langfuse_client
-from agent.langfuse_client import langfuse_client
 from langchain_core.prompts import ChatPromptTemplate
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/refiner.py` around lines 8 - 9, Remove the duplicate
import statement of langfuse_client from agent.langfuse_client in the refiner.py
file. The import statement appears twice consecutively on lines 8 and 9 - delete
one of these duplicate import lines to keep only a single import of
langfuse_client from agent.langfuse_client.

Comment thread backend/app/infra_init.py
_MINIO_HOST = "minio:9000"
_MINIO_ACCESS_KEY = "admin"
_MINIO_SECRET_KEY = "password123"
import os

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Move os import to the top import block.

Line 27 triggers Ruff E402 and can fail lint-gated CI.

Suggested patch
 import base64
 import logging
+import os
 import time
 from typing import Any
@@
-import os
 _MINIO_HOST = os.getenv("MINIO_HOST", "localhost:9000")
🧰 Tools
🪛 Ruff (0.15.17)

[error] 27-27: Module level import not at top of file

(E402)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/infra_init.py` at line 27, The `os` import on line 27 is not at
the top of the file, which violates Ruff's E402 rule and can cause CI failures.
Move the `os` import to the top of the file where other module imports are
located, ensuring all import statements are grouped together at the beginning of
the file before any other code.

Source: Linters/SAST tools

Comment thread backend/app/infra_init.py
Comment on lines +33 to +35
_TRINO_HOST = os.getenv("TRINO_HOST", "localhost")
_TRINO_PORT = int(os.getenv("TRINO_PORT", "8080"))
_TRINO_USER = os.getenv("TRINO_USER", "trino")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate env-driven Trino settings into OpenMetadata service registration.

Lines 33–35 introduce _TRINO_HOST/_TRINO_PORT/_TRINO_USER, but _ensure_om_service still hardcodes trino:8080 and trino (Lines 797–799). This breaks the new config contract in non-default environments.

Suggested patch
             "connection": {
                 "config": {
                     "type": "Trino",
-                    "hostPort": "trino:8080",
-                    "username": "trino",
+                    "hostPort": f"{_TRINO_HOST}:{_TRINO_PORT}",
+                    "username": _TRINO_USER,
                     "catalog": "minio",
                 }
             },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_TRINO_HOST = os.getenv("TRINO_HOST", "localhost")
_TRINO_PORT = int(os.getenv("TRINO_PORT", "8080"))
_TRINO_USER = os.getenv("TRINO_USER", "trino")
"connection": {
"config": {
"type": "Trino",
"hostPort": f"{_TRINO_HOST}:{_TRINO_PORT}",
"username": _TRINO_USER,
"catalog": "minio",
}
},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/infra_init.py` around lines 33 - 35, The new environment-driven
Trino configuration variables (_TRINO_HOST, _TRINO_PORT, _TRINO_USER) defined at
the module level are not being used in the _ensure_om_service function, which
still hardcodes the Trino connection details as "trino:8080" with user "trino".
Replace the hardcoded values in _ensure_om_service with the corresponding
environment-driven configuration variables (_TRINO_HOST, _TRINO_PORT,
_TRINO_USER) to ensure OpenMetadata service registration respects the new
environment configuration contract.

Comment thread backend/app/infra_init.py

# We use psycopg2 directly since it's already installed via pyproject.toml
conn_str = "postgresql://postgres:postgres@db:5432/text2sql"
conn_str = os.getenv("DATABASE_URL", "postgresql://postgres:postgres@localhost:5432/text2sql")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the container fallback for DATABASE_URL.

Line 552 defaults to localhost, but this script runs inside the backend container; localhost points to itself, not the Postgres service. If DATABASE_URL is missing/mis-scoped, startup fails.

Suggested patch
-    conn_str = os.getenv("DATABASE_URL", "postgresql://postgres:postgres@localhost:5432/text2sql")
+    conn_str = os.getenv("DATABASE_URL", "postgresql://postgres:postgres@db:5432/text2sql")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/infra_init.py` at line 552, The fallback database URL in the
`os.getenv()` call for `DATABASE_URL` uses `localhost` as the hostname, which
resolves to the container itself rather than the Postgres service when running
inside the backend container. Replace `localhost` in the hardcoded default
connection string with the appropriate service hostname (typically `postgres` or
`db`) that is resolvable from within the container network, ensuring the
connection can reach the actual Postgres service.

Comment on lines +113 to 117
if langfuse_client.client and langfuse_client.client.get_current_trace_id():
langfuse_client.client.trace(id=langfuse_client.client.get_current_trace_id(),
metadata={"table_id": table_id, "run_id": run_id},
tags=["eval-run", f"table:{table_id}"],
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file structure
fd evaluation.py --type f

Repository: StavPonte11/text2sql-onboarding

Length of output: 109


🏁 Script executed:

# Check the file length and examine the relevant sections
wc -l backend/app/routers/evaluation.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 114


🏁 Script executed:

# Read the relevant sections around lines 113-117 and 185-188
cat -n backend/app/routers/evaluation.py | sed -n '100,130p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 1322


🏁 Script executed:

# Check the second location around line 185
cat -n backend/app/routers/evaluation.py | sed -n '175,200p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 1253


🏁 Script executed:

# Check imports and logger setup at the top of the file
cat -n backend/app/routers/evaluation.py | head -30

Repository: StavPonte11/text2sql-onboarding

Length of output: 1064


🏁 Script executed:

# Check for logger initialization in the module
cat -n backend/app/routers/evaluation.py | sed -n '1,50p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 1801


🏁 Script executed:

# Look at the function containing lines 113-117 to understand context
cat -n backend/app/routers/evaluation.py | sed -n '60,120p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 2355


🏁 Script executed:

# Check the function signature and earlier setup
cat -n backend/app/routers/evaluation.py | sed -n '80,115p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 1503


🏁 Script executed:

# Check the broader context around line 185 - what happens after the trace call
cat -n backend/app/routers/evaluation.py | sed -n '150,190p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 1928


Isolate Langfuse trace writes from the evaluation critical path.

Lines 113–117 and 185–188 perform external Langfuse trace updates without protection. A transient observability failure will abort evaluation execution/state updates. This is especially problematic at line 185–188, where the database is already committed and scores computed; a Langfuse failure here prevents the function from returning despite successful evaluation.

Suggested patch
-    if langfuse_client.client and langfuse_client.client.get_current_trace_id():
-        langfuse_client.client.trace(id=langfuse_client.client.get_current_trace_id(), 
-        metadata={"table_id": table_id, "run_id": run_id},
-        tags=["eval-run", f"table:{table_id}"],
-    )
+    if langfuse_client.client and langfuse_client.client.get_current_trace_id():
+        try:
+            langfuse_client.client.trace(
+                id=langfuse_client.client.get_current_trace_id(),
+                metadata={"table_id": table_id, "run_id": run_id},
+                tags=["eval-run", f"table:{table_id}"],
+            )
+        except Exception as exc:
+            logger.debug("Langfuse trace metadata update failed: %s", exc)
@@
-    if langfuse_client.client and langfuse_client.client.get_current_trace_id():
-        langfuse_client.client.trace(id=langfuse_client.client.get_current_trace_id(), 
-        output={"score": avg_score_contains, "pass_rate": pass_rate_contains}
-    )
+    if langfuse_client.client and langfuse_client.client.get_current_trace_id():
+        try:
+            langfuse_client.client.trace(
+                id=langfuse_client.client.get_current_trace_id(),
+                output={"score": avg_score_contains, "pass_rate": pass_rate_contains},
+            )
+        except Exception as exc:
+            logger.debug("Langfuse trace output update failed: %s", exc)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/evaluation.py` around lines 113 - 117, The Langfuse trace
write operations at lines 113–117 and 185–188 in the evaluation.py file are not
protected from failures, which means any transient Langfuse observability issue
will abort the entire evaluation execution and prevent returning successful
results. Wrap both the langfuse_client.trace() calls in try-except blocks (or
similar error handling) to isolate them from the critical evaluation path,
ensuring that Langfuse failures are caught and logged but do not interrupt
evaluation completion, database commits, or function returns. This is especially
critical at the second site (lines 185–188) where the database has already been
committed and scores computed.

Comment thread backend/pyproject.toml
"pydantic==2.10.4",
"pydantic-settings==2.7.0",
"langfuse==2.56.2",
"langfuse>=2.0.0",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Constrain langfuse to a bounded range for reproducible runtime behavior.

Line 13 allows any future major release, which can silently introduce breaking API changes across environments. Prefer pinning or at least an upper bound.

Suggested patch
-    "langfuse>=2.0.0",
+    "langfuse>=2.56.2,<3.0.0",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"langfuse>=2.0.0",
"langfuse>=2.56.2,<3.0.0",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/pyproject.toml` at line 13, The langfuse dependency constraint at
line 13 in backend/pyproject.toml is specified as "langfuse>=2.0.0" which lacks
an upper bound, allowing any future major version to be installed and
potentially introducing breaking changes. Modify the version constraint to
include an upper bound by changing it to "langfuse>=2.0.0,<3.0.0" (or similar
bounded range) to ensure reproducible behavior and prevent silent breakage from
major version updates while still allowing compatible patch and minor version
updates.

Comment thread docker-compose.yml
es-data: null
db-data: null

secrets:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we remove the deploy key and use a local copy of the core utils package instead of downloading from the url?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably for testing, this will be restored

Comment thread backend/app/main.py

@asynccontextmanager
async def lifespan(app: FastAPI):
if not settings.OPENAI_API_KEY:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be a problem for local runs. might be better to run a conditional check / make this optional with a warning

def get_judge_health():
from app.config import settings

api_key = getattr(settings, "OPENAI_API_KEY", None)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer calling the judge directly instead of the indirect check that the var exists. the logic might change and we will forget to correct this endpoint, as well as missing unexpected errors

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean calling evaluate_with_llm instead?


from esca_sdk import EscaClient

client = EscaClient(api_key=api_key, base_url=base_url)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the util function?


try:
from agent.config import settings as agent_settings
api_key = getattr(agent_settings, "ESCA_API_KEY", None)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as the judge - just call the function and catch any errors. var validation is not supposed to happen here

Comment on lines 17 to +18
MAX_REFINER_ITERATIONS = 3
REFINER_SCHEMA_CONTEXT_TABLES = 4

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to settings file and make env var



# Define Tools
@tool

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why delete?

@yuvalkh yuvalkh Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need it currently as a tool and there is a function called hybrid_search_tables

if i < settings.MAX_PROFILES_TO_FETCH:

# 2. Automatically get profiles for the top candidate tables (up to MAX_PROFILES_TO_FETCH) to seed the prompt
import asyncio

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move imports to top of the file

Comment thread agent/src/agent/nodes/schema_explorer.py

return {"schema_plan": plan}

tables_used = getattr(data, "tables_used", [])

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this here

please extract the logic to a function / middleware

I think this checks design can be improved

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to create a node in the graph for "sql static validations" that will verify syntax and that the tables actually exist. this way it will be easier to navigate, will result in cleaner code, and will be easier to evaluate


preview_str = ""
if raw_data_ref:
if esca_write_failed or not raw_data_ref:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we continuing to here if the write failed? shouldn't we fail the whole graph if that happens? it feels wrong to finalize as if nothing is wrong if we have failed to fetch the data

@yuvalkh yuvalkh Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided that currently we will add a flag that decide if we want to work with esca and later on in a different branch we will add somehting like healthcheck to esca and decide smartly if it wants to work with esca or not dynamically and based if it's on air

res = await client.save_data(esca_payload)
esca_id = res.get("esca_id")
except Exception as e:
esca_id = None

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are already saving escape errors in state, why not here?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the state of graph, this is for getting the profile and then later on get it from esca


return {"schema_plan": plan}

tables_used = getattr(data, "tables_used", [])

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to create a node in the graph for "sql static validations" that will verify syntax and that the tables actually exist. this way it will be easier to navigate, will result in cleaner code, and will be easier to evaluate

Comment thread agent/src/agent/config.py
HYBRID_SEARCH_MAX_TABLES: int = 10
MAX_PROFILES_TO_FETCH: int = 3
PROFILE_FETCH_CONCURRENCY: int = Field(default=4, gt=0)
REDIS_URL: str = "redis://localhost:6379"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the redis connector from the core utils package, it has its own redis conf

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be deleted because we use the get_redis_client function from core

router = APIRouter(prefix="/tables", tags=["publish"])


def _check_regression(table_id: str, new_score: float, session: Session) -> list[dict]:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove?

@yuvalkh yuvalkh Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was no use of this function before.
We have the function _run_regression_eval that we use

Comment thread agent/src/agent/graph.py

return {
"feedback_route": route,
"sql_query": "",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we don't want to delete other things because maybe we do need the query or the plan.
But what we should add later is a better approach that also look at the feedback and pass it as optional variable into some of the nodes.

Comment thread agent/src/agent/graph.py
Comment on lines +94 to +98
if state.get("hallucinated_tables"):
if state.get("schema_explorer_retry_count", 0) >= 3:
return "query_builder"
return "schema_explorer"
return "query_builder"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inefficient if's, but also, why should it silently continue if there are hallucinations? I think it would be better to through an error...

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if hallucination we should throw error and not continue to query builder

Comment on lines +100 to +104
esca_write_failed = True
if langfuse_client and langfuse_client.get_current_trace_id():
langfuse_client.span(id=langfuse_client.get_current_observation_id(),
level="WARNING", status_message=f"ESCA write failed: {e}"
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned it previously but if we can't write the data there is no reason to silently continue to the next graph. it should throw an error

@yuvalkh yuvalkh Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread agent/Dockerfile
# Copy workspace
COPY core /app/core
COPY agent /app/agent
COPY python-core-utils /app/python-core-utils

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be changed because we use the python-core-utils as library and not as folder

Comment thread agent/Dockerfile

# Configure git to use ssh for github.com
RUN git config --global url."git@github.com:".insteadOf "https://github.com/"
RUN apt-get update && apt-get install -y --no-install-recommends git \

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert we also want openssh-client for accessing the python core utils library

Comment thread agent/pyproject.toml
[tool.uv.sources]
core = { path = "../core" }
esca-sdk = { git = "https://github.com/elirazpevz/esca.git", subdirectory = "packages/esca-sdk" }
python-core-utils = { path = "../python-core-utils" }

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Comment thread backend/app/config.py
model_config = SettingsConfigDict(env_file=".env", extra="ignore")

DATABASE_URL: str = "sqlite:///./text2sql.db"
OPENAI_API_KEY: str | None = None

@yuvalkh yuvalkh Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change into LLM_API_KEY because that's what we use in our agent

Comment thread backend/Dockerfile
# Copy workspace dependencies
COPY core /app/core
COPY backend /app/backend
COPY python-core-utils /app/python-core-utils

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here change to python-core-utils the library and not from the local files in the project

Comment thread backend/pyproject.toml
"pydantic==2.10.4",
"pydantic-settings==2.7.0",
"langfuse==2.56.2",
"langfuse>=2.0.0",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set it to 4.7.1 since it's what we're working with right now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants