Skip to content

feat: agent architecture and performance optimizations#16

Open
stavp-star wants to merge 2 commits into
stav/critic-bug-fixesfrom
stav/agent-architecture-performence
Open

feat: agent architecture and performance optimizations#16
stavp-star wants to merge 2 commits into
stav/critic-bug-fixesfrom
stav/agent-architecture-performence

Conversation

@stavp-star

@stavp-star stavp-star commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Summary of Work

This PR introduces the G2 architecture refactors and performance improvements:

  • Restricts schema explorer's parallel LLM summarization calls to sequential execution using a Semaphore to prevent local model OOM.
  • Makes semantic typing parser resilient by accepting both dictionaries and objects from structured LLM outputs, preventing strict KeyError: 'table_column' crashes.
  • Enhances Trino connectivity default schema config and fixes table mappings.
  • Implements a comprehensive unit testing suite covering resilience, state machine routing, quality gateways, and caching.

Open Questions & Future Actions

  1. Testing Location for API-bound Logic: The Langfuse handler test (test_tts_g1_02_langfuse_handler_isolation) and LLM judge tests currently run in the agent repository's test suite. Should these tests be migrated to backend/tests/ since they validate API endpoints and backend services?
  2. Local LLM Structured Output Constraints: The 1B parameter model (llama3.2:1b) fails to handle Ollama's structured grammar JSON constraints, crashing the runner process with unexpected EOF. Should we restrict Advanced Schema Explorer features (like semantic typing and schema summarization) to larger models (e.g., gemma4:e4b), or fallback to unstructured text parsing for small local models?
  3. Dynamic Schema Prefixing: To prevent Trino TABLE_NOT_FOUND errors when querying tables belonging to non-default schemas (e.g., complex_retail.customers), should the query builder dynamically prefix tables using metadata registered in the database, instead of relying solely on TRINO_SCHEMA env defaults?

Summary by CodeRabbit

Release Notes

  • New Features

    • Added satisfaction check gating with execution, plausibility, column coverage, and semantic alignment validations
    • Implemented human-in-the-loop escalation with pause/resume capability
    • Added schema enrichment pipeline including semantic typing, join graph, and ambiguity detection
    • Introduced Redis-backed caching for schema and profile data
    • Added health check endpoints for judge and Esca services
    • Enhanced schema exploration with hallucination detection and retry logic
  • Improvements

    • Improved error tracking and accumulation across refinement iterations
    • Enhanced resilience with fallback behavior for Esca write failures
    • Refactored LLM initialization for consistency across components
    • Added configuration validation at startup
    • Updated Docker builds to use HTTPS instead of SSH dependencies
  • Configuration

    • Added feature flags for advanced schema exploration phases
    • Added table scoping modes (strict/hybrid)
    • Added satisfaction check thresholds and settings

Stav Ponte added 2 commits June 16, 2026 17:49
…1 through G2-05)

TTS-G2-01: Table Scoping Modes
- Add scoping_mode field to AgentState ('strict' | 'hybrid')
- Add TABLE_SCOPING_MODE config setting (default: hybrid)
- Add validate_config_node as START→extractor gatekeeper with
  InvalidConfigurationException for strict mode + empty allowed_tables
- hybrid_search_tables enforces hard allowlist in strict mode
- Strict mode injects [STRICT MODE] block into schema explorer prompt
- Langfuse trace tagged with scoping_mode=<value>

TTS-G2-02: Fallback / Escalation Hierarchy (HITL)
- Add escalated, escalation_reason to AgentState
- Add hitl_escalation_node compiled with interrupt_before=['hitl_escalation']
- route_schema_explorer escalates after MAX_SCHEMA_RETRIES (3) retries
- route_refiner escalates after MAX_REFINER_ITERATIONS exhausted
- refiner.py and schema_explorer.py set escalation_reason before handing off
- hitl_escalation → extractor edge enables full state-reset resume

TTS-G2-03: Advanced Schema Explorer (4-Phase Pipeline)
- Add networkx>=3.3 dependency
- New utils/schema_enrichment.py with 4 async phases:
  - run_semantic_typing (LLM column classification)
  - run_join_graph (BFS via networkx over ForeignKeyMapping + CrossTableProfile)
  - run_schema_summarization (per-table LLM summaries replacing verbose JSON)
  - run_ambiguity_detection (column/table name ambiguity notes)
- Each phase gated by ENABLE_* config flag (defaults: only AMBIGUITY_DETECT on)
- active_schema_phases list emitted to Langfuse trace metadata

TTS-G2-04: Satisfaction Check Module
- New nodes/satisfaction_check.py with 4 checks:
  A: Execution success (SATISFACTION_CHECK_EXECUTION)
  B: Row plausibility min/max (SATISFACTION_CHECK_PLAUSIBILITY)
  C: Structural column coverage via LLM (SATISFACTION_CHECK_COLUMNS)
  D: Semantic alignment scored 0-1 (SATISFACTION_CHECK_SEMANTIC, off by default)
- satisfaction_fail_count drives routing: < MAX → refiner, >= MAX → hitl_escalation
- Replaces direct refiner→finalizer success path
- All check results pushed to Langfuse trace metadata

TTS-G2-05: Redis Schema Cache
- New core/src/core/cache.py with CacheService (get/set/invalidate/invalidate_pattern)
- Non-blocking: all Redis errors log a warning and fall through to live data
- Key conventions: profile:{table_id}:{version}, ddl:{cat}:{sch}:{tbl}, catalog_valid:...
- SCAN-based pattern invalidation (no blocking KEYS command)
- Integrated into get_table_profile: cache-first fetch with PROFILE_CACHE_TTL
- cache_hit_count / cache_miss_count tracked per schema_explorer invocation
- Counters pushed to Langfuse trace at end of schema_explorer_node
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces "Group 2" agent hardening: a new satisfaction-check gate, HITL escalation node, schema enrichment phases (semantic typing, join graph, summarization, ambiguity detection), Redis-backed profile caching, a shared LLM factory, and expanded AgentState. The backend migrates from langfuse_context to langfuse_client.client.trace, removes scoring.py and stubs, adds /health/judge and /health/esca endpoints, and switches python-core-utils from SSH/git to a local path reference.

Changes

Agent G2 Pipeline Hardening

Layer / File(s) Summary
SSH→local-path dependency and build infra
.gitignore, agent/Dockerfile, agent/pyproject.toml, core/pyproject.toml, docker-compose.yml, frontend/...
Removes SSH deploy-key-based git access from Dockerfiles and docker-compose; copies python-core-utils directly into images; adds uv.sources local path mappings; updates gitignore and frontend favicon.
Core CacheService, Trino params, LLM factory, and shared utilities
core/src/core/cache.py, core/src/core/__init__.py, core/src/core/config.py, core/src/core/langfuse.py, core/src/core/trino.py, agent/src/agent/llm.py, agent/src/agent/utils/esca.py, agent/src/agent/utils/sql.py
Adds CacheService (Redis SCAN/pipeline, JSON helpers, key builders, singleton factory); re-exports from core; adds Langfuse config fields and get_langfuse_handler; extends execute_query_sync with optional params; adds get_esca_client context manager, clean_sql, and get_llm factory.
AgentState and AgentSettings expansion
agent/src/agent/state.py, agent/src/agent/config.py
Adds 11 new optional AgentState fields (error tracking, HITL escalation, inline results, satisfaction check); expands AgentSettings with Redis URL, scoping mode, schema enrichment feature flags, satisfaction thresholds, and cache TTLs.
Schema enrichment phases and schema explorer refactor
agent/src/agent/utils/schema_enrichment.py, agent/src/agent/nodes/schema_explorer.py
Adds schema_enrichment.py with four async phases (run_semantic_typing, run_join_graph with networkx/BFS fallback, run_schema_summarization, run_ambiguity_detection) and six Pydantic output models. Refactors schema_explorer_node with scoping modes, concurrent Redis-cached profile fetching, multi-phase enrichment, ambiguity interrupt, and hallucinated-table detection.
Refiner, query builder, extractor, and finalizer node updates
agent/src/agent/nodes/refiner.py, agent/src/agent/nodes/query_builder.py, agent/src/agent/nodes/extractor.py, agent/src/agent/nodes/finalizer.py
Switches all nodes to get_llm factory. Refiner gains error_history accumulation, schema-context injection, and richer return payload. Finalizer falls back to inline_result_rows when ESCA write fails. Extractor uses Langfuse-managed prompt with json_schema structured output.
Satisfaction check node and hardened graph topology
agent/src/agent/nodes/satisfaction_check.py, agent/src/agent/graph.py
Adds satisfaction_check_node with four independent checks and Langfuse instrumentation. Rebuilds graph with validate_config start, hitl_escalation pause/resume node, satisfaction gating, expanded conditional routing, and interrupt_before=["hitl_escalation"].
Chat router request-scoped handler and MCP server cleanup
agent/src/agent/routers/chat.py, agent/src/agent/mcp_server.py
Replaces module-level CallbackHandler with Depends(get_langfuse_handler); adds sql_query to interrupted ChatResponse. Removes unused pydantic imports from MCP server.
Agent test suite
agent/tests/conftest.py, agent/tests/test_routing.py, agent/tests/test_resilience.py, agent/tests/test_cache_and_gates.py, agent/tests/test_isolation.py
Adds conftest with mocks for LLM, Redis, Trino, ESCA, and Langfuse. Adds tests for satisfaction gating, CacheService SCAN eviction, routing/escalation/scoping, ESCA fallback, profile fetch resilience, and handler isolation.

Backend Langfuse Migration, Cleanup, and Infrastructure

Layer / File(s) Summary
Backend Dockerfile, config, infra init, and startup guard
backend/Dockerfile, backend/app/config.py, backend/app/main.py, backend/app/infra_init.py, backend/pyproject.toml
Removes SSH from backend Dockerfile; adds OPENAI_API_KEY to Settings with RuntimeError startup guard; reads connection params from environment in infra_init; extends seed data per table; relaxes langfuse pin to >=2.0.0.
Langfuse context migration, scoring removal, and stub cleanup
backend/app/services/langfuse_client.py, backend/app/services/llm_judge.py, backend/app/services/evaluator.py, backend/app/routers/evaluation.py, backend/app/routers/orchestration.py, backend/app/routers/publish.py, backend/app/services/scoring.py, core/src/core/models/models.py, backend/app/services/profiling_engine.py, backend/app/services/trino_client.py
Migrates all langfuse_context.update_current_trace calls to conditional langfuse_client.client.trace; adds @observe to evaluate_with_llm; removes scoring.py, stub helpers, regression gate, EvaluationHistoryMetricRead, and EvalResultStatus.
New health endpoints and backend import reordering
backend/app/routers/health.py, backend/app/routers/admin_approval.py, backend/app/routers/admin_auth.py, backend/app/routers/audit.py, backend/app/routers/enrichment.py, backend/app/routers/feedback.py, backend/app/routers/questions.py, backend/app/routers/scopes.py, backend/app/services/auth.py, backend/app/services/join_detection.py
Adds GET /health/judge and GET /health/esca endpoints. Applies import reordering (core imports before fastapi/sqlmodel) across all backend routers and services.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant chat_endpoint
  participant agent_graph
  participant validate_config_node
  participant schema_explorer_node
  participant refiner_node
  participant satisfaction_check_node
  participant hitl_escalation_node
  participant finalizer_node

  Client->>chat_endpoint: POST /chat (query, allowed_tables)
  chat_endpoint->>agent_graph: ainvoke(state, config[langfuse_handler])
  agent_graph->>validate_config_node: check strict-mode allowed_tables
  validate_config_node-->>agent_graph: pass / InvalidConfigurationException
  agent_graph->>schema_explorer_node: hybrid search + enrichment phases
  schema_explorer_node-->>agent_graph: schema_plan or hallucinated→retry/HITL
  agent_graph->>refiner_node: execute SQL + accumulate error_history
  refiner_node-->>agent_graph: success(inline_result_rows) or escalation_reason
  agent_graph->>satisfaction_check_node: run 4 checks (exec/plausibility/columns/semantic)
  satisfaction_check_node-->>agent_graph: satisfaction_failures + fail_count
  alt fail_count >= SATISFACTION_MAX_FAILURES
    agent_graph->>hitl_escalation_node: interrupt_before pause
    hitl_escalation_node-->>Client: interrupted response (sql_query, schema_plan)
    Client->>chat_endpoint: resume_value
    chat_endpoint->>agent_graph: Command(resume=resume_value)
  else pass
    agent_graph->>finalizer_node: build preview (ESCA or inline_result_rows fallback)
    finalizer_node-->>chat_endpoint: summary + sql_explanation + raw_data_ref
    chat_endpoint-->>Client: completed ChatResponse
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • StavPonte11/text2sql-onboarding#10: Directly related — the main PR extends the same rejection_router_node routing and graph topology introduced there, adding satisfaction/HITL conditional edges to the same workflow graph.
  • StavPonte11/text2sql-onboarding#13: Directly related — both PRs modify agent/src/agent/mcp_server.py's chat_with_agent tool, including the interrupted/completed branching and extractor-selection logic.

Poem

🐇 Hoppity-hop through the graph I go,
Past HITL gates where the queries flow,
Redis caches my schemas with glee,
No SSH keys — just local paths for me!
Satisfaction checks keep the SQL neat,
The rabbit declares this pipeline complete! 🎉

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch stav/agent-architecture-performence

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 23

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
agent/pyproject.toml (1)

7-27: ⚠️ Potential issue | 🟠 Major

Add python-core-utils to runtime dependencies in [project].dependencies.

python-core-utils is imported directly in three files (agent/src/agent/main.py, agent/src/agent/routers/chat.py, and agent/src/agent/nodes/schema_explorer.py) but is only declared as a source mapping in [tool.uv.sources] (line 27), not in the dependencies list. Source mappings do not guarantee runtime availability; in clean environments this will surface as ModuleNotFoundError.

Proposed fix
 dependencies = [
     "fastapi>=0.115.6",
     "uvicorn[standard]>=0.32.1",
     "langgraph",
     "langchain-ollama",
     "core",
+    "python-core-utils",
     "esca-sdk",
     "pydantic-settings>=2.7.0",
     "trino>=0.328.0",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/pyproject.toml` around lines 7 - 27, The dependency `python-core-utils`
is declared as a source mapping in the [tool.uv.sources] section but is missing
from the [project].dependencies list, which will cause ModuleNotFoundError in
clean environments where it is imported in main.py, routers/chat.py, and
nodes/schema_explorer.py. Add `python-core-utils` as an entry in the
dependencies list under [project], similar to how other local path dependencies
like `core` are included.
backend/app/infra_init.py (1)

660-667: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

_seed_trino_data is not idempotent when DELETE fails.

If DELETE is unsupported, the code still executes INSERT unconditionally, so every startup appends duplicate seed rows.

Suggested fix
         try:
             # Clear existing data to apply updated seeds cleanly
             try:
                 _trino_exec(f"DELETE FROM {table['fqn']}")
             except Exception as e:
                 # If DELETE is not supported (e.g. some Iceberg configs require specific formats), we ignore and fall back to count checks
                 logger.debug("DELETE on %s failed: %s", table["fqn"], e)
+                existing = _trino_exec(table["count_sql"], ignore_errors=True)
+                if existing and existing[0][0] > 0:
+                    logger.info(
+                        "[InfraInit] Skipping seed for '%s' (rows already present)",
+                        table["fqn"],
+                    )
+                    continue
 
             _trino_exec(table["seed_sql"])
             logger.info("[InfraInit] Seeded sample data into '%s' ✓", table["fqn"])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/infra_init.py` around lines 660 - 667, The _seed_trino_data
function executes the INSERT statement unconditionally even when the DELETE
operation fails, causing duplicate seed data on repeated executions. When the
DELETE fails and the exception is caught, before executing the seed_sql INSERT,
add a count check using a Trino query to verify if the table is empty. Only
proceed with the INSERT statement if the count is zero, otherwise skip the
insert to maintain idempotency.
backend/app/services/langfuse_client.py (1)

88-103: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update docstring to reflect removed method.

The docstring on Line 102 still references get_prompt_as_langchain(name), but this method was removed in this PR. Please update the docstring to remove this reference.

📝 Proposed fix
     """
     Langfuse connection — mirrors LangfuseTracer from the main application.
 
     Data members (DMs):
         public_key  — Langfuse public API key.
         private_key — Langfuse secret API key.
         host        — Langfuse server URL.
         client      — Instantiated sdk.Langfuse client (None if disabled).
 
     Implements Connection: connect(), is_alive(), logout().
 
     Additional methods (present in the main app's LangfuseTracer):
         get_prompt(name)                  — fetch a prompt by name.
-        get_prompt_as_langchain(name)     — fetch and convert to a LangChain prompt.
     """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/langfuse_client.py` around lines 88 - 103, The docstring
for the LangfuseTracer class still lists the removed method
`get_prompt_as_langchain(name)` in the "Additional methods" section. Locate the
docstring in the LangfuseTracer class and remove the line that references
`get_prompt_as_langchain(name) — fetch and convert to a LangChain prompt.` while
keeping the reference to the remaining `get_prompt(name)` method.
core/src/core/models/models.py (1)

325-325: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reintroduce a constrained status type for EvalResult.status.

Line 325 stores status as unconstrained str, which allows invalid persisted states and can silently skew downstream status-filtered metrics. Also, the inline comment says "pass" | "fail" but writers already use "pending". Use a StrEnum (or equivalent DB-level constraint) for pending/pass/fail.

Suggested fix
+class EvalResultStatus(StrEnum):
+    pending = "pending"
+    passed = "pass"
+    failed = "fail"
+
 class EvalResult(SQLModel, table=True):
@@
-    status: str  # "pass" | "fail"
+    status: EvalResultStatus = Field(default=EvalResultStatus.pending)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/models/models.py` at line 325, The status field in EvalResult
at line 325 is currently typed as an unconstrained str, which allows invalid
values to be persisted and contradicts the inline comment that only lists "pass"
| "fail" when actual usage includes "pending". Define a StrEnum with the valid
status values pending, pass, and fail, then change the status field type
annotation from str to use this new enum type instead of the unconstrained
string type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.gitignore:
- Line 42: The python-core-utils/ directory is currently ignored in .gitignore
but both Dockerfiles attempt to copy it, which will fail during Docker builds.
Resolve this discrepancy by choosing one of three approaches: (1) remove the
python-core-utils/ entry from .gitignore and commit the directory to the
repository, (2) convert python-core-utils/ into a git submodule and update both
Dockerfiles to initialize it during the build, or (3) if the directory is not
actually needed in the Docker images, remove the COPY python-core-utils
directives from the Dockerfiles. Ensure the chosen approach is consistently
applied across all affected files so that Docker builds succeed in CI
environments.

In `@agent/src/agent/config.py`:
- Around line 51-58: Add Pydantic Field constraints to validate the satisfaction
and cache configuration values in the config class. For SATISFACTION_MIN_ROWS
and SATISFACTION_MAX_ROWS, ensure they are positive integers with MIN_ROWS less
than MAX_ROWS. For SATISFACTION_SEMANTIC_THRESHOLD, constrain it to the range
[0, 1]. For SATISFACTION_MAX_FAILURES, ensure it is a positive integer. For
SCHEMA_CACHE_TTL and PROFILE_CACHE_TTL, ensure both are positive integers. Use
Pydantic validators like gt (greater than), ge (greater than or equal), le (less
than or equal), and Field constraints that match the patterns already
established elsewhere in this config class.

In `@agent/src/agent/nodes/finalizer.py`:
- Around line 72-76: The isinstance check for preview_rows[0] in the column
extraction logic is checking if each row is a dict, but the result.rows
structure from TrinoExecutionResult contains lists, not dicts. Fix the
isinstance check to verify that preview_rows[0] is a list instead of a dict.
Additionally, update the test mock in conftest.py that incorrectly defines rows
as dicts to instead define rows as lists to match the actual
TrinoExecutionResult structure, which will ensure the column extraction logic is
properly tested with the correct data type.

In `@agent/src/agent/nodes/refiner.py`:
- Around line 8-9: Remove the duplicate import statement for langfuse_client
from the agent.langfuse_client module in the refiner.py file. Keep only a single
import statement importing langfuse_client from agent.langfuse_client and delete
the redundant duplicate line.

In `@agent/src/agent/nodes/satisfaction_check.py`:
- Around line 43-48: The column extraction logic in the satisfaction_check
function incorrectly assumes rows are dictionaries when they are actually lists
(as returned by Trino through refiner_node). Instead of checking if rows[0] is a
dict and calling .keys(), you need to handle the case where rows are lists and
extract column names from the state's column metadata or adjust the logic to
work with list-based rows. This will ensure the columns list is properly
populated so that Check C and Check D are not skipped due to the empty columns
guard condition in their conditional statements.

In `@agent/src/agent/state.py`:
- Around line 23-36: The TypedDict in state.py currently models optional fields
incorrectly. The fields last_error, hallucinated_tables, esca_write_failed,
inline_result_rows, error_history, schema_explorer_retry_count, escalated,
escalation_reason, satisfaction_failures, and satisfaction_fail_count are
conditionally set and accessed with defaults throughout the workflow, but the
current type definition with total=True (default) requires all keys to exist.
Wrap each of these field definitions with NotRequired[...] to properly indicate
they are optional. Additionally, change the scoping_mode field from str | None
to NotRequired[Literal["strict", "hybrid"] | None] to document the valid
constraint values and mark it as truly optional. Import NotRequired from typing
(or typing_extensions if needed) and Literal from typing to support these type
annotations.

In `@agent/src/agent/utils/schema_enrichment.py`:
- Around line 271-277: The BFS fallback path construction in the loop (where
node_names are iterated and _bfs_shortest_path is called) is missing the "joins"
key that the networkx branch includes. Update the paths.append() call to include
a "joins" key with a placeholder value (such as an empty list) alongside the
existing "from", "to", and "path" keys, ensuring downstream consumers receive a
consistent dictionary structure regardless of whether networkx is available.
- Around line 363-369: The `run_ambiguity_detection` function assumes the LLM
response is always an `AmbiguityOutput` model instance, but when using
`method="json_schema"`, the LLM may return a dict instead. This causes an
`AttributeError` when accessing `.ambiguity_notes`. Add defensive handling
similar to what `run_semantic_typing` and `run_schema_summarization` do: check
if the result is a dict and extract the ambiguity_notes field accordingly (e.g.,
`result.get("ambiguity_notes", [])` for dict responses), while continuing to
handle model instances by accessing the `.ambiguity_notes` attribute as before.

In `@agent/tests/test_isolation.py`:
- Around line 39-52: The test imports evaluate_with_llm directly at the module
level, but patches core.llm.evaluate_with_llm, causing the patch to be bypassed
since the local binding is used instead. Additionally, the silent ImportError
catch masks that core.llm does not exist—the function is actually located in
backend/app/services/llm_judge.py. Fix this by: (1) removing the direct import
of evaluate_with_llm at the top of the try block, (2) importing the correct
module path where evaluate_with_llm actually lives
(backend/app/services/llm_judge.py), (3) updating the patch target to match the
actual module location, (4) removing the silent ImportError exception handler so
test failures are not hidden.

In `@backend/app/infra_init.py`:
- Line 27: The `import os` statement is not placed with the other standard
library imports at the top of the file, which violates Ruff's E402 rule (module
level import not at top of file). Move the `import os` import from line 27 to
the top of the file where the other standard library imports are grouped,
ensuring it is placed before any third-party or local imports.
- Around line 33-36: The environment variables _TRINO_HOST, _TRINO_PORT, and
_TRINO_USER are now being read from the environment but are not actually being
used when registering the OpenMetadata service. Find where the OpenMetadata
service registration occurs (likely a function that sets up the OM service
connection) and replace the hardcoded Trino endpoint values (currently hardcoded
as trino:8080) with the corresponding environment-driven values from _TRINO_HOST
and _TRINO_PORT to ensure OM service registration respects environment
overrides.

In `@backend/app/main.py`:
- Around line 43-45: Remove the hard-fail behavior from the OPENAI_API_KEY
validation during startup. Instead of raising RuntimeError immediately when the
key is missing, remove or comment out the raise statement so the application can
continue starting. Move the OPENAI_API_KEY validation to the judge-specific code
paths where it is actually used (not in the global startup sequence), so that
only requests requiring the judge feature will fail if the key is missing.
Alternatively, add a configuration flag that controls whether OPENAI_API_KEY
validation is enforced at startup versus only when judge endpoints are accessed.

In `@backend/app/routers/health.py`:
- Around line 215-219: The fallback logic in the _ping_esca function uses
client.load_head with a dummy ID to simulate health checks when ping() is
unavailable, which pollutes ESCA logs and metrics with unnecessary requests.
Instead of using this fallback, modify the function to fail fast with a clear,
descriptive error message if the client does not have a ping method available.
This will prevent dummy requests from being sent while making it clear that the
ESCA client needs to implement a proper ping method or health check endpoint.
- Around line 208-209: The hardcoded magic strings "dummy" and
"http://localhost:8000" in the condition checking for ESCA configuration
validation are fragile and could break if defaults change. Extract these
sentinel values as named constants in the settings module and import them into
this health check endpoint, or remove the localhost check entirely since
legitimate local development configurations should not be rejected. Update the
conditional logic to reference these constants instead of embedding the magic
strings directly.
- Around line 196-206: The nested try-except block attempting to import from
app.config is dead code since backend/app/config.py does not define ESCA_API_KEY
or ESCA_URL settings. Remove the second try block (the fallback import from
app.config) and instead directly set api_key and base_url to None in the outer
except block when the initial import from agent.config fails. This simplifies
the logic and accurately reflects that ESCA settings only exist in agent.config.

In `@backend/app/services/profiling_engine.py`:
- Around line 17-18: The import statement that retrieves `TrinoExecutionResult`
from `app.services.trino_client` is incorrect because that module is empty,
causing the timeout exception handler to crash at runtime. Change the import
source from `app.services.trino_client` to `core` (which is already imported in
the file for `execute_query_sync`). This ensures `TrinoExecutionResult` is
properly available with all required fields (`success`, `rows`, `columns`,
`error_message`) when the exception handler needs to return an error result.

In `@backend/pyproject.toml`:
- Line 13: The dependency specification for langfuse uses an overly broad
version constraint that allows major version upgrades (v3.x and v4.x) containing
breaking changes to APIs like client.trace() that your code depends on. Replace
the `langfuse>=2.0.0` constraint with a bounded range `>=2.56.2,<3.0.0` that
permits patch and minor version updates within the v2.x series while preventing
breaking major version upgrades.

In `@core/src/core/cache.py`:
- Around line 75-83: The current implementation buffers all delete operations
from the entire SCAN loop into a single pipeline and executes it once at the
end, which can cause memory and latency spikes for large match sets. Instead,
move the pipeline creation and execution inside the while loop in the
scan_delete_matching method. Create a new pipeline for each batch of keys
retrieved by SCAN, delete the keys in that batch immediately via pipe.execute(),
then continue to the next SCAN iteration. This bounds the memory and latency
cost per batch rather than accumulating everything until the final flush.
- Around line 134-139: The get_cache_service function silently ignores
mismatches between the provided redis_url argument and the URL used to create
the cached _cache_instance, which can route callers to an incorrect Redis
backend. Modify the function to detect when a different redis_url is supplied
than what the existing _cache_instance was initialized with, and either raise an
error to alert the caller of the mismatch or store the redis_url and validate
consistency on subsequent calls. This ensures that unexpected URL changes are
not silently ignored.

In `@core/src/core/config.py`:
- Around line 7-9: The LANGFUSE_BASE_URL configuration setting added in the
config.py file is not being consumed by the Langfuse handler creation in
core/src/core/langfuse.py, making the config setting non-functional. Update the
Langfuse handler instantiation to read and use the LANGFUSE_BASE_URL setting
from the configuration instead of relying on hardcoded defaults. Additionally,
ensure the configuration property name aligns with the environment variable
naming convention used in the deployment setup (docker-compose.yml uses
LANGFUSE_HOST), so either rename LANGFUSE_BASE_URL to LANGFUSE_HOST or ensure
both naming conventions are properly mapped during configuration initialization.

In `@core/src/core/langfuse.py`:
- Around line 11-12: The exception handler in the Langfuse CallbackHandler
initialization block currently uses logger.error() with string formatting, which
loses the full traceback context needed for debugging. Replace the
logger.error() call with logger.exception(), which automatically captures and
logs the complete stack trace along with the error message. This change should
be made in the except block that handles the Exception during Langfuse handler
initialization.

In `@core/src/core/trino.py`:
- Around line 72-75: The conditional check `if params:` at line 72 incorrectly
treats empty containers (empty list, dict, or tuple) as falsy values, causing
them to bypass the parameterized execute path even though they are valid
parameter values according to the type hint. Replace `if params:` with `if
params is not None:` to explicitly check for None while allowing empty
containers to be passed to the parameterized cur.execute(sql, params) call. This
ensures all valid parameter types—including empty sequences—are routed to the
correct execution path.

In `@docker-compose.yml`:
- Line 407: The backend service command at line 407 couples initialization jobs
(app.seed and app.infra_init) directly to the uvicorn startup, causing the API
server to fail if these jobs fail and re-running side effects on every container
restart. Create a separate one-shot initialization service in the
docker-compose.yml that runs only the app.seed and app.infra_init commands
independently, then modify the backend service command to run only the uvicorn
server (uvicorn app.main:app --host 0.0.0.0 --port 8000). Use docker-compose
depends_on or proper service ordering to ensure the init service completes
before the backend service attempts to start, keeping the backend focused solely
on serving traffic.

---

Outside diff comments:
In `@agent/pyproject.toml`:
- Around line 7-27: The dependency `python-core-utils` is declared as a source
mapping in the [tool.uv.sources] section but is missing from the
[project].dependencies list, which will cause ModuleNotFoundError in clean
environments where it is imported in main.py, routers/chat.py, and
nodes/schema_explorer.py. Add `python-core-utils` as an entry in the
dependencies list under [project], similar to how other local path dependencies
like `core` are included.

In `@backend/app/infra_init.py`:
- Around line 660-667: The _seed_trino_data function executes the INSERT
statement unconditionally even when the DELETE operation fails, causing
duplicate seed data on repeated executions. When the DELETE fails and the
exception is caught, before executing the seed_sql INSERT, add a count check
using a Trino query to verify if the table is empty. Only proceed with the
INSERT statement if the count is zero, otherwise skip the insert to maintain
idempotency.

In `@backend/app/services/langfuse_client.py`:
- Around line 88-103: The docstring for the LangfuseTracer class still lists the
removed method `get_prompt_as_langchain(name)` in the "Additional methods"
section. Locate the docstring in the LangfuseTracer class and remove the line
that references `get_prompt_as_langchain(name) — fetch and convert to a
LangChain prompt.` while keeping the reference to the remaining
`get_prompt(name)` method.

In `@core/src/core/models/models.py`:
- Line 325: The status field in EvalResult at line 325 is currently typed as an
unconstrained str, which allows invalid values to be persisted and contradicts
the inline comment that only lists "pass" | "fail" when actual usage includes
"pending". Define a StrEnum with the valid status values pending, pass, and
fail, then change the status field type annotation from str to use this new enum
type instead of the unconstrained string type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: dee6554e-010f-48fd-89a8-c9a2cdf58c07

📥 Commits

Reviewing files that changed from the base of the PR and between 3a80177 and e9c4009.

⛔ Files ignored due to path filters (2)
  • agent/uv.lock is excluded by !**/*.lock
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (57)
  • .gitignore
  • agent/Dockerfile
  • agent/pyproject.toml
  • agent/src/agent/config.py
  • agent/src/agent/graph.py
  • agent/src/agent/llm.py
  • agent/src/agent/mcp_server.py
  • agent/src/agent/nodes/extractor.py
  • agent/src/agent/nodes/finalizer.py
  • agent/src/agent/nodes/query_builder.py
  • agent/src/agent/nodes/refiner.py
  • agent/src/agent/nodes/satisfaction_check.py
  • agent/src/agent/nodes/schema_explorer.py
  • agent/src/agent/routers/chat.py
  • agent/src/agent/state.py
  • agent/src/agent/utils/esca.py
  • agent/src/agent/utils/schema_enrichment.py
  • agent/src/agent/utils/sql.py
  • agent/tests/conftest.py
  • agent/tests/test_cache_and_gates.py
  • agent/tests/test_isolation.py
  • agent/tests/test_resilience.py
  • agent/tests/test_routing.py
  • backend/Dockerfile
  • backend/app/config.py
  • backend/app/infra_init.py
  • backend/app/main.py
  • backend/app/routers/admin_approval.py
  • backend/app/routers/admin_auth.py
  • backend/app/routers/audit.py
  • backend/app/routers/enrichment.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/feedback.py
  • backend/app/routers/health.py
  • backend/app/routers/orchestration.py
  • backend/app/routers/publish.py
  • backend/app/routers/questions.py
  • backend/app/routers/scopes.py
  • backend/app/services/auth.py
  • backend/app/services/evaluator.py
  • backend/app/services/join_detection.py
  • backend/app/services/langfuse_client.py
  • backend/app/services/llm_judge.py
  • backend/app/services/profiling_engine.py
  • backend/app/services/scoring.py
  • backend/app/services/trino_client.py
  • backend/pyproject.toml
  • core/pyproject.toml
  • core/src/core/__init__.py
  • core/src/core/cache.py
  • core/src/core/config.py
  • core/src/core/langfuse.py
  • core/src/core/models/models.py
  • core/src/core/trino.py
  • docker-compose.yml
  • frontend/.env
  • frontend/index.html
💤 Files with no reviewable changes (2)
  • backend/app/services/trino_client.py
  • backend/app/services/scoring.py

Comment thread .gitignore
agent/scratch/*
fix_agent_langfuse.py
fix_langfuse.py
python-core-utils/

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Tracked files under python-core-utils ==="
git ls-files | rg '^python-core-utils/' || true

echo
echo "=== Submodule status (if any) ==="
git submodule status || true

echo
echo "=== COPY references for python-core-utils ==="
rg -n 'COPY\s+python-core-utils' agent/Dockerfile backend/Dockerfile docker-compose.yml

Repository: StavPonte11/text2sql-onboarding

Length of output: 337


Both agent/Dockerfile and backend/Dockerfile reference an ignored directory.

python-core-utils/ is ignored in .gitignore yet both Dockerfiles attempt to COPY python-core-utils (agent/Dockerfile line 12, backend/Dockerfile line 13). The directory is neither tracked in git nor a submodule, so Docker builds will fail with a missing-path error in CI environments. Either unignore the directory and commit it, establish it as a git submodule, or remove the COPY directives if the directory should not be part of the image.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.gitignore at line 42, The python-core-utils/ directory is currently ignored
in .gitignore but both Dockerfiles attempt to copy it, which will fail during
Docker builds. Resolve this discrepancy by choosing one of three approaches: (1)
remove the python-core-utils/ entry from .gitignore and commit the directory to
the repository, (2) convert python-core-utils/ into a git submodule and update
both Dockerfiles to initialize it during the build, or (3) if the directory is
not actually needed in the Docker images, remove the COPY python-core-utils
directives from the Dockerfiles. Ensure the chosen approach is consistently
applied across all affected files so that Docker builds succeed in CI
environments.

Comment thread agent/src/agent/config.py
Comment on lines +51 to +58
SATISFACTION_MIN_ROWS: int = 1
SATISFACTION_MAX_ROWS: int = 50_000
SATISFACTION_SEMANTIC_THRESHOLD: float = 0.75
SATISFACTION_MAX_FAILURES: int = 2 # escalate to HITL after this many check failures

# ── G2-05: Redis Schema Cache ─────────────────────────────────────────────
SCHEMA_CACHE_TTL: int = 600 # seconds — DDL content
PROFILE_CACHE_TTL: int = 1800 # seconds — table profile statistics

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n agent/src/agent/config.py | head -100

Repository: StavPonte11/text2sql-onboarding

Length of output: 3067


🏁 Script executed:

rg "SATISFACTION_MIN_ROWS|SATISFACTION_MAX_ROWS|SATISFACTION_SEMANTIC_THRESHOLD|SATISFACTION_MAX_FAILURES|SCHEMA_CACHE_TTL|PROFILE_CACHE_TTL" --type py -B 2 -A 2

Repository: StavPonte11/text2sql-onboarding

Length of output: 5927


🏁 Script executed:

rg "validator|field_validator|model_validator" agent/src/agent/config.py -A 3

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

cat -n agent/src/agent/nodes/satisfaction_check.py | head -150

Repository: StavPonte11/text2sql-onboarding

Length of output: 6850


Add bounds validation for satisfaction thresholds and cache TTLs.

These env-driven values currently accept invalid ranges (e.g., negative or zero TTLs, semantic threshold outside [0,1], or inverted row bounds), which silently degrade gate decisions and cache behavior:

  • SATISFACTION_MIN_ROWS ≤ 0 causes the plausibility check to never fail
  • SATISFACTION_MAX_ROWS ≤ 0 or < MIN_ROWS inverts the check logic
  • SATISFACTION_SEMANTIC_THRESHOLD outside [0,1] makes the check always pass or always fail
  • SATISFACTION_MAX_FAILURES ≤ 0 breaks escalation logic
  • SCHEMA_CACHE_TTL and PROFILE_CACHE_TTL ≤ 0 fail on Redis setex operations

Apply Pydantic Field constraints (already used elsewhere in this config class):

Proposed fix
-    SATISFACTION_MIN_ROWS: int = 1
-    SATISFACTION_MAX_ROWS: int = 50_000
-    SATISFACTION_SEMANTIC_THRESHOLD: float = 0.75
-    SATISFACTION_MAX_FAILURES: int = 2  # escalate to HITL after this many check failures
+    SATISFACTION_MIN_ROWS: int = Field(default=1, ge=0)
+    SATISFACTION_MAX_ROWS: int = Field(default=50_000, ge=1)
+    SATISFACTION_SEMANTIC_THRESHOLD: float = Field(default=0.75, ge=0.0, le=1.0)
+    SATISFACTION_MAX_FAILURES: int = Field(default=2, ge=1)  # escalate to HITL after this many check failures
@@
-    SCHEMA_CACHE_TTL: int = 600    # seconds — DDL content
-    PROFILE_CACHE_TTL: int = 1800  # seconds — table profile statistics
+    SCHEMA_CACHE_TTL: int = Field(default=600, gt=0)    # seconds — DDL content
+    PROFILE_CACHE_TTL: int = Field(default=1800, gt=0)  # seconds — table profile statistics
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/config.py` around lines 51 - 58, Add Pydantic Field
constraints to validate the satisfaction and cache configuration values in the
config class. For SATISFACTION_MIN_ROWS and SATISFACTION_MAX_ROWS, ensure they
are positive integers with MIN_ROWS less than MAX_ROWS. For
SATISFACTION_SEMANTIC_THRESHOLD, constrain it to the range [0, 1]. For
SATISFACTION_MAX_FAILURES, ensure it is a positive integer. For SCHEMA_CACHE_TTL
and PROFILE_CACHE_TTL, ensure both are positive integers. Use Pydantic
validators like gt (greater than), ge (greater than or equal), le (less than or
equal), and Field constraints that match the patterns already established
elsewhere in this config class.

Comment on lines +72 to +76
columns = (
list(preview_rows[0].keys())
if preview_rows and isinstance(preview_rows[0], dict)
else []
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the structure of inline_result_rows from refiner
rg -n "inline_result_rows" -A3 -B3 agent/src/agent/nodes/refiner.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 569


🏁 Script executed:

cat -n agent/src/core/trino.py | head -100

Repository: StavPonte11/text2sql-onboarding

Length of output: 131


🏁 Script executed:

rg -n "def.*rows" agent/src/core/trino.py -A5

Repository: StavPonte11/text2sql-onboarding

Length of output: 139


🏁 Script executed:

rg -n "result = " agent/src/agent/nodes/refiner.py -B5 -A2

Repository: StavPonte11/text2sql-onboarding

Length of output: 640


🏁 Script executed:

find agent -name "*.py" -type f | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 724


🏁 Script executed:

rg -n "execute_query_sync" -A10

Repository: StavPonte11/text2sql-onboarding

Length of output: 11234


🏁 Script executed:

rg -n "class.*Result|def.*rows" --type py | grep -i trino

Repository: StavPonte11/text2sql-onboarding

Length of output: 140


🏁 Script executed:

cat -n core/src/core/trino.py | sed -n '17,100p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 3461


🏁 Script executed:

rg -n "rows.*=" core/src/core/trino.py -A2 -B2

Repository: StavPonte11/text2sql-onboarding

Length of output: 812


Column extraction fails: isinstance(preview_rows[0], dict) check will always be False for inline result rows.

inline_result_rows comes from result.rows, which is typed as list[list[Any]] in TrinoExecutionResult (core/src/core/trino.py:19) and populated as [list(r) for r in rows] (line 84). The code at lines 73–74 checks isinstance(preview_rows[0], dict), but each row is a list, not a dict. This means columns will always be [] when falling back to inline results, leaving the preview without column headers.

The test mock in agent/tests/conftest.py:119 incorrectly defines rows as dicts, masking this bug.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/finalizer.py` around lines 72 - 76, The isinstance
check for preview_rows[0] in the column extraction logic is checking if each row
is a dict, but the result.rows structure from TrinoExecutionResult contains
lists, not dicts. Fix the isinstance check to verify that preview_rows[0] is a
list instead of a dict. Additionally, update the test mock in conftest.py that
incorrectly defines rows as dicts to instead define rows as lists to match the
actual TrinoExecutionResult structure, which will ensure the column extraction
logic is properly tested with the correct data type.

Comment on lines 8 to +9
from agent.langfuse_client import langfuse_client
from agent.langfuse_client import langfuse_client

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Duplicate import statement.

langfuse_client is imported twice from the same module.

♻️ Remove duplicate import
 from agent.langfuse_client import langfuse_client
-from agent.langfuse_client import langfuse_client
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from agent.langfuse_client import langfuse_client
from agent.langfuse_client import langfuse_client
from agent.langfuse_client import langfuse_client
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/refiner.py` around lines 8 - 9, Remove the duplicate
import statement for langfuse_client from the agent.langfuse_client module in
the refiner.py file. Keep only a single import statement importing
langfuse_client from agent.langfuse_client and delete the redundant duplicate
line.

Comment on lines +43 to +48
rows = state.get("inline_result_rows") or []
columns: list[str] = []

# Attempt to derive column names from the first result row
if rows and isinstance(rows[0], dict):
columns = list(rows[0].keys())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Column extraction assumes dict rows but inline rows are lists.

Same issue as finalizer.py: the code assumes rows[0] is a dict (line 47-48), but inline_result_rows from refiner_node contains list rows per Trino's return format. The columns list will be empty, causing Check C and Check D to be skipped (since they guard on and columns).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/satisfaction_check.py` around lines 43 - 48, The column
extraction logic in the satisfaction_check function incorrectly assumes rows are
dictionaries when they are actually lists (as returned by Trino through
refiner_node). Instead of checking if rows[0] is a dict and calling .keys(), you
need to handle the case where rows are lists and extract column names from the
state's column metadata or adjust the logic to work with list-based rows. This
will ensure the columns list is properly populated so that Check C and Check D
are not skipped due to the empty columns guard condition in their conditional
statements.

Comment thread core/src/core/cache.py
Comment on lines +134 to +139
def get_cache_service(redis_url: str) -> CacheService:
"""Return a module-level singleton CacheService."""
global _cache_instance
if _cache_instance is None:
_cache_instance = CacheService(redis_url)
return _cache_instance

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not silently ignore redis_url after first singleton initialization.

At Line 134-139, get_cache_service(redis_url) always returns the first-created instance, even if a different redis_url is later supplied. That can silently route callers to the wrong Redis backend.

Suggested change
-_cache_instance: CacheService | None = None
+_cache_instances: dict[str, CacheService] = {}

 def get_cache_service(redis_url: str) -> CacheService:
-    """Return a module-level singleton CacheService."""
-    global _cache_instance
-    if _cache_instance is None:
-        _cache_instance = CacheService(redis_url)
-    return _cache_instance
+    """Return one CacheService singleton per Redis URL."""
+    if redis_url not in _cache_instances:
+        _cache_instances[redis_url] = CacheService(redis_url)
+    return _cache_instances[redis_url]
🧰 Tools
🪛 Ruff (0.15.17)

[warning] 136-136: Using the global statement to update _cache_instance is discouraged

(PLW0603)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/cache.py` around lines 134 - 139, The get_cache_service
function silently ignores mismatches between the provided redis_url argument and
the URL used to create the cached _cache_instance, which can route callers to an
incorrect Redis backend. Modify the function to detect when a different
redis_url is supplied than what the existing _cache_instance was initialized
with, and either raise an error to alert the caller of the mismatch or store the
redis_url and validate consistency on subsequent calls. This ensures that
unexpected URL changes are not silently ignored.

Comment thread core/src/core/config.py
Comment on lines +7 to +9
LANGFUSE_PUBLIC_KEY: str = ""
LANGFUSE_SECRET_KEY: str = ""
LANGFUSE_BASE_URL: str = "https://cloud.langfuse.com"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align Langfuse host setting with actual runtime wiring.

At Line 7-9, LANGFUSE_BASE_URL is added, but core/src/core/langfuse.py does not consume these settings when creating the handler. This creates a non-functional config surface and drifts from deployment env naming (docker-compose.yml Line 431 uses LANGFUSE_HOST).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/config.py` around lines 7 - 9, The LANGFUSE_BASE_URL
configuration setting added in the config.py file is not being consumed by the
Langfuse handler creation in core/src/core/langfuse.py, making the config
setting non-functional. Update the Langfuse handler instantiation to read and
use the LANGFUSE_BASE_URL setting from the configuration instead of relying on
hardcoded defaults. Additionally, ensure the configuration property name aligns
with the environment variable naming convention used in the deployment setup
(docker-compose.yml uses LANGFUSE_HOST), so either rename LANGFUSE_BASE_URL to
LANGFUSE_HOST or ensure both naming conventions are properly mapped during
configuration initialization.

Comment thread core/src/core/langfuse.py
Comment on lines +11 to +12
except Exception as e:
logger.error(f"Failed to initialize Langfuse CallbackHandler: {e}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd core && head -20 src/core/langfuse.py | cat -n

Repository: StavPonte11/text2sql-onboarding

Length of output: 596


🏁 Script executed:

cd core && sed -n '1,30p' src/core/langfuse.py | cat -n

Repository: StavPonte11/text2sql-onboarding

Length of output: 596


🏁 Script executed:

cd core && rg "logger\." src/core/langfuse.py -A 2 -B 2

Repository: StavPonte11/text2sql-onboarding

Length of output: 231


Log the full traceback when Langfuse handler initialization fails.

At line 12, only the exception message is logged. Using logger.error() with string formatting loses the full traceback context, making failures difficult to diagnose. Use logger.exception() instead, which automatically captures the complete stack trace.

Proposed fix
     except Exception as e:
-        logger.error(f"Failed to initialize Langfuse CallbackHandler: {e}")
+        logger.exception("Failed to initialize Langfuse CallbackHandler")
         return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
logger.error(f"Failed to initialize Langfuse CallbackHandler: {e}")
except Exception as e:
logger.exception("Failed to initialize Langfuse CallbackHandler")
🧰 Tools
🪛 Ruff (0.15.17)

[warning] 11-11: Do not catch blind exception: Exception

(BLE001)


[warning] 12-12: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


[warning] 12-12: Logging statement uses f-string

(G004)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/langfuse.py` around lines 11 - 12, The exception handler in the
Langfuse CallbackHandler initialization block currently uses logger.error() with
string formatting, which loses the full traceback context needed for debugging.
Replace the logger.error() call with logger.exception(), which automatically
captures and logs the complete stack trace along with the error message. This
change should be made in the except block that handles the Exception during
Langfuse handler initialization.

Source: Linters/SAST tools

Comment thread core/src/core/trino.py
Comment on lines +72 to +75
if params:
cur.execute(sql, params)
else:
cur.execute(sql)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use explicit None check for parameterized execution.

Line 72 uses if params:. Empty containers ([], ()) are valid according to the function's type hint (tuple | dict | list | None) but currently route to cur.execute(sql) instead of the parameterized path since empty sequences evaluate to False in Python.

Change to if params is not None: to ensure all non-None parameter values—including empty containers—are passed to the parameterized execute path.

Proposed fix
-        if params:
+        if params is not None:
             cur.execute(sql, params)
         else:
             cur.execute(sql)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if params:
cur.execute(sql, params)
else:
cur.execute(sql)
if params is not None:
cur.execute(sql, params)
else:
cur.execute(sql)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/trino.py` around lines 72 - 75, The conditional check `if
params:` at line 72 incorrectly treats empty containers (empty list, dict, or
tuple) as falsy values, causing them to bypass the parameterized execute path
even though they are valid parameter values according to the type hint. Replace
`if params:` with `if params is not None:` to explicitly check for None while
allowing empty containers to be passed to the parameterized cur.execute(sql,
params) call. This ensures all valid parameter types—including empty
sequences—are routed to the correct execution path.

Comment thread docker-compose.yml
secrets:
- deploy_key
# command: sh -c "python -m app.seed && python -m app.infra_init && uvicorn app.main:app --host 0.0.0.0 --port 8000"
command: sh -c "python -m app.seed && python -m app.infra_init && uvicorn app.main:app --host 0.0.0.0 --port 8000"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Decouple initialization jobs from API startup.

At Line 407, app.seed and app.infra_init are hard-coupled to backend boot. Any failure/latency in either job keeps uvicorn down, and every container restart re-runs initialization side effects. Move these to a one-shot init service/job (or guard them for strict idempotency) and keep backend command focused on serving traffic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker-compose.yml` at line 407, The backend service command at line 407
couples initialization jobs (app.seed and app.infra_init) directly to the
uvicorn startup, causing the API server to fail if these jobs fail and
re-running side effects on every container restart. Create a separate one-shot
initialization service in the docker-compose.yml that runs only the app.seed and
app.infra_init commands independently, then modify the backend service command
to run only the uvicorn server (uvicorn app.main:app --host 0.0.0.0 --port
8000). Use docker-compose depends_on or proper service ordering to ensure the
init service completes before the backend service attempts to start, keeping the
backend focused solely on serving traffic.

@elirazpevz elirazpevz changed the base branch from main to stav/critic-bug-fixes June 21, 2026 16:29
Comment thread agent/src/agent/config.py
LANGFUSE_PROMPT_REJECTION_ROUTER: str = "text2sql/rejection_router"

# ── G2-01: Table Scoping ──────────────────────────────────────────────────
TABLE_SCOPING_MODE: Literal["strict", "hybrid"] = "hybrid"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be on the request level and not on global level. so you can pass the configuration as argument when invoking the agent

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realized it acts as default - should be called default table scoping mode then

in graph.py inspects `satisfaction_failures` to decide the next node.
"""
# ── Global gate ───────────────────────────────────────────────────────────
if not settings.SATISFACTION_CHECK_ENABLED:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this happen in the router function?

Comment on lines +56 to +65
if settings.SATISFACTION_CHECK_PLAUSIBILITY:
n = len(rows)
if n < settings.SATISFACTION_MIN_ROWS:
failures.append(
f"[CHECK_B] Result returned {n} rows — below minimum {settings.SATISFACTION_MIN_ROWS}."
)
elif n > settings.SATISFACTION_MAX_ROWS:
failures.append(
f"[CHECK_B] Result returned {n} rows — exceeds maximum {settings.SATISFACTION_MAX_ROWS}."
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure this is an expected behavior?

Comment thread core/src/core/cache.py
logger = logging.getLogger(__name__)

# Fixed TTL for catalog pre-flight validation cache (not configurable)
CATALOG_VALID_TTL = 300

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to .env

Comment thread core/src/core/cache.py
Comment on lines +35 to +40
self._redis: aioredis.Redis = aioredis.from_url(
redis_url,
decode_responses=False, # return raw bytes so callers control decoding
socket_connect_timeout=2,
socket_timeout=2,
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the python-core-utils package, we have some redis stuff. we should use it for the connection, and its also worth considering wether to move some of the stuff here to there

Comment thread core/src/core/langfuse.py
return _langfuse_handler
"""FastAPI dependency to inject an isolated Langfuse CallbackHandler."""
try:
return CallbackHandler()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the env vars the default vars, allowing them to be removed?

_cache = get_cache_service(settings.REDIS_URL)

# G2-02 limits
MAX_SCHEMA_RETRIES = 3

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to .env


human_message = (
f"Question: {user_query}\n"
f"Query Enrichments (extra context for ambiguous terms): {json.dumps(enrichments)}"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

יש פה לא מעט דברים שנכנסים לפרומפט - אולי שווה לעשות עם פרומפט טמפלייט או לשמור חלק בלנגפיוז


class SemanticAnnotation(BaseModel):
table_column: str = Field(description="The full table_name.column_name identifier")
semantic_type: str = Field(description="Must be one of: id | timestamp | category | metric | text | geo | unknown")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum for data validation?

col_list.append(f"{tname}.{col['name']} ({col.get('type', '?')})")

prompt_text = (
"Classify each column with one of: id | timestamp | category | metric | text | geo | unknown.\n"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should run it once in the profiling instead of in each invocation?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relevant to most of the functions on this file

Comment thread agent/src/agent/graph.py
Comment on lines +77 to +78
The human then calls graph.update_state() to inject corrected state and
clears sql_query / last_error / trino_error / escalated / escalation_reason.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure that what happens?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also im not sure clearing the state is the right call

Comment thread agent/src/agent/graph.py

memory = MemorySaver()
agent_graph = workflow.compile(checkpointer=memory)
agent_graph = workflow.compile(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be really nice to add a graph of the graph to the readme (so I can also see visually what it looks like)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants