Skip to content

feat: observability, trace viewer, and HITL enhancements#19

Open
stavp-star wants to merge 15 commits into
stav/configurational-feature-flagfrom
stav/observability-and-ux
Open

feat: observability, trace viewer, and HITL enhancements#19
stavp-star wants to merge 15 commits into
stav/configurational-feature-flagfrom
stav/observability-and-ux

Conversation

@stavp-star

@stavp-star stavp-star commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Implements inline agent graph visualization (React Flow), trace viewer (Langfuse integration), structured HITL categories with suggested fixes, and comprehensive Playwright testing.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added feature-flag management with execution modes (backend admin APIs + frontend Flags page).
    • Added real-time agent execution streaming, plus a trace viewer for completed runs.
    • Added automated quality gates with human-in-the-loop escalation.
    • Added “suggested fixes” for interrupted/rejected queries.
  • Improvements

    • Enhanced schema exploration with caching, scoping options, and better validation/retry behavior.
    • Improved execution/debug visibility with per-step execution paths and SQL/explanation handling.
  • Tests

    • Expanded backend/frontend tests, including Playwright coverage for agent scenarios.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 788e32fb-0934-4ace-b612-26cd909eed9b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds backend-managed feature flags and execution modes, extends agent runtime routing and node state, and introduces frontend admin and trace views.

Changes

Runtime flags, agent flow, and UI

Layer / File(s) Summary
Workspace and runtime foundations
.gitignore, agent/Dockerfile, agent/pyproject.toml, backend/Dockerfile, backend/pyproject.toml, core/pyproject.toml, backend/app/infra_init.py, docker-compose.yml, frontend/.env, frontend/index.html, frontend/package.json, frontend/playwright.config.ts, backend/app/config.py, backend/app/main.py, backend/app/routers/__init__.py, backend/tests/conftest.py, backend/app/routers/{admin_approval.py,admin_auth.py,audit.py,enrichment.py,feedback.py,questions.py,scopes.py}, backend/app/services/{auth.py,join_detection.py}, frontend/src/index.css
Workspace dependency resolution moves to local python-core-utils, Docker and Compose remove SSH deploy-key setup, startup and infra initialization read more environment variables, and several import-order/UI asset settings are updated.
Backend flag platform
backend/alembic/versions/f9a3d1c8e205_add_config_schema_flags.py, backend/app/config.py, backend/app/main.py, backend/app/routers/flags.py, backend/app/routers/health.py, backend/app/services/flag_service.py, core/src/core/{cache.py,config.py,langfuse.py,models/models.py,trino.py,__init__.py}, backend/tests/conftest.py
The backend adds config schema tables and seeded flag data, Redis-backed flag and execution-mode services, /flags APIs, health checks, startup validation, and shared cache/model support.
Agent runtime plumbing
agent/src/agent/{config.py,state.py,llm.py}, agent/src/agent/utils/{esca.py,flag_bridge.py,jeen_client.py,redis_publisher.py,skill_registry.py,schema_enrichment.py,sql.py}, agent/src/agent/nodes/{init_flags.py,init_skills.py,extractor.py,query_builder.py,finalizer.py,refiner.py,satisfaction_check.py,schema_explorer.py}, agent/tests/{conftest.py,test_cache_and_gates.py,test_isolation.py,test_resilience.py,test_routing.py}
Agent settings, runtime state, per-node LLM selection, flag resolution, skill loading, Redis event publishing, ESCA client lifecycle, schema enrichment, and node plumbing are added together with supporting tests.
Graph gates and schema flow
agent/src/agent/graph.py, agent/src/agent/nodes/{schema_explorer.py,refiner.py,satisfaction_check.py}, agent/src/agent/utils/schema_enrichment.py, agent/tests/*
The graph adds config validation, retry and escalation routing, schema enrichment, Redis-backed profile loading, satisfaction checks, richer refinement state, and tests for routing, cache behavior, and resilience.
API, MCP, and observability paths
agent/src/agent/{mcp_server.py,routers/chat.py}, backend/app/routers/{agent.py,evaluation.py,orchestration.py,publish.py,profiling.py}, backend/app/services/{evaluator.py,langfuse_client.py,llm_judge.py,auth.py,join_detection.py,trino_client.py}
Chat and MCP payloads expand with execution mode and skill inputs, backend endpoints add SSE streaming, trace lookup, and fix suggestions, and tracing code shifts to client-based Langfuse updates.
Frontend admin and trace UI
frontend/src/{App.tsx,api/flags.ts,api/agent.ts,components/AgentGraph.tsx,components/TraceTimeline.tsx,components/JsonTreeView.tsx,components/SchemaPlanDisplay.tsx,components/SchemaPlanDisplay.module.css,components/layout/Sidebar.tsx,config/constants.ts,pages/FlagsPage.tsx,pages/FlagsPage.css,pages/AgentTestingPage.tsx,pages/AgentTestingPage.module.css,pages/LandingPage.tsx,tests/components.test.tsx}, frontend/package.json, frontend/playwright.config.ts, frontend/tests/{agent-testing.spec.ts,real-agent.spec.ts}
The frontend adds an admin /flags route and page for flags and execution modes, execution-path and trace viewers on agent testing, rejection-category suggestions, and Playwright coverage.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant AgentTestingPage
  participant BackendAgentRouter
  participant MCPServer
  participant AgentGraph
  participant Redis
  participant Langfuse

  User->>AgentTestingPage: submit chat
  AgentTestingPage->>BackendAgentRouter: POST /api/agent/chat
  BackendAgentRouter->>MCPServer: chat_with_agent
  MCPServer->>AgentGraph: ainvoke(state, config)
  AgentGraph->>Redis: publish node events
  AgentGraph->>Langfuse: write trace metadata
  BackendAgentRouter-->>AgentTestingPage: thread_id and trace_id
  AgentTestingPage->>BackendAgentRouter: GET /api/agent/stream/{thread_id}
  BackendAgentRouter->>Redis: subscribe agent_stream:{thread_id}
  Redis-->>AgentTestingPage: execution events
  AgentTestingPage->>BackendAgentRouter: GET /api/agent/traces/{trace_id}
  BackendAgentRouter->>Langfuse: fetch trace timeline
  Langfuse-->>AgentTestingPage: spans and observations
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐇 I hopped through flags in tidy rows,
And traced the graph where logic flows.
A Redis whisper, a modal bright,
New modes now flicker into sight.
With twitching ears, I cheer this spree—
More paths, more hops, more clarity.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 46.85% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: observability, trace viewer, and HITL enhancements' directly summarizes the main changes: observability additions, trace viewing capabilities, and HITL improvements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch stav/observability-and-ux

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 45

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
backend/app/services/llm_judge.py (1)

129-137: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing validation for OPENAI_API_KEY may cause unclear errors.

The previous code had defensive handling for missing API keys. Now, if settings.OPENAI_API_KEY is None or empty, the request will fail with an opaque httpx/OpenAI error rather than a clear configuration error. This makes debugging harder in misconfigured environments.

Proposed fix
     api_key = settings.OPENAI_API_KEY
+    if not api_key:
+        logger.warning("OPENAI_API_KEY not configured - LLM judge unavailable")
+        return JudgeOutput(
+            table_selection_correctness=0.0,
+            sql_semantic_equivalence=0.0,
+            result_correctness=0.0,
+            hallucination_detected=False,
+            failure_type="configuration_error",
+            reasoning={"error": "OPENAI_API_KEY not configured"},
+            confidence_in_judgment=0.0,
+        )

     try:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/llm_judge.py` around lines 129 - 137, Add validation for
the OPENAI_API_KEY immediately after retrieving it with api_key =
settings.OPENAI_API_KEY. Check if the api_key is None or empty, and if so, raise
a meaningful configuration error with a clear message before proceeding to the
httpx.Client request. This ensures users get a clear and immediate error about
missing configuration rather than an opaque httpx or OpenAI API error later.
backend/app/services/langfuse_client.py (1)

100-103: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Stale docstring references removed method.

The docstring at lines 100-103 mentions get_prompt_as_langchain(name) but this method was removed. The class docstring at lines 6-8 also references this method.

Proposed fix

Update the docstrings to remove references to the removed method:

   LangfuseTracer      — Concrete Langfuse connection; mirrors the main app's
-                        LangfuseTracer (implements Connection, adds get_prompt
-                        and get_prompt_as_langchain).
+                        LangfuseTracer (implements Connection, adds get_prompt).

And at lines 100-103:

     Additional methods (present in the main app's LangfuseTracer):
         get_prompt(name)                  — fetch a prompt by name.
-        get_prompt_as_langchain(name)     — fetch and convert to a LangChain prompt.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/langfuse_client.py` around lines 100 - 103, Remove the
stale docstring references to the removed `get_prompt_as_langchain(name)` method
from the LangfuseTracer class. Update both the class docstring at the beginning
of the class (lines 6-8) and the additional methods docstring section (lines
100-103) to remove any mention of `get_prompt_as_langchain(name)` since this
method no longer exists in the class implementation. Keep only the reference to
`get_prompt(name)` which is still present.
backend/app/routers/orchestration.py (1)

652-654: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Incorrect SQL filter using Python not instead of SQL comparison.

not EvaluationAlert.acknowledged performs Python boolean negation on a SQLAlchemy Column object, which always evaluates to False. This query will never return unacknowledged alerts.

Proposed fix
     unacked_alerts = session.exec(
-        select(EvaluationAlert).where(not EvaluationAlert.acknowledged)
+        select(EvaluationAlert).where(EvaluationAlert.acknowledged == False)
     ).all()

Or use the more explicit SQLAlchemy pattern:

select(EvaluationAlert).where(EvaluationAlert.acknowledged.is_(False))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/orchestration.py` around lines 652 - 654, The query in
the unacked_alerts assignment is using Python's not operator on the
EvaluationAlert.acknowledged SQLAlchemy Column object, which always evaluates to
False and returns no results. Replace the not operator with a proper SQLAlchemy
comparison operator by using the tilde operator (~) to negate the condition, or
alternatively use the .is_(False) method on the acknowledged column to
explicitly check for False values. This will ensure the query correctly filters
for unacknowledged alerts.
frontend/src/pages/AgentTestingPage.tsx (2)

63-69: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reset newly added HITL UI state in handleReset.

rejectionCategory, suggestedFixes, and traceModalVisible persist across resets, causing stale reject context in the next request.

Suggested fix
   const handleReset = () => {
     setQuery('');
     setThreadId(null);
     setChatResponse(null);
     setFeedback('');
+    setRejectionCategory(undefined);
+    setSuggestedFixes([]);
+    setLoadingFixes(false);
+    setTraceModalVisible(false);
     chatMutation.reset();
   };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/AgentTestingPage.tsx` around lines 63 - 69, The
handleReset function is missing reset calls for newly added HITL state variables
that are causing stale data to persist across requests. Add reset calls within
the handleReset function for the following state setters: setRejectionCategory,
setSuggestedFixes, and setTraceModalVisible, resetting them to their initial
values (null/empty string for rejectionCategory, empty array or null for
suggestedFixes, and false for traceModalVisible) to ensure a clean state for the
next request.

36-44: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clear previous threadId when starting a fresh run.

A new query currently keeps the old threadId until response returns, so execution graph/trace context can momentarily display the previous run.

Suggested fix
   const handleSubmit = () => {
     if (!query) return;
     setChatResponse(null);
+    setThreadId(null);
     chatMutation.mutate({
       query,
       hitl_enabled: hitlEnabled,
       allowed_statuses: allowedStatuses.length > 0 ? allowedStatuses : undefined,
     });
   };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/AgentTestingPage.tsx` around lines 36 - 44, The
handleSubmit function clears the chatResponse but does not clear the threadId
state, causing the UI to display the previous run's execution graph/trace
context until the new response arrives. Add a state reset for threadId (set it
to null or undefined) in the handleSubmit function right after or alongside the
existing setChatResponse(null) call to ensure a fresh context when a new query
is submitted.
agent/src/agent/nodes/schema_explorer.py (1)

213-264: 🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff

Synchronous database session in async tool.

get_table_profile is an async tool but uses synchronous Session(engine) for database operations. This blocks the event loop during DB I/O. Consider using AsyncSession with async_engine for better concurrency, especially since this tool is called concurrently for multiple tables.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/schema_explorer.py` around lines 213 - 264, The
get_table_profile function is async but uses synchronous Session(engine) which
blocks the event loop during database I/O operations. Replace the synchronous
Session(engine) with AsyncSession(async_engine) and convert the with statement
to async with. Update all session.get() and session.exec() calls to be awaited
since they will now return coroutines. This allows the function to properly
yield control to the event loop during database operations instead of blocking.
backend/app/services/evaluator.py (1)

145-151: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Malformed function call continuation.

The trace() call starting at line 146 has the opening parenthesis on 146 but parameters continue on 147-150 with inconsistent indentation. This is syntactically valid but confusing.

🔧 Proposed formatting fix
         if _lf_client.client and trace_id:
-            _lf_client.client.trace(id=trace_id,
-            input={
-                "query": question_obj.question,
-                "databases": [question_obj.table_id],
-            },
-        )
+            _lf_client.client.trace(
+                id=trace_id,
+                input={
+                    "query": question_obj.question,
+                    "databases": [question_obj.table_id],
+                },
+            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/evaluator.py` around lines 145 - 151, The trace() method
call on the _lf_client.client object has inconsistent indentation where the
opening parenthesis is on one line followed by parameters on subsequent lines
that are not properly aligned. Reformat the trace() method call so that all
parameters (id and input) are indented consistently and aligned to clearly show
they belong within the method call, improving readability and following Python
formatting conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent/src/agent/config.py`:
- Around line 61-68: Add validation constraints to the config settings in
agent/src/agent/config.py to enforce valid ranges at load time. Specifically,
add validators to ensure SATISFACTION_SEMANTIC_THRESHOLD is between 0.0 and 1.0
(inclusive), that SATISFACTION_MIN_ROWS is less than or equal to
SATISFACTION_MAX_ROWS, that both SATISFACTION_MAX_ROWS and SATISFACTION_MIN_ROWS
are positive integers, that SCHEMA_CACHE_TTL and PROFILE_CACHE_TTL are greater
than 0 (to prevent caching from being disabled), and that
SATISFACTION_MAX_FAILURES is a positive integer. These constraints should be
enforced at configuration load time to prevent invalid values from being
accepted.

In `@agent/src/agent/graph.py`:
- Around line 128-134: The early return statements in the category checking
block (when category equals "Wrong Tables", "Wrong Logic", or when feedback is
not provided) only set the feedback_route field, while the main return statement
at lines 150-157 includes additional state reset fields like sql_query,
schema_plan, raw_data_ref, trino_error, and execution_path. To fix this
inconsistency, modify all three early return statements to include the same
complete set of state reset fields that are present in the main return
statement, ensuring that state cleanup occurs regardless of which code path is
taken.

In `@agent/src/agent/llm.py`:
- Around line 52-55: The temperature parsing logic in the section where
temperature is set from flags can crash if the flag value is not a valid float
string (e.g., "low"). Wrap the float() conversion in a try-except block to catch
ValueError exceptions and safely fall back to 0.0 when the temperature value
cannot be parsed, ensuring node initialization completes without crashing on
malformed flag values.

In `@agent/src/agent/mcp_server.py`:
- Around line 141-179: The suggest_fixes function decorated with `@mcp.tool`() is
currently defined after the if __name__ == "__main__":
mcp.run(transport="stdio") block, which means the tool will never be registered
because mcp.run() blocks execution. Move the entire suggest_fixes function
definition, including its `@mcp.tool`() decorator and all its implementation, to
before the if __name__ == "__main__": block to ensure the tool is properly
registered with the MCP server before mcp.run() is called.
- Around line 174-179: The `suggest_fixes` async function is blocking the event
loop by calling `structured_llm.invoke(prompt)` synchronously instead of
awaiting an async version. Replace `structured_llm.invoke(prompt)` with `await
structured_llm.ainvoke(prompt)`. Additionally, the code assumes `res` is always
a Pydantic model with a `.suggestions` attribute, but it could be a dict
depending on the LLM integration. Update the response handling to check whether
`res` is a dict using a conditional check (like `isinstance(res, dict)`) and
access the suggestions accordingly using either `res.get("suggestions")` for
dicts or `res.suggestions` for Pydantic models.

In `@agent/src/agent/nodes/init_skills.py`:
- Around line 25-29: The runtime-flag coercion for skills_enabled, hot_reload,
and cache_ttl variables uses unsafe conversions that can crash on malformed
input (e.g., bool() incorrectly handles string "False" as truthy, int() can
raise ValueError on non-numeric strings). Wrap the conversion logic in a
try-except block to handle potential errors gracefully, and implement proper
string-to-boolean conversion that correctly interprets string representations
like "false"/"true" before applying bool() conversion. Provide fallback default
values when coercion fails to prevent node initialization failures.

In `@agent/src/agent/nodes/query_builder.py`:
- Around line 1-5: The SQL normalization logic in this file is manually
stripping markdown fences and semicolons instead of using the shared clean_sql()
utility function, which creates inconsistency across the codebase. Replace the
manual SQL string manipulation (the lines that call replace on markdown fences
and check for trailing semicolons) with a call to clean_sql() passing the
response content, and ensure clean_sql is imported from the appropriate
utilities module.

In `@agent/src/agent/nodes/refiner.py`:
- Around line 123-130: The success return path at lines 123-130 is missing the
execution_path field that is being populated and tracked throughout the
function, particularly in the failure path. Add execution_path to the return
dictionary in the success case to maintain consistency with execution tracing.
Include the execution_path variable in the returned dictionary alongside the
other fields like trino_error, last_error, raw_data_ref, esca_write_failed,
inline_result_rows, and error_history.
- Line 57: The asyncio.create_task() call for publish_node_event is not storing
the returned task reference, which means it can be garbage-collected before
completion and any exceptions will be silently lost. Store the returned task in
a variable or add it to a collection of background tasks to keep the reference
alive, or alternatively add a comment documenting that this is intentionally
fire-and-forget behavior since the observability event is non-critical.

In `@agent/src/agent/nodes/satisfaction_check.py`:
- Around line 98-100: The exception handlers for Check C and Check D (at the
noted line ranges) currently only log warnings and allow execution to continue,
which can result in satisfaction_failures=None and incorrect routing to the
finalizer node. Instead of just logging the warning in the except blocks, record
these as gate failures by setting satisfaction_failures to indicate the check
failed, ensuring that when enabled checks throw exceptions, they are treated as
failures rather than silently continuing.
- Around line 43-46: The asyncio.create_task() call in the satisfaction_check
node is unguarded and can produce unhandled task warnings, and the
publish_node_event function is being called even when thread_id is empty. Guard
the fire-and-forget task by wrapping it with proper exception handling to catch
any errors from publish_node_event, and add a conditional check to only call
asyncio.create_task() when thread_id is not empty to avoid unnecessary noise
from publishing with an empty thread_id.

In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 361-378: The cache_hit_count and cache_miss_count counters are
being incremented with non-atomic += operations inside the fetch_profile async
function while executing concurrently. To fix this, introduce an asyncio.Lock to
protect these counter updates. Create the lock at the appropriate scope level,
then acquire it with async with statement before performing the cache_hit_count
+= 1 and cache_miss_count += 1 operations to ensure atomic updates across
concurrent coroutines.
- Around line 575-581: The SQL query in the schema explorer node uses f-string
interpolation for the catalog identifier while using parameterized queries for
schema and table names, creating a potential security vulnerability. Refactor
the SQL query that selects from information_schema.tables to use fully
parameterized queries for all identifiers (catalog, schema, and table names)
instead of f-string interpolation for the catalog. If Trino's information_schema
does not support parameterized catalog names, add a code comment explaining this
limitation and why the current IDENT_RE validation approach is necessary for
security.
- Around line 316-318: The asyncio.create_task() call for publish_node_event
lacks exception handling, causing exceptions to be silently lost and potentially
triggering "Task exception was never retrieved" warnings. Add a done callback to
the task returned by asyncio.create_task() to properly handle any exceptions
that occur during execution, or alternatively refactor to use asyncio.TaskGroup
which provides built-in lifecycle management and exception handling for
concurrent tasks.
- Around line 597-600: The exception handler for Redis/Trino table verification
in schema_explorer.py is too aggressive — it adds all tables_used to the
hallucinated list whenever any exception occurs, which marks valid tables as
hallucinated during infrastructure failures. Instead of unconditionally
extending hallucinated with all tables_used when an exception is caught,
differentiate between verification failures and infrastructure failures: for
transient/infrastructure errors (like network issues), skip the hallucination
marking and just log the error without modifying the hallucinated list,
reserving the hallucination logic only for actual verification results that
indicate false tables.

In `@agent/src/agent/state.py`:
- Line 1: Tighten the type annotation for the scoping_mode field in the
TypedDict from str | None to use a Literal type that specifies only the actual
allowed values (such as "strict" and "hybrid") along with None. This change will
enforce the valid mode strings at the type level and prevent invalid mode values
from being propagated through the state contract.

In `@agent/src/agent/utils/redis_publisher.py`:
- Around line 9-14: The sync Redis client created in the get_sync_redis()
function lacks explicit network timeouts, causing potential stalls in
latency-sensitive graph node handlers. Modify the redis.from_url() call on line
13 to include explicit socket_connect_timeout and socket_timeout parameters with
appropriate short timeout values suitable for synchronous operations, ensuring
the blocking publish() call and downstream node handler invocations remain
responsive during network degradation.

In `@agent/src/agent/utils/schema_enrichment.py`:
- Around line 184-217: The synchronous Session created in the try block is held
open during potentially long-running operations in an async function, which can
cause concurrency issues. Refactor the code to limit the session scope to only
the database query operations (the select queries for ForeignKeyMapping,
CrossTableProfile, and Table to build id_to_name), then close the session
immediately after these queries complete. Move any subsequent long-running graph
computations outside of the session's lifecycle by extracting the queries into
their own scope or by closing the session before returning the collected data
for further processing.
- Around line 259-277: The BFS fallback path construction produces dictionaries
with only "from", "to", and "path" keys, while the networkx branch includes an
additional "joins" key with edge labels, causing inconsistent output structure.
To fix this, add an empty "joins" list to each path dictionary when appending to
the paths list in the fallback block. Modify the append statement in the nested
loop where _bfs_shortest_path is called to include "joins": [] alongside the
existing "from", "to", and "path" keys in the dictionary.

In `@agent/src/agent/utils/skill_registry.py`:
- Around line 38-46: The exception handler in the try-except block at lines
38-46 causes duplicate entries when a Redis error occurs after some skills have
already been successfully appended to loaded_skills. When an exception happens,
setting missing_ids to skill_ids means all skills will be refetched from Jeen as
a fallback, including those already loaded from cache. To fix this, either clear
loaded_skills before the fallback assignment to missing_ids, or modify the
fallback to only add to missing_ids the skill IDs that were not already
successfully loaded and appended to loaded_skills to prevent re-fetching cached
entries.

In `@agent/tests/test_isolation.py`:
- Around line 39-52: The test has two issues that prevent it from properly
validating the intended behavior. First, the patch target for evaluate_with_llm
is incorrect because after importing the function directly with `from core.llm
import evaluate_with_llm`, patching `core.llm.evaluate_with_llm` has no effect
on the local reference. Change the patch target to patch the function where it's
being called from, which is in the test module itself. Second, the silent `pass`
statement in the ImportError exception handler means the test will silently pass
without testing anything if the import fails, providing false confidence.
Replace the `pass` with `pytest.skip()` to properly mark the test as skipped
when the required module is unavailable.

In `@backend/app/infra_init.py`:
- Around line 28-30: The _MINIO_HOST environment variable defaults to
localhost:9000, which fails in Docker Compose because localhost resolves to the
backend container itself rather than the MinIO service. Change the default value
in the os.getenv call for _MINIO_HOST from localhost:9000 to minio:9000 (or the
appropriate MinIO service hostname used in your Docker Compose setup) so that
the backend can properly resolve and connect to the MinIO service during
container startup.
- Line 27: The import os statement is positioned after executable or module
setup code on line 27, which violates Ruff's E402 rule requiring all
module-level imports to be at the top of the file. Move the import os statement
to the top import block at the beginning of the file with the other import
statements to comply with the linting requirements.

In `@backend/app/main.py`:
- Around line 44-46: The OPENAI_API_KEY validation in the settings check is
raising a RuntimeError that prevents the entire application from starting.
Instead of raising an error that blocks all functionality, replace this with a
warning log message that allows the application to continue with degraded
functionality for features that don't require the LLM judge. Alternatively,
introduce a configuration flag or environment variable that controls whether
this missing key should be a hard failure or a warning, allowing flexibility for
different deployment scenarios where the LLM judge may not be required.

In `@backend/app/routers/agent.py`:
- Around line 232-245: The suggest_fixes async function references an undefined
helper function _get_mcp_client() which will cause a NameError at runtime.
Either define the _get_mcp_client() function that returns an MCP client session
context manager, or refactor the suggest_fixes endpoint to use the existing
_call_agent_mcp helper pattern that is already available in this file to
maintain consistency and avoid code duplication.
- Around line 163-184: The stream_agent_execution endpoint's event_generator
function has no timeout on the pubsub.listen() async iterator, which can cause
indefinite hangs if the agent completes without a termination message or if the
client disconnects ungracefully. Wrap the pubsub.listen() iteration with
asyncio.timeout (or asyncio.wait_for for older Python versions) to add a
timeout, and implement periodic keepalive messages (such as comment lines or
heartbeat events) sent at regular intervals to detect client disconnections and
prevent resource leaks. Handle timeout exceptions gracefully by breaking out of
the loop and ensuring cleanup occurs in the finally block.

In `@backend/app/routers/evaluation.py`:
- Around line 185-188: The langfuse_client.client.trace() method call has
inconsistent formatting with parameters awkwardly split across lines. Reformat
the trace() call so that all parameters are properly aligned and indented on
their own lines, with the closing parenthesis appropriately placed. Ensure the
id parameter and output parameter are consistently formatted with proper
indentation for readability.
- Around line 113-117: The trace() method call on langfuse_client.client has
misaligned formatting across multiple lines making it difficult to read.
Reformat the entire trace() call so that the opening parenthesis, all parameters
(id, metadata, tags), and closing parenthesis are properly aligned and clearly
show the complete function call. Ensure each parameter is on its own line or
grouped logically with consistent indentation so the multi-line call is
unambiguous and easier to maintain.

In `@backend/app/routers/flags.py`:
- Around line 85-90: The current implementation calls svc.list_all() to fetch
all flags and then iterates through them with next() to find a match by name,
which is inefficient since name is the primary key. Replace the list_all() call
and iteration pattern with a direct lookup: either use an existing direct lookup
method on FlagService (such as svc.get(name)) if available, or add a new
get(name) method to the FlagService class that performs a direct database query
by name, then call that method instead of the inefficient list and iterate
approach.

In `@backend/app/routers/health.py`:
- Around line 208-209: Remove the hardcoded string comparisons (api_key ==
"dummy" and base_url == "http://localhost:8000") from the condition in the
health check endpoint that returns the config_error status. Instead, validate
the configuration by checking if api_key and base_url are empty, None, or falsy
values. If you need to detect when ESCA is intentionally disabled in certain
environments, introduce an explicit ESCA_ENABLED configuration flag and use that
to determine whether to skip or fail the ESCA health check validation.

In `@backend/app/services/evaluator.py`:
- Around line 162-163: The trace method call on _lf_client.client has a
malformed call pattern with incorrect parameter usage. In the trace() method
invocation where it passes id=trace_id and output={"response": generated_sql},
verify the correct parameter names and structure expected by the trace method.
Fix the method call by using the correct parameter names and ensure proper
formatting with spaces after commas (PEP8 compliance) in the
_lf_client.client.trace() call on line 163.

In `@backend/app/services/flag_service.py`:
- Around line 200-213: The docstring for the delete method in FlagServiceBase
incorrectly states that the DB row is deleted, but the actual implementation
sets flag.value to None to preserve metadata. Update the docstring to accurately
describe the behavior by replacing the statement about deleting the DB row with
language that explains the method clears the flag value rather than deleting the
row, and ensure the docstring aligns with the inline comment explaining why the
row is preserved.
- Around line 238-245: The flag_overrides attribute is being accessed on the
mode object after the Session context manager closes, causing a detached
instance issue. Move the line that accesses mode.flag_overrides inside the with
Session block, before the session is closed. The _try_cache_set call and return
statement should remain outside the with block, but the attribute access on mode
must happen while the session is still active.
- Around line 162-167: The flag_map dictionary is being constructed outside the
Session context manager, which means it's accessing attributes on detached
SQLModel instances after the session has closed. Move the flag_map dictionary
comprehension line (where you build the dictionary from f.name and f.value)
inside the with Session(engine) context block, right after the flags are fetched
with session.exec(select(FeatureFlag)).all(). This ensures all attribute access
on the FeatureFlag instances happens while the session is still active and
prevents potential issues with lazy-loaded attributes being accessed on detached
instances.
- Around line 89-124: Replace the fragile async-to-sync event loop pattern in
the methods _try_cache_get, _try_cache_set, and _invalidate by switching from
redis.asyncio to a synchronous redis.Redis client instance. Remove all
asyncio.get_event_loop() and asyncio.new_event_loop() calls from these three
methods and replace them with direct synchronous method calls on the Redis
client (get, setex, delete). Additionally, replace all occurrences of
datetime.utcnow() throughout the file with datetime.now(timezone.utc) to address
the Python 3.12+ deprecation warning, and ensure you import timezone from the
datetime module.

In `@backend/pyproject.toml`:
- Line 13: The langfuse dependency in the dependencies list is specified as
`langfuse>=2.0.0`, which is too permissive and allows untested versions from 2.0
through 2.55.x. Replace this with a more restrictive version constraint that
pins to the tested baseline version (2.56.2) while still allowing patch-level
updates. Use a version specifier that restricts the range to 2.56.2 and higher
but excludes the next major version.

In `@core/src/core/cache.py`:
- Around line 131-139: The get_cache_service function accepts a redis_url
parameter but ignores it after the first initialization of _cache_instance,
which can route subsequent calls with different URLs to the wrong Redis
endpoint. Fix this by either removing the unused redis_url parameter if a single
global endpoint is intended, or by comparing the provided redis_url against a
stored value and reinitializing _cache_instance if the URL has changed.

In `@core/src/core/langfuse.py`:
- Around line 11-12: In the exception handler for Langfuse CallbackHandler
initialization (the except block at lines 11-12), replace the logger.error()
call with logger.exception(). The logger.exception() method automatically
captures and includes the full stack trace in the log output, whereas
logger.error() only logs the message string. Change the call from
logger.error(f"Failed to initialize Langfuse CallbackHandler: {e}") to
logger.exception(f"Failed to initialize Langfuse CallbackHandler: {e}") or
simply logger.exception("Failed to initialize Langfuse CallbackHandler") since
exception() automatically includes the error details.

In `@core/src/core/trino.py`:
- Around line 72-75: The condition checking if params is provided uses
truthiness (if params:), which incorrectly skips the parameterized execute path
when params is an empty tuple, list, or dict. Replace the truthiness check with
an explicit None check (if params is not None:) in the conditional statement
before the cur.execute calls to ensure parameterized execution is used for all
non-None params, including empty collections.

In `@frontend/src/api/agent.ts`:
- Around line 39-41: The suggestFixes function in the api/agent.ts file trusts
the response shape without validation, which could cause issues if the backend
returns data that isn't an array of strings. Add validation logic after
receiving the response to check that response.data is actually an array where
all elements are strings, and handle the failure case appropriately by either
throwing an error with a descriptive message or returning a safe default value
before returning the data to the UI consumer.

In `@frontend/src/api/flags.ts`:
- Around line 53-75: The `name` parameter is being interpolated directly into
URL paths in the update, reset, getMode, upsertMode, and deleteMode methods
without encoding, which allows reserved characters like forward slashes or query
operators to break the routing. Wrap each instance of the `name` parameter with
encodeURIComponent() before inserting it into the path strings (e.g.,
`/flags/${encodeURIComponent(name)}` and
`/flags/modes/${encodeURIComponent(name)}`).

In `@frontend/src/components/AgentGraph.tsx`:
- Around line 53-60: The setPath([]) call in the useEffect is nested inside a
condition that only executes when threadId is falsy, so it doesn't reset the
path state when threadId changes to a different non-empty value. This causes
execution steps from different threads to mix in the graph. Move the setPath([])
call to the very beginning of the useEffect, before the if condition that checks
for !threadId, so that the path is always cleared whenever threadId changes
regardless of whether it's becoming empty or switching to a different thread ID.
Keep the early return after setPath([]) for when threadId is falsy, and then
proceed with setting up the new EventSource.

In `@frontend/src/components/TraceTimeline.tsx`:
- Around line 29-43: The useEffect hook in TraceTimeline has two issues: first,
it doesn't handle race conditions where older requests complete after newer
ones, and second, it fails to reset state when traceId becomes falsy or changes.
Fix this by implementing a cleanup mechanism using a flag (like an isMounted or
isActive variable) that gets set to false in the cleanup function returned by
useEffect. Inside the then and catch handlers, only update state (setSpans and
setLoading) if this flag is true. Additionally, add a state reset at the
beginning of the effect when traceId changes—call setSpans([]) and
setLoading(false) to clear stale data before making the new request. This
ensures that cancelled or superseded requests cannot update the component's
state and that state is properly cleaned up when the trace changes.

In `@frontend/src/pages/AgentTestingPage.tsx`:
- Around line 242-249: The promise chain for agentApi.suggestFixes(threadId,
val) is missing explicit error handling. Add a .catch() handler after the
.then(setSuggestedFixes) to handle potential API failures. In the catch block,
handle the error appropriately by clearing the suggestions state
(setSuggestedFixes), logging the error, and optionally providing user feedback
about the failure. The .finally() block should remain to properly reset the
setLoadingFixes state regardless of success or failure.

In `@frontend/src/pages/FlagsPage.tsx`:
- Around line 422-436: The submitMode function does not validate that
modeDraft.name is not empty or whitespace before calling
upsertModeMutation.mutate, which can result in invalid API requests to the
backend. Add validation before the mutation call to check if modeDraft.name is
empty or contains only whitespace characters, and only proceed with the mutation
if the name is valid. Additionally, apply the same validation fix to the other
similar submission function mentioned at lines 561-567 to ensure consistency
across all mode submission flows.

---

Outside diff comments:
In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 213-264: The get_table_profile function is async but uses
synchronous Session(engine) which blocks the event loop during database I/O
operations. Replace the synchronous Session(engine) with
AsyncSession(async_engine) and convert the with statement to async with. Update
all session.get() and session.exec() calls to be awaited since they will now
return coroutines. This allows the function to properly yield control to the
event loop during database operations instead of blocking.

In `@backend/app/routers/orchestration.py`:
- Around line 652-654: The query in the unacked_alerts assignment is using
Python's not operator on the EvaluationAlert.acknowledged SQLAlchemy Column
object, which always evaluates to False and returns no results. Replace the not
operator with a proper SQLAlchemy comparison operator by using the tilde
operator (~) to negate the condition, or alternatively use the .is_(False)
method on the acknowledged column to explicitly check for False values. This
will ensure the query correctly filters for unacknowledged alerts.

In `@backend/app/services/evaluator.py`:
- Around line 145-151: The trace() method call on the _lf_client.client object
has inconsistent indentation where the opening parenthesis is on one line
followed by parameters on subsequent lines that are not properly aligned.
Reformat the trace() method call so that all parameters (id and input) are
indented consistently and aligned to clearly show they belong within the method
call, improving readability and following Python formatting conventions.

In `@backend/app/services/langfuse_client.py`:
- Around line 100-103: Remove the stale docstring references to the removed
`get_prompt_as_langchain(name)` method from the LangfuseTracer class. Update
both the class docstring at the beginning of the class (lines 6-8) and the
additional methods docstring section (lines 100-103) to remove any mention of
`get_prompt_as_langchain(name)` since this method no longer exists in the class
implementation. Keep only the reference to `get_prompt(name)` which is still
present.

In `@backend/app/services/llm_judge.py`:
- Around line 129-137: Add validation for the OPENAI_API_KEY immediately after
retrieving it with api_key = settings.OPENAI_API_KEY. Check if the api_key is
None or empty, and if so, raise a meaningful configuration error with a clear
message before proceeding to the httpx.Client request. This ensures users get a
clear and immediate error about missing configuration rather than an opaque
httpx or OpenAI API error later.

In `@frontend/src/pages/AgentTestingPage.tsx`:
- Around line 63-69: The handleReset function is missing reset calls for newly
added HITL state variables that are causing stale data to persist across
requests. Add reset calls within the handleReset function for the following
state setters: setRejectionCategory, setSuggestedFixes, and
setTraceModalVisible, resetting them to their initial values (null/empty string
for rejectionCategory, empty array or null for suggestedFixes, and false for
traceModalVisible) to ensure a clean state for the next request.
- Around line 36-44: The handleSubmit function clears the chatResponse but does
not clear the threadId state, causing the UI to display the previous run's
execution graph/trace context until the new response arrives. Add a state reset
for threadId (set it to null or undefined) in the handleSubmit function right
after or alongside the existing setChatResponse(null) call to ensure a fresh
context when a new query is submitted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 9265d5fd-147d-4b3d-ab56-08466dd9d51d

📥 Commits

Reviewing files that changed from the base of the PR and between 3a80177 and 2bf1999.

⛔ Files ignored due to path filters (3)
  • agent/uv.lock is excluded by !**/*.lock
  • backend/uv.lock is excluded by !**/*.lock
  • frontend/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (83)
  • .gitignore
  • agent/.coverage
  • agent/Dockerfile
  • agent/pyproject.toml
  • agent/src/agent/config.py
  • agent/src/agent/graph.py
  • agent/src/agent/llm.py
  • agent/src/agent/mcp_server.py
  • agent/src/agent/nodes/extractor.py
  • agent/src/agent/nodes/finalizer.py
  • agent/src/agent/nodes/init_flags.py
  • agent/src/agent/nodes/init_skills.py
  • agent/src/agent/nodes/query_builder.py
  • agent/src/agent/nodes/refiner.py
  • agent/src/agent/nodes/satisfaction_check.py
  • agent/src/agent/nodes/schema_explorer.py
  • agent/src/agent/routers/chat.py
  • agent/src/agent/state.py
  • agent/src/agent/utils/esca.py
  • agent/src/agent/utils/flag_bridge.py
  • agent/src/agent/utils/jeen_client.py
  • agent/src/agent/utils/redis_publisher.py
  • agent/src/agent/utils/schema_enrichment.py
  • agent/src/agent/utils/skill_registry.py
  • agent/src/agent/utils/sql.py
  • agent/tests/conftest.py
  • agent/tests/test_cache_and_gates.py
  • agent/tests/test_isolation.py
  • agent/tests/test_resilience.py
  • agent/tests/test_routing.py
  • backend/Dockerfile
  • backend/alembic/versions/f9a3d1c8e205_add_config_schema_flags.py
  • backend/app/config.py
  • backend/app/infra_init.py
  • backend/app/main.py
  • backend/app/routers/__init__.py
  • backend/app/routers/admin_approval.py
  • backend/app/routers/admin_auth.py
  • backend/app/routers/agent.py
  • backend/app/routers/audit.py
  • backend/app/routers/enrichment.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/feedback.py
  • backend/app/routers/flags.py
  • backend/app/routers/health.py
  • backend/app/routers/orchestration.py
  • backend/app/routers/publish.py
  • backend/app/routers/questions.py
  • backend/app/routers/scopes.py
  • backend/app/services/auth.py
  • backend/app/services/evaluator.py
  • backend/app/services/flag_service.py
  • backend/app/services/join_detection.py
  • backend/app/services/langfuse_client.py
  • backend/app/services/llm_judge.py
  • backend/app/services/profiling_engine.py
  • backend/app/services/scoring.py
  • backend/app/services/trino_client.py
  • backend/pyproject.toml
  • backend/tests/conftest.py
  • core/pyproject.toml
  • core/src/core/__init__.py
  • core/src/core/cache.py
  • core/src/core/config.py
  • core/src/core/langfuse.py
  • core/src/core/models/models.py
  • core/src/core/trino.py
  • docker-compose.yml
  • frontend/.env
  • frontend/index.html
  • frontend/package.json
  • frontend/playwright.config.ts
  • frontend/src/App.tsx
  • frontend/src/api/agent.ts
  • frontend/src/api/flags.ts
  • frontend/src/components/AgentGraph.tsx
  • frontend/src/components/TraceTimeline.tsx
  • frontend/src/components/layout/Sidebar.tsx
  • frontend/src/config/constants.ts
  • frontend/src/pages/AgentTestingPage.tsx
  • frontend/src/pages/FlagsPage.css
  • frontend/src/pages/FlagsPage.tsx
  • frontend/tests/agent-testing.spec.ts
💤 Files with no reviewable changes (2)
  • backend/app/services/trino_client.py
  • backend/app/services/scoring.py

Comment thread agent/src/agent/config.py
Comment on lines +61 to +68
SATISFACTION_MIN_ROWS: int = 1
SATISFACTION_MAX_ROWS: int = 50_000
SATISFACTION_SEMANTIC_THRESHOLD: float = 0.75
SATISFACTION_MAX_FAILURES: int = 2 # escalate to HITL after this many check failures

# ── G2-05: Redis Schema Cache ─────────────────────────────────────────────
SCHEMA_CACHE_TTL: int = 600 # seconds — DDL content
PROFILE_CACHE_TTL: int = 1800 # seconds — table profile statistics

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constrain satisfaction and cache numeric settings at config load time.

Line 61 through Line 68 currently accept invalid values (e.g., SATISFACTION_SEMANTIC_THRESHOLD=1.5, SCHEMA_CACHE_TTL=0, SATISFACTION_MAX_ROWS < SATISFACTION_MIN_ROWS), which can produce incorrect gating or effectively disable caching.

Suggested fix
-from pydantic import Field
+from pydantic import Field, model_validator
@@
-    SATISFACTION_MIN_ROWS: int = 1
-    SATISFACTION_MAX_ROWS: int = 50_000
-    SATISFACTION_SEMANTIC_THRESHOLD: float = 0.75
+    SATISFACTION_MIN_ROWS: int = Field(default=1, ge=0)
+    SATISFACTION_MAX_ROWS: int = Field(default=50_000, gt=0)
+    SATISFACTION_SEMANTIC_THRESHOLD: float = Field(default=0.75, ge=0.0, le=1.0)
@@
-    SCHEMA_CACHE_TTL: int = 600    # seconds — DDL content
-    PROFILE_CACHE_TTL: int = 1800  # seconds — table profile statistics
+    SCHEMA_CACHE_TTL: int = Field(default=600, gt=0)    # seconds — DDL content
+    PROFILE_CACHE_TTL: int = Field(default=1800, gt=0)  # seconds — table profile statistics
+
+    `@model_validator`(mode="after")
+    def _validate_satisfaction_bounds(self) -> "AgentSettings":
+        if self.SATISFACTION_MAX_ROWS < self.SATISFACTION_MIN_ROWS:
+            raise ValueError("SATISFACTION_MAX_ROWS must be >= SATISFACTION_MIN_ROWS")
+        return self
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/config.py` around lines 61 - 68, Add validation constraints
to the config settings in agent/src/agent/config.py to enforce valid ranges at
load time. Specifically, add validators to ensure
SATISFACTION_SEMANTIC_THRESHOLD is between 0.0 and 1.0 (inclusive), that
SATISFACTION_MIN_ROWS is less than or equal to SATISFACTION_MAX_ROWS, that both
SATISFACTION_MAX_ROWS and SATISFACTION_MIN_ROWS are positive integers, that
SCHEMA_CACHE_TTL and PROFILE_CACHE_TTL are greater than 0 (to prevent caching
from being disabled), and that SATISFACTION_MAX_FAILURES is a positive integer.
These constraints should be enforced at configuration load time to prevent
invalid values from being accepted.

Comment thread agent/src/agent/graph.py
Comment on lines +128 to 134
if category == "Wrong Tables":
return {"feedback_route": "schema_explorer"}
elif category == "Wrong Logic":
return {"feedback_route": "query_builder"}

if not feedback:
return {"feedback_route": "query_builder"}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Inconsistent return structure across code paths.

Early returns at lines 129, 131, and 134 only set feedback_route, while the main return at lines 150-157 includes additional state reset fields (sql_query, schema_plan, raw_data_ref, trino_error, execution_path). This inconsistency means category-based routing bypasses state cleanup, potentially leaving stale data in agent state.

Proposed fix to ensure consistent state reset
     if category == "Wrong Tables":
-        return {"feedback_route": "schema_explorer"}
+        return {
+            "feedback_route": "schema_explorer",
+            "sql_query": "",
+            "schema_plan": "",
+            "raw_data_ref": None,
+            "trino_error": None,
+            "execution_path": ["rejection_router"]
+        }
     elif category == "Wrong Logic":
-        return {"feedback_route": "query_builder"}
+        return {
+            "feedback_route": "query_builder",
+            "sql_query": "",
+            "schema_plan": "",
+            "raw_data_ref": None,
+            "trino_error": None,
+            "execution_path": ["rejection_router"]
+        }

     if not feedback:
-        return {"feedback_route": "query_builder"}
+        return {
+            "feedback_route": "query_builder",
+            "sql_query": "",
+            "schema_plan": "",
+            "raw_data_ref": None,
+            "trino_error": None,
+            "execution_path": ["rejection_router"]
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if category == "Wrong Tables":
return {"feedback_route": "schema_explorer"}
elif category == "Wrong Logic":
return {"feedback_route": "query_builder"}
if not feedback:
return {"feedback_route": "query_builder"}
if category == "Wrong Tables":
return {
"feedback_route": "schema_explorer",
"sql_query": "",
"schema_plan": "",
"raw_data_ref": None,
"trino_error": None,
"execution_path": ["rejection_router"]
}
elif category == "Wrong Logic":
return {
"feedback_route": "query_builder",
"sql_query": "",
"schema_plan": "",
"raw_data_ref": None,
"trino_error": None,
"execution_path": ["rejection_router"]
}
if not feedback:
return {
"feedback_route": "query_builder",
"sql_query": "",
"schema_plan": "",
"raw_data_ref": None,
"trino_error": None,
"execution_path": ["rejection_router"]
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/graph.py` around lines 128 - 134, The early return statements
in the category checking block (when category equals "Wrong Tables", "Wrong
Logic", or when feedback is not provided) only set the feedback_route field,
while the main return statement at lines 150-157 includes additional state reset
fields like sql_query, schema_plan, raw_data_ref, trino_error, and
execution_path. To fix this inconsistency, modify all three early return
statements to include the same complete set of state reset fields that are
present in the main return statement, ensuring that state cleanup occurs
regardless of which code path is taken.

Comment on lines 141 to +179
if __name__ == "__main__":
mcp.run()
mcp.run(transport="stdio")

@mcp.tool()
async def suggest_fixes(thread_id: str, category: str) -> str:
"""Generate quick fix suggestions during an interruption."""
from agent.llm import get_llm
from pydantic import BaseModel, Field

state_snapshot = await agent_graph.aget_state({"configurable": {"thread_id": thread_id}})
if not state_snapshot.values:
return "[]"

sql_query = state_snapshot.values.get("sql_query", "")
schema_plan = state_snapshot.values.get("schema_plan", "")
user_query = state_snapshot.values.get("user_query", "")
runtime_flags = state_snapshot.values.get("runtime_flags", {})

llm = get_llm(node="routing", runtime_flags=runtime_flags)

class Fixes(BaseModel):
suggestions: list[str] = Field(description="2-3 short, actionable suggested fixes (under 60 chars each)")

prompt = f"""
The user rejected the agent's Text2SQL output with category '{category}'.
User Query: {user_query}
Current SQL: {sql_query}
Current Plan: {schema_plan}

Provide 2-3 short, distinct button labels for the user to quickly apply a fix.
For example: "GROUP BY date instead of month", "Include cancelled orders", "Filter by region".
"""

structured_llm = llm.with_structured_output(Fixes, method="json_schema")
try:
res = structured_llm.invoke(prompt)
return json.dumps(res.suggestions)
except Exception as e:
return json.dumps([])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tool defined after mcp.run() will not be registered.

The suggest_fixes tool is decorated with @mcp.tool() but defined after the if __name__ == "__main__": mcp.run(...) block. When run as main, the tool definition is never reached because mcp.run() blocks. When imported as a module, it works, but the placement is confusing and error-prone.

🐛 Proposed fix — move tool before mcp.run
+@mcp.tool()
+async def suggest_fixes(thread_id: str, category: str) -> str:
+    """Generate quick fix suggestions during an interruption."""
+    from agent.llm import get_llm
+    from pydantic import BaseModel, Field
+    
+    state_snapshot = await agent_graph.aget_state({"configurable": {"thread_id": thread_id}})
+    if not state_snapshot.values:
+        return "[]"
+    
+    sql_query = state_snapshot.values.get("sql_query", "")
+    schema_plan = state_snapshot.values.get("schema_plan", "")
+    user_query = state_snapshot.values.get("user_query", "")
+    runtime_flags = state_snapshot.values.get("runtime_flags", {})
+    
+    llm = get_llm(node="routing", runtime_flags=runtime_flags)
+    
+    class Fixes(BaseModel):
+        suggestions: list[str] = Field(description="2-3 short, actionable suggested fixes (under 60 chars each)")
+        
+    prompt = f"""
+    The user rejected the agent's Text2SQL output with category '{category}'.
+    User Query: {user_query}
+    Current SQL: {sql_query}
+    Current Plan: {schema_plan}
+    
+    Provide 2-3 short, distinct button labels for the user to quickly apply a fix.
+    For example: "GROUP BY date instead of month", "Include cancelled orders", "Filter by region".
+    """
+    
+    structured_llm = llm.with_structured_output(Fixes, method="json_schema")
+    try:
+        res = await structured_llm.ainvoke(prompt)
+        return json.dumps(res.suggestions)
+    except Exception as e:
+        return json.dumps([])
+
+
 if __name__ == "__main__":
     mcp.run(transport="stdio")
-
-@mcp.tool()
-async def suggest_fixes(thread_id: str, category: str) -> str:
-    ...
🧰 Tools
🪛 ast-grep (0.43.0)

[info] 176-176: use jsonify instead of json.dumps for JSON output
Context: json.dumps(res.suggestions)
Note: Security best practice.

(use-jsonify)


[info] 178-178: use jsonify instead of json.dumps for JSON output
Context: json.dumps([])
Note: Security best practice.

(use-jsonify)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/mcp_server.py` around lines 141 - 179, The suggest_fixes
function decorated with `@mcp.tool`() is currently defined after the if __name__
== "__main__": mcp.run(transport="stdio") block, which means the tool will never
be registered because mcp.run() blocks execution. Move the entire suggest_fixes
function definition, including its `@mcp.tool`() decorator and all its
implementation, to before the if __name__ == "__main__": block to ensure the
tool is properly registered with the MCP server before mcp.run() is called.

Comment on lines +174 to +179
structured_llm = llm.with_structured_output(Fixes, method="json_schema")
try:
res = structured_llm.invoke(prompt)
return json.dumps(res.suggestions)
except Exception as e:
return json.dumps([])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Synchronous invoke() in async function blocks event loop.

suggest_fixes is an async function but calls structured_llm.invoke(prompt) synchronously. This blocks the event loop during the LLM call. Use await structured_llm.ainvoke(prompt) instead.

Also, similar to schema_enrichment.py, res might be a dict instead of a Pydantic model depending on the LLM integration.

🐛 Proposed fix
     structured_llm = llm.with_structured_output(Fixes, method="json_schema")
     try:
-        res = structured_llm.invoke(prompt)
-        return json.dumps(res.suggestions)
+        res = await structured_llm.ainvoke(prompt)
+        # Handle both Pydantic model and dict responses
+        suggestions = res.suggestions if hasattr(res, 'suggestions') else res.get('suggestions', [])
+        return json.dumps(suggestions)
     except Exception as e:
+        logger.warning(f"suggest_fixes failed: {e}")
         return json.dumps([])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
structured_llm = llm.with_structured_output(Fixes, method="json_schema")
try:
res = structured_llm.invoke(prompt)
return json.dumps(res.suggestions)
except Exception as e:
return json.dumps([])
structured_llm = llm.with_structured_output(Fixes, method="json_schema")
try:
res = await structured_llm.ainvoke(prompt)
# Handle both Pydantic model and dict responses
suggestions = res.suggestions if hasattr(res, 'suggestions') else res.get('suggestions', [])
return json.dumps(suggestions)
except Exception as e:
return json.dumps([])
🧰 Tools
🪛 ast-grep (0.43.0)

[info] 176-176: use jsonify instead of json.dumps for JSON output
Context: json.dumps(res.suggestions)
Note: Security best practice.

(use-jsonify)


[info] 178-178: use jsonify instead of json.dumps for JSON output
Context: json.dumps([])
Note: Security best practice.

(use-jsonify)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/mcp_server.py` around lines 174 - 179, The `suggest_fixes`
async function is blocking the event loop by calling
`structured_llm.invoke(prompt)` synchronously instead of awaiting an async
version. Replace `structured_llm.invoke(prompt)` with `await
structured_llm.ainvoke(prompt)`. Additionally, the code assumes `res` is always
a Pydantic model with a `.suggestions` attribute, but it could be a dict
depending on the LLM integration. Update the response handling to check whether
`res` is a dict using a conditional check (like `isinstance(res, dict)`) and
access the suggestions accordingly using either `res.get("suggestions")` for
dicts or `res.suggestions` for Pydantic models.

Comment thread agent/src/agent/nodes/refiner.py Outdated
result = None

thread_id = config.get("configurable", {}).get("thread_id", "") if config else ""
asyncio.create_task(publish_node_event(thread_id, "refiner"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fire-and-forget task may be silently dropped.

asyncio.create_task() without storing the reference means the task can be garbage-collected before completion, and any exceptions will be lost. Consider storing the task or using a more robust pattern.

Proposed fix
-    asyncio.create_task(publish_node_event(thread_id, "refiner"))
+    # Store task to prevent GC and log exceptions
+    task = asyncio.create_task(publish_node_event(thread_id, "refiner"))
+    task.add_done_callback(lambda t: t.exception() if t.done() and not t.cancelled() else None)

Alternatively, if observability events are non-critical, document this is intentionally fire-and-forget.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/refiner.py` at line 57, The asyncio.create_task() call
for publish_node_event is not storing the returned task reference, which means
it can be garbage-collected before completion and any exceptions will be
silently lost. Store the returned task in a variable or add it to a collection
of background tasks to keep the reference alive, or alternatively add a comment
documenting that this is intentionally fire-and-forget behavior since the
observability event is non-critical.

Comment thread frontend/src/api/flags.ts
Comment on lines +53 to +75
update: (name: string, value: unknown): Promise<FeatureFlag> =>
fetchWithAdminEmail(`/flags/${name}`, {
method: 'PATCH',
body: JSON.stringify({ value }),
}),

reset: (name: string): Promise<null> =>
fetchWithAdminEmail(`/flags/${name}`, { method: 'DELETE' }),

// ── Execution Modes ───────────────────────────────────────────────────────
listModes: (): Promise<ExecutionMode[]> => fetchWithAdminEmail('/flags/modes/'),

getMode: (name: string): Promise<ExecutionMode> =>
fetchWithAdminEmail(`/flags/modes/${name}`),

upsertMode: (name: string, data: Partial<ExecutionMode>): Promise<ExecutionMode> =>
fetchWithAdminEmail(`/flags/modes/${name}`, {
method: 'PUT',
body: JSON.stringify(data),
}),

deleteMode: (name: string): Promise<null> =>
fetchWithAdminEmail(`/flags/modes/${name}`, { method: 'DELETE' }),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Encode dynamic path segments before building /flags/... URLs.

name is interpolated raw in multiple calls; reserved characters (e.g., /, ?, #) can break the route or target the wrong path.

Suggested fix
+const toPathSegment = (value: string) => encodeURIComponent(value);
+
 export const flagsApi = {
@@
   update: (name: string, value: unknown): Promise<FeatureFlag> =>
-    fetchWithAdminEmail(`/flags/${name}`, {
+    fetchWithAdminEmail(`/flags/${toPathSegment(name)}`, {
       method: 'PATCH',
       body: JSON.stringify({ value }),
     }),
@@
   reset: (name: string): Promise<null> =>
-    fetchWithAdminEmail(`/flags/${name}`, { method: 'DELETE' }),
+    fetchWithAdminEmail(`/flags/${toPathSegment(name)}`, { method: 'DELETE' }),
@@
   getMode: (name: string): Promise<ExecutionMode> =>
-    fetchWithAdminEmail(`/flags/modes/${name}`),
+    fetchWithAdminEmail(`/flags/modes/${toPathSegment(name)}`),
@@
   upsertMode: (name: string, data: Partial<ExecutionMode>): Promise<ExecutionMode> =>
-    fetchWithAdminEmail(`/flags/modes/${name}`, {
+    fetchWithAdminEmail(`/flags/modes/${toPathSegment(name)}`, {
       method: 'PUT',
       body: JSON.stringify(data),
     }),
@@
   deleteMode: (name: string): Promise<null> =>
-    fetchWithAdminEmail(`/flags/modes/${name}`, { method: 'DELETE' }),
+    fetchWithAdminEmail(`/flags/modes/${toPathSegment(name)}`, { method: 'DELETE' }),
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
update: (name: string, value: unknown): Promise<FeatureFlag> =>
fetchWithAdminEmail(`/flags/${name}`, {
method: 'PATCH',
body: JSON.stringify({ value }),
}),
reset: (name: string): Promise<null> =>
fetchWithAdminEmail(`/flags/${name}`, { method: 'DELETE' }),
// ── Execution Modes ───────────────────────────────────────────────────────
listModes: (): Promise<ExecutionMode[]> => fetchWithAdminEmail('/flags/modes/'),
getMode: (name: string): Promise<ExecutionMode> =>
fetchWithAdminEmail(`/flags/modes/${name}`),
upsertMode: (name: string, data: Partial<ExecutionMode>): Promise<ExecutionMode> =>
fetchWithAdminEmail(`/flags/modes/${name}`, {
method: 'PUT',
body: JSON.stringify(data),
}),
deleteMode: (name: string): Promise<null> =>
fetchWithAdminEmail(`/flags/modes/${name}`, { method: 'DELETE' }),
const toPathSegment = (value: string) => encodeURIComponent(value);
export const flagsApi = {
update: (name: string, value: unknown): Promise<FeatureFlag> =>
fetchWithAdminEmail(`/flags/${toPathSegment(name)}`, {
method: 'PATCH',
body: JSON.stringify({ value }),
}),
reset: (name: string): Promise<null> =>
fetchWithAdminEmail(`/flags/${toPathSegment(name)}`, { method: 'DELETE' }),
// ── Execution Modes ───────────────────────────────────────────────────────
listModes: (): Promise<ExecutionMode[]> => fetchWithAdminEmail('/flags/modes/'),
getMode: (name: string): Promise<ExecutionMode> =>
fetchWithAdminEmail(`/flags/modes/${toPathSegment(name)}`),
upsertMode: (name: string, data: Partial<ExecutionMode>): Promise<ExecutionMode> =>
fetchWithAdminEmail(`/flags/modes/${toPathSegment(name)}`, {
method: 'PUT',
body: JSON.stringify(data),
}),
deleteMode: (name: string): Promise<null> =>
fetchWithAdminEmail(`/flags/modes/${toPathSegment(name)}`, { method: 'DELETE' }),
};
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/api/flags.ts` around lines 53 - 75, The `name` parameter is
being interpolated directly into URL paths in the update, reset, getMode,
upsertMode, and deleteMode methods without encoding, which allows reserved
characters like forward slashes or query operators to break the routing. Wrap
each instance of the `name` parameter with encodeURIComponent() before inserting
it into the path strings (e.g., `/flags/${encodeURIComponent(name)}` and
`/flags/modes/${encodeURIComponent(name)}`).

Comment thread frontend/src/components/AgentGraph.tsx Outdated
Comment on lines +53 to +60
useEffect(() => {
if (!threadId) {
setPath([]);
return;
}

const eventSource = new EventSource(`/api/agent/stream/${threadId}`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "AgentGraph.tsx" -type f

Repository: StavPonte11/text2sql-onboarding

Length of output: 116


🏁 Script executed:

cat -n frontend/src/components/AgentGraph.tsx

Repository: StavPonte11/text2sql-onboarding

Length of output: 4531


Reset execution path when switching to a different thread.

When threadId changes from one value to another, the useEffect runs again (line 79 dependency array). The cleanup function closes the previous EventSource, but the path state is not cleared. This causes new messages from the new thread to append to the old path data, mixing execution steps from two different runs in the graph.

Move setPath([]) to the beginning of the effect so it always resets when threadId changes, before setting up the new EventSource:

Suggested patch
   useEffect(() => {
+    // Always reset when the stream identity changes.
+    setPath([]);
+
     if (!threadId) {
-      setPath([]);
       return;
     }
 
     const eventSource = new EventSource(`/api/agent/stream/${threadId}`);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
useEffect(() => {
if (!threadId) {
setPath([]);
return;
}
const eventSource = new EventSource(`/api/agent/stream/${threadId}`);
useEffect(() => {
// Always reset when the stream identity changes.
setPath([]);
if (!threadId) {
return;
}
const eventSource = new EventSource(`/api/agent/stream/${threadId}`);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/AgentGraph.tsx` around lines 53 - 60, The setPath([])
call in the useEffect is nested inside a condition that only executes when
threadId is falsy, so it doesn't reset the path state when threadId changes to a
different non-empty value. This causes execution steps from different threads to
mix in the graph. Move the setPath([]) call to the very beginning of the
useEffect, before the if condition that checks for !threadId, so that the path
is always cleared whenever threadId changes regardless of whether it's becoming
empty or switching to a different thread ID. Keep the early return after
setPath([]) for when threadId is falsy, and then proceed with setting up the new
EventSource.

Comment on lines +29 to +43
useEffect(() => {
if (!traceId) return;

setLoading(true);
axios.get(`/api/agent/traces/${traceId}`)
.then(res => {
setSpans(res.data || []);
})
.catch(err => {
console.error("Failed to load trace", err);
})
.finally(() => {
setLoading(false);
});
}, [traceId]);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd -t f "TraceTimeline.tsx"

Repository: StavPonte11/text2sql-onboarding

Length of output: 117


🏁 Script executed:

cat -n frontend/src/components/TraceTimeline.tsx

Repository: StavPonte11/text2sql-onboarding

Length of output: 5024


🏁 Script executed:

# Check if there are integration tests or usage of this component
rg "TraceTimeline" --type tsx --type ts -B 2 -A 2

Repository: StavPonte11/text2sql-onboarding

Length of output: 103


🏁 Script executed:

# Search for similar axios patterns in the codebase to see if this is an endemic issue
rg "axios\.(get|post|put|delete)" --type tsx --type ts -B 3 -A 5 | head -100

Repository: StavPonte11/text2sql-onboarding

Length of output: 103


🏁 Script executed:

# Search for TraceTimeline usage to understand how traceId changes
rg "TraceTimeline" -t js -t jsx

Repository: StavPonte11/text2sql-onboarding

Length of output: 103


🏁 Script executed:

# Check if AbortController or cleanup patterns are used elsewhere for axios
rg "AbortController|useRef.*cancel|\.cancel\(\)" -t js -t jsx

Repository: StavPonte11/text2sql-onboarding

Length of output: 103


🏁 Script executed:

# Look for other useEffect patterns with axios in the codebase
rg "useEffect.*axios|axios.*useEffect" -B 2 -A 8 -t js -t jsx | head -80

Repository: StavPonte11/text2sql-onboarding

Length of output: 103


🏁 Script executed:

# Search for TraceTimeline usage without type filter
rg "TraceTimeline" -B 2 -A 2

Repository: StavPonte11/text2sql-onboarding

Length of output: 1417


🏁 Script executed:

# Look for other useEffect with axios patterns
rg "useEffect.*axios|axios.*then.*catch" -B 1 -A 8 | head -100

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

# Check if there are any cleanup patterns in the codebase
rg "return.*=>.*cancel|AbortController" -B 2 -A 2

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

# Look at parent component to understand traceId changes
rg -l "TraceTimeline" | grep -v node_modules

Repository: StavPonte11/text2sql-onboarding

Length of output: 157


🏁 Script executed:

# View the parent component usage
cat -n frontend/src/pages/AgentTestingPage.tsx | head -150

Repository: StavPonte11/text2sql-onboarding

Length of output: 6555


🏁 Script executed:

# Search for where threadId gets set in AgentTestingPage
rg "setThreadId|threadId.*=" frontend/src/pages/AgentTestingPage.tsx -B 2 -A 2

Repository: StavPonte11/text2sql-onboarding

Length of output: 984


Add cleanup to prevent stale responses and reset state when trace changes.

This effect has two functional correctness issues: (1) Race condition—older requests can complete after newer ones, overwriting current trace data; (2) State inconsistency—when traceId becomes empty, spans and loading are not reset, leaving stale or inconsistent UI state.

Implement a cleanup flag to abort updates for cancelled requests, and reset state when traceId changes or becomes falsy:

Suggested fix
   useEffect(() => {
-    if (!traceId) return;
+    let cancelled = false;
+    if (!traceId) {
+      setSpans([]);
+      setLoading(false);
+      return () => {
+        cancelled = true;
+      };
+    }
 
     setLoading(true);
+    setSpans([]);
     axios.get(`/api/agent/traces/${traceId}`)
       .then(res => {
-        setSpans(res.data || []);
+        if (!cancelled) setSpans(res.data || []);
       })
       .catch(err => {
-        console.error("Failed to load trace", err);
+        if (!cancelled) {
+          console.error("Failed to load trace", err);
+          setSpans([]);
+        }
       })
       .finally(() => {
-        setLoading(false);
+        if (!cancelled) setLoading(false);
       });
+    return () => {
+      cancelled = true;
+    };
   }, [traceId]);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
useEffect(() => {
if (!traceId) return;
setLoading(true);
axios.get(`/api/agent/traces/${traceId}`)
.then(res => {
setSpans(res.data || []);
})
.catch(err => {
console.error("Failed to load trace", err);
})
.finally(() => {
setLoading(false);
});
}, [traceId]);
useEffect(() => {
let cancelled = false;
if (!traceId) {
setSpans([]);
setLoading(false);
return () => {
cancelled = true;
};
}
setLoading(true);
setSpans([]);
axios.get(`/api/agent/traces/${traceId}`)
.then(res => {
if (!cancelled) setSpans(res.data || []);
})
.catch(err => {
if (!cancelled) {
console.error("Failed to load trace", err);
setSpans([]);
}
})
.finally(() => {
if (!cancelled) setLoading(false);
});
return () => {
cancelled = true;
};
}, [traceId]);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/TraceTimeline.tsx` around lines 29 - 43, The
useEffect hook in TraceTimeline has two issues: first, it doesn't handle race
conditions where older requests complete after newer ones, and second, it fails
to reset state when traceId becomes falsy or changes. Fix this by implementing a
cleanup mechanism using a flag (like an isMounted or isActive variable) that
gets set to false in the cleanup function returned by useEffect. Inside the then
and catch handlers, only update state (setSpans and setLoading) if this flag is
true. Additionally, add a state reset at the beginning of the effect when
traceId changes—call setSpans([]) and setLoading(false) to clear stale data
before making the new request. This ensures that cancelled or superseded
requests cannot update the component's state and that state is properly cleaned
up when the trace changes.

Comment thread frontend/src/pages/AgentTestingPage.tsx Outdated
Comment on lines +242 to +249
onChange={(val) => {
setRejectionCategory(val);
if (val && threadId) {
setLoadingFixes(true);
agentApi.suggestFixes(threadId, val)
.then(setSuggestedFixes)
.finally(() => setLoadingFixes(false));
} else {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle suggestFixes request failures explicitly.

The promise chain has no .catch(...); failures can leave stale suggestions and create unhandled async errors in the UI flow.

Suggested fix
                         agentApi.suggestFixes(threadId, val)
                           .then(setSuggestedFixes)
+                          .catch(() => setSuggestedFixes([]))
                           .finally(() => setLoadingFixes(false));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
onChange={(val) => {
setRejectionCategory(val);
if (val && threadId) {
setLoadingFixes(true);
agentApi.suggestFixes(threadId, val)
.then(setSuggestedFixes)
.finally(() => setLoadingFixes(false));
} else {
onChange={(val) => {
setRejectionCategory(val);
if (val && threadId) {
setLoadingFixes(true);
agentApi.suggestFixes(threadId, val)
.then(setSuggestedFixes)
.catch(() => setSuggestedFixes([]))
.finally(() => setLoadingFixes(false));
} else {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/AgentTestingPage.tsx` around lines 242 - 249, The promise
chain for agentApi.suggestFixes(threadId, val) is missing explicit error
handling. Add a .catch() handler after the .then(setSuggestedFixes) to handle
potential API failures. In the catch block, handle the error appropriately by
clearing the suggestions state (setSuggestedFixes), logging the error, and
optionally providing user feedback about the failure. The .finally() block
should remain to properly reset the setLoadingFixes state regardless of success
or failure.

Comment on lines +422 to +436
const submitMode = () => {
// Construct overrides object from overridesList
const overridesObj: Record<string, any> = {};
for (const item of overridesList) {
overridesObj[item.name] = item.value;
}

upsertModeMutation.mutate({
name: modeDraft.name,
data: {
description: modeDraft.description,
flag_overrides: overridesObj,
is_active: modeDraft.is_active
},
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent empty mode names before submitting.

At Line 430, create flow can send an empty/whitespace name, producing an invalid /flags/modes/{name} request and avoidable backend failure.

Suggested fix
   const submitMode = () => {
+    const normalizedName = modeDraft.name.trim();
+    if (!editingMode && !normalizedName) {
+      message.error('Mode name is required');
+      return;
+    }
+
     // Construct overrides object from overridesList
     const overridesObj: Record<string, any> = {};
@@
     upsertModeMutation.mutate({
-      name: modeDraft.name,
+      name: normalizedName,
       data: { 
         description: modeDraft.description, 
         flag_overrides: overridesObj, 
         is_active: modeDraft.is_active 
       },
     });
   };
@@
       <Modal
         title={editingMode ? `Edit Mode: ${editingMode.name}` : 'New Execution Mode'}
         open={modeModalOpen}
         onOk={submitMode}
         onCancel={() => setModeModalOpen(false)}
         okText="Save Mode"
         confirmLoading={upsertModeMutation.isPending}
+        okButtonProps={{ disabled: !editingMode && !modeDraft.name.trim() }}
         width={580}
       >

Also applies to: 561-567

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/FlagsPage.tsx` around lines 422 - 436, The submitMode
function does not validate that modeDraft.name is not empty or whitespace before
calling upsertModeMutation.mutate, which can result in invalid API requests to
the backend. Add validation before the mutation call to check if modeDraft.name
is empty or contains only whitespace characters, and only proceed with the
mutation if the name is valid. Additionally, apply the same validation fix to
the other similar submission function mentioned at lines 561-567 to ensure
consistency across all mode submission flows.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 47

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agent/src/agent/nodes/schema_explorer.py (1)

253-357: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Do not hold the DB session open across Esca/cache awaits in get_table_profile.

The session opened at Line 253 remains active while awaiting cache I/O and Esca network calls. Under concurrent profile fetches, this can tie up DB connections and reduce availability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/schema_explorer.py` around lines 253 - 357, The
database session opened at the beginning of the get_table_profile function
remains active during async operations (Esca client save_data call and cache
operations), which can exhaust DB connections under concurrent load. Reorganize
the code to execute all database queries first, including the EnrichmentVersion
fetch that currently happens after the Esca call, then close the session context
before performing any async I/O operations like the Esca client call and cache
set operations. This ensures the session is released immediately after all
synchronous database operations complete.
♻️ Duplicate comments (11)
agent/src/agent/nodes/schema_explorer.py (1)

658-661: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid marking all tables_used as hallucinated on infrastructure errors.

On Redis/Trino verification failures, this branch marks every table as hallucinated, which can trigger false retries and HITL escalation even when tables are valid.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/schema_explorer.py` around lines 658 - 661, The
exception handler in the Redis/Trino table verification block is marking all
tables in the tables_used list as hallucinated whenever any exception occurs,
regardless of whether it's an infrastructure error or actual verification
failure. This causes false positives that trigger unnecessary retries and
escalations. Remove the logic that extends the hallucinated list with all
tables_used in the except block when a general Exception is caught, or modify
the exception handling to only add tables to hallucinated when you can
definitively determine they are invalid based on specific verification errors,
not just infrastructure failures.
agent/tests/test_isolation.py (1)

39-52: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This test can pass without validating behavior (patch target + silent ImportError).

evaluate_with_llm is imported into local scope, so patching core.llm.evaluate_with_llm may not affect the call at Line 47. Also, except ImportError: pass turns a missing dependency into a false pass.

Proposed fix
-    try:
-        from core.llm import ConfigurationError, evaluate_with_llm
+    try:
+        import core.llm as llm_module
@@
-        with patch("core.llm.evaluate_with_llm", side_effect=mock_evaluate):
-            result = await evaluate_with_llm("test query", "SELECT 1")
+        with patch.object(llm_module, "evaluate_with_llm", side_effect=mock_evaluate):
+            result = await llm_module.evaluate_with_llm("test query", "SELECT 1")
             assert result["score"] is None
             assert result["error"] == "judge_unavailable"
     except ImportError:
-        # if core.llm doesn't exist, we skip or mock the specific node that uses it
-        pass
+        pytest.skip("core.llm module not available")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/tests/test_isolation.py` around lines 39 - 52, The patch target for
evaluate_with_llm is incorrect because the function is imported into the local
scope of the test, so patching core.llm.evaluate_with_llm will not affect the
actual call at line 47. Change the patch decorator or patch call to target the
module where evaluate_with_llm is used (the test module itself) rather than
where it's defined. Additionally, the except ImportError: pass block silently
ignores missing dependencies and allows the test to pass without actually
validating behavior. Replace the silent pass statement with an assertion failure
or skip directive so that import failures are properly reported instead of
hiding them.
agent/src/agent/nodes/satisfaction_check.py (1)

126-128: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Record Check C/D execution errors as gate failures (current behavior fails open).

When these checks throw, the node logs and continues; routing can incorrectly proceed as if checks passed. This concern was raised previously and is still present.

Proposed fix
         except Exception as exc:
             logger.warning("satisfaction_check Check C failed: %s", exc)
+            failures.append(f"[CHECK_C] Column coverage check failed to execute: {exc}")
@@
         except Exception as exc:
             logger.warning("satisfaction_check Check D failed: %s", exc)
+            failures.append(f"[CHECK_D] Semantic alignment check failed to execute: {exc}")

Also applies to: 147-149

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/satisfaction_check.py` around lines 126 - 128, The
exception handlers for Check C (lines 126-128) and Check D (lines 147-149)
currently log warnings and continue execution, allowing routing to proceed as if
the checks passed. Instead of simply logging and continuing when these
exceptions occur, mark the check results as failed (likely by setting the
corresponding check flag or variable to False) so that the gate failure is
properly recorded and prevents incorrect routing progression. This ensures
errors are treated as explicit gate failures rather than silent passes.
backend/app/infra_init.py (1)

27-27: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move import os into the top import block.

Line 27 keeps a module-level import after prior module statements, so Ruff E402 remains unresolved.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/infra_init.py` at line 27, The `import os` statement at line 27
violates Python import conventions by being placed after other module-level
statements, which Ruff flags as E402. Move the `import os` import to the top of
the file with the other import statements, before any non-import code or
module-level logic begins, to resolve this linting error.

Source: Linters/SAST tools

core/src/core/cache.py (1)

134-139: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Singleton cache factory can bind to the wrong Redis endpoint.

get_cache_service(redis_url) only honors the first URL; later calls with a different Redis URL reuse the old client and can cross-contaminate environments.

Suggested fix
 _cache_instance: CacheService | None = None
+_cache_instance_url: str | None = None
 
 
 def get_cache_service(redis_url: str) -> CacheService:
     """Return a module-level singleton CacheService."""
-    global _cache_instance
-    if _cache_instance is None:
+    global _cache_instance, _cache_instance_url
+    if _cache_instance is None or _cache_instance_url != redis_url:
         _cache_instance = CacheService(redis_url)
+        _cache_instance_url = redis_url
     return _cache_instance
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/cache.py` around lines 134 - 139, The singleton pattern in
get_cache_service does not validate the redis_url parameter against the URL used
when creating the cached instance, so different URLs passed in subsequent calls
are ignored. Modify the function to either verify that all calls use the same
redis_url as the initial creation (raising an error if they differ), or
implement logic to handle URL changes appropriately. This prevents the
CacheService from inadvertently connecting to the wrong Redis endpoint and
causing cross-contamination between environments.
backend/pyproject.toml (1)

13-13: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constrain Langfuse to a tested minor range.

Line 13 (langfuse>=2.0.0) permits much older 2.x releases and can introduce runtime incompatibilities in tracing paths.

Suggested fix
-    "langfuse>=2.0.0",
+    "langfuse>=2.56.2,<3.0.0",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/pyproject.toml` at line 13, The langfuse dependency in the
pyproject.toml file is specified as "langfuse>=2.0.0", which permits any version
from 2.0.0 onwards and can introduce runtime incompatibilities. Update this
dependency constraint to specify a tested minor version range by adding an upper
bound, such as constraining it to a specific 2.x range (e.g.,
"langfuse>=2.0.0,<2.x.x" or similar) to prevent installation of untested
versions that could cause tracing path incompatibilities.
backend/app/main.py (1)

44-46: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not hard-fail entire API on missing OPENAI_API_KEY.

Lines 44-46 abort app startup, which takes down unrelated endpoints (including operational/admin routes) instead of degrading only judge-dependent functionality.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/main.py` around lines 44 - 46, Remove the RuntimeError raise in
the OPENAI_API_KEY validation block (currently at lines 44-46) that hard-fails
the entire application startup. Instead, log the error message and allow the
application to continue running. Consider setting a flag or configuration
variable to indicate that the LLM judge functionality is unavailable, so that
only judge-dependent endpoints can be disabled or degraded while other
operational and admin routes remain functional.
backend/app/services/flag_service.py (1)

89-124: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Stop bridging async Redis with ad-hoc event loops in sync handlers.

Lines 89–124 create/run event loops inside sync request paths (run_until_complete + new_event_loop()), which is fragile and can leak loop resources over time.

Suggested direction
-import redis.asyncio as aioredis
+import redis

-_REDIS: aioredis.Redis | None = None
+_REDIS: redis.Redis | None = None

-def _get_redis(redis_url: str) -> aioredis.Redis:
+def _get_redis(redis_url: str) -> redis.Redis:
     global _REDIS
     if _REDIS is None:
-        _REDIS = aioredis.from_url(
+        _REDIS = redis.from_url(
             redis_url,
             decode_responses=True,
             socket_connect_timeout=2,
             socket_timeout=2,
         )
     return _REDIS

Then replace run_until_complete(...) calls with direct sync get/setex/delete.

#!/bin/bash
# Confirm async-to-sync bridge patterns are still present
rg -n "run_until_complete|new_event_loop|get_event_loop|redis\.asyncio|aioredis" backend/app/services/flag_service.py
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/flag_service.py` around lines 89 - 124, The methods
_try_cache_get, _try_cache_set, and _invalidate are creating and managing event
loops to bridge async Redis calls in synchronous contexts, which causes resource
leaks. Replace the async Redis client with a synchronous Redis client that
supports blocking operations, then remove all the asyncio event loop creation
and run_until_complete logic from these three methods. Instead, call the
synchronous versions of the Redis methods directly (get, setex, and delete
respectively) without any event loop management.
core/src/core/trino.py (1)

72-75: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use an explicit None check for parameterized execution.

The truthiness check if params: will incorrectly skip parameterized execution when params is an empty tuple, list, or dict. This changes behavior unexpectedly and could cause SQL injection vulnerabilities if empty parameters are expected but the non-parameterized path is taken instead.

🔧 Proposed fix
         conn = get_trino_connection()
         cur = conn.cursor()
-        if params:
+        if params is not None:
             cur.execute(sql, params)
         else:
             cur.execute(sql)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/trino.py` around lines 72 - 75, The current code uses a
truthiness check `if params:` in the execute block which incorrectly treats
empty parameter containers (empty tuple, list, or dict) as falsy and skips
parameterized execution. Replace the condition `if params:` with an explicit
`None` check `if params is not None:` to ensure parameterized execution is
called whenever parameters are provided, even if they are empty collections.
This applies to both cur.execute() calls in the conditional block.
frontend/src/pages/FlagsPage.tsx (1)

422-437: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard against blank execution mode names before submit.

Create flow still allows empty/whitespace names and can emit invalid mode upsert requests.

Proposed fix
   const submitMode = () => {
+    const normalizedName = modeDraft.name.trim();
+    if (!editingMode && !normalizedName) {
+      message.error('Mode name is required');
+      return;
+    }
+
     // Construct overrides object from overridesList
     const overridesObj: Record<string, any> = {};
@@
     upsertModeMutation.mutate({
-      name: modeDraft.name,
+      name: editingMode ? modeDraft.name : normalizedName,
       data: { 
         description: modeDraft.description, 
         flag_overrides: overridesObj, 
         is_active: modeDraft.is_active 
       },
     });
   };
@@
       <Modal
@@
         confirmLoading={upsertModeMutation.isPending}
+        okButtonProps={{ disabled: !editingMode && !modeDraft.name.trim() }}
         width={580}
       >

Also applies to: 561-567

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/FlagsPage.tsx` around lines 422 - 437, The submitMode
function allows submission of mode data with blank or whitespace-only names
because there is no validation before calling upsertModeMutation.mutate(). Add a
guard check at the beginning of the submitMode function to validate that
modeDraft.name is not empty and not just whitespace. If the name is invalid,
either return early or display an appropriate error message to the user before
proceeding with the mutation call.
frontend/src/api/flags.ts (1)

53-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Encode dynamic path segments before building /flags/... URLs.

name is interpolated raw; reserved characters can target the wrong endpoint or fail requests.

Proposed fix
+const toPathSegment = (value: string) => encodeURIComponent(value);
+
 export const flagsApi = {
@@
   update: (name: string, value: unknown): Promise<FeatureFlag> =>
-    fetchWithAdminEmail(`/flags/${name}`, {
+    fetchWithAdminEmail(`/flags/${toPathSegment(name)}`, {
       method: 'PATCH',
       body: JSON.stringify({ value }),
     }),
@@
   reset: (name: string): Promise<null> =>
-    fetchWithAdminEmail(`/flags/${name}`, { method: 'DELETE' }),
+    fetchWithAdminEmail(`/flags/${toPathSegment(name)}`, { method: 'DELETE' }),
@@
   getMode: (name: string): Promise<ExecutionMode> =>
-    fetchWithAdminEmail(`/flags/modes/${name}`),
+    fetchWithAdminEmail(`/flags/modes/${toPathSegment(name)}`),
@@
   upsertMode: (name: string, data: Partial<ExecutionMode>): Promise<ExecutionMode> =>
-    fetchWithAdminEmail(`/flags/modes/${name}`, {
+    fetchWithAdminEmail(`/flags/modes/${toPathSegment(name)}`, {
       method: 'PUT',
       body: JSON.stringify(data),
     }),
@@
   deleteMode: (name: string): Promise<null> =>
-    fetchWithAdminEmail(`/flags/modes/${name}`, { method: 'DELETE' }),
+    fetchWithAdminEmail(`/flags/modes/${toPathSegment(name)}`, { method: 'DELETE' }),
 };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/api/flags.ts` around lines 53 - 75, The name parameter is being
interpolated directly into URL paths in the update, reset, getMode, upsertMode,
and deleteMode methods without proper encoding. Reserved or special characters
in the name could break the URL structure or route requests incorrectly. Wrap
each instance of the name variable with encodeURIComponent() when constructing
the URL path segments in all these methods to ensure the name is properly
URL-encoded before being used in the fetchWithAdminEmail calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent/Dockerfile`:
- Around line 16-17: The uv sync command on line 17 lacks the --frozen flag,
which allows dependency re-resolution and can result in nondeterministic builds.
Add the --frozen flag to the existing uv sync --no-dev command to enforce the
lockfile without re-locking, ensuring reproducible container builds.

In `@agent/pyproject.toml`:
- Around line 24-27: Remove the untracked local path dependency for
python-core-utils from the [tool.uv.sources] section in agent/pyproject.toml
(the line declaring python-core-utils = { path = "../python-core-utils" }). Also
locate and remove the identical python-core-utils path dependency declaration
from backend/pyproject.toml at line 104. These local path references point to
non-existent directories that aren't tracked in git and will cause dependency
resolution failures in CI and clean-clone environments.

In `@agent/src/agent/llm.py`:
- Around line 48-55: The model and temperature resolution logic bypasses
per-node environment variable defaults by immediately falling back to generic
defaults. In the code block where model is resolved (around line 49), after
checking the runtime flags dictionary with model_flag, you need to check for a
per-node environment variable before falling back to settings.LLM_MODEL.
Similarly, in the temperature resolution (around line 54), after checking the
temp_flag in the flags dictionary, check for a per-node environment variable
before falling back to 0.0. Implement per-node environment variable mappings
(similar to the existing _NODE_MODEL_FLAGS and _NODE_TEMP_FLAGS dictionaries)
and use os.environ.get() to retrieve those environment values in the resolution
chain, ensuring the priority order is: runtime flags → per-node environment
variables → generic defaults.

In `@agent/src/agent/mcp_server.py`:
- Around line 43-49: The `langfuse_handler` variable is not initialized before
the try-except block that contains the guarded import, which means if an
exception is raised during the import or function call to
get_langfuse_handler(), the variable will never be assigned. This causes an
UnboundLocalError when `langfuse_handler` is later referenced. Initialize
`langfuse_handler` to None before the try block (before the attempt to import
and call get_langfuse_handler) so that the variable is always defined regardless
of whether the try block succeeds or fails.

In `@agent/src/agent/nodes/extractor.py`:
- Around line 53-56: Add a guard check after calling
langfuse_client.get_prompt(settings.LANGFUSE_PROMPT_EXTRACTOR) to verify that
langfuse_prompt is not None before attempting to call get_langchain_prompt() on
it. The upstream contract allows get_prompt() to return None, so this null check
is required to prevent an AttributeError during extractor initialization. If
langfuse_prompt is None, raise an informative error that indicates the prompt
could not be retrieved from Langfuse.

In `@agent/src/agent/nodes/finalizer.py`:
- Around line 12-13: The `get_llm("finalizer")` call at module import time
creates a static LLM instance that cannot adapt to per-invocation
`runtime_flags`, unlike other nodes in the codebase. Move the
`get_llm("finalizer")` call from the module level into the function or method
that actually uses the `llm` variable, so it is created fresh on each invocation
and can respect the current `runtime_flags`. Apply the same fix to the other
locations mentioned around lines 101-104 where similar module-level LLM
initialization occurs.

In `@agent/src/agent/nodes/init_skills.py`:
- Line 10: The module-level _skill_registry object is shared across all
concurrent invocations, and mutating _skill_registry.redis with
get_redis_client() on each function call creates race conditions. Either
initialize the SkillRegistry with the Redis client once at module load time
(outside of any function), or refactor the code to create a new SkillRegistry
instance per invocation instead of reusing the shared _skill_registry. Avoid
reassigning _skill_registry.redis inside functions that may run concurrently.

In `@agent/src/agent/nodes/query_builder.py`:
- Line 36: The `import re` statement is currently located inside a function,
which causes it to be executed on every function invocation. Move the `import
re` statement to the module level at the top of the file with the other standard
library imports. This eliminates unnecessary import overhead and follows Python
best practices for organizing imports at module level.
- Line 40: The assignment to the `explanation` variable in query_builder.py
checks for both the correctly spelled `"reasoning_content"` and a typo variant
`"reasonig_content"` (missing 'n'). Either add a comment above this line
explaining that the typo variant is intentionally included to handle a known API
quirk, or remove the `or response.additional_kwargs.get("reasonig_content")`
portion if the typo handling is unnecessary. This will clarify the code's intent
and maintain consistency with standard practices.

In `@agent/src/agent/nodes/refiner.py`:
- Around line 105-107: Move the json.dumps and encode operations (which
serialize payload_data into payload) inside the try block that handles ESCA
write operations. Currently, json.dumps is called outside the try block at line
106, which means serialization errors on non-JSON-native Trino values like
decimals or datetimes will not be caught by the ESCA failure fallback path.
Relocate the payload creation (json.dumps and encode operations) to occur after
the start of the try block that wraps the ESCA write logic, so that any JSON
serialization failures trigger the same error handling and fallback mechanism as
other ESCA write failures. Apply this fix to all occurrences mentioned in the
range 111-123 as well.
- Around line 71-72: The refiner node is overwriting the execution_path with a
new list containing only ["refiner"] instead of appending to it. At all return
statements in the refiner function (around lines 71-72, 101-102, and 131-132
where "execution_path": ["refiner"] appears), change the assignment to append
"refiner" to the existing execution_path variable that was read from the input,
preserving the prior node history in the execution path.
- Around line 74-77: The langfuse_client.get_prompt() call may return None, and
the code immediately calls .get_langchain_prompt() on the result without
validating it first. Add a guard check after the
langfuse_client.get_prompt(settings.LANGFUSE_PROMPT_REFINER) assignment to
verify that langfuse_prompt is not None before proceeding to build the
ChatPromptTemplate.from_messages() with the prompt. If langfuse_prompt is None,
handle the error appropriately by either raising an informative exception or
logging an error message.

In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 377-380: The bool() conversion in lines 377-380 incorrectly parses
string values from runtime_flags - any non-empty string including "false" or "0"
will evaluate to True, unintentionally enabling disabled features. Replace the
bool() conversions for schema_semantic_typing, schema_join_graph,
schema_summarization, and schema_ambiguity_detect with proper boolean string
parsing that checks if the value equals "true" or "1" (case-insensitive),
ensuring that explicit "false" or "0" values correctly disable the features as
intended.
- Around line 502-511: The Schema Summarization phase (Phase C) is marked as
active but does not actually invoke the run_schema_summarization function to
perform LLM-based summarization. Instead of only extracting existing
descriptions from profile_details, you need to call the run_schema_summarization
function with the profile_details to generate actual summaries, then use the
returned summaries to build the profiles_json_str. This ensures the
SCHEMA_SUMMARIZATION phase performs its intended LLM summarization instead of
just reusing existing description fields.

In `@agent/src/agent/utils/flag_bridge.py`:
- Around line 132-133: The overrides dictionary obtained from
mode_data.get("flag_overrides") is being merged into resolved without filtering
None values, which can erase valid defaults and cause downstream type casting
failures. Filter out None values from the overrides dictionary before calling
resolved.update(overrides) on line 133, similar to how the DB overlay filtering
is done at lines 111-113. This ensures that null/None override values do not
overwrite existing defaults.

In `@agent/src/agent/utils/redis_publisher.py`:
- Around line 9-14: The sync Redis client created in the get_sync_redis function
using redis.from_url lacks socket and connection timeout configurations, which
can cause the client to block indefinitely during network issues. Add socket
timeout and connection timeout parameters to the redis.from_url call to specify
reasonable timeout values (e.g., socket_timeout and connect_timeout parameters),
ensuring that publish_node_event_sync and other callers cannot block
indefinitely and maintain pipeline responsiveness.

In `@agent/src/agent/utils/schema_enrichment.py`:
- Around line 373-376: The code at line 374-375 in the result handling assumes
that `result` from `ainvoke` is always an AmbiguityOutput Pydantic model object
and accesses `result.ambiguity_notes` directly via attribute access. However,
`with_structured_output()` can return either a dict or a Pydantic model, and
other similar functions in this file like `run_semantic_typing` and
`run_schema_summarization` already handle both cases. Add a type check after the
`ainvoke` call to handle when `result` is a dict by extracting the
ambiguity_notes using dictionary access (e.g., `result.get("ambiguity_notes",
...)`), matching the dict-handling pattern already implemented in the other
structured-output functions in this file.

In `@agent/src/agent/utils/skill_registry.py`:
- Around line 58-60: In the skill caching loop where skills are being iterated
from the fetched collection, the code directly accesses skill['id'] which will
raise a KeyError if the id field is missing from any skill object. Replace the
direct dictionary access to skill['id'] with a defensive approach using the
.get() method to safely retrieve the id, and either skip caching that skill if
id is missing or handle it appropriately to prevent the entire cache write
operation from failing and losing all fetched skills.

In `@agent/tests/conftest.py`:
- Around line 121-129: The patch in the mock_trino fixture targets the
definition site core.trino.execute_query_sync, but this approach does not
intercept calls through local import aliases created by from ... import
statements in agent.nodes.refiner, agent.nodes.finalizer, and
agent.nodes.schema_explorer. Change the patch target to
agent.nodes.refiner.execute_query_sync to patch the local binding in the module
where execute_query_sync is actually called, ensuring the mock correctly
intercepts all invocations regardless of how the function was imported.

In `@backend/app/config.py`:
- Line 9: The OPENAI_API_KEY field in the config class needs to use SecretStr
type instead of str | None to prevent accidental exposure in logs and repr
output. Import SecretStr from pydantic at the top of the file, then change the
type annotation of OPENAI_API_KEY from str | None to SecretStr | None. Apply the
same change to other sensitive credential fields: LANGFUSE_SECRET_KEY,
OPENMETADATA_TOKEN, and TRINO_PASSWORD by updating their type annotations from
str | None to SecretStr | None.

In `@backend/app/infra_init.py`:
- Line 552: The Iceberg JDBC bootstrap is incorrectly using the app's
DATABASE_URL environment variable, which may be configured to use SQLite but
requires a Postgres DSN for metadata tables. This causes initialization to fail
in mixed setups. Instead of relying on the shared DATABASE_URL, create a
separate environment variable specifically for the Iceberg metadata database
connection string (for example, ICEBERG_METADATA_URL or similar), providing a
Postgres-specific default that is independent from the app's primary database
configuration. Update the line that reads os.getenv("DATABASE_URL", ...) to use
this new dedicated environment variable instead.

In `@backend/app/routers/__init__.py`:
- Line 12: The `__all__` list in the `__init__.py` file contains module
references as bare names instead of string literals. Fix this by converting each
module reference in the `__all__` list to a string literal. Wrap each of the
module names (tables, enrichment, questions, evaluation, flags, publish, scopes,
audit) in quotes so they are defined as strings rather than variable references,
which aligns with Python conventions and allows proper usage with wildcard
imports.

In `@backend/app/routers/agent.py`:
- Line 75: The `asynccontextmanager` import from contextlib is violating E402
module-level import ordering by being placed at line 75 instead of at the top of
the file with other imports. Move the `from contextlib import
asynccontextmanager` statement to the top of the module with the other import
statements to comply with import ordering standards and avoid lint failures.
- Around line 240-241: The httpx AsyncClient get request in the Langfuse HTTP
call is missing an explicit timeout parameter, currently relying on the default
5.0 second timeout. Add a timeout parameter to the client.get() method call
(where it currently has url and auth parameters) with an appropriate timeout
value to ensure the external API request fails gracefully and predictably if
network delays or service degradation occurs.

In `@backend/app/routers/flags.py`:
- Around line 70-76: The get_flag_map endpoint and similar flag/mode read
endpoints (mentioned in lines 135-143 and 146-155) lack authentication
requirements, exposing internal runtime configuration to unauthenticated
callers. Add an authentication/authorization dependency check to these endpoints
using the Depends() pattern to ensure only authenticated or properly authorized
internal services can access them. Review the function signatures for
get_flag_map and any other flag retrieval handlers to add the appropriate
authentication guard dependency that validates the caller's identity or
authorization level before allowing access to operational flags and
configuration metadata.
- Around line 79-90: The `/{name}` route in the `get_flag` function shadows more
specific routes because FastAPI evaluates routes in registration order. Move the
`/modes/` and `/modes/map` route handlers (currently appearing after line 90) to
be registered before the `get_flag` function to ensure the more specific
`/modes/` paths are matched before the generic `/{name}` parameter path. This
prevents requests to `/flags/modes/` from being incorrectly captured by the
`get_flag` handler with `name="modes"`.

In `@backend/app/routers/profiling.py`:
- Around line 48-55: The current implementation has a race condition where
multiple concurrent workers can query the same existing EnrichmentVersion for a
given table_id, read the same version number, and calculate the same
next_version value before any of them write. To fix this, add database-level
locking to the select query that retrieves the existing EnrichmentVersion.
Modify the select statement that queries for EnrichmentVersion by table_id to
include pessimistic locking using SQLAlchemy's with_for_update() method,
ensuring that only one worker acquires and increments the version at a time.
This makes the read-then-increment operation atomic at the database level,
preventing concurrent workers from selecting the same next_version.
- Around line 57-63: The current logic at line 57 cannot distinguish between
human-authored table descriptions and AI-generated ones after the initial
promotion. Once an ai_summary is promoted to table_description on the first run,
all subsequent runs treat it as human-authored (because it's non-empty) and only
update ai_summary, leaving table_description permanently stale. Add a metadata
field or flag to track whether table_description is AI-generated or
human-authored, then update the has_human_description check to only consider
truly human-authored descriptions, allowing AI summaries to be re-promoted in
future runs when actual human annotations are absent.

In `@backend/app/services/llm_judge.py`:
- Line 129: The api_key assignment at line 129 directly retrieves
settings.OPENAI_API_KEY without validating it exists first, removing the
fail-fast guard and allowing the code to proceed to remote calls even when
misconfigured. Add a validation check immediately after the api_key assignment
that verifies the key is not None or empty, raising an appropriate exception
(such as ValueError or a configuration error) if the key is missing. This will
restore the early short-circuit behavior and prevent unnecessary latency and
failures on each judge invocation.

In `@backend/app/services/profiling_engine.py`:
- Around line 1086-1090: The code at lines 1086-1090 is embedding actual
database values directly into the prompt sent to the external summarization LLM,
which poses a data leakage risk for sensitive information. Instead of including
the actual values in the parts list through c.top_values and
c.min_value/c.max_value, sanitize this data by either omitting the actual values
entirely, using placeholder metadata (like just stating a value exists without
the value itself), or implementing a redaction mechanism that masks sensitive
content while preserving column metadata context needed for summarization.
- Around line 1097-1103: Fix two Ruff linting errors in the summary helper
function. First, replace the ambiguous single-letter variable name `l` in the
list comprehension (on the line with col_lines) with a more descriptive name
like `line` to resolve the E741 error. Second, split the multi-import statement
on line 1103 (the line with `import httpx, json as _json`) into two separate
import statements, one for `httpx` and one for `json as _json`, to resolve the
E401 error.

In `@backend/Dockerfile`:
- Line 4: The COPY instruction using `ghcr.io/astral-sh/uv:latest` is not pinned
to a specific version, which can lead to non-reproducible builds when the image
is updated. Replace the `:latest` tag with an explicit version number (e.g.,
`0.1.0`) or a SHA256 digest to ensure deterministic and auditable builds across
all environments.
- Around line 17-18: The RUN command executing `uv sync --no-dev
--no-install-project` at line 18 does not enforce reproducible dependency
resolution. Add the `--locked` flag to the uv sync command to ensure the
lockfile is strictly enforced and will fail if it is out of sync with
pyproject.toml, preventing unintended dependency drift in Docker builds.

In `@core/pyproject.toml`:
- Around line 18-19: The python-core-utils dependency in the [tool.uv.sources]
section of core/pyproject.toml is configured to use a local path reference
(../python-core-utils) that does not exist in the repository. To resolve this,
you must either: restore the original Git URL dependency for python-core-utils
if it was recently changed, verify and update the relative path to point to the
actual location where the python-core-utils package exists in your repository,
or create the python-core-utils directory at the path ../python-core-utils with
the appropriate package structure. Choose the appropriate solution based on your
project structure and then update the path value in the [tool.uv.sources]
section accordingly.

In `@core/src/core/models/models.py`:
- Around line 854-857: The ExecutionModeUpsert class currently provides concrete
default values for flag_overrides and is_active, which causes partial update
requests that omit these fields to silently reset them to their defaults during
upsert operations. Change the default values for the flag_overrides field and
the is_active field in ExecutionModeUpsert to use None instead of providing
concrete defaults (empty dict and True respectively), and make these fields
Optional to clearly indicate they may not be provided in a partial update. This
allows the upsert service logic to distinguish between explicitly set values and
omitted fields, preventing unintended data loss when partial updates are
processed.

In `@docker-compose.yml`:
- Line 407: The command on line 407 chains seed and infra_init bootstrap
operations with the uvicorn API startup in a single command string, causing
these one-time operations to re-run on every container start and preventing API
startup if any bootstrap step fails. Separate these concerns by creating a
dedicated initialization service in the docker-compose.yml that runs only the
seed and infra_init commands, then modify the main backend service command to
run only the uvicorn command while configuring it to depend on the
initialization service completing first. This way bootstrap operations run once
during initial setup and the API service startup remains independent from
transient bootstrap failures.

In `@frontend/package.json`:
- Line 55: Remove the `@types/uuid` dependency from the package.json file since
the uuid package version 14.0.0 already includes built-in TypeScript type
definitions. Simply delete the line containing "`@types/uuid`": "^11.0.0" to
eliminate the redundant dependency and avoid potential type conflicts.

In `@frontend/playwright.config.ts`:
- Line 12: The baseURL property in the playwright.config.ts file is hardcoded to
'http://localhost:3000', which causes test runs to fail when the application is
served on a different port or environment. Replace the hardcoded string value
with a dynamic value that reads from the environment variable
PLAYWRIGHT_BASE_URL (or similar) using process.env, and provide a fallback to
'http://localhost:3000' if the environment variable is not set. This allows the
base URL to be configurable without modifying the configuration file.

In `@frontend/src/components/SchemaPlanDisplay.module.css`:
- Around line 28-44: The `:global()` pseudo-class selectors used in the
`.dataTable` rules (such as `:global(.ant-table)`, `:global(.ant-table-thead >
tr > th)`, `:global(.ant-table-tbody > tr > td)`, and `:global(.ant-table-tbody
> tr:hover > td)`) are triggering stylelint's `selector-pseudo-class-no-unknown`
error. To fix this, add a stylelint disable comment at the top of the
SchemaPlanDisplay.module.css file to suppress this specific rule for CSS Modules
global selectors, or alternatively update your project's stylelint configuration
to recognize and allow the `:global()` pseudo-class which is a standard CSS
Modules feature.

In `@frontend/src/components/SchemaPlanDisplay.tsx`:
- Around line 141-146: The render function in the value column uses the ||
operator to provide a fallback empty string, which incorrectly masks valid falsy
values like 0 and false. Replace the current logic that does const val =
record.value || record.val || '' with an explicit null/undefined check, such as
checking if record.value is not null and not undefined before using it,
otherwise fall back to record.val with the same explicit check, and finally to
an empty string. Apply this same fix to the similar code section around lines
215-233 that also uses || operators for rendering limit and filter values.

In `@frontend/src/index.css`:
- Line 1: The `@import` statement at the top of the file violates stylelint rules
regarding URL formatting and font keyword casing. Change the `@import` url()
syntax to use quotes instead (either `@import` 'url' or `@import` "url"), and verify
that all font-related keywords like the display parameter are using correct
casing according to your stylelint configuration. Apply the same fixes to the
similar font import declarations on lines 15-17 to resolve all lint violations
in the font import block.

In `@frontend/src/pages/AgentTestingPage.module.css`:
- Around line 358-359: The CSS custom property `--text-muted` is referenced in
multiple locations (lines 358-359, 419-420, 697, and 766) but is not defined in
the root theme variables, causing unpredictable fallback behavior. Update all
color declarations using var(--text-muted) to include a fallback value using the
CSS syntax var(--text-muted, <fallback-color>), or define the --text-muted
variable in your root theme configuration to ensure consistent color rendering
across the application.
- Around line 172-173: The keyframes animation names in the CSS file are
currently in camelCase format (such as spinLoader), which violates the stylelint
keyframes-name-pattern rule. Rename all keyframe definitions and their
corresponding animation property references to use kebab-case naming convention
(e.g., spinLoader should become spin-loader). This includes updating both the
`@keyframes` declaration and all animation property values that reference these
keyframes throughout the file, including the spinLoader animation and all other
keyframe animations mentioned in the related lines.

In `@frontend/src/pages/AgentTestingPage.tsx`:
- Around line 793-832: The useEffect hook with the fetchTrace function does not
clean up pending timers or cancel in-flight axios requests, which can cause
state updates after component unmount or when traceId changes. Add an
AbortController for the axios request to cancel in-flight requests, store the
timeoutId returned from setTimeout calls so they can be cleared, and return a
cleanup function from the useEffect that calls controller.abort() and clears any
pending timeouts using clearTimeout. Pass the abort signal from the controller
to the axios.get request configuration.

In `@frontend/tests/agent-testing.spec.ts`:
- Around line 83-86: In the agent-testing.spec.ts test file, the reject button
test currently only validates visibility with `toBeVisible()` but the comment
explicitly states the button should be disabled initially. After the existing
`await expect(rejectBtn).toBeVisible();` assertion for the rejectBtn variable,
add an additional assertion using `await expect(rejectBtn).toBeDisabled();` to
actually validate that the button is disabled as intended, ensuring the HITL
guard protection is properly tested.
- Around line 40-51: The test currently makes the View Full Trace button
verification optional by wrapping it in an if statement checking isVisible().
This allows the test to pass even when the button is missing for a completed
response with trace_id, weakening regression coverage. Remove the if condition
that wraps the viewTraceBtn verification and instead use await
expect(viewTraceBtn).toBeVisible() to assert the button must be present and
visible before proceeding with the click and modal verification steps.

In `@frontend/tests/real-agent.spec.ts`:
- Line 6: The page.goto() call in the test uses a hardcoded URL string
'http://localhost:3001/agent-testing' which makes the test environment-specific
and brittle. Replace the hardcoded localhost URL with a relative path that
Playwright can resolve using its configured base URL from the Playwright
configuration. This allows the test to work across different test environments
without modification.

---

Outside diff comments:
In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 253-357: The database session opened at the beginning of the
get_table_profile function remains active during async operations (Esca client
save_data call and cache operations), which can exhaust DB connections under
concurrent load. Reorganize the code to execute all database queries first,
including the EnrichmentVersion fetch that currently happens after the Esca
call, then close the session context before performing any async I/O operations
like the Esca client call and cache set operations. This ensures the session is
released immediately after all synchronous database operations complete.

---

Duplicate comments:
In `@agent/src/agent/nodes/satisfaction_check.py`:
- Around line 126-128: The exception handlers for Check C (lines 126-128) and
Check D (lines 147-149) currently log warnings and continue execution, allowing
routing to proceed as if the checks passed. Instead of simply logging and
continuing when these exceptions occur, mark the check results as failed (likely
by setting the corresponding check flag or variable to False) so that the gate
failure is properly recorded and prevents incorrect routing progression. This
ensures errors are treated as explicit gate failures rather than silent passes.

In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 658-661: The exception handler in the Redis/Trino table
verification block is marking all tables in the tables_used list as hallucinated
whenever any exception occurs, regardless of whether it's an infrastructure
error or actual verification failure. This causes false positives that trigger
unnecessary retries and escalations. Remove the logic that extends the
hallucinated list with all tables_used in the except block when a general
Exception is caught, or modify the exception handling to only add tables to
hallucinated when you can definitively determine they are invalid based on
specific verification errors, not just infrastructure failures.

In `@agent/tests/test_isolation.py`:
- Around line 39-52: The patch target for evaluate_with_llm is incorrect because
the function is imported into the local scope of the test, so patching
core.llm.evaluate_with_llm will not affect the actual call at line 47. Change
the patch decorator or patch call to target the module where evaluate_with_llm
is used (the test module itself) rather than where it's defined. Additionally,
the except ImportError: pass block silently ignores missing dependencies and
allows the test to pass without actually validating behavior. Replace the silent
pass statement with an assertion failure or skip directive so that import
failures are properly reported instead of hiding them.

In `@backend/app/infra_init.py`:
- Line 27: The `import os` statement at line 27 violates Python import
conventions by being placed after other module-level statements, which Ruff
flags as E402. Move the `import os` import to the top of the file with the other
import statements, before any non-import code or module-level logic begins, to
resolve this linting error.

In `@backend/app/main.py`:
- Around line 44-46: Remove the RuntimeError raise in the OPENAI_API_KEY
validation block (currently at lines 44-46) that hard-fails the entire
application startup. Instead, log the error message and allow the application to
continue running. Consider setting a flag or configuration variable to indicate
that the LLM judge functionality is unavailable, so that only judge-dependent
endpoints can be disabled or degraded while other operational and admin routes
remain functional.

In `@backend/app/services/flag_service.py`:
- Around line 89-124: The methods _try_cache_get, _try_cache_set, and
_invalidate are creating and managing event loops to bridge async Redis calls in
synchronous contexts, which causes resource leaks. Replace the async Redis
client with a synchronous Redis client that supports blocking operations, then
remove all the asyncio event loop creation and run_until_complete logic from
these three methods. Instead, call the synchronous versions of the Redis methods
directly (get, setex, and delete respectively) without any event loop
management.

In `@backend/pyproject.toml`:
- Line 13: The langfuse dependency in the pyproject.toml file is specified as
"langfuse>=2.0.0", which permits any version from 2.0.0 onwards and can
introduce runtime incompatibilities. Update this dependency constraint to
specify a tested minor version range by adding an upper bound, such as
constraining it to a specific 2.x range (e.g., "langfuse>=2.0.0,<2.x.x" or
similar) to prevent installation of untested versions that could cause tracing
path incompatibilities.

In `@core/src/core/cache.py`:
- Around line 134-139: The singleton pattern in get_cache_service does not
validate the redis_url parameter against the URL used when creating the cached
instance, so different URLs passed in subsequent calls are ignored. Modify the
function to either verify that all calls use the same redis_url as the initial
creation (raising an error if they differ), or implement logic to handle URL
changes appropriately. This prevents the CacheService from inadvertently
connecting to the wrong Redis endpoint and causing cross-contamination between
environments.

In `@core/src/core/trino.py`:
- Around line 72-75: The current code uses a truthiness check `if params:` in
the execute block which incorrectly treats empty parameter containers (empty
tuple, list, or dict) as falsy and skips parameterized execution. Replace the
condition `if params:` with an explicit `None` check `if params is not None:` to
ensure parameterized execution is called whenever parameters are provided, even
if they are empty collections. This applies to both cur.execute() calls in the
conditional block.

In `@frontend/src/api/flags.ts`:
- Around line 53-75: The name parameter is being interpolated directly into URL
paths in the update, reset, getMode, upsertMode, and deleteMode methods without
proper encoding. Reserved or special characters in the name could break the URL
structure or route requests incorrectly. Wrap each instance of the name variable
with encodeURIComponent() when constructing the URL path segments in all these
methods to ensure the name is properly URL-encoded before being used in the
fetchWithAdminEmail calls.

In `@frontend/src/pages/FlagsPage.tsx`:
- Around line 422-437: The submitMode function allows submission of mode data
with blank or whitespace-only names because there is no validation before
calling upsertModeMutation.mutate(). Add a guard check at the beginning of the
submitMode function to validate that modeDraft.name is not empty and not just
whitespace. If the name is invalid, either return early or display an
appropriate error message to the user before proceeding with the mutation call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 1ea76503-d121-4912-96c9-dae6ea649700

📥 Commits

Reviewing files that changed from the base of the PR and between 3a80177 and 861e1e1.

⛔ Files ignored due to path filters (3)
  • agent/uv.lock is excluded by !**/*.lock
  • backend/uv.lock is excluded by !**/*.lock
  • frontend/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (92)
  • .gitignore
  • agent/.coverage
  • agent/Dockerfile
  • agent/pyproject.toml
  • agent/src/agent/config.py
  • agent/src/agent/graph.py
  • agent/src/agent/llm.py
  • agent/src/agent/mcp_server.py
  • agent/src/agent/nodes/extractor.py
  • agent/src/agent/nodes/finalizer.py
  • agent/src/agent/nodes/init_flags.py
  • agent/src/agent/nodes/init_skills.py
  • agent/src/agent/nodes/query_builder.py
  • agent/src/agent/nodes/refiner.py
  • agent/src/agent/nodes/satisfaction_check.py
  • agent/src/agent/nodes/schema_explorer.py
  • agent/src/agent/routers/chat.py
  • agent/src/agent/state.py
  • agent/src/agent/utils/esca.py
  • agent/src/agent/utils/flag_bridge.py
  • agent/src/agent/utils/jeen_client.py
  • agent/src/agent/utils/redis_publisher.py
  • agent/src/agent/utils/schema_enrichment.py
  • agent/src/agent/utils/skill_registry.py
  • agent/src/agent/utils/sql.py
  • agent/tests/conftest.py
  • agent/tests/test_cache_and_gates.py
  • agent/tests/test_isolation.py
  • agent/tests/test_resilience.py
  • agent/tests/test_routing.py
  • backend/Dockerfile
  • backend/alembic/versions/f9a3d1c8e205_add_config_schema_flags.py
  • backend/app/config.py
  • backend/app/infra_init.py
  • backend/app/main.py
  • backend/app/routers/__init__.py
  • backend/app/routers/admin_approval.py
  • backend/app/routers/admin_auth.py
  • backend/app/routers/agent.py
  • backend/app/routers/audit.py
  • backend/app/routers/enrichment.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/feedback.py
  • backend/app/routers/flags.py
  • backend/app/routers/health.py
  • backend/app/routers/orchestration.py
  • backend/app/routers/profiling.py
  • backend/app/routers/publish.py
  • backend/app/routers/questions.py
  • backend/app/routers/scopes.py
  • backend/app/services/auth.py
  • backend/app/services/evaluator.py
  • backend/app/services/flag_service.py
  • backend/app/services/join_detection.py
  • backend/app/services/langfuse_client.py
  • backend/app/services/llm_judge.py
  • backend/app/services/profiling_engine.py
  • backend/app/services/scoring.py
  • backend/app/services/trino_client.py
  • backend/pyproject.toml
  • backend/tests/conftest.py
  • core/pyproject.toml
  • core/src/core/__init__.py
  • core/src/core/cache.py
  • core/src/core/config.py
  • core/src/core/langfuse.py
  • core/src/core/models/models.py
  • core/src/core/trino.py
  • docker-compose.yml
  • frontend/.env
  • frontend/index.html
  • frontend/package.json
  • frontend/playwright.config.ts
  • frontend/src/App.tsx
  • frontend/src/api/agent.ts
  • frontend/src/api/flags.ts
  • frontend/src/components/AgentGraph.tsx
  • frontend/src/components/JsonTreeView.tsx
  • frontend/src/components/SchemaPlanDisplay.module.css
  • frontend/src/components/SchemaPlanDisplay.tsx
  • frontend/src/components/TraceTimeline.tsx
  • frontend/src/components/layout/Sidebar.tsx
  • frontend/src/config/constants.ts
  • frontend/src/index.css
  • frontend/src/pages/AgentTestingPage.module.css
  • frontend/src/pages/AgentTestingPage.tsx
  • frontend/src/pages/FlagsPage.css
  • frontend/src/pages/FlagsPage.tsx
  • frontend/src/pages/LandingPage.tsx
  • frontend/src/tests/components.test.tsx
  • frontend/tests/agent-testing.spec.ts
  • frontend/tests/real-agent.spec.ts
💤 Files with no reviewable changes (2)
  • backend/app/services/trino_client.py
  • backend/app/services/scoring.py

Comment thread agent/Dockerfile Outdated
Comment on lines +16 to +17
# Sync dependencies via HTTPS
RUN uv sync --no-dev

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n agent/Dockerfile

Repository: StavPonte11/text2sql-onboarding

Length of output: 726


🌐 Web query:

uv sync --frozen flag documentation python uv package manager

💡 Result:

The --frozen flag in the uv package manager is an option used with the uv sync command (and related commands like uv run) to manage project environments without modifying the lockfile [1][2][3]. When you use uv sync --frozen, the following behaviors apply: 1. Skip Locking: It instructs uv to skip the re-locking process [4][3]. Typically, uv re-locks the project before syncing to ensure the environment matches the latest project configuration [3]. 2. Use Existing Lockfile: It treats the existing uv.lock file as the definitive source of truth for dependency versions [2]. 3. No Updates: It does not check if the lockfile is up-to-date with the pyproject.toml file [2]. Consequently, if there are changes to dependencies in pyproject.toml that have not yet been incorporated into the lockfile, those changes will not be reflected in the environment [2]. 4. Error on Missing Lockfile: If the uv.lock file is missing, uv will exit with an error [2]. This flag can also be set via the environment variable UV_FROZEN [2]. It is distinct from the --locked flag, which asserts that the lockfile is up-to-date and errors if the lockfile needs to be changed to match the current dependencies [2][4].

Citations:


🏁 Script executed:

find . -name "uv.lock" -type f

Repository: StavPonte11/text2sql-onboarding

Length of output: 138


🏁 Script executed:

rg "uv sync" --type dockerfile --type toml --type py

Repository: StavPonte11/text2sql-onboarding

Length of output: 110


🏁 Script executed:

rg "uv sync" --type-list | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 465


🏁 Script executed:

rg "uv sync"

Repository: StavPonte11/text2sql-onboarding

Length of output: 192


Use --frozen flag to ensure deterministic dependency resolution in container builds.

Line 17 uses uv sync --no-dev which can re-resolve dependency versions when the lock state changes, making builds nondeterministic. Add the --frozen flag to enforce the existing lockfile without re-locking, ensuring consistent and reproducible container images.

Proposed fix
- RUN uv sync --no-dev
+ RUN uv sync --no-dev --frozen
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Sync dependencies via HTTPS
RUN uv sync --no-dev
# Sync dependencies via HTTPS
RUN uv sync --no-dev --frozen
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/Dockerfile` around lines 16 - 17, The uv sync command on line 17 lacks
the --frozen flag, which allows dependency re-resolution and can result in
nondeterministic builds. Add the --frozen flag to the existing uv sync --no-dev
command to enforce the lockfile without re-locking, ensuring reproducible
container builds.

Comment thread agent/pyproject.toml Outdated
Comment on lines +24 to +27
[tool.uv.sources]
core = { path = "../core" }
esca-sdk = { git = "https://github.com/elirazpevz/esca.git", subdirectory = "packages/esca-sdk" }
python-core-utils = { path = "../python-core-utils" }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking local source dependency declarations..."
rg -n 'python-core-utils\s*=\s*\{ path = "\.\./python-core-utils" \}' agent/pyproject.toml backend/pyproject.toml

echo
echo "Checking whether python-core-utils exists in this checkout:"
if [ -d python-core-utils ]; then
  echo "FOUND: python-core-utils/"
else
  echo "MISSING: python-core-utils/"
fi

echo
echo "Tracked files under python-core-utils (if any):"
git ls-files | rg '^python-core-utils/' || true

Repository: StavPonte11/text2sql-onboarding

Length of output: 421


Remove the untracked local path dependency that will fail in CI and clean clones.

agent/pyproject.toml line 27 declares python-core-utils = { path = "../python-core-utils" }, but the directory doesn't exist and isn't tracked in git. This breaks dependency resolution in clean-clone and CI environments. Also present in backend/pyproject.toml line 104.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/pyproject.toml` around lines 24 - 27, Remove the untracked local path
dependency for python-core-utils from the [tool.uv.sources] section in
agent/pyproject.toml (the line declaring python-core-utils = { path =
"../python-core-utils" }). Also locate and remove the identical
python-core-utils path dependency declaration from backend/pyproject.toml at
line 104. These local path references point to non-existent directories that
aren't tracked in git and will cause dependency resolution failures in CI and
clean-clone environments.

Comment thread agent/src/agent/llm.py
Comment on lines +48 to +55
model_flag = _NODE_MODEL_FLAGS.get(node, "QUERY_BUILDER_MODEL")
model = flags.get(model_flag) or settings.LLM_MODEL

# Resolve temperature
if temperature is None:
temp_flag = _NODE_TEMP_FLAGS.get(node)
temperature = float(flags.get(temp_flag, 0.0)) if temp_flag else 0.0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Per-node env fallback is bypassed during model/temperature resolution.

Line 49 and Line 54 force generic fallbacks (settings.LLM_MODEL and 0.0), so node-specific env defaults are ignored whenever runtime flags are absent. That breaks the documented priority and makes non-flagged runs behave differently from flagged runs.

💡 Proposed fix
-    model_flag = _NODE_MODEL_FLAGS.get(node, "QUERY_BUILDER_MODEL")
-    model = flags.get(model_flag) or settings.LLM_MODEL
+    model_flag = _NODE_MODEL_FLAGS.get(node, "QUERY_BUILDER_MODEL")
+    model = flags.get(model_flag) or getattr(settings, model_flag, settings.LLM_MODEL)

     # Resolve temperature
     if temperature is None:
         temp_flag = _NODE_TEMP_FLAGS.get(node)
-        temperature = float(flags.get(temp_flag, 0.0)) if temp_flag else 0.0
+        default_temp = getattr(settings, temp_flag, 0.0) if temp_flag else 0.0
+        raw_temp = flags.get(temp_flag, default_temp) if temp_flag else default_temp
+        temperature = float(raw_temp)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/llm.py` around lines 48 - 55, The model and temperature
resolution logic bypasses per-node environment variable defaults by immediately
falling back to generic defaults. In the code block where model is resolved
(around line 49), after checking the runtime flags dictionary with model_flag,
you need to check for a per-node environment variable before falling back to
settings.LLM_MODEL. Similarly, in the temperature resolution (around line 54),
after checking the temp_flag in the flags dictionary, check for a per-node
environment variable before falling back to 0.0. Implement per-node environment
variable mappings (similar to the existing _NODE_MODEL_FLAGS and
_NODE_TEMP_FLAGS dictionaries) and use os.environ.get() to retrieve those
environment values in the resolution chain, ensuring the priority order is:
runtime flags → per-node environment variables → generic defaults.

Comment on lines +43 to +49
try:
from core.langfuse import get_langfuse_handler
langfuse_handler = get_langfuse_handler()
callbacks = [langfuse_handler] if langfuse_handler else []
except Exception:
callbacks = []

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Initialize langfuse_handler before the guarded import path.

langfuse_handler is referenced at Line 125, but if the Line 44-46 block throws, it is never assigned and this path raises UnboundLocalError.

Suggested fix
-    try:
+    langfuse_handler = None
+    try:
         from core.langfuse import get_langfuse_handler
         langfuse_handler = get_langfuse_handler()
         callbacks = [langfuse_handler] if langfuse_handler else []
     except Exception:
         callbacks = []

Also applies to: 124-125

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/mcp_server.py` around lines 43 - 49, The `langfuse_handler`
variable is not initialized before the try-except block that contains the
guarded import, which means if an exception is raised during the import or
function call to get_langfuse_handler(), the variable will never be assigned.
This causes an UnboundLocalError when `langfuse_handler` is later referenced.
Initialize `langfuse_handler` to None before the try block (before the attempt
to import and call get_langfuse_handler) so that the variable is always defined
regardless of whether the try block succeeds or fails.

Comment on lines 53 to +56
langfuse_prompt = langfuse_client.get_prompt(settings.LANGFUSE_PROMPT_EXTRACTOR)
self.prompt = ChatPromptTemplate.from_messages(langfuse_prompt.get_langchain_prompt())

self.chain = self.prompt | self.llm.with_structured_output(ExtractorOutput, method="json_schema")
self.prompt = ChatPromptTemplate.from_messages(
langfuse_prompt.get_langchain_prompt()
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard prompt retrieval before dereferencing Langfuse prompt objects.

Line 55 assumes langfuse_client.get_prompt(...) always returns an object. Upstream contract allows None, so this can raise and fail extractor setup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/nodes/extractor.py` around lines 53 - 56, Add a guard check
after calling langfuse_client.get_prompt(settings.LANGFUSE_PROMPT_EXTRACTOR) to
verify that langfuse_prompt is not None before attempting to call
get_langchain_prompt() on it. The upstream contract allows get_prompt() to
return None, so this null check is required to prevent an AttributeError during
extractor initialization. If langfuse_prompt is None, raise an informative error
that indicates the prompt could not be retrieved from Langfuse.

Comment on lines +141 to +146
title: 'Value',
key: 'value',
render: (record: any) => {
const val = record.value || record.val || '';
return <Text code>{renderCellSafe(val)}</Text>;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve valid falsy values in filter/limit rendering.

Using || here masks valid values like 0/false, and limit checks hide LIMIT 0. That can render incorrect plan details to users.

Suggested fix
-        const val = record.value || record.val || '';
+        const val = record.value ?? record.val ?? '';
         return <Text code>{renderCellSafe(val)}</Text>;
       }
     }
   ];
@@
-      {(planData.order_by?.length > 0 || planData.limit) && (
+      {(planData.order_by?.length > 0 || planData.limit != null) && (
@@
-            {planData.limit && (
+            {planData.limit != null && (
               <div>
                 <Text strong>Limit: </Text> <Tag color="magenta">{planData.limit}</Tag>
               </div>
             )}

Also applies to: 215-233

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/SchemaPlanDisplay.tsx` around lines 141 - 146, The
render function in the value column uses the || operator to provide a fallback
empty string, which incorrectly masks valid falsy values like 0 and false.
Replace the current logic that does const val = record.value || record.val || ''
with an explicit null/undefined check, such as checking if record.value is not
null and not undefined before using it, otherwise fall back to record.val with
the same explicit check, and finally to an empty string. Apply this same fix to
the similar code section around lines 215-233 that also uses || operators for
rendering limit and filter values.

Comment on lines +793 to +832
useEffect(() => {
if (!traceId) {
setTraceSpans([]);
setSelectedStep(null);
setSelectedStepIndex(null);
return;
}

setLoadingTrace(true);
let attempts = 0;
const maxAttempts = 5;

const fetchTrace = () => {
axios.get(`/api/agent/traces/${traceId}`)
.then(res => {
const data = res.data || [];
if (data.length > 0) {
setTraceSpans(data);
setLoadingTrace(false);
} else if (attempts < maxAttempts) {
attempts++;
setTimeout(fetchTrace, 2000); // retry in 2s
} else {
setTraceSpans([]);
setLoadingTrace(false);
}
})
.catch(err => {
console.error("Failed to fetch trace", err);
if (attempts < maxAttempts) {
attempts++;
setTimeout(fetchTrace, 2000);
} else {
setLoadingTrace(false);
}
});
};

fetchTrace();
}, [traceId]);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cancel pending trace poll timers/requests in the polling effect.

This effect schedules recursive retries but does not clean up timers or in-flight requests, so stale callbacks can update state after unmount or traceId changes.

Suggested fix
   useEffect(() => {
+    let cancelled = false;
+    let retryTimer: ReturnType<typeof setTimeout> | null = null;
+    const controller = new AbortController();
+
     if (!traceId) {
       setTraceSpans([]);
       setSelectedStep(null);
       setSelectedStepIndex(null);
-      return;
+      return () => {
+        cancelled = true;
+        controller.abort();
+        if (retryTimer) clearTimeout(retryTimer);
+      };
     }

     setLoadingTrace(true);
     let attempts = 0;
     const maxAttempts = 5;

     const fetchTrace = () => {
-      axios.get(`/api/agent/traces/${traceId}`)
+      axios.get(`/api/agent/traces/${traceId}`, { signal: controller.signal })
         .then(res => {
+          if (cancelled) return;
           const data = res.data || [];
           if (data.length > 0) {
             setTraceSpans(data);
             setLoadingTrace(false);
           } else if (attempts < maxAttempts) {
             attempts++;
-            setTimeout(fetchTrace, 2000); // retry in 2s
+            retryTimer = setTimeout(fetchTrace, 2000); // retry in 2s
           } else {
             setTraceSpans([]);
             setLoadingTrace(false);
           }
         })
         .catch(err => {
+          if (cancelled) return;
           console.error("Failed to fetch trace", err);
           if (attempts < maxAttempts) {
             attempts++;
-            setTimeout(fetchTrace, 2000);
+            retryTimer = setTimeout(fetchTrace, 2000);
           } else {
             setLoadingTrace(false);
           }
         });
     };

     fetchTrace();
+    return () => {
+      cancelled = true;
+      controller.abort();
+      if (retryTimer) clearTimeout(retryTimer);
+    };
   }, [traceId]);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
useEffect(() => {
if (!traceId) {
setTraceSpans([]);
setSelectedStep(null);
setSelectedStepIndex(null);
return;
}
setLoadingTrace(true);
let attempts = 0;
const maxAttempts = 5;
const fetchTrace = () => {
axios.get(`/api/agent/traces/${traceId}`)
.then(res => {
const data = res.data || [];
if (data.length > 0) {
setTraceSpans(data);
setLoadingTrace(false);
} else if (attempts < maxAttempts) {
attempts++;
setTimeout(fetchTrace, 2000); // retry in 2s
} else {
setTraceSpans([]);
setLoadingTrace(false);
}
})
.catch(err => {
console.error("Failed to fetch trace", err);
if (attempts < maxAttempts) {
attempts++;
setTimeout(fetchTrace, 2000);
} else {
setLoadingTrace(false);
}
});
};
fetchTrace();
}, [traceId]);
useEffect(() => {
let cancelled = false;
let retryTimer: ReturnType<typeof setTimeout> | null = null;
const controller = new AbortController();
if (!traceId) {
setTraceSpans([]);
setSelectedStep(null);
setSelectedStepIndex(null);
return () => {
cancelled = true;
controller.abort();
if (retryTimer) clearTimeout(retryTimer);
};
}
setLoadingTrace(true);
let attempts = 0;
const maxAttempts = 5;
const fetchTrace = () => {
axios.get(`/api/agent/traces/${traceId}`, { signal: controller.signal })
.then(res => {
if (cancelled) return;
const data = res.data || [];
if (data.length > 0) {
setTraceSpans(data);
setLoadingTrace(false);
} else if (attempts < maxAttempts) {
attempts++;
retryTimer = setTimeout(fetchTrace, 2000); // retry in 2s
} else {
setTraceSpans([]);
setLoadingTrace(false);
}
})
.catch(err => {
if (cancelled) return;
console.error("Failed to fetch trace", err);
if (attempts < maxAttempts) {
attempts++;
retryTimer = setTimeout(fetchTrace, 2000);
} else {
setLoadingTrace(false);
}
});
};
fetchTrace();
return () => {
cancelled = true;
controller.abort();
if (retryTimer) clearTimeout(retryTimer);
};
}, [traceId]);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/AgentTestingPage.tsx` around lines 793 - 832, The
useEffect hook with the fetchTrace function does not clean up pending timers or
cancel in-flight axios requests, which can cause state updates after component
unmount or when traceId changes. Add an AbortController for the axios request to
cancel in-flight requests, store the timeoutId returned from setTimeout calls so
they can be cleared, and return a cleanup function from the useEffect that calls
controller.abort() and clears any pending timeouts using clearTimeout. Pass the
abort signal from the controller to the axios.get request configuration.

Comment on lines +40 to +51
const viewTraceBtn = page.getByRole('button', { name: /View Full Trace/i });

// If the button is visible, click it and verify the modal
if (await viewTraceBtn.isVisible()) {
await viewTraceBtn.click();

const modalTitle = page.getByText('Execution Trace', { exact: true });
await expect(modalTitle).toBeVisible();

// Close modal
await page.locator('button.ant-modal-close').click();
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Make trace-view assertion deterministic in completed flow.

This branch currently allows the test to pass even when View Full Trace is missing for a completed response with trace_id, which weakens regression coverage.

Suggested change
-    // If the button is visible, click it and verify the modal
-    if (await viewTraceBtn.isVisible()) {
-      await viewTraceBtn.click();
-      
-      const modalTitle = page.getByText('Execution Trace', { exact: true });
-      await expect(modalTitle).toBeVisible();
-      
-      // Close modal
-      await page.locator('button.ant-modal-close').click();
-    }
+    await expect(viewTraceBtn).toBeVisible();
+    await viewTraceBtn.click();
+    const modalTitle = page.getByText('Execution Trace', { exact: true });
+    await expect(modalTitle).toBeVisible();
+    await page.locator('button.ant-modal-close').click();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const viewTraceBtn = page.getByRole('button', { name: /View Full Trace/i });
// If the button is visible, click it and verify the modal
if (await viewTraceBtn.isVisible()) {
await viewTraceBtn.click();
const modalTitle = page.getByText('Execution Trace', { exact: true });
await expect(modalTitle).toBeVisible();
// Close modal
await page.locator('button.ant-modal-close').click();
}
const viewTraceBtn = page.getByRole('button', { name: /View Full Trace/i });
await expect(viewTraceBtn).toBeVisible();
await viewTraceBtn.click();
const modalTitle = page.getByText('Execution Trace', { exact: true });
await expect(modalTitle).toBeVisible();
await page.locator('button.ant-modal-close').click();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/tests/agent-testing.spec.ts` around lines 40 - 51, The test
currently makes the View Full Trace button verification optional by wrapping it
in an if statement checking isVisible(). This allows the test to pass even when
the button is missing for a completed response with trace_id, weakening
regression coverage. Remove the if condition that wraps the viewTraceBtn
verification and instead use await expect(viewTraceBtn).toBeVisible() to assert
the button must be present and visible before proceeding with the click and
modal verification steps.

Comment on lines +83 to +86
// The reject button should be visible but disabled initially
const rejectBtn = page.getByRole('button', { name: 'Reject' });
await expect(rejectBtn).toBeVisible();
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Assert the reject button disabled state as intended.

The test comment states the button should be disabled initially, but the assertion only checks visibility. Add toBeDisabled() so the intended HITL guard is actually validated.

Suggested change
     const rejectBtn = page.getByRole('button', { name: 'Reject' });
     await expect(rejectBtn).toBeVisible();
+    await expect(rejectBtn).toBeDisabled();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/tests/agent-testing.spec.ts` around lines 83 - 86, In the
agent-testing.spec.ts test file, the reject button test currently only validates
visibility with `toBeVisible()` but the comment explicitly states the button
should be disabled initially. After the existing `await
expect(rejectBtn).toBeVisible();` assertion for the rejectBtn variable, add an
additional assertion using `await expect(rejectBtn).toBeDisabled();` to actually
validate that the button is disabled as intended, ensuring the HITL guard
protection is properly tested.

test.describe('Real Agent Execution', () => {
test('should run query against real backend', async ({ page }) => {
// Navigate to the Studio / Agent Testing page using the local dev server
await page.goto('http://localhost:3001/agent-testing');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use configured Playwright base URL instead of hardcoded localhost.

Hardcoding host/port makes this test environment-specific and brittle when test infra uses a different base URL.

Suggested change
-    await page.goto('http://localhost:3001/agent-testing');
+    await page.goto('/agent-testing');
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await page.goto('http://localhost:3001/agent-testing');
await page.goto('/agent-testing');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/tests/real-agent.spec.ts` at line 6, The page.goto() call in the
test uses a hardcoded URL string 'http://localhost:3001/agent-testing' which
makes the test environment-specific and brittle. Replace the hardcoded localhost
URL with a relative path that Playwright can resolve using its configured base
URL from the Playwright configuration. This allows the test to work across
different test environments without modification.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review continued from previous batch...

Comment on lines +48 to +55
existing = session.exec(
select(EnrichmentVersion)
.where(EnrichmentVersion.table_id == table_id)
.order_by(EnrichmentVersion.version.desc())
).first()

next_version = (existing.version + 1) if existing else 1
existing_data: dict = dict(existing.data) if (existing and existing.data) else {}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Version allocation is race-prone under concurrent profiling jobs.

At Lines 48-55, next_version is derived from a read-then-write pattern without concurrency control. Two workers can pick the same next version for the same table_id, causing ambiguous “latest version” reads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/profiling.py` around lines 48 - 55, The current
implementation has a race condition where multiple concurrent workers can query
the same existing EnrichmentVersion for a given table_id, read the same version
number, and calculate the same next_version value before any of them write. To
fix this, add database-level locking to the select query that retrieves the
existing EnrichmentVersion. Modify the select statement that queries for
EnrichmentVersion by table_id to include pessimistic locking using SQLAlchemy's
with_for_update() method, ensuring that only one worker acquires and increments
the version at a time. This makes the read-then-increment operation atomic at
the database level, preventing concurrent workers from selecting the same
next_version.

Comment on lines +57 to +63
has_human_description = bool(existing_data.get("table_description", "").strip())

new_data = dict(existing_data)
new_data["ai_summary"] = summary
if not has_human_description:
# Promote ai_summary to table_description only when no human annotation exists
new_data["table_description"] = summary

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

AI summaries can become permanently stale after the first run.

At Line 57, any non-empty table_description is treated as human-authored. After the first AI-generated promotion, future runs update only ai_summary, but downstream readers prefer table_description, so they keep serving the old summary.

Suggested fix
-    has_human_description = bool(existing_data.get("table_description", "").strip())
+    current_desc = (existing_data.get("table_description") or "").strip()
+    previous_ai = (existing_data.get("ai_summary") or "").strip()
+    # Treat description as human-authored only if it differs from last AI summary.
+    has_human_description = bool(current_desc and current_desc != previous_ai)

     new_data = dict(existing_data)
     new_data["ai_summary"] = summary
     if not has_human_description:
         # Promote ai_summary to table_description only when no human annotation exists
         new_data["table_description"] = summary
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/profiling.py` around lines 57 - 63, The current logic at
line 57 cannot distinguish between human-authored table descriptions and
AI-generated ones after the initial promotion. Once an ai_summary is promoted to
table_description on the first run, all subsequent runs treat it as
human-authored (because it's non-empty) and only update ai_summary, leaving
table_description permanently stale. Add a metadata field or flag to track
whether table_description is AI-generated or human-authored, then update the
has_human_description check to only consider truly human-authored descriptions,
allowing AI summaries to be re-promoted in future runs when actual human
annotations are absent.

Comment thread backend/app/services/llm_judge.py Outdated
reasoning={"error": "OPENAI_API_KEY is missing. Evaluation skipped."},
confidence_in_judgment=0.0,
)
api_key = settings.OPENAI_API_KEY

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore fail-fast guard for missing OpenAI key.

At Line 129, direct key usage removes the early “key missing” short-circuit and now attempts remote calls even when misconfigured. This adds avoidable latency and noisy failures on every judge invocation.

Suggested fix
-    api_key = settings.OPENAI_API_KEY
+    api_key = getattr(settings, "OPENAI_API_KEY", None)
+    if not api_key:
+        logger.warning("LLM Judge skipped: OPENAI_API_KEY is not configured")
+        return JudgeOutput(
+            table_selection_correctness=0.0,
+            sql_semantic_equivalence=0.0,
+            result_correctness=0.0,
+            hallucination_detected=False,
+            failure_type="judge_unavailable",
+            reasoning={"error": "OPENAI_API_KEY is not configured"},
+            confidence_in_judgment=0.0,
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
api_key = settings.OPENAI_API_KEY
api_key = getattr(settings, "OPENAI_API_KEY", None)
if not api_key:
logger.warning("LLM Judge skipped: OPENAI_API_KEY is not configured")
return JudgeOutput(
table_selection_correctness=0.0,
sql_semantic_equivalence=0.0,
result_correctness=0.0,
hallucination_detected=False,
failure_type="judge_unavailable",
reasoning={"error": "OPENAI_API_KEY is not configured"},
confidence_in_judgment=0.0,
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/llm_judge.py` at line 129, The api_key assignment at
line 129 directly retrieves settings.OPENAI_API_KEY without validating it exists
first, removing the fail-fast guard and allowing the code to proceed to remote
calls even when misconfigured. Add a validation check immediately after the
api_key assignment that verifies the key is not None or empty, raising an
appropriate exception (such as ValueError or a configuration error) if the key
is missing. This will restore the early short-circuit behavior and prevent
unnecessary latency and failures on each judge invocation.

Comment on lines +1097 to +1103
+ "\n".join(f" • {l}" for l in col_lines)
+ "\n\nWrite a concise ≤3-sentence description of this table's purpose, "
"what business domain it represents, and which columns are most important "
"for querying. Be specific about what the table contains."
)

import httpx, json as _json

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix Ruff errors in the new summary helper.

Line 1097 uses ambiguous variable name l (E741), and Line 1103 uses multiple imports on one line (E401).

Suggested fix
-            + "\n".join(f"  • {l}" for l in col_lines)
+            + "\n".join(f"  • {line}" for line in col_lines)
             + "\n\nWrite a concise ≤3-sentence description of this table's purpose, "
             "what business domain it represents, and which columns are most important "
             "for querying. Be specific about what the table contains."
         )

-        import httpx, json as _json
+        import httpx
🧰 Tools
🪛 Ruff (0.15.17)

[error] 1097-1097: Ambiguous variable name: l

(E741)


[error] 1103-1103: Multiple imports on one line

Split imports

(E401)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/profiling_engine.py` around lines 1097 - 1103, Fix two
Ruff linting errors in the summary helper function. First, replace the ambiguous
single-letter variable name `l` in the list comprehension (on the line with
col_lines) with a more descriptive name like `line` to resolve the E741 error.
Second, split the multi-import statement on line 1103 (the line with `import
httpx, json as _json`) into two separate import statements, one for `httpx` and
one for `json as _json`, to resolve the E401 error.

Source: Linters/SAST tools

Comment thread frontend/src/index.css
@@ -1,22 +1,24 @@
@import url('https://fonts.googleapis.com/css2?family=Fira+Code:wght@400;500;600;700&family=Fira+Sans:wght@300;400;500;600;700&display=swap');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stylelint violations in font import and font keyword casing.

Current declarations conflict with active stylelint rules and will keep lint red.

Proposed fix
-@import url('https://fonts.googleapis.com/css2?family=Fira+Code:wght@400;500;600;700&family=Fira+Sans:wght@300;400;500;600;700&display=swap');
+@import 'https://fonts.googleapis.com/css2?family=Fira+Code:wght@400;500;600;700&family=Fira+Sans:wght@300;400;500;600;700&display=swap';
@@
-  --sans: 'Fira Sans', system-ui, 'Segoe UI', Roboto, sans-serif;
-  --heading: 'Fira Sans', system-ui, 'Segoe UI', Roboto, sans-serif;
-  --mono: 'Fira Code', ui-monospace, Consolas, monospace;
+  --sans: 'Fira Sans', system-ui, 'Segoe UI', roboto, sans-serif;
+  --heading: 'Fira Sans', system-ui, 'Segoe UI', roboto, sans-serif;
+  --mono: 'Fira Code', ui-monospace, consolas, monospace;

Also applies to: 15-17

🧰 Tools
🪛 Stylelint (17.13.0)

[error] 1-1: Expected "url('https://fonts.googleapis.com/css2?family=Fira+Code:wght@400;500;600;700&family=Fira+Sans:wght@300;400;500;600;700&display=swap')" to be "'https://fonts.googleapis.com/css2?family=Fira+Code:wght@400;500;600;700&family=Fira+Sans:wght@300;400;500;600;700&display=swap'" (import-notation)

(import-notation)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/index.css` at line 1, The `@import` statement at the top of the
file violates stylelint rules regarding URL formatting and font keyword casing.
Change the `@import` url() syntax to use quotes instead (either `@import` 'url' or
`@import` "url"), and verify that all font-related keywords like the display
parameter are using correct casing according to your stylelint configuration.
Apply the same fixes to the similar font import declarations on lines 15-17 to
resolve all lint violations in the font import block.

Source: Linters/SAST tools

Comment on lines +172 to +173
animation: spinLoader 1s cubic-bezier(0.55, 0.25, 0.25, 0.85) infinite;
box-shadow: 0 0 15px rgba(56, 189, 248, 0.2);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Rename keyframes to kebab-case to satisfy the configured lint rule.

These names currently violate keyframes-name-pattern and keep stylelint failing.

Proposed fix
-.customSpinner {
+.customSpinner {
@@
-  animation: spinLoader 1s cubic-bezier(0.55, 0.25, 0.25, 0.85) infinite;
+  animation: spin-loader 1s cubic-bezier(0.55, 0.25, 0.25, 0.85) infinite;
 }
 
-@keyframes spinLoader {
+@keyframes spin-loader {
@@
-.spinnerText {
+.spinnerText {
@@
-  animation: pulseText 2s ease-in-out infinite;
+  animation: pulse-text 2s ease-in-out infinite;
 }
 
-@keyframes pulseText {
+@keyframes pulse-text {
@@
-.animateIn {
-  animation: fadeInUp 0.5s ease forwards;
+.animateIn {
+  animation: fade-in-up 0.5s ease forwards;
 }
 
-@keyframes fadeInUp {
+@keyframes fade-in-up {
@@
-.pulseIcon {
-  animation: pulseIconAnim 1.5s infinite ease-in-out;
+.pulseIcon {
+  animation: pulse-icon-anim 1.5s infinite ease-in-out;
 }
 
-@keyframes pulseIconAnim {
+@keyframes pulse-icon-anim {
@@
-.pulseDot {
+.pulseDot {
@@
-  animation: pulseDotAnim 1.5s infinite ease-in-out;
+  animation: pulse-dot-anim 1.5s infinite ease-in-out;
 }
 
-@keyframes pulseDotAnim {
+@keyframes pulse-dot-anim {

Also applies to: 176-202, 771-771, 774-790

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/AgentTestingPage.module.css` around lines 172 - 173, The
keyframes animation names in the CSS file are currently in camelCase format
(such as spinLoader), which violates the stylelint keyframes-name-pattern rule.
Rename all keyframe definitions and their corresponding animation property
references to use kebab-case naming convention (e.g., spinLoader should become
spin-loader). This includes updating both the `@keyframes` declaration and all
animation property values that reference these keyframes throughout the file,
including the spinLoader animation and all other keyframe animations mentioned
in the related lines.

Source: Linters/SAST tools

Comment on lines +358 to +359
color: var(--text-muted);
font-family: var(--mono);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a defined fallback for --text-muted color tokens.

--text-muted is referenced here but not defined in the provided root theme vars, so these declarations can resolve invalidly and fall back unpredictably.

Proposed fix
-  color: var(--text-muted);
+  color: var(--text-muted, var(--text));
@@
-  color: var(--text-muted);
+  color: var(--text-muted, var(--text));
@@
-  color: var(--text-muted) !important;
+  color: var(--text-muted, var(--text)) !important;
@@
-  color: var(--text-muted);
+  color: var(--text-muted, var(--text));

Also applies to: 419-420, 697-697, 766-766

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/pages/AgentTestingPage.module.css` around lines 358 - 359, The
CSS custom property `--text-muted` is referenced in multiple locations (lines
358-359, 419-420, 697, and 766) but is not defined in the root theme variables,
causing unpredictable fallback behavior. Update all color declarations using
var(--text-muted) to include a fallback value using the CSS syntax
var(--text-muted, <fallback-color>), or define the --text-muted variable in your
root theme configuration to ensure consistent color rendering across the
application.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
frontend/src/components/SchemaPlanDisplay.tsx (1)

151-152: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use nullish checks so valid falsy plan values are not hidden.

Line 151 uses ||, which masks valid values like 0/false, and Lines 222/236 hide LIMIT 0 due to truthy checks. This can render incorrect plan details.

Suggested minimal fix
-        const val = record.value || record.val || '';
+        const val = record.value ?? record.val ?? '';
@@
-      {(planData.order_by?.length > 0 || planData.limit) && (
+      {(planData.order_by?.length > 0 || planData.limit != null) && (
@@
-            {planData.limit && (
+            {planData.limit != null && (

Also applies to: 222-223, 236-239

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/SchemaPlanDisplay.tsx` around lines 151 - 152, In the
SchemaPlanDisplay component, replace the logical OR operator (`||`) with the
nullish coalescing operator (`??`) on line 151 in the expression `record.value
|| record.val || ''` to ensure that valid falsy values like 0 and false are not
treated as missing. Additionally, review and fix the truthy checks at lines
222-223 and 236-239 that may be hiding valid plan values like LIMIT 0, replacing
any logical OR operations with nullish coalescing checks so only null and
undefined are treated as missing values rather than all falsy values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@frontend/src/components/SchemaPlanDisplay.tsx`:
- Around line 151-152: In the SchemaPlanDisplay component, replace the logical
OR operator (`||`) with the nullish coalescing operator (`??`) on line 151 in
the expression `record.value || record.val || ''` to ensure that valid falsy
values like 0 and false are not treated as missing. Additionally, review and fix
the truthy checks at lines 222-223 and 236-239 that may be hiding valid plan
values like LIMIT 0, replacing any logical OR operations with nullish coalescing
checks so only null and undefined are treated as missing values rather than all
falsy values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 842d2714-f17c-4a1c-a52e-2f0b8ff1b18c

📥 Commits

Reviewing files that changed from the base of the PR and between 861e1e1 and 6fb25db.

📒 Files selected for processing (2)
  • frontend/src/components/SchemaPlanDisplay.tsx
  • frontend/vite.config.ts

@elirazpevz elirazpevz changed the base branch from main to stav/configurational-feature-flag June 23, 2026 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants