Stav/observability and ux by yuvalkh · Pull Request #22 · StavPonte11/text2sql-onboarding

yuvalkh · 2026-06-25T06:41:37Z

This is mega pr from all the previous PRs

Summary by CodeRabbit

New Features
- Added a new Feature Flags and Execution Modes page for managing app behavior from the UI.
- Introduced execution-path visualization, trace timelines, and detailed step-by-step inspection in the agent testing experience.
- Added streaming updates for agent runs and a “suggest fixes” action during approval flow.
Bug Fixes
- Improved agent run handling with clearer interruptions, retries, and fallback behavior.
- Strengthened health checks and startup validation for key services.

…onfiguration

… (TTS-G1-07 to 09)

…1 through G2-05) TTS-G2-01: Table Scoping Modes - Add scoping_mode field to AgentState ('strict' | 'hybrid') - Add TABLE_SCOPING_MODE config setting (default: hybrid) - Add validate_config_node as START→extractor gatekeeper with InvalidConfigurationException for strict mode + empty allowed_tables - hybrid_search_tables enforces hard allowlist in strict mode - Strict mode injects [STRICT MODE] block into schema explorer prompt - Langfuse trace tagged with scoping_mode=<value> TTS-G2-02: Fallback / Escalation Hierarchy (HITL) - Add escalated, escalation_reason to AgentState - Add hitl_escalation_node compiled with interrupt_before=['hitl_escalation'] - route_schema_explorer escalates after MAX_SCHEMA_RETRIES (3) retries - route_refiner escalates after MAX_REFINER_ITERATIONS exhausted - refiner.py and schema_explorer.py set escalation_reason before handing off - hitl_escalation → extractor edge enables full state-reset resume TTS-G2-03: Advanced Schema Explorer (4-Phase Pipeline) - Add networkx>=3.3 dependency - New utils/schema_enrichment.py with 4 async phases: - run_semantic_typing (LLM column classification) - run_join_graph (BFS via networkx over ForeignKeyMapping + CrossTableProfile) - run_schema_summarization (per-table LLM summaries replacing verbose JSON) - run_ambiguity_detection (column/table name ambiguity notes) - Each phase gated by ENABLE_* config flag (defaults: only AMBIGUITY_DETECT on) - active_schema_phases list emitted to Langfuse trace metadata TTS-G2-04: Satisfaction Check Module - New nodes/satisfaction_check.py with 4 checks: A: Execution success (SATISFACTION_CHECK_EXECUTION) B: Row plausibility min/max (SATISFACTION_CHECK_PLAUSIBILITY) C: Structural column coverage via LLM (SATISFACTION_CHECK_COLUMNS) D: Semantic alignment scored 0-1 (SATISFACTION_CHECK_SEMANTIC, off by default) - satisfaction_fail_count drives routing: < MAX → refiner, >= MAX → hitl_escalation - Replaces direct refiner→finalizer success path - All check results pushed to Langfuse trace metadata TTS-G2-05: Redis Schema Cache - New core/src/core/cache.py with CacheService (get/set/invalidate/invalidate_pattern) - Non-blocking: all Redis errors log a warning and fall through to live data - Key conventions: profile:{table_id}:{version}, ddl:{cat}:{sch}:{tbl}, catalog_valid:... - SCAN-based pattern invalidation (no blocking KEYS command) - Integrated into get_table_profile: cache-first fetch with PROFILE_CACHE_TTL - cache_hit_count / cache_miss_count tracked per schema_explorer invocation - Counters pushed to Langfuse trace at end of schema_explorer_node

…ing, and expand test suite

… enhancements, and Playwright tests

…plement plausibility checks

…nning

coderabbitai · 2026-06-25T06:41:51Z

📝 Walkthrough

Walkthrough

The PR adds feature-flag storage and admin APIs, new agent runtime flags and node routing with HITL escalation, backend trace/profiling endpoints, and frontend admin and agent-review UIs. Tooling, deployment, and test fixtures were updated to support the new flow.

Changes

Feature-flagged agent execution stack

Layer / File(s)	Summary
Workspace and toolchain wiring `.gitignore`, `agent/Dockerfile`, `agent/pyproject.toml`, `backend/Dockerfile`, `backend/pyproject.toml`, `docker-compose.yml`, `backend/app/config.py`, `backend/app/infra_init.py`, `backend/app/routers/...`, `backend/app/services/...`, `frontend/.env`, `frontend/index.html`, `frontend/package.json`, `frontend/playwright.config.ts`, `frontend/vite.config.ts`	Dependency, container, startup, and tooling updates change local bootstrapping for the agent and frontend workspaces.
Flag schema and admin API `backend/alembic/versions/*`, `backend/app/main.py`, `backend/app/routers/{flags.py,__init__.py}`, `backend/app/services/flag_service.py`, `backend/tests/conftest.py`, `core/src/core/models/models.py`	`config` schema tables and flag service/router add persisted flags, execution modes, and startup registration.
Shared cache and runtime primitives `core/src/core/{__init__.py,cache.py,config.py,langfuse.py,trino.py}`, `agent/src/agent/{config.py,state.py,llm.py}`, `agent/src/agent/utils/...`, `agent/tests/{conftest.py,test_isolation.py,test_cache_and_gates.py}`	Shared cache, Langfuse, Trino, runtime-flag, skill, enrichment, and SQL helpers add invocation-scoped defaults and reusable utilities; tests cover handler isolation, judge fail-fast handling, and Redis cache behavior.
Agent nodes and graph `agent/src/agent/{graph.py,nodes/...}`, `agent/tests/{test_cache_and_gates.py,test_routing.py,test_resilience.py}`	The LangGraph topology and node implementations add runtime flag and skill initialization, schema planning, refinement, satisfaction gates, and HITL escalation, with routing tests covering those branches.
Agent transport and backend observability `agent/src/agent/{mcp_server.py,routers/chat.py}`, `backend/app/routers/{agent.py,evaluation.py,orchestration.py,profiling.py,publish.py,health.py}`, `backend/app/services/{evaluator.py,profiling_engine.py,langfuse_client.py,llm_judge.py}`	Agent transport plus backend profiling/evaluation endpoints carry trace IDs, stream node events, and expose fix-suggestion and trace/timeline data.
Frontend flag administration shell `frontend/src/{App.tsx,index.css,config/constants.ts,api/flags.ts,components/layout/Sidebar.tsx}`, `frontend/src/pages/{FlagsPage.tsx,FlagsPage.css,LandingPage.tsx}`	The frontend adds the admin Flags route, flag/mode API client, navigation and theme updates, and the Flags page and landing-page copy changes.
Frontend agent execution and trace UI `frontend/src/{api/agent.ts,components/{AgentGraph.tsx,JsonTreeView.tsx,SchemaPlanDisplay.tsx,SchemaPlanDisplay.module.css,TraceTimeline.tsx},pages/{AgentTestingPage.tsx,AgentTestingPage.module.css}}`, `frontend/src/tests/components.test.tsx`, `frontend/tests/{agent-testing.spec.ts,real-agent.spec.ts}`	The agent sandbox adds execution-path graphs, trace and schema-plan viewers, interrupt forms, and end-to-end checks for completed, interrupted, and real-agent flows.

Sequence Diagram(s)

sequenceDiagram
  participant AgentTestingPage
  participant "backend/app/routers/agent.py" as AgentRouter
  participant "agent/src/agent/mcp_server.py" as MCPServer
  participant "agent/src/agent/graph.py" as AgentGraph
  AgentTestingPage->>AgentRouter: POST /api/agent/chat
  AgentRouter->>MCPServer: chat_with_agent(...)
  MCPServer->>AgentGraph: agent_graph.ainvoke(...)
  AgentTestingPage->>AgentRouter: GET /api/agent/traces/{traceId}
  AgentRouter-->>AgentTestingPage: trace spans JSON

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

StavPonte11/text2sql-onboarding#10 — Also rewrites the agent graph’s routing and rejection handling in agent/src/agent/graph.py.
StavPonte11/text2sql-onboarding#13 — Also extends MCP agent wiring in agent/src/agent/mcp_server.py.

Poem

I hopped through flags and traces bright,
and nibbled schemas late at night.
The cache went boop, the graph said “hop,”
the sandbox spun without a stop.
A bunny grins: “This patch is spry!”

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 47.26% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title is related to the PR themes, but it is too vague to clearly describe the main changes.	Use a concise, specific title that names the primary changes, such as observability updates, HITL routing, and UX improvements.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch stav/observability-and-ux

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 58

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

backend/Dockerfile (1)

1-25: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Run the backend as a non-root user.

This image still defaults to root at runtime, so any backend compromise gets full container privileges. Add an unprivileged user after dependency sync and switch to it before startup.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/Dockerfile` around lines 1 - 25, The backend Dockerfile still runs as
root at runtime, so update the final image setup to use an unprivileged user.
After the uv sync step in the Dockerfile, create a non-root user and group,
ensure the app and virtualenv paths are readable/executable by that user, and
switch with USER before the container starts. Keep the existing WORKDIR, PATH,
and uv sync flow intact while making sure the runtime user can still access
/app/backend/.venv/bin and the copied workspace files.
Source: Linters/SAST tools

docker-compose.yml (1)

429-438: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Move these credentials out of source-controlled compose config.

This block hardcodes secrets and passwords directly in the repo, including JWT, Langfuse, and MinIO credentials. Even for local compose, that makes accidental reuse and leakage much more likely.

Suggested direction

-      - JWT_SECRET=...
-      - LANGFUSE_PUBLIC_KEY=...
-      - LANGFUSE_SECRET_KEY=...
-      - MINIO_ACCESS_KEY=...
-      - MINIO_SECRET_KEY=...
+      - JWT_SECRET=${JWT_SECRET}
+      - LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY}
+      - LANGFUSE_SECRET_KEY=${LANGFUSE_SECRET_KEY}
+      - MINIO_ACCESS_KEY=${MINIO_ACCESS_KEY}
+      - MINIO_SECRET_KEY=${MINIO_SECRET_KEY}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker-compose.yml` around lines 429 - 438, The docker-compose environment
block is hardcoding secrets and passwords in source control. Move the sensitive
values used by the compose service (for example the JWT_SECRET, LANGFUSE_* keys,
and MINIO_* credentials) out of the compose file and load them from environment
variables or an untracked env file instead; keep the service definitions
referencing those variables so the existing compose setup still works without
embedding secrets in the repo.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent/Dockerfile`:
- Around line 23-26: The container currently starts `uvicorn` as root, so update
the `Dockerfile` to create a dedicated non-root runtime user and switch to that
user before the `CMD` runs. Make sure the app files and virtualenv referenced by
the existing `PATH`/`uvicorn` setup remain accessible to that user, and keep the
final startup command unchanged aside from the user context.

In `@agent/src/agent/config.py`:
- Around line 69-71: The SCHEMA_CACHE_TTL and PROFILE_CACHE_TTL settings in
Config are unvalidated, so invalid env values can later cause CacheService.set()
to pass non-positive TTLs into Redis SETEX. Add positive-value validation to the
Config fields (or equivalent startup config validation) for these symbols so
only integers greater than zero are accepted, and ensure bad values fail fast
instead of silently disabling caching.

In `@agent/src/agent/graph.py`:
- Around line 65-75: The strict table-scoping check in validate_config_node is
happening before init_flags resolves runtime settings, so
TABLE_SCOPING_MODE=strict can skip the allowed_tables validation. Update the
config validation flow in graph.py so the strict-mode allowed_tables fail-fast
check runs after runtime flags are applied, or is repeated once flags are
initialized, and keep the same InvalidConfigurationException guard in
validate_config_node/schema_explorer-related path.

In `@agent/src/agent/mcp_server.py`:
- Around line 45-50: `langfuse_handler` can remain unbound when
`get_langfuse_handler()` fails, which later breaks the tool response path in
`MCPServer` when building callbacks. Initialize `langfuse_handler` to a safe
default before the try block in `mcp_server.py`, then keep the existing
exception fallback so `callbacks` is still set even if handler creation fails;
make sure the later response handling code references the always-defined
`langfuse_handler` symbol.
- Around line 184-188: The MCP server starts before suggest_fixes is registered,
so the tool is missing in stdio mode. Move the mcp.run(transport="stdio") call
to after all `@mcp.tool`() definitions, or otherwise ensure suggest_fixes is
decorated before __main__ execution begins. Use the mcp object and the
suggest_fixes function as the key locations when reordering the module so every
tool is registered before startup.
- Around line 57-66: The HITL resume path in the MCP server is reusing stale
checkpoint state because `Command(resume=...)` is called without clearing prior
error fields. Update the thread state before resuming, either by using
`Command(update=..., resume=...)` or by issuing the equivalent state update in
`agent_graph` around the existing `state_snapshot` / `ainvoke` flow, so
`last_error`, `trino_error`, and retry/escalation fields are reset before
`resume_value` is applied.

In `@agent/src/agent/nodes/finalizer.py`:
- Around line 74-88: Guard the inline preview serialization in finalizer.py’s
fallback path so it cannot crash when `inline_result_rows` contains
non-JSON-native values. In the `finalizer` logic that builds `preview_info` and
`preview_str`, add safe serialization for values like `datetime`, `date`,
`Decimal`, or a fallback sanitizer/default handler before calling `json.dumps`.
Keep the existing preview behavior and make sure the Esca persistence failure
path still returns a preview instead of raising `TypeError`.
- Line 12: The finalizer LLM is being created once at module load via
get_llm("finalizer"), so both the summary flow in Finalizer and
get_sql_explanation() ignore runtime_flags resolved by init_flags_node. Update
the finalizer stage to instantiate the LLM per invocation using the current
runtime_flags, and thread that dynamic LLM selection through
get_sql_explanation() and the summary path so model choice follows the
flag-driven configuration.

In `@agent/src/agent/nodes/init_skills.py`:
- Around line 40-47: The loaded-skills tracing block in init_loaded_skills is
referencing langfuse_client before the inner import assigns it, which makes the
warning path trigger incorrectly. Move the import of langfuse_client to the top
of the function or module, before the call to
langfuse_client.get_current_trace_id(), and keep get_langfuse_handler and
update_current_span using that already-imported symbol.

In `@agent/src/agent/nodes/query_builder.py`:
- Around line 1-2: The async query_builder_node is still using the synchronous
Redis publisher, which can block the event loop. Update the event publish call
in query_builder_node to use publish_node_event instead of
publish_node_event_sync, and await it like the other async nodes. Keep the
change aligned with the existing RunnableConfig flow so the node remains fully
non-blocking.
- Around line 61-86: The query_builder node is dropping the generated SQL
explanation from the returned AgentState in both the non_interactive and
approved branches. Update the return values in query_builder so they also
include sql_explanation alongside sql_query, refinement_count, trino_error,
feedback, and execution_path. Keep the interrupt payload as-is, but make sure
downstream consumers can read the explanation from the state after the function
returns.

In `@agent/src/agent/nodes/refiner.py`:
- Around line 40-72: The refiner’s runtime-configurable iteration cap in
`refiner` is not aligned with graph routing, so `route_refiner` can send
execution to HITL too early or too late. Update the routing logic to use the
same per-invocation max iteration value derived from `runtime_flags` (or pass it
through the state alongside `refinement_count`) instead of comparing only
against `MAX_REFINER_ITERATIONS`. Keep the threshold used by `refiner` and
`route_refiner` consistent so both paths honor the same limit.
- Around line 107-125: The ESCA fallback path in refiner.py serializes
payload_data before entering the try block, so json.dumps can raise and bypass
the esca_write_failed handling and inline_result_rows fallback. Move the payload
serialization into the same try/except used around get_esca_client().save_data,
or add a dedicated exception handler around json.dumps in the refiner logic, and
make sure the failure path still sets esca_write_failed and returns
inline_result_rows consistently.

In `@agent/src/agent/nodes/satisfaction_check.py`:
- Around line 171-179: The satisfaction_check flow is only recording failures,
but refiner_node is not using them to trigger refinement for non-Trino issues.
Update the satisfaction_check partial payload and the refiner_node decision
logic so semantic, column, and plausibility failures in
satisfaction_failures/last_error cause a refinement pass instead of retrying the
same SQL, using the existing execution_path and failure fields to route into a
real refinement path.

In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 144-168: The strict allowlist handling in schema_explorer’s table
filtering only matches id, name, and schema.name, so fully qualified
catalog.schema.table entries are rejected and no candidates are surfaced. Update
the strict path in the table-scoping logic to recognize the same fully qualified
identifier format used for LLM-exposed table names, and make the same change
anywhere else the allowlist comparison is duplicated so both strict and related
filtering paths accept catalog.schema.table entries.
- Around line 272-278: The cache key in get_table_profile is only based on
table_id and profile.id, so cached schema results can stay stale after the
TableProfile row is updated in place. Update the cache key generation in
schema_explorer to include a profile freshness token (for example, a
version/updated-at/hash field from the TableProfile) and apply the same key
change anywhere the profile cache is read or written, including the other
referenced cache sites. Ensure the cache lookup and any corresponding set paths
use the same new key shape so re-profiling and summary updates invalidate prior
entries naturally.
- Around line 373-380: The schema enrichment flags are being resolved in
schema_explorer but never used because the corresponding phase blocks are
commented out, so enabling SCHEMA_SEMANTIC_TYPING, SCHEMA_JOIN_GRAPH, or
SCHEMA_AMBIGUITY_DETECT has no effect. Re-enable or restore the guarded
execution in the schema exploration flow so these flags actually control their
matching phases, using the existing variables like schema_semantic_typing,
schema_join_graph, and schema_ambiguity_detect in the relevant phase methods and
control path.

In `@agent/src/agent/routers/chat.py`:
- Around line 81-97: The allowed_tables validation in chat routing is doing a
full catalog scan by loading every Table row and checking each request value in
Python. Update the validation logic in the chat router to query only the
requested identifiers from the Table model, including ids, names, and parsed
schema_name.name pairs, and then verify the matched count instead of
materializing all tables. Use the existing request.allowed_tables handling in
the chat route to keep the fix localized and preserve the current 400 error
behavior for missing tables.

In `@agent/src/agent/utils/flag_bridge.py`:
- Around line 106-107: The flag lookup path in resolve_flags and the related
backend fetch logic currently performs fresh HTTP requests on every run, which
puts backend latency on the hot path. Add a small TTL cache for the flag map in
flag_bridge.py so repeated calls can reuse recent results, and apply
execution_mode overrides from the cached map instead of making a second request
when resolve_flags is invoked.
- Around line 109-113: The backend flag merge in flag_bridge should validate and
coerce values before placing them into resolved, because raw DB JSON can include
invalid types that later break llm.py when flags are parsed. Update the merge
logic in the flag resolution path (including the related handling around the
other referenced block) to accept only known numeric and boolean flags, convert
valid values to the expected Python types, and skip any entries that are null or
malformed instead of copying them through unchanged.

In `@agent/src/agent/utils/redis_publisher.py`:
- Around line 9-14: The sync Redis client created in get_sync_redis() has no
timeout settings, so publish_node_event_sync() can block the request path if
Redis is slow or unreachable. Update the redis.from_url initialization to set
appropriate connect/read/socket timeout behavior (or equivalent Redis client
timeout options) and make sure the same bounded client is used by
publish_node_event_sync() and the related publish path in redis_publisher.

In `@agent/src/agent/utils/schema_enrichment.py`:
- Around line 373-375: The ambiguity extraction flow in the helper that calls
llm.with_structured_output and awaits structured.ainvoke currently assumes the
response is an AmbiguityOutput instance only. Update this path to also accept a
raw dict-shaped result, similar to the guards used in run_semantic_typing() and
run_schema_summarization(), and normalize it before reading ambiguity_notes.
Make sure the helper returns the notes from either shape instead of falling into
the exception path and returning an empty list.
- Around line 199-210: The join-path discovery in schema_enrichment is too
order-dependent because it only loads ForeignKeyMapping and CrossTableProfile
rows for the initial table_ids in one direction, so valid bridges and
reverse-orientation joins can be missed. Update the join graph construction and
traversal in schema_enrichment helpers (including the candidate-path logic
around the ForeignKeyMapping and CrossTableProfile queries) to consider both
directions and all relevant edges between candidate tables, not just
source_table_id/table_id matches from the initial input order.

In `@agent/src/agent/utils/sql.py`:
- Around line 12-19: The clean_sql helper only strips fences when the response
begins with a code block, so preambles like “Here is the query:” can leave
non-SQL text behind. Update the logic in clean_sql to first detect a fenced SQL
block anywhere in the string (using the fenced extraction approach), and only
fall back to the existing start/end fence trimming when no block is found. Keep
the trailing semicolon cleanup after extracting the SQL content.

In `@agent/tests/conftest.py`:
- Around line 68-117: The Redis test double in MockRedisAsync and
MockRedisPipeline is out of sync with SkillRegistry.get_skills(), which now
relies on redis.mget(...) and pipeline.setex(...). Update the mock classes so
MockRedisAsync supports mget for batched reads and MockRedisPipeline supports
setex alongside delete, and make execute() reflect both queued writes and
deletes. Keep the behavior aligned with the cache flow exercised by
SkillRegistry.get_skills() so Redis-backed skill caching is actually covered by
tests.
- Around line 121-128: The mock_trino fixture is returning a
TrinoExecutionResult shape that does not match the real contract used by
core.trino. Update the _execute_query_sync helper in mock_trino to construct
TrinoExecutionResult with the same field names and payload structure as
core/trino.py, including error_message instead of error and rows as
list[list[Any]] rather than dict rows, so tests exercise realistic results.

In `@agent/tests/test_isolation.py`:
- Around line 39-52: The test is patching core.llm.evaluate_with_llm but then
calling the already-imported evaluate_with_llm symbol, so the mock never
applies. Update test_isolation.py to invoke the function through the core.llm
module reference (or import the module and patch/call that same symbol) so the
patched object is actually used. Also remove the broad ImportError swallow; if
core.llm is missing or broken, let the test fail or skip explicitly so it does
not pass silently.

In `@agent/tests/test_resilience.py`:
- Around line 117-119: The test currently only checks that summary exists, which
can pass even if finalizer_node does nothing. Update the assertion in
test_resilience.py to verify the fallback behavior by checking that summary
actually changes from the input state, or mock the downstream LLM path used by
finalizer_node and assert the result was derived from inline_result_rows. Use
the finalizer_node and inline_result_rows symbols to locate the relevant test
setup.

In `@backend/app/infra_init.py`:
- Line 27: Move the os import into the main top-level import section in
infra_init.py so it sits with the other module imports and no longer triggers
Ruff E402. Keep the import grouped with the existing imports near the top of the
file, and remove the standalone import placement currently below the initial
block.
- Line 552: The Iceberg metadata initialization in infra_init.py is reading
DATABASE_URL, but that config can point to SQLite in backend/app/config.py, so
psycopg2.connect() may fail. Add and use a separate Postgres-only DSN/config for
this init path (for example in the Iceberg setup helper that builds conn_str)
and stop falling back to the app database URL so the metadata connection is
always Postgres-compatible.
- Around line 27-38: The infra initialization still reads environment values
directly in infra_init instead of using app.config.settings, so it bypasses the
centralized settings contract and .env loading. Add the missing MINIO_HOST,
MINIO_ACCESS_KEY, and MINIO_SECRET_KEY fields to Settings, then update
infra_init to consume all infra values from settings rather than os.getenv for
MINIO, TRINO, and OPENMETADATA configuration.

In `@backend/app/main.py`:
- Line 129: The flags router is being mounted globally through
api_router.include_router(flags.router), which exposes the flag map endpoint to
unauthenticated callers. Update the flags-related routing so the map endpoint in
the flags router is protected by admin/service authentication, or move it to an
internal-only router before including it in the main API router. Verify the
change against the flags.router and its map handler so the full flag map is no
longer publicly reachable.
- Around line 44-45: The warning in main.py incorrectly labels the missing
LLM_API_KEY branch as a startup failure even though startup continues. Update
the log message in the startup settings check to use non-failure wording, and
keep the reference in the same conditional around settings.LLM_API_KEY and
logger.warning so dashboards and alerts are not triggered by a false failure
signal.

In `@backend/app/routers/__init__.py`:
- Line 12: The __all__ definition in the app.routers package is using module
objects instead of export names, which breaks wildcard imports. Update the
__all__ assignment in the package initializer to list the module names as
strings for tables, enrichment, questions, evaluation, flags, publish, scopes,
and audit so that from app.routers import * works correctly.

In `@backend/app/routers/agent.py`:
- Around line 44-49: The structured HITL rejection payload is being narrowed by
the `resume_value` union in `agent.py`, causing `QueryApproval` fields like
`rejection_category` and `suggested_fix` to be lost. Update the `resume_value`
typing used by the agent router so the full rejection object is preserved,
either by expanding `QueryApproval` to include the missing fields or by
narrowing the union to a type that keeps the complete payload intact. Use the
`resume_value` declaration in the agent router as the place to fix the schema.

In `@backend/app/routers/flags.py`:
- Around line 70-76: The read endpoints in Flag router are exposing internal
rollout state without authentication. Update the handlers in the flag router,
including get_flag_map and the mode-related endpoints around the referenced flag
service methods, to require admin or explicit service-to-service auth instead of
being publicly callable. Reuse the existing _get_admin dependency or equivalent
auth guard consistently across /map, /modes/map, and /modes/{name} so internal
consumers still work while unauthenticated callers are blocked.

In `@backend/app/routers/health.py`:
- Around line 200-207: The health check in the ESCA configuration path currently
returns a config_error payload while still being treated as HTTP 200, so update
the health endpoint logic in the health router to send a 503 response whenever
api_key/base_url are missing or dummy values are detected. Use the existing
config validation branch in the health handler to map this misconfiguration case
to a failing HTTP status instead of a successful one.

In `@backend/app/services/flag_service.py`:
- Around line 153-166: The override map in FlagService.get_map is including
reset FeatureFlag rows with a None value, which causes deleted flags to leak
into the cached map. Update get_map to filter out rows whose value is None
before building and caching the map, or change delete() so it truly removes the
row; use the existing get_map, _try_cache_get, _try_cache_set, and delete
methods to keep the “missing row means no DB override” behavior intact.

In `@backend/app/services/profiling_engine.py`:
- Around line 1086-1092: The column summary builder in profiling_engine is still
forwarding raw categorical top values into the LLM context, which should be
removed. Update the logic around result.column_stats, c.is_categorical, and
c.top_values so it only includes schema-level metadata (for example count/shape
or a generic categorical indicator) and never extracts or joins the actual value
strings. While making this change, also make the summary generation resilient to
malformed top_values entries so a missing "value" key does not break the whole
profiling summary; apply the same fix to the other categorical-summary block
mentioned in the review.

In `@core/src/core/cache.py`:
- Around line 113-117: The catalog preflight cache key in catalog_valid_key() is
missing the catalog dimension, so identical schema/table sets can collide across
different catalogs. Update catalog_valid_key() to accept the catalog identifier
and include it in the returned Redis key alongside schema and the table hash,
then adjust any callers of catalog_valid_key() to pass the catalog so preflight
results stay isolated per catalog.

In `@core/src/core/langfuse.py`:
- Around line 7-12: The Langfuse handler is being created before the process has
the required environment configuration, so tracing can initialize without
credentials. In get_langfuse_handler(), ensure LANGFUSE_PUBLIC_KEY,
LANGFUSE_SECRET_KEY, and LANGFUSE_BASE_URL are loaded into the process from .env
before constructing CallbackHandler(), and keep the dependency focused on
creating the handler rather than passing unsupported config directly into
CallbackHandler.

In `@core/src/core/models/models.py`:
- Around line 854-857: `ExecutionModeUpsert.flag_overrides` is currently an
unrestricted dict, so execution modes can store invalid flag names or mismatched
value types. Update the execution-mode upsert flow that accepts
`ExecutionModeUpsert` to validate every override key/value against the existing
`FeatureFlag` registry before persisting, using the same rules as
`validate_flag_type()` for direct flag writes. Reject the upsert with a clear
validation error if any override is unknown or has the wrong type, and keep
`get_mode_overrides()` returning only already-validated data.

In `@core/src/core/trino.py`:
- Around line 72-75: The parameter dispatch in the SQL execution path is using
truthiness instead of presence, so empty containers skip the bound-parameter
branch. Update the logic in the method that calls cur.execute to check params is
not None rather than if params, and keep the parameterized cur.execute(sql,
params) path for empty lists/tuples/dicts as well.

In `@docker-compose.yml`:
- Line 409: The startup flow for the backend runs app.seed and app.infra_init
immediately after container start, but _ensure_minio_bucket in infra_init can
hit MinIO before it is ready. Update the initialization path to wait/retry for
MinIO readiness inside _ensure_minio_bucket, or change the docker-compose MinIO
dependency to a health-based gate instead of service_started. Keep the fix
localized to app.infra_init and the compose command so backend startup is not
flaky when MinIO comes up slowly.

In `@frontend/src/api/flags.ts`:
- Around line 65-75: The execution-mode helpers in flags.ts are building URLs
with raw name values, so names containing reserved characters can break the
request path. Update getMode, upsertMode, and deleteMode to encode the mode name
before interpolating it into /flags/modes/${name}, using the existing
fetchWithAdminEmail calls as the entry points for the fix.
- Around line 4-12: The admin email header is being used as the auth credential
in fetchWithAdminEmail, which is forgeable from the browser. Update this flow so
the client sends a real authenticated session/JWT instead of X-Admin-Email, and
adjust the backend flags router/auth checks to derive the actor from verified
auth context rather than a client-supplied email. Keep the fetch helper and any
related flag write paths aligned with the new authentication mechanism.

In `@frontend/src/App.tsx`:
- Around line 105-123: The initial document direction and language are not
synced with i18n.language, so a persisted Hebrew locale renders Ant Design RTL
while the root DOM stays LTR until LanguageToggle runs. Update App and the
existing language-sync logic so the same i18n.language value also sets the
document-level dir and lang on first render, not only inside LanguageToggle,
keeping html attributes aligned with ConfigProvider direction.

In `@frontend/src/components/SchemaPlanDisplay.tsx`:
- Around line 179-183: The review UI is dropping valid falsy plan values because
it uses truthy checks in the value rendering and limit handling logic. Update
SchemaPlanDisplay’s cell render path and the planData.limit handling to prefer
nullish checks instead of `||`/truthiness so `0` and `false` are preserved; use
the existing render function and limit-related logic to locate the affected
branches, and make the limit condition explicit with `!== undefined` so a valid
`LIMIT 0` is still shown.

In `@frontend/src/index.css`:
- Around line 1-17: Normalize the new typography block in index.css to satisfy
Stylelint: replace the remote font import with the project’s preferred import
notation and make the font-family keywords in the :root variables consistent
with the configured casing rules. Update the `@import` line and the
--sans/--heading/--mono declarations so the values match the lint expectations
in this stylesheet and the violations go away.

In `@frontend/src/pages/AgentTestingPage.module.css`:
- Around line 172-214: The new animation names in AgentTestingPage.module.css
violate the enforced kebab-case keyframe naming rule. Rename the `@keyframes`
definitions and every matching animation reference used by classes like
spinnerText, animateIn, and the other affected animation styles (including
pulseIconAnim and pulseDotAnim) to kebab-case so stylelint passes consistently
across the module.

In `@frontend/src/pages/AgentTestingPage.tsx`:
- Around line 962-991: The SSE subscription is created too late in the
AgentTestingPage flow, so early execution events can be missed before the stream
opens. Move the EventSource setup into the start of the run lifecycle in
AgentTestingPage (around the mutate/submit path used for chatMutation) so it
subscribes before or at the same time the POST begins, and keep the existing
onmessage/onerror handling and cleanup tied to threadId. Ensure the live
execution path state is connected before triggering the request rather than
waiting for chatMutation.isPending to flip.
- Around line 907-947: The trace polling in AgentTestingPage’s useEffect can
outlive the current traceId and write stale spans after a new run or reset.
Update the fetchTrace retry chain to track and clear any pending timeout when
the effect re-runs or unmounts, and ignore late axios responses if traceId has
changed before updating setTraceSpans or setLoadingTrace. Use the useEffect
cleanup around fetchTrace, attempts, and the setTimeout retry path to cancel the
old poller safely.

In `@frontend/src/pages/FlagsPage.tsx`:
- Around line 422-436: Prevent submitting a mode when modeDraft.name is empty,
since submitMode and the related save path still call upsertModeMutation with a
blank name and hit /flags/modes/ instead of blocking in the UI. Add a guard in
submitMode (and the other referenced save flow) to return early or disable
submission when the name is missing, and ensure any validation around
modeDraft.name is checked before constructing the request payload.
- Around line 163-213: The icon-only action buttons in FlagsPage are missing
accessible names, so screen readers cannot identify the
edit/save/cancel/reset/delete controls even though they have tooltips. Add
explicit accessible labels to each icon-only Button in the flag row actions and
related action areas, using the existing action context in the FlagsPage
component so the labels match the visible tooltip meaning. Update the relevant
button instances in the edit/display controls and the additional action blocks
referenced in this page to ensure every icon-only action has a clear accessible
name.
- Around line 37-56: The admin flags UI is only rendering the hard-coded
FLAG_CATEGORIES entries, so any backend flag not listed there is omitted from
the page. Update FlagsPage so the category filtering/selection logic includes
uncategorized flags returned by /flags/ as a fallback bucket (or equivalent),
ensuring every flag can still render and be edited; use the FLAG_CATEGORIES
definition and filteredCategories logic in FlagsPage to locate the change.

In `@frontend/src/pages/LandingPage.tsx`:
- Around line 353-366: The footer links in LandingPage’s footer sections are
still placeholder anchors with href="#" and should be replaced before shipping.
Update the anchor targets in the footer markup for the Platform, Resources, and
Support sections to point to real routes or documented destinations, and make
sure the footer items behave as actual navigation links for keyboard and
screen-reader users.

In `@frontend/tests/agent-testing.spec.ts`:
- Around line 28-37: The agent-testing spec is targeting outdated success-path
UI selectors, so update the assertions in the test to match the current
components. In the flow that uses page.getByRole for the submit button, change
the button name from the old Ask Agent label to the current Run Agent label, and
replace the wait on the global .agent-result selector with the actual
AgentResultDisplay success indicators (the CSS-module-backed text such as Agent
Execution Successful or TASK COMPLETED). Keep the rest of the interaction logic
in the test unchanged.

In `@frontend/tests/real-agent.spec.ts`:
- Around line 4-27: The real-agent Playwright spec is not CI-safe because it
hardcodes a local dev URL and asserts on text that the page does not render.
Update the test in real-agent.spec.ts to use Playwright’s configured base URL
(instead of localhost:3001) and align the success wait in the test flow with the
actual completion labels rendered by the agent-testing page, using the existing
test('should run query against real backend') block and its page.goto/expect
locator checks.

---

Outside diff comments:
In `@backend/Dockerfile`:
- Around line 1-25: The backend Dockerfile still runs as root at runtime, so
update the final image setup to use an unprivileged user. After the uv sync step
in the Dockerfile, create a non-root user and group, ensure the app and
virtualenv paths are readable/executable by that user, and switch with USER
before the container starts. Keep the existing WORKDIR, PATH, and uv sync flow
intact while making sure the runtime user can still access
/app/backend/.venv/bin and the copied workspace files.

In `@docker-compose.yml`:
- Around line 429-438: The docker-compose environment block is hardcoding
secrets and passwords in source control. Move the sensitive values used by the
compose service (for example the JWT_SECRET, LANGFUSE_* keys, and MINIO_*
credentials) out of the compose file and load them from environment variables or
an untracked env file instead; keep the service definitions referencing those
variables so the existing compose setup still works without embedding secrets in
the repo.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 631c59c1-2a13-4093-9ede-ee978dc04af6

📥 Commits

Reviewing files that changed from the base of the PR and between 1e07297 and a1c6cfa.

⛔ Files ignored due to path filters (3)

agent/uv.lock is excluded by !**/*.lock
backend/uv.lock is excluded by !**/*.lock
frontend/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (90)

.gitignore
agent/.coverage
agent/Dockerfile
agent/pyproject.toml
agent/src/agent/config.py
agent/src/agent/graph.py
agent/src/agent/llm.py
agent/src/agent/mcp_server.py
agent/src/agent/nodes/extractor.py
agent/src/agent/nodes/finalizer.py
agent/src/agent/nodes/init_flags.py
agent/src/agent/nodes/init_skills.py
agent/src/agent/nodes/query_builder.py
agent/src/agent/nodes/refiner.py
agent/src/agent/nodes/satisfaction_check.py
agent/src/agent/nodes/schema_explorer.py
agent/src/agent/routers/chat.py
agent/src/agent/state.py
agent/src/agent/utils/esca.py
agent/src/agent/utils/flag_bridge.py
agent/src/agent/utils/jeen_client.py
agent/src/agent/utils/redis_publisher.py
agent/src/agent/utils/schema_enrichment.py
agent/src/agent/utils/skill_registry.py
agent/src/agent/utils/sql.py
agent/tests/conftest.py
agent/tests/test_cache_and_gates.py
agent/tests/test_isolation.py
agent/tests/test_resilience.py
agent/tests/test_routing.py
backend/Dockerfile
backend/alembic/versions/f9a3d1c8e205_add_config_schema_flags.py
backend/app/config.py
backend/app/infra_init.py
backend/app/main.py
backend/app/routers/__init__.py
backend/app/routers/admin_auth.py
backend/app/routers/agent.py
backend/app/routers/audit.py
backend/app/routers/evaluation.py
backend/app/routers/feedback.py
backend/app/routers/flags.py
backend/app/routers/health.py
backend/app/routers/orchestration.py
backend/app/routers/profiling.py
backend/app/routers/publish.py
backend/app/routers/questions.py
backend/app/routers/scopes.py
backend/app/services/auth.py
backend/app/services/evaluator.py
backend/app/services/flag_service.py
backend/app/services/join_detection.py
backend/app/services/langfuse_client.py
backend/app/services/llm_judge.py
backend/app/services/profiling_engine.py
backend/app/services/scoring.py
backend/app/services/trino_client.py
backend/pyproject.toml
backend/tests/conftest.py
core/src/core/__init__.py
core/src/core/cache.py
core/src/core/config.py
core/src/core/langfuse.py
core/src/core/models/models.py
core/src/core/trino.py
docker-compose.yml
frontend/.env
frontend/index.html
frontend/package.json
frontend/playwright.config.ts
frontend/src/App.tsx
frontend/src/api/agent.ts
frontend/src/api/flags.ts
frontend/src/components/AgentGraph.tsx
frontend/src/components/JsonTreeView.tsx
frontend/src/components/SchemaPlanDisplay.module.css
frontend/src/components/SchemaPlanDisplay.tsx
frontend/src/components/TraceTimeline.tsx
frontend/src/components/layout/Sidebar.tsx
frontend/src/config/constants.ts
frontend/src/index.css
frontend/src/pages/AgentTestingPage.module.css
frontend/src/pages/AgentTestingPage.tsx
frontend/src/pages/FlagsPage.css
frontend/src/pages/FlagsPage.tsx
frontend/src/pages/LandingPage.tsx
frontend/src/tests/components.test.tsx
frontend/tests/agent-testing.spec.ts
frontend/tests/real-agent.spec.ts
frontend/vite.config.ts

💤 Files with no reviewable changes (2)

backend/app/services/trino_client.py
backend/app/services/scoring.py

coderabbitai · 2026-06-26T11:32:24Z

 # Add virtualenv bin to PATH to run uvicorn
 ENV PATH="/app/agent/.venv/bin:$PATH"

-CMD ["uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8001"]
+CMD ["uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8001"]


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Drop root before the image starts.

The final container still runs uvicorn as root. Any compromise in agent.main gets full container privileges unnecessarily. Create a dedicated runtime user and switch to it before CMD.

Suggested change

# Add virtualenv bin to PATH to run uvicorn ENV PATH="/app/agent/.venv/bin:$PATH" + +RUN addgroup --system app && adduser --system --ingroup app app && \ + chown -R app:app /app +USER app CMD ["uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8001"]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Add virtualenv bin to PATH to run uvicorn

ENV PATH="/app/agent/.venv/bin:$PATH"

CMD ["uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8001"]

CMD ["uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8001"]

# Add virtualenv bin to PATH to run uvicorn

ENV PATH="/app/agent/.venv/bin:$PATH"

RUN addgroup --system app && adduser --system --ingroup app app && \

chown -R app:app /app

USER app

CMD ["uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8001"]

🧰 Tools

🪛 Checkov (3.3.1)

[low] 1-26: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)

[low] 1-26: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/Dockerfile` around lines 23 - 26, The container currently starts `uvicorn` as root, so update the `Dockerfile` to create a dedicated non-root runtime user and switch to that user before the `CMD` runs. Make sure the app files and virtualenv referenced by the existing `PATH`/`uvicorn` setup remain accessible to that user, and keep the final startup command unchanged aside from the user context.

Source: Linters/SAST tools

coderabbitai · 2026-06-26T11:32:25Z

+    # ── G2-05: Redis Schema Cache ─────────────────────────────────────────────
+    SCHEMA_CACHE_TTL: int = 600    # seconds — DDL content
+    PROFILE_CACHE_TTL: int = 1800  # seconds — table profile statistics


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Validate cache TTLs as positive values.

These settings are unbounded, but CacheService.set() feeds them into Redis SETEX, which rejects ttl <= 0 and only logs the failure. A bad env var here quietly disables schema/profile caching for the whole service.

Suggested change

- SCHEMA_CACHE_TTL: int = 600 # seconds — DDL content - PROFILE_CACHE_TTL: int = 1800 # seconds — table profile statistics + SCHEMA_CACHE_TTL: int = Field(default=600, gt=0) # seconds — DDL content + PROFILE_CACHE_TTL: int = Field(default=1800, gt=0) # seconds — table profile statistics

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# ── G2-05: Redis Schema Cache ─────────────────────────────────────────────

SCHEMA_CACHE_TTL: int = 600 # seconds — DDL content

PROFILE_CACHE_TTL: int = 1800 # seconds — table profile statistics

# ── G2-05: Redis Schema Cache ─────────────────────────────────────────────

SCHEMA_CACHE_TTL: int = Field(default=600, gt=0) # seconds — DDL content

PROFILE_CACHE_TTL: int = Field(default=1800, gt=0) # seconds — table profile statistics

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/config.py` around lines 69 - 71, The SCHEMA_CACHE_TTL and PROFILE_CACHE_TTL settings in Config are unvalidated, so invalid env values can later cause CacheService.set() to pass non-positive TTLs into Redis SETEX. Add positive-value validation to the Config fields (or equivalent startup config validation) for these symbols so only integers greater than zero are accepted, and ensure bad values fail fast instead of silently disabling caching.

coderabbitai · 2026-06-26T11:32:25Z

+    mode: str = state.get("scoping_mode") or settings.TABLE_SCOPING_MODE
+
+    if mode == "strict":
+        allowed = state.get("allowed_tables")
+        if not allowed:
+            raise InvalidConfigurationException(
+                "scoping_mode='strict' requires allowed_tables to be a non-empty list. "
+                "Execution aborted to prevent unrestricted table access."
+            )
+
+    return {"scoping_mode": mode, "execution_path": ["validate_config"]}


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Validate table scoping after runtime flags are resolved.

validate_config_node runs before init_flags, so TABLE_SCOPING_MODE=strict coming from execution-mode/runtime flags can bypass the allowed_tables fail-fast check and only take effect later in schema_explorer.

Proposed fix

def validate_config_node(state: AgentState, config: RunnableConfig | None = None) -> dict: @@ - mode: str = state.get("scoping_mode") or settings.TABLE_SCOPING_MODE + runtime_flags = state.get("runtime_flags") or {} + mode: str = ( + state.get("scoping_mode") + or runtime_flags.get("TABLE_SCOPING_MODE") + or settings.TABLE_SCOPING_MODE + ) @@ -workflow.add_edge(START, "validate_config") -workflow.add_edge("validate_config", "init_flags") -workflow.add_edge("init_flags", "init_skills") +workflow.add_edge(START, "init_flags") +workflow.add_edge("init_flags", "validate_config") +workflow.add_edge("validate_config", "init_skills")

Also applies to: 229-231

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/graph.py` around lines 65 - 75, The strict table-scoping check in validate_config_node is happening before init_flags resolves runtime settings, so TABLE_SCOPING_MODE=strict can skip the allowed_tables validation. Update the config validation flow in graph.py so the strict-mode allowed_tables fail-fast check runs after runtime flags are applied, or is repeated once flags are initialized, and keep the same InvalidConfigurationException guard in validate_config_node/schema_explorer-related path.

coderabbitai · 2026-06-26T11:32:25Z

+    try:
+        from core.langfuse import get_langfuse_handler
+        langfuse_handler = get_langfuse_handler()
+        callbacks = [langfuse_handler] if langfuse_handler else []
+    except Exception:
+        callbacks = []


🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Initialize langfuse_handler before the try.

If handler creation raises, callbacks is set but langfuse_handler remains unbound; Line 127 then crashes the tool response path.

Proposed fix

+ langfuse_handler = None try: from core.langfuse import get_langfuse_handler langfuse_handler = get_langfuse_handler() callbacks = [langfuse_handler] if langfuse_handler else []

Also applies to: 126-127

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/mcp_server.py` around lines 45 - 50, `langfuse_handler` can remain unbound when `get_langfuse_handler()` fails, which later breaks the tool response path in `MCPServer` when building callbacks. Initialize `langfuse_handler` to a safe default before the try block in `mcp_server.py`, then keep the existing exception fallback so `callbacks` is still set even if handler creation fails; make sure the later response handling code references the always-defined `langfuse_handler` symbol.

coderabbitai · 2026-06-26T11:32:25Z

    allowed = allowed_tables or []
    statuses = allowed_statuses or ["production"]
    allowed_tables_set = []
    allowed_ids = set()
+
    for table in all_tables:
-        is_allowed = (
-            table.status in statuses or
-            (allowed and (
-                table.id in allowed or
-                table.name in allowed or
-                f"{table.schema_name}.{table.name}" in allowed
-            ))
-        )
+        if scoping_mode == "strict":
+            # Hard allowlist: only tables explicitly named in allowed_tables
+            is_allowed = bool(
+                allowed
+                and (
+                    table.id in allowed
+                    or table.name in allowed
+                    or f"{table.schema_name}.{table.name}" in allowed
+                )
+            )
+        else:
+            # Hybrid: production/status union OR explicit allowed list
+            is_allowed = table.status in statuses or (
+                allowed
+                and (
+                    table.id in allowed
+                    or table.name in allowed
+                    or f"{table.schema_name}.{table.name}" in allowed
+                )


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Accept fully qualified names in strict allowlists.

The candidate table names exposed to the LLM use catalog.schema.table, but strict filtering only accepts id, name, or schema.name. A valid fully qualified allowlist entry will produce no candidates.

Proposed fix

- allowed = allowed_tables or [] + allowed = set(allowed_tables or []) @@ table.id in allowed or table.name in allowed or f"{table.schema_name}.{table.name}" in allowed + or f"{table.catalog}.{table.schema_name}.{table.name}" in allowed @@ table.id in allowed or table.name in allowed or f"{table.schema_name}.{table.name}" in allowed + or f"{table.catalog}.{table.schema_name}.{table.name}" in allowed

Also applies to: 436-438

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/nodes/schema_explorer.py` around lines 144 - 168, The strict allowlist handling in schema_explorer’s table filtering only matches id, name, and schema.name, so fully qualified catalog.schema.table entries are rejected and no candidates are surfaced. Update the strict path in the table-scoping logic to recognize the same fully qualified identifier format used for LLM-exposed table names, and make the same change anywhere else the allowlist comparison is duplicated so both strict and related filtering paths accept catalog.schema.table entries.

coderabbitai · 2026-06-26T11:32:26Z

+@import url('https://fonts.googleapis.com/css2?family=Fira+Code:wght@400;500;600;700&family=Fira+Sans:wght@300;400;500;600;700&display=swap');
+
 :root {
-  --text: #6b6375;
-  --text-h: #08060d;
-  --bg: #fff;
-  --border: #e5e4e7;
-  --code-bg: #f4f3ec;
-  --accent: #aa3bff;
-  --accent-bg: rgba(170, 59, 255, 0.1);
-  --accent-border: rgba(170, 59, 255, 0.5);
-  --social-bg: rgba(244, 243, 236, 0.5);
-  --shadow: rgba(0, 0, 0, 0.1) 0 10px 15px -3px, rgba(0, 0, 0, 0.05) 0 4px 6px -2px;
+  --text: #94A3B8; /* slate-400 */
+  --text-h: #F8FAFC; /* slate-50 */
+  --bg: #020617; /* slate-950 */
+  --border: #1E293B; /* slate-800 */
+  --code-bg: #0F172A; /* slate-900 */
+  --accent: #22C55E; /* green-500 */
+  --accent-bg: rgba(34, 197, 94, 0.1);
+  --accent-border: rgba(34, 197, 94, 0.5);
+  --social-bg: rgba(15, 23, 42, 0.5);
+  --shadow: rgba(0, 0, 0, 0.4) 0 10px 15px -3px, rgba(0, 0, 0, 0.25) 0 4px 6px -2px;

-  --sans: system-ui, 'Segoe UI', Roboto, sans-serif;
-  --heading: system-ui, 'Segoe UI', Roboto, sans-serif;
-  --mono: ui-monospace, Consolas, monospace;
+  --sans: 'Fira Sans', system-ui, 'Segoe UI', Roboto, sans-serif;
+  --heading: 'Fira Sans', system-ui, 'Segoe UI', Roboto, sans-serif;
+  --mono: 'Fira Code', ui-monospace, Consolas, monospace;


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Fix the new Stylelint violations in this block.

The added @import url(...) form and the unquoted Roboto/Consolas keywords already trip the configured import-notation and value-keyword-case rules, so this stylesheet will stay lint-red until those values are normalized.

🧰 Tools

🪛 Stylelint (17.13.0)

[error] 1-1: Expected "url('https://fonts.googleapis.com/css2?family=Fira+Code:wght@400;500;600;700&family=Fira+Sans:wght@300;400;500;600;700&display=swap')" to be "'https://fonts.googleapis.com/css2?family=Fira+Code:wght@400;500;600;700&family=Fira+Sans:wght@300;400;500;600;700&display=swap'" (import-notation)

(import-notation)

[error] 15-15: Expected "Roboto" to be "roboto" (value-keyword-case)

(value-keyword-case)

[error] 16-16: Expected "Roboto" to be "roboto" (value-keyword-case)

(value-keyword-case)

[error] 17-17: Expected "Consolas" to be "consolas" (value-keyword-case)

(value-keyword-case)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/index.css` around lines 1 - 17, Normalize the new typography block in index.css to satisfy Stylelint: replace the remote font import with the project’s preferred import notation and make the font-family keywords in the :root variables consistent with the configured casing rules. Update the `@import` line and the --sans/--heading/--mono declarations so the values match the lint expectations in this stylesheet and the violations go away.

Source: Linters/SAST tools

coderabbitai · 2026-06-26T11:32:26Z

+const FLAG_CATEGORIES: Record<string, string[]> = {
+  Extraction: [
+    'EXTRACTOR_MODEL', 'EXTRACTOR_TEMPERATURE', 'EXTRACTOR_TOP_K_TABLES', 'TABLE_SCOPING_MODE',
+  ],
+  'Schema Explorer': [
+    'MAX_PROFILES_TO_FETCH', 'PROFILE_FETCH_CONCURRENCY', 'SCHEMA_CACHE_TTL', 'PROFILE_CACHE_TTL',
+    'SCHEMA_SEMANTIC_TYPING', 'SCHEMA_JOIN_GRAPH', 'SCHEMA_SUMMARIZATION', 'SCHEMA_AMBIGUITY_DETECT',
+    'SCHEMA_SUMMARY_MODEL', 'SCHEMA_TOP_K_JOINS',
+  ],
+  'Query Builder': ['QUERY_BUILDER_MODEL', 'QUERY_BUILDER_TEMPERATURE'],
+  Refiner: ['MAX_REFINER_ITERATIONS', 'MAX_SCHEMA_REPLAN_ITERATIONS', 'REFINER_MODEL'],
+  'Satisfaction Check': [
+    'SATISFACTION_CHECK_ENABLED', 'SATISFACTION_CHECK_EXECUTION', 'SATISFACTION_CHECK_PLAUSIBILITY',
+    'SATISFACTION_CHECK_COLUMNS', 'SATISFACTION_CHECK_SEMANTIC', 'SATISFACTION_MIN_ROWS',
+    'SATISFACTION_MAX_ROWS', 'SATISFACTION_SEMANTIC_THRESHOLD', 'SATISFACTION_JUDGE_MODEL',
+  ],
+  Skills: ['SKILLS_ENABLED', 'SKILLS_HOT_RELOAD', 'SKILLS_CACHE_TTL'],
+  Evaluation: ['LLM_JUDGE_ENABLED', 'EVAL_PARALLEL_WORKERS', 'EVAL_JUDGE_MODEL'],
+  'Catalog Validation': ['CATALOG_VALIDATION_ENABLED', 'CATALOG_CACHE_TTL'],
+};


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't drop uncategorized flags from the admin UI.

filteredCategories only iterates the hard-coded FLAG_CATEGORIES, so any flag returned by /flags/ that is not listed there never renders and cannot be edited. That makes newly registered backend flags effectively unmanaged until the frontend is patched.

Also applies to: 387-399

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/pages/FlagsPage.tsx` around lines 37 - 56, The admin flags UI is only rendering the hard-coded FLAG_CATEGORIES entries, so any backend flag not listed there is omitted from the page. Update FlagsPage so the category filtering/selection logic includes uncategorized flags returned by /flags/ as a fallback bucket (or equivalent), ensuring every flag can still render and be edited; use the FLAG_CATEGORIES definition and filteredCategories logic in FlagsPage to locate the change.

coderabbitai · 2026-06-26T11:32:26Z

+      <div className="flag-row__value">
+        {editing && flag.type !== 'bool' ? (
+          <div className="flag-row__edit-controls">
+            {editor()}
+            <Tooltip title="Save">
+              <Button type="primary" size="small" icon={<Save size={12} />} onClick={handleSave} loading={isSaving} />
+            </Tooltip>
+            <Tooltip title="Cancel">
+              <Button size="small" icon={<X size={12} />} onClick={handleCancel} />
+            </Tooltip>
+          </div>
+        ) : (
+          <div className="flag-row__display">
+            {displayValue()}
+            {flag.type !== 'bool' && (
+              <Tooltip title="Edit value">
+                <Button
+                  type="text"
+                  size="small"
+                  icon={<Edit3 size={12} />}
+                  onClick={() => { setDraft(flag.value); setEditing(true); }}
+                />
+              </Tooltip>
+            )}
+          </div>
+        )}
+      </div>
+
+      <div className="flag-row__actions">
+        <span className="flag-row__owner">{flag.owner}</span>
+        {flag.last_modified_by && flag.last_modified_by !== 'seed' && (
+          <span className="flag-row__modifier">by {flag.last_modified_by}</span>
+        )}
+        <Popconfirm
+          title={`Reset ${flag.name} to env default?`}
+          description="This clears the DB override and falls back to the environment variable."
+          onConfirm={() => onReset(flag.name)}
+          okText="Reset"
+          okButtonProps={{ danger: true }}
+        >
+          <Tooltip title="Reset to env default">
+            <Button
+              type="text"
+              size="small"
+              danger
+              icon={<RotateCcw size={12} />}
+              loading={isResetting}
+              disabled={flag.value === null}
+            />
+          </Tooltip>
+        </Popconfirm>


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Add accessible names to the icon-only action buttons.

These controls rely on tooltips for meaning, but tooltips do not give the underlying buttons an accessible name. Screen-reader users will hit unlabeled edit/save/reset/delete actions here.

Also applies to: 239-249, 684-691

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/pages/FlagsPage.tsx` around lines 163 - 213, The icon-only action buttons in FlagsPage are missing accessible names, so screen readers cannot identify the edit/save/cancel/reset/delete controls even though they have tooltips. Add explicit accessible labels to each icon-only Button in the flag row actions and related action areas, using the existing action context in the FlagsPage component so the labels match the visible tooltip meaning. Update the relevant button instances in the edit/display controls and the additional action blocks referenced in this page to ensure every icon-only action has a clear accessible name.

coderabbitai · 2026-06-26T11:32:26Z

+  const submitMode = () => {
+    // Construct overrides object from overridesList
+    const overridesObj: Record<string, any> = {};
+    for (const item of overridesList) {
+      overridesObj[item.name] = item.value;
+    }
+
+    upsertModeMutation.mutate({
+      name: modeDraft.name,
+      data: { 
+        description: modeDraft.description, 
+        flag_overrides: overridesObj, 
+        is_active: modeDraft.is_active 
+      },
+    });


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Prevent submitting a new mode with an empty name.

The save path sends modeDraft.name straight into the /flags/modes/{name} request. When the field is blank, the UI still submits and only fails after calling the backend with /flags/modes/.

Suggested guard

const submitMode = () => { + if (!editingMode && !modeDraft.name.trim()) { + message.error('Mode name is required'); + return; + } + const overridesObj: Record<string, any> = {}; for (const item of overridesList) { overridesObj[item.name] = item.value; }

Also applies to: 561-579

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/pages/FlagsPage.tsx` around lines 422 - 436, Prevent submitting a mode when modeDraft.name is empty, since submitMode and the related save path still call upsertModeMutation with a blank name and hit /flags/modes/ instead of blocking in the UI. Add a guard in submitMode (and the other referenced save flow) to return early or disable submission when the name is missing, and ensure any validation around modeDraft.name is checked before constructing the request payload.

coderabbitai · 2026-06-26T11:32:26Z

            <div className="footer-col">
-              <h4>Product</h4>
+              <h4>Platform</h4>
              <a href="#">Features</a>
-              <a href="#">Pricing</a>
              <a href="#">Integrations</a>
            </div>
            <div className="footer-col">
              <h4>Resources</h4>
              <a href="#">Documentation</a>
              <a href="#">API Reference</a>
-              <a href="#">Blog</a>
            </div>
            <div className="footer-col">
-              <h4>Company</h4>
-              <a href="#">About Us</a>
-              <a href="#">Careers</a>
-              <a href="#">Contact</a>
+              <h4>Support</h4>
+              <a href="#">Help Center</a>
+              <a href="#">IT Service Desk</a>


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Replace the footer placeholder links before shipping this page.

These href="#" anchors do not navigate anywhere and just jump the viewport, so the footer looks broken for keyboard and screen-reader users.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/pages/LandingPage.tsx` around lines 353 - 366, The footer links in LandingPage’s footer sections are still placeholder anchors with href="#" and should be replaced before shipping. Update the anchor targets in the footer markup for the Platform, Resources, and Support sections to point to real routes or documented destinations, and make sure the footer items behave as actual navigation links for keyboard and screen-reader users.

coderabbitai

Review continued from previous batch...

coderabbitai · 2026-06-26T11:33:38Z

    if resume_value is not None:
        state_snapshot = await agent_graph.aget_state(config)
        if not state_snapshot.values:
-            return json.dumps({"error": f"Thread ID '{thread_id}' not found or has no active session."})
-
-        result = await agent_graph.ainvoke(
-            Command(resume=resume_value),
-            config=config
-        )
+            return json.dumps(
+                {
+                    "error": f"Thread ID '{thread_id}' not found or has no active session."
+                }
+            )
+
+        result = await agent_graph.ainvoke(Command(resume=resume_value), config=config)


🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify whether any transport/test path updates graph state before HITL resume. rg -n -C3 'hitl_escalation|resume_value|Command\(resume|update_state|aupdate_state' agent backend frontend

Repository: StavPonte11/text2sql-onboarding

Length of output: 17249

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the HITL resume paths and any use of update_state/aupdate_state in the repo. rg -n -C 2 'update_state|aupdate_state|Command\(resume|resume_value|hitl_escalation' agent backend frontend | sed -n '1,240p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 13333

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the HITL resume paths and any use of update_state/aupdate_state in the repo. rg -n -C 2 'update_state|aupdate_state|Command\(resume|resume_value|hitl_escalation' agent backend frontend | sed -n '1,240p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 13333

🌐 Web query:

LangGraph interrupt_before Command(resume=...) update_state semantics resume payload merge checkpoint state

💡 Result:

When using interrupt_before or interrupt_after in LangGraph, execution pauses at the specified node boundary [1][2]. To resume, you pass a Command object to the graph's invoke, ainvoke, stream, or astream method [1][3]. Key Semantics of Command(resume=...): - Resuming with Command(resume=value): The value provided becomes the return value of the interrupt call inside the graph [1][4]. - Resuming execution: You must use the same thread ID that was active when the interrupt occurred [1][2]. When resuming, pass Command(resume=...) as the input to the graph method [1][5]. - Multi-interrupt handling: You can pass a dictionary mapping interrupt IDs to their respective resume values if the graph has encountered multiple interrupts (e.g., in parallel branches) [3][6]. - Node re-execution: When a graph resumes after an interrupt, the node where the interrupt occurred restarts from the beginning [1][7][4]. Because code preceding the interrupt point will run again, it is important to ensure that side effects within nodes are idempotent [7][4]. State Updates and Resumption: - Combining resume with state updates: You can provide both a resume value and a state update simultaneously by using Command(resume=value, update=your_state_dict) [3][8]. - Update semantics: The update dictionary is applied using the node's configured writers or reducers [9][10]. If no reducer is specified, the default behavior is to overwrite the key [10]. - Preserving interrupts: Note that calling update_state on a checkpoint often creates a new checkpoint, which can clear existing interrupt/task metadata from the state [11]. When you need to update state and resume simultaneously, passing the update within the Command object alongside the resume value is the recommended approach to maintain continuity [11][8]. Execution Flow: - When resuming, pass None if no resume value is needed [1][12]. - Do not pass the initial state dict when resuming; doing so will restart the graph from the entry point, effectively losing the checkpoint context [12][8]. - LangGraph restores completed task results from the checkpointer; nodes that successfully completed before the interrupt are not re-executed [7][10].

Citations:

1: https://docs.langchain.com/oss/python/langgraph/interrupts

2: https://github.com/langchain-ai/langgraph/blob/main/docs/docs/concepts/breakpoints.md

3: https://reference.langchain.com/python/langgraph/types/Command

4: https://docs.langchain.com/oss/python/langgraph/graph-api

5: https://langchain-ai-langgraph-40.mintlify.app/concepts/human-in-the-loop

6: Support multiple resume values with Command.resume langchain-ai/langgraph#4406

7: https://langchain-ai.github.io/langgraph/how-tos/wait-user-input-functional/

8: https://docs.langchain.com/oss/python/langgraph/graph-api.md

9: https://docs.langchain.com/oss/python/langgraph/use-time-travel

10: https://docs.langchain.com/oss/python/langgraph/use-graph-api

11: https://forum.langchain.com/t/how-to-update-graph-state-while-preserving-interrupts/1655

12: https://theneuralbase.com/agent-patterns/learn/advanced/langgraph-interrupt-before/

🌐 Web query:

LangGraph interrupt_before Command(resume=...) update_state semantics resume payload merge checkpoint state

💡 Result:

In LangGraph, when a graph is paused via an interrupt (either static via interrupt_before/interrupt_after or dynamic via the interrupt function), you resume execution by invoking the graph again and passing a Command object with the resume parameter [1][2][3]. Semantics of Command(resume=...): The value provided to resume=... is passed as the return value of the interrupt call inside the node [1][4][5]. When you resume, the graph re-executes the node where the interrupt occurred from its beginning [1][4][5]. Because code before the interrupt call will run again upon resumption, any side effects within that node must be idempotent [6][4]. State Management and Payload: 1. Update and Resume: You can update the graph state and resume in a single step using Command(update=..., resume=...) [7]. Alternatively, you can use graph.update_state to modify the state before triggering the resume with an invocation [7][2]. 2. Resuming: To resume, you pass the Command object to the same thread ID that was used when the interrupt occurred [1][5]. 3. Multiple Interrupts: If multiple interrupts exist (e.g., in subgraphs), Command(resume=...) can accept a dictionary mapping interrupt IDs to their respective resume values [3][8]. This is the recommended approach for handling complex graphs with multiple pending interruptions [8]. Checkpointing: LangGraph checkpoints the graph state at super-step boundaries [4]. When you resume, LangGraph restores the state from the checkpoint and replays forward [6]. If you are using the Functional API, LangGraph replays completed task and subgraph results from the checkpointer to maintain the integrity of the execution flow [6]. Because the node containing the interrupt re-executes, ensure that any logic preceding the interrupt handles re-entry safely [6][4].

Citations:

1: https://docs.langchain.com/oss/python/langgraph/interrupts

2: https://github.com/langchain-ai/langgraph/blob/main/docs/docs/concepts/breakpoints.md

3: https://reference.langchain.com/python/langgraph/types/Command

4: https://docs.langchain.com/oss/python/langgraph/graph-api

5: https://docs.langchain.com/oss/javascript/langgraph/interrupts

6: https://langchain-ai.github.io/langgraph/how-tos/wait-user-input-functional/

7: https://langchain-ai-langgraph-40.mintlify.app/concepts/human-in-the-loop

8: Support multiple resume values with Command.resume langchain-ai/langgraph#4406

🌐 Web query:

LangGraph interrupt_before Command(resume=...) update_state semantics resume payload merge checkpoint state

💡 Result:

In LangGraph, interrupt_before and Command(resume=...) are core components for implementing human-in-the-loop workflows [1][2]. Below is a synthesis of their semantics, state management, and interaction. Interrupt Semantics and interrupt_before interrupt_before is a configuration option set during graph compilation (or at runtime) that pauses execution immediately before a specified node executes [1][3]. When the graph hits an interrupt, it saves the current state to a checkpointer [1][4]. The execution halts, and the graph returns a state snapshot containing the interrupt information to the client [4][5]. Resuming with Command(resume=...) To continue execution after an interrupt, you must re-invoke or stream the graph using the Command primitive [1][6]. 1. Resume Payload: When you pass Command(resume=value) as input, that value is provided to the point where execution paused [1][7]. 2. Re-execution: Execution restarts from the beginning of the node where the interrupt occurred (or the node immediately preceding the breakpoint if using interrupt_before) [1][8][2]. Because of this, any code within that node preceding the interrupt call will be executed again; therefore, logic inside such nodes should be idempotent [8][2]. 3. State Management: The resume value is used to progress the graph, often by updating the graph's state based on the input provided (e.g., human approval or edits) [4][9]. Interaction with update_state - update_state: This method allows you to manually modify the graph's state at any time while the graph is paused at a checkpoint [3][10]. You can update specific state keys before resuming [4][3]. - Unified Update and Resume: You can perform a state update and resume the graph in a single step by passing Command(update=..., resume=...) [4][7]. This is frequently used to apply human edits to the state and continue the workflow simultaneously [4]. - Crucial Pattern: When resuming after an interrupt, always pass None as the input (or Command if providing a resume value) to the graph's invoke or stream method [1][11]. Passing an initial input dictionary will re-trigger the graph from the very beginning, bypassing your existing checkpoints and state progress [7][11]. Summary of Key Behaviors - Checkpoints: A checkpointer is required for both interrupt_before and resume functionality [1][4]. - Idempotency: Because nodes re-execute from the start upon resumption, ensure side effects within these nodes are handled safely [8][2]. - Multiple Interrupts: If multiple interrupts occur (e.g., in subgraphs), Command(resume=...) supports passing a mapping of {interrupt_id: value} to handle them collectively [6][9][12].

Citations:

1: https://docs.langchain.com/oss/python/langgraph/interrupts

2: https://github.com/langchain-ai/langgraph/blob/main/docs/docs/concepts/human_in_the_loop.md

3: https://github.com/langchain-ai/langgraph/blob/main/docs/docs/concepts/breakpoints.md

4: https://langchain-ai-langgraph-40.mintlify.app/concepts/human-in-the-loop

5: https://docs.langchain.com/langsmith/add-human-in-the-loop.md

6: https://reference.langchain.com/python/langgraph/types/Command

7: https://docs.langchain.com/oss/python/langgraph/graph-api

8: https://langchain-ai.github.io/langgraph/how-tos/wait-user-input-functional/

9: https://forum.langchain.com/t/auto-resuming-challenges-in-langgraph/1657

10: https://reference.langchain.com/python/langgraph-sdk/_async/threads/ThreadsClient/update_state

11: https://theneuralbase.com/agent-patterns/learn/advanced/langgraph-interrupt-before/

12: Support multiple resume values with Command.resume langchain-ai/langgraph#4406

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect how stale error fields influence routing/state after HITL resume. rg -n -C 3 'last_error|trino_error|escalation_reason|escalated|resume_value|update_state|Command\(resume' agent/src/agent | sed -n '1,260p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 17095

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect how stale error fields influence routing/state after HITL resume. rg -n -C 3 'last_error|trino_error|escalation_reason|escalated|resume_value|update_state|Command\(resume' agent/src/agent | sed -n '1,260p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 17095

Apply the corrected state before resuming HITL.

Command(resume=...) alone reuses the checkpointed last_error / trino_error / retry state. Update the thread first, or use Command(update=..., resume=...), to clear the stale error and escalation fields before resuming; otherwise the graph can immediately route back into hitl_escalation.

🧰 Tools

🪛 ast-grep (0.44.0)

[info] 59-63: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{
"error": f"Thread ID '{thread_id}' not found or has no active session."
}
)
Note: [CWE-116] Improper Encoding or Escaping of Output.

(use-jsonify)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/mcp_server.py` around lines 57 - 66, The HITL resume path in the MCP server is reusing stale checkpoint state because `Command(resume=...)` is called without clearing prior error fields. Update the thread state before resuming, either by using `Command(update=..., resume=...)` or by issuing the equivalent state update in `agent_graph` around the existing `state_snapshot` / `ainvoke` flow, so `last_error`, `trino_error`, and retry/escalation fields are reset before `resume_value` is applied.

coderabbitai · 2026-06-26T11:33:39Z

 if __name__ == "__main__":
-    mcp.run()
+    mcp.run(transport="stdio")
+
+@mcp.tool()
+async def suggest_fixes(thread_id: str, category: str) -> str:


🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

Register all tools before starting the MCP server.

When run as __main__, mcp.run(...) executes before suggest_fixes is decorated, so that tool is never registered for stdio server mode.

Proposed fix

-if __name__ == "__main__": - mcp.run(transport="stdio") - `@mcp.tool`() async def suggest_fixes(thread_id: str, category: str) -> str: @@ except Exception as e: return json.dumps([]) + +if __name__ == "__main__": + mcp.run(transport="stdio")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if __name__ == "__main__":

mcp.run()

mcp.run(transport="stdio")

@mcp.tool()

async def suggest_fixes(thread_id: str, category: str) -> str:

`@mcp.tool`()

async def suggest_fixes(thread_id: str, category: str) -> str:

...

if __name__ == "__main__":

mcp.run(transport="stdio")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/mcp_server.py` around lines 184 - 188, The MCP server starts before suggest_fixes is registered, so the tool is missing in stdio mode. Move the mcp.run(transport="stdio") call to after all `@mcp.tool`() definitions, or otherwise ensure suggest_fixes is decorated before __main__ execution begins. Use the mcp object and the suggest_fixes function as the key locations when reordering the module so every tool is registered before startup.

coderabbitai · 2026-06-26T11:33:39Z

+from agent.utils.esca import get_esca_client

-llm = ChatOpenAI(model=settings.LLM_MODEL, base_url=settings.LLM_BASE_URL, api_key=settings.LLM_API_KEY, temperature=0)
+llm = get_llm("finalizer")


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Instantiate the finalizer LLM per invocation.

Because llm is created once at Line 12, both the summary path (Lines 101-114) and get_sql_explanation() (Lines 58-60) stay pinned to env defaults and ignore the runtime_flags resolved by init_flags_node. That breaks the new flag-driven model selection for the finalizer stage.

Proposed fix

-llm = get_llm("finalizer") - -async def get_sql_explanation(sql_query: str | None) -> str: +async def get_sql_explanation( + sql_query: str | None, runtime_flags: dict | None = None +) -> str: """Ask LLM to explain the SQL query in natural language.""" if not sql_query: return "No SQL query was generated." @@ - chain = prompt_sql_explanation | llm + chain = prompt_sql_explanation | get_llm("finalizer", runtime_flags=runtime_flags) response = await chain.ainvoke({"sql_query": sql_query}) return response.content async def finalizer_node(state: AgentState, config: RunnableConfig | None = None): """Summarize data.""" + runtime_flags = state.get("runtime_flags") or {} + llm = get_llm("finalizer", runtime_flags=runtime_flags) @@ - sql_task = get_sql_explanation(state.get("sql_query")) + sql_task = get_sql_explanation(state.get("sql_query"), runtime_flags)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

llm = get_llm("finalizer")

async def get_sql_explanation(

sql_query: str | None, runtime_flags: dict | None = None

) -> str:

"""Ask LLM to explain the SQL query in natural language."""

if not sql_query:

return "No SQL query was generated."

chain = prompt_sql_explanation | get_llm("finalizer", runtime_flags=runtime_flags)

response = await chain.ainvoke({"sql_query": sql_query})

return response.content

async def finalizer_node(state: AgentState, config: RunnableConfig | None = None):

"""Summarize data."""

runtime_flags = state.get("runtime_flags") or {}

llm = get_llm("finalizer", runtime_flags=runtime_flags)

sql_task = get_sql_explanation(state.get("sql_query"), runtime_flags)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/nodes/finalizer.py` at line 12, The finalizer LLM is being created once at module load via get_llm("finalizer"), so both the summary flow in Finalizer and get_sql_explanation() ignore runtime_flags resolved by init_flags_node. Update the finalizer stage to instantiate the LLM per invocation using the current runtime_flags, and thread that dynamic LLM selection through get_sql_explanation() and the summary path so model choice follows the flag-driven configuration.

coderabbitai · 2026-06-26T11:33:39Z

+        if inline_result_rows is not None:
+            limit = 5
+            preview_rows = inline_result_rows[:limit]
+            columns = (
+                list(preview_rows[0].keys())
+                if preview_rows and isinstance(preview_rows[0], dict)
+                else []
+            )
+            preview_info = {
+                "columns": columns,
+                "preview_rows": preview_rows,
+                "preview_count": len(preview_rows),
+                "total_rows": len(inline_result_rows),
+            }
+            preview_str = json.dumps(preview_info, indent=2)


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Guard the inline preview serializer.

The new fallback at Lines 74-88 runs specifically when Esca persistence fails, but inline_result_rows is arbitrary query output. If any row contains datetime, date, Decimal, or another non-JSON-native value, json.dumps(preview_info, indent=2) raises TypeError and the fallback path crashes instead of recovering.

Proposed fix

- preview_str = json.dumps(preview_info, indent=2) + preview_str = json.dumps(preview_info, indent=2, default=str)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if inline_result_rows is not None:

limit = 5

preview_rows = inline_result_rows[:limit]

columns = (

list(preview_rows[0].keys())

if preview_rows and isinstance(preview_rows[0], dict)

else []

)

preview_info = {

"columns": columns,

"preview_rows": preview_rows,

"preview_count": len(preview_rows),

"total_rows": len(inline_result_rows),

}

preview_str = json.dumps(preview_info, indent=2)

if inline_result_rows is not None:

limit = 5

preview_rows = inline_result_rows[:limit]

columns = (

list(preview_rows[0].keys())

if preview_rows and isinstance(preview_rows[0], dict)

else []

)

preview_info = {

"columns": columns,

"preview_rows": preview_rows,

"preview_count": len(preview_rows),

"total_rows": len(inline_result_rows),

}

preview_str = json.dumps(preview_info, indent=2, default=str)

🧰 Tools

🪛 ast-grep (0.44.0)

[info] 87-87: use jsonify instead of json.dumps for JSON output
Context: json.dumps(preview_info, indent=2)
Note: [CWE-116] Improper Encoding or Escaping of Output.

(use-jsonify)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/nodes/finalizer.py` around lines 74 - 88, Guard the inline preview serialization in finalizer.py’s fallback path so it cannot crash when `inline_result_rows` contains non-JSON-native values. In the `finalizer` logic that builds `preview_info` and `preview_str`, add safe serialization for values like `datetime`, `date`, `Decimal`, or a fallback sanitizer/default handler before calling `json.dumps`. Keep the existing preview behavior and make sure the Esca persistence failure path still returns a preview instead of raising `TypeError`.

coderabbitai · 2026-06-26T11:33:39Z

+        if loaded_skills:
+            try:
+                from core.langfuse import get_langfuse_handler
+                langfuse_handler = get_langfuse_handler()
+                if langfuse_client.get_current_trace_id():
+                    skill_names = [s.get("displayName") or s.get("name", "Unknown") for s in loaded_skills]
+                    from agent.langfuse_client import langfuse_client
+                    langfuse_client.update_current_span(


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Import langfuse_client before using it.

The inner import on Line 46 makes langfuse_client local to the function, so Line 44 references it before assignment and always falls into the warning path when skills load.

Proposed fix

if loaded_skills: try: - from core.langfuse import get_langfuse_handler - langfuse_handler = get_langfuse_handler() + from agent.langfuse_client import langfuse_client if langfuse_client.get_current_trace_id(): skill_names = [s.get("displayName") or s.get("name", "Unknown") for s in loaded_skills] - from agent.langfuse_client import langfuse_client langfuse_client.update_current_span( metadata={"skills_loaded": skill_names} )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if loaded_skills:

try:

from core.langfuse import get_langfuse_handler

langfuse_handler = get_langfuse_handler()

if langfuse_client.get_current_trace_id():

skill_names = [s.get("displayName") or s.get("name", "Unknown") for s in loaded_skills]

from agent.langfuse_client import langfuse_client

langfuse_client.update_current_span(

if loaded_skills:

try:

from agent.langfuse_client import langfuse_client

if langfuse_client.get_current_trace_id():

skill_names = [s.get("displayName") or s.get("name", "Unknown") for s in loaded_skills]

langfuse_client.update_current_span(

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/nodes/init_skills.py` around lines 40 - 47, The loaded-skills tracing block in init_loaded_skills is referencing langfuse_client before the inner import assigns it, which makes the warning path trigger incorrectly. Move the import of langfuse_client to the top of the function or module, before the call to langfuse_client.get_current_trace_id(), and keep get_langfuse_handler and update_current_span using that already-imported symbol.

coderabbitai · 2026-06-26T11:33:40Z

+  animation: spinLoader 1s cubic-bezier(0.55, 0.25, 0.25, 0.85) infinite;
+  box-shadow: 0 0 15px rgba(56, 189, 248, 0.2);
+}
+
+@keyframes spinLoader {
+  to {
+    transform: rotate(360deg);
+  }
+}
+
+.spinnerText {
+  color: var(--text-h);
+  font-size: 14.5px;
+  font-weight: 500;
+  letter-spacing: 0.5px;
+  animation: pulseText 2s ease-in-out infinite;
+}
+
+@keyframes pulseText {
+  0%, 100% { opacity: 0.7; }
+  50% { opacity: 1; }
+}
+
+.actionRow {
+  display: flex;
+  gap: 16px;
+  margin-top: 24px;
+}
+
+/* Animations */
+@keyframes fadeInUp {
+  from {
+    opacity: 0;
+    transform: translateY(20px);
+  }
+  to {
+    opacity: 1;
+    transform: translateY(0);
+  }
+}
+
+.animateIn {
+  animation: fadeInUp 0.5s ease forwards;


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

The new animation names violate the enforced keyframe naming rule.

spinLoader, pulseText, fadeInUp, pulseIconAnim, and pulseDotAnim are all flagged by stylelint's kebab-case rule, so this module will keep failing lint until the @keyframes names and their animation: references are renamed.

Also applies to: 771-789

🧰 Tools

🪛 Stylelint (17.13.0)

[error] 176-176: Expected keyframe name "spinLoader" to be kebab-case (keyframes-name-pattern)

(keyframes-name-pattern)

[error] 190-190: Expected keyframe name "pulseText" to be kebab-case (keyframes-name-pattern)

(keyframes-name-pattern)

[error] 202-202: Expected keyframe name "fadeInUp" to be kebab-case (keyframes-name-pattern)

(keyframes-name-pattern)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/pages/AgentTestingPage.module.css` around lines 172 - 214, The new animation names in AgentTestingPage.module.css violate the enforced kebab-case keyframe naming rule. Rename the `@keyframes` definitions and every matching animation reference used by classes like spinnerText, animateIn, and the other affected animation styles (including pulseIconAnim and pulseDotAnim) to kebab-case so stylelint passes consistently across the module.

Source: Linters/SAST tools

coderabbitai · 2026-06-26T11:33:40Z

+  useEffect(() => {
+    if (!traceId) {
+      setTraceSpans([]);
+      setSelectedStep(null);
+      setSelectedStepIndex(null);
+      return;
+    }
+
+    setLoadingTrace(true);
+    let attempts = 0;
+    const maxAttempts = 5;
+
+    const fetchTrace = () => {
+      axios
+        .get(`/api/agent/traces/${traceId}`)
+        .then((res) => {
+          const data = res.data || [];
+          if (data.length > 0) {
+            setTraceSpans(data);
+            setLoadingTrace(false);
+          } else if (attempts < maxAttempts) {
+            attempts++;
+            setTimeout(fetchTrace, 2000); // retry in 2s
+          } else {
+            setTraceSpans([]);
+            setLoadingTrace(false);
+          }
+        })
+        .catch((err) => {
+          console.error('Failed to fetch trace', err);
+          if (attempts < maxAttempts) {
+            attempts++;
+            setTimeout(fetchTrace, 2000);
+          } else {
+            setLoadingTrace(false);
+          }
+        });
+    };
+
+    fetchTrace();
+  }, [traceId]);


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Cancel the trace retry chain when traceId changes or the page resets.

Each empty/error response schedules another setTimeout(fetchTrace, 2000), but this effect never clears those timers or ignores late responses. An old poller can therefore write a previous trace back into traceSpans after the user starts a new run or resets the page.

Proposed fix

useEffect(() => { if (!traceId) { setTraceSpans([]); setSelectedStep(null); setSelectedStepIndex(null); return; } setLoadingTrace(true); + let cancelled = false; + let retryId: ReturnType<typeof setTimeout> | null = null; let attempts = 0; const maxAttempts = 5; const fetchTrace = () => { axios .get(`/api/agent/traces/${traceId}`) .then((res) => { + if (cancelled) return; const data = res.data || []; if (data.length > 0) { setTraceSpans(data); setLoadingTrace(false); } else if (attempts < maxAttempts) { attempts++; - setTimeout(fetchTrace, 2000); // retry in 2s + retryId = setTimeout(fetchTrace, 2000); } else { setTraceSpans([]); setLoadingTrace(false); } }) .catch((err) => { + if (cancelled) return; console.error('Failed to fetch trace', err); if (attempts < maxAttempts) { attempts++; - setTimeout(fetchTrace, 2000); + retryId = setTimeout(fetchTrace, 2000); } else { setLoadingTrace(false); } }); }; fetchTrace(); + + return () => { + cancelled = true; + if (retryId) clearTimeout(retryId); + }; }, [traceId]);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

useEffect(() => {

if (!traceId) {

setTraceSpans([]);

setSelectedStep(null);

setSelectedStepIndex(null);

return;

}

setLoadingTrace(true);

let attempts = 0;

const maxAttempts = 5;

const fetchTrace = () => {

axios

.get(`/api/agent/traces/${traceId}`)

.then((res) => {

const data = res.data || [];

if (data.length > 0) {

setTraceSpans(data);

setLoadingTrace(false);

} else if (attempts < maxAttempts) {

attempts++;

setTimeout(fetchTrace, 2000); // retry in 2s

} else {

setTraceSpans([]);

setLoadingTrace(false);

}

})

.catch((err) => {

console.error('Failed to fetch trace', err);

if (attempts < maxAttempts) {

attempts++;

setTimeout(fetchTrace, 2000);

} else {

setLoadingTrace(false);

}

});

};

fetchTrace();

}, [traceId]);

useEffect(() => {

if (!traceId) {

setTraceSpans([]);

setSelectedStep(null);

setSelectedStepIndex(null);

return;

}

setLoadingTrace(true);

let cancelled = false;

let retryId: ReturnType<typeof setTimeout> | null = null;

let attempts = 0;

const maxAttempts = 5;

const fetchTrace = () => {

axios

.get(`/api/agent/traces/${traceId}`)

.then((res) => {

if (cancelled) return;

const data = res.data || [];

if (data.length > 0) {

setTraceSpans(data);

setLoadingTrace(false);

} else if (attempts < maxAttempts) {

attempts++;

retryId = setTimeout(fetchTrace, 2000);

} else {

setTraceSpans([]);

setLoadingTrace(false);

}

})

.catch((err) => {

if (cancelled) return;

console.error('Failed to fetch trace', err);

if (attempts < maxAttempts) {

attempts++;

retryId = setTimeout(fetchTrace, 2000);

} else {

setLoadingTrace(false);

}

});

};

fetchTrace();

return () => {

cancelled = true;

if (retryId) clearTimeout(retryId);

};

}, [traceId]);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/pages/AgentTestingPage.tsx` around lines 907 - 947, The trace polling in AgentTestingPage’s useEffect can outlive the current traceId and write stale spans after a new run or reset. Update the fetchTrace retry chain to track and clear any pending timeout when the effect re-runs or unmounts, and ignore late axios responses if traceId has changed before updating setTraceSpans or setLoadingTrace. Use the useEffect cleanup around fetchTrace, attempts, and the setTimeout retry path to cancel the old poller safely.

coderabbitai · 2026-06-26T11:33:40Z

+  useEffect(() => {
+    if (!threadId || !chatMutation.isPending) return;
+
+    const eventSource = new EventSource(`/api/agent/stream/${threadId}`);
+
+    eventSource.onmessage = (event) => {
+      try {
+        const data = JSON.parse(event.data);
+        if (data.node) {
+          setExecutionPath((prev) => {
+            if (prev.length > 0 && prev[prev.length - 1] === data.node) {
+              return prev;
+            }
+            return [...prev, data.node];
+          });
+        }
+      } catch (err) {
+        console.error('SSE parse error:', err);
+      }
+    };
+
+    eventSource.onerror = (err) => {
+      console.error('SSE connection error:', err);
+      eventSource.close();
+    };
+
+    return () => {
+      eventSource.close();
+    };
+  }, [threadId, chatMutation.isPending]);


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

The 300ms timeout does not actually subscribe before the run starts.

EventSource is only opened after chatMutation.isPending flips inside mutate(), so the POST can already be in flight before the stream exists. Early node events can be missed, which leaves the live execution path and thought-process UI incomplete until the final response arrives.

Also applies to: 1006-1014, 1019-1035

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/pages/AgentTestingPage.tsx` around lines 962 - 991, The SSE subscription is created too late in the AgentTestingPage flow, so early execution events can be missed before the stream opens. Move the EventSource setup into the start of the run lifecycle in AgentTestingPage (around the mutate/submit path used for chatMutation) so it subscribes before or at the same time the POST begins, and keep the existing onmessage/onerror handling and cleanup tied to threadId. Ensure the live execution path state is connected before triggering the request rather than waiting for chatMutation.isPending to flip.

coderabbitai · 2026-06-26T11:33:40Z

+    const submitBtn = page.getByRole('button', { name: /Ask Agent/i });
+    if (await submitBtn.isVisible()) {
+      await submitBtn.click();
+    } else {
+      await page.keyboard.press('Enter');
+    }
+
+    // Wait for the agent to complete
+    // We expect either a success card or an interrupt card
+    await expect(page.locator('.agent-result')).toBeVisible({ timeout: 60000 });


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Update the success-path selectors to the current UI.

AgentChatInput renders a Run Agent button, not Ask Agent, and AgentResultDisplay uses CSS-module classes plus Agent Execution Successful/TASK COMPLETED instead of a global .agent-result marker. As written, this test falls back to Enter and then waits on a selector the page never renders.

Proposed fix

- const submitBtn = page.getByRole('button', { name: /Ask Agent/i }); + const submitBtn = page.getByRole('button', { name: /Run Agent/i }); @@ - await expect(page.locator('.agent-result')).toBeVisible({ timeout: 60000 }); + await expect(page.getByText('Agent Execution Successful')).toBeVisible({ timeout: 60000 });

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const submitBtn = page.getByRole('button', { name: /Ask Agent/i });

if (await submitBtn.isVisible()) {

await submitBtn.click();

} else {

await page.keyboard.press('Enter');

}

// Wait for the agent to complete

// We expect either a success card or an interrupt card

await expect(page.locator('.agent-result')).toBeVisible({ timeout: 60000 });

const submitBtn = page.getByRole('button', { name: /Run Agent/i });

if (await submitBtn.isVisible()) {

await submitBtn.click();

} else {

await page.keyboard.press('Enter');

}

// Wait for the agent to complete

// We expect either a success card or an interrupt card

await expect(page.getByText('Agent Execution Successful')).toBeVisible({ timeout: 60000 });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/tests/agent-testing.spec.ts` around lines 28 - 37, The agent-testing spec is targeting outdated success-path UI selectors, so update the assertions in the test to match the current components. In the flow that uses page.getByRole for the submit button, change the button name from the old Ask Agent label to the current Run Agent label, and replace the wait on the global .agent-result selector with the actual AgentResultDisplay success indicators (the CSS-module-backed text such as Agent Execution Successful or TASK COMPLETED). Keep the rest of the interaction logic in the test unchanged.

coderabbitai · 2026-06-26T11:33:40Z

+  test('should run query against real backend', async ({ page }) => {
+    // Navigate to the Studio / Agent Testing page using the local dev server
+    await page.goto('http://localhost:3001/agent-testing');
+
+    // Fill the chat input
+    const input = page.getByPlaceholder('Ask the agent to query a table...');
+    await expect(input).toBeVisible();
+    await input.fill('how many orders in the last month?');
+
+    // Click submit
+    const submitBtn = page.getByRole('button', { name: /Run Agent/i });
+    if (await submitBtn.isVisible()) {
+      await submitBtn.click();
+    } else {
+      await page.keyboard.press('Enter');
+    }
+
+    // Wait up to 3 minutes for the agent to finish
+    test.setTimeout(180000);
+
+    // Look for either Agent Completed or Agent Needs Approval or Agent Error
+    await expect(
+      page.locator('text=Agent Completed').or(page.locator('text=Agent Needs Approval')).or(page.locator('text=Agent Error'))
+    ).toBeVisible({ timeout: 180000 });


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Don’t ship this as a default CI Playwright spec.

page.goto('http://localhost:3001/agent-testing') bypasses Playwright’s configured base URL and requires a live local stack, while the success assertion waits for Agent Completed even though the page renders Agent Execution Successful/TASK COMPLETED. In automation this will either fail immediately when no dev server is running or burn 180s on successful runs.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/tests/real-agent.spec.ts` around lines 4 - 27, The real-agent Playwright spec is not CI-safe because it hardcodes a local dev URL and asserts on text that the page does not render. Update the test in real-agent.spec.ts to use Playwright’s configured base URL (instead of localhost:3001) and align the success wait in the test flow with the actual completion labels rendered by the agent-testing page, using the existing test('should run query against real backend') block and its page.goto/expect locator checks.

Stav Ponte added 13 commits June 16, 2026 11:58

feat: resolve Group 1 critical refactoring bugs and optimize Docker c…

0928097

…onfiguration

feat: implement ESCA fallback, context injection, and unify utilities…

4eff068

… (TTS-G1-07 to 09)

fix: address CodeRabbit review feedback on PR #15

341808d

chore: remove dead code and fix linting errors

9b4f557

fix: address CodeRabbit PR comments

63ca448

refactor: architecture & resilience updates, add scratch gitignores

6a45199

feat: resolve local LLM schema enrichment OOM crash, handle dict pars…

e9c4009

…ing, and expand test suite

feat: implement Jeen Skills Integration Framework (G3)

3cb1ede

feat: implement feature flags UI overrides builder and admin attribution

69bf431

feat: inline agent graph visualization, trace viewer, structured HITL…

2bf1999

… enhancements, and Playwright tests

feat: optimize agent latency, enrich schema explorer payloads, and im…

861e1e1

…plement plausibility checks

fix: render empty schema tables/columns gracefully and fix Vitest sca…

6fb25db

…nning

pulled from main and fixed conflicts

a1c6cfa

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

-llm = get_llm("finalizer")
+async def get_sql_explanation(
+    sql_query: str | None, runtime_flags: dict | None = None
+) -> str:
+    """Ask LLM to explain the SQL query in natural language."""
+    if not sql_query:
+        return "No SQL query was generated."
+    chain = prompt_sql_explanation | get_llm("finalizer", runtime_flags=runtime_flags)
+    response = await chain.ainvoke({"sql_query": sql_query})
+    return response.content
+async def finalizer_node(state: AgentState, config: RunnableConfig | None = None):
+    """Summarize data."""
+    runtime_flags = state.get("runtime_flags") or {}
+    llm = get_llm("finalizer", runtime_flags=runtime_flags)
+    sql_task = get_sql_explanation(state.get("sql_query"), runtime_flags)

Conversation

yuvalkh commented Jun 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

yuvalkh commented Jun 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading