feat: agent architecture and performance optimizations#16
Conversation
…1 through G2-05)
TTS-G2-01: Table Scoping Modes
- Add scoping_mode field to AgentState ('strict' | 'hybrid')
- Add TABLE_SCOPING_MODE config setting (default: hybrid)
- Add validate_config_node as START→extractor gatekeeper with
InvalidConfigurationException for strict mode + empty allowed_tables
- hybrid_search_tables enforces hard allowlist in strict mode
- Strict mode injects [STRICT MODE] block into schema explorer prompt
- Langfuse trace tagged with scoping_mode=<value>
TTS-G2-02: Fallback / Escalation Hierarchy (HITL)
- Add escalated, escalation_reason to AgentState
- Add hitl_escalation_node compiled with interrupt_before=['hitl_escalation']
- route_schema_explorer escalates after MAX_SCHEMA_RETRIES (3) retries
- route_refiner escalates after MAX_REFINER_ITERATIONS exhausted
- refiner.py and schema_explorer.py set escalation_reason before handing off
- hitl_escalation → extractor edge enables full state-reset resume
TTS-G2-03: Advanced Schema Explorer (4-Phase Pipeline)
- Add networkx>=3.3 dependency
- New utils/schema_enrichment.py with 4 async phases:
- run_semantic_typing (LLM column classification)
- run_join_graph (BFS via networkx over ForeignKeyMapping + CrossTableProfile)
- run_schema_summarization (per-table LLM summaries replacing verbose JSON)
- run_ambiguity_detection (column/table name ambiguity notes)
- Each phase gated by ENABLE_* config flag (defaults: only AMBIGUITY_DETECT on)
- active_schema_phases list emitted to Langfuse trace metadata
TTS-G2-04: Satisfaction Check Module
- New nodes/satisfaction_check.py with 4 checks:
A: Execution success (SATISFACTION_CHECK_EXECUTION)
B: Row plausibility min/max (SATISFACTION_CHECK_PLAUSIBILITY)
C: Structural column coverage via LLM (SATISFACTION_CHECK_COLUMNS)
D: Semantic alignment scored 0-1 (SATISFACTION_CHECK_SEMANTIC, off by default)
- satisfaction_fail_count drives routing: < MAX → refiner, >= MAX → hitl_escalation
- Replaces direct refiner→finalizer success path
- All check results pushed to Langfuse trace metadata
TTS-G2-05: Redis Schema Cache
- New core/src/core/cache.py with CacheService (get/set/invalidate/invalidate_pattern)
- Non-blocking: all Redis errors log a warning and fall through to live data
- Key conventions: profile:{table_id}:{version}, ddl:{cat}:{sch}:{tbl}, catalog_valid:...
- SCAN-based pattern invalidation (no blocking KEYS command)
- Integrated into get_table_profile: cache-first fetch with PROFILE_CACHE_TTL
- cache_hit_count / cache_miss_count tracked per schema_explorer invocation
- Counters pushed to Langfuse trace at end of schema_explorer_node
…ing, and expand test suite
📝 WalkthroughWalkthroughThis PR introduces "Group 2" agent hardening: a new satisfaction-check gate, HITL escalation node, schema enrichment phases (semantic typing, join graph, summarization, ambiguity detection), Redis-backed profile caching, a shared LLM factory, and expanded ChangesAgent G2 Pipeline Hardening
Backend Langfuse Migration, Cleanup, and Infrastructure
Sequence Diagram(s)sequenceDiagram
participant Client
participant chat_endpoint
participant agent_graph
participant validate_config_node
participant schema_explorer_node
participant refiner_node
participant satisfaction_check_node
participant hitl_escalation_node
participant finalizer_node
Client->>chat_endpoint: POST /chat (query, allowed_tables)
chat_endpoint->>agent_graph: ainvoke(state, config[langfuse_handler])
agent_graph->>validate_config_node: check strict-mode allowed_tables
validate_config_node-->>agent_graph: pass / InvalidConfigurationException
agent_graph->>schema_explorer_node: hybrid search + enrichment phases
schema_explorer_node-->>agent_graph: schema_plan or hallucinated→retry/HITL
agent_graph->>refiner_node: execute SQL + accumulate error_history
refiner_node-->>agent_graph: success(inline_result_rows) or escalation_reason
agent_graph->>satisfaction_check_node: run 4 checks (exec/plausibility/columns/semantic)
satisfaction_check_node-->>agent_graph: satisfaction_failures + fail_count
alt fail_count >= SATISFACTION_MAX_FAILURES
agent_graph->>hitl_escalation_node: interrupt_before pause
hitl_escalation_node-->>Client: interrupted response (sql_query, schema_plan)
Client->>chat_endpoint: resume_value
chat_endpoint->>agent_graph: Command(resume=resume_value)
else pass
agent_graph->>finalizer_node: build preview (ESCA or inline_result_rows fallback)
finalizer_node-->>chat_endpoint: summary + sql_explanation + raw_data_ref
chat_endpoint-->>Client: completed ChatResponse
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 23
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
agent/pyproject.toml (1)
7-27:⚠️ Potential issue | 🟠 MajorAdd
python-core-utilsto runtime dependencies in[project].dependencies.
python-core-utilsis imported directly in three files (agent/src/agent/main.py,agent/src/agent/routers/chat.py, andagent/src/agent/nodes/schema_explorer.py) but is only declared as a source mapping in[tool.uv.sources](line 27), not in thedependencieslist. Source mappings do not guarantee runtime availability; in clean environments this will surface asModuleNotFoundError.Proposed fix
dependencies = [ "fastapi>=0.115.6", "uvicorn[standard]>=0.32.1", "langgraph", "langchain-ollama", "core", + "python-core-utils", "esca-sdk", "pydantic-settings>=2.7.0", "trino>=0.328.0",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/pyproject.toml` around lines 7 - 27, The dependency `python-core-utils` is declared as a source mapping in the [tool.uv.sources] section but is missing from the [project].dependencies list, which will cause ModuleNotFoundError in clean environments where it is imported in main.py, routers/chat.py, and nodes/schema_explorer.py. Add `python-core-utils` as an entry in the dependencies list under [project], similar to how other local path dependencies like `core` are included.backend/app/infra_init.py (1)
660-667:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
_seed_trino_datais not idempotent whenDELETEfails.If
DELETEis unsupported, the code still executesINSERTunconditionally, so every startup appends duplicate seed rows.Suggested fix
try: # Clear existing data to apply updated seeds cleanly try: _trino_exec(f"DELETE FROM {table['fqn']}") except Exception as e: # If DELETE is not supported (e.g. some Iceberg configs require specific formats), we ignore and fall back to count checks logger.debug("DELETE on %s failed: %s", table["fqn"], e) + existing = _trino_exec(table["count_sql"], ignore_errors=True) + if existing and existing[0][0] > 0: + logger.info( + "[InfraInit] Skipping seed for '%s' (rows already present)", + table["fqn"], + ) + continue _trino_exec(table["seed_sql"]) logger.info("[InfraInit] Seeded sample data into '%s' ✓", table["fqn"])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/infra_init.py` around lines 660 - 667, The _seed_trino_data function executes the INSERT statement unconditionally even when the DELETE operation fails, causing duplicate seed data on repeated executions. When the DELETE fails and the exception is caught, before executing the seed_sql INSERT, add a count check using a Trino query to verify if the table is empty. Only proceed with the INSERT statement if the count is zero, otherwise skip the insert to maintain idempotency.backend/app/services/langfuse_client.py (1)
88-103:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate docstring to reflect removed method.
The docstring on Line 102 still references
get_prompt_as_langchain(name), but this method was removed in this PR. Please update the docstring to remove this reference.📝 Proposed fix
""" Langfuse connection — mirrors LangfuseTracer from the main application. Data members (DMs): public_key — Langfuse public API key. private_key — Langfuse secret API key. host — Langfuse server URL. client — Instantiated sdk.Langfuse client (None if disabled). Implements Connection: connect(), is_alive(), logout(). Additional methods (present in the main app's LangfuseTracer): get_prompt(name) — fetch a prompt by name. - get_prompt_as_langchain(name) — fetch and convert to a LangChain prompt. """🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/services/langfuse_client.py` around lines 88 - 103, The docstring for the LangfuseTracer class still lists the removed method `get_prompt_as_langchain(name)` in the "Additional methods" section. Locate the docstring in the LangfuseTracer class and remove the line that references `get_prompt_as_langchain(name) — fetch and convert to a LangChain prompt.` while keeping the reference to the remaining `get_prompt(name)` method.core/src/core/models/models.py (1)
325-325:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReintroduce a constrained status type for
EvalResult.status.Line 325 stores status as unconstrained
str, which allows invalid persisted states and can silently skew downstream status-filtered metrics. Also, the inline comment says"pass" | "fail"but writers already use"pending". Use aStrEnum(or equivalent DB-level constraint) forpending/pass/fail.Suggested fix
+class EvalResultStatus(StrEnum): + pending = "pending" + passed = "pass" + failed = "fail" + class EvalResult(SQLModel, table=True): @@ - status: str # "pass" | "fail" + status: EvalResultStatus = Field(default=EvalResultStatus.pending)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/src/core/models/models.py` at line 325, The status field in EvalResult at line 325 is currently typed as an unconstrained str, which allows invalid values to be persisted and contradicts the inline comment that only lists "pass" | "fail" when actual usage includes "pending". Define a StrEnum with the valid status values pending, pass, and fail, then change the status field type annotation from str to use this new enum type instead of the unconstrained string type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.gitignore:
- Line 42: The python-core-utils/ directory is currently ignored in .gitignore
but both Dockerfiles attempt to copy it, which will fail during Docker builds.
Resolve this discrepancy by choosing one of three approaches: (1) remove the
python-core-utils/ entry from .gitignore and commit the directory to the
repository, (2) convert python-core-utils/ into a git submodule and update both
Dockerfiles to initialize it during the build, or (3) if the directory is not
actually needed in the Docker images, remove the COPY python-core-utils
directives from the Dockerfiles. Ensure the chosen approach is consistently
applied across all affected files so that Docker builds succeed in CI
environments.
In `@agent/src/agent/config.py`:
- Around line 51-58: Add Pydantic Field constraints to validate the satisfaction
and cache configuration values in the config class. For SATISFACTION_MIN_ROWS
and SATISFACTION_MAX_ROWS, ensure they are positive integers with MIN_ROWS less
than MAX_ROWS. For SATISFACTION_SEMANTIC_THRESHOLD, constrain it to the range
[0, 1]. For SATISFACTION_MAX_FAILURES, ensure it is a positive integer. For
SCHEMA_CACHE_TTL and PROFILE_CACHE_TTL, ensure both are positive integers. Use
Pydantic validators like gt (greater than), ge (greater than or equal), le (less
than or equal), and Field constraints that match the patterns already
established elsewhere in this config class.
In `@agent/src/agent/nodes/finalizer.py`:
- Around line 72-76: The isinstance check for preview_rows[0] in the column
extraction logic is checking if each row is a dict, but the result.rows
structure from TrinoExecutionResult contains lists, not dicts. Fix the
isinstance check to verify that preview_rows[0] is a list instead of a dict.
Additionally, update the test mock in conftest.py that incorrectly defines rows
as dicts to instead define rows as lists to match the actual
TrinoExecutionResult structure, which will ensure the column extraction logic is
properly tested with the correct data type.
In `@agent/src/agent/nodes/refiner.py`:
- Around line 8-9: Remove the duplicate import statement for langfuse_client
from the agent.langfuse_client module in the refiner.py file. Keep only a single
import statement importing langfuse_client from agent.langfuse_client and delete
the redundant duplicate line.
In `@agent/src/agent/nodes/satisfaction_check.py`:
- Around line 43-48: The column extraction logic in the satisfaction_check
function incorrectly assumes rows are dictionaries when they are actually lists
(as returned by Trino through refiner_node). Instead of checking if rows[0] is a
dict and calling .keys(), you need to handle the case where rows are lists and
extract column names from the state's column metadata or adjust the logic to
work with list-based rows. This will ensure the columns list is properly
populated so that Check C and Check D are not skipped due to the empty columns
guard condition in their conditional statements.
In `@agent/src/agent/state.py`:
- Around line 23-36: The TypedDict in state.py currently models optional fields
incorrectly. The fields last_error, hallucinated_tables, esca_write_failed,
inline_result_rows, error_history, schema_explorer_retry_count, escalated,
escalation_reason, satisfaction_failures, and satisfaction_fail_count are
conditionally set and accessed with defaults throughout the workflow, but the
current type definition with total=True (default) requires all keys to exist.
Wrap each of these field definitions with NotRequired[...] to properly indicate
they are optional. Additionally, change the scoping_mode field from str | None
to NotRequired[Literal["strict", "hybrid"] | None] to document the valid
constraint values and mark it as truly optional. Import NotRequired from typing
(or typing_extensions if needed) and Literal from typing to support these type
annotations.
In `@agent/src/agent/utils/schema_enrichment.py`:
- Around line 271-277: The BFS fallback path construction in the loop (where
node_names are iterated and _bfs_shortest_path is called) is missing the "joins"
key that the networkx branch includes. Update the paths.append() call to include
a "joins" key with a placeholder value (such as an empty list) alongside the
existing "from", "to", and "path" keys, ensuring downstream consumers receive a
consistent dictionary structure regardless of whether networkx is available.
- Around line 363-369: The `run_ambiguity_detection` function assumes the LLM
response is always an `AmbiguityOutput` model instance, but when using
`method="json_schema"`, the LLM may return a dict instead. This causes an
`AttributeError` when accessing `.ambiguity_notes`. Add defensive handling
similar to what `run_semantic_typing` and `run_schema_summarization` do: check
if the result is a dict and extract the ambiguity_notes field accordingly (e.g.,
`result.get("ambiguity_notes", [])` for dict responses), while continuing to
handle model instances by accessing the `.ambiguity_notes` attribute as before.
In `@agent/tests/test_isolation.py`:
- Around line 39-52: The test imports evaluate_with_llm directly at the module
level, but patches core.llm.evaluate_with_llm, causing the patch to be bypassed
since the local binding is used instead. Additionally, the silent ImportError
catch masks that core.llm does not exist—the function is actually located in
backend/app/services/llm_judge.py. Fix this by: (1) removing the direct import
of evaluate_with_llm at the top of the try block, (2) importing the correct
module path where evaluate_with_llm actually lives
(backend/app/services/llm_judge.py), (3) updating the patch target to match the
actual module location, (4) removing the silent ImportError exception handler so
test failures are not hidden.
In `@backend/app/infra_init.py`:
- Line 27: The `import os` statement is not placed with the other standard
library imports at the top of the file, which violates Ruff's E402 rule (module
level import not at top of file). Move the `import os` import from line 27 to
the top of the file where the other standard library imports are grouped,
ensuring it is placed before any third-party or local imports.
- Around line 33-36: The environment variables _TRINO_HOST, _TRINO_PORT, and
_TRINO_USER are now being read from the environment but are not actually being
used when registering the OpenMetadata service. Find where the OpenMetadata
service registration occurs (likely a function that sets up the OM service
connection) and replace the hardcoded Trino endpoint values (currently hardcoded
as trino:8080) with the corresponding environment-driven values from _TRINO_HOST
and _TRINO_PORT to ensure OM service registration respects environment
overrides.
In `@backend/app/main.py`:
- Around line 43-45: Remove the hard-fail behavior from the OPENAI_API_KEY
validation during startup. Instead of raising RuntimeError immediately when the
key is missing, remove or comment out the raise statement so the application can
continue starting. Move the OPENAI_API_KEY validation to the judge-specific code
paths where it is actually used (not in the global startup sequence), so that
only requests requiring the judge feature will fail if the key is missing.
Alternatively, add a configuration flag that controls whether OPENAI_API_KEY
validation is enforced at startup versus only when judge endpoints are accessed.
In `@backend/app/routers/health.py`:
- Around line 215-219: The fallback logic in the _ping_esca function uses
client.load_head with a dummy ID to simulate health checks when ping() is
unavailable, which pollutes ESCA logs and metrics with unnecessary requests.
Instead of using this fallback, modify the function to fail fast with a clear,
descriptive error message if the client does not have a ping method available.
This will prevent dummy requests from being sent while making it clear that the
ESCA client needs to implement a proper ping method or health check endpoint.
- Around line 208-209: The hardcoded magic strings "dummy" and
"http://localhost:8000" in the condition checking for ESCA configuration
validation are fragile and could break if defaults change. Extract these
sentinel values as named constants in the settings module and import them into
this health check endpoint, or remove the localhost check entirely since
legitimate local development configurations should not be rejected. Update the
conditional logic to reference these constants instead of embedding the magic
strings directly.
- Around line 196-206: The nested try-except block attempting to import from
app.config is dead code since backend/app/config.py does not define ESCA_API_KEY
or ESCA_URL settings. Remove the second try block (the fallback import from
app.config) and instead directly set api_key and base_url to None in the outer
except block when the initial import from agent.config fails. This simplifies
the logic and accurately reflects that ESCA settings only exist in agent.config.
In `@backend/app/services/profiling_engine.py`:
- Around line 17-18: The import statement that retrieves `TrinoExecutionResult`
from `app.services.trino_client` is incorrect because that module is empty,
causing the timeout exception handler to crash at runtime. Change the import
source from `app.services.trino_client` to `core` (which is already imported in
the file for `execute_query_sync`). This ensures `TrinoExecutionResult` is
properly available with all required fields (`success`, `rows`, `columns`,
`error_message`) when the exception handler needs to return an error result.
In `@backend/pyproject.toml`:
- Line 13: The dependency specification for langfuse uses an overly broad
version constraint that allows major version upgrades (v3.x and v4.x) containing
breaking changes to APIs like client.trace() that your code depends on. Replace
the `langfuse>=2.0.0` constraint with a bounded range `>=2.56.2,<3.0.0` that
permits patch and minor version updates within the v2.x series while preventing
breaking major version upgrades.
In `@core/src/core/cache.py`:
- Around line 75-83: The current implementation buffers all delete operations
from the entire SCAN loop into a single pipeline and executes it once at the
end, which can cause memory and latency spikes for large match sets. Instead,
move the pipeline creation and execution inside the while loop in the
scan_delete_matching method. Create a new pipeline for each batch of keys
retrieved by SCAN, delete the keys in that batch immediately via pipe.execute(),
then continue to the next SCAN iteration. This bounds the memory and latency
cost per batch rather than accumulating everything until the final flush.
- Around line 134-139: The get_cache_service function silently ignores
mismatches between the provided redis_url argument and the URL used to create
the cached _cache_instance, which can route callers to an incorrect Redis
backend. Modify the function to detect when a different redis_url is supplied
than what the existing _cache_instance was initialized with, and either raise an
error to alert the caller of the mismatch or store the redis_url and validate
consistency on subsequent calls. This ensures that unexpected URL changes are
not silently ignored.
In `@core/src/core/config.py`:
- Around line 7-9: The LANGFUSE_BASE_URL configuration setting added in the
config.py file is not being consumed by the Langfuse handler creation in
core/src/core/langfuse.py, making the config setting non-functional. Update the
Langfuse handler instantiation to read and use the LANGFUSE_BASE_URL setting
from the configuration instead of relying on hardcoded defaults. Additionally,
ensure the configuration property name aligns with the environment variable
naming convention used in the deployment setup (docker-compose.yml uses
LANGFUSE_HOST), so either rename LANGFUSE_BASE_URL to LANGFUSE_HOST or ensure
both naming conventions are properly mapped during configuration initialization.
In `@core/src/core/langfuse.py`:
- Around line 11-12: The exception handler in the Langfuse CallbackHandler
initialization block currently uses logger.error() with string formatting, which
loses the full traceback context needed for debugging. Replace the
logger.error() call with logger.exception(), which automatically captures and
logs the complete stack trace along with the error message. This change should
be made in the except block that handles the Exception during Langfuse handler
initialization.
In `@core/src/core/trino.py`:
- Around line 72-75: The conditional check `if params:` at line 72 incorrectly
treats empty containers (empty list, dict, or tuple) as falsy values, causing
them to bypass the parameterized execute path even though they are valid
parameter values according to the type hint. Replace `if params:` with `if
params is not None:` to explicitly check for None while allowing empty
containers to be passed to the parameterized cur.execute(sql, params) call. This
ensures all valid parameter types—including empty sequences—are routed to the
correct execution path.
In `@docker-compose.yml`:
- Line 407: The backend service command at line 407 couples initialization jobs
(app.seed and app.infra_init) directly to the uvicorn startup, causing the API
server to fail if these jobs fail and re-running side effects on every container
restart. Create a separate one-shot initialization service in the
docker-compose.yml that runs only the app.seed and app.infra_init commands
independently, then modify the backend service command to run only the uvicorn
server (uvicorn app.main:app --host 0.0.0.0 --port 8000). Use docker-compose
depends_on or proper service ordering to ensure the init service completes
before the backend service attempts to start, keeping the backend focused solely
on serving traffic.
---
Outside diff comments:
In `@agent/pyproject.toml`:
- Around line 7-27: The dependency `python-core-utils` is declared as a source
mapping in the [tool.uv.sources] section but is missing from the
[project].dependencies list, which will cause ModuleNotFoundError in clean
environments where it is imported in main.py, routers/chat.py, and
nodes/schema_explorer.py. Add `python-core-utils` as an entry in the
dependencies list under [project], similar to how other local path dependencies
like `core` are included.
In `@backend/app/infra_init.py`:
- Around line 660-667: The _seed_trino_data function executes the INSERT
statement unconditionally even when the DELETE operation fails, causing
duplicate seed data on repeated executions. When the DELETE fails and the
exception is caught, before executing the seed_sql INSERT, add a count check
using a Trino query to verify if the table is empty. Only proceed with the
INSERT statement if the count is zero, otherwise skip the insert to maintain
idempotency.
In `@backend/app/services/langfuse_client.py`:
- Around line 88-103: The docstring for the LangfuseTracer class still lists the
removed method `get_prompt_as_langchain(name)` in the "Additional methods"
section. Locate the docstring in the LangfuseTracer class and remove the line
that references `get_prompt_as_langchain(name) — fetch and convert to a
LangChain prompt.` while keeping the reference to the remaining
`get_prompt(name)` method.
In `@core/src/core/models/models.py`:
- Line 325: The status field in EvalResult at line 325 is currently typed as an
unconstrained str, which allows invalid values to be persisted and contradicts
the inline comment that only lists "pass" | "fail" when actual usage includes
"pending". Define a StrEnum with the valid status values pending, pass, and
fail, then change the status field type annotation from str to use this new enum
type instead of the unconstrained string type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: dee6554e-010f-48fd-89a8-c9a2cdf58c07
⛔ Files ignored due to path filters (2)
agent/uv.lockis excluded by!**/*.lockbackend/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (57)
.gitignoreagent/Dockerfileagent/pyproject.tomlagent/src/agent/config.pyagent/src/agent/graph.pyagent/src/agent/llm.pyagent/src/agent/mcp_server.pyagent/src/agent/nodes/extractor.pyagent/src/agent/nodes/finalizer.pyagent/src/agent/nodes/query_builder.pyagent/src/agent/nodes/refiner.pyagent/src/agent/nodes/satisfaction_check.pyagent/src/agent/nodes/schema_explorer.pyagent/src/agent/routers/chat.pyagent/src/agent/state.pyagent/src/agent/utils/esca.pyagent/src/agent/utils/schema_enrichment.pyagent/src/agent/utils/sql.pyagent/tests/conftest.pyagent/tests/test_cache_and_gates.pyagent/tests/test_isolation.pyagent/tests/test_resilience.pyagent/tests/test_routing.pybackend/Dockerfilebackend/app/config.pybackend/app/infra_init.pybackend/app/main.pybackend/app/routers/admin_approval.pybackend/app/routers/admin_auth.pybackend/app/routers/audit.pybackend/app/routers/enrichment.pybackend/app/routers/evaluation.pybackend/app/routers/feedback.pybackend/app/routers/health.pybackend/app/routers/orchestration.pybackend/app/routers/publish.pybackend/app/routers/questions.pybackend/app/routers/scopes.pybackend/app/services/auth.pybackend/app/services/evaluator.pybackend/app/services/join_detection.pybackend/app/services/langfuse_client.pybackend/app/services/llm_judge.pybackend/app/services/profiling_engine.pybackend/app/services/scoring.pybackend/app/services/trino_client.pybackend/pyproject.tomlcore/pyproject.tomlcore/src/core/__init__.pycore/src/core/cache.pycore/src/core/config.pycore/src/core/langfuse.pycore/src/core/models/models.pycore/src/core/trino.pydocker-compose.ymlfrontend/.envfrontend/index.html
💤 Files with no reviewable changes (2)
- backend/app/services/trino_client.py
- backend/app/services/scoring.py
| agent/scratch/* | ||
| fix_agent_langfuse.py | ||
| fix_langfuse.py | ||
| python-core-utils/ |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Tracked files under python-core-utils ==="
git ls-files | rg '^python-core-utils/' || true
echo
echo "=== Submodule status (if any) ==="
git submodule status || true
echo
echo "=== COPY references for python-core-utils ==="
rg -n 'COPY\s+python-core-utils' agent/Dockerfile backend/Dockerfile docker-compose.ymlRepository: StavPonte11/text2sql-onboarding
Length of output: 337
Both agent/Dockerfile and backend/Dockerfile reference an ignored directory.
python-core-utils/ is ignored in .gitignore yet both Dockerfiles attempt to COPY python-core-utils (agent/Dockerfile line 12, backend/Dockerfile line 13). The directory is neither tracked in git nor a submodule, so Docker builds will fail with a missing-path error in CI environments. Either unignore the directory and commit it, establish it as a git submodule, or remove the COPY directives if the directory should not be part of the image.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.gitignore at line 42, The python-core-utils/ directory is currently ignored
in .gitignore but both Dockerfiles attempt to copy it, which will fail during
Docker builds. Resolve this discrepancy by choosing one of three approaches: (1)
remove the python-core-utils/ entry from .gitignore and commit the directory to
the repository, (2) convert python-core-utils/ into a git submodule and update
both Dockerfiles to initialize it during the build, or (3) if the directory is
not actually needed in the Docker images, remove the COPY python-core-utils
directives from the Dockerfiles. Ensure the chosen approach is consistently
applied across all affected files so that Docker builds succeed in CI
environments.
| SATISFACTION_MIN_ROWS: int = 1 | ||
| SATISFACTION_MAX_ROWS: int = 50_000 | ||
| SATISFACTION_SEMANTIC_THRESHOLD: float = 0.75 | ||
| SATISFACTION_MAX_FAILURES: int = 2 # escalate to HITL after this many check failures | ||
|
|
||
| # ── G2-05: Redis Schema Cache ───────────────────────────────────────────── | ||
| SCHEMA_CACHE_TTL: int = 600 # seconds — DDL content | ||
| PROFILE_CACHE_TTL: int = 1800 # seconds — table profile statistics |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n agent/src/agent/config.py | head -100Repository: StavPonte11/text2sql-onboarding
Length of output: 3067
🏁 Script executed:
rg "SATISFACTION_MIN_ROWS|SATISFACTION_MAX_ROWS|SATISFACTION_SEMANTIC_THRESHOLD|SATISFACTION_MAX_FAILURES|SCHEMA_CACHE_TTL|PROFILE_CACHE_TTL" --type py -B 2 -A 2Repository: StavPonte11/text2sql-onboarding
Length of output: 5927
🏁 Script executed:
rg "validator|field_validator|model_validator" agent/src/agent/config.py -A 3Repository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
cat -n agent/src/agent/nodes/satisfaction_check.py | head -150Repository: StavPonte11/text2sql-onboarding
Length of output: 6850
Add bounds validation for satisfaction thresholds and cache TTLs.
These env-driven values currently accept invalid ranges (e.g., negative or zero TTLs, semantic threshold outside [0,1], or inverted row bounds), which silently degrade gate decisions and cache behavior:
SATISFACTION_MIN_ROWS≤ 0 causes the plausibility check to never failSATISFACTION_MAX_ROWS≤ 0 or <MIN_ROWSinverts the check logicSATISFACTION_SEMANTIC_THRESHOLDoutside[0,1]makes the check always pass or always failSATISFACTION_MAX_FAILURES≤ 0 breaks escalation logicSCHEMA_CACHE_TTLandPROFILE_CACHE_TTL≤ 0 fail on Redissetexoperations
Apply Pydantic Field constraints (already used elsewhere in this config class):
Proposed fix
- SATISFACTION_MIN_ROWS: int = 1
- SATISFACTION_MAX_ROWS: int = 50_000
- SATISFACTION_SEMANTIC_THRESHOLD: float = 0.75
- SATISFACTION_MAX_FAILURES: int = 2 # escalate to HITL after this many check failures
+ SATISFACTION_MIN_ROWS: int = Field(default=1, ge=0)
+ SATISFACTION_MAX_ROWS: int = Field(default=50_000, ge=1)
+ SATISFACTION_SEMANTIC_THRESHOLD: float = Field(default=0.75, ge=0.0, le=1.0)
+ SATISFACTION_MAX_FAILURES: int = Field(default=2, ge=1) # escalate to HITL after this many check failures
@@
- SCHEMA_CACHE_TTL: int = 600 # seconds — DDL content
- PROFILE_CACHE_TTL: int = 1800 # seconds — table profile statistics
+ SCHEMA_CACHE_TTL: int = Field(default=600, gt=0) # seconds — DDL content
+ PROFILE_CACHE_TTL: int = Field(default=1800, gt=0) # seconds — table profile statistics🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/config.py` around lines 51 - 58, Add Pydantic Field
constraints to validate the satisfaction and cache configuration values in the
config class. For SATISFACTION_MIN_ROWS and SATISFACTION_MAX_ROWS, ensure they
are positive integers with MIN_ROWS less than MAX_ROWS. For
SATISFACTION_SEMANTIC_THRESHOLD, constrain it to the range [0, 1]. For
SATISFACTION_MAX_FAILURES, ensure it is a positive integer. For SCHEMA_CACHE_TTL
and PROFILE_CACHE_TTL, ensure both are positive integers. Use Pydantic
validators like gt (greater than), ge (greater than or equal), le (less than or
equal), and Field constraints that match the patterns already established
elsewhere in this config class.
| columns = ( | ||
| list(preview_rows[0].keys()) | ||
| if preview_rows and isinstance(preview_rows[0], dict) | ||
| else [] | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the structure of inline_result_rows from refiner
rg -n "inline_result_rows" -A3 -B3 agent/src/agent/nodes/refiner.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 569
🏁 Script executed:
cat -n agent/src/core/trino.py | head -100Repository: StavPonte11/text2sql-onboarding
Length of output: 131
🏁 Script executed:
rg -n "def.*rows" agent/src/core/trino.py -A5Repository: StavPonte11/text2sql-onboarding
Length of output: 139
🏁 Script executed:
rg -n "result = " agent/src/agent/nodes/refiner.py -B5 -A2Repository: StavPonte11/text2sql-onboarding
Length of output: 640
🏁 Script executed:
find agent -name "*.py" -type f | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 724
🏁 Script executed:
rg -n "execute_query_sync" -A10Repository: StavPonte11/text2sql-onboarding
Length of output: 11234
🏁 Script executed:
rg -n "class.*Result|def.*rows" --type py | grep -i trinoRepository: StavPonte11/text2sql-onboarding
Length of output: 140
🏁 Script executed:
cat -n core/src/core/trino.py | sed -n '17,100p'Repository: StavPonte11/text2sql-onboarding
Length of output: 3461
🏁 Script executed:
rg -n "rows.*=" core/src/core/trino.py -A2 -B2Repository: StavPonte11/text2sql-onboarding
Length of output: 812
Column extraction fails: isinstance(preview_rows[0], dict) check will always be False for inline result rows.
inline_result_rows comes from result.rows, which is typed as list[list[Any]] in TrinoExecutionResult (core/src/core/trino.py:19) and populated as [list(r) for r in rows] (line 84). The code at lines 73–74 checks isinstance(preview_rows[0], dict), but each row is a list, not a dict. This means columns will always be [] when falling back to inline results, leaving the preview without column headers.
The test mock in agent/tests/conftest.py:119 incorrectly defines rows as dicts, masking this bug.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/nodes/finalizer.py` around lines 72 - 76, The isinstance
check for preview_rows[0] in the column extraction logic is checking if each row
is a dict, but the result.rows structure from TrinoExecutionResult contains
lists, not dicts. Fix the isinstance check to verify that preview_rows[0] is a
list instead of a dict. Additionally, update the test mock in conftest.py that
incorrectly defines rows as dicts to instead define rows as lists to match the
actual TrinoExecutionResult structure, which will ensure the column extraction
logic is properly tested with the correct data type.
| from agent.langfuse_client import langfuse_client | ||
| from agent.langfuse_client import langfuse_client |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Duplicate import statement.
langfuse_client is imported twice from the same module.
♻️ Remove duplicate import
from agent.langfuse_client import langfuse_client
-from agent.langfuse_client import langfuse_client📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from agent.langfuse_client import langfuse_client | |
| from agent.langfuse_client import langfuse_client | |
| from agent.langfuse_client import langfuse_client |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/nodes/refiner.py` around lines 8 - 9, Remove the duplicate
import statement for langfuse_client from the agent.langfuse_client module in
the refiner.py file. Keep only a single import statement importing
langfuse_client from agent.langfuse_client and delete the redundant duplicate
line.
| rows = state.get("inline_result_rows") or [] | ||
| columns: list[str] = [] | ||
|
|
||
| # Attempt to derive column names from the first result row | ||
| if rows and isinstance(rows[0], dict): | ||
| columns = list(rows[0].keys()) |
There was a problem hiding this comment.
Column extraction assumes dict rows but inline rows are lists.
Same issue as finalizer.py: the code assumes rows[0] is a dict (line 47-48), but inline_result_rows from refiner_node contains list rows per Trino's return format. The columns list will be empty, causing Check C and Check D to be skipped (since they guard on and columns).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/nodes/satisfaction_check.py` around lines 43 - 48, The column
extraction logic in the satisfaction_check function incorrectly assumes rows are
dictionaries when they are actually lists (as returned by Trino through
refiner_node). Instead of checking if rows[0] is a dict and calling .keys(), you
need to handle the case where rows are lists and extract column names from the
state's column metadata or adjust the logic to work with list-based rows. This
will ensure the columns list is properly populated so that Check C and Check D
are not skipped due to the empty columns guard condition in their conditional
statements.
| def get_cache_service(redis_url: str) -> CacheService: | ||
| """Return a module-level singleton CacheService.""" | ||
| global _cache_instance | ||
| if _cache_instance is None: | ||
| _cache_instance = CacheService(redis_url) | ||
| return _cache_instance |
There was a problem hiding this comment.
Do not silently ignore redis_url after first singleton initialization.
At Line 134-139, get_cache_service(redis_url) always returns the first-created instance, even if a different redis_url is later supplied. That can silently route callers to the wrong Redis backend.
Suggested change
-_cache_instance: CacheService | None = None
+_cache_instances: dict[str, CacheService] = {}
def get_cache_service(redis_url: str) -> CacheService:
- """Return a module-level singleton CacheService."""
- global _cache_instance
- if _cache_instance is None:
- _cache_instance = CacheService(redis_url)
- return _cache_instance
+ """Return one CacheService singleton per Redis URL."""
+ if redis_url not in _cache_instances:
+ _cache_instances[redis_url] = CacheService(redis_url)
+ return _cache_instances[redis_url]🧰 Tools
🪛 Ruff (0.15.17)
[warning] 136-136: Using the global statement to update _cache_instance is discouraged
(PLW0603)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@core/src/core/cache.py` around lines 134 - 139, The get_cache_service
function silently ignores mismatches between the provided redis_url argument and
the URL used to create the cached _cache_instance, which can route callers to an
incorrect Redis backend. Modify the function to detect when a different
redis_url is supplied than what the existing _cache_instance was initialized
with, and either raise an error to alert the caller of the mismatch or store the
redis_url and validate consistency on subsequent calls. This ensures that
unexpected URL changes are not silently ignored.
| LANGFUSE_PUBLIC_KEY: str = "" | ||
| LANGFUSE_SECRET_KEY: str = "" | ||
| LANGFUSE_BASE_URL: str = "https://cloud.langfuse.com" |
There was a problem hiding this comment.
Align Langfuse host setting with actual runtime wiring.
At Line 7-9, LANGFUSE_BASE_URL is added, but core/src/core/langfuse.py does not consume these settings when creating the handler. This creates a non-functional config surface and drifts from deployment env naming (docker-compose.yml Line 431 uses LANGFUSE_HOST).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@core/src/core/config.py` around lines 7 - 9, The LANGFUSE_BASE_URL
configuration setting added in the config.py file is not being consumed by the
Langfuse handler creation in core/src/core/langfuse.py, making the config
setting non-functional. Update the Langfuse handler instantiation to read and
use the LANGFUSE_BASE_URL setting from the configuration instead of relying on
hardcoded defaults. Additionally, ensure the configuration property name aligns
with the environment variable naming convention used in the deployment setup
(docker-compose.yml uses LANGFUSE_HOST), so either rename LANGFUSE_BASE_URL to
LANGFUSE_HOST or ensure both naming conventions are properly mapped during
configuration initialization.
| except Exception as e: | ||
| logger.error(f"Failed to initialize Langfuse CallbackHandler: {e}") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd core && head -20 src/core/langfuse.py | cat -nRepository: StavPonte11/text2sql-onboarding
Length of output: 596
🏁 Script executed:
cd core && sed -n '1,30p' src/core/langfuse.py | cat -nRepository: StavPonte11/text2sql-onboarding
Length of output: 596
🏁 Script executed:
cd core && rg "logger\." src/core/langfuse.py -A 2 -B 2Repository: StavPonte11/text2sql-onboarding
Length of output: 231
Log the full traceback when Langfuse handler initialization fails.
At line 12, only the exception message is logged. Using logger.error() with string formatting loses the full traceback context, making failures difficult to diagnose. Use logger.exception() instead, which automatically captures the complete stack trace.
Proposed fix
except Exception as e:
- logger.error(f"Failed to initialize Langfuse CallbackHandler: {e}")
+ logger.exception("Failed to initialize Langfuse CallbackHandler")
return None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| logger.error(f"Failed to initialize Langfuse CallbackHandler: {e}") | |
| except Exception as e: | |
| logger.exception("Failed to initialize Langfuse CallbackHandler") |
🧰 Tools
🪛 Ruff (0.15.17)
[warning] 11-11: Do not catch blind exception: Exception
(BLE001)
[warning] 12-12: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 12-12: Logging statement uses f-string
(G004)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@core/src/core/langfuse.py` around lines 11 - 12, The exception handler in the
Langfuse CallbackHandler initialization block currently uses logger.error() with
string formatting, which loses the full traceback context needed for debugging.
Replace the logger.error() call with logger.exception(), which automatically
captures and logs the complete stack trace along with the error message. This
change should be made in the except block that handles the Exception during
Langfuse handler initialization.
Source: Linters/SAST tools
| if params: | ||
| cur.execute(sql, params) | ||
| else: | ||
| cur.execute(sql) |
There was a problem hiding this comment.
Use explicit None check for parameterized execution.
Line 72 uses if params:. Empty containers ([], ()) are valid according to the function's type hint (tuple | dict | list | None) but currently route to cur.execute(sql) instead of the parameterized path since empty sequences evaluate to False in Python.
Change to if params is not None: to ensure all non-None parameter values—including empty containers—are passed to the parameterized execute path.
Proposed fix
- if params:
+ if params is not None:
cur.execute(sql, params)
else:
cur.execute(sql)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if params: | |
| cur.execute(sql, params) | |
| else: | |
| cur.execute(sql) | |
| if params is not None: | |
| cur.execute(sql, params) | |
| else: | |
| cur.execute(sql) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@core/src/core/trino.py` around lines 72 - 75, The conditional check `if
params:` at line 72 incorrectly treats empty containers (empty list, dict, or
tuple) as falsy values, causing them to bypass the parameterized execute path
even though they are valid parameter values according to the type hint. Replace
`if params:` with `if params is not None:` to explicitly check for None while
allowing empty containers to be passed to the parameterized cur.execute(sql,
params) call. This ensures all valid parameter types—including empty
sequences—are routed to the correct execution path.
| secrets: | ||
| - deploy_key | ||
| # command: sh -c "python -m app.seed && python -m app.infra_init && uvicorn app.main:app --host 0.0.0.0 --port 8000" | ||
| command: sh -c "python -m app.seed && python -m app.infra_init && uvicorn app.main:app --host 0.0.0.0 --port 8000" |
There was a problem hiding this comment.
Decouple initialization jobs from API startup.
At Line 407, app.seed and app.infra_init are hard-coupled to backend boot. Any failure/latency in either job keeps uvicorn down, and every container restart re-runs initialization side effects. Move these to a one-shot init service/job (or guard them for strict idempotency) and keep backend command focused on serving traffic.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docker-compose.yml` at line 407, The backend service command at line 407
couples initialization jobs (app.seed and app.infra_init) directly to the
uvicorn startup, causing the API server to fail if these jobs fail and
re-running side effects on every container restart. Create a separate one-shot
initialization service in the docker-compose.yml that runs only the app.seed and
app.infra_init commands independently, then modify the backend service command
to run only the uvicorn server (uvicorn app.main:app --host 0.0.0.0 --port
8000). Use docker-compose depends_on or proper service ordering to ensure the
init service completes before the backend service attempts to start, keeping the
backend focused solely on serving traffic.
| LANGFUSE_PROMPT_REJECTION_ROUTER: str = "text2sql/rejection_router" | ||
|
|
||
| # ── G2-01: Table Scoping ────────────────────────────────────────────────── | ||
| TABLE_SCOPING_MODE: Literal["strict", "hybrid"] = "hybrid" |
There was a problem hiding this comment.
I think this should be on the request level and not on global level. so you can pass the configuration as argument when invoking the agent
There was a problem hiding this comment.
realized it acts as default - should be called default table scoping mode then
| in graph.py inspects `satisfaction_failures` to decide the next node. | ||
| """ | ||
| # ── Global gate ─────────────────────────────────────────────────────────── | ||
| if not settings.SATISFACTION_CHECK_ENABLED: |
There was a problem hiding this comment.
shouldn't this happen in the router function?
| if settings.SATISFACTION_CHECK_PLAUSIBILITY: | ||
| n = len(rows) | ||
| if n < settings.SATISFACTION_MIN_ROWS: | ||
| failures.append( | ||
| f"[CHECK_B] Result returned {n} rows — below minimum {settings.SATISFACTION_MIN_ROWS}." | ||
| ) | ||
| elif n > settings.SATISFACTION_MAX_ROWS: | ||
| failures.append( | ||
| f"[CHECK_B] Result returned {n} rows — exceeds maximum {settings.SATISFACTION_MAX_ROWS}." | ||
| ) |
There was a problem hiding this comment.
are we sure this is an expected behavior?
| logger = logging.getLogger(__name__) | ||
|
|
||
| # Fixed TTL for catalog pre-flight validation cache (not configurable) | ||
| CATALOG_VALID_TTL = 300 |
| self._redis: aioredis.Redis = aioredis.from_url( | ||
| redis_url, | ||
| decode_responses=False, # return raw bytes so callers control decoding | ||
| socket_connect_timeout=2, | ||
| socket_timeout=2, | ||
| ) |
There was a problem hiding this comment.
in the python-core-utils package, we have some redis stuff. we should use it for the connection, and its also worth considering wether to move some of the stuff here to there
| return _langfuse_handler | ||
| """FastAPI dependency to inject an isolated Langfuse CallbackHandler.""" | ||
| try: | ||
| return CallbackHandler() |
There was a problem hiding this comment.
are the env vars the default vars, allowing them to be removed?
| _cache = get_cache_service(settings.REDIS_URL) | ||
|
|
||
| # G2-02 limits | ||
| MAX_SCHEMA_RETRIES = 3 |
|
|
||
| human_message = ( | ||
| f"Question: {user_query}\n" | ||
| f"Query Enrichments (extra context for ambiguous terms): {json.dumps(enrichments)}" |
There was a problem hiding this comment.
יש פה לא מעט דברים שנכנסים לפרומפט - אולי שווה לעשות עם פרומפט טמפלייט או לשמור חלק בלנגפיוז
|
|
||
| class SemanticAnnotation(BaseModel): | ||
| table_column: str = Field(description="The full table_name.column_name identifier") | ||
| semantic_type: str = Field(description="Must be one of: id | timestamp | category | metric | text | geo | unknown") |
There was a problem hiding this comment.
enum for data validation?
| col_list.append(f"{tname}.{col['name']} ({col.get('type', '?')})") | ||
|
|
||
| prompt_text = ( | ||
| "Classify each column with one of: id | timestamp | category | metric | text | geo | unknown.\n" |
There was a problem hiding this comment.
maybe we should run it once in the profiling instead of in each invocation?
There was a problem hiding this comment.
relevant to most of the functions on this file
| The human then calls graph.update_state() to inject corrected state and | ||
| clears sql_query / last_error / trino_error / escalated / escalation_reason. |
There was a problem hiding this comment.
Im not sure that what happens?
There was a problem hiding this comment.
also im not sure clearing the state is the right call
|
|
||
| memory = MemorySaver() | ||
| agent_graph = workflow.compile(checkpointer=memory) | ||
| agent_graph = workflow.compile( |
There was a problem hiding this comment.
it would be really nice to add a graph of the graph to the readme (so I can also see visually what it looks like)
Summary of Work
This PR introduces the G2 architecture refactors and performance improvements:
KeyError: 'table_column'crashes.Open Questions & Future Actions
test_tts_g1_02_langfuse_handler_isolation) and LLM judge tests currently run in the agent repository's test suite. Should these tests be migrated tobackend/tests/since they validate API endpoints and backend services?llama3.2:1b) fails to handle Ollama's structured grammar JSON constraints, crashing the runner process withunexpected EOF. Should we restrict Advanced Schema Explorer features (like semantic typing and schema summarization) to larger models (e.g.,gemma4:e4b), or fallback to unstructured text parsing for small local models?TABLE_NOT_FOUNDerrors when querying tables belonging to non-default schemas (e.g.,complex_retail.customers), should the query builder dynamically prefix tables using metadata registered in the database, instead of relying solely onTRINO_SCHEMAenv defaults?Summary by CodeRabbit
Release Notes
New Features
Improvements
Configuration