feat: implement feature flags & execution modes builder#18
Conversation
📝 WalkthroughWalkthroughThis PR adds backend-managed feature flags and execution modes, wires them into agent runtime setup and graph routing, expands schema/refinement safeguards and health checks, updates build inputs and dependency sources, and adds an admin UI for managing flags and modes. ChangesFlags platform and agent runtime hardening
Sequence Diagram(s)sequenceDiagram
participant Admin
participant FlagsPage
participant FlagsAPI as Backend /api/flags
participant FlagService
participant Agent
participant FlagBridge
Admin->>FlagsPage: edit flag or mode
FlagsPage->>FlagsAPI: PATCH /flags/{name} or PUT /flags/modes/{name}
FlagsAPI->>FlagService: set(...) / upsert_mode(...)
Agent->>FlagBridge: resolve_flags(execution_mode)
FlagBridge->>FlagsAPI: GET /flags/map
FlagBridge->>FlagsAPI: GET /flags/modes/{execution_mode}
FlagBridge-->>Agent: merged runtime_flags
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 44
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
backend/app/infra_init.py (1)
652-667:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSeed path is no longer idempotent when
DELETEfails.If
DELETE FROM <table>fails, the code still unconditionally runsINSERT, so repeated startups can duplicate seed rows. That violates the module’s idempotency contract and can corrupt test/demo datasets.Suggested fix
def _seed_trino_data() -> None: @@ for table in _TABLES: try: + should_seed = True try: _trino_exec(f"DELETE FROM {table['fqn']}") except Exception as e: logger.debug("DELETE on %s failed: %s", table["fqn"], e) + # Preserve idempotency when delete is unavailable + rows = _trino_exec(table["count_sql"], ignore_errors=True) + existing = rows[0][0] if rows else 0 + should_seed = existing == 0 - _trino_exec(table["seed_sql"]) - logger.info("[InfraInit] Seeded sample data into '%s' ✓", table["fqn"]) + if should_seed: + _trino_exec(table["seed_sql"]) + logger.info("[InfraInit] Seeded sample data into '%s' ✓", table["fqn"]) + else: + logger.info("[InfraInit] Skipped seeding '%s' (already populated)", table["fqn"]) except Exception as exc: logger.warning( "[InfraInit] Could not seed '%s' (non-fatal): %s", table["fqn"], exc )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/infra_init.py` around lines 652 - 667, The _seed_trino_data function violates idempotency when the DELETE statement fails because it unconditionally executes the INSERT via _trino_exec(table["seed_sql"]) even when the delete fallback is triggered. When DELETE fails, you should fall back to checking if the table already contains data (using a count check as mentioned in the comment) before deciding whether to insert the seed data, rather than unconditionally proceeding with the INSERT after catching the DELETE exception.backend/app/services/llm_judge.py (1)
129-139:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFail fast when
OPENAI_API_KEYis empty.Line 129 now always proceeds to outbound HTTP even when the key is missing/blank, then fails later. Add a preflight return to avoid unnecessary remote calls and produce a deterministic local failure.
Suggested patch
api_key = settings.OPENAI_API_KEY + if not api_key: + return JudgeOutput( + table_selection_correctness=0.0, + sql_semantic_equivalence=0.0, + result_correctness=0.0, + hallucination_detected=False, + failure_type="judge_unavailable", + reasoning={"error": "OPENAI_API_KEY is missing"}, + confidence_in_judgment=0.0, + ) try: # Real LLM Execution via OpenAI API🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/services/llm_judge.py` around lines 129 - 139, The code retrieves the OPENAI_API_KEY and proceeds directly to making an HTTP request without validating that the key is actually present. Add a preflight check immediately after the api_key assignment from settings.OPENAI_API_KEY to verify the key is not empty or None, and return early with an appropriate error response if validation fails. This prevents unnecessary remote calls and ensures the failure happens locally with deterministic behavior before entering the try block with the httpx.Client.core/src/core/models/models.py (1)
325-335:⚠️ Potential issue | 🟠 Major | ⚡ Quick winConstrain
EvalResult.statusto a closed set.Using plain
strhere allows invalid persisted statuses and can skew pass/fail aggregations and downstream routing logic.Suggested fix
+class EvalResultStatus(StrEnum): + pass_ = "pass" + fail = "fail" + class EvalResult(SQLModel, table=True): @@ - status: str # "pass" | "fail" + status: EvalResultStatus class EvalResultRead(SQLModel): @@ - status: str + status: EvalResultStatus🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/src/core/models/models.py` around lines 325 - 335, The status field in both the EvalResult and EvalResultRead classes is defined as a plain str type, which allows any arbitrary string value to be persisted. Constrain the status field to only accept valid values by replacing the str type with a Literal type or enum that explicitly defines the allowed values (such as "pass" and "fail"). Apply this constraint to the status field in both the main model class and the EvalResultRead SQLModel class to ensure type safety and prevent invalid statuses from being persisted.docker-compose.yml (1)
412-438:⚠️ Potential issue | 🟠 Major | ⚡ Quick winWire backend Redis to the compose Redis service.
Backend now depends on flags/cache wiring but has no
REDIS_URLoverride, so it will fall back toredis://localhost:6379frombackend/app/config.pyinside the container.backend/app/routers/flags.pyconstructsFlagServicefrom this value, so cache/flags calls can fail at runtime.Suggested fix
backend: @@ environment: - DATABASE_URL=postgresql://postgres:postgres@db:5432/text2sql + - REDIS_URL=redis://redis:6379 @@ depends_on: db: condition: service_healthy + redis: + condition: service_healthy trino: condition: service_healthy openmetadata-server: condition: service_healthy🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker-compose.yml` around lines 412 - 438, Add the REDIS_URL environment variable to the backend service's environment section in the docker-compose configuration. The backend service currently lacks this configuration and will fall back to redis://localhost:6379 from backend/app/config.py, which will fail inside the container since localhost doesn't resolve to the Redis service. Set REDIS_URL to point to the Redis service using the docker-compose service name, such as redis://redis:6379, so that FlagService in backend/app/routers/flags.py can properly initialize cache and flags functionality.agent/src/agent/routers/chat.py (1)
26-33:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
execution_modeis not propagated into agent state.
init_flags_noderesolves mode overrides fromstate["execution_mode"], but this endpoint never accepts or forwards that field for new invocations.Suggested fix
class ChatRequest(BaseModel): @@ hitl_enabled: bool = True + execution_mode: str | None = None @@ result = await agent_graph.ainvoke( { "user_query": request.query, "allowed_tables": request.allowed_tables, "allowed_statuses": request.allowed_statuses, "active_extractors": active_extractors, "non_interactive": not request.hitl_enabled, + "execution_mode": request.execution_mode, }, config=config, )Also applies to: 136-143
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/src/agent/routers/chat.py` around lines 26 - 33, The ChatRequest model class is missing an execution_mode field that is required by the init_flags_node function which resolves mode overrides from state["execution_mode"]. Add an execution_mode parameter to the ChatRequest BaseModel (similar to how query, thread_id, and resume_value are defined) with an appropriate type and default value, and ensure that this field is propagated into the agent state when creating new invocations. The same fix also needs to be applied to the related code block at lines 136-143 in the same file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.gitignore:
- Line 42: The entry python-core-utils/ in the .gitignore file at line 42
prevents the directory from being tracked in version control, but both the
agent/ and backend/ Dockerfiles and the pyproject.toml files require this
directory as a path dependency to build and sync correctly. Remove the
python-core-utils/ line from .gitignore to allow the directory to be tracked in
the repository, or if this is an external dependency, update the ignore rule and
ensure the dependency is provided through an alternative method such as a Git
submodule or published package.
In `@agent/src/agent/graph.py`:
- Around line 4-6: The docstring at the beginning of the file describing the
node order is outdated and does not match the actual graph implementation.
Update the docstring to include all intermediate nodes between validate_config
and extractor. Specifically, add init_flags and init_skills nodes to the flow so
the documented order reads START → validate_config → init_flags → init_skills →
extractor → schema_explorer instead of skipping directly from validate_config to
extractor. This will match the actual node connections defined in the graph
construction code.
In `@agent/src/agent/llm.py`:
- Around line 52-55: The temperature flag value is being directly cast to float
without error handling, which will raise a ValueError if the flag contains a
malformed value like an empty string or non-numeric text. Wrap the float
conversion in a try-except block to defensively handle ValueError exceptions,
defaulting to 0.0 when conversion fails. Apply this error handling to the
temperature assignment where float(flags.get(temp_flag, 0.0)) is called,
optionally logging a warning when an invalid value is encountered.
In `@agent/src/agent/mcp_server.py`:
- Around line 60-76: The table validation logic uses a nested loop that iterates
through all_tables for each entry in allowed_tables, creating O(n×m) complexity.
Instead of this approach, build a single lookup set once after fetching
all_tables that contains all valid table identifiers (the table id, name, and
schema_name.name combinations). Then iterate through allowed_tables and check
each allowed entry against this pre-built lookup set in constant time. This will
reduce the complexity to O(n+m) by eliminating the inner loop that searches
through all_tables for each allowed table entry.
In `@agent/src/agent/nodes/extractor.py`:
- Around line 51-54: The code in the LLMExtractor class calls
`.get_langchain_prompt()` on the result of `langfuse_client.get_prompt()`
without validating that the result is not None. Add a guard check after the
`langfuse_client.get_prompt(settings.LANGFUSE_PROMPT_EXTRACTOR)` call to ensure
the returned prompt is not None before attempting to call
`.get_langchain_prompt()` on it. If the prompt is None or the lookup fails,
raise an appropriate error to prevent the LLMExtractor construction from
proceeding with invalid state.
In `@agent/src/agent/nodes/finalizer.py`:
- Line 10: The module-level llm assignment using get_llm("finalizer") is frozen
at import time before state["runtime_flags"] exists, preventing runtime flag
overrides from being applied. Move the get_llm("finalizer") call from the module
level into each function or method that uses it (referenced in the locations
around lines 56-57 and 96-99) so that the LLM selection happens at execution
time when runtime flags are available and can be applied for model overrides.
- Around line 49-54: The issue is that langfuse_client.get_prompt() method calls
throw exceptions when prompt retrieval fails, with no fallback mechanism. Add
the fallback parameter to each get_prompt() call across all affected node files
to provide default prompts when Langfuse is unavailable, or wrap the
get_prompt() calls in try-except blocks with fallback prompt templates to ensure
the agent continues operating when prompt fetching fails. This applies to all
instances where get_prompt() is used to retrieve prompts like langfuse_prompt
for SQL explanation and other prompt retrievals.
In `@agent/src/agent/nodes/init_skills.py`:
- Line 8: The module-level singleton _skill_registry = SkillRegistry() creates
shared mutable state that can cause race conditions across concurrent requests,
especially when _skill_registry.redis is modified. Instead of creating the
SkillRegistry instance at module scope, refactor the code to create a new
instance for each invocation or request, or use a factory pattern that manages
instances on a per-request basis. This ensures that concurrent requests do not
share or interfere with each other's SkillRegistry state and redis client
configuration.
- Around line 20-23: The issue is in how skills_enabled, hot_reload, and
cache_ttl values are being parsed from runtime_flags using direct bool() and
int() conversions. Direct bool() on string values like "false" incorrectly
converts to True, and int() can raise exceptions on None or invalid strings
before proper error handling occurs. Fix this by implementing safe parsing: for
boolean flags, explicitly check if the string value equals "true" or "false"
(case-insensitive) rather than using bool() directly, and for the cache_ttl
integer, wrap the int() conversion in a try-except block to gracefully fall back
to the settings default when parsing fails, ensuring all three flag assignments
handle invalid or missing values safely before they reach the rest of the node
logic.
In `@agent/src/agent/nodes/refiner.py`:
- Around line 67-70: The code calls get_langchain_prompt() on the result of
langfuse_client.get_prompt() without validating that the prompt was successfully
retrieved. If the prompt fetch fails and returns None, the subsequent method
call will crash instead of handling the error gracefully. Add a null check after
the langfuse_client.get_prompt() call to verify that langfuse_prompt is not None
before calling get_langchain_prompt() on it. If the prompt is None, handle this
error appropriately (log the error and either use a fallback or raise a
meaningful exception) to ensure the refiner can recover safely.
- Line 38: The code directly casts the result of runtime_flags.get() to an
integer without validating the retrieved value, which will raise an exception if
the runtime flag is malformed, nullable, or not a valid integer. Before the
int() cast in the max_iterations assignment, validate that the retrieved value
from runtime_flags.get("MAX_REFINER_ITERATIONS", MAX_REFINER_ITERATIONS) is not
None and is a valid integer string or numeric value, and provide appropriate
fallback handling (such as using the default MAX_REFINER_ITERATIONS constant) if
validation fails.
- Around line 55-60: The stop condition for the refiner node has an off-by-one
error that causes it to exceed the configured maximum iterations. In the
refiner.py file where the refinement_count condition is checked, change the
comparison operator from `if count >= max_iterations:` to `if count >
max_iterations:` to ensure the node stops at exactly the maximum number of
iterations rather than one iteration beyond it. This will prevent the
refinement_count from exceeding the intended limit.
- Around line 97-99: The json.dumps(payload_data) call in the payload
serialization is occurring before the try-except block for ESCA error handling,
which means non-serializable Trino values can crash the process even when query
execution succeeds. Move the line containing json.dumps(payload_data).encode()
and the raw_ref assignment inside the try block that handles ESCA exceptions, so
serialization errors are properly caught and handled along with other
query-related exceptions.
In `@agent/src/agent/nodes/satisfaction_check.py`:
- Around line 86-91: The structured output handling in both the
ColumnCoverageOutput block (lines 86-91) and the corresponding Check D block
(lines 106-112) assumes the result is a model instance, but some LLM providers
may return dicts instead. When attribute access like result.satisfies_question
or result.reason is attempted on a dict, it throws an AttributeError which gets
silently caught, allowing low-quality outputs to pass validation. Validate that
the result returned from ainvoke is actually an instance of ColumnCoverageOutput
(and its Check D equivalent) before accessing attributes. If the result is not
the expected type, handle it explicitly rather than relying on exception
catching to skip the validation.
- Around line 67-76: The min_rows and max_rows variables obtained from runtime
flags may arrive as strings, causing TypeError when compared with the integer n
in the satisfaction check comparisons. Convert the results of the _f() calls for
SATISFACTION_MIN_ROWS and SATISFACTION_MAX_ROWS to integers to ensure type
compatibility before they are used in the conditional expressions with n.
In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 410-416: The strict mode instructions added to the prompt (around
the scoping_mode == "strict" check with allowed_tables) are not actually
enforced during plan validation. The table verification logic that follows
(around line 556+) only checks table existence but does not validate that tables
belong to the allowed_tables allowlist when in strict mode. Add a validation
check in the table verification section that ensures when scoping_mode is
"strict", all tables in the generated plan are contained within the
allowed_tables list, raising an appropriate error if a disallowed table is
detected.
- Around line 323-355: The numeric runtime flags profile_fetch_concurrency and
max_profiles_to_fetch are converted directly with int() without validation. Add
validation after the int conversion to ensure profile_fetch_concurrency is a
positive integer (since it is passed to asyncio.Semaphore which requires a
positive value) and max_profiles_to_fetch is non-negative. Wrap the int()
conversions in try-except blocks to handle invalid values that cannot be
converted to integers, falling back to the default settings values in such
cases. This prevents ValueError crashes from invalid inputs and ensures
Semaphore receives a valid positive concurrency value.
In `@agent/src/agent/utils/flag_bridge.py`:
- Around line 132-133: The `overrides` dictionary retrieved from
`mode_data.get("flag_overrides")` can contain `None` values that will overwrite
valid defaults when merged into `resolved` using the
`resolved.update(overrides)` call. Filter out `None` values from the `overrides`
dictionary before the update operation by only including key-value pairs where
the value is not `None`, similar to how DB overrides are handled elsewhere in
the codebase. This prevents null values from breaking downstream casts and
usages that depend on the default values.
In `@agent/src/agent/utils/jeen_client.py`:
- Around line 12-21: In the __init__ method of the JeenSkillClient class, the
code calls .rstrip("/") on settings.JEEN_LLM_CORE_URL at line 13 before checking
if it is None, which will cause an AttributeError if the setting is missing. Add
a null check to ensure settings.JEEN_LLM_CORE_URL is not None before calling
.rstrip("/") on it. You can assign the base_url conditionally or use a ternary
operator to handle the None case, such as assigning an empty string or None to
self.base_url when the setting is missing.
In `@agent/src/agent/utils/schema_enrichment.py`:
- Around line 190-200: The join-path discovery logic currently only loads
relationships in one direction. In the ForeignKeyMapping query around line 190,
expand the where condition to also include cases where the referenced table (not
just table_id) is in the candidate table_ids. Similarly, in the
CrossTableProfile query around line 198, also check for entries where the target
table (not just source_table_id) is in table_ids. Additionally, the path search
logic starting around line 223 and line 247 only searches directed paths from
source to target, so expand these searches to also check the reverse direction
where candidate tables appear as targets in the relationships, ensuring
bidirectional relationship discovery.
- Around line 364-367: The code at the point where result is obtained from
structured.ainvoke(prompt) assumes result is always a model object with an
ambiguity_notes attribute, but some LLM providers return a dictionary instead,
causing an AttributeError that silently fails and returns an empty list. Modify
the code after the ainvoke call to check if result is a dictionary and extract
ambiguity_notes using bracket notation (result["ambiguity_notes"]), or if it's a
model object, use dot notation (result.ambiguity_notes). This pattern should
match how other phases in this module handle dict-or-model results from
structured outputs.
In `@agent/src/agent/utils/skill_registry.py`:
- Around line 56-63: In the for loop iterating over fetched skills in the
caching block, add validation to check if the 'id' key exists in each skill
dictionary before accessing skill['id'] on the line that builds the cache key.
If a skill lacks the 'id' field, skip that skill and optionally log a warning
about the malformed skill object, ensuring the redis pipeline continues
processing valid skills without failing silently on KeyError exceptions.
In `@agent/tests/conftest.py`:
- Around line 68-77: The MockRedisPipeline class is missing methods that
SkillRegistry depends on for cache operations. Add the mget and setex methods to
MockRedisPipeline to match the actual Redis pipeline contract used by
SkillRegistry.get_skills. The mget method should accept multiple keys and record
the command, while setex should accept a key, expiration time, and value and
record the command. Both methods should append their operation details to the
commands list so that execute can return appropriate mock responses simulating
successful cache operations.
In `@agent/tests/test_isolation.py`:
- Around line 39-52: The test has two issues: First, the import of
evaluate_with_llm at the beginning of the try block creates a local binding
before the patch is applied, so when the function is called on line 47, it may
still reference the original function instead of the mocked version. Move the
import statement to after the patch decorator is applied, or import the module
and access evaluate_with_llm through it (like core.llm.evaluate_with_llm) so the
patch is effective. Second, the broad ImportError exception handler at the end
of the block silently catches and ignores import errors, which can hide real
test failures. Remove this exception handler entirely or replace it with more
specific error handling that only catches genuine missing module errors during
initial setup, not failures during test execution.
In `@backend/alembic/versions/f9a3d1c8e205_add_config_schema_flags.py`:
- Around line 40-57: The feature_flag_audit_log table creation has a duplicate
index on the flag_name column. Remove the index=True parameter from the
flag_name column definition in the op.create_table call for
feature_flag_audit_log, since the same index is already explicitly created below
via op.create_index with the name ix_feature_flag_audit_log_flag_name.
In `@backend/app/config.py`:
- Line 9: The OPENAI_API_KEY field is currently typed as a plain string, which
risks accidental exposure through logging or serialization. Change the type
annotation of OPENAI_API_KEY from str | None to use Pydantic's SecretStr type
instead (e.g., SecretStr | None), and import SecretStr from the pydantic library
at the top of the file. This ensures sensitive API keys are properly protected
and not exposed in string representations or logs.
In `@backend/app/infra_init.py`:
- Line 27: The `import os` statement is placed after module-level executable
code, which violates Ruff's E402 rule that requires all imports to be at the top
of the module. Move the `import os` import statement to the beginning of the
file, before any other module-level code or executable statements, to ensure
proper import ordering.
- Around line 28-38: Replace the localhost-based default endpoints in the
environment variable initialization with appropriate containerized service names
that will correctly resolve within the backend container environment.
Specifically, change _MINIO_HOST from "localhost:9000" to the proper MinIO
service name, _TRINO_HOST from "localhost" to the proper Trino service name, and
_OM_URL from "http://localhost:8585" to the proper OpenMetadata service
endpoint. Also apply the same fix to any similar localhost defaults elsewhere in
the file (including the referenced line 552).
In `@backend/app/routers/evaluation.py`:
- Around line 35-37: The trace() method calls throughout the file are using an
invalid API pattern incompatible with Langfuse SDK v3+ and v4+, specifically the
`id` parameter which does not exist in the SDK. Replace all occurrences of
`trace(id=..., metadata=..., output=...)` calls with the correct Langfuse v4.x
API pattern using either `start_as_current_span(trace_context={"trace_id":
"..."}).update_trace(metadata=..., output=...)` or
`update_current_trace(metadata=..., output=...)`. The `@observe` decorator usage
should remain unchanged as it is valid. Search for all trace() calls in the
evaluation router functions and update them to use the compatible API method.
In `@backend/app/routers/flags.py`:
- Around line 5-13: The module docstring in the flags.py file uses en dash
characters (–) in the endpoint list documentation (lines showing GET, PATCH,
DELETE, PUT operations for /flags/ and /flags/modes/ endpoints), which violates
Ruff RUF002 linting rules. Replace all occurrences of the en dash (–) with
regular hyphens (-) throughout the endpoint descriptions in the docstring to
satisfy the linter.
- Around line 50-55: The _get_admin function dependency directly trusts a
client-supplied X-Admin-Email header to determine admin identity, which enables
header spoofing attacks. Instead of accepting the email from the client header
and passing it to require_admin, you need to first authenticate the actual
caller through a legitimate authentication mechanism (such as verifying a JWT
token, session cookie, or other trusted credential), extract the authenticated
user's email from that authenticated identity, and then validate that
authenticated user has admin privileges via require_admin. Remove the direct
reliance on the untrusted x_admin_email header parameter and replace it with
proper user authentication before checking admin role.
- Around line 170-178: The delete_mode function logs the administrator action
but does not propagate the actor identity to the service layer. Modify the
svc.delete_mode(name) call within the delete_mode function to include
current_admin.email as a parameter so that the FlagService can properly
attribute the deletion to the specific administrator for audit purposes.
In `@backend/app/routers/health.py`:
- Around line 186-189: The health check endpoint is returning HTTP 200 status
code even when reporting error states (missing OPENAI_API_KEY and other broken
dependency configurations). Update the error response paths in the health check
logic to return a non-2xx HTTP status code (such as 500) instead of the default
200. This applies to the condition checking `if not api_key:` and the similar
configuration validation check mentioned at line 208, ensuring health probes
correctly identify unhealthy service states.
- Around line 211-214: The EscaClient instantiation in the health check endpoint
is not wrapped in error handling, so any initialization failures will result in
a generic 500 error instead of the appropriate 503 status code for dependency
failures. Wrap the EscaClient constructor call in a try-except block to catch
any exceptions that occur during client initialization and return a 503 status
code with an appropriate error message indicating the dependency is unavailable.
In `@backend/app/services/flag_service.py`:
- Around line 200-219: The docstring for the delete method states that it
deletes the DB row, but the actual implementation only sets flag.value to None
while preserving the row and its metadata. Update the docstring to accurately
describe the actual behavior: that the method resets a flag to its env-var
default by clearing the value (setting it to None) while keeping the database
row intact to preserve metadata. This will align the documentation with the
implementation and the inline comment that explains the reset strategy.
- Around line 89-124: The event loop handling in methods _try_cache_get,
_try_cache_set, and _invalidate uses a fragile pattern that depends on thread
pool execution. Replace the try-except block that calls asyncio.get_event_loop()
followed by loop.run_until_complete() with asyncio.run() instead. This approach
creates a fresh event loop for each call, is thread-safe, and will continue to
work correctly if these methods or their callers become async in the future.
Apply this change consistently across all three methods where async Redis
operations are being executed.
In `@core/pyproject.toml`:
- Around line 18-19: The local path dependency python-core-utils referenced in
the tool.uv.sources section of core/pyproject.toml (and similarly in
backend/pyproject.toml and agent/pyproject.toml) points to a directory that is
gitignored and missing from the repository. To fix this, you must either: (1)
remove python-core-utils from .gitignore and add the actual directory with its
source code to the repository, (2) replace the local path references in all
three pyproject.toml files with a remote dependency source such as a PyPI
package or artifact registry, or (3) create documentation explaining how
developers should separately acquire and install python-core-utils before
building. Choose one approach and implement it consistently across all affected
pyproject.toml files and Dockerfiles that reference this dependency.
In `@core/src/core/langfuse.py`:
- Around line 2-13: The get_langfuse_handler function is not using the imported
settings object to configure the Langfuse client. Modify the function to first
check if settings.LANGFUSE_PUBLIC_KEY and settings.LANGFUSE_SECRET_KEY are
configured, then create a Langfuse client instance by passing the settings
values (public_key, secret_key, and host from settings.LANGFUSE_BASE_URL) to the
Langfuse constructor, and pass that client instance to CallbackHandler instead
of calling CallbackHandler with no arguments. Additionally, replace logger.error
with logger.exception to capture the full error traceback for better debugging.
In `@core/src/core/models/models.py`:
- Line 791: The type field in the FeatureFlag class is currently an unrestricted
string field that allows any value to be persisted, causing validation issues
downstream. Replace the string type annotation with an Enum or constrained field
that restricts the type field to only the valid values listed in the
description: "bool", "int", "float", "string", and "json". This ensures invalid
values are rejected at the schema boundary before they can be persisted to the
database.
In `@docker-compose.yml`:
- Line 407: The startup command in the service definition executes a seed
operation that destructively resets Trino seed data by deleting all rows from
seed tables before reinserting them on every container restart. Verify whether
this behavior is intentional for your use case by reviewing the
_seed_trino_data() function in infra_init.py to confirm the DELETE statements
(mentioned around lines 260-262), and determine if seed tables should instead
persist user modifications across restarts. If idempotent behavior is required,
consider modifying the seed logic to check for existing data before deleting or
use an upsert pattern instead of delete-then-insert, or remove the seed step
from the startup command if seed initialization should only occur once.
In `@frontend/src/api/flags.ts`:
- Around line 53-75: The name path parameters in the functions update, reset,
getMode, upsertMode, and deleteMode are being interpolated directly into URL
paths without URL encoding. Since these names are user-provided and may contain
reserved URL characters, wrap the name parameter with encodeURIComponent()
before inserting it into the endpoint paths, such as changing /flags/${name} to
/flags/${encodeURIComponent(name)} and /flags/modes/${name} to
/flags/modes/${encodeURIComponent(name)}.
In `@frontend/src/pages/FlagsPage.tsx`:
- Around line 37-56: The FLAG_CATEGORIES object only includes explicitly listed
flags, causing any backend flags not defined here to be silently excluded from
the page. Update the rendering logic (referenced in the lines 387-399 area) to
check for all available backend flags and dynamically add any flags that are not
already present in the FLAG_CATEGORIES object to a new "Uncategorized" category,
ensuring all flags from the backend are visible and manageable in the UI
regardless of whether they have been explicitly categorized.
- Around line 422-437: The submitMode function lacks validation for the mode
name before calling upsertModeMutation.mutate(), allowing empty or
whitespace-only names to be submitted. Add a validation check before the
mutation call to ensure modeDraft.name is not empty and not just whitespace. If
the name is invalid, either return early or display an error message to prevent
unnecessary API calls and failures.
- Line 284: The TabKey type referenced in the useState hook at line 284 of
FlagsPage.tsx is not defined in this file, causing a TypeScript compilation
error. Add a type definition for TabKey at the top of FlagsPage.tsx after the
import statements that includes the string literal union type with the values
'flags' and 'modes', which are the two tab keys used in the activeTab state in
this component.
---
Outside diff comments:
In `@agent/src/agent/routers/chat.py`:
- Around line 26-33: The ChatRequest model class is missing an execution_mode
field that is required by the init_flags_node function which resolves mode
overrides from state["execution_mode"]. Add an execution_mode parameter to the
ChatRequest BaseModel (similar to how query, thread_id, and resume_value are
defined) with an appropriate type and default value, and ensure that this field
is propagated into the agent state when creating new invocations. The same fix
also needs to be applied to the related code block at lines 136-143 in the same
file.
In `@backend/app/infra_init.py`:
- Around line 652-667: The _seed_trino_data function violates idempotency when
the DELETE statement fails because it unconditionally executes the INSERT via
_trino_exec(table["seed_sql"]) even when the delete fallback is triggered. When
DELETE fails, you should fall back to checking if the table already contains
data (using a count check as mentioned in the comment) before deciding whether
to insert the seed data, rather than unconditionally proceeding with the INSERT
after catching the DELETE exception.
In `@backend/app/services/llm_judge.py`:
- Around line 129-139: The code retrieves the OPENAI_API_KEY and proceeds
directly to making an HTTP request without validating that the key is actually
present. Add a preflight check immediately after the api_key assignment from
settings.OPENAI_API_KEY to verify the key is not empty or None, and return early
with an appropriate error response if validation fails. This prevents
unnecessary remote calls and ensures the failure happens locally with
deterministic behavior before entering the try block with the httpx.Client.
In `@core/src/core/models/models.py`:
- Around line 325-335: The status field in both the EvalResult and
EvalResultRead classes is defined as a plain str type, which allows any
arbitrary string value to be persisted. Constrain the status field to only
accept valid values by replacing the str type with a Literal type or enum that
explicitly defines the allowed values (such as "pass" and "fail"). Apply this
constraint to the status field in both the main model class and the
EvalResultRead SQLModel class to ensure type safety and prevent invalid statuses
from being persisted.
In `@docker-compose.yml`:
- Around line 412-438: Add the REDIS_URL environment variable to the backend
service's environment section in the docker-compose configuration. The backend
service currently lacks this configuration and will fall back to
redis://localhost:6379 from backend/app/config.py, which will fail inside the
container since localhost doesn't resolve to the Redis service. Set REDIS_URL to
point to the Redis service using the docker-compose service name, such as
redis://redis:6379, so that FlagService in backend/app/routers/flags.py can
properly initialize cache and flags functionality.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: cb2ec3b8-e483-41bc-aeb7-3c5497357fbd
⛔ Files ignored due to path filters (2)
agent/uv.lockis excluded by!**/*.lockbackend/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (74)
.gitignoreagent/.coverageagent/Dockerfileagent/pyproject.tomlagent/src/agent/config.pyagent/src/agent/graph.pyagent/src/agent/llm.pyagent/src/agent/mcp_server.pyagent/src/agent/nodes/extractor.pyagent/src/agent/nodes/finalizer.pyagent/src/agent/nodes/init_flags.pyagent/src/agent/nodes/init_skills.pyagent/src/agent/nodes/query_builder.pyagent/src/agent/nodes/refiner.pyagent/src/agent/nodes/satisfaction_check.pyagent/src/agent/nodes/schema_explorer.pyagent/src/agent/routers/chat.pyagent/src/agent/state.pyagent/src/agent/utils/esca.pyagent/src/agent/utils/flag_bridge.pyagent/src/agent/utils/jeen_client.pyagent/src/agent/utils/schema_enrichment.pyagent/src/agent/utils/skill_registry.pyagent/src/agent/utils/sql.pyagent/tests/conftest.pyagent/tests/test_cache_and_gates.pyagent/tests/test_isolation.pyagent/tests/test_resilience.pyagent/tests/test_routing.pybackend/Dockerfilebackend/alembic/versions/f9a3d1c8e205_add_config_schema_flags.pybackend/app/config.pybackend/app/infra_init.pybackend/app/main.pybackend/app/routers/__init__.pybackend/app/routers/admin_approval.pybackend/app/routers/admin_auth.pybackend/app/routers/audit.pybackend/app/routers/enrichment.pybackend/app/routers/evaluation.pybackend/app/routers/feedback.pybackend/app/routers/flags.pybackend/app/routers/health.pybackend/app/routers/orchestration.pybackend/app/routers/publish.pybackend/app/routers/questions.pybackend/app/routers/scopes.pybackend/app/services/auth.pybackend/app/services/evaluator.pybackend/app/services/flag_service.pybackend/app/services/join_detection.pybackend/app/services/langfuse_client.pybackend/app/services/llm_judge.pybackend/app/services/profiling_engine.pybackend/app/services/scoring.pybackend/app/services/trino_client.pybackend/pyproject.tomlbackend/tests/conftest.pycore/pyproject.tomlcore/src/core/__init__.pycore/src/core/cache.pycore/src/core/config.pycore/src/core/langfuse.pycore/src/core/models/models.pycore/src/core/trino.pydocker-compose.ymlfrontend/.envfrontend/index.htmlfrontend/src/App.tsxfrontend/src/api/flags.tsfrontend/src/components/layout/Sidebar.tsxfrontend/src/config/constants.tsfrontend/src/pages/FlagsPage.cssfrontend/src/pages/FlagsPage.tsx
💤 Files with no reviewable changes (2)
- backend/app/services/trino_client.py
- backend/app/services/scoring.py
| agent/scratch/* | ||
| fix_agent_langfuse.py | ||
| fix_langfuse.py | ||
| python-core-utils/ |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== python-core-utils directory presence =="
fd -HI -t d '^python-core-utils$' . || true
echo "== tracked files under python-core-utils =="
git ls-files | rg '^python-core-utils/' || true
echo "== cross-file references =="
rg -n 'python-core-utils' .gitignore agent/Dockerfile backend/Dockerfile agent/pyproject.toml backend/pyproject.toml || trueRepository: StavPonte11/text2sql-onboarding
Length of output: 516
Don't ignore python-core-utils/ — the directory is not tracked and build files depend on its presence.
Both agent/ and backend/ Dockerfiles copy from python-core-utils, and both pyproject.toml files reference it as a path dependency (../python-core-utils). However, the directory does not exist in the repository. Adding it to .gitignore will prevent it from being tracked while build systems still require it, causing Docker builds and uv sync to fail.
Either track the directory in the repository or remove the ignore rule if this is an external dependency that should be provided differently (e.g., as a Git submodule or published package).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.gitignore at line 42, The entry python-core-utils/ in the .gitignore file
at line 42 prevents the directory from being tracked in version control, but
both the agent/ and backend/ Dockerfiles and the pyproject.toml files require
this directory as a path dependency to build and sync correctly. Remove the
python-core-utils/ line from .gitignore to allow the directory to be tracked in
the repository, or if this is an external dependency, update the ignore rule and
ensure the dependency is provided through an alternative method such as a Git
submodule or published package.
| Node order: | ||
| START → validate_config → extractor → schema_explorer → ... | ||
| (G2-01 fail-fast) |
There was a problem hiding this comment.
Docstring doesn't match actual node order.
The docstring states START → validate_config → extractor but the actual graph (lines 209-212) is START → validate_config → init_flags → init_skills → extractor.
📝 Proposed fix
Node order:
- START → validate_config → extractor → schema_explorer → ...
+ START → validate_config → init_flags → init_skills → extractor → schema_explorer → ...
(G2-01 fail-fast)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Node order: | |
| START → validate_config → extractor → schema_explorer → ... | |
| (G2-01 fail-fast) | |
| Node order: | |
| START → validate_config → init_flags → init_skills → extractor → schema_explorer → ... | |
| (G2-01 fail-fast) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/graph.py` around lines 4 - 6, The docstring at the beginning
of the file describing the node order is outdated and does not match the actual
graph implementation. Update the docstring to include all intermediate nodes
between validate_config and extractor. Specifically, add init_flags and
init_skills nodes to the flow so the documented order reads START →
validate_config → init_flags → init_skills → extractor → schema_explorer instead
of skipping directly from validate_config to extractor. This will match the
actual node connections defined in the graph construction code.
| if temperature is None: | ||
| temp_flag = _NODE_TEMP_FLAGS.get(node) | ||
| temperature = float(flags.get(temp_flag, 0.0)) if temp_flag else 0.0 | ||
|
|
There was a problem hiding this comment.
Handle malformed temperature flag values defensively.
Line 54 casts a dynamic flag directly with float(...). A bad DB/admin override (e.g., empty string or non-numeric text) will raise and break request execution.
Suggested fix
if temperature is None:
temp_flag = _NODE_TEMP_FLAGS.get(node)
- temperature = float(flags.get(temp_flag, 0.0)) if temp_flag else 0.0
+ if temp_flag:
+ raw_temp = flags.get(temp_flag, 0.0)
+ try:
+ temperature = float(raw_temp)
+ except (TypeError, ValueError):
+ logger.warning(
+ "Invalid temperature override for %s: %r; falling back to 0.0",
+ temp_flag,
+ raw_temp,
+ )
+ temperature = 0.0
+ else:
+ temperature = 0.0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if temperature is None: | |
| temp_flag = _NODE_TEMP_FLAGS.get(node) | |
| temperature = float(flags.get(temp_flag, 0.0)) if temp_flag else 0.0 | |
| if temperature is None: | |
| temp_flag = _NODE_TEMP_FLAGS.get(node) | |
| if temp_flag: | |
| raw_temp = flags.get(temp_flag, 0.0) | |
| try: | |
| temperature = float(raw_temp) | |
| except (TypeError, ValueError): | |
| logger.warning( | |
| "Invalid temperature override for %s: %r; falling back to 0.0", | |
| temp_flag, | |
| raw_temp, | |
| ) | |
| temperature = 0.0 | |
| else: | |
| temperature = 0.0 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/llm.py` around lines 52 - 55, The temperature flag value is
being directly cast to float without error handling, which will raise a
ValueError if the flag contains a malformed value like an empty string or
non-numeric text. Wrap the float conversion in a try-except block to defensively
handle ValueError exceptions, defaulting to 0.0 when conversion fails. Apply
this error handling to the temperature assignment where
float(flags.get(temp_flag, 0.0)) is called, optionally logging a warning when an
invalid value is encountered.
| if allowed_tables: | ||
| with Session(engine) as session: | ||
| all_tables = session.exec(select(Table)).all() | ||
| for allowed in allowed_tables: | ||
| exists = False | ||
| for t in all_tables: | ||
| if (t.id == allowed or t.name == allowed or f"{t.schema_name}.{t.name}" == allowed): | ||
| if ( | ||
| t.id == allowed | ||
| or t.name == allowed | ||
| or f"{t.schema_name}.{t.name}" == allowed | ||
| ): | ||
| exists = True | ||
| break | ||
| if not exists: | ||
| return json.dumps({"error": f"Table '{allowed}' does not exist."}) | ||
| return json.dumps( | ||
| {"error": f"Table '{allowed}' does not exist."} | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Table validation has O(n×m) complexity.
The nested loop (lines 63-72) iterates over all database tables for each allowed table entry. With many tables or many allowed_tables, this becomes slow.
Consider building a lookup set once:
♻️ Suggested optimization
if allowed_tables:
with Session(engine) as session:
all_tables = session.exec(select(Table)).all()
+ table_lookup = set()
+ for t in all_tables:
+ table_lookup.add(t.id)
+ table_lookup.add(t.name)
+ table_lookup.add(f"{t.schema_name}.{t.name}")
for allowed in allowed_tables:
- exists = False
- for t in all_tables:
- if (
- t.id == allowed
- or t.name == allowed
- or f"{t.schema_name}.{t.name}" == allowed
- ):
- exists = True
- break
- if not exists:
+ if allowed not in table_lookup:
return json.dumps(
{"error": f"Table '{allowed}' does not exist."}
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if allowed_tables: | |
| with Session(engine) as session: | |
| all_tables = session.exec(select(Table)).all() | |
| for allowed in allowed_tables: | |
| exists = False | |
| for t in all_tables: | |
| if (t.id == allowed or t.name == allowed or f"{t.schema_name}.{t.name}" == allowed): | |
| if ( | |
| t.id == allowed | |
| or t.name == allowed | |
| or f"{t.schema_name}.{t.name}" == allowed | |
| ): | |
| exists = True | |
| break | |
| if not exists: | |
| return json.dumps({"error": f"Table '{allowed}' does not exist."}) | |
| return json.dumps( | |
| {"error": f"Table '{allowed}' does not exist."} | |
| ) | |
| if allowed_tables: | |
| with Session(engine) as session: | |
| all_tables = session.exec(select(Table)).all() | |
| table_lookup = set() | |
| for t in all_tables: | |
| table_lookup.add(t.id) | |
| table_lookup.add(t.name) | |
| table_lookup.add(f"{t.schema_name}.{t.name}") | |
| for allowed in allowed_tables: | |
| if allowed not in table_lookup: | |
| return json.dumps( | |
| {"error": f"Table '{allowed}' does not exist."} | |
| ) |
🧰 Tools
🪛 ast-grep (0.43.0)
[info] 73-75: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{"error": f"Table '{allowed}' does not exist."}
)
Note: Security best practice.
(use-jsonify)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/mcp_server.py` around lines 60 - 76, The table validation
logic uses a nested loop that iterates through all_tables for each entry in
allowed_tables, creating O(n×m) complexity. Instead of this approach, build a
single lookup set once after fetching all_tables that contains all valid table
identifiers (the table id, name, and schema_name.name combinations). Then
iterate through allowed_tables and check each allowed entry against this
pre-built lookup set in constant time. This will reduce the complexity to O(n+m)
by eliminating the inner loop that searches through all_tables for each allowed
table entry.
| langfuse_prompt = langfuse_client.get_prompt(settings.LANGFUSE_PROMPT_EXTRACTOR) | ||
| self.prompt = ChatPromptTemplate.from_messages(langfuse_prompt.get_langchain_prompt()) | ||
|
|
||
| self.chain = self.prompt | self.llm.with_structured_output(ExtractorOutput, method="json_schema") | ||
| self.prompt = ChatPromptTemplate.from_messages( | ||
| langfuse_prompt.get_langchain_prompt() | ||
| ) |
There was a problem hiding this comment.
Guard Langfuse prompt lookup before dereference.
get_prompt(...) can fail/return None; calling .get_langchain_prompt() unconditionally can crash LLMExtractor construction and fail the node.
Suggested fix
langfuse_prompt = langfuse_client.get_prompt(settings.LANGFUSE_PROMPT_EXTRACTOR)
- self.prompt = ChatPromptTemplate.from_messages(
- langfuse_prompt.get_langchain_prompt()
- )
+ prompt_messages = (
+ langfuse_prompt.get_langchain_prompt()
+ if langfuse_prompt is not None
+ else [("system", "Extract concise contextual enrichments."), ("human", "{user_query}")]
+ )
+ self.prompt = ChatPromptTemplate.from_messages(prompt_messages)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/nodes/extractor.py` around lines 51 - 54, The code in the
LLMExtractor class calls `.get_langchain_prompt()` on the result of
`langfuse_client.get_prompt()` without validating that the result is not None.
Add a guard check after the
`langfuse_client.get_prompt(settings.LANGFUSE_PROMPT_EXTRACTOR)` call to ensure
the returned prompt is not None before attempting to call
`.get_langchain_prompt()` on it. If the prompt is None or the lookup fails,
raise an appropriate error to prevent the LLMExtractor construction from
proceeding with invalid state.
| secrets: | ||
| - deploy_key | ||
| # command: sh -c "python -m app.seed && python -m app.infra_init && uvicorn app.main:app --host 0.0.0.0 --port 8000" | ||
| command: sh -c "python -m app.seed && python -m app.infra_init && uvicorn app.main:app --host 0.0.0.0 --port 8000" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Locate startup write modules:"
fd -i 'seed.py|infra_init.py' backend/app
echo "Scan for write operations and idempotency guards:"
rg -n --no-heading '(insert|add\(|commit|delete|truncate|create|drop|on_conflict|upsert|merge)' backend/app/seed.py backend/app/infra_init.py
echo "Inspect implementations:"
sed -n '1,260p' backend/app/seed.py
sed -n '1,260p' backend/app/infra_init.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 26386
No idempotency concerns for duplicates; Trino seed data is destructively reset on every restart.
The startup sequence cannot duplicate rows because _seed_trino_data() explicitly executes DELETE FROM {table} before inserting seed rows (infra_init.py lines 260–262). However, this means the seed tables are not idempotent—they are recreated from scratch on every backend restart, potentially clearing user data if any rows were manually added to these tables in production.
Verify this destructive behavior is intentional, especially if these tables should persist user modifications across container restarts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docker-compose.yml` at line 407, The startup command in the service
definition executes a seed operation that destructively resets Trino seed data
by deleting all rows from seed tables before reinserting them on every container
restart. Verify whether this behavior is intentional for your use case by
reviewing the _seed_trino_data() function in infra_init.py to confirm the DELETE
statements (mentioned around lines 260-262), and determine if seed tables should
instead persist user modifications across restarts. If idempotent behavior is
required, consider modifying the seed logic to check for existing data before
deleting or use an upsert pattern instead of delete-then-insert, or remove the
seed step from the startup command if seed initialization should only occur
once.
| update: (name: string, value: unknown): Promise<FeatureFlag> => | ||
| fetchWithAdminEmail(`/flags/${name}`, { | ||
| method: 'PATCH', | ||
| body: JSON.stringify({ value }), | ||
| }), | ||
|
|
||
| reset: (name: string): Promise<null> => | ||
| fetchWithAdminEmail(`/flags/${name}`, { method: 'DELETE' }), | ||
|
|
||
| // ── Execution Modes ─────────────────────────────────────────────────────── | ||
| listModes: (): Promise<ExecutionMode[]> => fetchWithAdminEmail('/flags/modes/'), | ||
|
|
||
| getMode: (name: string): Promise<ExecutionMode> => | ||
| fetchWithAdminEmail(`/flags/modes/${name}`), | ||
|
|
||
| upsertMode: (name: string, data: Partial<ExecutionMode>): Promise<ExecutionMode> => | ||
| fetchWithAdminEmail(`/flags/modes/${name}`, { | ||
| method: 'PUT', | ||
| body: JSON.stringify(data), | ||
| }), | ||
|
|
||
| deleteMode: (name: string): Promise<null> => | ||
| fetchWithAdminEmail(`/flags/modes/${name}`, { method: 'DELETE' }), |
There was a problem hiding this comment.
URL-encode name path params before building endpoint paths.
Mode names are user-provided in the UI, so reserved URL characters can break these requests or target the wrong route when interpolated raw.
Suggested fix
+const encodePath = (value: string) => encodeURIComponent(value);
+
export const flagsApi = {
@@
update: (name: string, value: unknown): Promise<FeatureFlag> =>
- fetchWithAdminEmail(`/flags/${name}`, {
+ fetchWithAdminEmail(`/flags/${encodePath(name)}`, {
method: 'PATCH',
body: JSON.stringify({ value }),
}),
reset: (name: string): Promise<null> =>
- fetchWithAdminEmail(`/flags/${name}`, { method: 'DELETE' }),
+ fetchWithAdminEmail(`/flags/${encodePath(name)}`, { method: 'DELETE' }),
@@
getMode: (name: string): Promise<ExecutionMode> =>
- fetchWithAdminEmail(`/flags/modes/${name}`),
+ fetchWithAdminEmail(`/flags/modes/${encodePath(name)}`),
@@
upsertMode: (name: string, data: Partial<ExecutionMode>): Promise<ExecutionMode> =>
- fetchWithAdminEmail(`/flags/modes/${name}`, {
+ fetchWithAdminEmail(`/flags/modes/${encodePath(name)}`, {
method: 'PUT',
body: JSON.stringify(data),
}),
deleteMode: (name: string): Promise<null> =>
- fetchWithAdminEmail(`/flags/modes/${name}`, { method: 'DELETE' }),
+ fetchWithAdminEmail(`/flags/modes/${encodePath(name)}`, { method: 'DELETE' }),
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| update: (name: string, value: unknown): Promise<FeatureFlag> => | |
| fetchWithAdminEmail(`/flags/${name}`, { | |
| method: 'PATCH', | |
| body: JSON.stringify({ value }), | |
| }), | |
| reset: (name: string): Promise<null> => | |
| fetchWithAdminEmail(`/flags/${name}`, { method: 'DELETE' }), | |
| // ── Execution Modes ─────────────────────────────────────────────────────── | |
| listModes: (): Promise<ExecutionMode[]> => fetchWithAdminEmail('/flags/modes/'), | |
| getMode: (name: string): Promise<ExecutionMode> => | |
| fetchWithAdminEmail(`/flags/modes/${name}`), | |
| upsertMode: (name: string, data: Partial<ExecutionMode>): Promise<ExecutionMode> => | |
| fetchWithAdminEmail(`/flags/modes/${name}`, { | |
| method: 'PUT', | |
| body: JSON.stringify(data), | |
| }), | |
| deleteMode: (name: string): Promise<null> => | |
| fetchWithAdminEmail(`/flags/modes/${name}`, { method: 'DELETE' }), | |
| const encodePath = (value: string) => encodeURIComponent(value); | |
| export const flagsApi = { | |
| update: (name: string, value: unknown): Promise<FeatureFlag> => | |
| fetchWithAdminEmail(`/flags/${encodePath(name)}`, { | |
| method: 'PATCH', | |
| body: JSON.stringify({ value }), | |
| }), | |
| reset: (name: string): Promise<null> => | |
| fetchWithAdminEmail(`/flags/${encodePath(name)}`, { method: 'DELETE' }), | |
| // ── Execution Modes ─────────────────────────────────────────────────────── | |
| listModes: (): Promise<ExecutionMode[]> => fetchWithAdminEmail('/flags/modes/'), | |
| getMode: (name: string): Promise<ExecutionMode> => | |
| fetchWithAdminEmail(`/flags/modes/${encodePath(name)}`), | |
| upsertMode: (name: string, data: Partial<ExecutionMode>): Promise<ExecutionMode> => | |
| fetchWithAdminEmail(`/flags/modes/${encodePath(name)}`, { | |
| method: 'PUT', | |
| body: JSON.stringify(data), | |
| }), | |
| deleteMode: (name: string): Promise<null> => | |
| fetchWithAdminEmail(`/flags/modes/${encodePath(name)}`, { method: 'DELETE' }), | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/src/api/flags.ts` around lines 53 - 75, The name path parameters in
the functions update, reset, getMode, upsertMode, and deleteMode are being
interpolated directly into URL paths without URL encoding. Since these names are
user-provided and may contain reserved URL characters, wrap the name parameter
with encodeURIComponent() before inserting it into the endpoint paths, such as
changing /flags/${name} to /flags/${encodeURIComponent(name)} and
/flags/modes/${name} to /flags/modes/${encodeURIComponent(name)}.
| const FLAG_CATEGORIES: Record<string, string[]> = { | ||
| Extraction: [ | ||
| 'EXTRACTOR_MODEL', 'EXTRACTOR_TEMPERATURE', 'EXTRACTOR_TOP_K_TABLES', 'TABLE_SCOPING_MODE', | ||
| ], | ||
| 'Schema Explorer': [ | ||
| 'MAX_PROFILES_TO_FETCH', 'PROFILE_FETCH_CONCURRENCY', 'SCHEMA_CACHE_TTL', 'PROFILE_CACHE_TTL', | ||
| 'SCHEMA_SEMANTIC_TYPING', 'SCHEMA_JOIN_GRAPH', 'SCHEMA_SUMMARIZATION', 'SCHEMA_AMBIGUITY_DETECT', | ||
| 'SCHEMA_SUMMARY_MODEL', 'SCHEMA_TOP_K_JOINS', | ||
| ], | ||
| 'Query Builder': ['QUERY_BUILDER_MODEL', 'QUERY_BUILDER_TEMPERATURE'], | ||
| Refiner: ['MAX_REFINER_ITERATIONS', 'MAX_SCHEMA_REPLAN_ITERATIONS', 'REFINER_MODEL'], | ||
| 'Satisfaction Check': [ | ||
| 'SATISFACTION_CHECK_ENABLED', 'SATISFACTION_CHECK_EXECUTION', 'SATISFACTION_CHECK_PLAUSIBILITY', | ||
| 'SATISFACTION_CHECK_COLUMNS', 'SATISFACTION_CHECK_SEMANTIC', 'SATISFACTION_MIN_ROWS', | ||
| 'SATISFACTION_MAX_ROWS', 'SATISFACTION_SEMANTIC_THRESHOLD', 'SATISFACTION_JUDGE_MODEL', | ||
| ], | ||
| Skills: ['SKILLS_ENABLED', 'SKILLS_HOT_RELOAD', 'SKILLS_CACHE_TTL'], | ||
| Evaluation: ['LLM_JUDGE_ENABLED', 'EVAL_PARALLEL_WORKERS', 'EVAL_JUDGE_MODEL'], | ||
| 'Catalog Validation': ['CATALOG_VALIDATION_ENABLED', 'CATALOG_CACHE_TTL'], | ||
| }; |
There was a problem hiding this comment.
Uncategorized flags are silently excluded from the page.
The rendering pipeline only maps names from FLAG_CATEGORIES; any backend flag not listed here never appears and cannot be managed from this UI.
Suggested fix
const filteredCategories = useMemo(() => {
const q = search.toLowerCase();
- return Object.entries(FLAG_CATEGORIES).filter(([cat, names]) => {
+ const categorized = new Set(Object.values(FLAG_CATEGORIES).flat());
+ const groups = Object.entries(FLAG_CATEGORIES).filter(([cat, names]) => {
if (!q) return true;
return cat.toLowerCase().includes(q) || names.some((n) => n.toLowerCase().includes(q));
}).map(([cat, names]) => ({
cat,
flags: names
.filter((n) => !q || n.toLowerCase().includes(q) || cat.toLowerCase().includes(q))
.map((n) => flagMap[n])
.filter(Boolean) as FeatureFlag[],
- })).filter((g) => g.flags.length > 0);
+ })).filter((g) => g.flags.length > 0);
+
+ const uncategorized = flags.filter((f) => {
+ if (categorized.has(f.name)) return false;
+ if (!q) return true;
+ return f.name.toLowerCase().includes(q) || (f.description ?? '').toLowerCase().includes(q);
+ });
+ if (uncategorized.length) groups.push({ cat: 'Other', flags: uncategorized });
+
+ return groups;
}, [flags, flagMap, search]);Also applies to: 387-399
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/src/pages/FlagsPage.tsx` around lines 37 - 56, The FLAG_CATEGORIES
object only includes explicitly listed flags, causing any backend flags not
defined here to be silently excluded from the page. Update the rendering logic
(referenced in the lines 387-399 area) to check for all available backend flags
and dynamically add any flags that are not already present in the
FLAG_CATEGORIES object to a new "Uncategorized" category, ensuring all flags
from the backend are visible and manageable in the UI regardless of whether they
have been explicitly categorized.
| const queryClient = useQueryClient(); | ||
| const user = useAdminStore((state) => state.user); | ||
|
|
||
| const [activeTab, setActiveTab] = useState<TabKey>('flags'); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -nP '\buseState<\s*TabKey\s*>' frontend/src/pages/FlagsPage.tsx
rg -nP '\b(type|interface)\s+TabKey\b' frontend/src/pages/FlagsPage.tsxRepository: StavPonte11/text2sql-onboarding
Length of output: 142
🏁 Script executed:
# Check imports in the file
rg -nP '^import.*TabKey' frontend/src/pages/FlagsPage.tsx
# Also check the top of the file for any imports
head -30 frontend/src/pages/FlagsPage.tsxRepository: StavPonte11/text2sql-onboarding
Length of output: 573
🏁 Script executed:
# Search for TabKey definition in the entire repository
rg -nP '\b(type|interface)\s+TabKey\b' --type ts --type tsxRepository: StavPonte11/text2sql-onboarding
Length of output: 103
🏁 Script executed:
# Check for other tab values used in the file to confirm the union type
rg -nP "(setActiveTab|'(flags|modes)')" frontend/src/pages/FlagsPage.tsx | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 502
🏁 Script executed:
# Search entire repo for TabKey definition with correct syntax
rg '\b(type|interface)\s+TabKey\b' --glob '*.ts' --glob '*.tsx'Repository: StavPonte11/text2sql-onboarding
Length of output: 137
🏁 Script executed:
# Check the full TabKey definition in TableDetails.tsx
rg -A 1 'type TabKey' frontend/src/components/tables/TableDetails.tsx
# Check if TabKey is exported from TableDetails.tsx
rg -B 2 -A 2 'export.*TabKey' frontend/src/components/tables/TableDetails.tsxRepository: StavPonte11/text2sql-onboarding
Length of output: 104
Declare TabKey type in FlagsPage.tsx (unresolved type, build-breaking).
TabKey is referenced at line 284 but never defined in this file, causing TypeScript compilation to fail. Although TabKey exists in TableDetails.tsx, it uses 'overview' as a value—incompatible with FlagsPage's usage of 'flags' and 'modes'.
Add the type definition at the top of the file after imports:
Suggested fix
import { useState, useMemo } from 'react';
@@
import './FlagsPage.css';
+type TabKey = 'flags' | 'modes';
+
// ── Flag category grouping ─────────────────────────────────────────────────🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/src/pages/FlagsPage.tsx` at line 284, The TabKey type referenced in
the useState hook at line 284 of FlagsPage.tsx is not defined in this file,
causing a TypeScript compilation error. Add a type definition for TabKey at the
top of FlagsPage.tsx after the import statements that includes the string
literal union type with the values 'flags' and 'modes', which are the two tab
keys used in the activeTab state in this component.
| const submitMode = () => { | ||
| // Construct overrides object from overridesList | ||
| const overridesObj: Record<string, any> = {}; | ||
| for (const item of overridesList) { | ||
| overridesObj[item.name] = item.value; | ||
| } | ||
|
|
||
| upsertModeMutation.mutate({ | ||
| name: modeDraft.name, | ||
| data: { | ||
| description: modeDraft.description, | ||
| flag_overrides: overridesObj, | ||
| is_active: modeDraft.is_active | ||
| }, | ||
| }); | ||
| }; |
There was a problem hiding this comment.
Validate mode name before submit to prevent avoidable API failures.
submitMode allows empty/whitespace names, which sends invalid requests and yields preventable errors.
Suggested fix
const submitMode = () => {
+ const normalizedName = modeDraft.name.trim();
+ if (!normalizedName) {
+ message.error('Mode name is required');
+ return;
+ }
+
// Construct overrides object from overridesList
const overridesObj: Record<string, any> = {};
@@
upsertModeMutation.mutate({
- name: modeDraft.name,
+ name: normalizedName,
data: {
description: modeDraft.description,
flag_overrides: overridesObj, 🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/src/pages/FlagsPage.tsx` around lines 422 - 437, The submitMode
function lacks validation for the mode name before calling
upsertModeMutation.mutate(), allowing empty or whitespace-only names to be
submitted. Add a validation check before the mutation call to ensure
modeDraft.name is not empty and not just whitespace. If the name is invalid,
either return early or display an error message to prevent unnecessary API calls
and failures.
| min_rows = _f(runtime_flags, "SATISFACTION_MIN_ROWS", settings.SATISFACTION_MIN_ROWS) | ||
| max_rows = _f(runtime_flags, "SATISFACTION_MAX_ROWS", settings.SATISFACTION_MAX_ROWS) | ||
| if n < min_rows: | ||
| failures.append( | ||
| f"[CHECK_B] Result returned {n} rows — below minimum {min_rows}." | ||
| ) | ||
| elif n > max_rows: | ||
| failures.append( | ||
| f"[CHECK_B] Result returned {n} rows — exceeds maximum {max_rows}." | ||
| ) |
There was a problem hiding this comment.
Row-plausibility thresholds are used without type validation.
Line 67-69 consume runtime overrides directly. If overrides arrive as strings (common in admin-config paths), comparisons like n < min_rows can raise TypeError and abort the node.
Suggested fix
- min_rows = _f(runtime_flags, "SATISFACTION_MIN_ROWS", settings.SATISFACTION_MIN_ROWS)
- max_rows = _f(runtime_flags, "SATISFACTION_MAX_ROWS", settings.SATISFACTION_MAX_ROWS)
+ try:
+ min_rows = int(_f(runtime_flags, "SATISFACTION_MIN_ROWS", settings.SATISFACTION_MIN_ROWS))
+ except (TypeError, ValueError):
+ min_rows = int(settings.SATISFACTION_MIN_ROWS)
+ try:
+ max_rows = int(_f(runtime_flags, "SATISFACTION_MAX_ROWS", settings.SATISFACTION_MAX_ROWS))
+ except (TypeError, ValueError):
+ max_rows = int(settings.SATISFACTION_MAX_ROWS)
+ if min_rows < 0:
+ min_rows = 0
+ if max_rows < min_rows:
+ max_rows = min_rows🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/nodes/satisfaction_check.py` around lines 67 - 76, The
min_rows and max_rows variables obtained from runtime flags may arrive as
strings, causing TypeError when compared with the integer n in the satisfaction
check comparisons. Convert the results of the _f() calls for
SATISFACTION_MIN_ROWS and SATISFACTION_MAX_ROWS to integers to ensure type
compatibility before they are used in the conditional expressions with n.
| profile_fetch_concurrency = int(runtime_flags.get("PROFILE_FETCH_CONCURRENCY", settings.PROFILE_FETCH_CONCURRENCY)) | ||
| max_profiles_to_fetch = int(runtime_flags.get("MAX_PROFILES_TO_FETCH", settings.MAX_PROFILES_TO_FETCH)) | ||
| schema_semantic_typing = bool(runtime_flags.get("SCHEMA_SEMANTIC_TYPING", settings.ENABLE_SEMANTIC_TYPING)) | ||
| schema_join_graph = bool(runtime_flags.get("SCHEMA_JOIN_GRAPH", settings.ENABLE_JOIN_GRAPH)) | ||
| schema_summarization = bool(runtime_flags.get("SCHEMA_SUMMARIZATION", settings.ENABLE_SCHEMA_SUMMARIZATION)) | ||
| schema_ambiguity_detect = bool(runtime_flags.get("SCHEMA_AMBIGUITY_DETECT", settings.ENABLE_AMBIGUITY_DETECT)) | ||
| scoping_mode_flag = runtime_flags.get("TABLE_SCOPING_MODE", settings.TABLE_SCOPING_MODE) | ||
|
|
||
| # Per-invocation LLM (supports model switching via execution mode) | ||
| _llm = get_llm("schema_explorer", runtime_flags=runtime_flags) | ||
|
|
||
| # ── G2-01: Resolve scoping mode (state > runtime_flag > env default) ───────── | ||
| scoping_mode: str = state.get("scoping_mode") or scoping_mode_flag | ||
|
|
||
| # ── G2-05: Cache hit/miss counters (pushed to Langfuse at end) ──────────── | ||
| cache_hit_count = 0 | ||
| cache_miss_count = 0 | ||
|
|
||
| # 1. Automatically run hybrid search to find candidates | ||
| emb = get_query_embedding(user_query) | ||
| with Session(engine) as session: | ||
| candidate_tables = hybrid_search_tables(user_query, emb, session, allowed_tables, allowed_statuses) | ||
|
|
||
| candidate_tables = hybrid_search_tables( | ||
| user_query, emb, session, allowed_tables, allowed_statuses, scoping_mode | ||
| ) | ||
|
|
||
| tables_info = [] | ||
| profile_details = [] | ||
|
|
||
| # 2. Automatically get profiles for the top candidate tables (up to 4) to seed the prompt | ||
| for i, t in enumerate(candidate_tables): | ||
| tables_info.append({ | ||
| "id": t.id, | ||
| "name": f"{t.catalog}.{t.schema_name}.{t.name}", | ||
| "description": "" | ||
| }) | ||
|
|
||
| # Fetch profile for the top tables based on MAX_PROFILES_TO_FETCH | ||
| if i < settings.MAX_PROFILES_TO_FETCH: | ||
|
|
||
| # 2. Get profiles for top candidate tables (G2-05 cache-aware) | ||
| import asyncio | ||
|
|
||
| sem = asyncio.Semaphore(profile_fetch_concurrency) | ||
|
|
There was a problem hiding this comment.
Unvalidated numeric runtime flags can crash or hang schema exploration.
Line 323/324 use direct int(...), and Line 354 uses the result in Semaphore(...). A bad override (e.g., "abc", 0, negative) can either throw ValueError or block all profile tasks indefinitely.
Suggested fix
+def _safe_int_flag(flags: dict[str, Any], key: str, default: int, *, min_value: int = 1) -> int:
+ try:
+ value = int(flags.get(key, default))
+ except (TypeError, ValueError):
+ value = default
+ return max(min_value, value)
- profile_fetch_concurrency = int(runtime_flags.get("PROFILE_FETCH_CONCURRENCY", settings.PROFILE_FETCH_CONCURRENCY))
- max_profiles_to_fetch = int(runtime_flags.get("MAX_PROFILES_TO_FETCH", settings.MAX_PROFILES_TO_FETCH))
+ profile_fetch_concurrency = _safe_int_flag(
+ runtime_flags, "PROFILE_FETCH_CONCURRENCY", settings.PROFILE_FETCH_CONCURRENCY, min_value=1
+ )
+ max_profiles_to_fetch = _safe_int_flag(
+ runtime_flags, "MAX_PROFILES_TO_FETCH", settings.MAX_PROFILES_TO_FETCH, min_value=1
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| profile_fetch_concurrency = int(runtime_flags.get("PROFILE_FETCH_CONCURRENCY", settings.PROFILE_FETCH_CONCURRENCY)) | |
| max_profiles_to_fetch = int(runtime_flags.get("MAX_PROFILES_TO_FETCH", settings.MAX_PROFILES_TO_FETCH)) | |
| schema_semantic_typing = bool(runtime_flags.get("SCHEMA_SEMANTIC_TYPING", settings.ENABLE_SEMANTIC_TYPING)) | |
| schema_join_graph = bool(runtime_flags.get("SCHEMA_JOIN_GRAPH", settings.ENABLE_JOIN_GRAPH)) | |
| schema_summarization = bool(runtime_flags.get("SCHEMA_SUMMARIZATION", settings.ENABLE_SCHEMA_SUMMARIZATION)) | |
| schema_ambiguity_detect = bool(runtime_flags.get("SCHEMA_AMBIGUITY_DETECT", settings.ENABLE_AMBIGUITY_DETECT)) | |
| scoping_mode_flag = runtime_flags.get("TABLE_SCOPING_MODE", settings.TABLE_SCOPING_MODE) | |
| # Per-invocation LLM (supports model switching via execution mode) | |
| _llm = get_llm("schema_explorer", runtime_flags=runtime_flags) | |
| # ── G2-01: Resolve scoping mode (state > runtime_flag > env default) ───────── | |
| scoping_mode: str = state.get("scoping_mode") or scoping_mode_flag | |
| # ── G2-05: Cache hit/miss counters (pushed to Langfuse at end) ──────────── | |
| cache_hit_count = 0 | |
| cache_miss_count = 0 | |
| # 1. Automatically run hybrid search to find candidates | |
| emb = get_query_embedding(user_query) | |
| with Session(engine) as session: | |
| candidate_tables = hybrid_search_tables(user_query, emb, session, allowed_tables, allowed_statuses) | |
| candidate_tables = hybrid_search_tables( | |
| user_query, emb, session, allowed_tables, allowed_statuses, scoping_mode | |
| ) | |
| tables_info = [] | |
| profile_details = [] | |
| # 2. Automatically get profiles for the top candidate tables (up to 4) to seed the prompt | |
| for i, t in enumerate(candidate_tables): | |
| tables_info.append({ | |
| "id": t.id, | |
| "name": f"{t.catalog}.{t.schema_name}.{t.name}", | |
| "description": "" | |
| }) | |
| # Fetch profile for the top tables based on MAX_PROFILES_TO_FETCH | |
| if i < settings.MAX_PROFILES_TO_FETCH: | |
| # 2. Get profiles for top candidate tables (G2-05 cache-aware) | |
| import asyncio | |
| sem = asyncio.Semaphore(profile_fetch_concurrency) | |
| def _safe_int_flag(flags: dict[str, Any], key: str, default: int, *, min_value: int = 1) -> int: | |
| try: | |
| value = int(flags.get(key, default)) | |
| except (TypeError, ValueError): | |
| value = default | |
| return max(min_value, value) | |
| profile_fetch_concurrency = _safe_int_flag( | |
| runtime_flags, "PROFILE_FETCH_CONCURRENCY", settings.PROFILE_FETCH_CONCURRENCY, min_value=1 | |
| ) | |
| max_profiles_to_fetch = _safe_int_flag( | |
| runtime_flags, "MAX_PROFILES_TO_FETCH", settings.MAX_PROFILES_TO_FETCH, min_value=1 | |
| ) | |
| schema_semantic_typing = bool(runtime_flags.get("SCHEMA_SEMANTIC_TYPING", settings.ENABLE_SEMANTIC_TYPING)) | |
| schema_join_graph = bool(runtime_flags.get("SCHEMA_JOIN_GRAPH", settings.ENABLE_JOIN_GRAPH)) | |
| schema_summarization = bool(runtime_flags.get("SCHEMA_SUMMARIZATION", settings.ENABLE_SCHEMA_SUMMARIZATION)) | |
| schema_ambiguity_detect = bool(runtime_flags.get("SCHEMA_AMBIGUITY_DETECT", settings.ENABLE_AMBIGUITY_DETECT)) | |
| scoping_mode_flag = runtime_flags.get("TABLE_SCOPING_MODE", settings.TABLE_SCOPING_MODE) | |
| # Per-invocation LLM (supports model switching via execution mode) | |
| _llm = get_llm("schema_explorer", runtime_flags=runtime_flags) | |
| # ── G2-01: Resolve scoping mode (state > runtime_flag > env default) ───────── | |
| scoping_mode: str = state.get("scoping_mode") or scoping_mode_flag | |
| # ── G2-05: Cache hit/miss counters (pushed to Langfuse at end) ──────────── | |
| cache_hit_count = 0 | |
| cache_miss_count = 0 | |
| # 1. Automatically run hybrid search to find candidates | |
| emb = get_query_embedding(user_query) | |
| with Session(engine) as session: | |
| candidate_tables = hybrid_search_tables( | |
| user_query, emb, session, allowed_tables, allowed_statuses, scoping_mode | |
| ) | |
| tables_info = [] | |
| profile_details = [] | |
| # 2. Get profiles for top candidate tables (G2-05 cache-aware) | |
| import asyncio | |
| sem = asyncio.Semaphore(profile_fetch_concurrency) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/nodes/schema_explorer.py` around lines 323 - 355, The numeric
runtime flags profile_fetch_concurrency and max_profiles_to_fetch are converted
directly with int() without validation. Add validation after the int conversion
to ensure profile_fetch_concurrency is a positive integer (since it is passed to
asyncio.Semaphore which requires a positive value) and max_profiles_to_fetch is
non-negative. Wrap the int() conversions in try-except blocks to handle invalid
values that cannot be converted to integers, falling back to the default
settings values in such cases. This prevents ValueError crashes from invalid
inputs and ensures Semaphore receives a valid positive concurrency value.
| structured = llm.with_structured_output(AmbiguityOutput, method="json_schema") | ||
| result: AmbiguityOutput = await structured.ainvoke(prompt) | ||
| return result.ambiguity_notes | ||
| except Exception as exc: |
There was a problem hiding this comment.
Ambiguity detection fails open for dict structured-output responses.
Line 365-366 assumes result is a model with .ambiguity_notes. Other phases in this module already handle dict-or-model results; this one will hit AttributeError and silently return [], disabling the ambiguity gate for those providers.
Suggested fix
- result: AmbiguityOutput = await structured.ainvoke(prompt)
- return result.ambiguity_notes
+ result = await structured.ainvoke(prompt)
+ if isinstance(result, dict):
+ return result.get("ambiguity_notes", []) or []
+ return getattr(result, "ambiguity_notes", []) or []🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/utils/schema_enrichment.py` around lines 364 - 367, The code
at the point where result is obtained from structured.ainvoke(prompt) assumes
result is always a model object with an ambiguity_notes attribute, but some LLM
providers return a dictionary instead, causing an AttributeError that silently
fails and returns an empty list. Modify the code after the ainvoke call to check
if result is a dictionary and extract ambiguity_notes using bracket notation
(result["ambiguity_notes"]), or if it's a model object, use dot notation
(result.ambiguity_notes). This pattern should match how other phases in this
module handle dict-or-model results from structured outputs.
| from core.llm import ConfigurationError, evaluate_with_llm | ||
|
|
||
| # In a real startup script this would be caught | ||
| # For the test, we mock evaluate_with_llm to fail | ||
| async def mock_evaluate(*args, **kwargs): | ||
| return {"score": None, "error": "judge_unavailable"} | ||
|
|
||
| with patch("core.llm.evaluate_with_llm", side_effect=mock_evaluate): | ||
| result = await evaluate_with_llm("test query", "SELECT 1") | ||
| assert result["score"] is None | ||
| assert result["error"] == "judge_unavailable" | ||
| except ImportError: | ||
| # if core.llm doesn't exist, we skip or mock the specific node that uses it | ||
| pass |
There was a problem hiding this comment.
This test can pass while not testing the mocked path.
Line 39 binds evaluate_with_llm locally before patching, so Line 47 may call the original function. Also, Line 50-52 swallows ImportError, which can hide real regressions.
Suggested fix
- try:
- from core.llm import ConfigurationError, evaluate_with_llm
+ try:
+ import core.llm as llm_module
@@
- with patch("core.llm.evaluate_with_llm", side_effect=mock_evaluate):
- result = await evaluate_with_llm("test query", "SELECT 1")
+ with patch.object(llm_module, "evaluate_with_llm", side_effect=mock_evaluate):
+ result = await llm_module.evaluate_with_llm("test query", "SELECT 1")
assert result["score"] is None
assert result["error"] == "judge_unavailable"
except ImportError:
- # if core.llm doesn't exist, we skip or mock the specific node that uses it
- pass
+ pytest.skip("core.llm is unavailable in this test environment")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/tests/test_isolation.py` around lines 39 - 52, The test has two issues:
First, the import of evaluate_with_llm at the beginning of the try block creates
a local binding before the patch is applied, so when the function is called on
line 47, it may still reference the original function instead of the mocked
version. Move the import statement to after the patch decorator is applied, or
import the module and access evaluate_with_llm through it (like
core.llm.evaluate_with_llm) so the patch is effective. Second, the broad
ImportError exception handler at the end of the block silently catches and
ignores import errors, which can hide real test failures. Remove this exception
handler entirely or replace it with more specific error handling that only
catches genuine missing module errors during initial setup, not failures during
test execution.
|
|
||
| // ── Flag category grouping ───────────────────────────────────────────────── | ||
|
|
||
| const FLAG_CATEGORIES: Record<string, string[]> = { |
There was a problem hiding this comment.
this should not be set here
either:
create is in the backend and export it to the frontend via the type gen
or
create this in a dedicated file that will also provide types if needed
|
|
||
| // ── Inline Flag Editor ───────────────────────────────────────────────────── | ||
|
|
||
| function FlagEditor({ |
There was a problem hiding this comment.
maybe extract to a different file - this file has way too many components together and it has become disorganized
| const user = useAdminStore.getState().user; | ||
| if (!user?.email) throw new Error('Not authenticated'); | ||
|
|
||
| const headers = new Headers(options.headers || {}); | ||
| headers.set('X-Admin-Email', user.email); | ||
| headers.set('Content-Type', 'application/json'); |
There was a problem hiding this comment.
will be obsolete when sso is used
| headers.set('X-Admin-Email', user.email); | ||
| headers.set('Content-Type', 'application/json'); | ||
|
|
||
| const response = await fetch(`${API_BASE_URL}${url}`, { ...options, headers }); |
There was a problem hiding this comment.
please use the existing api client
| export interface FeatureFlag { | ||
| name: string; | ||
| value: unknown; | ||
| type: FlagType; | ||
| description: string; | ||
| owner: string; | ||
| last_modified_by: string; | ||
| last_modified_at: string; | ||
| } | ||
|
|
||
| export interface ExecutionMode { | ||
| name: string; | ||
| description: string; | ||
| flag_overrides: Record<string, unknown>; | ||
| is_active: boolean; | ||
| created_by: string; | ||
| created_at: string; | ||
| updated_at: string; | ||
| } | ||
|
|
There was a problem hiding this comment.
those should be exposed by the backend and not set here
| owner: str = Field(default="") | ||
| last_modified_by: str = Field(default="") | ||
| last_modified_at: datetime = Field(default_factory=datetime.utcnow) | ||
|
|
| FLAG_CACHE_TTL = 30 # seconds — per TTS-G4-01 AC1 | ||
| MODE_CACHE_TTL = 30 # seconds |
| def _get_redis(redis_url: str) -> aioredis.Redis: | ||
| global _REDIS | ||
| if _REDIS is None: | ||
| _REDIS = aioredis.from_url( | ||
| redis_url, | ||
| decode_responses=True, | ||
| socket_connect_timeout=2, | ||
| socket_timeout=2, | ||
| ) | ||
| return _REDIS |
There was a problem hiding this comment.
use the python core utils package
| if flag_type == "json": | ||
| return isinstance(value, (dict, list)) | ||
| return False | ||
|
|
There was a problem hiding this comment.
might be cleaner:
def validate_flag_type(value: Any, flag_type: str) -> bool:
"""Return True if *value* strictly matches the declared *flag_type*."""
# 1. Handle the exceptions where your flag strings don't match Python's names
if flag_type == "json":
return type(value) in (dict, list)
if flag_type == "string":
return type(value) is str
# 2. For "bool", "int", and "float", the flag matches the type name perfectly!
return type(value).__name__ == flag_type```
|
|
||
| def _try_cache_get(self, key: str) -> dict | None: | ||
| """Synchronous Redis GET (creates a new event loop if needed for sync context).""" | ||
| import asyncio |
There was a problem hiding this comment.
move import to top of file
This PR completes the end-to-end implementation of Group 4: Feature Flags & Execution Modes. It includes the FlagService database models, caching layer, FastAPI router, agent init_flags node, dynamic model switching, and a premium Studio UI flags management page with an interactive overrides configuration builder and administrator attribution.
Summary by CodeRabbit
New Features
Improvements