Fix ts evaluations#21
Conversation
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Plus Run ID: 📒 Files selected for processing (11)
📝 WalkthroughWalkthroughThis PR consolidates three major infrastructure changes: (1) centralizes embedder logic into a core module using OpenAI-compatible API contract (input/Bearer auth) and adds EMBEDDER_KEY config support; (2) replaces SSE MCP transport with Streamable HTTP in both agent and backend; (3) migrates all evaluation phases to call a remote evaluation service instead of using local stubs, introducing Pydantic models and EVALUATION_SERVICE_URL. Additionally, timestamps across 50+ sites are standardized from naive UTC to timezone-aware UTC datetimes. ChangesEmbedder API Contract and MCP Transport Updates
Evaluation Pipeline: Remote Service Refactor
Timezone-Aware UTC Standardization
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@agent/src/agent/config.py`:
- Around line 12-14: The EMBEDDER_URL field on line 12 is currently typed as a
plain str, which allows unsafe URL schemes like ftp or file to be passed
through. Change the EMBEDDER_URL field type from str to Pydantic's AnyHttpUrl
type to enforce that only HTTP and HTTPS schemes are allowed. This will validate
the URL scheme at configuration initialization time and fail fast if an invalid
scheme is provided.
In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 52-53: The embedding returned on line 53 from
resp_data["embedding"] is not validated against the 768-dimensional requirement
documented on line 43. Add a validation check after extracting the embedding to
verify its length matches the expected 768 dimensions. If the dimension check
fails, raise an exception that will be caught by the existing exception fallback
mechanism, rather than returning the embedding directly without validation.
In `@backend/app/config.py`:
- Around line 55-57: The EMBEDDER_URL field in the Settings class lacks
validation for URL schemes, which allows unsafe or unsupported schemes like
ftp://, file://, or data:// to pass through and potentially cause security or
functionality issues. Add a Pydantic field validator (using `@field_validator` or
`@validator` decorator) to the EMBEDDER_URL field that parses the URL and
validates that the scheme is either http or https, raising a clear validation
error if any other scheme is provided. This will ensure the validation happens
at application startup and prevents invalid URLs from being used downstream.
In `@backend/app/routers/agent.py`:
- Line 16: Update the import statement from the deprecated streamablehttp_client
to the current streamable_http_client function. In the import statement on line
16 that imports from mcp.client.streamable_http, replace the function name
streamablehttp_client (no underscore) with streamable_http_client (with
underscores). This ensures compatibility with the current version of the mcp SDK
as of June 2026.
In `@backend/app/routers/evaluation.py`:
- Around line 156-165: Add a null check for the `table` variable immediately
after the `session.get(Table, table_id)` call on line 156. Before accessing
`table.name` in the `EvalAPIRequest` instantiation or passing `table` to
`_build_questions_payload`, verify that `table` is not `None`. If `table` is
`None`, raise an appropriate exception (such as a 404 HTTPException) to handle
the case where the table was deleted.
- Line 189: The deprecated datetime.utcnow() method needs to be replaced with
the modern equivalent datetime.now(timezone.utc) which is the recommended
approach since Python 3.12. Update the import statement to include timezone from
the datetime module, then locate all occurrences of datetime.utcnow() throughout
the file (particularly in assignments like run.completed_at and any other
similar datetime assignments) and replace each call with
datetime.now(timezone.utc) to ensure compatibility with current Python
standards.
In `@backend/app/routers/tables.py`:
- Around line 134-145: The embedder HTTP request logic is duplicated across two
locations (around lines 134-145 and lines 253-264), with both blocks handling
header setup with EMBEDDER_KEY authorization, making a POST request to
EMBEDDER_URL with model and input parameters, and parsing the JSON response to
extract embedding data. Extract this logic into a single shared helper function
that accepts the text to embed as a parameter and returns the embedding result,
then replace both duplicated blocks with calls to this helper function to
prevent contract drift when the embedder API changes.
- Around line 143-145: The status code check on the embed_resp object (if
embed_resp.status_code == 200) silently skips embedding extraction for error
responses without logging or raising an exception, making it difficult to detect
API failures. Replace the conditional status code check with a call to
embed_resp.raise_for_status() before accessing the JSON data. This will
explicitly raise an HTTPError for non-2xx status codes (401, 429, 500, etc.)
instead of silently continuing. Apply this same fix at both locations mentioned
(around lines 143-145 and 262-270) by removing the if statement and calling
raise_for_status() on the embed_resp object directly, then proceeding to extract
the embedding data unconditionally.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 8348f866-c88e-48c9-8146-eb467eb1356f
📒 Files selected for processing (10)
.gitignoreagent/src/agent/config.pyagent/src/agent/main.pyagent/src/agent/nodes/schema_explorer.pybackend/app/config.pybackend/app/routers/agent.pybackend/app/routers/evaluation.pybackend/app/routers/tables.pybackend/app/seed.pybackend/app/services/langfuse_client.py
| EMBEDDER_URL: str = "http://localhost:11434/v1/embeddings" | ||
| EMBEDDER_MODEL: str = "nomic-embed-text:latest" | ||
| EMBEDDER_KEY: str = "" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n agent/src/agent/config.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 1600
🏁 Script executed:
rg -A 5 -B 5 "EMBEDDER_URL" --type pyRepository: StavPonte11/text2sql-onboarding
Length of output: 4453
🏁 Script executed:
rg "from pydantic" agent/src/agent/config.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 165
Constrain EMBEDDER_URL to HTTP(S) at settings load time.
Line 12 keeps EMBEDDER_URL as an unconstrained str, allowing unsafe URL schemes (ftp, file, etc.) to be passed to embedder callers. The value is directly consumed by urllib.request.Request and httpx.post without scheme validation. Use Pydantic's AnyHttpUrl type to enforce HTTP/HTTPS schemes at configuration initialization and fail fast on misconfiguration.
Suggested hardening
-from pydantic import Field
+from pydantic import AnyHttpUrl, Field
class AgentSettings(BaseSettings):
- EMBEDDER_URL: str = "http://localhost:11434/v1/embeddings"
+ EMBEDDER_URL: AnyHttpUrl = "http://localhost:11434/v1/embeddings"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| EMBEDDER_URL: str = "http://localhost:11434/v1/embeddings" | |
| EMBEDDER_MODEL: str = "nomic-embed-text:latest" | |
| EMBEDDER_KEY: str = "" | |
| EMBEDDER_URL: AnyHttpUrl = "http://localhost:11434/v1/embeddings" | |
| EMBEDDER_MODEL: str = "nomic-embed-text:latest" | |
| EMBEDDER_KEY: str = "" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agent/src/agent/config.py` around lines 12 - 14, The EMBEDDER_URL field on
line 12 is currently typed as a plain str, which allows unsafe URL schemes like
ftp or file to be passed through. Change the EMBEDDER_URL field type from str to
Pydantic's AnyHttpUrl type to enforce that only HTTP and HTTPS schemes are
allowed. This will validate the URL scheme at configuration initialization time
and fail fast if an invalid scheme is provided.
Source: Linters/SAST tools
| EMBEDDER_URL: str = "http://host.docker.internal:11434/v1/embeddings" | ||
| EMBEDDER_MODEL: str = "nomic-embed-text:latest" | ||
| EMBEDDER_KEY: str = "" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n backend/app/config.py | head -100Repository: StavPonte11/text2sql-onboarding
Length of output: 2645
🏁 Script executed:
cat -n backend/app/config.py | tail -20Repository: StavPonte11/text2sql-onboarding
Length of output: 902
🏁 Script executed:
wc -l backend/app/config.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 100
🏁 Script executed:
rg "EMBEDDER_URL" -A 3 -B 3Repository: StavPonte11/text2sql-onboarding
Length of output: 3384
🏁 Script executed:
cat -n backend/app/config.py && echo "---" && cat -n agent/src/agent/config.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 4174
🏁 Script executed:
rg "field_validator|Field\(" backend/app/config.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
rg "field_validator|`@validator`" backend/app/ && echo "---FOUND---" || echo "---NO VALIDATORS FOUND---"Repository: StavPonte11/text2sql-onboarding
Length of output: 101
🏁 Script executed:
cat -n backend/app/seed.py | grep -A 10 "def get_embedding"Repository: StavPonte11/text2sql-onboarding
Length of output: 711
🏁 Script executed:
rg "http|https|scheme" backend/app/ -i | grep -i "valid\|check\|assert" | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
rg "urllib.*parse|urlparse|HttpUrl|AnyUrl" backend/app/Repository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
rg "from pydantic" backend/app/config.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 138
🏁 Script executed:
rg "HttpUrl|AnyUrl|field_validator" --type pythonRepository: StavPonte11/text2sql-onboarding
Length of output: 106
🏁 Script executed:
rg "HttpUrl|AnyUrl|field_validator" -t pyRepository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
git log --oneline --all -S "EMBEDDER_URL" | head -5Repository: StavPonte11/text2sql-onboarding
Length of output: 202
🏁 Script executed:
cat -n backend/app/routers/tables.py | grep -A 5 -B 5 "EMBEDDER_URL"Repository: StavPonte11/text2sql-onboarding
Length of output: 1010
🏁 Script executed:
git log --all --oneline | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 1031
🏁 Script executed:
git log -p --all -S "field_validator" --reverse | head -50Repository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
git show 5e7aea0 --statRepository: StavPonte11/text2sql-onboarding
Length of output: 456
🏁 Script executed:
git show 462eb08Repository: StavPonte11/text2sql-onboarding
Length of output: 958
🏁 Script executed:
rg "http" backend/app/config.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 450
🏁 Script executed:
rg "startswith.*http\|http.*startswith\|scheme" backend/app/ agent/src/Repository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
rg "ValueError|raise.*Error" backend/app/config.py agent/src/agent/config.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
rg "model_validate\|field_validator\|validator" -t py | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 57
Add HTTP(S) scheme validation to EMBEDDER_URL in the Settings class.
The EMBEDDER_URL field at line 55 is a plain str with no validation. Invalid or unsafe URL schemes (e.g., ftp://, file://, data://) can be accepted at startup and passed directly to HTTP clients (httpx.post, urllib.request) without safeguards. Add a field validator to reject non-HTTP(S) values and fail fast at application startup.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/config.py` around lines 55 - 57, The EMBEDDER_URL field in the
Settings class lacks validation for URL schemes, which allows unsafe or
unsupported schemes like ftp://, file://, or data:// to pass through and
potentially cause security or functionality issues. Add a Pydantic field
validator (using `@field_validator` or `@validator` decorator) to the EMBEDDER_URL
field that parses the URL and validates that the scheme is either http or https,
raising a clear validation error if any other scheme is provided. This will
ensure the validation happens at application startup and prevents invalid URLs
from being used downstream.
| from fastapi import APIRouter, HTTPException | ||
| from mcp.client.session import ClientSession | ||
| from mcp.client.sse import sse_client | ||
| from mcp.client.streamable_http import streamablehttp_client |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the mcp version declared in backend dependencies
fd -i "requirements" --exec cat {} \; 2>/dev/null
fd -i "pyproject.toml" --exec cat {} \; 2>/dev/nullRepository: StavPonte11/text2sql-onboarding
Length of output: 4198
🌐 Web query:
What version of the mcp Python package introduced streamablehttp_client?
💡 Result:
The streamablehttp_client function was an early API provided by the mcp Python SDK for handling StreamableHTTP transport connections [1][2]. It was later superseded by the streamable_http_client function (note the added underscore), which improved control over the httpx.AsyncClient lifecycle [1][3]. The newer streamable_http_client was introduced in version 1.24.0 of the mcp Python SDK [3]. Following this introduction, the original streamablehttp_client was marked as deprecated and eventually removed from the SDK in subsequent releases [4][3].
Citations:
- 1: https://github.com/modelcontextprotocol/python-sdk/blob/e8e64842/docs/migration.md
- 2: jlowin/mcp-python-sdk@a3a4b8d
- 3: fix(python): replace deprecated streamablehttp_client with streamable_http_client mcp-use/mcp-use#1017
- 4: modelcontextprotocol/python-sdk@8893b02
🌐 Web query:
mcp Python SDK streamablehttp_client availability version history release notes
💡 Result:
The function streamablehttp_client (without the underscore) has been deprecated and subsequently removed from the Model Context Protocol (MCP) Python SDK [1][2]. Availability and Version History: - The function was replaced by streamable_http_client (with an underscore) to accommodate changes in how the SDK handles httpx.AsyncClient instances [3]. - streamablehttp_client was marked as deprecated while streamable_http_client was introduced as the recommended replacement [1][2]. - The removal of the deprecated streamablehttp_client was finalized in early 2026 [1]. Current Status: - As of June 2026, streamable_http_client is the standard, active function used for streamable HTTP transport in the SDK [4][5]. - Developers should update their code to import streamable_http_client from mcp.client.streamable_http and pass an configured httpx.AsyncClient instead of using the older function signatures [2][3]. For further implementation details, the official migration documentation provides comprehensive guides on transitioning from the legacy streamablehttp_client to the current streamable_http_client API [2].
Citations:
- 1: modelcontextprotocol/python-sdk@8893b02
- 2: https://github.com/modelcontextprotocol/python-sdk/blob/7ba4fb881d85406f44a5af8169fb7200fa7c8e49/docs/migration.md
- 3: Add
streamable_http_clientwhich acceptshttpx.AsyncClientinstead ofhttpx_client_factorymodelcontextprotocol/python-sdk#1177 - 4: feat: add remote streamable-http client example modelcontextprotocol/python-sdk#2807
- 5: https://github.com/modelcontextprotocol/python-sdk/blob/main/src/mcp/client/streamable_http.py
🏁 Script executed:
# Search for any documentation or changelog that might specify when streamablehttp_client was available
cd backend && git log --oneline --all --grep="streamable" 2>/dev/null | head -20 || echo "Git not available"Repository: StavPonte11/text2sql-onboarding
Length of output: 57
Use streamable_http_client instead of the deprecated streamablehttp_client.
The function streamablehttp_client was deprecated and removed from the mcp SDK in early 2026. As of June 2026, the correct import is streamable_http_client (with underscore). Update the import to:
from mcp.client.streamable_http import streamable_http_client🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/routers/agent.py` at line 16, Update the import statement from
the deprecated streamablehttp_client to the current streamable_http_client
function. In the import statement on line 16 that imports from
mcp.client.streamable_http, replace the function name streamablehttp_client (no
underscore) with streamable_http_client (with underscores). This ensures
compatibility with the current version of the mcp SDK as of June 2026.
| table = session.get(Table, table_id) | ||
| dataset_name = f"text2sql_sandbox_{table_id}" | ||
|
|
||
| question_scores_contains: list[float] = [ | ||
| float(random.choice([0, 1])) for _ in questions | ||
| ] | ||
| question_scores_exact: list[float] = [ | ||
| float(random.choice([0, 1])) for _ in questions | ||
| ] | ||
| question_scores_ranking: list[float] = [ | ||
| float(random.choice([0, 1])) for _ in questions | ||
| ] | ||
| logger.info( | ||
| f"[Eval] Scored {len(questions)} questions for table {table_id} (local stubs)" | ||
| ) | ||
| if langfuse_client.enabled: | ||
| langfuse_client.ensure_dataset_synced( | ||
| dataset_name, _build_questions_payload(questions, table) | ||
| ) | ||
|
|
||
| avg_score_contains = round( | ||
| sum(question_scores_contains) / len(question_scores_contains), 3 | ||
| ) | ||
| pass_count_contains = sum(1 for s in question_scores_contains if s >= 0.50) | ||
| pass_rate_contains = round(pass_count_contains / len(question_scores_contains), 3) | ||
| avg_score_exact = round(sum(question_scores_exact) / len(question_scores_exact), 3) | ||
| pass_count_exact = sum(1 for s in question_scores_exact if s >= 0.50) | ||
| pass_rate_exact = round(pass_count_exact / len(question_scores_exact), 3) | ||
| avg_score_ranking = round( | ||
| sum(question_scores_ranking) / len(question_scores_ranking), 3 | ||
| ) | ||
| pass_count_ranking = sum(1 for s in question_scores_ranking if s >= 0.50) | ||
| pass_rate_ranking = round(pass_count_ranking / len(question_scores_ranking), 3) | ||
| try: | ||
| req = EvalAPIRequest(tables_names=[table.name], dataset_name=dataset_name) |
There was a problem hiding this comment.
Missing null check for table before accessing its attributes.
session.get(Table, table_id) can return None if the table was deleted. However, table.name (line 165) and table.schema_name via _build_questions_payload (line 161) are accessed without a null check, which would raise AttributeError.
The null check on line 208 (if table and table.status == ...) comes too late.
🛡️ Proposed fix
table = session.get(Table, table_id)
+ if not table:
+ logger.error(f"[Eval] Table {table_id} not found")
+ run.status = EvalStatus.failed
+ run.score = -1.0
+ session.add(run)
+ session.commit()
+ return -1.0
+
dataset_name = f"text2sql_sandbox_{table_id}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| table = session.get(Table, table_id) | |
| dataset_name = f"text2sql_sandbox_{table_id}" | |
| question_scores_contains: list[float] = [ | |
| float(random.choice([0, 1])) for _ in questions | |
| ] | |
| question_scores_exact: list[float] = [ | |
| float(random.choice([0, 1])) for _ in questions | |
| ] | |
| question_scores_ranking: list[float] = [ | |
| float(random.choice([0, 1])) for _ in questions | |
| ] | |
| logger.info( | |
| f"[Eval] Scored {len(questions)} questions for table {table_id} (local stubs)" | |
| ) | |
| if langfuse_client.enabled: | |
| langfuse_client.ensure_dataset_synced( | |
| dataset_name, _build_questions_payload(questions, table) | |
| ) | |
| avg_score_contains = round( | |
| sum(question_scores_contains) / len(question_scores_contains), 3 | |
| ) | |
| pass_count_contains = sum(1 for s in question_scores_contains if s >= 0.50) | |
| pass_rate_contains = round(pass_count_contains / len(question_scores_contains), 3) | |
| avg_score_exact = round(sum(question_scores_exact) / len(question_scores_exact), 3) | |
| pass_count_exact = sum(1 for s in question_scores_exact if s >= 0.50) | |
| pass_rate_exact = round(pass_count_exact / len(question_scores_exact), 3) | |
| avg_score_ranking = round( | |
| sum(question_scores_ranking) / len(question_scores_ranking), 3 | |
| ) | |
| pass_count_ranking = sum(1 for s in question_scores_ranking if s >= 0.50) | |
| pass_rate_ranking = round(pass_count_ranking / len(question_scores_ranking), 3) | |
| try: | |
| req = EvalAPIRequest(tables_names=[table.name], dataset_name=dataset_name) | |
| table = session.get(Table, table_id) | |
| if not table: | |
| logger.error(f"[Eval] Table {table_id} not found") | |
| run.status = EvalStatus.failed | |
| run.score = -1.0 | |
| session.add(run) | |
| session.commit() | |
| return -1.0 | |
| dataset_name = f"text2sql_sandbox_{table_id}" | |
| if langfuse_client.enabled: | |
| langfuse_client.ensure_dataset_synced( | |
| dataset_name, _build_questions_payload(questions, table) | |
| ) | |
| try: | |
| req = EvalAPIRequest(tables_names=[table.name], dataset_name=dataset_name) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/app/routers/evaluation.py` around lines 156 - 165, Add a null check
for the `table` variable immediately after the `session.get(Table, table_id)`
call on line 156. Before accessing `table.name` in the `EvalAPIRequest`
instantiation or passing `table` to `_build_questions_payload`, verify that
`table` is not `None`. If `table` is `None`, raise an appropriate exception
(such as a 404 HTTPException) to handle the case where the table was deleted.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/routers/evaluation.py (1)
443-493: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winConsider extracting common evaluation API call and persistence logic.
The pattern of making the evaluation API call, handling errors, populating
EvalRunmetrics, and persistingEvalResultrows is repeated nearly identically acrossexecute_single_table_eval,_run_production_dataset_eval,_run_candidate_eval, and_run_regression_eval.Extracting this into a helper function would reduce duplication and make future changes to the API contract easier to maintain:
def _call_eval_service_and_persist( session: Session, run: EvalRun, table_names: list[str], dataset_name: str, log_prefix: str, ) -> tuple[EvalAPIOverallMetrics | None, float]: """Returns (metrics, score) or (None, -1.0) on failure.""" # ... shared API call + persistence logic🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/routers/evaluation.py` around lines 443 - 493, The evaluation API call and persistence logic is duplicated across the functions execute_single_table_eval, _run_production_dataset_eval, _run_candidate_eval, and _run_regression_eval. Extract this common pattern into a helper function that accepts the session, EvalRun object, table_names, dataset_name, and a log_prefix parameter. The helper should encapsulate the entire flow: making the POST request to the evaluation service, handling exceptions and logging errors, parsing the EvalAPIResponse, updating the EvalRun metrics (score, pass_rate, fail_rate, total_questions, status, completed_at, dimension_averages), creating and persisting EvalResult objects from eval_resp.results, and returning a tuple of (metrics, score) or (None, -1.0) on failure. Then replace the duplicated blocks in each function with calls to this helper, passing the appropriate log_prefix for context-specific logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/src/core/embeddings.py`:
- Line 23: The response parsing on the line containing json.loads is using the
variable name `data` instead of the string literal "data" when indexing into the
parsed JSON response. Change the index from the variable `data` to the string
key "data" in the json.loads result indexing to correctly extract the embedding
payload from the API response.
- Around line 20-22: Before creating the urllib.request.Request object with the
embedder_url parameter on line 20, add validation to ensure the URL scheme is
restricted to only HTTP or HTTPS. Parse the embedder_url using
urllib.parse.urlparse to extract the scheme, verify it is either 'http' or
'https', and raise an appropriate exception (such as ValueError) if any other
scheme is provided. This validation must occur before the urllib.request.Request
call to prevent potential local resource access through file:// or other custom
URL schemes.
In `@core/src/core/models/models.py`:
- Around line 32-33: The created_at and updated_at fields in the models are
generating timezone-aware UTC datetime objects, but the database migrations are
creating TIMESTAMP (naive) columns instead of TIMESTAMPTZ columns. To fix this,
update the initial migration file (01333a305f59_initial_schema.py) to use
sa.DateTime(timezone=True) instead of sa.DateTime() for all created_at and
updated_at column definitions. If this migration has already been deployed to
production, create a separate ALTER migration that converts existing TIMESTAMP
columns to TIMESTAMP WITH TIME ZONE. Additionally, verify that all subsequent
migrations (such as 781457ead87d_add_httpextractor.py and
d35412e44b34_add_foreignkeymapping.py) also use sa.DateTime(timezone=True) for
any new datetime columns to maintain consistency going forward.
---
Outside diff comments:
In `@backend/app/routers/evaluation.py`:
- Around line 443-493: The evaluation API call and persistence logic is
duplicated across the functions execute_single_table_eval,
_run_production_dataset_eval, _run_candidate_eval, and _run_regression_eval.
Extract this common pattern into a helper function that accepts the session,
EvalRun object, table_names, dataset_name, and a log_prefix parameter. The
helper should encapsulate the entire flow: making the POST request to the
evaluation service, handling exceptions and logging errors, parsing the
EvalAPIResponse, updating the EvalRun metrics (score, pass_rate, fail_rate,
total_questions, status, completed_at, dimension_averages), creating and
persisting EvalResult objects from eval_resp.results, and returning a tuple of
(metrics, score) or (None, -1.0) on failure. Then replace the duplicated blocks
in each function with calls to this helper, passing the appropriate log_prefix
for context-specific logging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: c74fd2a3-466a-480a-9109-7a632773165a
📒 Files selected for processing (13)
agent/src/agent/nodes/schema_explorer.pybackend/app/routers/admin_approval.pybackend/app/routers/enrichment.pybackend/app/routers/evaluation.pybackend/app/routers/health.pybackend/app/routers/orchestration.pybackend/app/routers/profiling.pybackend/app/routers/tables.pybackend/app/seed.pybackend/app/services/profiling_engine.pybackend/app/services/scheduler.pycore/src/core/embeddings.pycore/src/core/models/models.py
| req = urllib.request.Request(embedder_url, data=data, headers=headers) | ||
| try: | ||
| with urllib.request.urlopen(req, timeout=10) as res: |
There was a problem hiding this comment.
Restrict embedder URL schemes to HTTP(S) before opening the request.
Line 20 and Line 22 accept any urllib scheme. Even though this URL is config-driven, allowing file:/custom schemes weakens security guarantees and can cause unexpected local resource access.
Proposed fix
import json
import logging
import urllib.request
+from urllib.parse import urlparse
@@
def get_embedding(
@@
) -> list[float] | None:
+ parsed = urlparse(embedder_url)
+ if parsed.scheme not in {"http", "https"}:
+ raise ValueError(f"Unsupported embedder URL scheme: {parsed.scheme}")
+
data = json.dumps({"model": embedder_model, "input": text}).encode()🧰 Tools
🪛 Ruff (0.15.17)
[error] 20-20: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
[error] 22-22: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@core/src/core/embeddings.py` around lines 20 - 22, Before creating the
urllib.request.Request object with the embedder_url parameter on line 20, add
validation to ensure the URL scheme is restricted to only HTTP or HTTPS. Parse
the embedder_url using urllib.parse.urlparse to extract the scheme, verify it is
either 'http' or 'https', and raise an appropriate exception (such as
ValueError) if any other scheme is provided. This validation must occur before
the urllib.request.Request call to prevent potential local resource access
through file:// or other custom URL schemes.
Source: Linters/SAST tools
| created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc)) | ||
| updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc)) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for any remaining datetime.utcnow() usage that wasn't updated
rg -n 'datetime\.utcnow\(\)' --type pyRepository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
#!/bin/bash
# Check for any timezone-naive datetime comparisons or operations
rg -n 'datetime\.now\(\)' --type py -A2 -B2 | head -50Repository: StavPonte11/text2sql-onboarding
Length of output: 463
🏁 Script executed:
#!/bin/bash
# Verify SQLModel/Alembic migrations define TIMESTAMPTZ columns
fd -e py alembic migration | xargs rg -l 'TIMESTAMP' | head -5Repository: StavPonte11/text2sql-onboarding
Length of output: 305
🏁 Script executed:
# Find Alembic migration files
fd -e py . -path '*migration*' -o -path '*alembic*' | grep -E '(migration|alembic)' | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 244
🏁 Script executed:
# Get the full context of the datetime.now() usage in extractor.py
cat -n agent/src/agent/nodes/extractor.py | sed -n '55,75p'Repository: StavPonte11/text2sql-onboarding
Length of output: 896
🏁 Script executed:
# Search for database schema definitions or create table statements
rg -n 'TIMESTAMP|created_at|updated_at' --type py -B2 -A2 | head -100Repository: StavPonte11/text2sql-onboarding
Length of output: 5592
🏁 Script executed:
# Find migration files using git ls-files
git ls-files | grep -i migratRepository: StavPonte11/text2sql-onboarding
Length of output: 209
🏁 Script executed:
# Find any database initialization or schema definition files
git ls-files | grep -E '(alembic|migration|schema|init.*db)' | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 969
🏁 Script executed:
# Check for SQLAlchemy create_all or similar schema creation
rg -n 'create_all|CreateTable|TIMESTAMP' --type py -l | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 188
🏁 Script executed:
# Check the initial migration to see how timestamp columns are created
cat backend/alembic/versions/01333a305f59_initial_schema.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 21756
🏁 Script executed:
# Check the deployed schema
cat backend/schema.sql | grep -A 5 -B 5 'TIMESTAMP\|created_at\|updated_at' | head -100Repository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
# Check other migrations for timestamp-related changes
grep -n 'TIMESTAMP\|created_at\|updated_at' backend/alembic/versions/*.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 3044
Migration columns must use sa.DateTime(timezone=True) to create TIMESTAMPTZ columns.
The database schema created by 01333a305f59_initial_schema.py uses sa.DateTime() which creates PostgreSQL TIMESTAMP (naive) columns. However, the models generate timezone-aware UTC datetime objects with datetime.now(timezone.utc). This creates a critical type mismatch:
- Update the initial migration to use
sa.DateTime(timezone=True)for all created_at/updated_at columns to createTIMESTAMP WITH TIME ZONE(timestamptz) in PostgreSQL - Create an ALTER migration if the initial migration has already been applied to production databases, converting existing TIMESTAMP columns to TIMESTAMPTZ
- Verify all subsequent migrations (e.g.,
781457ead87d_add_httpextractor.py,d35412e44b34_add_foreignkeymapping.py) also use the corrected DateTime definition going forward
The naive datetime.now() in agent/src/agent/nodes/extractor.py:63 is only used for display and does not interact with database timestamps, so it poses no compatibility risk.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@core/src/core/models/models.py` around lines 32 - 33, The created_at and
updated_at fields in the models are generating timezone-aware UTC datetime
objects, but the database migrations are creating TIMESTAMP (naive) columns
instead of TIMESTAMPTZ columns. To fix this, update the initial migration file
(01333a305f59_initial_schema.py) to use sa.DateTime(timezone=True) instead of
sa.DateTime() for all created_at and updated_at column definitions. If this
migration has already been deployed to production, create a separate ALTER
migration that converts existing TIMESTAMP columns to TIMESTAMP WITH TIME ZONE.
Additionally, verify that all subsequent migrations (such as
781457ead87d_add_httpextractor.py and d35412e44b34_add_foreignkeymapping.py)
also use sa.DateTime(timezone=True) for any new datetime columns to maintain
consistency going forward.
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/routers/admin_approval.py`:
- Line 2: In the admin_approval.py file, locate the lines 181 and 219 where
`updated_at` is being assigned with naive datetime values in the approval and
rejection flows. Replace these naive datetime calls (such as datetime.now()
without timezone info) with UTC-aware datetime values (such as
datetime.now(timezone.utc) or datetime.utcnow()) to ensure status transitions
remain comparable across services and environments.
In `@backend/app/routers/evaluation.py`:
- Line 17: The import statement on line 17 is incomplete for timezone-aware UTC
timestamps. Update the import to include timezone from the datetime module,
changing from just importing datetime to also importing timezone. Then, replace
all occurrences of datetime.now() (at lines 189, 303, 386, and 470) with
datetime.now(timezone.utc) to ensure timezone-aware UTC timestamps are used
throughout the file instead of naive local time.
In `@backend/app/routers/orchestration.py`:
- Line 14: Update the import statement at the top of the file to include
timezone from the datetime module (add timezone to the existing datetime
import). Then, locate all instances of datetime.now() calls in the trends window
and today counter logic (specifically around lines 467 and 780) and replace them
with datetime.now(timezone.utc) to ensure all analytics boundaries use UTC-based
time references instead of local-naive datetime values.
In `@backend/app/routers/profiling.py`:
- Line 10: The import statement at the top of the file only imports datetime and
timedelta, but UTC-aware timestamps require the timezone object. Update the
import statement to include timezone from the datetime module, then replace the
datetime.now() calls used for cached_until and updated_at fields (around lines
89-90) with datetime.now(timezone.utc) to ensure all timestamps are UTC-aware
and consistent across different deployment environments.
In `@backend/app/routers/tables.py`:
- Line 2: The import statement on line 2 is missing the timezone module needed
for timezone-aware timestamps. Update the import to include UTC from the
datetime module, then locate the three datetime.now() calls at lines 251, 308,
and 388 where updated_at fields are being persisted to the database, and modify
each of these calls to use datetime.now(UTC) instead of datetime.now() to ensure
timezone-aware UTC timestamps are stored consistently.
In `@backend/app/services/profiling_engine.py`:
- Line 12: The import statement on line 12 needs to include `timezone` from the
datetime module to support creating UTC-aware timestamps. Update the import
statement to add `timezone` alongside `date` and `datetime`. Then locate the
`datetime.now()` call at line 868 (in the context where `computed_at` is being
set) and replace it with `datetime.now(timezone.utc)` to ensure the resulting
timestamp includes timezone information that will be properly serialized when
`.isoformat()` is called at line 995.
In `@backend/app/services/scheduler.py`:
- Line 7: The scheduler is writing naive datetime timestamps for last_run_at
using datetime.now(), which can cause issues in distributed systems. Update the
import statement to include timezone from the datetime module, then locate where
last_run_at is assigned with datetime.now() and change it to
datetime.now(timezone.utc) to ensure UTC-aware timestamps are properly
persisted.
In `@core/src/core/models/models.py`:
- Line 2: Replace all instances of `default_factory=datetime.now` with a
UTC-aware alternative such as `default_factory=datetime.utcnow` or a custom
`utc_now` factory function across all timestamp fields in the models file. The
affected timestamp field definitions are located at the specified line ranges
and use `default_factory=datetime.now` as their default value initialization.
Ensure every timestamp field that currently uses `datetime.now` is updated to
use the UTC-aware factory to maintain consistency with UTC standardization
across all 22 timestamp fields in this file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: fd4038a1-cd42-4dc7-bbf4-b10fc4cb6e8c
📒 Files selected for processing (11)
backend/app/routers/admin_approval.pybackend/app/routers/enrichment.pybackend/app/routers/evaluation.pybackend/app/routers/health.pybackend/app/routers/orchestration.pybackend/app/routers/profiling.pybackend/app/routers/tables.pybackend/app/services/profiling_engine.pybackend/app/services/scheduler.pycore/src/core/embeddings.pycore/src/core/models/models.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/routers/admin_approval.py`:
- Line 2: In the admin_approval.py file, locate the lines 181 and 219 where
`updated_at` is being assigned with naive datetime values in the approval and
rejection flows. Replace these naive datetime calls (such as datetime.now()
without timezone info) with UTC-aware datetime values (such as
datetime.now(timezone.utc) or datetime.utcnow()) to ensure status transitions
remain comparable across services and environments.
In `@backend/app/routers/evaluation.py`:
- Line 17: The import statement on line 17 is incomplete for timezone-aware UTC
timestamps. Update the import to include timezone from the datetime module,
changing from just importing datetime to also importing timezone. Then, replace
all occurrences of datetime.now() (at lines 189, 303, 386, and 470) with
datetime.now(timezone.utc) to ensure timezone-aware UTC timestamps are used
throughout the file instead of naive local time.
In `@backend/app/routers/orchestration.py`:
- Line 14: Update the import statement at the top of the file to include
timezone from the datetime module (add timezone to the existing datetime
import). Then, locate all instances of datetime.now() calls in the trends window
and today counter logic (specifically around lines 467 and 780) and replace them
with datetime.now(timezone.utc) to ensure all analytics boundaries use UTC-based
time references instead of local-naive datetime values.
In `@backend/app/routers/profiling.py`:
- Line 10: The import statement at the top of the file only imports datetime and
timedelta, but UTC-aware timestamps require the timezone object. Update the
import statement to include timezone from the datetime module, then replace the
datetime.now() calls used for cached_until and updated_at fields (around lines
89-90) with datetime.now(timezone.utc) to ensure all timestamps are UTC-aware
and consistent across different deployment environments.
In `@backend/app/routers/tables.py`:
- Line 2: The import statement on line 2 is missing the timezone module needed
for timezone-aware timestamps. Update the import to include UTC from the
datetime module, then locate the three datetime.now() calls at lines 251, 308,
and 388 where updated_at fields are being persisted to the database, and modify
each of these calls to use datetime.now(UTC) instead of datetime.now() to ensure
timezone-aware UTC timestamps are stored consistently.
In `@backend/app/services/profiling_engine.py`:
- Line 12: The import statement on line 12 needs to include `timezone` from the
datetime module to support creating UTC-aware timestamps. Update the import
statement to add `timezone` alongside `date` and `datetime`. Then locate the
`datetime.now()` call at line 868 (in the context where `computed_at` is being
set) and replace it with `datetime.now(timezone.utc)` to ensure the resulting
timestamp includes timezone information that will be properly serialized when
`.isoformat()` is called at line 995.
In `@backend/app/services/scheduler.py`:
- Line 7: The scheduler is writing naive datetime timestamps for last_run_at
using datetime.now(), which can cause issues in distributed systems. Update the
import statement to include timezone from the datetime module, then locate where
last_run_at is assigned with datetime.now() and change it to
datetime.now(timezone.utc) to ensure UTC-aware timestamps are properly
persisted.
In `@core/src/core/models/models.py`:
- Line 2: Replace all instances of `default_factory=datetime.now` with a
UTC-aware alternative such as `default_factory=datetime.utcnow` or a custom
`utc_now` factory function across all timestamp fields in the models file. The
affected timestamp field definitions are located at the specified line ranges
and use `default_factory=datetime.now` as their default value initialization.
Ensure every timestamp field that currently uses `datetime.now` is updated to
use the UTC-aware factory to maintain consistency with UTC standardization
across all 22 timestamp fields in this file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: fd4038a1-cd42-4dc7-bbf4-b10fc4cb6e8c
📒 Files selected for processing (11)
backend/app/routers/admin_approval.pybackend/app/routers/enrichment.pybackend/app/routers/evaluation.pybackend/app/routers/health.pybackend/app/routers/orchestration.pybackend/app/routers/profiling.pybackend/app/routers/tables.pybackend/app/services/profiling_engine.pybackend/app/services/scheduler.pycore/src/core/embeddings.pycore/src/core/models/models.py
🛑 Comments failed to post (8)
backend/app/routers/admin_approval.py (1)
2-2:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
#!/bin/bash rg -nP 'updated_at\s*=\s*datetime\.now\(\)' backend/app/routers/admin_approval.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 159
🏁 Script executed:
# Check the imports and the exact lines mentioned head -20 backend/app/routers/admin_approval.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 566
🏁 Script executed:
# View the specific lines where datetime.now() is used sed -n '175,185p' backend/app/routers/admin_approval.py sed -n '215,225p' backend/app/routers/admin_approval.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 799
🏁 Script executed:
# Check if timezone is already used elsewhere in this file rg 'timezone|utc' backend/app/routers/admin_approval.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
# Check how updated_at is handled elsewhere in the codebase rg 'updated_at.*=' backend/app/ -A 2 -B 2 | head -60Repository: StavPonte11/text2sql-onboarding
Length of output: 2748
Preserve UTC-aware
updated_atwrites in approval/rejection flows.Lines 181 and 219 use naive datetime values. Replace with UTC-aware timestamps so status transitions remain comparable across services and environments.
Suggested patch
-from datetime import datetime +from datetime import datetime, timezone ... - table.updated_at = datetime.now() + table.updated_at = datetime.now(timezone.utc) ... - table.updated_at = datetime.now() + table.updated_at = datetime.now(timezone.utc)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/routers/admin_approval.py` at line 2, In the admin_approval.py file, locate the lines 181 and 219 where `updated_at` is being assigned with naive datetime values in the approval and rejection flows. Replace these naive datetime calls (such as datetime.now() without timezone info) with UTC-aware datetime values (such as datetime.now(timezone.utc) or datetime.utcnow()) to ensure status transitions remain comparable across services and environments.backend/app/routers/evaluation.py (1)
17-17:
⚠️ Potential issue | 🟠 Major | ⚡ Quick win
datetime.now()returns naive local time, not timezone-aware UTC.The import only includes
datetimewithouttimezone. Usingdatetime.now()produces a naive datetime in the system's local timezone, which contradicts the PR's stated goal of "timezone-aware UTC standardization." This will cause timestamp inconsistencies across servers in different timezones.🐛 Proposed fix
Update the import:
-from datetime import datetime +from datetime import datetime, timezoneThen at line 189 (and similarly at lines 303, 386, 470):
- run.completed_at = datetime.now() + run.completed_at = datetime.now(timezone.utc)Also applies to: 189-189
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/routers/evaluation.py` at line 17, The import statement on line 17 is incomplete for timezone-aware UTC timestamps. Update the import to include timezone from the datetime module, changing from just importing datetime to also importing timezone. Then, replace all occurrences of datetime.now() (at lines 189, 303, 386, and 470) with datetime.now(timezone.utc) to ensure timezone-aware UTC timestamps are used throughout the file instead of naive local time.backend/app/routers/orchestration.py (1)
14-14:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
#!/bin/bash rg -nP 'datetime\.now\(\)\s*-\s*timedelta|datetime\.now\(\)\.date\(\)' backend/app/routers/orchestration.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 213
🏁 Script executed:
# Check the imports section at the top of the file head -30 backend/app/routers/orchestration.py | cat -nRepository: StavPonte11/text2sql-onboarding
Length of output: 1186
🏁 Script executed:
# Check context around line 467 sed -n '460,475p' backend/app/routers/orchestration.py | cat -nRepository: StavPonte11/text2sql-onboarding
Length of output: 748
🏁 Script executed:
# Check context around line 780 sed -n '773,787p' backend/app/routers/orchestration.py | cat -nRepository: StavPonte11/text2sql-onboarding
Length of output: 855
🏁 Script executed:
# Search for all datetime.now() usages in the file (not just the two patterns from the original regex) rg -n 'datetime\.now\(\)' backend/app/routers/orchestration.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 213
Use UTC reference time for trends window and "today" counters.
The import at line 14 does not include
timezone, and both line 467 and line 780 use local-naivedatetime.now(). These analytics boundaries should be UTC-based to avoid timezone-dependent drift in returned metrics.Suggested patch
-from datetime import datetime, timedelta +from datetime import datetime, timedelta, timezone ... - since = datetime.now() - timedelta(days=days) + since = datetime.now(timezone.utc) - timedelta(days=days) ... - 1 for r in all_runs if r.created_at.date() == datetime.now().date() + 1 for r in all_runs if r.created_at.date() == datetime.now(timezone.utc).date()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/routers/orchestration.py` at line 14, Update the import statement at the top of the file to include timezone from the datetime module (add timezone to the existing datetime import). Then, locate all instances of datetime.now() calls in the trends window and today counter logic (specifically around lines 467 and 780) and replace them with datetime.now(timezone.utc) to ensure all analytics boundaries use UTC-based time references instead of local-naive datetime values.backend/app/routers/profiling.py (1)
10-10:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
#!/bin/bash rg -nP 'cached_until\s*=\s*datetime\.now\(\)|updated_at\s*=\s*datetime\.now\(\)' backend/app/routers/profiling.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 193
🏁 Script executed:
cat -n backend/app/routers/profiling.py | head -100 | tail -30Repository: StavPonte11/text2sql-onboarding
Length of output: 1430
🏁 Script executed:
rg -n 'datetime\.now\(\)' backend/app/routers/profiling.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 193
🏁 Script executed:
rg -rn 'datetime\.now\(\)' backend/app --type py | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 1196
🏁 Script executed:
head -15 backend/app/routers/profiling.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 478
🏁 Script executed:
grep -n "class TableProfile" backend/app -rRepository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
cat backend/app/models/profiling.py | head -50Repository: StavPonte11/text2sql-onboarding
Length of output: 139
🏁 Script executed:
find backend -name "*.py" -type f | xargs grep -l "class TableProfile"Repository: StavPonte11/text2sql-onboarding
Length of output: 59
🏁 Script executed:
grep -n "updated_at\|cached_until" backend/app -r --type py | grep -E "(Column|Table)Profile" | head -10Repository: StavPonte11/text2sql-onboarding
Length of output: 193
🏁 Script executed:
find backend -name "*.py" -type f | xargs grep -l "class TableProfile"Repository: StavPonte11/text2sql-onboarding
Length of output: 59
🏁 Script executed:
rg -n "class TableProfile" backendRepository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
rg -n "updated_at|cached_until" backend/app -A 2 -B 2Repository: StavPonte11/text2sql-onboarding
Length of output: 3294
Use UTC-aware timestamps for
cached_untilandupdated_atat lines 89–90.Replace
datetime.now()withdatetime.now(timezone.utc)ordatetime.utcnow(). Naive timestamps cause cache validity and audit fields to become environment-dependent across deployments.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/routers/profiling.py` at line 10, The import statement at the top of the file only imports datetime and timedelta, but UTC-aware timestamps require the timezone object. Update the import statement to include timezone from the datetime module, then replace the datetime.now() calls used for cached_until and updated_at fields (around lines 89-90) with datetime.now(timezone.utc) to ensure all timestamps are UTC-aware and consistent across different deployment environments.backend/app/routers/tables.py (1)
2-2:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
#!/bin/bash # Verify naive timestamp writes in this file. rg -nP '\bdatetime\.now\(\)' backend/app/routers/tables.py rg -nP '\bdatetime\.now\(UTC\)' backend/app/routers/tables.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 208
🏁 Script executed:
head -20 backend/app/routers/tables.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 580
🏁 Script executed:
python3 - <<'PY' # Check if UTC is importable from datetime import sys print(f"Python version: {sys.version_info}") try: from datetime import UTC print("UTC is importable from datetime") except ImportError: print("UTC is NOT importable from datetime") PYRepository: StavPonte11/text2sql-onboarding
Length of output: 200
🏁 Script executed:
# Check the complete import section and surrounding context for datetime usage sed -n '1,15p' backend/app/routers/tables.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 435
🏁 Script executed:
# Full search for all datetime.now() calls to ensure nothing was missed rg -n 'datetime\.now\(' backend/app/routers/tables.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 208
🏁 Script executed:
# View the context around the three identified lines sed -n '248,253p' backend/app/routers/tables.py sed -n '305,310p' backend/app/routers/tables.py sed -n '385,390p' backend/app/routers/tables.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 613
Restore timezone-aware UTC for persisted
updated_atfields.Line 2 imports only
datetimewithoutUTC. Lines 251, 308, and 388 persistdatetime.now()(naive local time) to database fields. This causes timestamp consistency issues when integrating with timezone-aware systems.Proposed fix
-from datetime import datetime +from datetime import UTC, datetime @@ - table.updated_at = datetime.now() + table.updated_at = datetime.now(UTC) @@ - table.updated_at = datetime.now() + table.updated_at = datetime.now(UTC) @@ - existing.updated_at = datetime.now() + existing.updated_at = datetime.now(UTC)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/routers/tables.py` at line 2, The import statement on line 2 is missing the timezone module needed for timezone-aware timestamps. Update the import to include UTC from the datetime module, then locate the three datetime.now() calls at lines 251, 308, and 388 where updated_at fields are being persisted to the database, and modify each of these calls to use datetime.now(UTC) instead of datetime.now() to ensure timezone-aware UTC timestamps are stored consistently.backend/app/services/profiling_engine.py (1)
12-12:
⚠️ Potential issue | 🟡 Minor🧩 Analysis chain
🏁 Script executed:
#!/bin/bash rg -nP 'computed_at\s*=\s*datetime\.now\(\)' backend/app/services/profiling_engine.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 112
🏁 Script executed:
#!/bin/bash # Get context around line 868 to see how computed_at is used sed -n '860,880p' backend/app/services/profiling_engine.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 748
🏁 Script executed:
#!/bin/bash # Search for profile_json["computed_at"] to see how it's serialized rg -n 'profile_json.*computed_at|computed_at.*isoformat' backend/app/services/profiling_engine.py -A 2 -B 2Repository: StavPonte11/text2sql-onboarding
Length of output: 295
🏁 Script executed:
#!/bin/bash # Check if datetime.now() is used elsewhere and how UTC-aware alternatives are used rg -n 'datetime\.now\(\)|datetime\.utcnow\(\)|timezone|UTC' backend/app/services/profiling_engine.py | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 112
🏁 Script executed:
#!/bin/bash # Search for timezone-related patterns in the entire backend rg -l 'timezone|utc|UTC|datetime\.utcnow|pytz|zoneinfo' backend/ --type py | head -10Repository: StavPonte11/text2sql-onboarding
Length of output: 175
🏁 Script executed:
#!/bin/bash # Check what the profile result structure expects for computed_at rg -n 'class TableProfilingResult|computed_at.*:' backend/ --type py -A 3 | head -30Repository: StavPonte11/text2sql-onboarding
Length of output: 991
Use UTC-aware timestamp for
computed_atto include timezone context in JSON serialization.Line 868 uses
datetime.now(), which creates a naive datetime object. When serialized via.isoformat()at line 995, this produces an ambiguous timestamp string without UTC offset information. Downstream consumers cannot determine the timezone of the timestamp. Usedatetime.now(timezone.utc)ordatetime.utcnow().replace(tzinfo=timezone.utc)instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/services/profiling_engine.py` at line 12, The import statement on line 12 needs to include `timezone` from the datetime module to support creating UTC-aware timestamps. Update the import statement to add `timezone` alongside `date` and `datetime`. Then locate the `datetime.now()` call at line 868 (in the context where `computed_at` is being set) and replace it with `datetime.now(timezone.utc)` to ensure the resulting timestamp includes timezone information that will be properly serialized when `.isoformat()` is called at line 995.backend/app/services/scheduler.py (1)
7-7:
⚠️ Potential issue | 🟠 Major | ⚡ Quick win🧩 Analysis chain
🏁 Script executed:
#!/bin/bash rg -nP 'last_run_at\s*=\s*datetime\.now\(\)' backend/app/services/scheduler.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 124
🏁 Script executed:
#!/bin/bash # Get full context around line 70 sed -n '50,90p' backend/app/services/scheduler.py | cat -nRepository: StavPonte11/text2sql-onboarding
Length of output: 1826
🏁 Script executed:
#!/bin/bash # Check if schedule is an ORM model and how last_run_at is defined rg -n 'class.*Schedule|last_run_at|DateTime|Column' backend/app/services/scheduler.py | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 124
🏁 Script executed:
#!/bin/bash # Search for schedule model definition across the backend fd -e py | xargs rg -l 'class.*Schedule' | grep -E '(model|models|schema)' | head -10Repository: StavPonte11/text2sql-onboarding
Length of output: 106
🏁 Script executed:
#!/bin/bash # Look for all datetime.now() calls in the scheduler file to check for consistency rg -n 'datetime\.now\(\)' backend/app/services/scheduler.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 124
🏁 Script executed:
#!/bin/bash # Find and examine the EvaluationSchedule model definition rg -n 'class EvaluationSchedule|last_run_at' core/src/core/models/models.py | head -30Repository: StavPonte11/text2sql-onboarding
Length of output: 344
🏁 Script executed:
#!/bin/bash # Get the full EvaluationSchedule class definition grep -n 'class EvaluationSchedule' core/src/core/models/models.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 263
🏁 Script executed:
#!/bin/bash # Read the models.py file to see EvaluationSchedule and last_run_at field sed -n '1,200p' core/src/core/models/models.py | grep -A 30 'class EvaluationSchedule'Repository: StavPonte11/text2sql-onboarding
Length of output: 248
🏁 Script executed:
#!/bin/bash # Check imports in scheduler.py to see what datetime tools are available head -15 backend/app/services/scheduler.py | cat -nRepository: StavPonte11/text2sql-onboarding
Length of output: 635
🏁 Script executed:
#!/bin/bash # Check if timezone module is imported anywhere in the scheduler or models rg -n 'import.*timezone|from.*timezone|datetime.timezone|pytz' backend/app/services/scheduler.py core/src/core/models/models.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
#!/bin/bash # Get more context around the EvaluationSchedule model to see all fields and how datetime is imported there sed -n '1,220p' core/src/core/models/models.py | tail -50Repository: StavPonte11/text2sql-onboarding
Length of output: 1640
🏁 Script executed:
#!/bin/bash # Check imports at the top of models.py to understand datetime handling head -30 core/src/core/models/models.pyRepository: StavPonte11/text2sql-onboarding
Length of output: 929
🏁 Script executed:
#!/bin/bash # Search for datetime usage patterns in models.py - are there other datetime fields using UTC? rg -n 'datetime|UTC|utc|timezone' core/src/core/models/models.py | head -20Repository: StavPonte11/text2sql-onboarding
Length of output: 981
🏁 Script executed:
#!/bin/bash # Check if there are other datetime.now() calls anywhere in the codebase that might show the pattern rg -n 'datetime\.now\(\)' backend/ core/ --include='*.py' | head -15Repository: StavPonte11/text2sql-onboarding
Length of output: 517
🏁 Script executed:
#!/bin/bash # Verify Python version and utcnow() deprecation status by checking what's recommended rg -n 'utcnow|timezone.utc' backend/ core/ --type pyRepository: StavPonte11/text2sql-onboarding
Length of output: 57
🏁 Script executed:
#!/bin/bash # Check if the schedule model uses any SQLAlchemy DateTime type with timezone parameter sed -n '197,215p' core/src/core/models/models.py | cat -nRepository: StavPonte11/text2sql-onboarding
Length of output: 977
🏁 Script executed:
#!/bin/bash # Verify if SQLModel/SQLAlchemy DateTime type is used anywhere with timezone configuration rg -n 'DateTime|sa_column.*DateTime' core/src/core/models/models.py backend/Repository: StavPonte11/text2sql-onboarding
Length of output: 3398
🏁 Script executed:
#!/bin/bash # Check what timezone library imports exist in the codebase fd -e py -x grep -l 'from datetime import\|import datetime' core/ backend/ | xargs grep -h 'from datetime\|import datetime\|timezone' | sort -uRepository: StavPonte11/text2sql-onboarding
Length of output: 4446
🏁 Script executed:
#!/bin/bash # Check the database timezone configuration cat backend/alembic/versions/01333a305f59_initial_schema.py | grep -A 10 'evaluation_schedules'Repository: StavPonte11/text2sql-onboarding
Length of output: 1348
🏁 Script executed:
#!/bin/bash # Check if there's any database-level timezone configuration rg -n 'timezone|UTC' backend/alembic/env.py backend/alembic/script.py.makoRepository: StavPonte11/text2sql-onboarding
Length of output: 57
Use timezone-aware UTC for persisted scheduler timestamps.
Line 70 writes
last_run_atusing naivedatetime.now(). For a distributed system with schedule persistence, this should use UTC-aware timestamps:datetime.now(timezone.utc). This ensures deterministic schedule history and correct cross-node interpretation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/services/scheduler.py` at line 7, The scheduler is writing naive datetime timestamps for last_run_at using datetime.now(), which can cause issues in distributed systems. Update the import statement to include timezone from the datetime module, then locate where last_run_at is assigned with datetime.now() and change it to datetime.now(timezone.utc) to ensure UTC-aware timestamps are properly persisted.core/src/core/models/models.py (1)
2-2:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
#!/bin/bash # Verify naive timestamp default factories in this model file rg -nP 'default_factory\s*=\s*datetime\.now\b' core/src/core/models/models.py # Verify UTC-aware factories after fix (expected >0 once patched) rg -nP 'default_factory\s*=\s*(utc_now|lambda:\s*datetime\.now\(timezone\.utc\))' core/src/core/models/models.py # Inspect declared Python toolchain to keep timezone API choice compatible fd -a 'pyproject.toml|\.python-version|\.tool-versions|Dockerfile' | \ xargs -r -I{} sh -c 'echo "--- {} ---"; sed -n "1,120p" "{}"'Repository: StavPonte11/text2sql-onboarding
Length of output: 8287
Use UTC-aware defaults;
datetime.nowreturns timezone-naive local time, not UTC.The 22 timestamp fields using
default_factory=datetime.nowstore local-naive datetimes instead of UTC-aware values. This contradicts the stated UTC standardization and causes persisted timestamps to shift by host timezone offset, breaking ordering and filtering across deployments.Suggested fix pattern
-from datetime import datetime +from datetime import datetime, timezone + +def utc_now() -> datetime: + return datetime.now(timezone.utc) class Table(SQLModel, table=True): - created_at: datetime = Field(default_factory=datetime.now) - updated_at: datetime = Field(default_factory=datetime.now) + created_at: datetime = Field(default_factory=utc_now) + updated_at: datetime = Field(default_factory=utc_now)Apply the same
utc_nowfactory to all changed timestamp defaults in this file.Affected lines: 32–33, 66, 112, 158, 167, 208, 257, 296, 396, 459–460, 514, 555, 599, 655, 688–689, 722–723, 761–762.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/src/core/models/models.py` at line 2, Replace all instances of `default_factory=datetime.now` with a UTC-aware alternative such as `default_factory=datetime.utcnow` or a custom `utc_now` factory function across all timestamp fields in the models file. The affected timestamp field definitions are located at the specified line ranges and use `default_factory=datetime.now` as their default value initialization. Ensure every timestamp field that currently uses `datetime.now` is updated to use the UTC-aware factory to maintain consistency with UTC standardization across all 22 timestamp fields in this file.
Summary by CodeRabbit
/v1/embeddingsendpoint and added optional embedder API key support.datetime.now()).