Skip to content

Fix ts evaluations#21

Merged
yuvalkh merged 7 commits into
mainfrom
fix-ts-evaluations
Jun 22, 2026
Merged

Fix ts evaluations#21
yuvalkh merged 7 commits into
mainfrom
fix-ts-evaluations

Conversation

@yuvalkh

@yuvalkh yuvalkh commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • New Features
    • Integrated a remote evaluation service for text-to-SQL evaluations, replacing in-process scoring.
  • Infrastructure & Improvements
    • Updated embedder integration to use the /v1/embeddings endpoint and added optional embedder API key support.
    • Switched agent-to-backend communication from SSE to a streamable HTTP interface.
    • Improved evaluation dataset syncing by using question IDs for item identifiers.
  • Bug Fixes
    • Standardized timestamp defaults across models and updates to use system local time (via datetime.now()).
  • Chores
    • Updated ignore rules to exclude OpenShift image artifacts.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: fd4038a1-cd42-4dc7-bbf4-b10fc4cb6e8c

📥 Commits

Reviewing files that changed from the base of the PR and between 980a76f and 422b05d.

📒 Files selected for processing (11)
  • backend/app/routers/admin_approval.py
  • backend/app/routers/enrichment.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/health.py
  • backend/app/routers/orchestration.py
  • backend/app/routers/profiling.py
  • backend/app/routers/tables.py
  • backend/app/services/profiling_engine.py
  • backend/app/services/scheduler.py
  • core/src/core/embeddings.py
  • core/src/core/models/models.py

📝 Walkthrough

Walkthrough

This PR consolidates three major infrastructure changes: (1) centralizes embedder logic into a core module using OpenAI-compatible API contract (input/Bearer auth) and adds EMBEDDER_KEY config support; (2) replaces SSE MCP transport with Streamable HTTP in both agent and backend; (3) migrates all evaluation phases to call a remote evaluation service instead of using local stubs, introducing Pydantic models and EVALUATION_SERVICE_URL. Additionally, timestamps across 50+ sites are standardized from naive UTC to timezone-aware UTC datetimes.

Changes

Embedder API Contract and MCP Transport Updates

Layer / File(s) Summary
Core embeddings module with OpenAI-compatible contract
core/src/core/embeddings.py
New module defines EXPECTED_EMBEDDING_DIM constant and get_embedding helper that builds OpenAI-compatible JSON requests (model + input fields), adds optional Bearer Authorization header when embedder_key is provided, enforces 10-second timeout, validates embedding length, and logs warnings on errors.
Embedder config defaults and EMBEDDER_KEY
agent/src/agent/config.py, backend/app/config.py
AgentSettings and backend Settings update EMBEDDER_URL to /v1/embeddings path and introduce EMBEDDER_KEY field with empty-string default for Bearer auth support.
Embedder callers delegating to core module
agent/src/agent/nodes/schema_explorer.py, backend/app/routers/tables.py, backend/app/seed.py
All embedding callers refactor from inline HTTP requests to delegating through core.embeddings.get_embedding, passing embedder_url, embedder_model, and embedder_key. Local exception handling is removed; errors are caught and logged by the core module.
MCP transport: SSE → Streamable HTTP
agent/src/agent/main.py, backend/app/routers/agent.py
Agent server exposes mcp.streamable_http_app() instead of mcp.sse_app(). Backend router replaces sse_client with streamablehttp_client, targets /mcp endpoint, and builds ClientSession from explicit read/write streams.

Evaluation Pipeline: Remote Service Refactor

Layer / File(s) Summary
Evaluation service config, imports, and Pydantic models
backend/app/config.py, backend/app/routers/evaluation.py
Settings gains EVALUATION_SERVICE_URL (default http://localhost:8001). Five new Pydantic models (EvalAPIRequest, EvalAPIQuestionMetrics, EvalAPIQuestionResult, EvalAPIOverallMetrics, EvalAPIResponse) and PRODUCTION_DATASET_NAME constant define the remote evaluation API contract.
Single-table evaluation and Langfuse dataset management
backend/app/routers/evaluation.py, backend/app/services/langfuse_client.py
execute_single_table_eval derives sandbox dataset name from table_id, syncs Langfuse dataset when enabled, calls evaluation service, and persists EvalRun metrics and per-question EvalResult rows from API response. sync_dataset now passes explicit id=question_id on item creation.
Phase A baseline evaluation via remote service
backend/app/routers/evaluation.py
Phase A aggregates production golden questions, syncs Langfuse production dataset, calls evaluation service on PRODUCTION_DATASET_NAME, and persists baseline EvalRun/EvalResult rows from API response.
Phase B candidate and regression evaluation via remote service
backend/app/routers/evaluation.py
Phase B candidate syncs candidate dataset, calls evaluation service, persists results, and clears candidate dataset. Phase B regression re-evaluates production dataset via evaluation service and persists regression metrics/results.

Timezone-Aware UTC Standardization

Layer / File(s) Summary
Core model timestamp defaults
core/src/core/models/models.py
Updates 18 tables' created_at/updated_at/started_at defaults from datetime.utcnow to datetime.now, ensuring all ORM-generated timestamps are timezone-aware.
Router-level timestamp assignments
backend/app/routers/admin_approval.py, backend/app/routers/enrichment.py, backend/app/routers/health.py, backend/app/routers/orchestration.py, backend/app/routers/profiling.py, backend/app/routers/tables.py
All routers update timestamp assignments (table.updated_at, schedule.last_run_at) from datetime.utcnow() to datetime.now().
Service-level timestamp updates
backend/app/services/profiling_engine.py, backend/app/services/scheduler.py
Services update computed_at, last_run_at from datetime.utcnow() to datetime.now().

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • StavPonte11/text2sql-onboarding#13: Both PRs modify the MCP agent HTTP/SSE wiring—retrieved PR adds SSE-based mcp.sse_app() and a backend proxy to the agent's /sse endpoint, while this PR switches the agent app to mcp.streamable_http_app() and updates the backend proxy to use /mcp with streamable HTTP instead.
  • StavPonte11/text2sql-onboarding#20: get_query_embedding in agent/src/agent/nodes/schema_explorer.py was updated in both PRs to refactor embedder HTTP logic, with this PR routing embedding calls through the centralized core.embeddings.get_embedding helper.

🐇 Hopping through the code at night,
I swapped the SSE for streaming light,
The embedder now speaks OpenAI,
And eval runs—no stubs, no lie!
From random scores to real API calls,
This bunny cleaned the messy halls! 🌟

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is too vague and does not clearly convey the scope of changes, which include evaluation service refactoring, timestamp handling updates, embedding configuration changes, and MCP transport updates. Use a more descriptive title that captures the primary changes, such as 'Refactor evaluation pipeline to use remote service and update timestamp handling' or 'Migrate to remote evaluation service with timestamp and embedding updates'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-ts-evaluations

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent/src/agent/config.py`:
- Around line 12-14: The EMBEDDER_URL field on line 12 is currently typed as a
plain str, which allows unsafe URL schemes like ftp or file to be passed
through. Change the EMBEDDER_URL field type from str to Pydantic's AnyHttpUrl
type to enforce that only HTTP and HTTPS schemes are allowed. This will validate
the URL scheme at configuration initialization time and fail fast if an invalid
scheme is provided.

In `@agent/src/agent/nodes/schema_explorer.py`:
- Around line 52-53: The embedding returned on line 53 from
resp_data["embedding"] is not validated against the 768-dimensional requirement
documented on line 43. Add a validation check after extracting the embedding to
verify its length matches the expected 768 dimensions. If the dimension check
fails, raise an exception that will be caught by the existing exception fallback
mechanism, rather than returning the embedding directly without validation.

In `@backend/app/config.py`:
- Around line 55-57: The EMBEDDER_URL field in the Settings class lacks
validation for URL schemes, which allows unsafe or unsupported schemes like
ftp://, file://, or data:// to pass through and potentially cause security or
functionality issues. Add a Pydantic field validator (using `@field_validator` or
`@validator` decorator) to the EMBEDDER_URL field that parses the URL and
validates that the scheme is either http or https, raising a clear validation
error if any other scheme is provided. This will ensure the validation happens
at application startup and prevents invalid URLs from being used downstream.

In `@backend/app/routers/agent.py`:
- Line 16: Update the import statement from the deprecated streamablehttp_client
to the current streamable_http_client function. In the import statement on line
16 that imports from mcp.client.streamable_http, replace the function name
streamablehttp_client (no underscore) with streamable_http_client (with
underscores). This ensures compatibility with the current version of the mcp SDK
as of June 2026.

In `@backend/app/routers/evaluation.py`:
- Around line 156-165: Add a null check for the `table` variable immediately
after the `session.get(Table, table_id)` call on line 156. Before accessing
`table.name` in the `EvalAPIRequest` instantiation or passing `table` to
`_build_questions_payload`, verify that `table` is not `None`. If `table` is
`None`, raise an appropriate exception (such as a 404 HTTPException) to handle
the case where the table was deleted.
- Line 189: The deprecated datetime.utcnow() method needs to be replaced with
the modern equivalent datetime.now(timezone.utc) which is the recommended
approach since Python 3.12. Update the import statement to include timezone from
the datetime module, then locate all occurrences of datetime.utcnow() throughout
the file (particularly in assignments like run.completed_at and any other
similar datetime assignments) and replace each call with
datetime.now(timezone.utc) to ensure compatibility with current Python
standards.

In `@backend/app/routers/tables.py`:
- Around line 134-145: The embedder HTTP request logic is duplicated across two
locations (around lines 134-145 and lines 253-264), with both blocks handling
header setup with EMBEDDER_KEY authorization, making a POST request to
EMBEDDER_URL with model and input parameters, and parsing the JSON response to
extract embedding data. Extract this logic into a single shared helper function
that accepts the text to embed as a parameter and returns the embedding result,
then replace both duplicated blocks with calls to this helper function to
prevent contract drift when the embedder API changes.
- Around line 143-145: The status code check on the embed_resp object (if
embed_resp.status_code == 200) silently skips embedding extraction for error
responses without logging or raising an exception, making it difficult to detect
API failures. Replace the conditional status code check with a call to
embed_resp.raise_for_status() before accessing the JSON data. This will
explicitly raise an HTTPError for non-2xx status codes (401, 429, 500, etc.)
instead of silently continuing. Apply this same fix at both locations mentioned
(around lines 143-145 and 262-270) by removing the if statement and calling
raise_for_status() on the embed_resp object directly, then proceeding to extract
the embedding data unconditionally.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 8348f866-c88e-48c9-8146-eb467eb1356f

📥 Commits

Reviewing files that changed from the base of the PR and between 5e7aea0 and 59cee5d.

📒 Files selected for processing (10)
  • .gitignore
  • agent/src/agent/config.py
  • agent/src/agent/main.py
  • agent/src/agent/nodes/schema_explorer.py
  • backend/app/config.py
  • backend/app/routers/agent.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/tables.py
  • backend/app/seed.py
  • backend/app/services/langfuse_client.py

Comment thread agent/src/agent/config.py
Comment on lines +12 to +14
EMBEDDER_URL: str = "http://localhost:11434/v1/embeddings"
EMBEDDER_MODEL: str = "nomic-embed-text:latest"
EMBEDDER_KEY: str = ""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n agent/src/agent/config.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 1600


🏁 Script executed:

rg -A 5 -B 5 "EMBEDDER_URL" --type py

Repository: StavPonte11/text2sql-onboarding

Length of output: 4453


🏁 Script executed:

rg "from pydantic" agent/src/agent/config.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 165


Constrain EMBEDDER_URL to HTTP(S) at settings load time.

Line 12 keeps EMBEDDER_URL as an unconstrained str, allowing unsafe URL schemes (ftp, file, etc.) to be passed to embedder callers. The value is directly consumed by urllib.request.Request and httpx.post without scheme validation. Use Pydantic's AnyHttpUrl type to enforce HTTP/HTTPS schemes at configuration initialization and fail fast on misconfiguration.

Suggested hardening
-from pydantic import Field
+from pydantic import AnyHttpUrl, Field

 class AgentSettings(BaseSettings):
     
-    EMBEDDER_URL: str = "http://localhost:11434/v1/embeddings"
+    EMBEDDER_URL: AnyHttpUrl = "http://localhost:11434/v1/embeddings"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
EMBEDDER_URL: str = "http://localhost:11434/v1/embeddings"
EMBEDDER_MODEL: str = "nomic-embed-text:latest"
EMBEDDER_KEY: str = ""
EMBEDDER_URL: AnyHttpUrl = "http://localhost:11434/v1/embeddings"
EMBEDDER_MODEL: str = "nomic-embed-text:latest"
EMBEDDER_KEY: str = ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/src/agent/config.py` around lines 12 - 14, The EMBEDDER_URL field on
line 12 is currently typed as a plain str, which allows unsafe URL schemes like
ftp or file to be passed through. Change the EMBEDDER_URL field type from str to
Pydantic's AnyHttpUrl type to enforce that only HTTP and HTTPS schemes are
allowed. This will validate the URL scheme at configuration initialization time
and fail fast if an invalid scheme is provided.

Source: Linters/SAST tools

Comment thread agent/src/agent/nodes/schema_explorer.py Outdated
Comment thread backend/app/config.py
Comment on lines +55 to +57
EMBEDDER_URL: str = "http://host.docker.internal:11434/v1/embeddings"
EMBEDDER_MODEL: str = "nomic-embed-text:latest"
EMBEDDER_KEY: str = ""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n backend/app/config.py | head -100

Repository: StavPonte11/text2sql-onboarding

Length of output: 2645


🏁 Script executed:

cat -n backend/app/config.py | tail -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 902


🏁 Script executed:

wc -l backend/app/config.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 100


🏁 Script executed:

rg "EMBEDDER_URL" -A 3 -B 3

Repository: StavPonte11/text2sql-onboarding

Length of output: 3384


🏁 Script executed:

cat -n backend/app/config.py && echo "---" && cat -n agent/src/agent/config.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 4174


🏁 Script executed:

rg "field_validator|Field\(" backend/app/config.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

rg "field_validator|`@validator`" backend/app/ && echo "---FOUND---" || echo "---NO VALIDATORS FOUND---"

Repository: StavPonte11/text2sql-onboarding

Length of output: 101


🏁 Script executed:

cat -n backend/app/seed.py | grep -A 10 "def get_embedding"

Repository: StavPonte11/text2sql-onboarding

Length of output: 711


🏁 Script executed:

rg "http|https|scheme" backend/app/ -i | grep -i "valid\|check\|assert" | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

rg "urllib.*parse|urlparse|HttpUrl|AnyUrl" backend/app/

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

rg "from pydantic" backend/app/config.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 138


🏁 Script executed:

rg "HttpUrl|AnyUrl|field_validator" --type python

Repository: StavPonte11/text2sql-onboarding

Length of output: 106


🏁 Script executed:

rg "HttpUrl|AnyUrl|field_validator" -t py

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

git log --oneline --all -S "EMBEDDER_URL" | head -5

Repository: StavPonte11/text2sql-onboarding

Length of output: 202


🏁 Script executed:

cat -n backend/app/routers/tables.py | grep -A 5 -B 5 "EMBEDDER_URL"

Repository: StavPonte11/text2sql-onboarding

Length of output: 1010


🏁 Script executed:

git log --all --oneline | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 1031


🏁 Script executed:

git log -p --all -S "field_validator" --reverse | head -50

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

git show 5e7aea0 --stat

Repository: StavPonte11/text2sql-onboarding

Length of output: 456


🏁 Script executed:

git show 462eb08

Repository: StavPonte11/text2sql-onboarding

Length of output: 958


🏁 Script executed:

rg "http" backend/app/config.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 450


🏁 Script executed:

rg "startswith.*http\|http.*startswith\|scheme" backend/app/ agent/src/

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

rg "ValueError|raise.*Error" backend/app/config.py agent/src/agent/config.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

rg "model_validate\|field_validator\|validator" -t py | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


Add HTTP(S) scheme validation to EMBEDDER_URL in the Settings class.

The EMBEDDER_URL field at line 55 is a plain str with no validation. Invalid or unsafe URL schemes (e.g., ftp://, file://, data://) can be accepted at startup and passed directly to HTTP clients (httpx.post, urllib.request) without safeguards. Add a field validator to reject non-HTTP(S) values and fail fast at application startup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/config.py` around lines 55 - 57, The EMBEDDER_URL field in the
Settings class lacks validation for URL schemes, which allows unsafe or
unsupported schemes like ftp://, file://, or data:// to pass through and
potentially cause security or functionality issues. Add a Pydantic field
validator (using `@field_validator` or `@validator` decorator) to the EMBEDDER_URL
field that parses the URL and validates that the scheme is either http or https,
raising a clear validation error if any other scheme is provided. This will
ensure the validation happens at application startup and prevents invalid URLs
from being used downstream.

from fastapi import APIRouter, HTTPException
from mcp.client.session import ClientSession
from mcp.client.sse import sse_client
from mcp.client.streamable_http import streamablehttp_client

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the mcp version declared in backend dependencies
fd -i "requirements" --exec cat {} \; 2>/dev/null
fd -i "pyproject.toml" --exec cat {} \; 2>/dev/null

Repository: StavPonte11/text2sql-onboarding

Length of output: 4198


🌐 Web query:

What version of the mcp Python package introduced streamablehttp_client?

💡 Result:

The streamablehttp_client function was an early API provided by the mcp Python SDK for handling StreamableHTTP transport connections [1][2]. It was later superseded by the streamable_http_client function (note the added underscore), which improved control over the httpx.AsyncClient lifecycle [1][3]. The newer streamable_http_client was introduced in version 1.24.0 of the mcp Python SDK [3]. Following this introduction, the original streamablehttp_client was marked as deprecated and eventually removed from the SDK in subsequent releases [4][3].

Citations:


🌐 Web query:

mcp Python SDK streamablehttp_client availability version history release notes

💡 Result:

The function streamablehttp_client (without the underscore) has been deprecated and subsequently removed from the Model Context Protocol (MCP) Python SDK [1][2]. Availability and Version History: - The function was replaced by streamable_http_client (with an underscore) to accommodate changes in how the SDK handles httpx.AsyncClient instances [3]. - streamablehttp_client was marked as deprecated while streamable_http_client was introduced as the recommended replacement [1][2]. - The removal of the deprecated streamablehttp_client was finalized in early 2026 [1]. Current Status: - As of June 2026, streamable_http_client is the standard, active function used for streamable HTTP transport in the SDK [4][5]. - Developers should update their code to import streamable_http_client from mcp.client.streamable_http and pass an configured httpx.AsyncClient instead of using the older function signatures [2][3]. For further implementation details, the official migration documentation provides comprehensive guides on transitioning from the legacy streamablehttp_client to the current streamable_http_client API [2].

Citations:


🏁 Script executed:

# Search for any documentation or changelog that might specify when streamablehttp_client was available
cd backend && git log --oneline --all --grep="streamable" 2>/dev/null | head -20 || echo "Git not available"

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


Use streamable_http_client instead of the deprecated streamablehttp_client.

The function streamablehttp_client was deprecated and removed from the mcp SDK in early 2026. As of June 2026, the correct import is streamable_http_client (with underscore). Update the import to:

from mcp.client.streamable_http import streamable_http_client
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/agent.py` at line 16, Update the import statement from
the deprecated streamablehttp_client to the current streamable_http_client
function. In the import statement on line 16 that imports from
mcp.client.streamable_http, replace the function name streamablehttp_client (no
underscore) with streamable_http_client (with underscores). This ensures
compatibility with the current version of the mcp SDK as of June 2026.

Comment on lines +156 to +165
table = session.get(Table, table_id)
dataset_name = f"text2sql_sandbox_{table_id}"

question_scores_contains: list[float] = [
float(random.choice([0, 1])) for _ in questions
]
question_scores_exact: list[float] = [
float(random.choice([0, 1])) for _ in questions
]
question_scores_ranking: list[float] = [
float(random.choice([0, 1])) for _ in questions
]
logger.info(
f"[Eval] Scored {len(questions)} questions for table {table_id} (local stubs)"
)
if langfuse_client.enabled:
langfuse_client.ensure_dataset_synced(
dataset_name, _build_questions_payload(questions, table)
)

avg_score_contains = round(
sum(question_scores_contains) / len(question_scores_contains), 3
)
pass_count_contains = sum(1 for s in question_scores_contains if s >= 0.50)
pass_rate_contains = round(pass_count_contains / len(question_scores_contains), 3)
avg_score_exact = round(sum(question_scores_exact) / len(question_scores_exact), 3)
pass_count_exact = sum(1 for s in question_scores_exact if s >= 0.50)
pass_rate_exact = round(pass_count_exact / len(question_scores_exact), 3)
avg_score_ranking = round(
sum(question_scores_ranking) / len(question_scores_ranking), 3
)
pass_count_ranking = sum(1 for s in question_scores_ranking if s >= 0.50)
pass_rate_ranking = round(pass_count_ranking / len(question_scores_ranking), 3)
try:
req = EvalAPIRequest(tables_names=[table.name], dataset_name=dataset_name)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing null check for table before accessing its attributes.

session.get(Table, table_id) can return None if the table was deleted. However, table.name (line 165) and table.schema_name via _build_questions_payload (line 161) are accessed without a null check, which would raise AttributeError.

The null check on line 208 (if table and table.status == ...) comes too late.

🛡️ Proposed fix
     table = session.get(Table, table_id)
+    if not table:
+        logger.error(f"[Eval] Table {table_id} not found")
+        run.status = EvalStatus.failed
+        run.score = -1.0
+        session.add(run)
+        session.commit()
+        return -1.0
+
     dataset_name = f"text2sql_sandbox_{table_id}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
table = session.get(Table, table_id)
dataset_name = f"text2sql_sandbox_{table_id}"
question_scores_contains: list[float] = [
float(random.choice([0, 1])) for _ in questions
]
question_scores_exact: list[float] = [
float(random.choice([0, 1])) for _ in questions
]
question_scores_ranking: list[float] = [
float(random.choice([0, 1])) for _ in questions
]
logger.info(
f"[Eval] Scored {len(questions)} questions for table {table_id} (local stubs)"
)
if langfuse_client.enabled:
langfuse_client.ensure_dataset_synced(
dataset_name, _build_questions_payload(questions, table)
)
avg_score_contains = round(
sum(question_scores_contains) / len(question_scores_contains), 3
)
pass_count_contains = sum(1 for s in question_scores_contains if s >= 0.50)
pass_rate_contains = round(pass_count_contains / len(question_scores_contains), 3)
avg_score_exact = round(sum(question_scores_exact) / len(question_scores_exact), 3)
pass_count_exact = sum(1 for s in question_scores_exact if s >= 0.50)
pass_rate_exact = round(pass_count_exact / len(question_scores_exact), 3)
avg_score_ranking = round(
sum(question_scores_ranking) / len(question_scores_ranking), 3
)
pass_count_ranking = sum(1 for s in question_scores_ranking if s >= 0.50)
pass_rate_ranking = round(pass_count_ranking / len(question_scores_ranking), 3)
try:
req = EvalAPIRequest(tables_names=[table.name], dataset_name=dataset_name)
table = session.get(Table, table_id)
if not table:
logger.error(f"[Eval] Table {table_id} not found")
run.status = EvalStatus.failed
run.score = -1.0
session.add(run)
session.commit()
return -1.0
dataset_name = f"text2sql_sandbox_{table_id}"
if langfuse_client.enabled:
langfuse_client.ensure_dataset_synced(
dataset_name, _build_questions_payload(questions, table)
)
try:
req = EvalAPIRequest(tables_names=[table.name], dataset_name=dataset_name)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/evaluation.py` around lines 156 - 165, Add a null check
for the `table` variable immediately after the `session.get(Table, table_id)`
call on line 156. Before accessing `table.name` in the `EvalAPIRequest`
instantiation or passing `table` to `_build_questions_payload`, verify that
`table` is not `None`. If `table` is `None`, raise an appropriate exception
(such as a 404 HTTPException) to handle the case where the table was deleted.

Comment thread backend/app/routers/evaluation.py Outdated
Comment thread backend/app/routers/tables.py Outdated
Comment thread backend/app/routers/tables.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/routers/evaluation.py (1)

443-493: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consider extracting common evaluation API call and persistence logic.

The pattern of making the evaluation API call, handling errors, populating EvalRun metrics, and persisting EvalResult rows is repeated nearly identically across execute_single_table_eval, _run_production_dataset_eval, _run_candidate_eval, and _run_regression_eval.

Extracting this into a helper function would reduce duplication and make future changes to the API contract easier to maintain:

def _call_eval_service_and_persist(
    session: Session,
    run: EvalRun,
    table_names: list[str],
    dataset_name: str,
    log_prefix: str,
) -> tuple[EvalAPIOverallMetrics | None, float]:
    """Returns (metrics, score) or (None, -1.0) on failure."""
    # ... shared API call + persistence logic
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/evaluation.py` around lines 443 - 493, The evaluation API
call and persistence logic is duplicated across the functions
execute_single_table_eval, _run_production_dataset_eval, _run_candidate_eval,
and _run_regression_eval. Extract this common pattern into a helper function
that accepts the session, EvalRun object, table_names, dataset_name, and a
log_prefix parameter. The helper should encapsulate the entire flow: making the
POST request to the evaluation service, handling exceptions and logging errors,
parsing the EvalAPIResponse, updating the EvalRun metrics (score, pass_rate,
fail_rate, total_questions, status, completed_at, dimension_averages), creating
and persisting EvalResult objects from eval_resp.results, and returning a tuple
of (metrics, score) or (None, -1.0) on failure. Then replace the duplicated
blocks in each function with calls to this helper, passing the appropriate
log_prefix for context-specific logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/src/core/embeddings.py`:
- Line 23: The response parsing on the line containing json.loads is using the
variable name `data` instead of the string literal "data" when indexing into the
parsed JSON response. Change the index from the variable `data` to the string
key "data" in the json.loads result indexing to correctly extract the embedding
payload from the API response.
- Around line 20-22: Before creating the urllib.request.Request object with the
embedder_url parameter on line 20, add validation to ensure the URL scheme is
restricted to only HTTP or HTTPS. Parse the embedder_url using
urllib.parse.urlparse to extract the scheme, verify it is either 'http' or
'https', and raise an appropriate exception (such as ValueError) if any other
scheme is provided. This validation must occur before the urllib.request.Request
call to prevent potential local resource access through file:// or other custom
URL schemes.

In `@core/src/core/models/models.py`:
- Around line 32-33: The created_at and updated_at fields in the models are
generating timezone-aware UTC datetime objects, but the database migrations are
creating TIMESTAMP (naive) columns instead of TIMESTAMPTZ columns. To fix this,
update the initial migration file (01333a305f59_initial_schema.py) to use
sa.DateTime(timezone=True) instead of sa.DateTime() for all created_at and
updated_at column definitions. If this migration has already been deployed to
production, create a separate ALTER migration that converts existing TIMESTAMP
columns to TIMESTAMP WITH TIME ZONE. Additionally, verify that all subsequent
migrations (such as 781457ead87d_add_httpextractor.py and
d35412e44b34_add_foreignkeymapping.py) also use sa.DateTime(timezone=True) for
any new datetime columns to maintain consistency going forward.

---

Outside diff comments:
In `@backend/app/routers/evaluation.py`:
- Around line 443-493: The evaluation API call and persistence logic is
duplicated across the functions execute_single_table_eval,
_run_production_dataset_eval, _run_candidate_eval, and _run_regression_eval.
Extract this common pattern into a helper function that accepts the session,
EvalRun object, table_names, dataset_name, and a log_prefix parameter. The
helper should encapsulate the entire flow: making the POST request to the
evaluation service, handling exceptions and logging errors, parsing the
EvalAPIResponse, updating the EvalRun metrics (score, pass_rate, fail_rate,
total_questions, status, completed_at, dimension_averages), creating and
persisting EvalResult objects from eval_resp.results, and returning a tuple of
(metrics, score) or (None, -1.0) on failure. Then replace the duplicated blocks
in each function with calls to this helper, passing the appropriate log_prefix
for context-specific logging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: c74fd2a3-466a-480a-9109-7a632773165a

📥 Commits

Reviewing files that changed from the base of the PR and between 59cee5d and 980a76f.

📒 Files selected for processing (13)
  • agent/src/agent/nodes/schema_explorer.py
  • backend/app/routers/admin_approval.py
  • backend/app/routers/enrichment.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/health.py
  • backend/app/routers/orchestration.py
  • backend/app/routers/profiling.py
  • backend/app/routers/tables.py
  • backend/app/seed.py
  • backend/app/services/profiling_engine.py
  • backend/app/services/scheduler.py
  • core/src/core/embeddings.py
  • core/src/core/models/models.py

Comment on lines +20 to +22
req = urllib.request.Request(embedder_url, data=data, headers=headers)
try:
with urllib.request.urlopen(req, timeout=10) as res:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict embedder URL schemes to HTTP(S) before opening the request.

Line 20 and Line 22 accept any urllib scheme. Even though this URL is config-driven, allowing file:/custom schemes weakens security guarantees and can cause unexpected local resource access.

Proposed fix
 import json
 import logging
 import urllib.request
+from urllib.parse import urlparse
@@
 def get_embedding(
@@
 ) -> list[float] | None:
+    parsed = urlparse(embedder_url)
+    if parsed.scheme not in {"http", "https"}:
+        raise ValueError(f"Unsupported embedder URL scheme: {parsed.scheme}")
+
     data = json.dumps({"model": embedder_model, "input": text}).encode()
🧰 Tools
🪛 Ruff (0.15.17)

[error] 20-20: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


[error] 22-22: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/embeddings.py` around lines 20 - 22, Before creating the
urllib.request.Request object with the embedder_url parameter on line 20, add
validation to ensure the URL scheme is restricted to only HTTP or HTTPS. Parse
the embedder_url using urllib.parse.urlparse to extract the scheme, verify it is
either 'http' or 'https', and raise an appropriate exception (such as
ValueError) if any other scheme is provided. This validation must occur before
the urllib.request.Request call to prevent potential local resource access
through file:// or other custom URL schemes.

Source: Linters/SAST tools

Comment thread core/src/core/embeddings.py Outdated
Comment thread core/src/core/models/models.py Outdated
Comment on lines +32 to +33
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for any remaining datetime.utcnow() usage that wasn't updated
rg -n 'datetime\.utcnow\(\)' --type py

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

#!/bin/bash
# Check for any timezone-naive datetime comparisons or operations
rg -n 'datetime\.now\(\)' --type py -A2 -B2 | head -50

Repository: StavPonte11/text2sql-onboarding

Length of output: 463


🏁 Script executed:

#!/bin/bash
# Verify SQLModel/Alembic migrations define TIMESTAMPTZ columns
fd -e py alembic migration | xargs rg -l 'TIMESTAMP' | head -5

Repository: StavPonte11/text2sql-onboarding

Length of output: 305


🏁 Script executed:

# Find Alembic migration files
fd -e py . -path '*migration*' -o -path '*alembic*' | grep -E '(migration|alembic)' | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 244


🏁 Script executed:

# Get the full context of the datetime.now() usage in extractor.py
cat -n agent/src/agent/nodes/extractor.py | sed -n '55,75p'

Repository: StavPonte11/text2sql-onboarding

Length of output: 896


🏁 Script executed:

# Search for database schema definitions or create table statements
rg -n 'TIMESTAMP|created_at|updated_at' --type py -B2 -A2 | head -100

Repository: StavPonte11/text2sql-onboarding

Length of output: 5592


🏁 Script executed:

# Find migration files using git ls-files
git ls-files | grep -i migrat

Repository: StavPonte11/text2sql-onboarding

Length of output: 209


🏁 Script executed:

# Find any database initialization or schema definition files
git ls-files | grep -E '(alembic|migration|schema|init.*db)' | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 969


🏁 Script executed:

# Check for SQLAlchemy create_all or similar schema creation
rg -n 'create_all|CreateTable|TIMESTAMP' --type py -l | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 188


🏁 Script executed:

# Check the initial migration to see how timestamp columns are created
cat backend/alembic/versions/01333a305f59_initial_schema.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 21756


🏁 Script executed:

# Check the deployed schema
cat backend/schema.sql | grep -A 5 -B 5 'TIMESTAMP\|created_at\|updated_at' | head -100

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

# Check other migrations for timestamp-related changes
grep -n 'TIMESTAMP\|created_at\|updated_at' backend/alembic/versions/*.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 3044


Migration columns must use sa.DateTime(timezone=True) to create TIMESTAMPTZ columns.

The database schema created by 01333a305f59_initial_schema.py uses sa.DateTime() which creates PostgreSQL TIMESTAMP (naive) columns. However, the models generate timezone-aware UTC datetime objects with datetime.now(timezone.utc). This creates a critical type mismatch:

  1. Update the initial migration to use sa.DateTime(timezone=True) for all created_at/updated_at columns to create TIMESTAMP WITH TIME ZONE (timestamptz) in PostgreSQL
  2. Create an ALTER migration if the initial migration has already been applied to production databases, converting existing TIMESTAMP columns to TIMESTAMPTZ
  3. Verify all subsequent migrations (e.g., 781457ead87d_add_httpextractor.py, d35412e44b34_add_foreignkeymapping.py) also use the corrected DateTime definition going forward

The naive datetime.now() in agent/src/agent/nodes/extractor.py:63 is only used for display and does not interact with database timestamps, so it poses no compatibility risk.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/models/models.py` around lines 32 - 33, The created_at and
updated_at fields in the models are generating timezone-aware UTC datetime
objects, but the database migrations are creating TIMESTAMP (naive) columns
instead of TIMESTAMPTZ columns. To fix this, update the initial migration file
(01333a305f59_initial_schema.py) to use sa.DateTime(timezone=True) instead of
sa.DateTime() for all created_at and updated_at column definitions. If this
migration has already been deployed to production, create a separate ALTER
migration that converts existing TIMESTAMP columns to TIMESTAMP WITH TIME ZONE.
Additionally, verify that all subsequent migrations (such as
781457ead87d_add_httpextractor.py and d35412e44b34_add_foreignkeymapping.py)
also use sa.DateTime(timezone=True) for any new datetime columns to maintain
consistency going forward.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/routers/admin_approval.py`:
- Line 2: In the admin_approval.py file, locate the lines 181 and 219 where
`updated_at` is being assigned with naive datetime values in the approval and
rejection flows. Replace these naive datetime calls (such as datetime.now()
without timezone info) with UTC-aware datetime values (such as
datetime.now(timezone.utc) or datetime.utcnow()) to ensure status transitions
remain comparable across services and environments.

In `@backend/app/routers/evaluation.py`:
- Line 17: The import statement on line 17 is incomplete for timezone-aware UTC
timestamps. Update the import to include timezone from the datetime module,
changing from just importing datetime to also importing timezone. Then, replace
all occurrences of datetime.now() (at lines 189, 303, 386, and 470) with
datetime.now(timezone.utc) to ensure timezone-aware UTC timestamps are used
throughout the file instead of naive local time.

In `@backend/app/routers/orchestration.py`:
- Line 14: Update the import statement at the top of the file to include
timezone from the datetime module (add timezone to the existing datetime
import). Then, locate all instances of datetime.now() calls in the trends window
and today counter logic (specifically around lines 467 and 780) and replace them
with datetime.now(timezone.utc) to ensure all analytics boundaries use UTC-based
time references instead of local-naive datetime values.

In `@backend/app/routers/profiling.py`:
- Line 10: The import statement at the top of the file only imports datetime and
timedelta, but UTC-aware timestamps require the timezone object. Update the
import statement to include timezone from the datetime module, then replace the
datetime.now() calls used for cached_until and updated_at fields (around lines
89-90) with datetime.now(timezone.utc) to ensure all timestamps are UTC-aware
and consistent across different deployment environments.

In `@backend/app/routers/tables.py`:
- Line 2: The import statement on line 2 is missing the timezone module needed
for timezone-aware timestamps. Update the import to include UTC from the
datetime module, then locate the three datetime.now() calls at lines 251, 308,
and 388 where updated_at fields are being persisted to the database, and modify
each of these calls to use datetime.now(UTC) instead of datetime.now() to ensure
timezone-aware UTC timestamps are stored consistently.

In `@backend/app/services/profiling_engine.py`:
- Line 12: The import statement on line 12 needs to include `timezone` from the
datetime module to support creating UTC-aware timestamps. Update the import
statement to add `timezone` alongside `date` and `datetime`. Then locate the
`datetime.now()` call at line 868 (in the context where `computed_at` is being
set) and replace it with `datetime.now(timezone.utc)` to ensure the resulting
timestamp includes timezone information that will be properly serialized when
`.isoformat()` is called at line 995.

In `@backend/app/services/scheduler.py`:
- Line 7: The scheduler is writing naive datetime timestamps for last_run_at
using datetime.now(), which can cause issues in distributed systems. Update the
import statement to include timezone from the datetime module, then locate where
last_run_at is assigned with datetime.now() and change it to
datetime.now(timezone.utc) to ensure UTC-aware timestamps are properly
persisted.

In `@core/src/core/models/models.py`:
- Line 2: Replace all instances of `default_factory=datetime.now` with a
UTC-aware alternative such as `default_factory=datetime.utcnow` or a custom
`utc_now` factory function across all timestamp fields in the models file. The
affected timestamp field definitions are located at the specified line ranges
and use `default_factory=datetime.now` as their default value initialization.
Ensure every timestamp field that currently uses `datetime.now` is updated to
use the UTC-aware factory to maintain consistency with UTC standardization
across all 22 timestamp fields in this file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: fd4038a1-cd42-4dc7-bbf4-b10fc4cb6e8c

📥 Commits

Reviewing files that changed from the base of the PR and between 980a76f and 422b05d.

📒 Files selected for processing (11)
  • backend/app/routers/admin_approval.py
  • backend/app/routers/enrichment.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/health.py
  • backend/app/routers/orchestration.py
  • backend/app/routers/profiling.py
  • backend/app/routers/tables.py
  • backend/app/services/profiling_engine.py
  • backend/app/services/scheduler.py
  • core/src/core/embeddings.py
  • core/src/core/models/models.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/routers/admin_approval.py`:
- Line 2: In the admin_approval.py file, locate the lines 181 and 219 where
`updated_at` is being assigned with naive datetime values in the approval and
rejection flows. Replace these naive datetime calls (such as datetime.now()
without timezone info) with UTC-aware datetime values (such as
datetime.now(timezone.utc) or datetime.utcnow()) to ensure status transitions
remain comparable across services and environments.

In `@backend/app/routers/evaluation.py`:
- Line 17: The import statement on line 17 is incomplete for timezone-aware UTC
timestamps. Update the import to include timezone from the datetime module,
changing from just importing datetime to also importing timezone. Then, replace
all occurrences of datetime.now() (at lines 189, 303, 386, and 470) with
datetime.now(timezone.utc) to ensure timezone-aware UTC timestamps are used
throughout the file instead of naive local time.

In `@backend/app/routers/orchestration.py`:
- Line 14: Update the import statement at the top of the file to include
timezone from the datetime module (add timezone to the existing datetime
import). Then, locate all instances of datetime.now() calls in the trends window
and today counter logic (specifically around lines 467 and 780) and replace them
with datetime.now(timezone.utc) to ensure all analytics boundaries use UTC-based
time references instead of local-naive datetime values.

In `@backend/app/routers/profiling.py`:
- Line 10: The import statement at the top of the file only imports datetime and
timedelta, but UTC-aware timestamps require the timezone object. Update the
import statement to include timezone from the datetime module, then replace the
datetime.now() calls used for cached_until and updated_at fields (around lines
89-90) with datetime.now(timezone.utc) to ensure all timestamps are UTC-aware
and consistent across different deployment environments.

In `@backend/app/routers/tables.py`:
- Line 2: The import statement on line 2 is missing the timezone module needed
for timezone-aware timestamps. Update the import to include UTC from the
datetime module, then locate the three datetime.now() calls at lines 251, 308,
and 388 where updated_at fields are being persisted to the database, and modify
each of these calls to use datetime.now(UTC) instead of datetime.now() to ensure
timezone-aware UTC timestamps are stored consistently.

In `@backend/app/services/profiling_engine.py`:
- Line 12: The import statement on line 12 needs to include `timezone` from the
datetime module to support creating UTC-aware timestamps. Update the import
statement to add `timezone` alongside `date` and `datetime`. Then locate the
`datetime.now()` call at line 868 (in the context where `computed_at` is being
set) and replace it with `datetime.now(timezone.utc)` to ensure the resulting
timestamp includes timezone information that will be properly serialized when
`.isoformat()` is called at line 995.

In `@backend/app/services/scheduler.py`:
- Line 7: The scheduler is writing naive datetime timestamps for last_run_at
using datetime.now(), which can cause issues in distributed systems. Update the
import statement to include timezone from the datetime module, then locate where
last_run_at is assigned with datetime.now() and change it to
datetime.now(timezone.utc) to ensure UTC-aware timestamps are properly
persisted.

In `@core/src/core/models/models.py`:
- Line 2: Replace all instances of `default_factory=datetime.now` with a
UTC-aware alternative such as `default_factory=datetime.utcnow` or a custom
`utc_now` factory function across all timestamp fields in the models file. The
affected timestamp field definitions are located at the specified line ranges
and use `default_factory=datetime.now` as their default value initialization.
Ensure every timestamp field that currently uses `datetime.now` is updated to
use the UTC-aware factory to maintain consistency with UTC standardization
across all 22 timestamp fields in this file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: fd4038a1-cd42-4dc7-bbf4-b10fc4cb6e8c

📥 Commits

Reviewing files that changed from the base of the PR and between 980a76f and 422b05d.

📒 Files selected for processing (11)
  • backend/app/routers/admin_approval.py
  • backend/app/routers/enrichment.py
  • backend/app/routers/evaluation.py
  • backend/app/routers/health.py
  • backend/app/routers/orchestration.py
  • backend/app/routers/profiling.py
  • backend/app/routers/tables.py
  • backend/app/services/profiling_engine.py
  • backend/app/services/scheduler.py
  • core/src/core/embeddings.py
  • core/src/core/models/models.py
🛑 Comments failed to post (8)
backend/app/routers/admin_approval.py (1)

2-2: ⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP 'updated_at\s*=\s*datetime\.now\(\)' backend/app/routers/admin_approval.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 159


🏁 Script executed:

# Check the imports and the exact lines mentioned
head -20 backend/app/routers/admin_approval.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 566


🏁 Script executed:

# View the specific lines where datetime.now() is used
sed -n '175,185p' backend/app/routers/admin_approval.py
sed -n '215,225p' backend/app/routers/admin_approval.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 799


🏁 Script executed:

# Check if timezone is already used elsewhere in this file
rg 'timezone|utc' backend/app/routers/admin_approval.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

# Check how updated_at is handled elsewhere in the codebase
rg 'updated_at.*=' backend/app/ -A 2 -B 2 | head -60

Repository: StavPonte11/text2sql-onboarding

Length of output: 2748


Preserve UTC-aware updated_at writes in approval/rejection flows.

Lines 181 and 219 use naive datetime values. Replace with UTC-aware timestamps so status transitions remain comparable across services and environments.

Suggested patch
-from datetime import datetime
+from datetime import datetime, timezone
...
-    table.updated_at = datetime.now()
+    table.updated_at = datetime.now(timezone.utc)
...
-    table.updated_at = datetime.now()
+    table.updated_at = datetime.now(timezone.utc)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/admin_approval.py` at line 2, In the admin_approval.py
file, locate the lines 181 and 219 where `updated_at` is being assigned with
naive datetime values in the approval and rejection flows. Replace these naive
datetime calls (such as datetime.now() without timezone info) with UTC-aware
datetime values (such as datetime.now(timezone.utc) or datetime.utcnow()) to
ensure status transitions remain comparable across services and environments.
backend/app/routers/evaluation.py (1)

17-17: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

datetime.now() returns naive local time, not timezone-aware UTC.

The import only includes datetime without timezone. Using datetime.now() produces a naive datetime in the system's local timezone, which contradicts the PR's stated goal of "timezone-aware UTC standardization." This will cause timestamp inconsistencies across servers in different timezones.

🐛 Proposed fix

Update the import:

-from datetime import datetime
+from datetime import datetime, timezone

Then at line 189 (and similarly at lines 303, 386, 470):

-    run.completed_at = datetime.now()
+    run.completed_at = datetime.now(timezone.utc)

Also applies to: 189-189

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/evaluation.py` at line 17, The import statement on line
17 is incomplete for timezone-aware UTC timestamps. Update the import to include
timezone from the datetime module, changing from just importing datetime to also
importing timezone. Then, replace all occurrences of datetime.now() (at lines
189, 303, 386, and 470) with datetime.now(timezone.utc) to ensure timezone-aware
UTC timestamps are used throughout the file instead of naive local time.
backend/app/routers/orchestration.py (1)

14-14: ⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP 'datetime\.now\(\)\s*-\s*timedelta|datetime\.now\(\)\.date\(\)' backend/app/routers/orchestration.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 213


🏁 Script executed:

# Check the imports section at the top of the file
head -30 backend/app/routers/orchestration.py | cat -n

Repository: StavPonte11/text2sql-onboarding

Length of output: 1186


🏁 Script executed:

# Check context around line 467
sed -n '460,475p' backend/app/routers/orchestration.py | cat -n

Repository: StavPonte11/text2sql-onboarding

Length of output: 748


🏁 Script executed:

# Check context around line 780
sed -n '773,787p' backend/app/routers/orchestration.py | cat -n

Repository: StavPonte11/text2sql-onboarding

Length of output: 855


🏁 Script executed:

# Search for all datetime.now() usages in the file (not just the two patterns from the original regex)
rg -n 'datetime\.now\(\)' backend/app/routers/orchestration.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 213


Use UTC reference time for trends window and "today" counters.

The import at line 14 does not include timezone, and both line 467 and line 780 use local-naive datetime.now(). These analytics boundaries should be UTC-based to avoid timezone-dependent drift in returned metrics.

Suggested patch
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone
...
-    since = datetime.now() - timedelta(days=days)
+    since = datetime.now(timezone.utc) - timedelta(days=days)
...
-            1 for r in all_runs if r.created_at.date() == datetime.now().date()
+            1 for r in all_runs if r.created_at.date() == datetime.now(timezone.utc).date()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/orchestration.py` at line 14, Update the import statement
at the top of the file to include timezone from the datetime module (add
timezone to the existing datetime import). Then, locate all instances of
datetime.now() calls in the trends window and today counter logic (specifically
around lines 467 and 780) and replace them with datetime.now(timezone.utc) to
ensure all analytics boundaries use UTC-based time references instead of
local-naive datetime values.
backend/app/routers/profiling.py (1)

10-10: ⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP 'cached_until\s*=\s*datetime\.now\(\)|updated_at\s*=\s*datetime\.now\(\)' backend/app/routers/profiling.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 193


🏁 Script executed:

cat -n backend/app/routers/profiling.py | head -100 | tail -30

Repository: StavPonte11/text2sql-onboarding

Length of output: 1430


🏁 Script executed:

rg -n 'datetime\.now\(\)' backend/app/routers/profiling.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 193


🏁 Script executed:

rg -rn 'datetime\.now\(\)' backend/app --type py | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 1196


🏁 Script executed:

head -15 backend/app/routers/profiling.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 478


🏁 Script executed:

grep -n "class TableProfile" backend/app -r

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

cat backend/app/models/profiling.py | head -50

Repository: StavPonte11/text2sql-onboarding

Length of output: 139


🏁 Script executed:

find backend -name "*.py" -type f | xargs grep -l "class TableProfile"

Repository: StavPonte11/text2sql-onboarding

Length of output: 59


🏁 Script executed:

grep -n "updated_at\|cached_until" backend/app -r --type py | grep -E "(Column|Table)Profile" | head -10

Repository: StavPonte11/text2sql-onboarding

Length of output: 193


🏁 Script executed:

find backend -name "*.py" -type f | xargs grep -l "class TableProfile"

Repository: StavPonte11/text2sql-onboarding

Length of output: 59


🏁 Script executed:

rg -n "class TableProfile" backend

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

rg -n "updated_at|cached_until" backend/app -A 2 -B 2

Repository: StavPonte11/text2sql-onboarding

Length of output: 3294


Use UTC-aware timestamps for cached_until and updated_at at lines 89–90.

Replace datetime.now() with datetime.now(timezone.utc) or datetime.utcnow(). Naive timestamps cause cache validity and audit fields to become environment-dependent across deployments.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/profiling.py` at line 10, The import statement at the top
of the file only imports datetime and timedelta, but UTC-aware timestamps
require the timezone object. Update the import statement to include timezone
from the datetime module, then replace the datetime.now() calls used for
cached_until and updated_at fields (around lines 89-90) with
datetime.now(timezone.utc) to ensure all timestamps are UTC-aware and consistent
across different deployment environments.
backend/app/routers/tables.py (1)

2-2: ⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify naive timestamp writes in this file.
rg -nP '\bdatetime\.now\(\)' backend/app/routers/tables.py
rg -nP '\bdatetime\.now\(UTC\)' backend/app/routers/tables.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 208


🏁 Script executed:

head -20 backend/app/routers/tables.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 580


🏁 Script executed:

python3 - <<'PY'
# Check if UTC is importable from datetime
import sys
print(f"Python version: {sys.version_info}")
try:
    from datetime import UTC
    print("UTC is importable from datetime")
except ImportError:
    print("UTC is NOT importable from datetime")
PY

Repository: StavPonte11/text2sql-onboarding

Length of output: 200


🏁 Script executed:

# Check the complete import section and surrounding context for datetime usage
sed -n '1,15p' backend/app/routers/tables.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 435


🏁 Script executed:

# Full search for all datetime.now() calls to ensure nothing was missed
rg -n 'datetime\.now\(' backend/app/routers/tables.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 208


🏁 Script executed:

# View the context around the three identified lines
sed -n '248,253p' backend/app/routers/tables.py
sed -n '305,310p' backend/app/routers/tables.py
sed -n '385,390p' backend/app/routers/tables.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 613


Restore timezone-aware UTC for persisted updated_at fields.

Line 2 imports only datetime without UTC. Lines 251, 308, and 388 persist datetime.now() (naive local time) to database fields. This causes timestamp consistency issues when integrating with timezone-aware systems.

Proposed fix
-from datetime import datetime
+from datetime import UTC, datetime
@@
-    table.updated_at = datetime.now()
+    table.updated_at = datetime.now(UTC)
@@
-    table.updated_at = datetime.now()
+    table.updated_at = datetime.now(UTC)
@@
-        existing.updated_at = datetime.now()
+        existing.updated_at = datetime.now(UTC)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/routers/tables.py` at line 2, The import statement on line 2 is
missing the timezone module needed for timezone-aware timestamps. Update the
import to include UTC from the datetime module, then locate the three
datetime.now() calls at lines 251, 308, and 388 where updated_at fields are
being persisted to the database, and modify each of these calls to use
datetime.now(UTC) instead of datetime.now() to ensure timezone-aware UTC
timestamps are stored consistently.
backend/app/services/profiling_engine.py (1)

12-12: ⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP 'computed_at\s*=\s*datetime\.now\(\)' backend/app/services/profiling_engine.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 112


🏁 Script executed:

#!/bin/bash
# Get context around line 868 to see how computed_at is used
sed -n '860,880p' backend/app/services/profiling_engine.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 748


🏁 Script executed:

#!/bin/bash
# Search for profile_json["computed_at"] to see how it's serialized
rg -n 'profile_json.*computed_at|computed_at.*isoformat' backend/app/services/profiling_engine.py -A 2 -B 2

Repository: StavPonte11/text2sql-onboarding

Length of output: 295


🏁 Script executed:

#!/bin/bash
# Check if datetime.now() is used elsewhere and how UTC-aware alternatives are used
rg -n 'datetime\.now\(\)|datetime\.utcnow\(\)|timezone|UTC' backend/app/services/profiling_engine.py | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 112


🏁 Script executed:

#!/bin/bash
# Search for timezone-related patterns in the entire backend
rg -l 'timezone|utc|UTC|datetime\.utcnow|pytz|zoneinfo' backend/ --type py | head -10

Repository: StavPonte11/text2sql-onboarding

Length of output: 175


🏁 Script executed:

#!/bin/bash
# Check what the profile result structure expects for computed_at
rg -n 'class TableProfilingResult|computed_at.*:' backend/ --type py -A 3 | head -30

Repository: StavPonte11/text2sql-onboarding

Length of output: 991


Use UTC-aware timestamp for computed_at to include timezone context in JSON serialization.

Line 868 uses datetime.now(), which creates a naive datetime object. When serialized via .isoformat() at line 995, this produces an ambiguous timestamp string without UTC offset information. Downstream consumers cannot determine the timezone of the timestamp. Use datetime.now(timezone.utc) or datetime.utcnow().replace(tzinfo=timezone.utc) instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/profiling_engine.py` at line 12, The import statement on
line 12 needs to include `timezone` from the datetime module to support creating
UTC-aware timestamps. Update the import statement to add `timezone` alongside
`date` and `datetime`. Then locate the `datetime.now()` call at line 868 (in the
context where `computed_at` is being set) and replace it with
`datetime.now(timezone.utc)` to ensure the resulting timestamp includes timezone
information that will be properly serialized when `.isoformat()` is called at
line 995.
backend/app/services/scheduler.py (1)

7-7: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP 'last_run_at\s*=\s*datetime\.now\(\)' backend/app/services/scheduler.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 124


🏁 Script executed:

#!/bin/bash
# Get full context around line 70
sed -n '50,90p' backend/app/services/scheduler.py | cat -n

Repository: StavPonte11/text2sql-onboarding

Length of output: 1826


🏁 Script executed:

#!/bin/bash
# Check if schedule is an ORM model and how last_run_at is defined
rg -n 'class.*Schedule|last_run_at|DateTime|Column' backend/app/services/scheduler.py | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 124


🏁 Script executed:

#!/bin/bash
# Search for schedule model definition across the backend
fd -e py | xargs rg -l 'class.*Schedule' | grep -E '(model|models|schema)' | head -10

Repository: StavPonte11/text2sql-onboarding

Length of output: 106


🏁 Script executed:

#!/bin/bash
# Look for all datetime.now() calls in the scheduler file to check for consistency
rg -n 'datetime\.now\(\)' backend/app/services/scheduler.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 124


🏁 Script executed:

#!/bin/bash
# Find and examine the EvaluationSchedule model definition
rg -n 'class EvaluationSchedule|last_run_at' core/src/core/models/models.py | head -30

Repository: StavPonte11/text2sql-onboarding

Length of output: 344


🏁 Script executed:

#!/bin/bash
# Get the full EvaluationSchedule class definition
grep -n 'class EvaluationSchedule' core/src/core/models/models.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 263


🏁 Script executed:

#!/bin/bash
# Read the models.py file to see EvaluationSchedule and last_run_at field
sed -n '1,200p' core/src/core/models/models.py | grep -A 30 'class EvaluationSchedule'

Repository: StavPonte11/text2sql-onboarding

Length of output: 248


🏁 Script executed:

#!/bin/bash
# Check imports in scheduler.py to see what datetime tools are available
head -15 backend/app/services/scheduler.py | cat -n

Repository: StavPonte11/text2sql-onboarding

Length of output: 635


🏁 Script executed:

#!/bin/bash
# Check if timezone module is imported anywhere in the scheduler or models
rg -n 'import.*timezone|from.*timezone|datetime.timezone|pytz' backend/app/services/scheduler.py core/src/core/models/models.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

#!/bin/bash
# Get more context around the EvaluationSchedule model to see all fields and how datetime is imported there
sed -n '1,220p' core/src/core/models/models.py | tail -50

Repository: StavPonte11/text2sql-onboarding

Length of output: 1640


🏁 Script executed:

#!/bin/bash
# Check imports at the top of models.py to understand datetime handling
head -30 core/src/core/models/models.py

Repository: StavPonte11/text2sql-onboarding

Length of output: 929


🏁 Script executed:

#!/bin/bash
# Search for datetime usage patterns in models.py - are there other datetime fields using UTC?
rg -n 'datetime|UTC|utc|timezone' core/src/core/models/models.py | head -20

Repository: StavPonte11/text2sql-onboarding

Length of output: 981


🏁 Script executed:

#!/bin/bash
# Check if there are other datetime.now() calls anywhere in the codebase that might show the pattern
rg -n 'datetime\.now\(\)' backend/ core/ --include='*.py' | head -15

Repository: StavPonte11/text2sql-onboarding

Length of output: 517


🏁 Script executed:

#!/bin/bash
# Verify Python version and utcnow() deprecation status by checking what's recommended
rg -n 'utcnow|timezone.utc' backend/ core/ --type py

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


🏁 Script executed:

#!/bin/bash
# Check if the schedule model uses any SQLAlchemy DateTime type with timezone parameter
sed -n '197,215p' core/src/core/models/models.py | cat -n

Repository: StavPonte11/text2sql-onboarding

Length of output: 977


🏁 Script executed:

#!/bin/bash
# Verify if SQLModel/SQLAlchemy DateTime type is used anywhere with timezone configuration
rg -n 'DateTime|sa_column.*DateTime' core/src/core/models/models.py backend/

Repository: StavPonte11/text2sql-onboarding

Length of output: 3398


🏁 Script executed:

#!/bin/bash
# Check what timezone library imports exist in the codebase
fd -e py -x grep -l 'from datetime import\|import datetime' core/ backend/ | xargs grep -h 'from datetime\|import datetime\|timezone' | sort -u

Repository: StavPonte11/text2sql-onboarding

Length of output: 4446


🏁 Script executed:

#!/bin/bash
# Check the database timezone configuration
cat backend/alembic/versions/01333a305f59_initial_schema.py | grep -A 10 'evaluation_schedules'

Repository: StavPonte11/text2sql-onboarding

Length of output: 1348


🏁 Script executed:

#!/bin/bash
# Check if there's any database-level timezone configuration
rg -n 'timezone|UTC' backend/alembic/env.py backend/alembic/script.py.mako

Repository: StavPonte11/text2sql-onboarding

Length of output: 57


Use timezone-aware UTC for persisted scheduler timestamps.

Line 70 writes last_run_at using naive datetime.now(). For a distributed system with schedule persistence, this should use UTC-aware timestamps: datetime.now(timezone.utc). This ensures deterministic schedule history and correct cross-node interpretation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/scheduler.py` at line 7, The scheduler is writing naive
datetime timestamps for last_run_at using datetime.now(), which can cause issues
in distributed systems. Update the import statement to include timezone from the
datetime module, then locate where last_run_at is assigned with datetime.now()
and change it to datetime.now(timezone.utc) to ensure UTC-aware timestamps are
properly persisted.
core/src/core/models/models.py (1)

2-2: ⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify naive timestamp default factories in this model file
rg -nP 'default_factory\s*=\s*datetime\.now\b' core/src/core/models/models.py

# Verify UTC-aware factories after fix (expected >0 once patched)
rg -nP 'default_factory\s*=\s*(utc_now|lambda:\s*datetime\.now\(timezone\.utc\))' core/src/core/models/models.py

# Inspect declared Python toolchain to keep timezone API choice compatible
fd -a 'pyproject.toml|\.python-version|\.tool-versions|Dockerfile' | \
  xargs -r -I{} sh -c 'echo "--- {} ---"; sed -n "1,120p" "{}"'

Repository: StavPonte11/text2sql-onboarding

Length of output: 8287


Use UTC-aware defaults; datetime.now returns timezone-naive local time, not UTC.

The 22 timestamp fields using default_factory=datetime.now store local-naive datetimes instead of UTC-aware values. This contradicts the stated UTC standardization and causes persisted timestamps to shift by host timezone offset, breaking ordering and filtering across deployments.

Suggested fix pattern
-from datetime import datetime
+from datetime import datetime, timezone
+
+def utc_now() -> datetime:
+    return datetime.now(timezone.utc)

 class Table(SQLModel, table=True):
-    created_at: datetime = Field(default_factory=datetime.now)
-    updated_at: datetime = Field(default_factory=datetime.now)
+    created_at: datetime = Field(default_factory=utc_now)
+    updated_at: datetime = Field(default_factory=utc_now)

Apply the same utc_now factory to all changed timestamp defaults in this file.

Affected lines: 32–33, 66, 112, 158, 167, 208, 257, 296, 396, 459–460, 514, 555, 599, 655, 688–689, 722–723, 761–762.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/core/models/models.py` at line 2, Replace all instances of
`default_factory=datetime.now` with a UTC-aware alternative such as
`default_factory=datetime.utcnow` or a custom `utc_now` factory function across
all timestamp fields in the models file. The affected timestamp field
definitions are located at the specified line ranges and use
`default_factory=datetime.now` as their default value initialization. Ensure
every timestamp field that currently uses `datetime.now` is updated to use the
UTC-aware factory to maintain consistency with UTC standardization across all 22
timestamp fields in this file.

@yuvalkh yuvalkh merged commit 34ad829 into main Jun 22, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant