Skip to content

feat: implement API concurrency fix strategy by converting async rout…#655

Merged
jirhiker merged 1 commit into
stagingfrom
jir-preformance
Apr 15, 2026
Merged

feat: implement API concurrency fix strategy by converting async rout…#655
jirhiker merged 1 commit into
stagingfrom
jir-preformance

Conversation

@jirhiker

Copy link
Copy Markdown
Member

…e handlers to sync and enhancing error handling

Why

This PR addresses the following problem / context:

  • Use bullet points here

How

Implementation summary - the following was changed / added / removed:

  • Use bullet points here

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Use bullet points here

…e handlers to sync and enhancing error handling
Copilot AI review requested due to automatic review settings April 15, 2026 21:34
@jirhiker jirhiker merged commit b240b21 into staging Apr 15, 2026
11 checks passed
@jirhiker jirhiker deleted the jir-preformance branch April 15, 2026 21:37

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to mitigate a FastAPI concurrency issue caused by running synchronous SQLAlchemy ORM work inside async def route handlers, by converting DB-bound routes to synchronous def handlers and hardening some database error handling. It also adjusts deployment workflows to run with more Gunicorn workers and adds an ADR documenting the strategy.

Changes:

  • Convert many API endpoints from async def to def so synchronous DB work runs off the event loop.
  • Broaden DB exception handling (e.g., handle IntegrityError in addition to ProgrammingError) and make error-message extraction driver-tolerant.
  • Add an ADR describing the concurrency strategy and add/adjust CD workflows (including worker-count changes).

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/test_nma_chemistry_lineage.py Updates test comments around NOT NULL violation error surfaces.
tests/integration/test_nma_legacy_relationships.py Updates a legacy test fixture value and adjusts driver-agnostic error comment.
api/thing.py Converts routes to sync def; expands DB error handler and caught exception types.
api/sensor.py Converts CRUD-style sensor endpoints to sync def.
api/search.py Converts search endpoint to sync def.
api/sample.py Converts sample endpoints to sync def; expands DB error handling to include IntegrityError.
api/publication.py Converts publication create endpoint to sync def.
api/observation.py Converts most observation endpoints to sync def (bulk upload remains async).
api/ngwmn.py Converts NGWMN XML endpoints to sync def.
api/location.py Converts location endpoints to sync def.
api/lexicon.py Converts lexicon endpoints to sync def (many already disabled/deprecated).
api/group.py Converts group endpoints to sync def.
api/geospatial.py Converts geospatial endpoints to sync def.
api/geochronology.py Converts geochronology endpoints to sync def.
api/contact.py Converts contact endpoints to sync def; expands DB error handling to include IntegrityError.
api/author.py Converts author publications endpoint to sync def.
ADR2.md Adds ADR documenting the concurrency issue and two-phase remediation plan.
.github/workflows/CD_testing.yml Adds a new CD workflow for testing deploys.
.github/workflows/CD_staging.yml Increases Gunicorn worker count in staging deploy entrypoint.
.github/workflows/CD_production.yml Increases Gunicorn worker count in production deploy entrypoint.
Comments suppressed due to low confidence (3)

api/thing.py:335

  • This module defines get_thing_id_links multiple times (same Python function name for different routes). FastAPI uses the function name to derive operationId; duplicate names can lead to OpenAPI collisions/overwrites and client generation issues. Rename these handlers to unique, descriptive names (e.g., list vs get-by-id vs list-by-thing).
@router.get(
    "/id-link",
    summary="Get all thing links",
)
def get_thing_id_links(
    user: viewer_dependency,
    session: session_dependency,
    filter_: str = Query(alias="filter", default=None),
    sort: str = None,
    order: str = None,
) -> CustomPage[ThingIdLinkResponse]:
    """
    Retrieve all thing links, optionally filtered and sorted.
    """
    sql = select(ThingIdLink)
    sql = order_sort_filter(sql, ThingIdLink, sort=sort, order=order, filter_=filter_)

    return paginate(query=sql, conn=session)

api/thing.py:347

  • This route handler reuses the same Python name get_thing_id_links as another endpoint above. Even though the decorator registers routes at definition time, duplicate function names can produce duplicate operationIds in the generated OpenAPI schema. Use a unique function name for this specific route (e.g., get_thing_id_link_by_id).
@public_route
@router.get("/id-link/{link_id}", summary="Get thing links by link ID")
def get_thing_id_links(
    user: viewer_dependency,
    link_id: int,
    session: session_dependency,
) -> ThingIdLinkResponse:
    """
    Retrieve all links for a specific thing by its ID.
    """
    return simple_get_by_id(session, ThingIdLink, link_id)

api/thing.py:405

  • This endpoint is also named get_thing_id_links, duplicating other handlers in the same module. Duplicate handler names can cause OpenAPI operationId conflicts. Rename to a unique name that reflects the behavior (e.g., list_thing_id_links_for_thing).
@router.get("/{thing_id}/id-link", summary="Get thing links by thing ID")
def get_thing_id_links(
    user: viewer_dependency,
    thing_id: int,
    session: session_dependency,
) -> CustomPage[ThingIdLinkResponse]:
    """
    Retrieve all links for a specific thing by its ID.
    """
    thing = simple_get_by_id(session, Thing, thing_id)
    sql = select(ThingIdLink).where(ThingIdLink.thing_id == thing.id)
    return paginate(query=sql, conn=session)

Comment on lines 89 to 92
export MAX_INSTANCES="10"
export SERVICE_NAME="ocotillo-api-staging"
export ENTRYPOINT="gunicorn -w 1 -k uvicorn.workers.UvicornWorker main:app"
export ENTRYPOINT="gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app"
export MIN_INSTANCES="0"

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumping Gunicorn workers from 1 to 4 increases process count and typically multiplies DB connections/pool usage (each worker has its own SQLAlchemy engine/pool) and memory. Consider making the worker count configurable via an env var (with a safe default) and verify Cloud SQL connection limits/pool sizing are compatible with the increased concurrency.

Copilot uses AI. Check for mistakes.
Comment on lines 89 to 92
export MAX_INSTANCES="10"
export SERVICE_NAME="ocotillo-api"
export ENTRYPOINT="gunicorn -w 1 -k uvicorn.workers.UvicornWorker main:app"
export ENTRYPOINT="gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app"
export MIN_INSTANCES="0"

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumping Gunicorn workers from 1 to 4 increases process count and can significantly raise DB connection usage (each worker typically maintains its own pool) and memory footprint. Consider making the worker count configurable and confirm Cloud SQL connection limits/pool settings are aligned with this change before rolling out to production.

Copilot uses AI. Check for mistakes.
Comment thread api/thing.py
Comment on lines +133 to +139
else:
detail = {
"loc": ["body"],
"msg": error_message,
"type": "value_error",
"input": {},
}

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database_error_handler falls back to returning the raw DB/driver error string to clients (the else branch). For IntegrityError this often includes SQL fragments and bound parameters, which can leak implementation details and potentially sensitive data. Consider returning a generic message (and logging the full exception server-side) or mapping common constraint types to safe, user-facing messages instead of echoing error_message verbatim.

Copilot uses AI. Check for mistakes.
Comment thread api/sample.py
Comment on lines +81 to +87
else:
detail = {
"loc": ["body"],
"msg": error_message,
"type": "value_error",
"input": {},
}

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database_error_handler falls back to returning the raw DB/driver error string to clients. For IntegrityError, the string frequently includes SQL and parameters; consider returning a generic message (and logging the full exception) rather than exposing internal DB error text in the HTTP response.

Copilot uses AI. Check for mistakes.
Comment thread api/contact.py
Comment on lines +109 to +115
else:
detail = {
"loc": ["body"],
"msg": error_message,
"type": "value_error",
"input": {},
}

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database_error_handler falls back to returning the raw DB/driver error string to clients. This may leak SQL details/parameters for IntegrityError and is generally unsafe. Prefer a generic message and log the original exception, or map known constraint names to safe, user-facing validation errors.

Copilot uses AI. Check for mistakes.
Comment thread api/geochronology.py
Comment on lines 38 to 41
@router.get("/age", tags=["geochronology"])
async def get_geochronology_age(
def get_geochronology_age(
user: viewer_dependency, session: session_dependency, method: str = "arar"
):

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside this handler the query result is returned via session.scalar(sql).all(), but Session.scalar() returns a single scalar value (not a result collection), so .all() will fail at runtime. Use session.scalars(sql).all() (or session.execute(sql).scalars().all()) if you intend to return all rows.

Copilot uses AI. Check for mistakes.
Comment thread ADR2.md
Comment on lines +6 to +8

The API uses synchronous SQLAlchemy sessions backed by `psycopg`. When those sessions are consumed from `async def` route handlers, blocking database work runs on the event loop thread if the handlers call synchronous ORM helpers directly. The lowest-risk immediate fix is to convert database-bound route handlers that do not perform asynchronous work into plain `def`. The longer-term fix is to introduce a real async SQLAlchemy stack and migrate the affected handlers and helpers incrementally.

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ADR2 states the API uses synchronous SQLAlchemy sessions backed by psycopg, but db/engine.py constructs postgresql+pg8000:// URLs for both Cloud SQL and local Postgres. To avoid misleading maintainers, update the ADR to reflect the actual driver(s) in use (or explain when/where psycopg is used).

Copilot uses AI. Check for mistakes.
Comment on lines +154 to +156
git tag -a "testing-deploy-$(date -u +%Y-%m-%d)T$(date -u +%H-%M-%S%z)" -m "testing gcloud deployment: $
(date
-u +%Y-%m-%d)T$(date -u +%H:%M:%S%z)"

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tag message construction in this step appears syntactically broken: the -m string contains a literal $ followed by a newline, and the date command is split across lines. This will likely cause the shell to error and prevent tags from being created/pushed. Build the message on a single line (or use a heredoc/variable) so the git tag -m argument is valid.

Suggested change
git tag -a "testing-deploy-$(date -u +%Y-%m-%d)T$(date -u +%H-%M-%S%z)" -m "testing gcloud deployment: $
(date
-u +%Y-%m-%d)T$(date -u +%H:%M:%S%z)"
git tag -a "testing-deploy-$(date -u +%Y-%m-%d)T$(date -u +%H-%M-%S%z)" -m "testing gcloud deployment: $(date -u +%Y-%m-%d)T$(date -u +%H:%M:%S%z)"

Copilot uses AI. Check for mistakes.
git config --global user.name "${{ github.actor }}"
git config --global user.email "${{ github.actor }}@users.noreply.github.com"

# ":" are not alloed in git tags, so replace with "-"

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: "alloed" -> "allowed" in this comment.

Suggested change
# ":" are not alloed in git tags, so replace with "-"
# ":" are not allowed in git tags, so replace with "-"

Copilot uses AI. Check for mistakes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c053eb64e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +40 to +44
- name: Run Alembic migrations on staging database
env:
DB_DRIVER: "cloudsql"
CLOUD_SQL_INSTANCE_NAME: "${{ secrets.CLOUD_SQL_INSTANCE_NAME }}"
CLOUD_SQL_DATABASE: "${{ vars.CLOUD_SQL_DATABASE }}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Isolate testing deploy from staging schema migrations

This workflow runs on every jir* branch push but still executes uv run alembic upgrade head with staging credentials (environment: staging and CLOUD_SQL_* values), so an unmerged feature branch can apply schema changes to the shared staging database. In branches that contain migrations, this can break staging for other users before code review/merge; this job should target an isolated testing database/environment or skip migrations.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants