|
| 1 | +# ADR2: API Concurrency Fix Strategy |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +This document describes a verified FastAPI concurrency issue in the API stack and recommends a two-phase remediation plan for maintainers. |
| 6 | + |
| 7 | +The API uses synchronous SQLAlchemy sessions backed by `psycopg`. When those sessions are consumed from `async def` route handlers, blocking database work runs on the event loop thread if the handlers call synchronous ORM helpers directly. The lowest-risk immediate fix is to convert database-bound route handlers that do not perform asynchronous work into plain `def`. The longer-term fix is to introduce a real async SQLAlchemy stack and migrate the affected handlers and helpers incrementally. |
| 8 | + |
| 9 | +## Problem |
| 10 | + |
| 11 | +FastAPI supports synchronous generator dependencies such as `get_db_session()`. The issue is not the dependency shape itself. The issue is that the injected object is a synchronous SQLAlchemy `Session`, and any `async def` route that consumes it while executing synchronous ORM queries directly will block the event loop thread. |
| 12 | + |
| 13 | +In this configuration, FastAPI runs the `async def` route body on the event loop thread. If that body performs blocking database I/O through the synchronous session, the worker cannot make progress on other requests assigned to that event loop until the database call returns. A slow well query can therefore delay unrelated lightweight requests handled by the same worker. |
| 14 | + |
| 15 | +This is a concurrency problem, not a correctness problem. The endpoints can still return correct data while reducing throughput and responsiveness under load. |
| 16 | + |
| 17 | +## Evidence In This Repo |
| 18 | + |
| 19 | +- [`db/engine.py`](db/engine.py) creates `database_sessionmaker = sessionmaker(engine, expire_on_commit=False)` and `get_db_session()` yields a regular synchronous `Session`. |
| 20 | +- [`db/engine.py`](db/engine.py) builds synchronous `postgresql+psycopg` engines for both the default PostgreSQL path and the Cloud SQL path, confirming that the active database layer is synchronous. |
| 21 | +- [`core/dependencies.py`](core/dependencies.py) injects that session through `session_dependency`. |
| 22 | +- [`services/well_details_helper.py`](services/well_details_helper.py) performs synchronous ORM operations such as `session.scalars(...).all()` and related query chains. |
| 23 | +- [`api/thing.py`](api/thing.py) contains representative database-backed routes that pass the synchronous session into helper functions such as `get_db_things(...)` and `get_well_details_payload(...)`. |
| 24 | +- [`api/asset.py`](api/asset.py) shows a contrasting safe pattern for non-database blocking work by wrapping synchronous GCS calls in `run_in_threadpool(...)`. |
| 25 | +- The short-term fix described in this ADR converts database-bound routes from `async def` to `def` where they do not need `await`, but the helper/query layer remains synchronous until a real async session stack is introduced. |
| 26 | + |
| 27 | +## Short-Term Fix |
| 28 | + |
| 29 | +The short-term fix is to convert database-bound route handlers from `async def` to `def` when they do not actually perform asynchronous work. |
| 30 | + |
| 31 | +This lets FastAPI offload the entire route function to a worker thread instead of running its synchronous database calls on the event loop thread. It does not require changing the current database engine, dependency, query helpers, or response schemas. |
| 32 | + |
| 33 | +### Short-term implementation guidance |
| 34 | + |
| 35 | +- Convert any route handler that: |
| 36 | + - receives `session: session_dependency`, |
| 37 | + - performs synchronous ORM work directly or through helpers, and |
| 38 | + - does not require `await` for other operations in the route body. |
| 39 | +- Prioritize the highest-value endpoints first: |
| 40 | + - high-traffic list and detail endpoints, |
| 41 | + - endpoints known to run expensive joins or eager-loads, |
| 42 | + - endpoints that affect warmup or perceived application responsiveness. |
| 43 | +- Keep route behavior unchanged: |
| 44 | + - do not change paths, status codes, payloads, or auth dependencies as part of this phase. |
| 45 | +- Avoid mixed patterns: |
| 46 | + - do not leave a route as `async def` if it still calls synchronous SQLAlchemy code directly. |
| 47 | +- Use `run_in_threadpool(...)` only when a route must remain `async def` for a separate reason, such as mixing in another async operation, and only for isolated blocking helpers rather than as a blanket wrapper for all DB access. |
| 48 | + |
| 49 | +### Expected impact |
| 50 | + |
| 51 | +- Lower risk than a full async migration. |
| 52 | +- No intended HTTP contract changes. |
| 53 | +- Better worker responsiveness because blocking DB work moves off the event loop thread. |
| 54 | + |
| 55 | +## Long-Term Fix |
| 56 | + |
| 57 | +The long-term fix is to add a real async database stack and migrate selected API areas to it incrementally. |
| 58 | + |
| 59 | +This phase should introduce an explicit async path rather than trying to reuse the current synchronous dependency. Importing async SQLAlchemy primitives is not enough; the repo needs a working async engine, async sessionmaker, async dependency, and async query/helper layer for migrated endpoints. |
| 60 | + |
| 61 | +### Long-term target architecture |
| 62 | + |
| 63 | +- Add an `AsyncEngine` configured for the intended async driver. |
| 64 | +- Add an `async_sessionmaker` that yields `AsyncSession` instances. |
| 65 | +- Add a dedicated async dependency such as `get_async_db_session()` rather than overloading `get_db_session()`. |
| 66 | +- Update migrated handlers and helper functions to use async database access: |
| 67 | + - `await session.execute(...)` |
| 68 | + - `await session.scalars(...)` |
| 69 | + - other `AsyncSession`-compatible patterns as needed |
| 70 | + |
| 71 | +### Long-term migration guidance |
| 72 | + |
| 73 | +- Migrate by subsystem, not all at once. |
| 74 | +- Start with a bounded route/helper cluster where the query patterns are understood. |
| 75 | +- Keep sync and async paths separate during migration to avoid ambiguous dependencies and accidental sync calls from async routes. |
| 76 | +- Treat helper-layer migration as part of the work. Converting route signatures alone is insufficient if the helper functions still expect synchronous sessions. |
| 77 | + |
| 78 | +### Non-goals and cautions |
| 79 | + |
| 80 | +- Do not claim the repo already has a working async DB session path unless one is actually implemented and used. |
| 81 | +- Do not treat “switch everything to async” as a trivial refactor. |
| 82 | +- Do not mix `AsyncSession` route code with synchronous helper/query internals. |
| 83 | + |
| 84 | +## Recommended Path |
| 85 | + |
| 86 | +The recommended order is: |
| 87 | + |
| 88 | +1. Convert database-bound `async def` routes that do not use `await` into plain `def`. |
| 89 | +2. Validate behavior and measure the effect on responsiveness. |
| 90 | +3. Introduce a dedicated async DB stack. |
| 91 | +4. Migrate selected route/helper subsystems incrementally to `AsyncSession`. |
| 92 | + |
| 93 | +This sequence delivers immediate concurrency improvement with limited risk, while preserving a clear path to a full async architecture later. |
| 94 | + |
| 95 | +## Acceptance Criteria |
| 96 | + |
| 97 | +### Short-term acceptance criteria |
| 98 | + |
| 99 | +- Targeted API tests continue to pass after `async def` to `def` conversions. |
| 100 | +- HTTP behavior is unchanged: |
| 101 | + - same routes, |
| 102 | + - same auth requirements, |
| 103 | + - same status codes, |
| 104 | + - same payload shapes. |
| 105 | +- Concurrency smoke checks or request-timing instrumentation show that DB-heavy requests no longer block the event loop thread for that worker in the same way they do today. |
| 106 | + |
| 107 | +### Long-term acceptance criteria |
| 108 | + |
| 109 | +- Migrated endpoints pass the existing API test coverage for their subsystem. |
| 110 | +- The async session lifecycle is correct for successful and failing requests. |
| 111 | +- Migrated `async def` routes do not call synchronous session helpers. |
| 112 | +- Before/after measurements are captured for latency and concurrency so the migration can be evaluated against real behavior rather than assumptions. |
| 113 | + |
| 114 | +## Defaults And Assumptions |
| 115 | + |
| 116 | +- This document is written for maintainers and assumes familiarity with FastAPI and SQLAlchemy internals. |
| 117 | +- The document is self-contained and does not require code changes to be useful. |
| 118 | +- The recommended short-term action is intentionally conservative and does not prescribe a file-by-file rollout sequence. |
| 119 | +- The recommended long-term action is a staged migration, not a flag-day rewrite. |
0 commit comments