Motivation
The current Cursor / AsyncCursor implementation buffers the full
result of a query in memory before fetchone / fetchmany / fetchall
can be called. For large result sets (analytical queries, full-table
scans, ETL-style jobs) this is either infeasible or prohibitively
memory-hungry.
The YDB Query API exposes result sets as a stream
(SyncResponseContextIterator / AsyncResponseContextIterator) —
result sets arrive incrementally over the wire. A DB-API cursor that
consumes that stream lazily would let users process arbitrarily large
results with bounded memory.
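To make the "lazy consumption, bounded memory" idea concrete, here is a small self-contained sketch (the chunk generator below is a stand-in invented for illustration; the real stream would be the SDK's SyncResponseContextIterator):

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]

def result_set_stream() -> Iterator[List[Row]]:
    # Stand-in for SyncResponseContextIterator: each yielded item is
    # one result-set chunk arriving over the wire.
    for chunk_start in range(0, 6, 2):
        yield [(i, f"row-{i}") for i in range(chunk_start, chunk_start + 2)]

def rows(stream: Iterable[List[Row]]) -> Iterator[Row]:
    # Flatten chunks lazily: only one chunk is resident at a time,
    # so memory stays bounded regardless of total result size.
    for chunk in stream:
        yield from chunk

lazy = rows(result_set_stream())
first = next(lazy)  # pulls only the first chunk from the "server"
```

The point is that `rows` never holds more than one chunk, unlike the current cursor, which would materialise all chunks before the first `fetchone`.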
Downstream use case: SQLAlchemy
SQLAlchemy has first-class support for server-side cursors via:
Connection.execution_options(stream_results=True)
Query.yield_per(N) / select(...).execution_options(yield_per=N)
For these to work, the DB-API driver must expose a cursor that fetches
from the server incrementally rather than materialising everything up
front. Without a streaming cursor on our side, SQLAlchemy users can't
use yield_per / stream_results against YDB and have to either page
manually or blow up memory.
Equivalents in other drivers:
psycopg2 — named (server-side) cursors: conn.cursor(name="...")
psycopg (v3) — conn.cursor(name="...") / ClientCursor vs
ServerCursor
asyncpg — cursor objects returned from conn.cursor(query) inside
a transaction
Proposed API
Expose a streaming variant through an extra kwarg on Connection.cursor:
with connection.cursor(stream_results=True) as cur:
    cur.execute("SELECT ... FROM huge_table")
    for row in iter(cur.fetchone, None):
        ...
And for async:
async with async_connection.cursor(stream_results=True) as cur:
    await cur.execute("SELECT ... FROM huge_table")
    while (row := await cur.fetchone()) is not None:
        ...
New public classes: StreamCursor, AsyncStreamCursor, exported from
ydb_dbapi.
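As a rough illustration of what such a cursor could look like internally (a sketch only — everything except the DB-API fetch methods is an assumption, and the real class would wrap the SDK's stream iterator, not a plain Python iterator):

```python
from typing import Iterator, List, Optional

class StreamCursor:
    """Sketch of a DB-API cursor that drains a chunked result stream lazily."""

    def __init__(self, chunks: Iterator[List[tuple]]) -> None:
        # Flatten chunks lazily; nothing is pulled until a fetch is called.
        self._rows = (row for chunk in chunks for row in chunk)
        self._closed = False

    def fetchone(self) -> Optional[tuple]:
        if self._closed:
            return None
        return next(self._rows, None)

    def fetchmany(self, size: int = 1) -> List[tuple]:
        out: List[tuple] = []
        for _ in range(size):
            row = self.fetchone()
            if row is None:
                break
            out.append(row)
        return out

    def fetchall(self) -> List[tuple]:
        out = []
        while (row := self.fetchone()) is not None:
            out.append(row)
        return out

    def close(self) -> None:
        # The real implementation must also cancel or drain the server
        # stream and release the session back to the pool.
        self._closed = True

cur = StreamCursor(iter([[(1,), (2,)], [(3,)]]))
```

`fetchall` still works on such a cursor, but it simply drains the stream; the win is that `fetchone`/`fetchmany` no longer force full materialisation.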
Scope
Motivation
The current
Cursor/AsyncCursorimplementation buffers the fullresult of a query in memory before
fetchone/fetchmany/fetchallcan be called. For large result sets (analytical queries, full-table
scans, ETL-style jobs) this is either infeasible or prohibitively
memory-hungry.
The YDB Query API exposes result sets as a stream
(
SyncResponseContextIterator/AsyncResponseContextIterator) —result sets arrive incrementally over the wire. A DB-API cursor that
consumes that stream lazily would let users process arbitrarily large
results with bounded memory.
Downstream use case: SQLAlchemy
SQLAlchemy has first-class support for server-side cursors via:
Connection.execution_options(stream_results=True)Query.yield_per(N)/select(...).execution_options(yield_per=N)For these to work, the DB-API driver must expose a cursor that fetches
from the server incrementally rather than materialising everything up
front. Without a streaming cursor on our side, SQLAlchemy users can't
use
yield_per/stream_resultsagainst YDB and have to either pagemanually or blow up memory.
Equivalents in other drivers:
psycopg2— named (server-side) cursors:conn.cursor(name="...")psycopg(v3) —conn.cursor(name="...")/ClientCursorvsServerCursorasyncpg— cursor objects returned fromconn.cursor(query)insidea transaction
Proposed API
Expose a streaming variant through an extra kwarg on
Connection.cursor:And for async:
New public classes:
StreamCursor,AsyncStreamCursor, exported fromydb_dbapi.Scope
StreamCursor (sync) consuming SyncResponseContextIterator
AsyncStreamCursor consuming AsyncResponseContextIterator
Hold a session from the pool for the duration of the stream and release it on finish / close / error
Exclusivity — while a stream is active no other cursor on the same connection may execute (it would corrupt the tx session); commit/rollback should reject while a stream is running
rowcount semantics for streaming (likely -1 until drained)
close() must cleanly terminate a mid-flight stream (cancel + discard the session, or drain it) in both sync and async paths
Tests against a local YDB (.github/docker/docker-compose.yml)
SQLAlchemy stream_results / yield_per — likely a follow-up in ydb-sqlalchemy, but worth mentioning here
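The exclusivity rule could be enforced with a simple guard on the connection, sketched below (names like _begin_stream are hypothetical, not the actual ydb_dbapi internals):

```python
import threading

class InterfaceError(Exception):
    """Stand-in for the DB-API InterfaceError raised on misuse."""

class _Connection:
    """Toy connection demonstrating the 'one active stream' guard."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stream_active = False

    def _begin_stream(self) -> None:
        # Called by a StreamCursor before it starts consuming.
        with self._lock:
            if self._stream_active:
                raise InterfaceError("another streaming cursor is active")
            self._stream_active = True

    def _end_stream(self) -> None:
        # Called when the stream finishes, errors, or is closed.
        with self._lock:
            self._stream_active = False

    def commit(self) -> None:
        # commit/rollback must refuse to run mid-stream, since the tx
        # session is owned by the stream until it is drained or cancelled.
        if self._stream_active:
            raise InterfaceError("cannot commit while a stream is running")

conn = _Connection()
conn._begin_stream()
```

A second execute (or a commit) on the same connection would then fail fast instead of silently corrupting the transaction's session.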