Conversation
There was a problem hiding this comment.
Review Summary
I reviewed the changes in the context of the full session lifecycle: BaseQuerySession, sync/async QuerySession, sync/async QuerySessionPool, the retry mechanism (retry_operation_impl, check_retriable_error), and the SimpleQuerySessionCheckout context manager.
Core idea is sound: the PR introduces an _invalidated flag to distinguish server-side session invalidation (retryable via BadSession) from explicit client-side deletion (non-retryable RuntimeError). This correctly addresses the issue where _check_session_ready_to_use() raised a non-retryable RuntimeError for invalidated sessions, causing retry_operation_sync (and the async counterpart) to fail immediately instead of retrying with a fresh session.
The retry path works correctly:
- Session invalidated (e.g. attach stream error in background thread) →
_invalidated = True,_closed = True - Next use hits
_check_session_ready_to_use()→ raisesBadSession BadSessionis in_errors_retriable_slow_backoff_typesandskip_yield_error_types→ retried immediately without sleepcheckout.__exit__releases the invalidated session back to pool- On retry,
pool.acquire()gets the dead session, checksis_active→ False, creates a new one
No critical issues found in the code. See inline comments below for non-critical observations.
🌋 SLO Test ResultsStatus: 🟢 2 workloads tested • All passed
Generated by ydb-slo-action |
There was a problem hiding this comment.
Pull request overview
This PR adjusts query session lifecycle handling so that sessions invalidated by background errors surface as retriable issues.BadSession (instead of non-retriable RuntimeError), improving resiliency of QuerySessionPool.retry_operation_sync (and corresponding async codepaths) in concurrent workloads (Issue #814).
Changes:
- Add an
_invalidatedflag toBaseQuerySessionand raiseissues.BadSessionwhen an invalidated session is used. - Update sync/async
QuerySession.delete()to close/cancel the attach stream explicitly (instead of calling_invalidate()). - Add a sync regression test ensuring
retry_operation_syncreacquires when the first acquired session is invalidated.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
ydb/query/session.py |
Introduces _invalidated state and changes readiness checks; adjusts sync delete behavior. |
ydb/aio/query/session.py |
Mirrors delete behavior change for async sessions. |
ydb/query/pool_test.py |
Adds regression test for retry reacquire on invalidated session. |
ydb/query/pool.py |
Minor formatting change in release() docstring area. |
ydb/aio/query/pool.py |
Minor formatting change in release() docstring area. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c30ca38 to
cf366ca
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Review of PR #815: Fix session invalidation scheme
I reviewed all changed files in the context of surrounding code, the retry logic, pool behavior, and the referenced issue #814.
Summary of changes
The PR introduces a separate _invalidated flag on BaseQuerySession to distinguish sessions that were invalidated due to errors (server-side issues, connection problems, BadSession from the server) from sessions that were intentionally closed via delete(). The key effect: _check_session_ready_to_use() now raises issues.BadSession for invalidated sessions instead of RuntimeError. Since BadSession is in the retriable error list (_errors.py), the retry logic in retry_operation_sync/retry_operation_async correctly catches it and retries with a fresh session — fixing the crashes reported in #814.
The rename from _invalidate() to _close_session(invalidate=bool) consolidates the close logic into a single method. All call sites across sync, async, session, transaction, and base modules have been updated consistently.
Detailed observations
1. _close_session early-return guard and invalidation ordering (ydb/query/session.py:132-137)
def _close_session(self, invalidate: bool = False) -> None:
if self._closed:
return
if invalidate:
self._invalidated = True
self._closed = TrueIf a session is first closed normally (_close_session()) and then a background error tries to invalidate it (_close_session(invalidate=True)), the early return prevents _invalidated from being set. In practice this is benign — a closed session is already discarded by the pool's acquire() check on is_active, so _check_session_ready_to_use is unreachable for that session. Just noting it as a subtle edge case for future maintainers.
2. Removal of session._closed = True from wrapper_delete_session (ydb/query/session.py:56-59)
This is safe. Both QuerySession.delete() (sync) and the async counterpart call _close_session() unconditionally after _delete_call(), including in the exception path (the try/except around _delete_call is followed by _close_session() outside the try block). So _closed is always set after a delete, regardless of whether the RPC succeeds.
3. Error message for invalidated sessions (ydb/query/session.py:128)
raise issues.BadSession(f"Session is not active, session_id: {self._session_id}, closed: {self._closed}")Minor observation: for invalidated sessions, closed will always be True here (since _close_session sets both flags). Consider including invalidated: {self._invalidated} in the message to help distinguish "invalidated by error" from other BadSession scenarios during debugging. Not blocking.
4. Test coverage
Both sync and async tests verify the core scenario — an invalidated session being retried. The tests correctly mock pool.acquire/pool.release to isolate the retry behavior from pool internals. The async test uses QuerySession.__new__ to bypass the constructor (which needs a running event loop) — this is reasonable for unit testing.
5. Sync/async parity
All changes are consistently applied to both sync (ydb/query/) and async (ydb/aio/query/) paths. The sync QueryTxContext.__exit__ does not have the except BaseException handler that the async __aexit__ has, but this pre-dates this PR and is not introduced by these changes.
No critical issues found in the code changes.
cf366ca to
974f9c2
Compare
Pull request type
Please check the type of change your PR introduces:
What is the current behavior?
Fixes: #814
What is the new behavior?
Other information