Skip to content

Fix session invalidation scheme#815

Merged
vgvoleg merged 2 commits intomainfrom
fix_session_invalidate
Apr 24, 2026
Merged

Fix session invalidation scheme#815
vgvoleg merged 2 commits intomainfrom
fix_session_invalidate

Conversation

@vgvoleg
Copy link
Copy Markdown
Collaborator

@vgvoleg vgvoleg commented Apr 24, 2026

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

Fixes: #814

What is the new behavior?

Other information

Copy link
Copy Markdown

@robot-vibe-db robot-vibe-db Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

I reviewed the changes in the context of the full session lifecycle: BaseQuerySession, sync/async QuerySession, sync/async QuerySessionPool, the retry mechanism (retry_operation_impl, check_retriable_error), and the SimpleQuerySessionCheckout context manager.

Core idea is sound: the PR introduces an _invalidated flag to distinguish server-side session invalidation (retryable via BadSession) from explicit client-side deletion (non-retryable RuntimeError). This correctly addresses the issue where _check_session_ready_to_use() raised a non-retryable RuntimeError for invalidated sessions, causing retry_operation_sync (and the async counterpart) to fail immediately instead of retrying with a fresh session.

The retry path works correctly:

  1. Session invalidated (e.g. attach stream error in background thread) → _invalidated = True, _closed = True
  2. Next use hits _check_session_ready_to_use() → raises BadSession
  3. BadSession is in _errors_retriable_slow_backoff_types and skip_yield_error_types → retried immediately without sleep
  4. checkout.__exit__ releases the invalidated session back to pool
  5. On retry, pool.acquire() gets the dead session, checks is_active → False, creates a new one

No critical issues found in the code. See inline comments below for non-critical observations.

Comment thread ydb/query/session.py
Comment thread ydb/query/session.py
Comment thread ydb/aio/query/session.py
Comment thread ydb/query/pool_test.py
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

🌋 SLO Test Results

Status: 🟢 2 workloads tested • All passed

Workload Metrics Regressions Improvements Links
🟢 ydb-python-sync-table 8 0 0 ReportCheck
🚀 ydb-python-sync-query 8 0 1 ReportCheck

Generated by ydb-slo-action

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts query session lifecycle handling so that sessions invalidated by background errors surface as retriable issues.BadSession (instead of non-retriable RuntimeError), improving resiliency of QuerySessionPool.retry_operation_sync (and corresponding async codepaths) in concurrent workloads (Issue #814).

Changes:

  • Add an _invalidated flag to BaseQuerySession and raise issues.BadSession when an invalidated session is used.
  • Update sync/async QuerySession.delete() to close/cancel the attach stream explicitly (instead of calling _invalidate()).
  • Add a sync regression test ensuring retry_operation_sync reacquires when the first acquired session is invalidated.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
ydb/query/session.py Introduces _invalidated state and changes readiness checks; adjusts sync delete behavior.
ydb/aio/query/session.py Mirrors delete behavior change for async sessions.
ydb/query/pool_test.py Adds regression test for retry reacquire on invalidated session.
ydb/query/pool.py Minor formatting change in release() docstring area.
ydb/aio/query/pool.py Minor formatting change in release() docstring area.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ydb/query/session.py Outdated
Comment thread ydb/aio/query/session.py Outdated
Comment thread ydb/query/session.py
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

@robot-vibe-db robot-vibe-db Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of PR #815: Fix session invalidation scheme

I reviewed all changed files in the context of surrounding code, the retry logic, pool behavior, and the referenced issue #814.

Summary of changes

The PR introduces a separate _invalidated flag on BaseQuerySession to distinguish sessions that were invalidated due to errors (server-side issues, connection problems, BadSession from the server) from sessions that were intentionally closed via delete(). The key effect: _check_session_ready_to_use() now raises issues.BadSession for invalidated sessions instead of RuntimeError. Since BadSession is in the retriable error list (_errors.py), the retry logic in retry_operation_sync/retry_operation_async correctly catches it and retries with a fresh session — fixing the crashes reported in #814.

The rename from _invalidate() to _close_session(invalidate=bool) consolidates the close logic into a single method. All call sites across sync, async, session, transaction, and base modules have been updated consistently.

Detailed observations

1. _close_session early-return guard and invalidation ordering (ydb/query/session.py:132-137)

def _close_session(self, invalidate: bool = False) -> None:
    if self._closed:
        return
    if invalidate:
        self._invalidated = True
    self._closed = True

If a session is first closed normally (_close_session()) and then a background error tries to invalidate it (_close_session(invalidate=True)), the early return prevents _invalidated from being set. In practice this is benign — a closed session is already discarded by the pool's acquire() check on is_active, so _check_session_ready_to_use is unreachable for that session. Just noting it as a subtle edge case for future maintainers.

2. Removal of session._closed = True from wrapper_delete_session (ydb/query/session.py:56-59)

This is safe. Both QuerySession.delete() (sync) and the async counterpart call _close_session() unconditionally after _delete_call(), including in the exception path (the try/except around _delete_call is followed by _close_session() outside the try block). So _closed is always set after a delete, regardless of whether the RPC succeeds.

3. Error message for invalidated sessions (ydb/query/session.py:128)

raise issues.BadSession(f"Session is not active, session_id: {self._session_id}, closed: {self._closed}")

Minor observation: for invalidated sessions, closed will always be True here (since _close_session sets both flags). Consider including invalidated: {self._invalidated} in the message to help distinguish "invalidated by error" from other BadSession scenarios during debugging. Not blocking.

4. Test coverage

Both sync and async tests verify the core scenario — an invalidated session being retried. The tests correctly mock pool.acquire/pool.release to isolate the retry behavior from pool internals. The async test uses QuerySession.__new__ to bypass the constructor (which needs a running event loop) — this is reasonable for unit testing.

5. Sync/async parity

All changes are consistently applied to both sync (ydb/query/) and async (ydb/aio/query/) paths. The sync QueryTxContext.__exit__ does not have the except BaseException handler that the async __aexit__ has, but this pre-dates this PR and is not introduced by these changes.

No critical issues found in the code changes.

@vgvoleg vgvoleg force-pushed the fix_session_invalidate branch from cf366ca to 974f9c2 Compare April 24, 2026 20:52
@vgvoleg vgvoleg merged commit ee12f95 into main Apr 24, 2026
34 checks passed
@vgvoleg vgvoleg deleted the fix_session_invalidate branch April 24, 2026 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Session is not active, session_id:

2 participants