[SAP] Implement graceful shutdown for cinder services #314

Open
hemna wants to merge 2 commits into stable/2023.1-m3 from graceful-shutdown

hemna commented Feb 20, 2026

Graceful Shutdown for Cinder Volume Services

Implements three-phase graceful shutdown that allows in-flight volume operations to complete before the pod exits during Kubernetes rolling updates.

How It Works

Phase 1 — Stop new messages (without killing in-flight handlers):

  • Sends Basic.Cancel directly for each AMQP consumer tag via conn.consumer_cancel()
  • Does NOT call conn.stop_consuming() (which causes _runner busy-loop starvation)
  • _runner greenthread stays blocked in drain_events() at 0% CPU

Phase 2 — Wait for in-flight operations:

  • GreenPool.waitall() blocks until all RPC handler greenthreads finish
  • Worker entry heartbeat keeps worker DB entries fresh (prevents the new pod's cleanup from interfering with in-flight operations)
  • Service heartbeats continue (service stays "up" in DB)
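
The worker entry heartbeat that runs during Phase 2 can be illustrated with a minimal sketch. It is not the PR's `set_workers` code: it uses a plain thread in place of a greenthread, and the names `with_worker_heartbeat` and `touch` are hypothetical stand-ins for the decorator and for whatever refreshes the worker row's timestamp.

```python
import functools
import threading
import time


def with_worker_heartbeat(touch, interval=10.0):
    """Call touch() every `interval` seconds while the wrapped function runs.

    `touch` is a hypothetical callable that would refresh the worker
    entry's timestamp in the database.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            finished = threading.Event()

            def beat():
                # Event.wait() returns False on timeout and True once
                # `finished` is set, so the loop ends with the operation.
                while not finished.wait(interval):
                    touch()

            beater = threading.Thread(target=beat, daemon=True)
            beater.start()
            try:
                return func(*args, **kwargs)
            finally:
                finished.set()
                beater.join()
        return wrapper
    return decorator


touches = []


# A short interval is used here only so the example finishes quickly.
@with_worker_heartbeat(lambda: touches.append(time.monotonic()), interval=0.02)
def slow_operation():
    time.sleep(0.2)
    return "done"


result = slow_operation()
```

Stopping the heartbeat in a `finally` block means the entry stops being refreshed as soon as the operation ends, whether it succeeded or raised.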

Phase 3 — Clean exit:

  • Skip rpcserver.stop()/rpcserver.wait() (hangs on dead AMQP socket)
  • Process exits cleanly after stop() returns
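
The three phases above can be sketched with standard-library threads. This is an illustration under stated assumptions, not the code in `cinder/service.py`: `DrainingService`, `dispatch` and the `cancel_consumer` callable are hypothetical, and joining threads stands in for `GreenPool.waitall()`.

```python
import threading
import time


class DrainingService:
    """Toy stand-in for a service that drains before exiting."""

    def __init__(self, consumer_tags, cancel_consumer):
        self._consumer_tags = list(consumer_tags)
        # Hypothetical callable that would send Basic.Cancel for one tag.
        self._cancel_consumer = cancel_consumer
        self._handlers = []  # in-flight handler threads
        self._stop_guard = threading.Lock()
        self.draining = False

    def dispatch(self, func, *args):
        """Run an RPC handler in the background and track it."""
        thread = threading.Thread(target=func, args=args)
        self._handlers.append(thread)
        thread.start()
        return thread

    def stop(self):
        # Guard: a second, concurrent stop() returns immediately.
        if not self._stop_guard.acquire(blocking=False):
            return False
        try:
            self.draining = True
            # Phase 1: cancel each consumer tag; the consume loop itself
            # is left untouched.
            for tag in self._consumer_tags:
                self._cancel_consumer(tag)
            # Phase 2: wait for every in-flight handler to finish.
            for thread in list(self._handlers):
                thread.join()
            # Phase 3: return without stopping the RPC server; the caller
            # then lets the process exit.
            return True
        finally:
            self._stop_guard.release()


cancelled = []
completed = []


def create_volume():
    time.sleep(0.05)  # simulated long-running operation
    completed.append("create_volume")


service = DrainingService(["tag-1", "tag-2"], cancelled.append)
service.dispatch(create_volume)
stopped = service.stop()
```

In this sketch `stop()` returns only after the simulated operation has finished, which is the property the drain is meant to guarantee.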

Additional Mechanisms

  • Worker entry heartbeat (cinder/objects/cleanable.py): set_workers decorator spawns a greenthread that touches worker DB entries every 10s during operations. Prevents new pod's init_host_do_cleanup from resetting in-flight volumes to 'error'.
  • do_cleanup freshness check (cinder/manager.py): Skips worker entries updated within service_down_time (60s). Only cleans up truly stale/crashed entries.
  • reject_if_draining decorator: Rejects new RPC calls during shutdown so scheduler routes to healthy backends.
  • Semaphore guard: Prevents concurrent stop() calls on same Service instance.
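
Two of the mechanisms above, the freshness check and `reject_if_draining`, can be sketched as follows. This is a minimal illustration, not the PR's implementation: `is_stale`, `ServiceDraining`, the `draining` attribute and the `Manager` class are hypothetical, and only the 60-second window is taken from the description.

```python
import datetime
import functools

SERVICE_DOWN_TIME = 60  # seconds, the value quoted in the description


class ServiceDraining(Exception):
    """Hypothetical error for RPC calls that arrive during shutdown."""


def reject_if_draining(method):
    """Refuse new work once the owning manager has started draining."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if getattr(self, "draining", False):
            raise ServiceDraining(method.__name__)
        return method(self, *args, **kwargs)
    return wrapper


def is_stale(updated_at, now, down_time=SERVICE_DOWN_TIME):
    """True if a worker entry was not touched within down_time seconds."""
    return (now - updated_at) > datetime.timedelta(seconds=down_time)


class Manager:
    draining = False

    @reject_if_draining
    def create_volume(self, name):
        return f"created {name}"


# Freshness check: only the entry untouched for 300s is cleaned up.
now = datetime.datetime(2026, 1, 1, 12, 0, 0)
fresh = now - datetime.timedelta(seconds=5)
stale = now - datetime.timedelta(seconds=300)
to_clean = [ts for ts in (fresh, stale) if is_stale(ts, now)]

# Draining: the first call succeeds, the second is rejected.
manager = Manager()
first = manager.create_volume("vol-1")
manager.draining = True
try:
    manager.create_volume("vol-2")
    rejected = False
except ServiceDraining:
    rejected = True
```

The freshness check pairs with the worker entry heartbeat: an entry that is still being touched every 10s never ages past the 60s window, so only entries left behind by a crashed process qualify for cleanup.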

Requirements (separate changes)

  • dumb-init --single-child on cinder-volume container command — ensures ProcessLauncher parent waits for all children before exit
  • terminationGracePeriodSeconds: 900 on pod spec
  • oslo.messaging PR "Send basic.cancel to broker in stop_consuming()" (oslo.messaging#4): adds basic.cancel to stop_consuming(). Not strictly required by our code path (we call consumer_cancel() directly), but it provides a safety net if other code paths invoke stop_consuming() during cleanup.

Test Results

See sap-doc/graceful-shutdown-test-results.md for full details.

| Test | Operation | Result |
|---|---|---|
| Idle shutdown | Clean exit | ✅ <1s |
| Volume create from image | 16 GB, kill pod mid-download | ✅ (41s to 8min drains) |
| Backup to Swift | Kill backup pod mid-stream | |
| Scheduler rerouting | New work during drain | |
| Snapshot create | Kill pod mid-snapshot | |
| Snapshot delete | Kill pod mid-delete | |
| Volume clone | Kill pod mid-clone | |
| Backup (kill volume pod) | Multi-service coordination | |

Files Changed

| File | Purpose |
|---|---|
| cinder/service.py | Three-phase shutdown, semaphore guard, heartbeat continuation |
| cinder/manager.py | do_cleanup freshness check for worker entries |
| cinder/objects/cleanable.py | Worker heartbeat greenthread in set_workers decorator |
| cinder/volume/manager.py | Direct flow execution (removed tpool.execute) |
| tox.ini | Exclude sap-tools/sap-doc from flake8 |
| doc/source/admin/graceful-shutdown-race-condition.rst | Race condition documentation |
| sap-doc/graceful-shutdown-test-results.md | Sanitized test results |

No oslo.messaging source changes required

All changes are self-contained in cinder. We access oslo.messaging internal attributes defensively (getattr with fallbacks) but don't modify oslo.messaging source.
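
The "getattr with fallbacks" pattern might look like the sketch below. The helper `find_attr` and the attribute names `_consumers` and `consumers` are hypothetical and chosen only for illustration; they are not taken from oslo.messaging.

```python
def find_attr(obj, *names, default=None):
    """Return the first attribute in `names` that exists and is not None."""
    for name in names:
        value = getattr(obj, name, None)
        if value is not None:
            return value
    return default


class OldListener:  # hypothetical layout A
    def __init__(self):
        self._consumers = ["tag-1"]


class NewListener:  # hypothetical layout B
    def __init__(self):
        self.consumers = ["tag-2"]


class UnknownListener:  # neither attribute present
    pass


results = [
    find_attr(listener, "_consumers", "consumers", default=[])
    for listener in (OldListener(), NewListener(), UnknownListener())
]
```

Falling back to an empty default lets the shutdown path continue without raising `AttributeError` if an internal attribute is renamed or removed in a later library release.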

hemna force-pushed the graceful-shutdown branch 3 times, most recently from 5a72074 to 1f69e00 (February 23, 2026 14:24)
hemna force-pushed the graceful-shutdown branch 3 times, most recently from 4c183ec to d331087 (April 30, 2026 12:37)
hemna force-pushed the graceful-shutdown branch 2 times, most recently from 1dd055a to d2fddd7 (May 14, 2026 22:25)
hemna changed the title from "[SAP] Try graceful shutdown" to "[SAP] Implement graceful shutdown for cinder services" (May 14, 2026)
hemna force-pushed the graceful-shutdown branch 3 times, most recently from ca845fc to a235a58 (May 14, 2026 22:40)
Three-phase graceful shutdown that allows in-flight volume operations
(create, delete, clone, snapshot, backup) to complete before the pod
exits during Kubernetes rolling updates.

Phase 1: Send Basic.Cancel to RabbitMQ consumers (no new messages)
         without disrupting the _runner greenthread or setting
         _consume_loop_stopped (avoids busy-loop CPU starvation).

Phase 2: Block in pool.waitall() until all in-flight RPC handler
         greenthreads in the GreenPool complete their operations.

Phase 3: Skip rpcserver.stop()/wait() (hangs on dead AMQP socket).
         Process exits cleanly after stop() returns.

Additional mechanisms:
- Worker entry heartbeat in set_workers decorator: touches worker DB
  entries every 10s during operations, preventing new pod's init_host
  _do_cleanup from resetting in-flight volumes to 'error'.
- do_cleanup freshness check: skips worker entries updated within
  service_down_time (60s), only cleans up truly stale/crashed entries.
- Semaphore guard: prevents concurrent stop() calls on same Service.
- Heartbeat continues during drain: service stays 'up' in DB.
- reject_if_draining decorator: rejects new RPC calls during shutdown
  so scheduler routes to healthy backends.

Requires:
- dumb-init --single-child (Helm chart change in separate commit)
- terminationGracePeriodSeconds: 900 on pod spec

Tested operations surviving pod termination:
- Volume create from image (41s to 8min drains)
- Volume clone, snapshot create, snapshot delete
- Backup (kill backup pod during stream)
- Backup (kill volume pod during snapshot prep)
- Scheduler rerouting during drain
- Idle shutdown (clean exit <1s)

Change-Id: Icdd28affc73fd34491b656a68410dce8e46264d4
hemna force-pushed the graceful-shutdown branch from a235a58 to b77438d (May 14, 2026 22:41)
Scsabiii previously approved these changes May 15, 2026
Move eventlet imports after stdlib imports (inspect, os, random, etc.)
to comply with flake8-import-order (import-order-style = pep8).

Change-Id: Icdd28affc73fd34491b656a68410dce8e46264d4