Core: Fix background thread leak in ScanTaskIterable #16768
sejal-gupta-ksolves wants to merge 1 commit into
Currently, in a StarRocks FE deployment, can close() be invoked when a query is cancelled or times out during the DeployScanRanges / scan-planning phase?
@chenwyi2 Yes. During query planning and range deployment, StarRocks processes the file splits via Iceberg's … Because this patch unifies the cleanup paths by having the iterator directly delegate to …
Fixes an issue where background PlanTaskWorker threads remained blocked indefinitely in offerWithTimeout() when a query was cancelled or abandoned early, because the outer ScanTaskIterable.close() method was a no-op.
Closes: #16758
Problem
When downstream query engines (such as StarRocks, Trino, or Spark) cancel or abort a REST table scan early due to client disconnects, timeouts, or query limits, they trigger cleanup by calling close() on the outer ScanTaskIterable.
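To make the failure mode concrete, here is a minimal, self-contained Java model of an early abort against an iterable whose close() does nothing. LeakyScanDemo, LeakyIterable, the queue capacity of 2 (rather than 1000), and the assumed retry-until-shutdown shape of offerWithTimeout are hypothetical simplifications for illustration, not Iceberg's actual code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LeakyScanDemo {

  // Hypothetical model of the pre-fix behavior: a background worker feeds a
  // bounded queue, and close() is an empty no-op.
  static final class LeakyIterable implements AutoCloseable {
    // Capacity 2 instead of 1000 so the queue fills up immediately.
    private final BlockingQueue<Integer> taskQueue = new LinkedBlockingQueue<>(2);
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    final Thread worker;

    LeakyIterable() {
      worker = new Thread(this::produce, "plan-task-worker");
      worker.setDaemon(true); // daemon only so this demo JVM can still exit
      worker.start();
    }

    // Assumed shape of offerWithTimeout: retry a timed offer until it
    // succeeds, giving up only once shutdown has been flipped to true.
    private boolean offerWithTimeout(Integer task) throws InterruptedException {
      while (!shutdown.get()) {
        if (taskQueue.offer(task, 10, TimeUnit.MILLISECONDS)) {
          return true;
        }
      }
      return false;
    }

    private void produce() {
      try {
        for (int i = 0; ; i++) {
          if (!offerWithTimeout(i)) {
            return; // shutdown observed: exit the worker
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    Integer next() throws InterruptedException {
      return taskQueue.take();
    }

    @Override
    public void close() {
      // No-op: shutdown stays false, so the worker never exits.
    }
  }

  // Consume one task, abandon the scan early, close, and report whether the
  // worker thread is still alive afterwards.
  static boolean workerAliveAfterClose() {
    LeakyIterable tasks = new LeakyIterable();
    try {
      tasks.next();
      tasks.close();
      tasks.worker.join(200); // wait up to 200 ms for the worker to exit
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return tasks.worker.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("worker alive after close: " + workerAliveAfterClose());
  }
}
```

In this model the worker never observes a shutdown signal, so it keeps retrying the timed offer against a full queue and is still alive after close() returns.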
In Apache Iceberg, ScanTaskIterable.close() was implemented as an empty no-op method. Because this outer close() call did not cascade the shutdown signal to the underlying data structures:
- The shutdown atomic state flag remained false.
- PlanTaskWorker threads continued running indefinitely.
- Once taskQueue reached its 1000-item capacity limit, all active worker threads became permanently blocked inside offerWithTimeout(), leading to thread pool exhaustion on the engine coordinator side.
Solution
- Implemented ScanTaskIterable.close() using shutdown.compareAndSet(false, true).
- Cleared the taskQueue, planTasks, and initialFileScanTasks lists upon termination. This allows background threads stuck in an offer wait cycle to unblock immediately, observe the flipped shutdown state, and exit gracefully.
- Refactored the ScanTasksIterator.close() block to eliminate duplicated code, rewriting it to delegate its cleanup directly to ScanTaskIterable.this.close(). This ensures consistent thread termination across all entry points.
- Added TestScanTaskIterableLeak under the org.apache.iceberg.rest test package, verifying that the number of active planning threads drops back to 0 after premature termination.
Verification Testing
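A minimal, self-contained sketch of the fix described above, paired with a thread-count check in the spirit of TestScanTaskIterableLeak. FixedScanDemo, FixedIterable, TasksIterator, the queue capacity of 2, and the retry-until-shutdown offerWithTimeout loop are hypothetical simplifications; the sketch clears only taskQueue (the real change also clears planTasks and initialFileScanTasks) and is not the code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class FixedScanDemo {

  // Hypothetical model of the fixed design; not Iceberg's actual classes.
  static final class FixedIterable implements AutoCloseable {
    private final BlockingQueue<Integer> taskQueue = new LinkedBlockingQueue<>(2);
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    final List<Thread> workers = new ArrayList<>();

    FixedIterable(int workerCount) {
      for (int w = 0; w < workerCount; w++) {
        Thread t = new Thread(this::produce, "plan-task-worker-" + w);
        t.setDaemon(true);
        workers.add(t);
        t.start();
      }
    }

    // Assumed shape: retry a timed offer until it succeeds or shutdown is set.
    private boolean offerWithTimeout(Integer task) throws InterruptedException {
      while (!shutdown.get()) {
        if (taskQueue.offer(task, 10, TimeUnit.MILLISECONDS)) {
          return true;
        }
      }
      return false;
    }

    private void produce() {
      try {
        for (int i = 0; ; i++) {
          if (!offerWithTimeout(i)) {
            return; // shutdown observed: exit the worker
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    @Override
    public void close() {
      // compareAndSet makes close() idempotent: only the first call cleans up.
      if (shutdown.compareAndSet(false, true)) {
        // Freeing capacity lets a waiting timed offer complete early; any
        // worker still waiting re-checks shutdown when its timeout expires.
        taskQueue.clear();
      }
    }

    TasksIterator iterator() {
      return new TasksIterator();
    }

    // Inner iterator whose close() delegates to the outer iterable, mirroring
    // the ScanTasksIterator -> ScanTaskIterable.this.close() delegation.
    final class TasksIterator implements AutoCloseable {
      Integer next() throws InterruptedException {
        return taskQueue.take();
      }

      @Override
      public void close() {
        FixedIterable.this.close();
      }
    }
  }

  // Consume one task, abandon the scan early, and count workers still alive.
  static int liveWorkersAfterEarlyClose(int workerCount) {
    FixedIterable tasks = new FixedIterable(workerCount);
    int alive = 0;
    try {
      try (FixedIterable.TasksIterator it = tasks.iterator()) {
        it.next(); // closing the inner iterator shuts down the outer iterable
      }
      for (Thread t : tasks.workers) {
        t.join(1000); // wait up to 1 s per worker
        if (t.isAlive()) {
          alive++;
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return alive;
  }

  public static void main(String[] args) {
    System.out.println("live workers after early close: " + liveWorkersAfterEarlyClose(4));
  }
}
```

Because the inner iterator's close() routes through the single guarded close() on the outer iterable, an engine that closes either object triggers the same shutdown path, and every worker exits within roughly one offer timeout.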