GH-20314: [C++] Add GCS connection pool size option#49810

Open
azhu248 wants to merge 1 commit into apache:main from azhu248:feature/gcs-connection-pool-size

azhu248 commented Apr 20, 2026

This PR derives the Google Cloud client's ConnectionPoolSizeOption from the capacity of Arrow's IO thread pool, obtained via the io_context, so that the connection pool size no longer limits parallel read throughput on GCS. It also adds a test covering the fallback to the global IO thread pool capacity.

Closes #20314

Rationale for this change

Multithreaded read performance can be bottlenecked by the default connection pool size of the Google Cloud client library. Instead of exposing a new option solely for this, the connection pool size is tied to the capacity of the Arrow IO thread pool.

What changes are included in this PR?

  • Extend the initialization path to pass io_context down to internal::AsGoogleCloudOptions().
  • Set gcs::ConnectionPoolSizeOption from io_context.executor()->GetCapacity(), or, as a fallback, from ::arrow::io::GetIOThreadPoolCapacity().
  • Enforce a minimum connection pool size of 4 using std::max, so that users with a small thread pool (e.g. capacity set to 1) are not penalized.
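
The selection logic in the list above can be sketched without any Arrow or google-cloud-cpp dependencies. This is a minimal illustration, not the code in this PR: ResolveConnectionPoolSize and kMinConnectionPoolSize are hypothetical names, the two parameters stand in for io_context.executor()->GetCapacity() and ::arrow::io::GetIOThreadPoolCapacity(), and treating an unavailable executor capacity as the trigger for the fallback is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical constant: the floor of 4 described in the PR text.
constexpr int kMinConnectionPoolSize = 4;

// executor_capacity: stand-in for io_context.executor()->GetCapacity();
//                    empty when no executor capacity is available (assumption).
// global_io_capacity: stand-in for ::arrow::io::GetIOThreadPoolCapacity().
std::size_t ResolveConnectionPoolSize(std::optional<int> executor_capacity,
                                      int global_io_capacity) {
  assert(global_io_capacity > 0);  // sketch assumes a positive capacity
  // Prefer the executor's capacity, otherwise use the global IO pool's.
  const int capacity = executor_capacity.value_or(global_io_capacity);
  // Never go below the floor, so small thread pools are not penalized.
  return static_cast<std::size_t>(std::max(kMinConnectionPoolSize, capacity));
}
```

In the real code path the resulting value would be stored in the client options under gcs::ConnectionPoolSizeOption; the sketch only covers how the number is chosen.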

Are these changes tested?

Yes. I added the unit test OptionsConnectionPoolSizeFallback to gcsfs_test.cc that validates:

  • The fallback logic uses the capacity of the global IO thread pool.
  • Changing the thread pool capacity via arrow::io::SetIOThreadPoolCapacity(...) is reflected in the Google Cloud options generated afterwards.
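
The property checked by the test is that the pool size is read when the options are built, not cached. A dependency-free sketch of that behavior follows; FakeIoThreadPool and PoolSizeFromFallback are hypothetical stand-ins for Arrow's global IO thread pool and the options conversion, and this is not the body of OptionsConnectionPoolSizeFallback.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for Arrow's global IO thread pool. The real test
// changes the capacity with arrow::io::SetIOThreadPoolCapacity(...).
class FakeIoThreadPool {
 public:
  explicit FakeIoThreadPool(int capacity) : capacity_(capacity) {}
  void SetCapacity(int capacity) { capacity_ = capacity; }
  int GetCapacity() const { return capacity_; }

 private:
  int capacity_;
};

// Reads the capacity at call time, so a later SetCapacity() is picked up
// by the next call. The floor of 4 mirrors the minimum described in the PR.
std::size_t PoolSizeFromFallback(const FakeIoThreadPool& pool) {
  return static_cast<std::size_t>(std::max(4, pool.GetCapacity()));
}
```

Calling PoolSizeFromFallback before and after SetCapacity yields different values unless both capacities are at or below the floor.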

Are there any user-facing changes?

No breaking API changes.

@github-actions
⚠️ GitHub issue #20314 has been automatically assigned in GitHub to PR creator.
