From 1daa996f934fdeb939a7617ad1f0060dfe15db60 Mon Sep 17 00:00:00 2001
From: Lawrence Qiu <lawrenceqiu@google.com>
Date: Thu, 23 Apr 2026 15:55:43 -0400
Subject: [PATCH 1/3] docs: Add a guide on configuring ChannelPools for gRPC

---
 docs/grpc_channel_pool_guide.md | 240 ++++++++++++++++++++++++++++++++
 1 file changed, 240 insertions(+)
 create mode 100644 docs/grpc_channel_pool_guide.md

diff --git a/docs/grpc_channel_pool_guide.md b/docs/grpc_channel_pool_guide.md
new file mode 100644
index 000000000000..e25731cfaa31
--- /dev/null
+++ b/docs/grpc_channel_pool_guide.md
@@ -0,0 +1,240 @@
+# Configuring gRPC Channel Pools in Java Client Libraries
+
+## Overview
+
+This guide provides general best practices for configuring and tuning gRPC channel pools in Google Cloud Java client libraries. By understanding how the Java SDK manages gRPC connections, you can optimize your application for throughput and latency, and prevent common pitfalls like client-side request queuing, cold-start connection delays, and resource thrashing.
+
+> [!NOTE]
+> The default channel pool configuration is designed to work well for the majority of workloads. You do not need to change these settings unless you are experiencing performance issues or have specific throughput requirements. If you do run into issues, this guide will help you tune the channel pool for your workload.
+
+> [!CAUTION]
+> ChannelPool and ChannelPoolSettings are marked with @BetaApi and may be subject to change while the API stabilizes. While the API may seem stable, we do not promise any compatibility guarantees.
+
+---
+
+## Background
+
+When handling heavy traffic, a single gRPC connection often becomes a performance bottleneck. To overcome this, the Google Cloud Java client libraries use **Channel Pooling** via the Gax-Java (Google API Extensions for Java) library.
+
+GAX is the internal transport layer shared by all Google Cloud Java client libraries. It handles connection management, retries, and configuration, including channel pools.
+
+Channel pooling spreads the outbound RPC load across multiple identical gRPC connections, ensuring higher throughput and better resilience.
+
+---
+
+## How Channel Pools Work Under the Hood
+
+Understanding the internals helps you choose the right settings before diving into configuration.
+
+1. **Multiplexing with Round-Robin**: GAX distributes RPC traffic across pooled channels using a round-robin strategy. Each incoming request is assigned to the next channel in sequence.
+    - **The Hotspot Gotcha**: Strict round-robin does not differentiate between light and heavy payloads. If one channel is assigned multiple heavy operations in succession, you might encounter queuing delays on that channel while other channels remain idle.
+
+2. **Dynamic Sizing**: GAX can dynamically scale the number of open connections based on the number of concurrent in-flight requests (outstanding RPCs). It evaluates load every **1 minute** and scales in controlled increments limited by `maxResizeDelta` (default 2).
+
+   > [!IMPORTANT]
+   > **Dynamic sizing is opt-in.** With the default settings (`minRpcsPerChannel=0`, `maxRpcsPerChannel=Integer.MAX_VALUE`), the pool is treated as **statically sized** and will never resize automatically. To enable dynamic sizing, you must configure meaningful threshold values for `minRpcsPerChannel` and/or `maxRpcsPerChannel`. See [Configuration](#configuration) below.
+
+3. **Preemptive Refresh** *(disabled by default)*: When enabled, connections are asynchronously replaced every **50 minutes** to prevent forceful termination by Google Front End (GFE), the Google infrastructure layer that manages external connections and forcibly closes long-lived ones after roughly an hour. Enable preemptive refresh explicitly with `.setPreemptiveRefreshEnabled(true)`.
+
+4. **Idle Connection Pruning**: If a connection remains completely idle, the GFE layer gracefully drops it to save server resources. This is why oversizing the pool can cause tail latency: idle channels get dropped, and re-establishing them takes time when traffic resumes. (Source: [Bigtable Documentation](https://docs.cloud.google.com/bigtable/docs/connection-pools))
+
+5. **Dampening**: The `maxResizeDelta` setting rate-limits how many channels can be added or removed per resize cycle, protecting both client and server against aggressive connection thrashing.
+
+---
+
+## Configuration
+
+Settings are configured via `setChannelPoolSettings()` on the `InstantiatingGrpcChannelProvider.Builder` within your service client builder.
+
+All settings described below are properties of the [`ChannelPoolSettings`](https://docs.cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.grpc.ChannelPoolSettings) class.
+
+### Quick Start Example
+
+```java
+// Step 1: Get the default gRPC transport provider builder for your service.
+//         Replace YourServiceSettings with the actual settings class, e.g. DatastoreSettings.
+InstantiatingGrpcChannelProvider.Builder transportChannelProviderBuilder =
+    YourServiceSettings.defaultGrpcTransportProviderBuilder();
+
+// Step 2: Configure the channel pool settings.
+//         These are example values. Tune based on your workload using the Sizing section below.
+transportChannelProviderBuilder.setChannelPoolSettings(
+    ChannelPoolSettings.builder()
+        .setInitialChannelCount(2)
+        .setMinChannelCount(2)
+        .setMaxChannelCount(10)
+        .setMinRpcsPerChannel(1)   // shrink if concurrent load drops below 1 RPC/channel
+        .setMaxRpcsPerChannel(50)  // grow if concurrent load exceeds 50 RPCs/channel
+        .setPreemptiveRefreshEnabled(true) // replace channels every 50 min before GFE drops them
+        .build()
+);
+
+// Step 3: Build the service settings with the custom transport provider.
+YourServiceSettings settings = YourServiceSettings.newBuilder()
+    .setTransportChannelProvider(transportChannelProviderBuilder.build())
+    .build();
+
+// Step 4: Create the client using the configured settings.
+YourServiceClient client = YourServiceClient.create(settings);
+```
+
+### Key Pool Configuration Parameters
+
+| Configuration Property | Default | Description | Recommended Use |
+| :--- | :--- | :--- | :--- |
+| [`initialChannelCount`](https://docs.cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.grpc.ChannelPoolSettings#getInitialChannelCount()) | 1 | Number of channels created upon client instantiation. | Set higher if your app experiences high throughput bursts at startup. |
+| [`minChannelCount`](https://docs.cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.grpc.ChannelPoolSettings#getMinChannelCount()) | 1 | The absolute floor for active channels in the pool. The pool will never shrink below this number, even during idle periods. | Prevents all connections from being dropped during quiet periods, which protects against latency spikes when traffic resumes. |
+| [`maxChannelCount`](https://docs.cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.grpc.ChannelPoolSettings#getMaxChannelCount()) | 200 | The absolute ceiling for active channels in the pool. | Set high enough to handle peak throughput. |
+| `maxResizeDelta` | 2 | Maximum channels added or removed per resize cycle. Acts as a rate limiter to prevent sudden resource surges. | Keep at or below 25. Higher values allow faster scaling but risk cold-start latency penalties from establishing many connections at once. |
+| [`minRpcsPerChannel`](https://docs.cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.grpc.ChannelPoolSettings#getMinRpcsPerChannel()) | 0 | The pool shrinks when in-flight request load implies fewer channels are needed. **Setting to 0 (default) disables shrinking and dynamic sizing entirely.** | Set to `1` or higher to enable dynamic shrinking. Match to your target floor utilization per channel (see Sizing section). |
+| [`maxRpcsPerChannel`](https://docs.cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.grpc.ChannelPoolSettings#getMaxRpcsPerChannel()) | `Integer.MAX_VALUE` | The pool grows when in-flight request load implies more channels are needed. **Setting to `Integer.MAX_VALUE` (default) disables growing and dynamic sizing entirely.** | Set to `50` or lower to enable dynamic growth. Match to your target ceiling utilization per channel (see Sizing section). |
+| [`preemptiveRefreshEnabled`](https://docs.cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.grpc.ChannelPoolSettings#isPreemptiveRefreshEnabled()) | `false` | When `true`, all channels are replaced on a 50-minute cycle before GFE forcibly disconnects them. | Enable for long-running clients to reduce tail latency from forced reconnects. |
+
+> [!IMPORTANT]
+> **The pool is statically sized by default.** Dynamic resizing only activates when at least one of `minRpcsPerChannel` or `maxRpcsPerChannel` is set to a non-default value. If both remain at their defaults (0 and `Integer.MAX_VALUE`), the pool holds `initialChannelCount` channels for the lifetime of the client and never resizes.
+
+---
+
+## Sizing Your Channel Pool
+
+Having **too few** connections causes in-flight requests to queue on the client side, which shows up as high latency. Having **too many** causes idle connections to be dropped by GFE, inducing latency spikes as connections are re-established when traffic returns.
+
+### Why Do We Account for Concurrency?
+
+gRPC sends multiple requests over the same connection simultaneously using HTTP/2 streams. Each request holds its stream open for as long as the request takes to complete (its latency). This means:
+
+- **Serial throughput of one stream**: If each request takes 20ms, a single stream can complete 1,000ms / 20ms = **50 requests per second**.
+- **Concurrent streams needed**: If your application must serve 5,000 requests per second, one stream handling 50 req/s is not enough. You need enough simultaneous streams to collectively handle 5,000 req/s. That requires 5,000 / 50 = **100 streams open at the same time**.
+
+If you size your pool without accounting for this concurrency demand, you risk **client-side queuing** (too few channels, too few streams) or **idle connection dropouts** (more channels than your traffic needs).
+
+### Calculating Optimal Channel Pool Bounds
+
+A single gRPC channel supports up to 100 concurrent streams. To leave safe headroom and prevent queuing, the [Datastore Java client documentation](https://docs.cloud.google.com/datastore/docs/java-client-grpc#connection_pool_configuration) recommends keeping each channel between **10 and 50 concurrent streams**.
+
+These thresholds map directly to `minRpcsPerChannel` and `maxRpcsPerChannel`:
+- Set `maxRpcsPerChannel` to the **ceiling** (e.g. 50): the pool grows when any channel would exceed this.
+- Set `minRpcsPerChannel` to the **floor** (e.g. 10): the pool shrinks when channels drop below this.
+
+Use the following formulas to find how many channels you need for a given workload:
+
+- **Minimum channels needed** = Concurrent streams required / `maxRpcsPerChannel`
+- **Maximum channels needed** = Concurrent streams required / `minRpcsPerChannel`
+
+> [!NOTE]
+> These utilization bounds are recommended starting points. Evaluate your actual production traffic and latency profiles when tuning these thresholds. If your calculated minimum channel count is less than 2, use at least **2 channels** to maintain redundancy.
+
+> [!TIP]
+> **Worked example**: An application handles 5,000 peak QPS with an average latency of 20ms.
+>
+> 1. Serial throughput per stream: 1,000ms / 20ms = **50 requests/sec per stream**
+> 2. Concurrent streams required: 5,000 QPS / 50 req/s = **100 concurrent streams**
+> 3. Minimum channels: 100 / 50 = **2 channels** (each channel at peak: 50 RPCs/channel)
+> 4. Maximum channels: 100 / 10 = **10 channels** (each channel at minimum: 10 RPCs/channel)
+>
+> Configure your pool with `setMinChannelCount(2)`, `setMaxChannelCount(10)`, `setMaxRpcsPerChannel(50)`, and `setMinRpcsPerChannel(10)`.
+
+---
+
+## Troubleshooting
+
+### Enabling Resize Logging
+
+By default, resize events are logged at `FINE` level and are **not visible** in standard application output. To observe pool resizing behavior, enable detailed logging for the channel pool logger (`com.google.api.gax.grpc.ChannelPool`):
+
+```java
+// Add this during application startup, before creating any clients.
+import java.util.logging.Level;
+import java.util.logging.Logger;
+
+Logger.getLogger("com.google.api.gax.grpc.ChannelPool").setLevel(Level.FINEST);
+```
+
+Or configure it via a `logging.properties` file:
+
+```properties
+com.google.api.gax.grpc.ChannelPool.level=FINEST
+```
+
+Once enabled, you will see log entries like:
+
+```
+Detected throughput peak of 87, expanding channel pool size: 4 -> 6.
+Detected throughput drop to 3, shrinking channel pool size: 6 -> 4.
+```
+
+See the [Google Cloud Java troubleshooting guide](https://docs.cloud.google.com/java/docs/troubleshooting) for more details on enabling logging.
+
+**At default log levels, the only visible resize signal** is a `WARNING` emitted after 5 consecutive resize cycles:
+
+```
+Channel pool is repeatedly resizing. Consider adjusting `initialChannelCount` or
+`maxResizeDelta` to a more reasonable value.
+```
+
+If you see this warning in production without having enabled detailed logging, your pool is continuously scaling up or down, which is a sign that it is not sized for your actual baseline traffic.
+
+---
+
+### Common Latency Scenarios and Fixes
+
+#### 1. Gradual Startup Latency Peaks
+- **Symptom**: After enabling detailed logging, you see the pool expand in successive 1-minute cycles (e.g., `4 -> 6`, then `6 -> 8`, then `8 -> 10`). Alternatively, the consecutive-resize warning appears in your logs.
+- **Probable Cause**: The pool is under-provisioned at startup. Each scale-up step requires new connections to complete TCP/TLS handshakes before serving traffic, adding latency.
+- **Possible Fixes**:
+    - Increase `initialChannelCount` to match the stable pool size you observe in logs after ramp-up completes.
+    - Increase `maxResizeDelta` to handle spikes in requests
+
+#### 2. Performance Ceiling at High Scale
+- **Symptom**: Resize logs stop appearing (pool has stabilized at its ceiling) while latency continues to spike.
+- **Probable Cause**: The pool has reached its `maxChannelCount` ceiling, and new requests are queuing behind the existing channels.
+- **Possible Fixes**:
+    - Increase `maxChannelCount` using the sizing formulas above
+    - Re-evaluate your `maxRpcsPerChannel` threshold
+
+#### 3. Network Dropouts and Intermittent Failures
+- **Log Event**: `Entering TRANSIENT_FAILURE state`
+- **Probable Cause**: Unstable network connectivity, firewalls, or misconfigured HTTP/2 keep-alive settings. A single isolated occurrence is likely a transient network blip and nothing to worry about. Repeated occurrences warrant investigation.
+- **Possible Fixes**:
+    - Evaluate network connectivity between your hosts and the service endpoint
+
+---
+
+## Advanced Information
+
+### Standard gRPC Channel Lifecycle States
+
+Logged by `io.grpc.ChannelLogger`, these states define connection phases:
+- **CONNECTING**: Negotiating TCP/TLS handshakes.
+- **READY**: Connection is alive and processing RPCs.
+- **IDLE**: Connection is inactive and may be dropped to save resources.
+- **TRANSIENT_FAILURE**: Connection broke unexpectedly; automatic reconnection with back-off is in progress.
+- **SHUTDOWN**: Explicitly terminated by the client application.
+
+### Warning: Deprecated CPU-Based Scaling Heuristics
+
+> [!CAUTION]
+> Historically, the Java SDK used host CPU processor counts as a scaling heuristic via `InstantiatingGrpcChannelProvider.Builder.setChannelsPerCpu()`.
+> - **Status**: CPU-based heuristics are obsolete and **highly discouraged**.
+> - **Risk**: In virtualized container environments (Kubernetes, Docker), the heuristic frequently misreads virtualized CPU counts, often defaulting to a static pool of just 1 channel and severely throttling throughput.
+>
+> **How to Migrate Off `setChannelsPerCpu`:**
+> Note that you may need to update your client library or `gax-java` dependency versions to use `ChannelPoolSettings`.
+>
+> ```java
+> // Do NOT use the deprecated method:
+> // transportChannelProviderBuilder.setChannelsPerCpu(2.0, 100);
+>
+> // Instead, configure ChannelPoolSettings directly (tune values to your workload):
+> transportChannelProviderBuilder.setChannelPoolSettings(
+>     ChannelPoolSettings.builder()
+>         .setInitialChannelCount(2)
+>         .setMinChannelCount(2)
+>         .setMaxChannelCount(10)
+>         .setMinRpcsPerChannel(1)
+>         .setMaxRpcsPerChannel(50)
+>         .build()
+> );
+> ```
+
+grpc_channel_pool_guide.md
\ No newline at end of file

From 44fd2b0f6e273d6a231b13f79387e2e10161c407 Mon Sep 17 00:00:00 2001
From: Lawrence Qiu <lawrenceqiu@google.com>
Date: Thu, 23 Apr 2026 15:58:39 -0400
Subject: [PATCH 2/3] chore: Remove extra text at end of guide

---
 docs/grpc_channel_pool_guide.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/grpc_channel_pool_guide.md b/docs/grpc_channel_pool_guide.md
index e25731cfaa31..ecf3e6fced1a 100644
--- a/docs/grpc_channel_pool_guide.md
+++ b/docs/grpc_channel_pool_guide.md
@@ -236,5 +236,3 @@ Logged by `io.grpc.ChannelLogger`, these states define connection phases:
 >         .build()
 > );
 > ```
-
-grpc_channel_pool_guide.md
\ No newline at end of file

From 38e11df22048e39fba5878613c9a718ab3b2c4a3 Mon Sep 17 00:00:00 2001
From: Lawrence Qiu <lawrenceqiu@google.com>
Date: Fri, 24 Apr 2026 12:02:16 -0400
Subject: [PATCH 3/3] docs: Add softer wording to explain the potential issues
 for single channel

---
 docs/grpc_channel_pool_guide.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/grpc_channel_pool_guide.md b/docs/grpc_channel_pool_guide.md
index ecf3e6fced1a..9bd221e7af29 100644
--- a/docs/grpc_channel_pool_guide.md
+++ b/docs/grpc_channel_pool_guide.md
@@ -14,11 +14,11 @@ This guide provides general best practices for configuring and tuning gRPC chann
 
 ## Background
 
-When handling heavy traffic, a single gRPC connection often becomes a performance bottleneck. To overcome this, the Google Cloud Java client libraries use **Channel Pooling** via the Gax-Java (Google API Extensions for Java) library.
+For applications with high throughput or concurrency demands, a single gRPC connection can potentially become a performance bottleneck due to limits on concurrent streams. Users may experience a spike in latency as requests queue on the gRPC connection when the limit is reached. Google middleware enforces a limit of 100 streams per connection. To overcome this, the Google Cloud Java client libraries use **Channel Pooling** via the Gax-Java (Google API Extensions for Java) library.
 
 GAX is the internal transport layer shared by all Google Cloud Java client libraries. It handles connection management, retries, and configuration, including channel pools.
 
-Channel pooling spreads the outbound RPC load across multiple identical gRPC connections, ensuring higher throughput and better resilience.
+Channel pooling spreads the outbound RPC load across multiple identical gRPC connections, which can help achieve higher throughput and better resilience for demanding workloads.
 
 ---
 
@@ -97,7 +97,7 @@ YourServiceClient client = YourServiceClient.create(settings);
 
 ## Sizing Your Channel Pool
 
-Having **too few** connections causes in-flight requests to queue on the client side, which shows up as high latency. Having **too many** causes idle connections to be dropped by GFE, inducing latency spikes as connections are re-established when traffic returns.
+If a pool has **too few** connections for its workload, in-flight requests might queue on the client side, which can show up as high latency. Conversely, having **too many** connections may lead to idle connections being dropped by GFE, potentially inducing tail latency spikes as connections need to be re-established when traffic returns.
 
 ### Why Do We Account for Concurrency?
 
@@ -106,7 +106,7 @@ gRPC sends multiple requests over the same connection simultaneously using HTTP/
 - **Serial throughput of one stream**: If each request takes 20ms, a single stream can complete 1,000ms / 20ms = **50 requests per second**.
 - **Concurrent streams needed**: If your application must serve 5,000 requests per second, one stream handling 50 req/s is not enough. You need enough simultaneous streams to collectively handle 5,000 req/s. That requires 5,000 / 50 = **100 streams open at the same time**.
 
-If you size your pool without accounting for this concurrency demand, you risk **client-side queuing** (too few channels, too few streams) or **idle connection dropouts** (more channels than your traffic needs).
+If the pool is sized without considering these concurrency factors, it may lead to **client-side queuing** (if sized too small) or **idle connection dropouts** (if sized too large).
 
 ### Calculating Optimal Channel Pool Bounds