Skip to content

feat(spanner): add shared endpoint cooldowns for location-aware rerouting#12845

Open
rahul2393 wants to merge 6 commits intogoogleapis:mainfrom
rahul2393:endpoint-cooldown-re
Open

feat(spanner): add shared endpoint cooldowns for location-aware rerouting#12845
rahul2393 wants to merge 6 commits intogoogleapis:mainfrom
rahul2393:endpoint-cooldown-re

Conversation

@rahul2393
Copy link
Copy Markdown
Contributor

@rahul2393 rahul2393 commented Apr 17, 2026

Summary

This PR improves Java Spanner's location-aware bypass routing when routed replicas are overloaded or unavailable, and extends score-based replica selection

The client now:

  • avoids recently overloaded routed endpoints using shared cooldowns
  • records RESOURCE_EXHAUSTED / UNAVAILABLE as EWMA error penalties
  • uses EWMA-based selection for both preferLeader=false and strong preferLeader=true read/query routing when
    operation_uid is available

It also keeps the location-aware read path lock-free via immutable group snapshots.

What changed

  • Added shared channel-level cooldown tracking for routed endpoints that return RESOURCE_EXHAUSTED / UNAVAILABLE, while still keeping request-scoped exclusions for same-logical-request retries.
  • Updated bypass retry behavior so eligible reads/queries can reroute to another replica instead of immediately
    returning to the same failed endpoint.
  • Recorded RESOURCE_EXHAUSTED / UNAVAILABLE as EWMA error penalties for routed replicas, so unhealthy endpoints are deprioritized even after the immediate retry/cooldown window.
  • Extended score-based routing to strong preferLeader=true read/query traffic when operation_uid is present, using leader preference as a bias instead of a hard override.
  • Kept preferLeader=true behavior unchanged for paths without operation_uid such as mutation/commit routing.
  • Refactored KeyRangeCache group state to immutable snapshots and removed per-group synchronization from the routing hot path.

@rahul2393 rahul2393 requested review from a team as code owners April 17, 2026 22:34
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an endpoint cooldown mechanism to handle RESOURCE_EXHAUSTED errors and refactors the KeyRangeCache to use immutable snapshots, replacing per-group locking to improve read performance. The new EndpointOverloadCooldownTracker manages short-lived cooldowns with exponential backoff and jitter, while KeyAwareChannel is updated to exclude endpoints on both RESOURCE_EXHAUSTED and UNAVAILABLE status codes. Feedback is provided to optimize the GroupSnapshot constructor by removing a redundant list copy.

Comment on lines +573 to +577
private GroupSnapshot(ByteString generation, int leaderIndex, List<TabletSnapshot> tablets) {
this.generation = generation;
this.leaderIndex = leaderIndex;
this.tablets = Collections.unmodifiableList(new ArrayList<>(tablets));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The GroupSnapshot constructor performs a redundant copy of the tablets list. Since the only caller (CachedGroup.update) already creates a new ArrayList, we can wrap it directly in an unmodifiable list to avoid unnecessary allocations.

Suggested change
private GroupSnapshot(ByteString generation, int leaderIndex, List<TabletSnapshot> tablets) {
this.generation = generation;
this.leaderIndex = leaderIndex;
this.tablets = Collections.unmodifiableList(new ArrayList<>(tablets));
}
private GroupSnapshot(ByteString generation, int leaderIndex, List<TabletSnapshot> tablets) {
this.generation = generation;
this.leaderIndex = leaderIndex;
this.tablets = Collections.unmodifiableList(tablets);
}

@rahul2393
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements latency-aware routing for Spanner endpoints by introducing a score-based replica selection mechanism using time-decayed EWMA. Key additions include registries for tracking endpoint latency and inflight requests, a cooldown tracker for overloaded endpoints, and updates to the KeyRangeCache to support score-aware selection. Feedback identifies several high-priority issues in the new static registries, including a memory leak in the latency tracker map due to accumulating operation identifiers, potential key collisions between different client instances sharing a JVM, and a race condition when updating inflight request counts. There is also a recommendation to reduce the maximum size of the request ID cache to prevent excessive memory consumption.

Comment on lines +32 to +33
private static final ConcurrentHashMap<TrackerKey, LatencyTracker> TRACKERS =
new ConcurrentHashMap<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The TRACKERS map is a static ConcurrentHashMap that is never cleared. Since it is keyed by operationUid (which is per-range/group and can change over time) and address, this will lead to a memory leak in long-running applications as old operationUid entries accumulate indefinitely. Consider using a Cache with a maximum size and/or time-based expiration (e.g., com.google.common.cache.Cache with expireAfterAccess) to bound memory usage.

References
  1. For safety, use bounded resources with a defined maximum size to prevent resource exhaustion.

Comment on lines +140 to +142
static final class TrackerKey {
private final long operationUid;
private final String address;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using only operationUid and address as a key in a static registry poses a collision risk if multiple Spanner client instances (e.g., targeting different projects or instances) are used within the same JVM. While operationUid is intended to be unique, it is likely only scoped to a specific database or instance context. Consider including a more globally unique identifier, such as a databaseId or an instance-specific prefix, in the TrackerKey to ensure isolation between different clients.

Comment on lines +97 to +100
int updated = counter.decrementAndGet();
if (updated <= 0) {
INFLIGHT_REQUESTS.remove(address, counter);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a race condition in finishRequest. INFLIGHT_REQUESTS.remove(address, counter) can remove the AtomicInteger object even if another thread has just incremented it via beginRequest (since beginRequest uses computeIfAbsent which returns the existing object). This leads to lost inflight counts. Since the number of Spanner endpoints is relatively small and stable, it is safer to keep the AtomicInteger in the map indefinitely rather than attempting to remove it when it reaches zero.

Suggested change
int updated = counter.decrementAndGet();
if (updated <= 0) {
INFLIGHT_REQUESTS.remove(address, counter);
}
counter.decrementAndGet();


private static final Cache<String, RoutingTarget> TARGETS =
CacheBuilder.newBuilder()
.maximumSize(100_000_000L)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The maximumSize of 100,000,000 for TARGETS is excessively large and could lead to significant memory consumption if the cache fills up. Since entries are typically removed upon call closure in HeaderInterceptor and have a 10-minute expiration, a much smaller limit (e.g., 1,000,000) should be more than sufficient for even the most high-throughput applications while providing better safety against memory exhaustion.

Suggested change
.maximumSize(100_000_000L)
.maximumSize(1_000_000L)
References
  1. For safety, use a defined maximum size for bounded resources to prevent memory exhaustion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant