Conversation

@LukeAVanDrie
Contributor

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
This PR refactors the candidate resolution logic out of the Director and into a dedicated PodLocator component.

Reasoning:

  1. Reduce Lock Contention: High-throughput dispatch loops frequently query the Datastore for the same subset of pods. This introduces CachedPodLocator, a decorator that caches resolution results (TTL 50ms), significantly reducing RLock contention on the central Datastore.
  2. Preparation for Lazy Resolution: To support Scale-from-Zero (in a future PR), the Flow Control layer will need to resolve pods after requests have been enqueued. This interface decouples the resolution logic from the Director's immediate request scope.

Changes:

  • Added contracts.PodLocator interface.
  • Implemented DatastorePodLocator (logic moved from Director) and CachedPodLocator.
  • Injected PodLocator into the Director.
  • Replaced getCandidatePodsForScheduling with locator.Locate().

Note: This is a pure refactor. The sequence of events in HandleRequest remains unchanged (Admission is still checked after resolution).

Which issue(s) this PR fixes:
Tracks #1800

Does this PR introduce a user-facing change?:
NONE

This defines the contract for resolving candidate pods based on request
metadata, decoupling the resolution logic from the storage layer.
Introduces DatastorePodLocator and a caching decorator. This reduces
contention on the Datastore RWMutex during high-throughput dispatch
cycles by caching subset resolution results for a short TTL.
Refactors the Director to use the injected PodLocator interface instead
of the private getCandidatePodsForScheduling method. This prepares the
Director for lazy resolution without changing current behavior.
@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Dec 4, 2025
@netlify

netlify bot commented Dec 4, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit cce12bf
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6931e188c0e0d7000981e423
😎 Deploy Preview https://deploy-preview-1950--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 4, 2025
@k8s-ci-robot
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Dec 4, 2025
Contributor Author


I am defining this interface within the Flow Control contracts package as it will serve as the primary contract for lazy candidate resolution during the dispatch cycle.

Crucially, this abstraction decouples the core request lifecycle from specific upstream filtering logic. While the current implementation handles Envoy subsetting, isolating this behavior behind an interface paves the way for promoting it to an EPP Extension Point. This would allow adopters to inject environment-specific or vendor-customized discovery mechanisms in the future without polluting the core directory.

@LukeAVanDrie
Contributor Author

@lionelvillard

I have one more PR stacked on top of this that moves candidate resolution to after the admission control check. It involves injecting the PodLocator into the Flow Control layer and performing lazy candidate resolution before each call to the saturation detector.

Splitting that out as a separate PR to keep this one as a behavioral no-op refactor. I will likely need to update some integration tests on the subsequent PR as well.

@ahg-g
Contributor

ahg-g commented Dec 4, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 4, 2025
@ahg-g
Contributor

ahg-g commented Dec 5, 2025

Reduce Lock Contention: High-throughput dispatch loops frequently query the Datastore for the same subset of pods. This introduces CachedPodLocator, a decorator that caches resolution results (TTL 50ms), significantly reducing RLock contention on the central Datastore.

What changed that made us concerned about RLock contention? Is that expected as a result of the follow-up PR? Is this why we are introducing this caching layer?

@LukeAVanDrie
Contributor Author

What changed that made us concerned about RLock contention? Is that expected as a result of the follow-up PR?

Yes, this is specifically to support the scale-from-zero logic in the follow-up PR (#1952).

In the current legacy path, we resolve pods once per request. In the Flow Control path, the ShardProcessor needs to re-evaluate saturation state frequently for HoL requests (once per dispatch attempt--effectively polling to see if new capacity has appeared). Without this caching layer, that polling loop (1ms at its slowest) would hammer the Datastore RLock, potentially starving the writer threads trying to update the endpoint state.

Is this why we are introducing this caching layer?

Yes. I opted to encapsulate this inside the PodLocator decorator rather than adding complexity to the FlowController internals. This keeps the controller logic focused purely on queuing mechanics, while the locator handles the performance optimization transparently.

Perhaps this CachedPodLocator change belongs in a separate PR or even #1952. I thought it would be easiest to review alongside the DatastorePodLocator delegate though.
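
To make the contention argument concrete, here is a back-of-envelope calculation using the figures quoted in this thread (1ms fastest dispatch-attempt cadence, 50ms TTL); treat the exact numbers as illustrative, not measured:

```go
package main

import "fmt"

func main() {
	const pollIntervalMs = 1 // fastest HoL re-evaluation cadence mentioned above
	const ttlMs = 50         // CachedPodLocator TTL from the PR description

	locatorCalls := 1000 / pollIntervalMs // Locate() calls per second per HoL request
	datastoreReads := 1000 / ttlMs        // only TTL expiries fall through to the Datastore
	fmt.Printf("~%d Locate calls/sec, ~%d Datastore reads/sec per subset\n",
		locatorCalls, datastoreReads)
}
```

Roughly a 50x reduction in RLock acquisitions for the hot subset, which is the headroom the writer threads need to update endpoint state.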

Contributor

@ahg-g ahg-g left a comment


It seems the most expensive part is that we are creating a full snapshot of the pod metrics per request; we should find a way to take one snapshot and share it among requests, with a "view" on that cache based on the latest subset.


// DatastorePodLocator implements contracts.PodLocator by querying the EPP Datastore.
// It centralizes the logic for resolving candidate pods based on request metadata (specifically Envoy subset filters).
type DatastorePodLocator struct {
Contributor


I think we are converging on endpoints instead of pods, right @kfswain ?

//
// This interface allows the Flow Controller to fetch a fresh list of pods dynamically during the dispatch cycle,
// enabling support for "Scale-from-Zero" scenarios where pods may not exist when the request is first enqueued.
type PodLocator interface {
Contributor


why not keep the candidates term instead? like EndpointCandidates or CandidateEndpoints?

Contributor Author


Are you just referring to the interface name here? Both of these seem better to me.

Contributor


Yes, referring to the name.

Contributor


We can merge this as is and then rename after the other PR merges just so we don't cause conflicts with the second PR.

@LukeAVanDrie
Contributor Author

LukeAVanDrie commented Dec 5, 2025

It seems the most expensive part is that we are creating a full snapshot of the pod metrics per request, we should find a way to take a snapshot and share it among requests with a "view" on that cache based on the latest subset.

This is what the caching decorator is for. I am caching by "subset key", though, not per pod. If two requests share the same subset, we reuse the cached results. This handles 99% of the calls, which come from the tight loop spinning on the HoL request in Flow Control. Cache misses for most unique requests are probably fine, as this is how EPP operated before this change.

I was trying to strike a balance between caching on request ID and caching on pod namespaced name.
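
The subset-key scheme can be pictured with a minimal sketch (the names here are hypothetical; the real cache lives in CachedPodLocator and adds a TTL, omitted below for brevity). The second request with the same Envoy subset reuses both the resolution result and its backing array, so only one snapshot is taken per subset:

```go
package main

import "fmt"

// Pod stands in for the real endpoint type.
type Pod struct{ Name string }

func main() {
	snapshots := 0
	cache := map[string][]Pod{}

	locate := func(subsetKey string) []Pod {
		if pods, ok := cache[subsetKey]; ok {
			return pods
		}
		snapshots++ // the expensive part: a full Datastore snapshot
		pods := []Pod{{Name: "pod-a"}, {Name: "pod-b"}}
		cache[subsetKey] = pods
		return pods
	}

	a := locate("subset-x") // request 1: miss, takes the snapshot
	b := locate("subset-x") // request 2: hit, shares it
	fmt.Println(snapshots, &a[0] == &b[0]) // prints "1 true": one snapshot, shared backing array
}
```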

@ahg-g
Contributor

ahg-g commented Dec 5, 2025

Great!

@ahg-g
Contributor

ahg-g commented Dec 6, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 6, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 6, 2025
@k8s-ci-robot k8s-ci-robot merged commit a88e4fb into kubernetes-sigs:main Dec 6, 2025
11 of 12 checks passed