@BenjaminBraunDev
Contributor

This PR adds a running request scorer and tests. It is very similar to the queue scorer, but it scores on the running requests metric added in #1899. It is weighted the same as queued requests in the default plugin config.
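
For readers unfamiliar with this style of scorer, here is a minimal sketch of how such a scorer might work, assuming it min-max normalizes the running request count so that the least-loaded endpoint scores 1.0 and the most-loaded scores 0.0. The `Endpoint` type and `ScoreRunningRequests` function are hypothetical stand-ins for illustration, not the types or code in this PR.

```go
package main

import (
	"fmt"
	"math"
)

// Endpoint is a hypothetical stand-in for a pod plus its metrics.
// It is not the types.Pod interface used in the PR.
type Endpoint struct {
	Name            string
	RunningRequests int
}

// ScoreRunningRequests gives each endpoint a score in [0, 1].
// Fewer running requests means a higher score (assumed min-max normalization).
func ScoreRunningRequests(endpoints []Endpoint) map[string]float64 {
	minRunning, maxRunning := math.MaxInt, math.MinInt
	for _, e := range endpoints {
		if e.RunningRequests < minRunning {
			minRunning = e.RunningRequests
		}
		if e.RunningRequests > maxRunning {
			maxRunning = e.RunningRequests
		}
	}

	scores := make(map[string]float64, len(endpoints))
	for _, e := range endpoints {
		if maxRunning == minRunning {
			// All endpoints are equally loaded, so none is preferred.
			scores[e.Name] = 1.0
			continue
		}
		scores[e.Name] = float64(maxRunning-e.RunningRequests) / float64(maxRunning-minRunning)
	}
	return scores
}

func main() {
	scores := ScoreRunningRequests([]Endpoint{
		{"pod-a", 2},
		{"pod-b", 6},
		{"pod-c", 10},
	})
	// prints: 1 0.5 0
	fmt.Println(scores["pod-a"], scores["pod-b"], scores["pod-c"])
}
```

Normalizing against the current minimum and maximum keeps the score comparable with other scorers that also return values in [0, 1], which matters when scores are combined by weight.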

Does this PR introduce a user-facing change?:

NONE

@netlify
netlify bot commented Dec 5, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit fa6ad7c
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6933801a763ecf0007833fbf
😎 Deploy Preview https://deploy-preview-1957--gateway-api-inference-extension.netlify.app
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BenjaminBraunDev
Once this PR has been reviewed and has the lgtm label, please assign nirrozenbaum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 5, 2025
@BenjaminBraunDev
Contributor Author

Removed the running request scorer from the config until we benchmark its performance.


// Score returns the scoring result for the given list of pods based on context.
func (s *RunningQueueSizeScorer) Score(_ context.Context, _ *types.CycleState, _ *types.LLMRequest, pods []types.Pod) map[types.Pod]float64 {
minQueueSize := math.MaxInt

Please fix the names: this is not a queue, it is the set of running requests.


KVCacheUsagePercentKey = "KVCacheUsagePercent"
WaitingQueueSizeKey = "WaitingQueueSize"
RunningQueueSizeKey = "RunningQueueSize"

running requests are not a queue, no?
