[feat] Add running requests scorer and tests #1957
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull request has been approved by: BenjaminBraunDev
Needs approval from an approver in each of these files:
Removed the running request scorer from the config until we benchmark its performance.
// Score returns the scoring result for the given list of pods based on context.
func (s *RunningQueueSizeScorer) Score(_ context.Context, _ *types.CycleState, _ *types.LLMRequest, pods []types.Pod) map[types.Pod]float64 {
	minQueueSize := math.MaxInt
Please fix the names: this is not a queue, these are running requests.
KVCacheUsagePercentKey = "KVCacheUsagePercent"
WaitingQueueSizeKey = "WaitingQueueSize"
RunningQueueSizeKey = "RunningQueueSize"
running requests are not a queue, no?
This PR adds a running request scorer and tests. It is very similar to the queue scorer, but scores on the running requests metric added in #1899. It is weighted the same as queued requests in the default plugin config.
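For readers unfamiliar with this style of scorer, here is a minimal, self-contained sketch of the idea, not the code in this PR. It assumes the scorer min-max normalizes the running request count across candidate pods, so the pod with the fewest running requests scores 1.0 and the pod with the most scores 0.0. The `pod` type and the `scoreByRunningRequests` function are hypothetical stand-ins for illustration and do not exist in the repository.

```go
package main

import (
	"fmt"
	"math"
)

// pod is a hypothetical stand-in for the project's types.Pod,
// carrying only the metric this sketch needs.
type pod struct {
	name            string
	runningRequests int
}

// scoreByRunningRequests is an illustrative helper (an assumption, not the
// PR's implementation). It min-max normalizes running request counts so that
// fewer running requests yields a higher score in the range [0, 1].
func scoreByRunningRequests(pods []pod) map[string]float64 {
	minRunning, maxRunning := math.MaxInt, math.MinInt
	for _, p := range pods {
		if p.runningRequests < minRunning {
			minRunning = p.runningRequests
		}
		if p.runningRequests > maxRunning {
			maxRunning = p.runningRequests
		}
	}

	scores := make(map[string]float64, len(pods))
	for _, p := range pods {
		if maxRunning == minRunning {
			// All pods are equally loaded, so none is preferred.
			scores[p.name] = 1.0
			continue
		}
		scores[p.name] = float64(maxRunning-p.runningRequests) / float64(maxRunning-minRunning)
	}
	return scores
}

func main() {
	pods := []pod{{"pod-a", 2}, {"pod-b", 6}, {"pod-c", 10}}
	for name, score := range scoreByRunningRequests(pods) {
		fmt.Printf("%s: %.2f\n", name, score)
	}
}
```

Under these assumptions, a pod's score depends only on its load relative to the other candidates, which is why the same weight can be reused for both the queued and the running metric.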
Does this PR introduce a user-facing change?: