fix: Add patch for placement groups in local_vllm_model. #694 (Closed)
Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
What does this PR do?
Improves `LocalVLLMModel` robustness when launching multiple vLLM response API servers concurrently on Ray clusters. This PR addresses multiple failure modes seen during concurrent startup:

- Over-creation of placement groups (more than `dp_size` groups)
- Unexpected node resource keys (`node:<ip>_group_*` synthetic keys)
- Placement races between concurrent `LocalVLLMModel` instances

Issues
Fixes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/148
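The placement failure modes above can be sketched together as follows. This is an illustrative reconstruction, not the actual vLLM/Ray code: the signature of `create_dp_placement_groups`, the `groups_per_node` parameter, and the key-filtering helper are all assumptions made for the example.

```python
def real_node_keys(resource_keys):
    """Hypothetical helper: Ray can expose synthetic keys such as
    'node:10.0.0.1_group_0' alongside the real 'node:10.0.0.1' key, so
    code that assumes a single 'node:' key per node must filter them."""
    return [k for k in resource_keys if k.startswith("node:") and "_group_" not in k]

def create_dp_placement_groups(resource_keys, dp_size, groups_per_node):
    """Hypothetical sketch of the patched loop: create placement groups per
    node but stop BOTH loops once exactly dp_size groups exist. The unpatched
    loop kept iterating over nodes after reaching dp_size, over-creating
    groups and tripping a later assertion."""
    groups = []
    for node_key in real_node_keys(resource_keys):
        for _ in range(groups_per_node):
            if len(groups) == dp_size:
                break
            groups.append({"node": node_key})
        if len(groups) == dp_size:
            break  # the fix: also break the OUTER loop
    assert len(groups) == dp_size, "wrong number of placement groups"
    return groups

keys = ["node:10.0.0.1", "node:10.0.0.1_group_0", "node:10.0.0.2", "CPU"]
groups = create_dp_placement_groups(keys, dp_size=3, groups_per_node=2)
print(len(groups))  # exactly dp_size groups, placed only on real node keys
```

The key point is that the termination check must apply to the outer node loop as well as the inner per-node loop, otherwise each additional node keeps appending groups past `dp_size`.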
Usage
No user-facing API changes.
Existing `LocalVLLMModel` configs continue to work as-is.

Behavioral change: startup of local vLLM servers is now serialized via a Ray-wide lock to avoid placement races when multiple models are launched at the same time.
Additional Information
Root causes
- `create_dp_placement_groups` can continue outer-loop iteration after reaching `dp_size`, leading to over-creation and an assertion failure.
- Nodes can expose multiple `node:*` keys (e.g. `node:<ip>_group_*`), which broke code assuming a single node key.
- `LocalVLLMModel` actors can race during placement-group creation against the same shared Ray cluster resources.

Fixes in this PR
- `responses_api_models/local_vllm_model/vllm_patches.py`: patch for DP placement ensures exactly `dp_size` placement groups are created (breaks the outer loop once the target is reached).
- `responses_api_models/local_vllm_model/app.py`: the startup lock is released in a `finally` block for safety.

Net effect