
fix: Add patch for placement groups in local_vllm_model. #694

Closed
ffrujeri wants to merge 3 commits into main from ffrujeri/multi-node-local-vllm

Conversation


@ffrujeri ffrujeri commented Feb 13, 2026

What does this PR do?

Improves LocalVLLMModel robustness when launching multiple vLLM response API servers concurrently on Ray clusters.

This PR addresses multiple failure modes seen during concurrent startup:

  1. vLLM DP placement group over-allocation (creating more than dp_size groups)
  2. Ray node resource key parsing incompatibility (node:<ip>_group_* synthetic keys)
  3. startup-time scheduling races between multiple LocalVLLMModel instances
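Failure mode (2) can be illustrated with a small sketch. This is not the patch itself, only the filtering idea it describes: keep the plain node:<ip> keys and skip the synthetic node:<ip>_group_* entries that newer Ray versions add for placement groups (`canonical_node_keys` is a hypothetical helper name).

```python
# Sketch of the node-key filtering idea (hypothetical helper, not the patch).
# Ray resource maps can contain synthetic placement-group keys such as
# "node:<ip>_group_*" alongside the plain "node:<ip>" key; code that assumes
# exactly one "node:*" entry breaks on them.

def canonical_node_keys(resources: dict) -> list[str]:
    """Return only plain node:<ip> keys, skipping synthetic _group_ keys."""
    return [
        key for key in resources
        if key.startswith("node:") and "_group_" not in key
    ]

resources = {
    "CPU": 8.0,
    "GPU": 8.0,
    "node:10.0.0.1": 1.0,                    # canonical node key
    "node:10.0.0.1_group_0_abcd1234": 1.0,   # synthetic placement-group key
}
print(canonical_node_keys(resources))  # ['node:10.0.0.1']
```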

Issues

Fixes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/148


Usage

No user-facing API changes.

Existing LocalVLLMModel configs continue to work as-is.
Behavioral change: startup of local vLLM servers is now serialized via a Ray-wide lock to avoid placement races when multiple models are launched at the same time.


Additional Information

  • Root causes

    • vLLM v1 create_dp_placement_groups can continue outer-loop iteration after reaching dp_size, leading to over-creation and an assertion failure.
    • Ray resource maps may include multiple node:* keys (e.g. node:<ip>_group_*), which broke code assuming a single node key.
    • Two independent LocalVLLMModelActors can race during placement-group creation against the same shared Ray cluster resources.
  • Fixes in this PR

    • Added/updated responses_api_models/local_vllm_model/vllm_patches.py patch for DP placement:
      • stops allocation once dp_size placement groups are created (breaks outer loop)
      • handles Ray synthetic node keys by selecting canonical node IP keys
      • supports strict/headless-compatible allocation paths safely
    • Updated responses_api_models/local_vllm_model/app.py:
      • improved head-node selection using a heuristic based on the required local GPU capacity
      • added a detached Ray startup lock actor to serialize local vLLM server startup across concurrent model launches
      • lock is acquired before actor startup and released in finally for safety
  • Net effect

    • Concurrent startup of multiple response API model servers is significantly more reliable.
    • DP placement logic is resilient to newer Ray node-resource key formats.
    • No config migration required.

  • Testing
config_paths="responses_api_models/local_vllm_model/configs/openai/gpt-oss-20b-reasoning-high.yaml,\
responses_api_models/local_vllm_model/configs/openai/gpt-oss-120b-reasoning-high.yaml"
ng_run "+config_paths=[${config_paths}]"
[1] gpt-oss-20b-reasoning-high (responses_api_models/local_vllm_model)
{
    'config_path': 'gpt-oss-20b-reasoning-high',
    'dir_path': (
        '/scratch/fsw/portfolios/llmservice/projects/llmservice_modelalignment_ppo/users/ffrujeri/Gym/responses_api_mo'
        'dels/local_vllm_model'
    ),
    'entrypoint': 'app.py',
    'host': '127.0.0.1',
    'name': 'local_vllm_model',
    'pid': 432161,
    'port': 13538,
    'process_name': 'gpt-oss-20b-reasoning-high',
    'server_type': 'responses_api_models',
    'url': 'http://127.0.0.1:13538',
}
[2] gpt-oss-120b-reasoning-high (responses_api_models/local_vllm_model)
{
    'config_path': 'gpt-oss-120b-reasoning-high',
    'dir_path': (
        '/scratch/fsw/portfolios/llmservice/projects/llmservice_modelalignment_ppo/users/ffrujeri/Gym/responses_api_mo'
        'dels/local_vllm_model'
    ),
    'entrypoint': 'app.py',
    'host': '127.0.0.1',
    'name': 'local_vllm_model',
    'pid': 432162,
    'port': 18253,
    'process_name': 'gpt-oss-120b-reasoning-high',
    'server_type': 'responses_api_models',
    'url': 'http://127.0.0.1:18253',
}
####################################################################################################
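The over-allocation guard in the vllm_patches.py change can be illustrated abstractly. The sketch below is not vLLM's actual create_dp_placement_groups code; it only shows the shape of the fix: once dp_size groups exist, the inner per-node allocation stops and, crucially, the outer node loop stops too.

```python
# Abstract sketch of the DP placement-group guard (not vLLM's real code).
# Without the outer-loop break, iteration over the remaining nodes can create
# more than dp_size groups and trip vLLM's assertion.

def allocate_dp_groups(nodes, dp_size, gpus_per_group):
    """nodes: [{'ip': str, 'gpus': int}]; returns at most dp_size groups."""
    groups = []
    for node in nodes:                      # outer loop over cluster nodes
        free = node["gpus"]
        while free >= gpus_per_group and len(groups) < dp_size:
            groups.append({"node": node["ip"], "gpus": gpus_per_group})
            free -= gpus_per_group
        if len(groups) >= dp_size:
            break                           # the fix: stop the outer loop too
    return groups

nodes = [{"ip": "10.0.0.1", "gpus": 8}, {"ip": "10.0.0.2", "gpus": 8}]
print(len(allocate_dp_groups(nodes, dp_size=2, gpus_per_group=4)))  # 2
```

With the guard, a cluster that has spare capacity on later nodes still yields exactly dp_size groups instead of one per schedulable slot.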

copy-pr-bot bot commented Feb 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ffrujeri ffrujeri changed the title from "Add patch for placement groups in local_vllm_model." to "fix: Add patch for placement groups in local_vllm_model." Feb 17, 2026
@ffrujeri ffrujeri marked this pull request as ready for review February 18, 2026 02:38
@ffrujeri ffrujeri force-pushed the ffrujeri/multi-node-local-vllm branch from 7d3a839 to 2971f31 on February 18, 2026 16:58
@ffrujeri ffrujeri requested a review from bxyu-nvidia February 28, 2026 00:43
Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
@ffrujeri ffrujeri force-pushed the ffrujeri/multi-node-local-vllm branch from 2971f31 to ce4168f on February 28, 2026 18:55
Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
@ffrujeri ffrujeri closed this Mar 3, 2026