
feat: Adds GenRM Response API Model with support for custom roles used in pairwise response comparison.#674

Merged
bxyu-nvidia merged 7 commits into main from ffrujeri/genrm-model
Mar 4, 2026

Conversation

@ffrujeri (Contributor) commented Feb 11, 2026

What does this PR do?

Adds GenRM support via a dedicated Response API Model package (responses_api_models/genrm_model/). The package provides a single local variant: a GenRM model that uses a locally managed vLLM server (downloads the model and starts vLLM, e.g. via Ray).

The key design goal is to keep all GenRM-specific logic inside the model server, so that the resources server (and the base schema) can use standard OpenAI roles throughout. Role remapping to the GenRM chat-template roles (response_1, response_2, principle) happens as a final preprocessing step, just before the request is forwarded to vLLM.

Related to PR #523. Part of #516.


Architecture

Package layout (responses_api_models/genrm_model/)

genrm_model/
├── __init__.py
├── app.py
├── pyproject.toml
├── setup.py
├── README.md
├── configs/
│   └── genrm_model.yaml
└── tests/
    └── test_app.py      # config tests + preprocessing unit tests

Components

  • GenRMModelConfig — extends LocalVLLMModelConfig with supports_principle_role.
  • GenRMModelMixin — overrides two hooks on VLLMModel:
    • get_converter() — returns a plain VLLMConverter (no custom converter needed).
    • _preprocess_chat_completion_create_params() — reads the comparison payload from
      metadata and appends the GenRM chat-template roles immediately before the vLLM call:
      • metadata["response_1"] → appended as a "response_1" message
      • metadata["response_2"] → appended as a "response_2" message
      • metadata["principle"] → appended as a "principle" message (when supports_principle_role=True)
      • metadata is consumed here and not forwarded to vLLM.
  • GenRMModel — GenRMModelMixin + LocalVLLMModel.
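The metadata-to-custom-role remapping described above can be sketched as follows. This is a minimal illustration based on the PR description; the helper name `preprocess_genrm_params` is hypothetical, and the real hook lives on the mixin with the signature shown in the next section:

```python
from typing import Any, Dict, List


def preprocess_genrm_params(body_dict: Dict[str, Any],
                            supports_principle_role: bool = True) -> Dict[str, Any]:
    """Sketch of GenRMModelMixin._preprocess_chat_completion_create_params:
    move the comparison payload from `metadata` into custom-role messages."""
    # Pop metadata so it is consumed here and never forwarded to vLLM.
    metadata = body_dict.pop("metadata", None) or {}
    messages: List[Dict[str, Any]] = body_dict.setdefault("messages", [])

    # Append the GenRM chat-template roles in a fixed order.
    for key in ("response_1", "response_2"):
        if key in metadata:
            messages.append({"role": key, "content": metadata[key]})
    if supports_principle_role and "principle" in metadata:
        messages.append({"role": "principle", "content": metadata["principle"]})
    return body_dict
```

Because the remapping runs immediately before the vLLM call, everything upstream of the model server only ever sees standard OpenAI roles.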

Changes to vllm_model/app.py

_preprocess_chat_completion_create_params(self, request, body_dict) -> Dict[str, Any] is extracted from chat_completions() as an overridable hook. The base implementation covers replace_developer_role_with_system, model / chat_template_kwargs injection, token-ID augmentation, reasoning-parser handling, and extra_body merging. Subclasses override it to apply model-specific transformations before the vLLM call.
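The shape of this hook pattern, heavily simplified: the real base implementation in vllm_model/app.py does much more than the stub below (role replacement, token-ID augmentation, reasoning-parser handling, extra_body merging), and the class name `VLLMModelSketch` is illustrative only:

```python
from typing import Any, Dict


class VLLMModelSketch:
    """Illustrative shape of the hook extracted from chat_completions()."""

    model = "unknown"

    def _preprocess_chat_completion_create_params(
        self, request: Any, body_dict: Dict[str, Any]
    ) -> Dict[str, Any]:
        # Base stub: inject the served model name if absent.  Subclasses
        # (e.g. GenRMModelMixin) override this to apply model-specific
        # transformations, then call super() for the shared behavior.
        body_dict.setdefault("model", self.model)
        return body_dict

    def chat_completions(self, request: Any, body_dict: Dict[str, Any]) -> Dict[str, Any]:
        body_dict = self._preprocess_chat_completion_create_params(request, body_dict)
        # ... forward body_dict to vLLM here ...
        return body_dict
```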

Schema (nemo_gym/openai_utils.py)

NeMoGymEasyInputMessage.role and NeMoGymMessage.role use standard OpenAI roles only (user, assistant, system, developer). The custom response_1 / response_2 / principle literals are not part of the request schema — they are an internal vLLM chat-template detail handled entirely within GenRMModelMixin.
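A sketch of that schema constraint, assuming a `Literal`-based role type as is common in OpenAI-style schemas (the validator function here is hypothetical, not the actual nemo_gym/openai_utils.py code):

```python
from typing import Literal, get_args

# Only standard OpenAI roles are valid on the wire; the GenRM
# chat-template roles (response_1 / response_2 / principle) never
# appear in the request schema.
OpenAIRole = Literal["user", "assistant", "system", "developer"]


def validate_role(role: str) -> str:
    """Reject anything outside the standard OpenAI role set."""
    if role not in get_args(OpenAIRole):
        raise ValueError(f"unsupported role: {role!r}")
    return role
```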


Usage

Config key: genrm_model under responses_api_models, with entrypoint: app.py. See configs/genrm_model.yaml.

from responses_api_models.genrm_model.app import GenRMModel, GenRMModelConfig

How the resources server calls the GenRM model server:

The input field carries only the conversation history (standard OpenAI roles). The comparison
payload is passed via metadata so the request schema stays generic:

responses_create_params.input = conversation_history_messages   # user / assistant turns only
responses_create_params.metadata = {
    "response_1": response_1_text,
    "response_2": response_2_text,
    "principle":  principle_text,   # omit key when use_principle=False
}

GenRMModelMixin._preprocess_chat_completion_create_params reads metadata, appends the
custom-role messages to the conversation, and pops metadata before forwarding to vLLM.
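The client-side assembly shown above can be sketched as a small helper (the function name `build_genrm_request` is hypothetical; the field layout follows the PR description):

```python
from typing import Any, Dict, List, Optional


def build_genrm_request(conversation: List[Dict[str, str]],
                        response_1: str,
                        response_2: str,
                        principle: Optional[str] = None) -> Dict[str, Any]:
    """Assemble responses-create params: standard OpenAI roles in `input`,
    the comparison payload in `metadata`."""
    metadata = {"response_1": response_1, "response_2": response_2}
    if principle is not None:  # omit the key when use_principle=False
        metadata["principle"] = principle
    return {"input": conversation, "metadata": metadata}
```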


Testing

Launching a multi-node config:

ng_run "+config_paths=[responses_api_models/genrm_model/configs/genrm_model.yaml]"

The contents of configs/genrm_model.yaml:
genrm_model:
  responses_api_models:
    genrm_model:
      entrypoint: app.py
      model: <MODEL_PATH>
      uses_reasoning_parser: true
      return_token_id_information: false
      supports_principle_role: true
      debug: true
      hf_home: null

      vllm_serve_env_vars:
        VLLM_RAY_DP_PACK_STRATEGY: strict

      vllm_serve_kwargs:
        tensor_parallel_size: 8
        data_parallel_size: 2
        data_parallel_size_local: 1
        pipeline_parallel_size: 1
        reasoning_parser: deepseek_r1
        gpu_memory_utilization: 0.85
        max_model_len: 60000
        model_loader_extra_config:
          enable_multithread_load: true
          num_threads: 108

We get a runnable server:

All 1 / 1 servers ready! Polling every 60s

####################################################################################################
#
# Server Instances
#
####################################################################################################

[1] genrm_model (responses_api_models/genrm_model)
{
    'config_path': 'genrm_model',
    'entrypoint': 'app.py',
    'host': '127.0.0.1',
    'name': 'genrm_model',
    'port': 11093,
    'server_type': 'responses_api_models',
    'url': 'http://127.0.0.1:11093',
}
####################################################################################################

And a successful response (input is conversation history only; responses and principle go in metadata):

curl -s -X POST http://127.0.0.1:11093/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      {"role": "user", "content": "What is the capital of France?", "type": "message"}
    ],
    "metadata": {
      "principle":  "Judge which response is better.",
      "response_1": "The capital of France is Paris.",
      "response_2": "Paris is the capital city of France."
    },
    "temperature": 0.0,
    "max_output_tokens": 512
  }'


@ffrujeri ffrujeri changed the title feat: Adds GenRM (Generative Reward Model) Response API Model with support for custom roles used in pairwise response comparison. feat: Adds GenRM Response API Model with support for custom roles used in pairwise response comparison. Feb 12, 2026
@ffrujeri ffrujeri force-pushed the ffrujeri/genrm-model branch from c22967f to dd172a0 on February 18, 2026 02:33
@ffrujeri ffrujeri changed the base branch from main to ffrujeri/multi-node-local-vllm February 18, 2026 02:36
@ffrujeri ffrujeri marked this pull request as ready for review February 18, 2026 02:38
@ffrujeri ffrujeri requested a review from a team as a code owner February 18, 2026 02:38
@bxyu-nvidia bxyu-nvidia linked an issue Feb 18, 2026 that may be closed by this pull request
@ffrujeri ffrujeri force-pushed the ffrujeri/multi-node-local-vllm branch from 7d3a839 to 2971f31 on February 18, 2026 16:58
@ffrujeri ffrujeri force-pushed the ffrujeri/genrm-model branch from dd172a0 to 7458914 on February 18, 2026 16:58
@ffrujeri ffrujeri force-pushed the ffrujeri/multi-node-local-vllm branch from 2971f31 to ce4168f on February 28, 2026 18:55
@ffrujeri ffrujeri force-pushed the ffrujeri/genrm-model branch from 7458914 to 588a33b on March 3, 2026 00:08
@ffrujeri ffrujeri changed the base branch from ffrujeri/multi-node-local-vllm to bxyu/rollout-collection-infra March 3, 2026 00:08
Base automatically changed from bxyu/rollout-collection-infra to main March 3, 2026 04:10
@ffrujeri ffrujeri force-pushed the ffrujeri/genrm-model branch 2 times, most recently from 4fc3148 to 6138812 on March 3, 2026 19:53
ffrujeri added 5 commits March 3, 2026 22:55
Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
@ffrujeri ffrujeri force-pushed the ffrujeri/genrm-model branch from 90f96d3 to ca71797 on March 3, 2026 22:55
ffrujeri added 2 commits March 3, 2026 23:04
Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
…training workflows. (#679)

# GenRM Compare Resource Server & Cohort-Based Verify

## What does this PR do?

This PR adds a production-ready **Resource Server** for comparing
multiple candidate responses using GenRM models, and moves RLHF-specific
reward logic (cohort buffering and comparison) into that server so that
**rollout collection and consumer libraries (e.g. nemo-rl) stay
generic**.

## Issues

- Related to PR #523 (reference).
- Part of #516.

## Summary

In RLHF, rewards are **relative to other rollouts for the same task**
(e.g. same prompt), not independent. This PR addresses that by:

- **Cohort-based verify**: The genrm_compare server’s `/verify` endpoint
buffers rollouts by prompt (and optional principle). When
`num_rollouts_per_prompt` rollouts have been received for a prompt, it
runs pairwise comparison, aggregates scores, and returns the appropriate
reward to each of the N callers. Callers naturally “wait” until their
cohort is complete via the async verify flow.
- **No RLHF hacks in Gym or NeMo RL**: Rollout collection stays a simple
“post each row to agent `/run`”. The agent calls the resources server’s
`/verify` with the response; genrm_compare owns all buffering and
comparison. No comparison strategy or prompt buffering in
`rollout_collection.py`.
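The cohort flow above can be sketched in simplified, synchronous form. The real `/verify` endpoint resolves each async caller individually; this illustrative class (name `CohortBufferSketch` is hypothetical) instead returns the whole cohort's rewards to the caller that completes it, and uses placeholder scoring in place of the GenRM pairwise comparison:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class CohortBufferSketch:
    """Sketch of cohort-based verify: buffer rollouts per prompt key,
    score the cohort once it is full, then release all rewards."""

    def __init__(self, num_rollouts_per_prompt: int, default_score: float = 0.0):
        self.n = num_rollouts_per_prompt
        self.default_score = default_score
        self.buffers: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def verify(self, prompt: str, principle: str, response: str) -> Optional[List[float]]:
        if self.n <= 1:
            return [self.default_score]  # no comparison possible; stub reward
        key = (prompt, principle)  # prompt_key = input + principle
        self.buffers[key].append(response)
        if len(self.buffers[key]) < self.n:
            return None  # caller waits until the cohort completes
        cohort = self.buffers.pop(key)
        # Placeholder scoring: the real server runs pairwise GenRM
        # comparison and aggregation here.
        return [float(len(r)) for r in cohort]
```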

### Key features

- **Cohort-based verify**: Configurable `num_rollouts_per_prompt`;
verify buffers by prompt (and principle), runs comparison when cohort is
full, distributes rewards.
- **Batch `/compare` API**: Direct comparison of N `response_objs` (e.g.
for scripts or tests).
- **Pairwise comparison**: Circular and all-pairs strategies; tiebreaker
and length-based bonuses; optional principle-based judging.
- **GenRM model alignment**: Config aligned with `genrm_model` (server
name `genrm_model`; custom roles `response_1`, `response_2`,
`principle`).
- **Clean boundaries**: Zero GenRM-specific code in rollout collection
or config types; all RLHF logic in genrm_compare.
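The two pairing strategies named above can be sketched as index generation over the candidate list (the function name `comparison_pairs` and the strategy string identifiers are assumptions for illustration; only the strategies themselves are from the PR):

```python
from typing import List, Tuple


def comparison_pairs(n: int, strategy: str = "circular") -> List[Tuple[int, int]]:
    """Return ordered (response_i, response_j) index pairs to judge."""
    if strategy == "circular":
        # Each response is judged once against its successor, wrapping
        # around, so every response appears in exactly two comparisons.
        return [(i, (i + 1) % n) for i in range(n)]
    if strategy == "all_pairs":
        # Every ordered pair in both directions, which helps average out
        # the judge's position bias at quadratic cost.
        return [(i, j) for i in range(n) for j in range(n) if i != j]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Note the example output later in this PR shows both (0, 1) and (1, 0) being judged, consistent with position-debiased pairwise comparison.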

## Architecture

```
Rollout collection
    └── For each row: POST to agent /run  (unchanged; no strategy or buffering)

Agent (e.g. simple_agent)
    └── /run: generate response → POST to resources server /verify with (params, response, optional principle)

GenRM Compare Resource Server
    ├── /verify (per-rollout)
    │   ├── num_rollouts_per_prompt <= 1 → return default_score
    │   └── num_rollouts_per_prompt > 1:
    │       ├── Buffer by prompt_key (input + principle)
    │       ├── When cohort size == num_rollouts_per_prompt:
    │       │   ├── Run pairwise comparison (GenRM model)
    │       │   ├── Aggregate scores (tiebreaker, length bonuses)
    │       │   └── Resolve all N pending verify callers with their rewards
    │       └── Return this rollout’s reward
    └── /compare (batch)
        └── Compare N response_objs; return rewards + metrics (for scripts/tests)
```

- **Config**: genrm_compare config includes `num_rollouts_per_prompt`,
`genrm_model_server` (name `genrm_model`), and comparison/aggregation
options. No `comparison_strategy` in global config for rollout.
- **Data**: For RLHF, provide `num_rollouts_per_prompt` rows per prompt
(e.g. via `num_repeats` when loading data).

## Testing

```bash
curl -s -X POST http://127.0.0.1:17795/compare \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_history": [{"role": "user", "content": "What is SKILL?"}],
    "response_objs": [
      {"output": [{"type": "message", "content": [{"type": "output_text", "text": "SKILL is a verb meaning to kill."}]}]},
      {"output": [{"type": "message", "content": [{"type": "output_text", "text": "Skill refers to the ability to perform a task well."}]}]}
    ]
  }' | jq .
```

GenRM returns a response with reasoning and a final message containing
JSON scores, e.g.:

```json
{
  "rewards": [
    1.025,
    4.475
  ],
  "comparison_results": [
    {
      "response_i": 0,
      "response_j": 1,
      "judge_idx": 0,
      "score_1": 1.0,
      "score_2": 5.0,
      "ranking": 6.0
    },
    {
      "response_i": 1,
      "response_j": 0,
      "judge_idx": 0,
      "score_1": 4.0,
      "score_2": 1.0,
      "ranking": 1.0
    }
  ],
  "metrics": {
    "mean_individual_score": 2.75,
    "std_individual_score": 1.7853571071357126,
    "tiebreak_usage_rate": 0.0
  }
}
```
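Since the GenRM completion mixes free-form reasoning with a final JSON score object, the server must extract the scores from the text. A minimal sketch of that parsing step, assuming a flat `{"score_1": ..., "score_2": ...}` object as in the output above (the function name and the last-object heuristic are assumptions, not the actual utils implementation):

```python
import json
import re
from typing import Optional, Tuple


def parse_judge_scores(text: str) -> Optional[Tuple[float, float]]:
    """Pull the last flat JSON object with score_1/score_2 out of a
    completion that may also contain free-form reasoning."""
    last = None
    for match in re.finditer(r"\{[^{}]*\}", text):
        last = match  # keep only the final JSON-looking object
    if last is None:
        return None
    try:
        obj = json.loads(last.group(0))
        return float(obj["score_1"]), float(obj["score_2"])
    except (ValueError, KeyError):
        return None  # malformed or incomplete judge output
```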

Unit tests cover genrm_compare (verify stub when N≤1, cohort logic,
compare), utils (prompt key, parsing, aggregation), and
comparison_strategies (batch client and helpers).

---------

Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
@bxyu-nvidia bxyu-nvidia merged commit 0c62ff0 into main Mar 4, 2026
5 checks passed
@bxyu-nvidia bxyu-nvidia deleted the ffrujeri/genrm-model branch March 4, 2026 00:02


Development

Successfully merging this pull request may close these issues.

feat: Reward model support
