Merged
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…llection Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…r handling, simplified judge prompt Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…racted answer for judge, add truncation and warmup support Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…er normalization, numeric fallback, max_steps limit Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…l-envs Signed-off-by: Frankie Siino <fsiino@nvidia.com> # Conflicts: # README.md
bxyu-nvidia
requested changes
Mar 3, 2026
nemo_gym/rollout_collection.py
Outdated
    res = await server_client.post(server_name=row["agent_ref"]["name"], url_path="/run", json=row)
    await raise_for_status(res)
    return row, await get_response_json(res)
    try:
Contributor
can we revert this pls given we don't want to artificially limit the rollout collection time?
bxyu-nvidia
approved these changes
Mar 3, 2026
This PR contains infra + environment-specific changes made during pipecleaning efforts to get the respective environments to run successfully in nemo-rl.
rollout_collection:
Any server crash, network timeout, model error, or resource exhaustion can cause a rollout to fail. Without handling, any single failed row kills the entire batch. We return a zero-reward fallback for the row after waiting for ROLLOUT_ROW_TIMEOUT_SECONDS.
equivalence_llm_judge:
Previously, an overly long generated answer from a thinking model would overflow the judge model's context window, crash the judge call, and lead to cascading rollout failures. The implementation allows opt-in truncation via max_judge_input_tokens and chars_per_token_estimate, which shorten the generated answer before it is passed to the judge model.
math_with_code:
Adds a fallback for \boxed{} answer extraction. Previously only assistant text messages were searched for the answer, so a model that answered via code execution ended up with a zero reward. The fallback also searches function_call_output, so an answer can be extracted from the tool output if the model prints it inside the executed code.
math_with_judge:
- Add config fields for judge truncation, similar to equivalence_llm_judge.
- Pass the extracted \boxed{} answer to the judge model instead of the raw generated answer, to avoid overwhelming the judge's context window.
- Answers wrapped in \(...\) produce \boxed{\(42\)}, which is not parsable by the math_verify library. This strips the outer delimiters to fix parsing.
mcqa:
Address null option values in the dataset. Previously these were skipped, which led to invalid rows being collected and training crashes.
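To illustrate the judge-input truncation described under equivalence_llm_judge above, here is a minimal sketch. It is not the code from this PR: the helper name truncate_for_judge, the default of 4.0 characters per token, and the choice to keep the tail of the answer (on the assumption that a thinking model's final answer sits at the end) are all assumptions made for illustration. Only the two config names, max_judge_input_tokens and chars_per_token_estimate, come from the description.

```python
from typing import Optional


def truncate_for_judge(
    answer: str,
    max_judge_input_tokens: Optional[int] = None,
    chars_per_token_estimate: float = 4.0,
) -> str:
    """Trim `answer` to an approximate token budget.

    Hypothetical helper: truncation is opt-in, so a budget of None
    returns the answer unchanged.
    """
    if max_judge_input_tokens is None:
        return answer

    # Convert the token budget into a character budget with a rough ratio.
    max_chars = int(max_judge_input_tokens * chars_per_token_estimate)
    if max_chars <= 0:
        return ""
    if len(answer) <= max_chars:
        return answer

    # Assumption: keep the end of the text, where the final answer usually is.
    return answer[-max_chars:]
```

A character-based estimate avoids loading the judge's tokenizer, at the cost of precision, so a real configuration would likely leave some headroom below the judge's actual context limit.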
mini_swe_agent:
Add a standalone data processing script to convert raw SWE-Bench/SWE-Gym data into the nemo-gym training format.
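For the math_with_code change described above, the fallback order (assistant text first, then tool output) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the function names and the flattened item shape (dicts with a "type" key and either a "text" or an "output" field) are hypothetical and simpler than the real response format.

```python
from typing import Any, Dict, List, Optional

BOXED = "\\boxed{"


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last boxed expression, honoring nested braces."""
    start = text.rfind(BOXED)
    if start == -1:
        return None
    begin = start + len(BOXED)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces: treat as no answer


def extract_answer(items: List[Dict[str, Any]]) -> Optional[str]:
    """Search assistant messages first, then fall back to tool outputs.

    Assumed item shape (hypothetical):
      {"type": "message", "text": "..."}
      {"type": "function_call_output", "output": "..."}
    """
    for wanted_type, field in (("message", "text"), ("function_call_output", "output")):
        # Walk backwards so the most recent answer wins.
        for item in reversed(items):
            if item.get("type") == wanted_type:
                found = extract_boxed(str(item.get(field, "")))
                if found is not None:
                    return found
    return None
```

With this ordering, a model that only prints \boxed{42} from inside executed code still yields an extractable answer, while an answer stated in assistant text keeps priority over anything in the tool output.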