feat: adding wandb table log feature, showing concrete test samples by vinhngx · Pull Request #2018 · NVIDIA-NeMo/RL

vinhngx · 2026-02-24T04:51:31Z

What does this PR do?

VERL has this useful feature, where at each validation step, a number of input/output samples are included in Wandb for easy manual inspection:

This PR add a similar functionality to NemoRL:

Add optional logging of validation input/output samples to WandB as tables so runs can be inspected manually in the WandB UI (prompt, response, reward per sample).

Issues

Usage

Enable WandB and set num_val_samples_to_log in the logger config. At each validation step, a subset of samples (mix of high- and low-reward) is logged as a WandB table with columns input, output, and reward.

# In your recipe or config (e.g. under logger:)
logger:
  wandb_enabled: true
  wandb:
    project: "my-project"
    name: "my-run"
  num_val_samples_to_log: 16   # 0 to disable; number of val samples to log to wandb table

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

Files changed: nemo_rl/algorithms/grpo.py (wandb Table logging and helpers), nemo_rl/utils/logger.py (num_val_samples_to_log in LoggerConfig), examples/configs/grpo_math_1B.yaml (config option), docs/design-docs/logger.md (Validation Table Logging section).
No tables are logged when wandb_enabled is false or num_val_samples_to_log is 0.

Summary by CodeRabbit

Release Notes

New Features

Added WandB table logging for validation samples, displaying input, output, and reward data to monitor validation performance in real-time.

Documentation

Added comprehensive guide for configuring and using validation table logging in WandB, including configuration examples and sample selection details.

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

coderabbitai · 2026-02-24T04:57:40Z

📝 Walkthrough

Walkthrough

Adds WandB table logging for validation samples during GRPO training. Introduces configuration field num_val_samples_to_log to logger and GRPO configs, implements helper functions to select representative samples (best, worst, middle), and provides documentation.

Changes

Cohort / File(s)	Summary
Documentation `docs/design-docs/logger.md`	Added new section documenting WandB table logging feature for validation samples, including configuration, behavior, and usage details.
Configuration & Logger `examples/configs/grpo_math_1B.yaml`, `nemo_rl/utils/logger.py`	Added `num_val_samples_to_log` configuration field to enable WandB logging. Updated example config to set num_val_samples_to_print to 16 and num_val_samples_to_log to 16. Also scaled cluster hardware from 1 GPU per node to 8 GPUs across 2 nodes.
Core Implementation `nemo_rl/algorithms/grpo.py`	Implemented WandB table logging with new helper functions: `_message_content_to_str()` converts message content to strings, `_select_sample_indices_for_logging()` selects best/worst/middle samples, and `_log_validation_samples_to_wandb()` constructs and logs WandB table with input, output, and reward columns. Added wandb.Table import and new `num_val_samples_to_log` field to GRPOLoggerConfig. Integrated logging into validate() method.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

ease of use

Suggested reviewers

guyueh1
terrykong

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes	⚠️ Warning	PR introduces new WandB logging feature but provides no unit tests, integration tests, or formal testing information for the three new helper functions added to grpo.py.	Add unit tests covering the new helper functions (_message_content_to_str, _select_sample_indices_for_logging, _log_validation_samples_to_wandb) and their integration with validation workflow, then reference test results in PR description.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: adding a WandB table logging feature for validation samples. It is concise, specific, and clearly summarizes the primary objective of the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

nemo_rl/utils/logger.py (1)

77-90: ⚠️ Potential issue | 🟡 Minor

Document num_val_samples_to_log in the TypedDict interface.
Please add a Google‑style class docstring (Attributes section) that states the purpose, valid values (≥0), and recommended default (0 disables). This satisfies the config-key documentation requirement at the definition site.
✍️ Suggested docstring update
 class LoggerConfig(TypedDict):
+    """Logger configuration.
+
+    Attributes:
+        num_val_samples_to_log (int | None): Number of validation samples to log to WandB
+            tables. Use 0 to disable. Recommended default: 0.
+    """
     log_dir: str
     wandb_enabled: bool
     swanlab_enabled: bool
@@
     num_val_samples_to_print: NotRequired[int]
     num_val_samples_to_log: NotRequired[int]
As per coding guidelines, "When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/utils/logger.py` around lines 77 - 90, Add a Google‑style class
docstring to the LoggerConfig TypedDict that includes an "Attributes" section
documenting the new config key "num_val_samples_to_log" (purpose: number of
validation samples to log, valid values: integer >= 0, recommended default: 0
disables logging), and ensure the description notes type and behavior; update
any exemplar YAMLs under examples/configs/*.yaml to reflect the default as well.
Reference: LoggerConfig and the key name num_val_samples_to_log when locating
where to add the docstring.

nemo_rl/algorithms/grpo.py (1)

2154-2166: ⚠️ Potential issue | 🟡 Minor

Avoid non‑None defaults in code for config values.
get(..., 0) introduces a hidden default. Gate the call on presence of the key instead.

✅ Suggested change (no hidden default)

-        _log_validation_samples_to_wandb(
-            logger=logger,
-            message_logs=all_message_logs,
-            rewards=total_rewards,
-            step=step,
-            num_samples=master_config["logger"].get("num_val_samples_to_log", 0),
-        )
+        num_val_samples_to_log = master_config["logger"].get("num_val_samples_to_log")
+        if num_val_samples_to_log is not None:
+            _log_validation_samples_to_wandb(
+                logger=logger,
+                message_logs=all_message_logs,
+                rewards=total_rewards,
+                step=step,
+                num_samples=num_val_samples_to_log,
+            )

As per coding guidelines, "YAML is the single source of truth for configuration defaults; do not set non-None defaults in code for configuration values".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/algorithms/grpo.py` around lines 2154 - 2166, The code currently uses
master_config["logger"].get("num_val_samples_to_log", 0) which embeds a non-None
default in code; instead check for the key and only pass the value when present.
Modify the call to _log_validation_samples_to_wandb: compute num_samples only if
"num_val_samples_to_log" in master_config["logger"] (e.g. num_samples =
master_config["logger"]["num_val_samples_to_log"]) and call
_log_validation_samples_to_wandb with that num_samples; otherwise call
_log_validation_samples_to_wandb without the num_samples argument (or pass None
if the function accepts it), removing the .get(..., 0) usage and avoiding a
hidden default. Ensure the change touches the invocation site of
_log_validation_samples_to_wandb and not logger.log_batched_dict_as_jsonl.

🧹 Nitpick comments (2)

docs/design-docs/logger.md (1)

167-167: Tighten the sentence for clarity.
Suggested edit: “When set to >0, the logger uploads a table at each validation step with columns input, output, and reward.”

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/design-docs/logger.md` at line 167, The sentence is wordy—tighten it to
clearly state behavior for num_val_samples_to_log: replace the current line with
a concise version like “When set to >0, the logger uploads a table at each
validation step with columns input, output, and reward.” Keep references to the
resulting WandB tables (val/generations and val/generations_step_{step}) and
preserve the mention of num_val_samples_to_log and the three column names
(input, output, reward) so readers can find related settings and outputs.

examples/configs/grpo_math_1B.yaml (1)

331-334: Clarify the WandB-disabled logging knobs.
With wandb_enabled: false, num_val_samples_to_log: 16 has no effect. Consider setting it to 0 here (or enabling WandB) to avoid confusion in the example.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/configs/grpo_math_1B.yaml` around lines 331 - 334, The example
config shows wandb_enabled and num_val_samples_to_log but with wandb_enabled:
false the logging count is a no-op; update the example to avoid confusion by
either setting num_val_samples_to_log: 0 when wandb_enabled: false or flip
wandb_enabled to true—change the num_val_samples_to_log key (and/or
wandb_enabled) in the config so the settings are consistent (refer to
wandb_enabled and num_val_samples_to_log).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_rl/algorithms/grpo.py`:
- Around line 2178-2238: Add Google-style docstrings to the three helper
functions _message_content_to_str, _select_sample_indices_for_logging, and
_log_validation_samples_to_wandb: for each function include a short one-line
summary, args section with types and descriptions (e.g., content: Any; rewards:
list[float]; message_logs: list[LLMMessageLogType]; step: int; num_samples: int;
logger: Optional[Logger]), a Returns section describing the return type and
meaning, and any Raises or Notes if applicable; ensure the docstrings follow the
Google style (triple-quoted, Sphinx-parsable) and attach them immediately above
each function definition without changing function signatures or behavior.
- Around line 191-195: Add a Google-style docstring to the GRPOLoggerConfig
TypedDict class explaining the purpose of each key and documenting the new key
num_val_samples_to_log: describe its purpose (number of validation samples to
write to wandb table), valid values/types (int >= 0), and a recommended default
(e.g., 0 or a small positive int); update the class docstring to list
num_val_samples_to_print and num_val_samples_to_log with types and defaults, and
update the exemplar YAMLs under examples/configs/*.yaml to reflect the chosen
default for num_val_samples_to_log so examples match the documented default.

---

Outside diff comments:
In `@nemo_rl/algorithms/grpo.py`:
- Around line 2154-2166: The code currently uses
master_config["logger"].get("num_val_samples_to_log", 0) which embeds a non-None
default in code; instead check for the key and only pass the value when present.
Modify the call to _log_validation_samples_to_wandb: compute num_samples only if
"num_val_samples_to_log" in master_config["logger"] (e.g. num_samples =
master_config["logger"]["num_val_samples_to_log"]) and call
_log_validation_samples_to_wandb with that num_samples; otherwise call
_log_validation_samples_to_wandb without the num_samples argument (or pass None
if the function accepts it), removing the .get(..., 0) usage and avoiding a
hidden default. Ensure the change touches the invocation site of
_log_validation_samples_to_wandb and not logger.log_batched_dict_as_jsonl.

In `@nemo_rl/utils/logger.py`:
- Around line 77-90: Add a Google‑style class docstring to the LoggerConfig
TypedDict that includes an "Attributes" section documenting the new config key
"num_val_samples_to_log" (purpose: number of validation samples to log, valid
values: integer >= 0, recommended default: 0 disables logging), and ensure the
description notes type and behavior; update any exemplar YAMLs under
examples/configs/*.yaml to reflect the default as well. Reference: LoggerConfig
and the key name num_val_samples_to_log when locating where to add the
docstring.

---

Nitpick comments:
In `@docs/design-docs/logger.md`:
- Line 167: The sentence is wordy—tighten it to clearly state behavior for
num_val_samples_to_log: replace the current line with a concise version like
“When set to >0, the logger uploads a table at each validation step with columns
input, output, and reward.” Keep references to the resulting WandB tables
(val/generations and val/generations_step_{step}) and preserve the mention of
num_val_samples_to_log and the three column names (input, output, reward) so
readers can find related settings and outputs.

In `@examples/configs/grpo_math_1B.yaml`:
- Around line 331-334: The example config shows wandb_enabled and
num_val_samples_to_log but with wandb_enabled: false the logging count is a
no-op; update the example to avoid confusion by either setting
num_val_samples_to_log: 0 when wandb_enabled: false or flip wandb_enabled to
true—change the num_val_samples_to_log key (and/or wandb_enabled) in the config
so the settings are consistent (refer to wandb_enabled and
num_val_samples_to_log).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9148186 and 3adb2bf.

📒 Files selected for processing (4)

docs/design-docs/logger.md
examples/configs/grpo_math_1B.yaml
nemo_rl/algorithms/grpo.py
nemo_rl/utils/logger.py

coderabbitai · 2026-02-24T04:57:43Z

nemo_rl/algorithms/grpo.py

 class GRPOLoggerConfig(LoggerConfig):
    num_val_samples_to_print: int  # number of val samples to print to stdout
+    num_val_samples_to_log: NotRequired[
+        int
+    ]  # number of val samples to log to wandb table


⚠️ Potential issue | 🟡 Minor

Add a Google‑style docstring for GRPOLoggerConfig (new key docs + default).
num_val_samples_to_log is a new config key and should be documented with purpose, valid values, and recommended default at the type definition site.

✍️ Suggested docstring update

class GRPOLoggerConfig(LoggerConfig): + """Logger config for GRPO training. + + Attributes: + num_val_samples_to_print (int): Number of validation samples to print to stdout. + num_val_samples_to_log (int | None): Number of validation samples to log to WandB + tables. Use 0 to disable. Recommended default: 0. + """ num_val_samples_to_print: int # number of val samples to print to stdout num_val_samples_to_log: NotRequired[ int ] # number of val samples to log to wandb table

As per coding guidelines, "When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@nemo_rl/algorithms/grpo.py` around lines 191 - 195, Add a Google-style docstring to the GRPOLoggerConfig TypedDict class explaining the purpose of each key and documenting the new key num_val_samples_to_log: describe its purpose (number of validation samples to write to wandb table), valid values/types (int >= 0), and a recommended default (e.g., 0 or a small positive int); update the class docstring to list num_val_samples_to_print and num_val_samples_to_log with types and defaults, and update the exemplar YAMLs under examples/configs/*.yaml to reflect the chosen default for num_val_samples_to_log so examples match the documented default.

coderabbitai · 2026-02-24T04:57:43Z

nemo_rl/algorithms/grpo.py

+def _message_content_to_str(content: Any) -> str:
+    if isinstance(content, list):
+        return "".join(str(part) for part in content)
+    return str(content)
+
+
+def _select_sample_indices_for_logging(
+    rewards: list[float], num_samples: int
+) -> list[int]:
+    if num_samples <= 0:
+        return []
+    indices = list(range(len(rewards)))
+    if len(indices) <= num_samples:
+        return indices
+    sorted_indices = sorted(indices, key=lambda i: rewards[i], reverse=True)
+    half = num_samples // 2
+    selected = sorted_indices[:half] + sorted_indices[-half:]
+    if num_samples % 2 == 1:
+        middle_idx = len(sorted_indices) // 2
+        selected.append(sorted_indices[middle_idx])
+    return selected[:num_samples]
+
+
+def _log_validation_samples_to_wandb(
+    logger: Optional[Logger],
+    message_logs: list[LLMMessageLogType],
+    rewards: list[float],
+    step: int,
+    num_samples: int,
+) -> None:
+    if logger is None or logger.wandb_logger is None:
+        return
+    if not message_logs or not rewards or num_samples <= 0:
+        return
+
+    num_samples = min(num_samples, len(message_logs))
+    indices = _select_sample_indices_for_logging(rewards, num_samples)
+    rows: list[list[Any]] = []
+    for idx in indices:
+        message_log = message_logs[idx]
+        prompt = ""
+        response = ""
+        for msg in message_log:
+            role = msg.get("role")
+            content = _message_content_to_str(msg.get("content", ""))
+            if role == "user" and not prompt:
+                prompt = content
+            elif role == "assistant":
+                response = content
+        rows.append([prompt, response, float(rewards[idx])])
+
+    if rows:
+        table = Table(columns=["input", "output", "reward"], data=rows)
+        logger.wandb_logger.log_metrics(
+            {
+                "val/generations": table,
+                f"val/generations_step_{step}": table,
+            },
+            step=step,
+            prefix="",
+        )


⚠️ Potential issue | 🟡 Minor

Add Google‑style docstrings to new helper functions.
These helpers are new public‑ish utilities (even if underscored) and should have Sphinx‑parseable docstrings.

✍️ Suggested docstrings

def _message_content_to_str(content: Any) -> str: + """Convert message content to a string. + + Args: + content: Message content value (string, list, etc.). + + Returns: + String representation of the content. Lists are joined element-wise. + """ if isinstance(content, list): return "".join(str(part) for part in content) return str(content) def _select_sample_indices_for_logging( rewards: list[float], num_samples: int ) -> list[int]: + """Select indices for logging a mix of high/low reward samples. + + Args: + rewards: Reward values aligned with message logs. + num_samples: Number of indices to select. + + Returns: + List of selected indices (best/worst and optional middle). + """ if num_samples <= 0: return [] @@ def _log_validation_samples_to_wandb( logger: Optional[Logger], message_logs: list[LLMMessageLogType], rewards: list[float], step: int, num_samples: int, ) -> None: + """Log validation samples to WandB as a table. + + Args: + logger: Logger with an initialized WandB backend. + message_logs: Message logs containing prompts and responses. + rewards: Reward values aligned with message logs. + step: Validation step number. + num_samples: Number of samples to log. + """ if logger is None or logger.wandb_logger is None: return

As per coding guidelines, "Use Google style docstrings for classes and functions, which can be parsed by Sphinx".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@nemo_rl/algorithms/grpo.py` around lines 2178 - 2238, Add Google-style docstrings to the three helper functions _message_content_to_str, _select_sample_indices_for_logging, and _log_validation_samples_to_wandb: for each function include a short one-line summary, args section with types and descriptions (e.g., content: Any; rewards: list[float]; message_logs: list[LLMMessageLogType]; step: int; num_samples: int; logger: Optional[Logger]), a Returns section describing the return type and meaning, and any Raises or Notes if applicable; ensure the docstrings follow the Google style (triple-quoted, Sphinx-parsable) and attach them immediately above each function definition without changing function signatures or behavior.

adding wandb table log feature, showing concrete test samples

3adb2bf

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

vinhngx requested review from a team as code owners February 24, 2026 04:51

github-actions bot added the documentation Improvements or additions to documentation label Feb 24, 2026

vinhngx changed the title ~~[feat] adding wandb table log feature, showing concrete test samples~~ feat: adding wandb table log feature, showing concrete test samples Feb 24, 2026

coderabbitai bot reviewed Feb 24, 2026

View reviewed changes

github-actions bot added the community-request label Feb 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: adding wandb table log feature, showing concrete test samples#2018

feat: adding wandb table log feature, showing concrete test samples#2018
vinhngx wants to merge 1 commit intoNVIDIA-NeMo:mainfrom
vinhngx:wandb-table-log

vinhngx commented Feb 24, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 24, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 24, 2026

Uh oh!

coderabbitai bot Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vinhngx commented Feb 24, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Issues

Usage

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vinhngx commented Feb 24, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 24, 2026 •

edited

Loading