test: add a diagnostic script for prefix caching naning #1987
Conversation
📝 Walkthrough

Documentation updated to include a new diagnostic section for prefix caching NaN logprobs validation. A new Python script added to tools/model_diagnostics/ that reproduces and validates prefix caching behavior in vLLM, including multi-iteration generation with prefix cache reuse and NaN logprob detection.
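A minimal sketch of the kind of reproduction the script performs is shown below. It is assembled from the details quoted in the review comments (the model name, the constants TP/MAX_TOKENS/MAX_MODEL_LEN/COUNT_UP_TO, the --model/--tp flags, greedy sampling with logprobs=1, and the out1/out2/nan_count variables); the concrete constant values, the prompt text, and the enable_prefix_caching flag are assumptions, not the actual contents of the script.

```python
import argparse
import math

import vllm
from vllm import LLM, SamplingParams

# Constants named in the review; these particular values are placeholders.
MODEL = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
TP = 1
MAX_TOKENS = 64
MAX_MODEL_LEN = 4096
COUNT_UP_TO = 500

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default=MODEL)
    parser.add_argument("--tp", type=int, default=TP)
    args = parser.parse_args()

    print(f"vLLM version: {vllm.__version__}")

    # A long, deterministic prompt gives the second pass a large shared prefix to reuse.
    numbers = " ".join(str(i) for i in range(COUNT_UP_TO))
    prompt = f"Count upward: {numbers}. Continue:"

    llm = LLM(
        model=args.model,
        tensor_parallel_size=args.tp,
        max_model_len=MAX_MODEL_LEN,
        enable_prefix_caching=True,  # assumption: the issue requires prefix caching on
    )
    sampling_params = SamplingParams(temperature=0.0, max_tokens=MAX_TOKENS, logprobs=1)

    # Iteration 1 populates the prefix cache; iteration 2 reuses it.
    out1 = llm.generate([prompt], sampling_params)[0].outputs[0]
    out2 = llm.generate([prompt], sampling_params)[0].outputs[0]

    # Any NaN logprob on the cache-reusing pass flags the issue under investigation.
    nan_count = 0
    for step in out2.logprobs or []:
        for _tid, lp_obj in step.items():
            lp = lp_obj.logprob if hasattr(lp_obj, "logprob") else lp_obj
            if isinstance(lp, float) and math.isnan(lp):
                nan_count += 1

    print(f"Iteration 1 text: {out1.text[:60]!r}")
    print(f"NaN logprobs found in iteration 2: {nan_count}")
```

The review comments below refer to this same overall shape of code: the NaN-counting loop near the end and the module-level setup at the top.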
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: 3 passed ✅ | 1 failed ❌ (inconclusive)
Actionable comments posted: 3
🧹 Nitpick comments (2)
tools/model_diagnostics/5.prefix_caching_nan.py (2)
75-84: Unconditional `break` silently under-counts NaNs if `logprobs > 1`.

The `break` at line 84 exits after inspecting only the first token-id entry per step, regardless of whether a NaN was found. With `logprobs=1` and `temperature=0.0` (greedy), each step's dict has exactly one entry, so this is functionally correct today. However, vLLM returns up to `logprobs + 1` elements per step, meaning that if `logprobs` is ever bumped above 1, the counter would silently undercount NaN occurrences (at most 1 per step). Removing the `break` makes the intent clear and future-proof:

♻️ Proposed fix
```diff
 for _tid, lp_obj in step.items():
     lp = lp_obj.logprob if hasattr(lp_obj, "logprob") else lp_obj
     if isinstance(lp, float) and math.isnan(lp):
         nan_count += 1
-    break
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/model_diagnostics/5.prefix_caching_nan.py` around lines 75-84: the loop over out2.logprobs currently contains an unconditional break after inspecting the first token-id entry, which causes NaNs to be under-counted when a step contains multiple entries; remove the break so the inner loop over step.items() examines every lp_obj (keep the existing hasattr(lp_obj, "logprob") check and NaN detection for lp) and nan_count increments for every NaN across all token-id entries rather than just the first one per step.
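To see the under-counting concretely, here is a small self-contained illustration (no vLLM required). The per-step dict shape, a token id mapped to an object with a .logprob attribute, follows the review's description; the values are made up.

```python
import math
from types import SimpleNamespace

# Two synthetic "steps", each with more than one entry (as with logprobs > 1).
steps = [
    {1: SimpleNamespace(logprob=math.nan), 2: SimpleNamespace(logprob=-0.5)},
    {3: SimpleNamespace(logprob=math.nan), 4: SimpleNamespace(logprob=math.nan)},
]

def count_nans(steps, stop_after_first):
    nan_count = 0
    for step in steps:
        for _tid, lp_obj in step.items():
            lp = lp_obj.logprob if hasattr(lp_obj, "logprob") else lp_obj
            if isinstance(lp, float) and math.isnan(lp):
                nan_count += 1
            if stop_after_first:
                break  # mimics the unconditional break in the current loop
    return nan_count

print(count_nans(steps, stop_after_first=True))   # 2: at most one NaN counted per step
print(count_nans(steps, stop_after_first=False))  # 3: the true count
```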
35-41: Consider consolidating under `if __name__ == "__main__":` and moving imports to the top.

All module-level logic (argparse, LLM instantiation, generation) runs unconditionally on import. A `__main__` guard is the standard protection against accidental execution when scripts are discovered by tooling. Additionally, the `vllm` import (lines 40-41) appears mid-file after `parse_args()`; while this speeds up `--help`, it diverges from PEP 8 and can surprise readers.

♻️ Suggested structure
```diff
 import argparse
 import math
+from vllm import LLM, SamplingParams
+import vllm

 MODEL = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
 ...
-parser = argparse.ArgumentParser()
-parser.add_argument("--model", type=str, default=MODEL)
-parser.add_argument("--tp", type=int, default=TP)
-args = parser.parse_args()
-
-import vllm
-from vllm import LLM, SamplingParams
-
-print(f"vLLM version: {vllm.__version__}")
-...
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model", type=str, default=MODEL)
+    parser.add_argument("--tp", type=int, default=TP)
+    args = parser.parse_args()
+
+    print(f"vLLM version: {vllm.__version__}")
+    ...
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/model_diagnostics/5.prefix_caching_nan.py` around lines 35-41: move all runtime logic (argparse setup and calls using parser/args, LLM instantiation, and generation) under a guarded block: wrap code that calls parser.parse_args(), creates the vllm LLM and SamplingParams, and runs generation inside if __name__ == "__main__":. Also relocate the imports (import vllm and from vllm import LLM, SamplingParams) to the top of the file with the other imports to follow PEP 8; keep only lightweight module-level constants like MODEL and TP outside the guard and ensure no heavy side-effect code runs at import time.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/adding-new-models.md`:
- Around line 330-344: The two fenced code blocks under "Expected pass output
(vLLM 0.13.0)" and "Expected fail output (vLLM 0.14.0)" are missing language
specifiers causing markdownlint MD040; update both fences to include a language
(e.g., use ```text) so the pass-output and fail-output blocks explicitly start
with ```text and end with ``` to satisfy MD040 and preserve formatting.
In `@tools/model_diagnostics/5.prefix_caching_nan.py`:
- Line 87: The print statement currently uses an unnecessary f-string: locate
the call print(f"\n Sample logprobs from iteration 2:") and remove the leading
"f" so it becomes a plain string literal; this eliminates the Ruff F541 spurious
f-string warning without changing behavior (a before/after snippet follows this list).
- Around line 29-75: The module defines several module-level mutable bindings
(parser, args, numbers, prompt, llm, sampling_params, out1, out2, nan_count)
which violate the global naming guideline; wrap all runtime code that creates or
mutates these symbols inside an if __name__ == "__main__": block so they become
local to main (keep MODEL, TP, MAX_TOKENS, MAX_MODEL_LEN, COUNT_UP_TO and
imports at module scope), e.g., move creation of
argparse.ArgumentParser()/parser, args = parser.parse_args(), numbers, prompt
construction, LLM() instantiation, SamplingParams(), the two generate calls that
produce out1/out2, and nan_count into that guard; alternatively if you must keep
any of them global, rename using upper snake_case with the G_ prefix (e.g.,
G_PARSER, G_LLM) to satisfy the guideline.
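For the F541 item above, the change is mechanical; a before/after sketch is shown here. The quoted print call is taken from the review's description of line 87, and the comments are editorial.

```python
# Before: Ruff F541, an f-string with no placeholders
print(f"\n Sample logprobs from iteration 2:")

# After: identical output, no lint warning
print("\n Sample logprobs from iteration 2:")
```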
What does this PR do?
Adds a diagnostic script, tools/model_diagnostics/5.prefix_caching_nan.py, that reproduces and validates prefix caching behavior in vLLM and checks for NaN logprobs, along with a corresponding diagnostic section in docs/adding-new-models.md.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
Documentation
Added a diagnostic section for prefix caching NaN logprobs validation.
New Features
Added a diagnostic script under tools/model_diagnostics/ that reproduces prefix caching behavior in vLLM with multi-iteration generation, prefix cache reuse, and NaN logprob detection.