Enable pipeline model parallelism for Evo2 inference #1478

Open

kjaniknvidia wants to merge 3 commits into NVIDIA:main from kjaniknvidia:feat/pp_infer

Conversation

@kjaniknvidia (Collaborator)

Remove the PP > 1 guard, argparse choices=[1] restriction, and hardcoded pre_process/post_process=True so the model provider auto-detects pipeline stage. Tested with PP=1, PP=2, and PP=5.

Description

For the most part I just removed the guards that force PP=1. There's only one functional line change.

  1. Line 257 — Removed the if pipeline_model_parallel_size != 1: raise ValueError(...) guard (3 lines deleted)
  2. Line 334 — Changed model_provider.provide(pre_process=True, post_process=True) to model_provider.provide() so each pipeline stage auto-detects whether it needs embedding/output layers
  3. Line 508 — Removed choices=[1] from the --pipeline-model-parallel-size argparse argument
  4. Lines 245, 553 — Updated docstrings removing "(must be 1)"
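The auto-detection in change 2 typically keys off the process's pipeline rank: only the first stage needs the embedding layer (`pre_process`) and only the last stage needs the output head (`post_process`). A minimal sketch of that logic, with plain rank arithmetic standing in for Megatron's `parallel_state.is_pipeline_first_stage()` / `is_pipeline_last_stage()` helpers (the function name here is illustrative, not from the PR):

```python
def detect_stage_flags(pipeline_rank: int, pipeline_size: int) -> tuple[bool, bool]:
    """Return (pre_process, post_process) for one pipeline stage.

    Only the first stage owns the embedding (pre_process) and only the
    last stage owns the output head (post_process). With pipeline_size == 1
    both are True, matching the previously hardcoded behavior.
    """
    pre_process = pipeline_rank == 0
    post_process = pipeline_rank == pipeline_size - 1
    return pre_process, post_process


# PP=1: the single stage does everything (old hardcoded case).
assert detect_stage_flags(0, 1) == (True, True)
# PP=2: stage 0 embeds, stage 1 produces logits.
assert detect_stage_flags(0, 2) == (True, False)
assert detect_stage_flags(1, 2) == (False, True)
```

With `provide()` called without arguments, each rank can derive these flags itself, which is why the hardcoded `pre_process=True, post_process=True` had to go.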

Usage

torchrun --nproc-per-node 2 /workspace/bionemo/src/bionemo/evo2/run/infer.py \
    --ckpt-dir /workspace/bionemo/evo2_1b_8k_bf16_mbridge \
    --prompt "ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG" \
    --max-new-tokens 10 \
    --top-k 1 \
    --temperature 1.0 \
    --pipeline-model-parallel-size 2

torchrun --nproc-per-node 5 /workspace/bionemo/src/bionemo/evo2/run/infer.py \
    --ckpt-dir /workspace/bionemo/evo2_1b_8k_bf16_mbridge \
    --prompt "ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG" \
    --max-new-tokens 10 \
    --top-k 1 \
    --temperature 1.0 \
    --pipeline-model-parallel-size 5

PP=1 inference (1 GPU)   PASS   ATCGATCGAT
PP=2 inference (2 GPUs)  PASS   ATCGATCGAT
PP=5 inference (5 GPUs)  PASS   ATCGATCGAT
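Change 3 amounts to dropping the `choices=[1]` restriction so argparse no longer rejects multi-stage settings at parse time. A sketch of the loosened flag (flag name from the PR; default and help text are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
# Before: choices=[1] made argparse reject any value other than 1.
# After: any positive integer parses; whether it is actually valid
# (enough GPUs, layer divisibility) is checked later during model setup.
parser.add_argument(
    "--pipeline-model-parallel-size",
    type=int,
    default=1,
    help="Number of pipeline model parallel stages.",
)

args = parser.parse_args(["--pipeline-model-parallel-size", "2"])
print(args.pipeline_model_parallel_size)
```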

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline.

For more details, see CONTRIBUTING

Note

Add appropriate labels to enable additional test coverage.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Triggering CodeRabbit AI Review

To trigger a code review from CodeRabbit, comment on a pull request with one of these commands:

See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

@copy-pr-bot

copy-pr-bot bot commented Feb 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 20, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Remove the PP > 1 guard, argparse choices=[1] restriction, and
hardcoded pre_process/post_process=True so the model provider
auto-detects pipeline stage. Tested with PP=1, PP=2, and PP=5.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ken Janik <kjanik@nvidia.com>
@jstjohn (Collaborator) left a comment:

Approve with one comment to address.
