Update Datapipes API by coreyjadams · Pull Request #1468 · NVIDIA/physicsnemo

coreyjadams · 2026-03-02T15:45:08Z

PhysicsNeMo Pull Request

This PR updates the Datapipes API docs. The idea is to break the datapipes API docs into several smaller, more consumable pages. I separated the "legacy" datapipes (v1 format) from the composable datapipes (v2 format). The main intro page gives the high-level summary, and the details are on the subsequent pages.

Hard to review until it's rendered, but hard to render without the docs build ...

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

greptile-apps · 2026-03-02T15:58:32Z

Greptile Summary

Added comprehensive documentation for the PhysicsNeMo datapipes module, including new RST files for benchmarks, CAE, climate, GNN, readers, transforms, and the main datapipes page. The documentation includes usage examples, architecture diagrams, and API references.

Key changes:

Reorganized datapipes documentation into dedicated subdirectory
Added detailed docs for v2.0 redesigned datapipes architecture with GPU-first philosophy
Documented legacy v1 datapipes for backward compatibility
Included quick start guides and custom extension examples

Issues found:

Multiple spelling errors in main documentation files that need correction
Missing newline at end of one file

Important Files Changed

Filename	Overview
docs/api/datapipes/physicsnemo.datapipes.cae.rst	added CAE datapipes docs; has typo on line 18 and missing newline at EOF
docs/api/datapipes/physicsnemo.datapipes.rst	added main datapipes documentation; contains multiple typos (lines 14, 35, 49, 156, 159)
docs/api/datapipes/physicsnemo.datapipes.transforms.rst	added transforms documentation; has typo on line 25

Last reviewed commit: 0897f6d

greptile-apps

8 files reviewed, 8 comments

greptile-apps · 2026-03-02T15:58:36Z

docs/api/datapipes/physicsnemo.datapipes.cae.rst

+
+The DoMINO DataPipe reads the DrivearML dataset, and other datasets, for 
+the DoMINO model for external aerodynamics.  The expected format of inputs can
+be acheived using PhysicsNeMo-Curator.


'acheived' is misspelled

Suggested change

be acheived using PhysicsNeMo-Curator.

be achieved using PhysicsNeMo-Curator.

greptile-apps · 2026-03-02T15:58:36Z

docs/api/datapipes/physicsnemo.datapipes.cae.rst

+
+.. automodule:: physicsnemo.datapipes.cae.transolver_datapipe
+    :members:
+    :show-inheritance:


missing newline at end of file

Suggested change

:show-inheritance:

:show-inheritance:

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-03-02T15:58:38Z

docs/api/datapipes/physicsnemo.datapipes.rst

+are described in the climate, cae, gnn, and
+benchmark subsections.
+
+In PhysicsNeMo v2.0, the datapipes API has been redesign from scratch to focus


'redesign' should be 'redesigned' (past tense)

Suggested change

In PhysicsNeMo v2.0, the datapipes API has been redesign from scratch to focus

In PhysicsNeMo v2.0, the datapipes API has been redesigned from scratch to focus

greptile-apps · 2026-03-02T15:58:38Z

docs/api/datapipes/physicsnemo.datapipes.rst

+- Composability and Extensibility: We aim to provide a tool kit and examples that
+  lets you build what you need yourself, easily, if it's not here.
+- Datapipes as configuration: Changing a pipeline shouldn't require source code
+  modification; the registry system in PhysicsNeMo datapipes enables hydra instantition


'instantition' is misspelled

Suggested change

modification; the registry system in PhysicsNeMo datapipes enables hydra instantition

modification; the registry system in PhysicsNeMo datapipes enables hydra instantiation
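
To illustrate the "datapipes as configuration" idea in the quoted docs, here is a minimal sketch of a name-based registry that builds objects from plain config dictionaries. It is a generic pattern in pure Python, not the PhysicsNeMo registry and not Hydra itself; `register`, `build`, and `Scale` are hypothetical names chosen for this example.

```python
from typing import Any, Callable

# Maps a config name to the class (or factory) that builds the component.
_REGISTRY: dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Class decorator that records a component under a config-friendly name."""

    def decorator(cls):
        _REGISTRY[name] = cls
        return cls

    return decorator


@register("scale")
class Scale:
    """Hypothetical transform: multiply every value by a constant factor."""

    def __init__(self, factor: float):
        self.factor = factor

    def __call__(self, values: list[float]) -> list[float]:
        return [v * self.factor for v in values]


def build(config: dict[str, Any]) -> Any:
    """Instantiate a registered component from a dict such as {"name": ..., **kwargs}."""
    params = dict(config)  # copy so the caller's config is left untouched
    name = params.pop("name")
    return _REGISTRY[name](**params)


# The pipeline is described by data only; swapping a step means editing config, not code.
pipeline = [build(step) for step in [{"name": "scale", "factor": 2.0}]]
```

A tool such as Hydra plays the role of `build` here, resolving a target from configuration and passing the remaining keys as constructor arguments.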

greptile-apps · 2026-03-02T15:58:40Z

docs/api/datapipes/physicsnemo.datapipes.rst

+
+At the highest level, `physicsnemo.datapipes.DataLoader` has a similar API and
+model as `pytorch.utils.data.DataLoader`, enabling a drop-in replacement in many
+cases.  Under the hood, physicsnemo follows a very differnt computation orchestration.


'differnt' is misspelled

Suggested change

cases. Under the hood, physicsnemo follows a very differnt computation orchestration.

cases. Under the hood, physicsnemo follows a very different computation orchestration.

greptile-apps · 2026-03-02T15:58:41Z

docs/api/datapipes/physicsnemo.datapipes.rst

+^^^^^^^^^
+
+Combining a set of tensordict objects into a batch of data can, at times,
+be complex.  Some dataset, like graph datasets, require special care.  For


'dataset' should be plural to match context

Suggested change

be complex. Some dataset, like graph datasets, require special care. For

be complex. Some datasets, like graph datasets, require special care. For

greptile-apps · 2026-03-02T15:58:42Z

docs/api/datapipes/physicsnemo.datapipes.rst

+be complex.  Some dataset, like graph datasets, require special care.  For
+this reason, PhysicsNeMo datapipes offers custom collation functions 
+as well as an interface to write your own collator.  If the dataset you are
+trying to collate can not be accomodated here, please open an issue on github.


'accomodated' is misspelled

Suggested change

trying to collate can not be accomodated here, please open an issue on github.

trying to collate can not be accommodated here, please open an issue on github.
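
To make the collation point in the quoted docs concrete, here is a minimal sketch of a custom collator for samples whose fields have different lengths. It uses plain Python lists rather than tensordict objects, and `pad_collate` is a hypothetical name, not part of the PhysicsNeMo API; a real collator for graph data would likely need more care than simple padding.

```python
from typing import Any

Sample = dict[str, list[float]]


def pad_collate(samples: list[Sample], pad_value: float = 0.0) -> dict[str, Any]:
    """Batch variable-length samples by padding each field to its longest entry."""
    if not samples:
        raise ValueError("cannot collate an empty list of samples")

    batch: dict[str, Any] = {}
    for key in samples[0]:
        rows = [sample[key] for sample in samples]
        width = max(len(row) for row in rows)
        # Pad on the right so every row in the batch has the same length.
        batch[key] = [row + [pad_value] * (width - len(row)) for row in rows]
        # Keep the true lengths so downstream code can mask out the padding.
        batch[f"{key}_length"] = [len(row) for row in rows]
    return batch
```

Recording the original lengths alongside the padded data is one common design choice; it lets later stages ignore padded entries without guessing.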

greptile-apps · 2026-03-02T15:58:43Z

docs/api/datapipes/physicsnemo.datapipes.transforms.rst

+By default, transforms accept and return `tensordict` objects: this is not, 
+strictly, a requirement that must be enforced.
+If you implement custom transforms that return different data types, downstream
+transforms should expect that data type.  On example of this, found in the


'On example' should be 'One example'

Suggested change

transforms should expect that data type. On example of this, found in the

transforms should expect that data type. One example of this, found in the

ktangsali

Looks good, have a few minor comments.

ktangsali · 2026-03-04T22:08:59Z

docs/api/datapipes/physicsnemo.datapipes.benchmarks.rst

+===================
+
+The Benchmark Datapipes are targeted v1 datapipes for specific datasets.  These
+are largely maintained but not actively developed.


I think it would be good to highlight here that these datapipes generate their data "on the fly", i.e., they do not load anything from the file system, and hence are useful for quick model development and testing. I think the individual docstrings call this out, but it might be a good idea to highlight it here as well.
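
As an illustration of what "on the fly" means in this comment, the sketch below synthesizes each sample from its index instead of reading files. It is a generic pure-Python example under stated assumptions: `OnTheFlySineDataset` is a hypothetical class, not one of the benchmark datapipes, and the sine-wave data is invented for demonstration.

```python
import math
import random


class OnTheFlySineDataset:
    """Hypothetical dataset that generates every sample instead of loading it."""

    def __init__(self, num_samples: int, num_points: int = 16, seed: int = 0):
        if num_points < 2:
            raise ValueError("num_points must be at least 2")
        self.num_samples = num_samples
        self.num_points = num_points
        self.seed = seed

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, index: int) -> dict[str, list[float]]:
        if not 0 <= index < self.num_samples:
            raise IndexError(index)
        # Seed per sample so the same index always yields the same data.
        rng = random.Random(self.seed * 1_000_003 + index)
        frequency = rng.uniform(1.0, 4.0)
        x = [i / (self.num_points - 1) for i in range(self.num_points)]
        y = [math.sin(2.0 * math.pi * frequency * xi) for xi in x]
        return {"x": x, "y": y}
```

Because nothing touches the file system, a dataset like this can be used to smoke-test a model or training loop before any real data is prepared.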

ktangsali · 2026-03-04T22:12:01Z

docs/api/datapipes/physicsnemo.datapipes.gnn.rst

+The VortexSheddingDataset processes flow field data around bluff bodies,
+capturing vortex shedding patterns and flow structures for graph-based learning.
+The VortexSheddingDataset is used in the VortexShedding CFD examples.
+


Should we call out why these are "datasets" and not "datapipes"?

ktangsali · 2026-03-04T22:19:30Z

docs/api/datapipes/physicsnemo.datapipes.readers.rst

+
+Readers handle IO exclusively - it is highly encouraged, if you are building a
+a custom datapipe, to implement transforms as separate operations.  This will
+enable GPU computations and composable, extensible pipelines.


Are we saying that the dataset operations are better done after the data is already on the GPU? I think having these ops happen asynchronously would be beneficial from a perf standpoint, right?
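
The separation the quoted docs recommend (readers do IO only, transforms are separate steps) can be sketched as follows. This is a pure-Python illustration and does not address the GPU or async question raised above; `CsvReader`, `scale`, and `run_pipeline` are hypothetical names, not PhysicsNeMo classes.

```python
import csv
import io
from typing import Callable, Iterable, Iterator

Sample = dict[str, list[float]]
Transform = Callable[[Sample], Sample]


class CsvReader:
    """Hypothetical reader: parses rows into samples and does nothing else."""

    def __init__(self, stream: io.TextIOBase):
        self.stream = stream

    def __iter__(self) -> Iterator[Sample]:
        for row in csv.DictReader(self.stream):
            yield {name: [float(value)] for name, value in row.items()}


def scale(field: str, factor: float) -> Transform:
    """Build a transform that multiplies one field by a constant."""

    def apply(sample: Sample) -> Sample:
        return {**sample, field: [v * factor for v in sample[field]]}

    return apply


def run_pipeline(reader: Iterable[Sample], transforms: list[Transform]) -> list[Sample]:
    """Apply each transform, in order, to every sample the reader yields."""
    results = []
    for sample in reader:
        for transform in transforms:
            sample = transform(sample)
        results.append(sample)
    return results


stream = io.StringIO("pressure,velocity\n1.0,2.0\n3.0,4.0\n")
result = run_pipeline(CsvReader(stream), [scale("pressure", 10.0)])
```

Keeping the reader free of computation means the same transforms can be reused with a different data format by swapping only the reader.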

ktangsali · 2026-03-04T22:33:14Z

docs/api/datapipes/physicsnemo.datapipes.transforms.rst

+
+The input to a `transform` is **mutable** by default, and so the order of transformations matters. 
+
+In general, transforms are transactional: take input in, manipulate it, return input out, and almost never update state.  Transforms should be device-agnostic,


nit: output out?
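
For context on the quoted statement that inputs are mutable and order matters, here is a small pure-Python sketch. The two transforms are hypothetical and operate on a plain dict rather than a tensordict; they only demonstrate the two behaviours named in the docs.

```python
import copy
from typing import Callable

Sample = dict[str, list[float]]
Transform = Callable[[Sample], Sample]


def double_in_place(sample: Sample) -> Sample:
    """Mutates the input: every value in "u" is multiplied by two."""
    sample["u"] = [2.0 * v for v in sample["u"]]
    return sample


def clip_in_place(sample: Sample) -> Sample:
    """Mutates the input: every value in "u" is capped at 3.0."""
    sample["u"] = [min(v, 3.0) for v in sample["u"]]
    return sample


def apply_all(sample: Sample, transforms: list[Transform]) -> Sample:
    """Run the transforms in list order, feeding each output to the next."""
    for transform in transforms:
        sample = transform(sample)
    return sample


raw = {"u": [1.0, 2.0]}

# Deep copies protect `raw`; the two orderings give different results.
double_then_clip = apply_all(copy.deepcopy(raw), [double_in_place, clip_in_place])
clip_then_double = apply_all(copy.deepcopy(raw), [clip_in_place, double_in_place])
```

Doubling and then clipping yields `[2.0, 3.0]`, while clipping and then doubling yields `[2.0, 4.0]`. Calling `apply_all(raw, ...)` without the copy would modify `raw` itself, which is the mutability the docs warn about.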

ktangsali · 2026-03-04T22:36:07Z

docs/api/datapipes/physicsnemo.datapipes.transforms.rst

+.. BEGIN transforms-intro
+.. Suggested outline:
+..   - All transforms implement the :class:`~physicsnemo.datapipes.transforms.base.Transform` ABC
+..   - Designed for GPU tensors (device transfer happens before transforms in Dataset)
+..   - Composable: pass a list to Dataset or use Compose explicitly
+..   - Transforms with internal tensor state are auto-moved via .to(device)
+.. END transforms-intro


Is this left over?

ktangsali · 2026-03-04T22:42:33Z

docs/api/datapipes/physicsnemo.datapipes.rst

+    )
+
+    # 1. Choose a Reader for your data format
+    reader = HDF5Reader(


Can we use some real data here? It might make it easier for users to follow. Or something that we prepare (e.g. here: https://github.com/NVIDIA/physicsnemo/blob/main/examples/cfd/navier_stokes_rnn/navier_stokes_rnn.py). That will help clarify how the h5 file should be structured to be able to use this reader.

What do you think?

Trying again with datapipes check in

0897f6d

coreyjadams marked this pull request as ready for review March 2, 2026 15:55

coreyjadams requested review from ktangsali and megnvidia as code owners March 2, 2026 15:55

greptile-apps bot reviewed Mar 2, 2026

coreyjadams changed the title ~~Trying again with datapipes check in~~ Update Datapipes API Mar 2, 2026

ktangsali approved these changes Mar 4, 2026


2 participants
