Conversation
Greptile SummaryAdded comprehensive documentation for the PhysicsNeMo datapipes module, including new RST files for benchmarks, CAE, climate, GNN, readers, transforms, and the main datapipes page. The documentation includes usage examples, architecture diagrams, and API references. Key changes:
Issues found:
Important Files Changed
Last reviewed commit: 0897f6d |
|
|
||
| The DoMINO DataPipe reads the DrivearML dataset, and other datasets, for | ||
| the DoMINO model for external aerodynamics. The expected format of inputs can | ||
| be acheived using PhysicsNeMo-Curator. |
There was a problem hiding this comment.
'acheived' is misspelled
| be acheived using PhysicsNeMo-Curator. | |
| be achieved using PhysicsNeMo-Curator. |
|
|
||
| .. automodule:: physicsnemo.datapipes.cae.transolver_datapipe | ||
| :members: | ||
| :show-inheritance: No newline at end of file |
There was a problem hiding this comment.
missing newline at end of file
| :show-inheritance: | |
| :show-inheritance: | |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| are described in the climate, cae, gnn, and | ||
| benchmark subsections. | ||
|
|
||
| In PhysicsNeMo v2.0, the datapipes API has been redesign from scratch to focus |
There was a problem hiding this comment.
'redesign' should be 'redesigned' (past tense)
| In PhysicsNeMo v2.0, the datapipes API has been redesign from scratch to focus | |
| In PhysicsNeMo v2.0, the datapipes API has been redesigned from scratch to focus |
| - Composability and Extensibility: We aim to provide a tool kit and examples that | ||
| lets you build what you need yourself, easily, if it's not here. | ||
| - Datapipes as configuration: Changing a pipeline shouldn't require source code | ||
| modification; the registry system in PhysicsNeMo datapipes enables hydra instantition |
There was a problem hiding this comment.
'instantition' is misspelled
| modification; the registry system in PhysicsNeMo datapipes enables hydra instantition | |
| modification; the registry system in PhysicsNeMo datapipes enables hydra instantiation |
|
|
||
| At the highest level, `physicsnemo.datapipes.DataLoader` has a similar API and | ||
| model as `pytorch.utils.data.DataLoader`, enabling a drop-in replacement in many | ||
| cases. Under the hood, physicsnemo follows a very differnt computation orchestration. |
There was a problem hiding this comment.
'differnt' is misspelled
| cases. Under the hood, physicsnemo follows a very differnt computation orchestration. | |
| cases. Under the hood, physicsnemo follows a very different computation orchestration. |
| ^^^^^^^^^ | ||
|
|
||
| Combining a set of tensordict objects into a batch of data can, at times, | ||
| be complex. Some dataset, like graph datasets, require special care. For |
There was a problem hiding this comment.
'dataset' should be plural to match context
| be complex. Some dataset, like graph datasets, require special care. For | |
| Some datasets, like graph datasets, require special care. For |
| be complex. Some dataset, like graph datasets, require special care. For | ||
| this reason, PhysicsNeMo datapipes offers custom collation functions | ||
| as well as an interface to write your own collator. If the dataset you are | ||
| trying to collate can not be accomodated here, please open an issue on github. |
There was a problem hiding this comment.
'accomodated' is misspelled
| trying to collate can not be accomodated here, please open an issue on github. | |
| trying to collate can not be accommodated here, please open an issue on github. |
| By default, transforms accept and return `tensordict` objects: this is not, | ||
| strictly, a requirement that must be enforced. | ||
| If you implement custom transforms that return different data types, downstream | ||
| transforms should expect that data type. On example of this, found in the |
There was a problem hiding this comment.
'On example' should be 'One example'
| transforms should expect that data type. On example of this, found in the | |
| transforms should expect that data type. One example of this, found in the |
ktangsali
left a comment
There was a problem hiding this comment.
Looks good, have a few minor comments.
| =================== | ||
|
|
||
| The Benchmark Datapipes are targeted v1 datapipes for specific datasets. These | ||
| are largely maintained but not actively developed. |
There was a problem hiding this comment.
I think it would be good to highlight here that the datapipes here are generating the data "on the fly". i.e., they are not loading anything from file system and hence are useful for some quick model development and testing. I think the individual docstrings call this out, but might be a good idea to highlight it here.
| The VortexSheddingDataset processes flow field data around bluff bodies, | ||
| capturing vortex shedding patterns and flow structures for graph-based learning. | ||
| The VortexSheddingDataset is used in the VortexShedding CFD examples. | ||
|
|
There was a problem hiding this comment.
Should we call out why these are "datasets" and not "datapipes"?
|
|
||
| Readers handle IO exclusively - it is highly encouraged, if you are building a | ||
| a custom datapipe, to implement transforms as separate operations. This will | ||
| enable GPU computations and composable, extensible pipelines. |
There was a problem hiding this comment.
Are we saying that the dataset operations are better if they are done after the data is already on GPU? I think having these ops happen async would be beneficial from perf standpoint right?
|
|
||
| The input to a `transform` is **mutable** by default, and so the order of transformations matters. | ||
|
|
||
| In general, transforms are transactional: take input in, manipulate it, return input out, and almost never update state. Transforms should be device-agnostic, |
| .. BEGIN transforms-intro | ||
| .. Suggested outline: | ||
| .. - All transforms implement the :class:`~physicsnemo.datapipes.transforms.base.Transform` ABC | ||
| .. - Designed for GPU tensors (device transfer happens before transforms in Dataset) | ||
| .. - Composable: pass a list to Dataset or use Compose explicitly | ||
| .. - Transforms with internal tensor state are auto-moved via .to(device) | ||
| .. END transforms-intro |
| ) | ||
|
|
||
| # 1. Choose a Reader for your data format | ||
| reader = HDF5Reader( |
There was a problem hiding this comment.
Can we use some real data here? might make it easy for users to follow. Or something that we prepare (e.g. here: https://github.com/NVIDIA/physicsnemo/blob/main/examples/cfd/navier_stokes_rnn/navier_stokes_rnn.py). that will help clarify how the h5 should be structured to be able to use this reader.
What do you think?
PhysicsNeMo Pull Request
This PR updates the Datapipes API docs. The idea here was to break down the datapipes API into several pages that are smaller and more consumable. I separated the "legacy" datapipes (v1 format) and the composable datapipes (v2 format). The main intro page gives the high level summary and there are details on the subsequent pages.
Hard to review until it's rendered, but hard to render without the docs build ...
Description
Checklist
Dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.