[ntuple] Split RNTupleProcessor into RNTupleComposer and RNTupleProcessor#22615

Draft
enirolf wants to merge 3 commits into
root-project:masterfrom
enirolf:ntuple-composer

@enirolf enirolf commented Jun 15, 2026

Contributor

Rationale

While implementing the RNTupleProcessor-based RDataSource, I realized that in that setup the RNTupleProcessor itself no longer does any processing, because RDF already takes care of it. From this point of view, I believe it makes sense to separate two concerns: orchestrating the composition of RNTuples, and actually processing the composed result. That separation is what this PR addresses.

Functionally, the changes are minimal. The bulk of the RNTupleProcessor class is moved to a new class, the RNTupleComposer, with only the iterator remaining. The RNTupleProcessor is now created by passing a reference to an RNTupleComposer.

The RNTupleComposer can now also serve as the backend for other data loading interfaces, most notably RDataSource, as mentioned above. In a similar vein, this could make it possible to merge the RNTupleProcessor into the RNTupleReader, but this requires further investigation.

Old interface

std::vector<RNTupleOpenSpec> ntuples = {{"ntuple1", "ntuple1.root"}, {"ntuple2", "ntuple2.root"}};
auto processor = RNTupleProcessor::CreateChain(ntuples);

auto pt = processor->RequestField<float>("pt");

for (const auto idx : *processor) {
   std::cout << "event = " << idx << ", pt = " << *pt << std::endl;
}

std::cout << "processed " << processor->GetNEntriesProcessed() << " events" << std::endl; 

New interface

std::vector<RNTupleOpenSpec> ntuples = {{"ntuple1", "ntuple1.root"}, {"ntuple2", "ntuple2.root"}};
auto composer = RNTupleComposer::CreateChain(ntuples);

auto pt = composer->RequestField<float>("pt");

RNTupleProcessor processor(*composer);

for (const auto idx : processor) {
   std::cout << "event = " << idx << ", pt = " << *pt << std::endl;
}

std::cout << "processed " << processor.GetNEntriesProcessed() << " events" << std::endl; 
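
For readers who want to see the split without building ROOT, here is a minimal, ROOT-free sketch of the same shape. Everything in it is hypothetical: `MockComposer`, `MockProcessor`, `ReadPtAt` and `SumPt` are illustrative names only and say nothing about how the real classes are implemented. It assumes the composer owns the chained data and the field value, while the processor only holds a reference to the composer plus its own iteration bookkeeping. A consequence of that assumption is that the composer must outlive any processor created from it.

```cpp
// Hypothetical stand-ins; none of these types exist in ROOT.
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Composer stand-in: owns the chained data and the value bound to the field.
class MockComposer {
public:
   explicit MockComposer(std::vector<std::vector<float>> chain) : fChain(std::move(chain)) {}

   // The returned pointee is overwritten on every successful LoadEntry call.
   std::shared_ptr<float> RequestField() { return fValue; }

   std::size_t GetNEntries() const
   {
      std::size_t n = 0;
      for (const auto &part : fChain)
         n += part.size();
      return n;
   }

   // Maps a global index onto the right part of the chain and loads the value.
   // Returns false (and leaves the value untouched) if the index is out of range.
   bool LoadEntry(std::size_t globalIdx)
   {
      for (const auto &part : fChain) {
         if (globalIdx < part.size()) {
            *fValue = part[globalIdx];
            return true;
         }
         globalIdx -= part.size();
      }
      return false;
   }

private:
   std::vector<std::vector<float>> fChain;
   std::shared_ptr<float> fValue = std::make_shared<float>(0.f);
};

// Processor stand-in: only the iterator and a counter, no data of its own.
class MockProcessor {
public:
   class Iterator {
   public:
      Iterator(MockProcessor &proc, std::size_t idx) : fProc(proc), fIdx(idx) { Load(); }
      std::size_t operator*() const { return fIdx; }
      Iterator &operator++()
      {
         ++fIdx;
         Load();
         return *this;
      }
      bool operator!=(const Iterator &other) const { return fIdx != other.fIdx; }

   private:
      // Only counts entries that exist, so the end iterator loads nothing.
      void Load()
      {
         if (fProc.fComposer.LoadEntry(fIdx))
            ++fProc.fNProcessed;
      }
      MockProcessor &fProc;
      std::size_t fIdx;
   };

   explicit MockProcessor(MockComposer &composer) : fComposer(composer) {}

   Iterator begin()
   {
      fNProcessed = 0; // restart the bookkeeping for each loop
      return Iterator(*this, 0);
   }
   Iterator end() { return Iterator(*this, fComposer.GetNEntries()); }
   std::size_t GetNEntriesProcessed() const { return fNProcessed; }

private:
   MockComposer &fComposer; // non-owning: the composer must outlive the processor
   std::size_t fNProcessed = 0;
};

// A second consumer that uses the composer directly, without any processor.
float ReadPtAt(MockComposer &composer, std::size_t idx)
{
   auto pt = composer.RequestField();
   return composer.LoadEntry(idx) ? *pt : 0.f;
}

// Mirrors the loop from the new interface: request the field from the
// composer, then iterate through a processor bound to that composer.
float SumPt(MockComposer &composer, std::size_t &nProcessed)
{
   auto pt = composer.RequestField();
   MockProcessor processor(composer);
   float sum = 0.f;
   for (const auto idx : processor) {
      (void)idx;
      sum += *pt;
   }
   nProcessed = processor.GetNEntriesProcessed();
   return sum;
}
```

`ReadPtAt` plays the role of an alternative data loading interface that drives the composer itself, which is the kind of reuse the rationale describes for RDataSource.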

enirolf added 3 commits June 15, 2026 15:27
The composition of RNTuples can also serve as the backend for other
data loading interfaces, most notably `RDataSource`, which is fully
separate from the processing interface offered by `RNTupleProcessor`.
It therefore makes sense to split both components into their own class.
...to reflect the separation between `RNTupleProcessor` and
  `RNTupleComposer`.
@enirolf enirolf requested review from hahnjo, pcanal and vepadulano June 15, 2026 13:29
@enirolf enirolf self-assigned this Jun 15, 2026
@enirolf enirolf marked this pull request as draft June 15, 2026 13:29
@github-actions

Test Results

    21 files      21 suites   3d 11h 34m 42s ⏱️
 3 867 tests  3 867 ✅ 0 💤 0 ❌
73 568 runs  73 568 ✅ 0 💤 0 ❌

Results for commit 8ced87e.
