[ntuple] Split `RNTupleProcessor` into `RNTupleComposer` and `RNTupleProcessor` #22615
Draft
enirolf wants to merge 3 commits into
The composition of RNTuples can also serve as the backend for other data loading interfaces, most notably `RDataSource`, which is fully separate from the processing interface offered by `RNTupleProcessor`. It therefore makes sense to give each of these components its own class.
...to reflect the separation between `RNTupleProcessor` and `RNTupleComposer`.
Test Results: 21 files, 21 suites, 3d 11h 34m 42s ⏱️. Results for commit 8ced87e.
Rationale
During the implementation of the `RNTupleProcessor`-based `RDataSource`, I realized that in this case no processing is done by the `RNTupleProcessor` itself anymore, since this is already taken care of by RDF. From this point of view, I believe it makes sense to separate the concerns of orchestrating the RNTuple compositions and of actually processing them, which is what this PR addresses.

Functionally, the changes are minimal. The bulk of the `RNTupleProcessor` class is moved to a new class, the `RNTupleComposer`, with only the iterator remaining. The `RNTupleProcessor` is now created by passing a reference to an `RNTupleComposer`.

The `RNTupleComposer` can now also serve as the backend for other data loading interfaces, most notably `RDataSource`, as mentioned previously. In a similar vein, this could potentially open the possibility of merging the `RNTupleProcessor` into the `RNTupleReader`, but this requires further investigation.

Old interface
```cpp
std::vector<RNTupleOpenSpec> ntuples = {{"ntuple1", "ntuple1.root"}, {"ntuple2", "ntuple2.root"}};
// The processor both owns the chain and provides the entry iteration.
auto processor = RNTupleProcessor::CreateChain(ntuples);

auto pt = processor->RequestField<float>("pt");

for (const auto idx : *processor) {
   std::cout << "event = " << idx << ", pt = " << *pt << std::endl;
}

std::cout << "processed " << processor->GetNEntriesProcessed() << " events" << std::endl;
```

New interface

```cpp
std::vector<RNTupleOpenSpec> ntuples = {{"ntuple1", "ntuple1.root"}, {"ntuple2", "ntuple2.root"}};
// The composer owns the chain; fields are requested from it.
auto composer = RNTupleComposer::CreateChain(ntuples);

auto pt = composer->RequestField<float>("pt");

// The processor only takes a reference to the composer and provides the iteration.
RNTupleProcessor processor(*composer);

for (const auto idx : processor) {
   std::cout << "event = " << idx << ", pt = " << *pt << std::endl;
}

std::cout << "processed " << processor.GetNEntriesProcessed() << " events" << std::endl;
```
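To illustrate the layering in isolation, below is a minimal, self-contained C++ toy model of it. `ToyNTuple`, `ToyComposer`, `ToyProcessor` and `SumPtInRange` are hypothetical stand-ins, not the ROOT classes or their actual implementation: the composer owns the (chained) composition and the requested field and loads entries on demand; the processor only holds a reference to the composer and provides the iterator; and a second consumer, loosely modelling an `RDataSource`, reads entries through the composer without involving the processor at all.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one RNTuple: a name plus a single "pt" column.
struct ToyNTuple {
   std::string name;
   std::vector<float> pt;
};

// Hypothetical composer: owns the composition (here a chain) and the requested
// field, and loads a given global entry on demand. It has no iterator.
class ToyComposer {
public:
   static std::unique_ptr<ToyComposer> CreateChain(std::vector<ToyNTuple> ntuples)
   {
      return std::unique_ptr<ToyComposer>(new ToyComposer(std::move(ntuples)));
   }

   // The returned pointer is updated in place by every LoadEntry() call.
   std::shared_ptr<float> RequestField() { return fPt; }

   // Total number of entries across all chained ntuples.
   std::size_t GetNEntries() const
   {
      std::size_t n = 0;
      for (const auto &nt : fNTuples) n += nt.pt.size();
      return n;
   }

   // Maps a global chain index to (ntuple, local index) and loads that value.
   void LoadEntry(std::size_t globalIdx)
   {
      for (const auto &nt : fNTuples) {
         if (globalIdx < nt.pt.size()) {
            *fPt = nt.pt[globalIdx];
            return;
         }
         globalIdx -= nt.pt.size();
      }
      throw std::out_of_range("entry index beyond the end of the chain");
   }

private:
   explicit ToyComposer(std::vector<ToyNTuple> ntuples) : fNTuples(std::move(ntuples)) {}

   std::vector<ToyNTuple> fNTuples;
   std::shared_ptr<float> fPt = std::make_shared<float>(0.f);
};

// Hypothetical processor: holds a reference to a composer and only provides
// the iteration, counting how many entries it has loaded.
class ToyProcessor {
public:
   explicit ToyProcessor(ToyComposer &composer) : fComposer(composer) {}

   class Iterator {
   public:
      Iterator(ToyProcessor &proc, std::size_t idx) : fProc(proc), fIdx(idx) { Load(); }
      std::size_t operator*() const { return fIdx; }
      Iterator &operator++() { ++fIdx; Load(); return *this; }
      bool operator!=(const Iterator &other) const { return fIdx != other.fIdx; }

   private:
      // Loads the current entry through the composer, unless past the end.
      void Load()
      {
         if (fIdx < fProc.fComposer.GetNEntries()) {
            fProc.fComposer.LoadEntry(fIdx);
            ++fProc.fNEntriesProcessed;
         }
      }
      ToyProcessor &fProc;
      std::size_t fIdx;
   };

   Iterator begin() { return Iterator(*this, 0); }
   Iterator end() { return Iterator(*this, fComposer.GetNEntries()); }
   std::size_t GetNEntriesProcessed() const { return fNEntriesProcessed; }

private:
   ToyComposer &fComposer;
   std::size_t fNEntriesProcessed = 0;
};

// A second, hypothetical consumer that drives the composer directly, loosely
// like a data source that reads its own entry ranges without a processor.
float SumPtInRange(ToyComposer &composer, std::size_t first, std::size_t last)
{
   auto pt = composer.RequestField();
   float sum = 0.f;
   for (std::size_t i = first; i < last; ++i) { composer.LoadEntry(i); sum += *pt; }
   return sum;
}
```

In this toy, `ToyProcessor` knows nothing about how entries are chained; all of that lives in `ToyComposer`, so any other consumer can reuse it directly.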