-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[ntuple] Address some RNTupleProcessor performance bottlenecks #22593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -347,13 +347,24 @@ ROOT::Experimental::RNTupleChainProcessor::AddFieldToEntry(const std::string &fi | |
|
|
||
| ROOT::NTupleSize_t ROOT::Experimental::RNTupleChainProcessor::LoadEntry(ROOT::NTupleSize_t entryNumber) | ||
| { | ||
| ROOT::NTupleSize_t localEntryNumber = entryNumber; | ||
| std::size_t currProcessorNumber = 0; | ||
| // If the requested entry number is lower than the current entry number, we have to again localise the correct local | ||
| // entry number starting from the first processor in the chain. Otherwise, we can continue looking from the inner | ||
| // processor that is currently connected, which is much faster when the chain consists of many inner processors. | ||
| if (entryNumber < fCurrentEntryNumber) { | ||
| fCurrentProcessorNumber = 0; | ||
| ConnectInnerProcessor(fCurrentProcessorNumber); | ||
| } | ||
|
|
||
| std::size_t currProcessorNumber = fCurrentProcessorNumber; | ||
| ROOT::NTupleSize_t entriesSeen = 0; | ||
| for (unsigned i = 0; i < currProcessorNumber; ++i) { | ||
| if (fInnerNEntries[i] == kInvalidNTupleIndex) { | ||
| fInnerNEntries[i] = fInnerProcessors[i]->GetNEntries(); | ||
| } | ||
| entriesSeen += fInnerNEntries[i]; | ||
| } | ||
|
Comment on lines
+358
to
+365
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not for this PR, but in principle we could have a cache vector of number of entries per processor which is filled lazily at discovery time whenever a processor needs to connect to file(s)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually this is exactly what is done a few lines down and somehow didn't think to do it here, so thanks for pointing this out :D. Let me quickly add it here as well. |
||
| ROOT::NTupleSize_t localEntryNumber = entryNumber - entriesSeen; | ||
|
|
||
| // As long as the entry fails to load from the current processor, we decrement the local entry number with the number | ||
| // of entries in this processor and try with the next processor until we find the correct local entry number. | ||
| while (fInnerProcessors[currProcessorNumber]->LoadEntry(localEntryNumber) == kInvalidNTupleIndex) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be speed up in case the
entryNumberis less than thefCurrentEntryNumberbut more than thestarting entry number of the current file? (and/or is the set of lengths cached and thus fast to go through again?)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the set of lengths is cached:
root/tree/ntuple/src/RNTupleProcessor.cxx
Lines 360 to 365 in d7d839b