If an `IndexWriter` is opened from an `IndexCommit` that is backed by an open reader (through `IndexWriterConfig.setIndexCommit()`), the reader's `SegmentReader`s are reused and no files are re-read, with two exceptions. The first is the `.fnm` file (field infos), which is re-read in `IndexWriter.getFieldNumberMap()`. This is unnecessary, as its contents are already loaded by the reader. This PR modifies `getFieldNumberMap()` to reuse that information.

The other exception is the last `segments_N` file and the respective `.si` files, which are re-read twice; we don't address this issue here.

This change is important to our use case because we store the index on a high-latency remote location and have a custom directory implementation that caches the files locally. The cache works in a simple mode: it caches a file when it is opened and releases it when the file is closed, so every unnecessary re-opening of a file is harmful. This is greatly aggravated by compound files, which we always use, as the whole compound data file is reopened and the `.cfe` file re-loaded. However, we hope this change is beneficial for Lucene in general, as it avoids re-reading information that is already loaded.
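To illustrate why each extra open is costly with a cache of this kind, here is a minimal sketch in plain Java. It is a toy model, not our actual directory implementation: `OpenCloseCache`, `remoteFetches()`, and the file name `_0.cfe` are hypothetical names introduced only for this example, and the remote fetch is simulated with a counter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a cache that holds a file only while a handle is open. */
final class OpenCloseCache {
    // Number of currently open handles per file name.
    private final Map<String, Integer> openCounts = new HashMap<>();
    // Simulated count of high-latency remote downloads.
    private int remoteFetches = 0;

    /** Opens a file; fetches it remotely if no handle is currently open. */
    void open(String name) {
        int count = openCounts.getOrDefault(name, 0);
        if (count == 0) {
            remoteFetches++; // cache miss: the file must be downloaded again
        }
        openCounts.put(name, count + 1);
    }

    /** Closes one handle; the cached copy is released with the last handle. */
    void close(String name) {
        int count = openCounts.getOrDefault(name, 0);
        if (count == 0) {
            throw new IllegalStateException("not open: " + name);
        }
        if (count == 1) {
            openCounts.remove(name); // last handle closed: local copy released
        } else {
            openCounts.put(name, count - 1);
        }
    }

    int remoteFetches() {
        return remoteFetches;
    }
}

class CacheDemo {
    public static void main(String[] args) {
        OpenCloseCache cache = new OpenCloseCache();

        // First read of the (hypothetical) entries file, then it is closed.
        cache.open("_0.cfe");
        cache.close("_0.cfe");

        // A redundant second read of the same file triggers another fetch.
        cache.open("_0.cfe");
        cache.close("_0.cfe");

        System.out.println(cache.remoteFetches()); // prints 2
    }
}
```

In this model, a file that is opened, closed, and opened again costs two remote fetches, whereas reusing the already-loaded information costs one. That is the effect the change is meant to avoid for field infos.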