Reuse reader's fieldInfos #15683

Open
viliam-durina wants to merge 4 commits into apache:main from viliam-durina:reuse-reader's-fieldinfos

@viliam-durina
Contributor

If an `IndexWriter` is opened using an `IndexCommit` obtained from an open reader (through `IndexWriterConfig.setIndexCommit()`), the reader's `SegmentReader`s are reused and no files are re-read, with two exceptions. The first is the `.fnm` file (field infos), which is re-read in `IndexWriter.getFieldNumberMap()`. This is unnecessary: its contents are already loaded by the reader, so this PR modifies `getFieldNumberMap()` to reuse the reader's field infos instead.

The other exception is the latest `segments_N` file and the corresponding `.si` files, which are re-read twice; this PR does not address that.

This change matters for our use case because we store the index on a high-latency remote location and use a custom directory implementation that caches files locally. The cache works in a simple mode: it caches a file when it is opened and releases it when it is closed, so every unnecessary re-open is costly. Compound files, which we always use, make this much worse, because the whole compound data file is reopened and the `.cfe` file is reloaded. Beyond our use case, we hope this change benefits Lucene in general, since it avoids re-reading information that is already loaded.
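
To make the cost concrete, here is a minimal sketch of a cache that keeps a file only while at least one handle to it is open. It is a simplified model under stated assumptions, not our actual directory implementation: the class `OpenScopedCache`, its methods, and the file name `_0.fnm` are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a cache that holds a file only while it is open. */
public class OpenScopedCache {
    // Number of open handles per cached file name.
    private final Map<String, Integer> openCounts = new HashMap<>();
    // Counts simulated downloads from the remote location.
    private int remoteFetches = 0;

    /** Opens a file; fetches it remotely if no open handle keeps it cached. */
    public void open(String name) {
        int count = openCounts.getOrDefault(name, 0);
        if (count == 0) {
            remoteFetches++; // stands in for a high-latency remote read
        }
        openCounts.put(name, count + 1);
    }

    /** Closes one handle; the cached copy is released with the last handle. */
    public void close(String name) {
        Integer count = openCounts.get(name);
        if (count == null) {
            throw new IllegalStateException(name + " is not open");
        }
        if (count == 1) {
            openCounts.remove(name);
        } else {
            openCounts.put(name, count - 1);
        }
    }

    public int remoteFetches() {
        return remoteFetches;
    }

    public static void main(String[] args) {
        OpenScopedCache cache = new OpenScopedCache();
        // The reader loads the field infos, then closes the file.
        cache.open("_0.fnm"); // illustrative file name
        cache.close("_0.fnm");
        // The writer re-reads the same file: a second remote fetch.
        cache.open("_0.fnm");
        cache.close("_0.fnm");
        System.out.println(cache.remoteFetches()); // prints 2
    }
}
```

In this model, a second open of a file that has already been closed always triggers another fetch, which is why reusing what the reader has already loaded saves a remote round trip.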
@github-actions github-actions bot added this to the 11.0.0 milestone Feb 9, 2026
# Conflicts:
#	lucene/CHANGES.txt
@viliam-durina viliam-durina changed the title Reuse reader's fieldinfos Reuse reader's fieldInfos Feb 13, 2026