Reuse reader's fieldInfos by viliam-durina · Pull Request #15683 · apache/lucene

viliam-durina · 2026-02-09T15:36:14Z

If an `IndexWriter` is opened from an `IndexCommit` that belongs to an open reader (through `IndexWriterConfig.setIndexCommit()`), the reader's `SegmentReader`s are reused and no files are re-read, with two exceptions. The first is the `.fnm` file (field infos), which is re-read in `IndexWriter.getFieldNumberMap()`. This is unnecessary, because its contents are already loaded by the reader. This PR modifies `getFieldNumberMap()` to reuse that information.

The other exception is the last `segments_N` file and the respective `.si` files, which are re-read twice; this PR does not address that issue.
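The idea behind the `getFieldNumberMap()` change can be illustrated with a minimal, self-contained sketch. It does not use Lucene classes and is not the code in this PR: `FieldInfo` and `buildFieldNumberMap` are hypothetical stand-ins, and the sketch assumes that each field name maps to a single number across all segments. The point is only that the name-to-number map can be derived from field infos the reader already holds in memory, with no file access.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: build a field-number map from in-memory field infos. */
public class FieldNumberMapSketch {

    /** Stand-in for a per-segment field description (not Lucene's FieldInfo). */
    public static final class FieldInfo {
        public final String name;
        public final int number;

        public FieldInfo(String name, int number) {
            this.name = name;
            this.number = number;
        }
    }

    /**
     * Merges the field infos of every segment into one name-to-number map.
     * The input is assumed to be already loaded, so nothing is read from disk.
     */
    public static Map<String, Integer> buildFieldNumberMap(List<List<FieldInfo>> perSegment) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (List<FieldInfo> segment : perSegment) {
            for (FieldInfo fi : segment) {
                Integer previous = map.putIfAbsent(fi.name, fi.number);
                // Assumption: a field keeps the same number in every segment.
                if (previous != null && previous != fi.number) {
                    throw new IllegalStateException("conflicting number for field " + fi.name);
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<FieldInfo> segment0 = List.of(new FieldInfo("id", 0), new FieldInfo("title", 1));
        List<FieldInfo> segment1 = List.of(new FieldInfo("id", 0), new FieldInfo("body", 2));
        System.out.println(buildFieldNumberMap(List.of(segment0, segment1)));
    }
}
```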

This change is important to our use case because we store the index on a high-latency remote location and have a custom directory implementation that caches the files locally. The cache works in a simple mode: it caches a file when it is opened and releases it when the file is closed, so every unnecessary file re-opening is harmful. The problem is much worse with compound files, which we always use, because the whole compound data file is reopened and the `.cfe` file is reloaded. However, we hope this change is beneficial for Lucene in general, as it avoids re-reading information that is already loaded.
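To make the cost concrete, here is a minimal model of a cache that holds a file only while it is open. It is a sketch under stated assumptions, not our directory implementation: the class `OpenCloseCacheSketch`, its reference counting, and the counter that stands in for a remote download are all hypothetical, and the file name is only illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a cache that keeps a file only while it is open. */
public class OpenCloseCacheSketch {
    // Number of currently open handles per file name.
    private final Map<String, Integer> openCounts = new HashMap<>();
    // Counts simulated downloads from the remote location.
    private int remoteFetches = 0;

    /** Opens a file; fetches it remotely if no handle currently holds it. */
    public void open(String name) {
        int count = openCounts.getOrDefault(name, 0);
        if (count == 0) {
            remoteFetches++; // stands in for a slow remote download
        }
        openCounts.put(name, count + 1);
    }

    /** Closes one handle; the cached copy is released with the last handle. */
    public void close(String name) {
        Integer count = openCounts.get(name);
        if (count == null) {
            throw new IllegalStateException(name + " is not open");
        }
        if (count == 1) {
            openCounts.remove(name);
        } else {
            openCounts.put(name, count - 1);
        }
    }

    public int remoteFetches() {
        return remoteFetches;
    }

    public static void main(String[] args) {
        OpenCloseCacheSketch cache = new OpenCloseCacheSketch();
        // First reader loads the file and closes it.
        cache.open("_0.cfe");
        cache.close("_0.cfe");
        // A later re-read of the same file has to fetch it again.
        cache.open("_0.cfe");
        cache.close("_0.cfe");
        System.out.println(cache.remoteFetches()); // prints 2
    }
}
```

In this model a second open of a file that has already been closed always costs another remote fetch, which is why avoiding a redundant re-read removes a whole download rather than a cheap local read.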



viliam-durina added 2 commits February 9, 2026 16:33

Tidy

70a0d41

github-actions bot added the module:core/index label Feb 9, 2026

Add changelog

3001547

github-actions bot added this to the 11.0.0 milestone Feb 9, 2026

Merge branch 'main' into reuse-reader's-fieldinfos

bdfb254

# Conflicts: # lucene/CHANGES.txt

viliam-durina changed the title ~~Reuse reader's fieldinfos~~ Reuse reader's fieldInfos Feb 13, 2026

viliam-durina wants to merge 4 commits into apache:main from viliam-durina:reuse-reader's-fieldinfos
