Skip to content

Conversation

@HoustonPutman
Copy link
Contributor

@HoustonPutman HoustonPutman commented Dec 4, 2025

https://issues.apache.org/jira/browse/SOLR-12074

This is to replace the only functionality that TrieFields outperform PointFields in, namely term or termInSet matching.

Currently this uses the enhancedIndex option, but we should find a better name.

Other side issues, like expanding pointField functionality when this is enabled, will be done afterwards.

@HoustonPutman
Copy link
Contributor Author

HoustonPutman commented Dec 4, 2025

Ok, benchmarking with and without docvalues, enhancedIndex and Point vs Trie:

Benchmark                        Mode  Cnt     Score      Error  Units
NumericSearch.intDvEnhancedSet  thrpt    5  4495.128 ±  867.927  ops/s
NumericSearch.intDvSet          thrpt    5  2858.541 ±  594.706  ops/s
NumericSearch.intEnhancedSet    thrpt    5  4638.795 ±  568.743  ops/s
NumericSearch.intSet            thrpt    5  1191.282 ±  107.378  ops/s
NumericSearch.intTrieDvSet      thrpt    5  4845.332 ±  252.985  ops/s
NumericSearch.intTrieSet        thrpt    5  4495.456 ± 1568.532  ops/s

The same benchmark on main (No change to the queries):

Benchmark                    Mode  Cnt     Score     Error  Units
NumericSearch.intDvSet      thrpt    5  3298.151 ± 482.841  ops/s
NumericSearch.intSet        thrpt    5  1295.497 ±  44.262  ops/s
NumericSearch.intTrieDvSet  thrpt    5  4727.185 ± 357.855  ops/s
NumericSearch.intTrieSet    thrpt    5  4748.940 ± 667.751  ops/s

Looks like we are good to move forward with this and remove TrieFields.

Copy link
Contributor

@gerlowskija gerlowskija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few preliminary comments - nothing major, mostly just trying to understand some of the motivations underlying particular changes.

Still need to review the tests, but the 'main' code looks great. This'll be an awesome improvement to get in, especially if it opens the door to us moving the needle on Trie fields!


public QueryRequest intSetQuery(boolean dvs) {
return setQuery("numbers_i" + (dvs ? "_dv" : ""));
public QueryRequest intTrieSetQuery(boolean dvs, boolean enhancedIndex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Why provide the "enhancedIndex" flag here and then not use it?

limitations under the License.
-->
<schema name="minimal" version="1.7">
<schema name="minimal" version="1.6">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Why is this going backwards from 1.7?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh yeah, because it was testing docValues or not, but the docValues are enabled by docValues=true which is default in 1.7. So instead of going and changing to docValues=false, I just downgraded. I'll make it better before merging for sure. Same with the enhancedIndex flag above.

readableToIndexed(externalVal, br);
return new TermQuery(new Term(field.getName(), br));
} else {
return getPointRangeQuery(parser, field, externalVal, externalVal, true, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Prior to this PR, most of the implementing subclasses handled the no-terms-index case by calling Lucene's IntPoint.newExactQuery(...). Currently, that factory also creates a range-query under the hood, so it's (afaict) functionally equivalent to the getPointRangeQuery call here.

If the code's largely equivalent then, is there a reason for the switch? The Lucene IntPoint.newExactQuery call gives us a bit less control I guess, but it has the benefit of being code that we don't have to maintain ourselves.

Are there optimizations in getPointRangeQuery that the Lucene equivalent doesn't have? Or are there other reasons that make the switch worthwhile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, they are functionally the same. And the only reason I didn't do this was so that I didn't have to write another getSpecializedExactQuery that does the IntPoint.nextExactQuery or LongPoint.nextExactQuery, etc. But that's an easy thing to add back in.

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome; thanks!

I think "enhancedIndex" is a terrible name; sorry.

I think the below would be the most useful to explain/document/use:

  • indexed=true|false should toggle both "lookup" and "range" indexes accorindgly. It should be an error to explicitly set this and also set one of the settings below.
  • lookupIndexed=true|false should toggle "lookup" index (i.e. as implemented by Terms)
  • rangeIndexed=true|false a "range" index (i.e. as implemented by BKD)

@HoustonPutman
Copy link
Contributor Author

I think "enhancedIndex" is a terrible name; sorry.

Agreed, it was only a placeholder.

I've created a new PR to change this to a new fieldType that can have it enabled by default (which we can't really do currently since existing fields don't have the index). #3972 I think that's the direction I want to go, then deprecate and remove all other numeric fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants