-
Notifications
You must be signed in to change notification settings - Fork 792
SOLR-12074: Add optional terms index for PointFields #3922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
SOLR-12074: Add optional terms index for PointFields #3922
Conversation
|
Ok, benchmarking with and without The same benchmark on Looks like we are good to move forward with this and remove TrieFields. |
gerlowskija
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few preliminary comments - nothing major, mostly just trying to understand some of the motivations underlying particular changes.
Still need to review the tests, but the 'main' code looks great. This'll be an awesome improvement to get in, especially if it opens the door to us moving the needle on Trie fields!
|
|
||
| public QueryRequest intSetQuery(boolean dvs) { | ||
| return setQuery("numbers_i" + (dvs ? "_dv" : "")); | ||
| public QueryRequest intTrieSetQuery(boolean dvs, boolean enhancedIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] Why provide the "enhancedIndex" flag here and then not use it?
| limitations under the License. | ||
| --> | ||
| <schema name="minimal" version="1.7"> | ||
| <schema name="minimal" version="1.6"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] Why is this going backwards from 1.7?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh yeah, because it was testing docValues or not, but the docValues are enabled by docValues=true which is default in 1.7. So instead of going and changing to docValues=false, I just downgraded. I'll make it better before merging for sure. Same with the enhancedIndex flag above.
| readableToIndexed(externalVal, br); | ||
| return new TermQuery(new Term(field.getName(), br)); | ||
| } else { | ||
| return getPointRangeQuery(parser, field, externalVal, externalVal, true, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] Prior to this PR, most of the implementing subclasses handled the no-terms-index case by calling Lucene's IntPoint.newExactQuery(...). Currently, that factory also creates a range-query under the hood, so it's (afaict) functionally equivalent to the getPointRangeQuery call here.
If the code's largely equivalent then, is there a reason for the switch? The Lucene IntPoint.newExactQuery call gives us a bit less control I guess, but it has the benefit of being code that we don't have to maintain ourselves.
Are there optimizations in getPointRangeQuery that the Lucene equivalent doesn't have? Or are there other reasons that make the switch worthwhile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, they are functionally the same. And the only reason I didn't do this was so that I didn't have to write another getSpecializedExactQuery that does the IntPoint.nextExactQuery or LongPoint.nextExactQuery, etc. But that's an easy thing to add back in.
dsmiley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome; thanks!
I think "enhancedIndex" is a terrible name; sorry.
I think the below would be the most useful to explain/document/use:
indexed=true|false should toggle both "lookup" and "range" indexes accorindgly. It should be an error to explicitly set this and also set one of the settings below.lookupIndexed=true|false should toggle "lookup" index (i.e. as implemented by Terms)rangeIndexed=true|false a "range" index (i.e. as implemented by BKD)
Agreed, it was only a placeholder. I've created a new PR to change this to a new fieldType that can have it enabled by default (which we can't really do currently since existing fields don't have the index). #3972 I think that's the direction I want to go, then deprecate and remove all other numeric fields. |
https://issues.apache.org/jira/browse/SOLR-12074
This is to replace the only functionality that TrieFields outperform PointFields in, namely term or termInSet matching.
Currently this uses the
enhancedIndexoption, but we should find a better name.Other side issues, like expanding pointField functionality when this is enabled, will be done afterwards.