Fix #5150: Fix dedup aggregation pushdown nullifying renamed fields#5192
Fix #5150: Fix dedup aggregation pushdown nullifying renamed fields#5192opensearchpplteam wants to merge 4 commits intoopensearch-project:mainfrom
Conversation
PR Reviewer Guide 🔍(Review updated until commit e8a2e3a)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to e8a2e3a Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit e9a24ad
Suggestions up to commit d82a0a1
Suggestions up to commit 6eb5ac5
Suggestions up to commit eddc6b9
Suggestions up to commit 5cf5355
|
|
Persistent review updated to latest commit eddc6b9 |
…g renamed fields When the DedupPushdownRule converts a dedup to an aggregation-based top_hits query, fields that were renamed (via rename or eval) would return null values. This happened because the TopHitsParser returned results using original OpenSearch field names, but the output schema expected the renamed names. Added a field name mapping to TopHitsParser so it can translate original OS field names to their renamed output names in the LITERAL_AGG (dedup) aggregation response path. Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
eddc6b9 to
6eb5ac5
Compare
|
Persistent review updated to latest commit 6eb5ac5 |
The testNoMvBasic and testNoMvWithEval failures in CalciteExplainIT are unrelated to this PR's dedup pushdown changes and only occur on Java 25. Re-triggering CI. Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
|
Persistent review updated to latest commit d82a0a1 |
| String originalName = helper.inferNamedField(arg.getKey()).getRootName(); | ||
| String outputName = arg.getValue(); | ||
| if (!originalName.equals(outputName)) { | ||
| fieldNameMapping.put(originalName, outputName); |
There was a problem hiding this comment.
When two project args reference the same original field with different output names (e.g., eval pay2 = salary | rename salary as pay | dedup dept_id), the second HashMap.put silently overwrites the first mapping. One of the renamed fields would still get null values. This is an edge case from issue #5150 differential testing. Consider either documenting this as a known limitation or using a multimap / detecting the collision.
There was a problem hiding this comment.
Added a warning log when a collision is detected (when the same original field maps to multiple output names). The last mapping wins, which is documented in a code comment as a known limitation. This edge case requires two derived fields from the same original field before dedup, which is rare in practice.
| : new LinkedHashMap<>(hit.getSourceAsMap()); | ||
| hit.getFields().values().forEach(f -> map.put(f.getName(), f.getValue())); | ||
| return map; | ||
| return applyFieldNameMapping(map); |
There was a problem hiding this comment.
The field name mapping is applied only in the multi-row return path (the else branch at line 140). The returnSingleValue path (used by ARG_MAX/ARG_MIN pushdown) and returnMergeValue path skip mapping entirely. If those paths are ever used with renamed fields via a future pushdown rule, they would have the same null-value bug. Worth confirming this is an intentional scope limitation for now, or if those branches should also apply the mapping defensively.
There was a problem hiding this comment.
Confirmed this is intentional. The returnSingleValue and returnMergeValue paths use agg.getName() as the map key (not field names), so field name mapping is irrelevant for those paths. Added a clarifying comment in the code explaining why mapping is only applied in the multi-row return path.
…apping limitations - Add warning log when HashMap collision is detected in AggregateAnalyzer field name mapping (when same original field maps to multiple output names) - Add clarifying comments in TopHitsParser explaining why field name mapping is only needed in the multi-row return path Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
|
Persistent review updated to latest commit e9a24ad |
| - is_true: datarows.0.1 | ||
| - is_true: datarows.1.1 |
There was a problem hiding this comment.
compare with real value, not just null value
There was a problem hiding this comment.
Updated — replaced is_true with match assertions for actual expected values (category and val columns).
| - is_true: datarows.0.2 | ||
| - is_true: datarows.1.2 |
There was a problem hiding this comment.
compare with real value, not just null value
There was a problem hiding this comment.
Updated — replaced is_true with match assertions for actual expected values (category, value, and doubled columns).
…project#5150 Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
|
Persistent review updated to latest commit e8a2e3a |
Description
Root cause: The
DedupPushdownRuleconverts dedup operations toLITERAL_AGGtop_hits aggregations. When a field is renamed before dedup (e.g.,rename value as val), the top_hits response from OpenSearch still uses the original field name (value). However, the output schema expects the renamed name (val). TheOpenSearchIndexEnumeratorlooks up fields by the renamed names, finds nothing in the top_hits response map, and returns null.Fix: Build a field name mapping (original -> renamed) in
AggregateAnalyzerfor theLITERAL_AGGcase by comparinginferNamedField().getRootName()with the output alias from the project args. Pass this mapping toTopHitsParser, which applies it to translate OpenSearch field names back to the expected output names before results are consumed by the enumerator.Changes:
AggregateAnalyzer.java: Build original-to-renamed field name mapping from project args inLITERAL_AGGcase, pass toTopHitsParserTopHitsParser.java: AddedfieldNameMappingfield andapplyFieldNameMapping()method to translate OS field names to renamed output names. Added backward-compatible 4-arg constructor; existing 3-arg constructor delegates with empty map.Related Issues
Resolves #5150
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.