Fix #5150: Fix dedup aggregation pushdown nullifying renamed fields by opensearchpplteam · Pull Request #5192 · opensearch-project/sql

opensearchpplteam · 2026-02-27T00:00:21Z

Description

Root cause: The DedupPushdownRule converts dedup operations to LITERAL_AGG top_hits aggregations. When a field is renamed before dedup (e.g., rename value as val), the top_hits response from OpenSearch still uses the original field name (value). However, the output schema expects the renamed name (val). The OpenSearchIndexEnumerator looks up fields by the renamed names, finds nothing in the top_hits response map, and returns null.

Fix: Build a field name mapping (original -> renamed) in AggregateAnalyzer for the LITERAL_AGG case by comparing inferNamedField().getRootName() with the output alias from the project args. Pass this mapping to TopHitsParser, which applies it to translate OpenSearch field names back to the expected output names before results are consumed by the enumerator.

Changes:

AggregateAnalyzer.java: Build original-to-renamed field name mapping from project args in LITERAL_AGG case, pass to TopHitsParser
TopHitsParser.java: Added fieldNameMapping field and applyFieldNameMapping() method to translate OS field names to renamed output names. Added backward-compatible 4-arg constructor; existing 3-arg constructor delegates with empty map.
Unit test: Verifies dedup top_hits parsing with field name remapping
YAML REST tests: End-to-end tests for rename-then-dedup and eval-then-dedup scenarios

Related Issues

Resolves #5150

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-02-27T00:01:46Z

PR Reviewer Guide 🔍

(Review updated until commit `e8a2e3a`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Add fieldNameMapping support to TopHitsParser with unit tests Relevant files: opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java opensearch/src/test/java/org/opensearch/sql/opensearch/response/OpenSearchAggregationResponseParserTest.java Sub-PR theme: Wire fieldNameMapping in AggregateAnalyzer and add integration tests Relevant files: opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5150.yml
⚡ Recommended focus areas for review Mapping Collision The code documents a known limitation where multiple project args referencing the same original field with different output names will result in the later mapping overwriting the earlier one. The LOG.warn handles this, but the behavior may silently produce incorrect results in edge cases like `eval pay2 = salary \| rename salary as pay \| dedup dept_id`. This limitation should be validated to ensure it doesn't affect real-world use cases or should be addressed more robustly. Map<String, String> fieldNameMapping = new HashMap<>(); for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { String originalName = helper.inferNamedField(arg.getKey()).getRootName(); String outputName = arg.getValue(); if (!originalName.equals(outputName)) { String previousMapping = fieldNameMapping.put(originalName, outputName); if (previousMapping != null) { LOG.warn( "Field name mapping collision: field '{}' was mapped to '{}', now" + " overwritten by '{}'", originalName, previousMapping, outputName); } } } } Mapping Not Applied The `applyFieldNameMapping` is only applied in the multi-row path (the last return statement). The `returnMergeValue` path also iterates over hits and builds maps using field names as keys (lines around the `returnMergeValue` block), but does not apply the field name mapping. If `returnMergeValue` is ever used with renamed fields, the mapping would be silently skipped. The comment at line 63-65 acknowledges this but it may be worth verifying the `returnMergeValue` path is truly not affected. // Field name mapping is not applied in returnSingleValue or returnMergeValue paths // because they use the aggregation name (agg.getName()) as the map key, not field names. // Only the multi-row path below uses actual field names as keys and needs mapping. Missing Schema Check The second test case ("Eval new field then dedup on different field retains eval values") does not include a `schema` match assertion, unlike the first test case. Adding a schema check would make the test more robust and consistent. - match: { total: 2 } - length: { datarows: 2 } - match: { datarows.0.0: "X" } - match: { datarows.0.1: 100 } - match: { datarows.0.2: 200 } - match: { datarows.1.0: "Y" } - match: { datarows.1.1: 300 } - match: { datarows.1.2: 600 }

github-actions · 2026-02-27T00:02:05Z

PR Code Suggestions ✨

Latest suggestions up to e8a2e3a

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Detect output key collisions during field remapping If two original field names map to the same output name, the second one will silently overwrite the first in the result map. This could cause data loss without any warning. Consider detecting such collisions and logging a warning or throwing an exception. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [162-172] private Map<String, Object> applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping.isEmpty()) { return map; } Map<String, Object> result = new LinkedHashMap<>(); for (Map.Entry<String, Object> entry : map.entrySet()) { String mappedName = fieldNameMapping.getOrDefault(entry.getKey(), entry.getKey()); + if (result.containsKey(mappedName)) { + throw new IllegalStateException( + String.format( + "Field name mapping collision in result: key '%s' already exists when mapping '%s'", + mappedName, entry.getKey())); + } result.put(mappedName, entry.getValue()); } return result; } Suggestion importance[1-10]: 4 __ Why: The suggestion addresses a valid edge case where two original fields could map to the same output name, causing silent data loss. However, this scenario is unlikely in practice given the mapping is built from project args, and throwing an exception may be overly aggressive compared to a warning log.	Low
General	Store defensive copy of mutable map parameter The `fieldNameMapping` field is stored as a direct reference to the passed-in map, which could be mutated externally after construction. Store a defensive copy to ensure immutability of the parser's state. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [50] -this.fieldNameMapping = fieldNameMapping; +this.fieldNameMapping = Map.copyOf(fieldNameMapping); Suggestion importance[1-10]: 4 __ Why: Storing a defensive copy of `fieldNameMapping` is a good defensive programming practice to prevent external mutation. The `Map.copyOf()` change is minimal and correct, though the risk of external mutation is low given the call sites in this PR.	Low

Previous suggestions

Suggestions up to commit e9a24ad

Category	Suggestion	Impact
General	Strengthen integration test assertions with exact value checks The `is_true` assertion checks for truthiness, which would fail for falsy values like `0`. Since `value` fields contain `100` and `300` (non-zero), this works here, but it's fragile. Consider using `match` assertions to verify the exact expected values for stronger test coverage. integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5150.yml [49-53] - match: { schema: [{"name": "category", "type": "string"}, {"name": "val", "type": "bigint"}] } - length: { datarows: 2 } - # Each row should have non-null val - - is_true: datarows.0.1 - - is_true: datarows.1.1 + # Each row should have non-null val — verify exact values + - match: { datarows.0: ["X", 100] } + - match: { datarows.1: ["Y", 300] } Suggestion importance[1-10]: 5 __ Why: The suggestion improves test robustness by replacing `is_true` (truthiness check) with exact `match` assertions. This is a valid improvement since `is_true` would fail for falsy values like `0`, and exact value checks provide stronger test coverage for the renamed field behavior.	Low
General	Detect collisions when applying field name mapping If two original field names map to the same output name, the second one will silently overwrite the first in the result map. This could cause data loss without any warning. Consider detecting and logging a warning when a collision occurs during the mapping application. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [162-172] private Map<String, Object> applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping.isEmpty()) { return map; } Map<String, Object> result = new LinkedHashMap<>(); for (Map.Entry<String, Object> entry : map.entrySet()) { String mappedName = fieldNameMapping.getOrDefault(entry.getKey(), entry.getKey()); + if (result.containsKey(mappedName)) { + LOG.warn( + "Field name mapping collision in result: mapped name '{}' already exists, overwriting value", + mappedName); + } result.put(mappedName, entry.getValue()); } return result; } Suggestion importance[1-10]: 4 __ Why: The suggestion adds collision detection in `applyFieldNameMapping`, which is a valid defensive measure. However, this is a relatively rare edge case and the impact is low since the primary collision warning is already handled upstream in `AggregateAnalyzer`. The `LOG` reference would also need to be added to `TopHitsParser`.	Low

Suggestions up to commit d82a0a1

Category	Suggestion	Impact
Possible issue	Guard against null field resolution result If `helper.inferNamedField(arg.getKey())` returns `null`, calling `.getRootName()` on it will throw a `NullPointerException`. A null check should be added before accessing `getRootName()` to prevent a runtime crash when the field cannot be resolved. opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [609-617] for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { - String originalName = helper.inferNamedField(arg.getKey()).getRootName(); + var namedField = helper.inferNamedField(arg.getKey()); + if (namedField == null) continue; + String originalName = namedField.getRootName(); String outputName = arg.getValue(); if (!originalName.equals(outputName)) { fieldNameMapping.put(originalName, outputName); } } } Suggestion importance[1-10]: 6 __ Why: If `helper.inferNamedField(arg.getKey())` can return null, calling `.getRootName()` would throw a `NullPointerException`. Adding a null check is a reasonable defensive measure, though it depends on whether the method contract guarantees non-null returns.	Low
General	Detect field name mapping collisions If two original field names map to the same output name, or if a mapped output name collides with another existing key in the result map, one value will silently overwrite the other. While this may be an edge case, it could lead to data loss without any warning. Consider detecting and handling such collisions explicitly. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [159-169] private Map<String, Object> applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping.isEmpty()) { return map; } Map<String, Object> result = new LinkedHashMap<>(); for (Map.Entry<String, Object> entry : map.entrySet()) { String mappedName = fieldNameMapping.getOrDefault(entry.getKey(), entry.getKey()); + if (result.containsKey(mappedName)) { + throw new IllegalStateException( + String.format("Field name mapping collision: multiple fields map to '%s'", mappedName)); + } result.put(mappedName, entry.getValue()); } return result; } Suggestion importance[1-10]: 3 __ Why: While collision detection is a valid concern, this is an edge case that is unlikely in practice since field rename mappings are derived from explicit user-defined renames. Throwing an exception here could cause unexpected failures in valid scenarios where the mapping is well-defined.	Low

Suggestions up to commit 6eb5ac5

Category	Suggestion	Impact
Possible issue	Add null check for inferNamedField The code doesn't handle potential null values from `helper.inferNamedField(arg.getKey())`. If this method returns null, calling `getRootName()` will throw a NullPointerException. Add a null check before accessing `getRootName()` to prevent runtime failures. opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [608-617] Map<String, String> fieldNameMapping = new HashMap<>(); for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { - String originalName = helper.inferNamedField(arg.getKey()).getRootName(); - String outputName = arg.getValue(); - if (!originalName.equals(outputName)) { - fieldNameMapping.put(originalName, outputName); + var namedField = helper.inferNamedField(arg.getKey()); + if (namedField != null) { + String originalName = namedField.getRootName(); + String outputName = arg.getValue(); + if (!originalName.equals(outputName)) { + fieldNameMapping.put(originalName, outputName); + } } } } Suggestion importance[1-10]: 7 __ Why: Valid defensive programming suggestion to prevent potential `NullPointerException` if `helper.inferNamedField()` returns null. However, without knowing the implementation details of `inferNamedField()`, it's unclear if this is a real issue or if the method guarantees non-null returns for `RexInputRef` instances.	Medium

Suggestions up to commit eddc6b9

Category	Suggestion	Impact
Possible issue	Add null check for inferNamedField The code assumes `helper.inferNamedField(arg.getKey())` never returns null, which could cause a NullPointerException. Add a null check before calling `getRootName()` to prevent potential runtime failures when the field cannot be inferred. opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [608-617] Map<String, String> fieldNameMapping = new HashMap<>(); for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { - String originalName = helper.inferNamedField(arg.getKey()).getRootName(); - String outputName = arg.getValue(); - if (!originalName.equals(outputName)) { - fieldNameMapping.put(originalName, outputName); + var namedField = helper.inferNamedField(arg.getKey()); + if (namedField != null) { + String originalName = namedField.getRootName(); + String outputName = arg.getValue(); + if (!originalName.equals(outputName)) { + fieldNameMapping.put(originalName, outputName); + } } } } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a potential `NullPointerException` if `helper.inferNamedField(arg.getKey())` returns null. Adding a null check improves robustness, though the actual likelihood of this occurring depends on the codebase's guarantees about `inferNamedField` behavior.	Medium

Suggestions up to commit 5cf5355

Category	Suggestion	Impact
Possible issue	Add null check for inferNamedField The code doesn't handle potential null values from `helper.inferNamedField(arg.getKey())`. If `inferNamedField` returns null, calling `getRootName()` will throw a NullPointerException. Add a null check before accessing `getRootName()` to prevent runtime failures. opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [608-617] Map<String, String> fieldNameMapping = new HashMap<>(); for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { - String originalName = helper.inferNamedField(arg.getKey()).getRootName(); - String outputName = arg.getValue(); - if (!originalName.equals(outputName)) { - fieldNameMapping.put(originalName, outputName); + var namedField = helper.inferNamedField(arg.getKey()); + if (namedField != null) { + String originalName = namedField.getRootName(); + String outputName = arg.getValue(); + if (!originalName.equals(outputName)) { + fieldNameMapping.put(originalName, outputName); + } } } } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a potential `NullPointerException` if `helper.inferNamedField(arg.getKey())` returns null. Adding a null check improves robustness, though the actual likelihood of this occurring depends on the implementation of `inferNamedField`.	Medium
General	Detect field name mapping collisions If multiple original field names map to the same output name, later entries will silently overwrite earlier ones, potentially causing data loss. Consider detecting and handling this collision scenario to prevent unexpected behavior. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [159-169] private Map<String, Object> applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping.isEmpty()) { return map; } Map<String, Object> result = new LinkedHashMap<>(); for (Map.Entry<String, Object> entry : map.entrySet()) { String mappedName = fieldNameMapping.getOrDefault(entry.getKey(), entry.getKey()); + if (result.containsKey(mappedName)) { + throw new IllegalStateException( + String.format("Field name collision detected: multiple fields map to '%s'", mappedName)); + } result.put(mappedName, entry.getValue()); } return result; } Suggestion importance[1-10]: 6 __ Why: The suggestion identifies a potential issue where multiple fields mapping to the same output name could cause data loss. However, this scenario seems unlikely given the context (field renaming via `rename` command), and throwing an exception might be too aggressive without understanding the broader system behavior.	Low

github-actions · 2026-02-27T00:26:33Z

Persistent review updated to latest commit eddc6b9

…g renamed fields When the DedupPushdownRule converts a dedup to an aggregation-based top_hits query, fields that were renamed (via rename or eval) would return null values. This happened because the TopHitsParser returned results using original OpenSearch field names, but the output schema expected the renamed names. Added a field name mapping to TopHitsParser so it can translate original OS field names to their renamed output names in the LITERAL_AGG (dedup) aggregation response path. Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

github-actions · 2026-02-27T01:46:14Z

Persistent review updated to latest commit 6eb5ac5

The testNoMvBasic and testNoMvWithEval failures in CalciteExplainIT are unrelated to this PR's dedup pushdown changes and only occur on Java 25. Re-triggering CI. Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

github-actions · 2026-02-27T03:25:44Z

Persistent review updated to latest commit d82a0a1

opensearchpplteam · 2026-02-27T04:34:55Z

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java

+            String originalName = helper.inferNamedField(arg.getKey()).getRootName();
+            String outputName = arg.getValue();
+            if (!originalName.equals(outputName)) {
+              fieldNameMapping.put(originalName, outputName);


When two project args reference the same original field with different output names (e.g., eval pay2 = salary | rename salary as pay | dedup dept_id), the second HashMap.put silently overwrites the first mapping. One of the renamed fields would still get null values. This is an edge case from issue #5150 differential testing. Consider either documenting this as a known limitation or using a multimap / detecting the collision.

Added a warning log when a collision is detected (when the same original field maps to multiple output names). The last mapping wins, which is documented in a code comment as a known limitation. This edge case requires two derived fields from the same original field before dedup, which is rare in practice.

opensearchpplteam · 2026-02-27T04:35:04Z

opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java

                        : new LinkedHashMap<>(hit.getSourceAsMap());
                hit.getFields().values().forEach(f -> map.put(f.getName(), f.getValue()));
-                return map;
+                return applyFieldNameMapping(map);


The field name mapping is applied only in the multi-row return path (the else branch at line 140). The returnSingleValue path (used by ARG_MAX/ARG_MIN pushdown) and returnMergeValue path skip mapping entirely. If those paths are ever used with renamed fields via a future pushdown rule, they would have the same null-value bug. Worth confirming this is an intentional scope limitation for now, or if those branches should also apply the mapping defensively.

Confirmed this is intentional. The returnSingleValue and returnMergeValue paths use agg.getName() as the map key (not field names), so field name mapping is irrelevant for those paths. Added a clarifying comment in the code explaining why mapping is only applied in the multi-row return path.

…apping limitations - Add warning log when HashMap collision is detected in AggregateAnalyzer field name mapping (when same original field maps to multiple output names) - Add clarifying comments in TopHitsParser explaining why field name mapping is only needed in the multi-row return path Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

github-actions · 2026-02-27T04:53:24Z

Persistent review updated to latest commit e9a24ad

penghuo · 2026-02-27T21:41:34Z

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5150.yml

+  - is_true: datarows.0.1
+  - is_true: datarows.1.1


compare with real value, not just null value

Updated — replaced is_true with match assertions for actual expected values (category and val columns).

penghuo · 2026-02-27T21:41:41Z

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5150.yml

+  - is_true: datarows.0.2
+  - is_true: datarows.1.2


compare with real value, not just null value

Updated — replaced is_true with match assertions for actual expected values (category, value, and doubled columns).

…project#5150 Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

github-actions · 2026-02-27T21:57:42Z

Persistent review updated to latest commit e8a2e3a

penghuo added PPL Piped processing language bugFix labels Feb 27, 2026

opensearchpplteam force-pushed the fix/issue-5150 branch from eddc6b9 to 6eb5ac5 Compare February 27, 2026 01:45

Trigger CI rerun for flaky Java 25 integration test

d82a0a1

The testNoMvBasic and testNoMvWithEval failures in CalciteExplainIT are unrelated to this PR's dedup pushdown changes and only occur on Java 25. Re-triggering CI. Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

opensearchpplteam commented Feb 27, 2026

View reviewed changes

penghuo reviewed Feb 27, 2026

View reviewed changes

Address review: assert real values in YAML REST tests for opensearch-…

e8a2e3a

…project#5150 Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>

Conversation

opensearchpplteam commented Feb 27, 2026

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit e8a2e3a)

Uh oh!

github-actions bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Feb 27, 2026 •

edited

Loading

(Review updated until commit `e8a2e3a`)

github-actions bot commented Feb 27, 2026 •

edited

Loading