
Explicit head command should override fetch_size in PPL #5194

Open
ahkcs wants to merge 5 commits into opensearch-project:main from ahkcs:fix/fetch-size-head-total-hits

Conversation

@ahkcs
Contributor

@ahkcs ahkcs commented Feb 27, 2026

Summary

Fixes bug 2 in #5191

When a user's PPL query contains an explicit head command, the fetch_size API parameter should not inject an additional Head node on top. Previously, AstStatementBuilder.visitPplStatement() always wrapped the plan with Head(fetchSize), creating a double-Head that capped results unexpectedly.

Example: source=index | head 100 with fetch_size=5 would only return 5 rows instead of the expected 100 rows, because the outer Head(5) silently overrode the user's explicit head 100.
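To make the fix's shape concrete for reviewers, here is a minimal sketch of the guard using stand-in AST types (Node, Head, Relation are illustrative stand-ins; the real classes are UnresolvedPlan and friends in org.opensearch.sql.ast, and the actual builder API differs):

```java
import java.util.Collections;
import java.util.List;

public class HeadGuardSketch {
  // Stand-in for UnresolvedPlan: each node exposes its children.
  static class Node {
    final List<Node> children;
    Node(List<Node> children) { this.children = children; }
    List<Node> getChild() { return children; }
  }

  // Stand-in for the Head (limit) AST node.
  static class Head extends Node {
    final int size;
    Head(int size, Node child) { super(List.of(child)); this.size = size; }
  }

  // Stand-in for a leaf source node (source=index).
  static class Relation extends Node {
    Relation() { super(Collections.emptyList()); }
  }

  // Recursive check mirroring the PR's containsHead.
  static boolean containsHead(Node plan) {
    if (plan instanceof Head) return true;
    for (Node child : plan.getChild()) {
      if (containsHead(child)) return true;
    }
    return false;
  }

  // The fix: only inject Head(fetchSize) when the user did not write an explicit head.
  static Node wrapWithFetchSize(Node plan, int fetchSize) {
    return containsHead(plan) ? plan : new Head(fetchSize, plan);
  }

  public static void main(String[] args) {
    Node withHead = new Head(100, new Relation()); // source=index | head 100
    Node noHead = new Relation();                  // source=index

    System.out.println(wrapWithFetchSize(withHead, 5) == withHead);   // true: no extra Head injected
    System.out.println(wrapWithFetchSize(noHead, 5) instanceof Head); // true: fetch_size still applies
  }
}
```

The pre-fix behavior corresponds to always calling `new Head(fetchSize, plan)`, which silently stacks a smaller limit on top of the user's.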

Test plan

  • Manual verification via curl on local cluster:
    • head 5 + fetch_size=10 → 5 rows ✓
    • head 8 + fetch_size=3 → 8 rows ✓ (head overrides fetch_size)
    • fetch_size=3 (no head) → 3 rows ✓ (fetch_size still works when no head)

@github-actions
Contributor

github-actions bot commented Feb 27, 2026

PR Reviewer Guide 🔍

(Review updated until commit a6e1216)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Core fix - head command overrides fetch_size in PPL AST builder

Relevant files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstStatementBuilderTest.java

Sub-PR theme: Integration tests and expected output updates for head/fetch_size behavior

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fetch_size_smaller_than_head_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fetch_size_with_head_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_fetch_size_smaller_than_head_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_fetch_size_with_head_push.yaml

Sub-PR theme: Documentation update for head command and fetch_size interaction

Relevant files:

  • docs/user/interfaces/endpoint.rst

⚡ Recommended focus areas for review

Shallow Head Detection

The containsHead method recursively traverses the entire AST to find any Head node. However, in a pipeline like source=t | eval x=1 | head 5 | eval y=2, the Head node may appear deep in the tree. More importantly, the check uses plan.getChild() which may not cover all node types uniformly. If any AST node type does not properly implement getChild() or returns an empty list when it shouldn't, the traversal could miss a Head node and incorrectly inject an additional one. The robustness of this traversal depends on all AST node types correctly implementing getChild().

private boolean containsHead(UnresolvedPlan plan) {
  if (plan instanceof Head) {
    return true;
  }
  for (var child : plan.getChild()) {
    if (child instanceof UnresolvedPlan && containsHead((UnresolvedPlan) child)) {
      return true;
    }
  }
  return false;
}
Hardcoded Dataset Size

The test testHeadLargerThanDatasetWithFetchSize asserts assertEquals(7, dataRows.length()) assuming TEST_INDEX_BANK has exactly 7 rows. This is a fragile assumption that could break if the test dataset changes. Consider using a more robust assertion or documenting the dataset size dependency explicitly.

public void testHeadLargerThanDatasetWithFetchSize() throws IOException {
  // head 1000 on a 7-row index with fetch_size=3: head takes precedence, returns all 7 rows
  JSONObject result =
      executeQueryWithFetchSize(String.format("source=%s | head 1000", TEST_INDEX_BANK), 3);
  JSONArray dataRows = result.getJSONArray("datarows");
  assertEquals(7, dataRows.length());
}

@github-actions
Contributor

github-actions bot commented Feb 27, 2026

PR Code Suggestions ✨

Latest suggestions up to a6e1216

Explore these optional code suggestions:

Category / Suggestion / Impact
Possible issue
Add null safety to recursive plan traversal

The containsHead method will throw a NullPointerException if plan is null or if
plan.getChild() returns null. A null check should be added at the start of the
method and before iterating over children to make the recursive traversal safe.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [77-87]

 private boolean containsHead(UnresolvedPlan plan) {
+  if (plan == null) {
+    return false;
+  }
   if (plan instanceof Head) {
     return true;
   }
-  for (var child : plan.getChild()) {
+  List<?> children = plan.getChild();
+  if (children == null) {
+    return false;
+  }
+  for (var child : children) {
     if (child instanceof UnresolvedPlan && containsHead((UnresolvedPlan) child)) {
       return true;
     }
   }
   return false;
 }
Suggestion importance[1-10]: 5


Why: Adding null checks for plan and plan.getChild() is a reasonable defensive measure, but in practice rawPlan is always non-null (it comes from astBuilder.visit(ctx)) and getChild() is unlikely to return null in the existing codebase. This is a minor robustness improvement.

Low
General
Verify dataset size assumption in test

The test asserts that head 100 returns exactly 100 rows from TEST_INDEX_ACCOUNT, but
if the index contains fewer than 100 documents, the assertion will fail. The
expected value should be bounded by the actual document count in the index, or the
test should use a dataset known to have at least 100 documents.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [186-194]

 public void testHeadOverridesFetchSizeWhenLarger() throws IOException {
   // Explicit head takes precedence over fetch_size
-  // head 100 should return 100 rows even though fetch_size=5
+  // head 100 should return up to 100 rows even though fetch_size=5
   JSONObject result =
       executeQueryWithFetchSize(
           String.format("source=%s | head 100 | fields firstname", TEST_INDEX_ACCOUNT), 5);
   JSONArray dataRows = result.getJSONArray("datarows");
+  // TEST_INDEX_ACCOUNT has 1000 documents, so head 100 returns 100
   assertEquals(100, dataRows.length());
 }
Suggestion importance[1-10]: 4


Why: The suggestion correctly identifies that assertEquals(100, dataRows.length()) assumes TEST_INDEX_ACCOUNT has at least 100 documents. However, the improved_code only adds a comment rather than actually fixing the assertion logic, making this a documentation-only change with limited impact.

Low

Previous suggestions

Suggestions up to commit e62de53
Category / Suggestion / Impact
Possible issue
Add null safety to recursive plan traversal

The containsHead method does not handle the case where plan is null, which could
cause a NullPointerException if an empty or malformed plan is passed. Add a null
check at the start of the method to guard against this. Additionally, getChild()
itself could potentially return null, so that should also be guarded.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [77-87]

 private boolean containsHead(UnresolvedPlan plan) {
+  if (plan == null) {
+    return false;
+  }
   if (plan instanceof Head) {
     return true;
   }
-  for (var child : plan.getChild()) {
+  List<? extends Node> children = plan.getChild();
+  if (children == null) {
+    return false;
+  }
+  for (var child : children) {
     if (child instanceof UnresolvedPlan && containsHead((UnresolvedPlan) child)) {
       return true;
     }
   }
   return false;
 }
Suggestion importance[1-10]: 4


Why: Adding null checks for plan and getChild() is a reasonable defensive measure, but in practice the containsHead method is only called with a non-null rawPlan from visitPplStatement, and getChild() in the AST framework typically returns an empty list rather than null. The risk of NPE is low in this context.

Low
General
Avoid hardcoded dataset size in test assertion

The test testHeadLargerThanDatasetWithFetchSize hardcodes the expected row count as
7, which is tightly coupled to the current size of TEST_INDEX_BANK. If the test
index data changes, this test will fail with a misleading error. Consider asserting
that the row count is less than or equal to 1000 (the head limit) and greater than
fetch_size (3), or document clearly why 7 is the expected value.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [231-237]

 public void testHeadLargerThanDatasetWithFetchSize() throws IOException {
   // head 1000 on a 7-row index with fetch_size=3: head takes precedence, returns all 7 rows
+  // TEST_INDEX_BANK has exactly 7 documents
   JSONObject result =
       executeQueryWithFetchSize(String.format("source=%s | head 1000", TEST_INDEX_BANK), 3);
   JSONArray dataRows = result.getJSONArray("datarows");
-  assertEquals(7, dataRows.length());
+  // Verify that more than fetch_size (3) rows are returned, confirming head takes precedence
+  assertTrue("Expected more rows than fetch_size=3", dataRows.length() > 3);
+  assertTrue("Expected at most 1000 rows (head limit)", dataRows.length() <= 1000);
 }
Suggestion importance[1-10]: 4


Why: The hardcoded assertEquals(7, dataRows.length()) is tightly coupled to the current TEST_INDEX_BANK dataset size. However, integration test indices typically have fixed, well-known sizes, so this is a common pattern. The suggestion to use range assertions would make the test less precise about the actual behavior being verified.

Low
Suggestions up to commit adea087
Category / Suggestion / Impact
Possible issue
Add null safety guards in recursive check

The containsHead method may throw a NullPointerException if plan is null or if
plan.getChild() returns null. Add null guards at the start of the method to handle
these edge cases safely.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [77-87]

 private boolean containsHead(UnresolvedPlan plan) {
+  if (plan == null) {
+    return false;
+  }
   if (plan instanceof Head) {
     return true;
   }
-  for (var child : plan.getChild()) {
+  List<? extends Node> children = plan.getChild();
+  if (children == null) {
+    return false;
+  }
+  for (var child : children) {
     if (child instanceof UnresolvedPlan && containsHead((UnresolvedPlan) child)) {
       return true;
     }
   }
   return false;
 }
Suggestion importance[1-10]: 5


Why: Adding null guards for plan and plan.getChild() is a reasonable defensive measure, but in practice containsHead is only called with a non-null rawPlan from visitPplStatement, making this a low-risk edge case. The improvement is valid but has limited practical impact.

Low
General
Avoid hardcoded magic number in test assertion

The test hardcodes 7 as the expected row count for TEST_INDEX_BANK, which is a
fragile assumption. If the test index size changes, this test will fail
unexpectedly. Consider fetching the total count dynamically or using a named
constant to make the intent clear and the test more maintainable.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [231-237]

 public void testHeadLargerThanDatasetWithFetchSize() throws IOException {
   // head 1000 on a 7-row index with fetch_size=3: head takes precedence, returns all 7 rows
+  // TEST_INDEX_BANK has exactly 7 documents
+  final int BANK_INDEX_SIZE = 7;
   JSONObject result =
       executeQueryWithFetchSize(String.format("source=%s | head 1000", TEST_INDEX_BANK), 3);
   JSONArray dataRows = result.getJSONArray("datarows");
-  assertEquals(7, dataRows.length());
+  assertEquals(BANK_INDEX_SIZE, dataRows.length());
 }
Suggestion importance[1-10]: 3


Why: Using a named constant instead of the magic number 7 improves readability, but the improved_code still hardcodes the value as 7 in a local constant rather than deriving it dynamically, offering only marginal maintainability improvement.

Low
Suggestions up to commit 7f797a9
Category / Suggestion / Impact
General
Limit head detection to linear pipeline chain only

The containsHead method checks all nodes in the entire AST tree, including nodes
that appear after the head command in the pipeline (i.e., nodes that are children
of head). This means a head command buried deep in a subquery or nested plan would
also suppress fetch_size injection at the top level. The check should only look at
the top-level pipeline chain (the direct lineage), not all descendants. Consider
traversing only the linear chain of plan nodes rather than all children recursively.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [77-87]

 private boolean containsHead(UnresolvedPlan plan) {
-  if (plan instanceof Head) {
-    return true;
-  }
-  for (var child : plan.getChild()) {
-    if (child instanceof UnresolvedPlan && containsHead((UnresolvedPlan) child)) {
+  UnresolvedPlan current = plan;
+  while (current != null) {
+    if (current instanceof Head) {
       return true;
     }
+    List<UnresolvedPlan> children = current.getChild().stream()
+        .filter(c -> c instanceof UnresolvedPlan)
+        .map(c -> (UnresolvedPlan) c)
+        .collect(java.util.stream.Collectors.toList());
+    current = children.size() == 1 ? children.get(0) : null;
   }
   return false;
 }
Suggestion importance[1-10]: 5


Why: The suggestion raises a valid concern about the recursive containsHead traversing all descendants rather than just the linear pipeline chain. However, in PPL's AST, plans are typically linear pipelines where each node has exactly one child, so the practical difference may be minimal. The improved code is logically sound but adds complexity with stream operations.

Low
Document hardcoded index size assumption in test

This test hardcodes the expected row count as 7, which is the assumed size of
TEST_INDEX_BANK. If the test index size ever changes, this test will silently fail
or give misleading results. Consider either asserting dataRows.length() <= 1000
(i.e., head limit is respected, not fetch_size) or documenting the assumed index
size more explicitly with a constant.

integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java [231-237]

 public void testHeadLargerThanDatasetWithFetchSize() throws IOException {
   // head 1000 on a 7-row index with fetch_size=3: head takes precedence, returns all 7 rows
+  // TEST_INDEX_BANK has exactly 7 documents
+  final int BANK_INDEX_SIZE = 7;
   JSONObject result =
       executeQueryWithFetchSize(String.format("source=%s | head 1000", TEST_INDEX_BANK), 3);
   JSONArray dataRows = result.getJSONArray("datarows");
-  assertEquals(7, dataRows.length());
+  // Should return all rows (not limited by fetch_size=3), up to head limit of 1000
+  assertEquals(BANK_INDEX_SIZE, dataRows.length());
 }
Suggestion importance[1-10]: 2


Why: The suggestion to extract the hardcoded 7 into a named constant BANK_INDEX_SIZE is a minor readability improvement. The comment already mentions "7-row index", and the improved_code is essentially the same logic with a local constant, offering minimal practical benefit.

Low
Suggestions up to commit ff1fd32
Category / Suggestion / Impact
General
Add recursion depth limit

The recursive traversal may cause a StackOverflowError for deeply nested query
plans. Consider adding a depth limit or using an iterative approach with an explicit
stack to prevent stack overflow in production scenarios with complex queries.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [77-87]

 private boolean containsHead(UnresolvedPlan plan) {
+  return containsHead(plan, 0, 100);
+}
+
+private boolean containsHead(UnresolvedPlan plan, int depth, int maxDepth) {
+  if (depth > maxDepth) {
+    return false;
+  }
   if (plan instanceof Head) {
     return true;
   }
   for (var child : plan.getChild()) {
-    if (child instanceof UnresolvedPlan && containsHead((UnresolvedPlan) child)) {
+    if (child instanceof UnresolvedPlan && containsHead((UnresolvedPlan) child, depth + 1, maxDepth)) {
       return true;
     }
   }
   return false;
 }
Suggestion importance[1-10]: 4


Why: While adding a depth limit could prevent potential StackOverflowError in extreme cases, query plans are typically not deeply nested enough to cause this issue in practice. The suggestion is valid but addresses a low-probability edge case, making it a minor defensive improvement rather than a critical fix.

Low
Suggestions up to commit d2b35ee
Category / Suggestion / Impact
General
Add depth limit to recursive traversal

The recursive traversal may cause a StackOverflowError for deeply nested query
plans. Consider adding a depth limit or converting to an iterative approach using a
stack to prevent potential stack overflow issues in production environments with
complex queries.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [77-87]

 private boolean containsHead(UnresolvedPlan plan) {
+  return containsHead(plan, 0, 100);
+}
+
+private boolean containsHead(UnresolvedPlan plan, int depth, int maxDepth) {
+  if (depth > maxDepth) {
+    return false;
+  }
   if (plan instanceof Head) {
     return true;
   }
   for (var child : plan.getChild()) {
-    if (child instanceof UnresolvedPlan && containsHead((UnresolvedPlan) child)) {
+    if (child instanceof UnresolvedPlan && containsHead((UnresolvedPlan) child, depth + 1, maxDepth)) {
       return true;
     }
   }
   return false;
 }
Suggestion importance[1-10]: 4


Why: While adding a depth limit is a reasonable defensive practice, the suggestion assumes a risk that may not be realistic for typical query plans. Most query ASTs are shallow, and a StackOverflowError is unlikely in practice. The arbitrary limit of 100 could also cause legitimate queries to fail. This is a minor improvement for edge cases rather than a critical fix.

Low

When a user's PPL query contains an explicit head command, the
fetch_size parameter should not inject an additional Head node on top.
Previously, fetch_size always wrapped the plan with Head(fetchSize),
creating a double-Head that could cap results unexpectedly (e.g.,
head 100 with fetch_size=5 would only return 5 rows).

This adds a containsHead() check in AstStatementBuilder to skip the
Head injection when the user's query already has a head command,
letting the user's explicit intent take precedence.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the fix/fetch-size-head-total-hits branch from d2b35ee to ff1fd32 on February 27, 2026 01:10
@github-actions
Contributor

Persistent review updated to latest commit ff1fd32

Remove the outer LogicalSort(fetch=[fetchSize]) from expected explain
plans when an explicit head command is present, since fetch_size no
longer injects a Head node in that case.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 7f797a9

@penghuo penghuo added the PPL Piped processing language label Feb 27, 2026
@ahkcs ahkcs changed the title from "[Bug Fix] Explicit head command should override fetch_size in PPL" to "Explicit head command should override fetch_size in PPL" on Feb 27, 2026
Comment on lines +73 to +75
* Recursively checks if the AST contains a {@link Head} node. When the user's query already
* includes an explicit {@code head} command, we should not inject an additional Head for
* fetch_size so that the user's explicit limit takes precedence.
Collaborator

So if a query ends with ... | head 3 | head 500, the limit in final physical plan is Limit 500?

Contributor Author

It should be 3, added a corresponding test case

Contributor Author

So if a query ends with ... | head 3 | head 500, the limit in final physical plan is Limit 500?

The new logic only decides whether fetch_size will inject a new Head node, so it doesn't affect the original head command behavior (the smaller head limit still takes effect).

Collaborator

Sorry, I meant that without the current change it will be LIMIT 500? If so, is there a bug elsewhere, or is this the Calcite planner's expected behavior?

Contributor Author

Here's an example physical plan:
https://github.com/opensearch-project/sql/pull/5194/changes#diff-0c3c28474cebca090844af02cd01618a9ec578f69a272774934c7b3fb1c020aeR8

Without the current change, it will be LIMIT 3 in the final OpenSearchRequestBuilder (sourceBuilder). I think it's the expected behavior.
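To make the stacked-head semantics discussed above concrete, here is a toy sketch (plain lists stand in for row streams; this illustrates the min-of-limits behavior, not the Calcite planner itself):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StackedHeads {
  // Applying head N to a row stream just truncates it to at most N rows.
  static List<Integer> head(List<Integer> rows, int n) {
    return rows.stream().limit(n).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Integer> rows = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
    // source=t | head 3 | head 500: the inner head 3 limits first,
    // so the composed result has min(3, 500) = 3 rows.
    List<Integer> result = head(head(rows, 3), 500);
    System.out.println(result.size()); // 3
  }
}
```

This is why the PR only needs to decide whether to inject a Head for fetch_size; once the user's Heads are preserved as-is, composition already yields the smaller limit.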

Verifies that when a query has multiple heads (e.g., head 3 | head 500),
fetch_size does not inject yet another Head node. The existing Head
nodes are preserved as-is, with the inner head 3 limiting first.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit adea087

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit e62de53

@ahkcs ahkcs requested a review from dai-chen February 27, 2026 21:09
Signed-off-by: ahkcs <austinhkcs@gmail.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit a6e1216


Labels

bugFix, PPL (Piped processing language)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants