[Feature] Support bi-directional graph traversal command graphlookup#5138

Merged
qianheng-aws merged 10 commits into main from feature/graphlookup
Feb 25, 2026

Conversation

@LantaoJin (Member) commented Feb 11, 2026

Description

The graphlookup command performs recursive graph traversal on OpenSearch indices using a breadth-first search (BFS) algorithm.

┌───────────────────────────────────────────────────────────────┐
│              CalciteEnumerableGraphLookup                     │
│  ┌─────────────┐     ┌──────────────────────────────────────┐ │
│  │   Source    │────▶│         BFS Enumerator               │ │
│  │  Scannable  │     │  ┌────────────────────────────────┐  │ │
│  └─────────────┘     │  │  For each source row:          │  │ │
│                      │  │  1. Extract startField value   │  │ │
│  ┌─────────────┐     │  │  2. Query: toField = value     │  │ │
│  │   Lookup    │◀────│  │  3. Collect fromField values   │  │ │
│  │ IndexScan   │     │  │  4. Repeat until maxDepth      │  │ │
│  └─────────────┘     │  └────────────────────────────────┘  │ │
│                      └──────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
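The per-row loop in the diagram can be sketched language-agnostically as follows. This is an illustrative Python sketch, not the actual Java `CalciteEnumerableGraphLookup` implementation; `lookup_query` stands in for the OpenSearch query issued per BFS level, and all names are assumptions:

```python
from collections import deque

def graph_lookup(start_value, lookup_query, max_depth=0):
    """BFS sketch: match toField == value in the lookup index, expand via fromField.

    lookup_query(values) stands in for one OpenSearch query per BFS level and
    returns (document, from_field_value) pairs for docs whose toField is in values.
    """
    results = []
    visited = {start_value}
    frontier = deque([start_value] if start_value is not None else [])
    depth = 0
    while frontier and depth <= max_depth:  # maxDepth=0 -> direct connections only
        matched = lookup_query(list(frontier))
        frontier = deque()
        for doc, from_value in matched:
            results.append(doc)
            if from_value is not None and from_value not in visited:
                visited.add(from_value)  # filter already-visited nodes
                frontier.append(from_value)
        depth += 1
    return results
```

The `visited` set mirrors the "Filter visited nodes in search query" behavior from the commit history: a node's value is queried at most once, which keeps the traversal terminating on cyclic graphs.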

Syntax

Option 1

graphLookup <lookupIndex> startField=<field> fromField=<field> toField=<field> [maxDepth=<maxDepth>] [depthField=<field>] [direction=(uni | bi)] [supportArray=(true | false)] [batchMode=(true | false)] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>

Example Query

  source = employees
    | graphLookup employees
      startField=reportsTo
      fromField=reportsTo
      toField=name
      as reportingHierarchy

Option 2 (follows MongoDB naming conventions)

graphLookup <lookupIndex> startWith=<field> connectFromField=<field> connectToField=<field> [maxDepth=<maxDepth>] [depthField=<field>] [direction=(uni | bi)] [supportArray=(true | false)] [batchMode=(true | false)] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>

Example Query

  source = employees
    | graphLookup employees
      startWith=reportsTo
      connectFromField=reportsTo
      connectToField=name
      as reportingHierarchy
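The optional parameters can be combined in a single command. The following is an illustrative sketch only: the `status` field and the filter condition are assumptions, not fields of the sample index.

  source = employees
    | graphLookup employees
      startField=reportsTo
      fromField=reportsTo
      toField=name
      maxDepth=2
      depthField=depth
      direction=uni
      filter=(status = 'active')
      as reportingHierarchy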

Performance

The OpenSearch graphlookup command implements several features that make it well suited to RAG scenarios as well as to a range of other requirements.

  • With direction=uni and usePIT=true, OpenSearch graphlookup produces the same results as MongoDB's $graphLookup aggregation stage while running 5-6x faster on the SNB SF-1 dataset and 29x faster on SNB SF-10. OpenSearch finishes the full benchmark on SNB SF-10 in 734 ms and on SNB SF-30 in 8000 ms, while MongoDB fails to complete.
  • For scenarios with multiple start values, OpenSearch supports batchMode, which better matches the requirements of RAG. In addition to deduplicating the final results, it runs 58x faster than MongoDB on the SNB SF-1 dataset, and finishes the multiple-start-value benchmark on SNB SF-10 in 3508 ms and on SNB SF-30 in 10992 ms, while MongoDB fails to complete.
  • OpenSearch supports both uni-directional and bi-directional traversal. Bi-directional traversal costs about 4x the query time of uni-directional traversal, but also retrieves about 4x more vertices.
  • OpenSearch supports both PIT and non-PIT modes. The non-PIT mode performs better on large datasets at the cost of accuracy: it is 5x faster than PIT mode on SNB SF-30, but misses about 27% of the nodes.
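The batchMode speedup comes from running one unified BFS over the union of all start values, issuing one query per level, instead of one BFS per source row. A minimal Python sketch under that assumption (illustrative only, not the actual Java implementation):

```python
from collections import deque

def batch_graph_lookup(start_values, lookup_query, max_depth=0):
    """batchMode sketch: one unified BFS over the union of all start values.

    lookup_query(values) returns (doc_id, document, from_field_value) triples for
    lookup documents whose toField is in values; one query is issued per BFS level.
    """
    visited = {v for v in start_values if v is not None}
    frontier = deque(sorted(visited))       # union of start values, deduplicated
    results, seen_docs, depth = [], set(), 0
    while frontier and depth <= max_depth:
        matched = lookup_query(list(frontier))
        frontier = deque()
        for doc_id, doc, from_value in matched:
            if doc_id not in seen_docs:     # final results are deduplicated
                seen_docs.add(doc_id)
                results.append(doc)
            if from_value is not None and from_value not in visited:
                visited.add(from_value)
                frontier.append(from_value)
        depth += 1
    return results
```

Because overlapping start values share one frontier, the number of lookup queries grows with the depth of the graph rather than with the number of source rows, which is where the reported 58x gap versus per-row traversal comes from.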
(Benchmark result screenshots are attached to the PR.)

Related Issues

Resolves #5088

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

qianheng-aws and others added 6 commits February 10, 2026 11:10
* succeed to graph lookup single index

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Implement graph lookup RelNode

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine - remove depth from BFS node

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Support bidirectional mode

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Support anonymize graph lookup

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix UT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add limitation for GraphLookup

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Simplify GraphLookup param names

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Support value of list; Support retrieve circle edges also

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add documentation for graph lookup

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Don't include loop edges

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* spotlessApply

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Filter visited nodes in search query

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix UT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add parameter supportArray for handling fields with array values

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Remove unused code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Support batch mode

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Close lookup table scan

Signed-off-by: Heng Qian <qianheng@amazon.com>

* refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add param usePIT

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
* Struct return array value instead of string

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Support filter in GraphLookup

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add experimental tag in doc

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
coderabbitai bot (Contributor) commented Feb 11, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced the graphLookup command for performing recursive graph traversal on collections using breadth-first search, with support for bidirectional traversal, configurable depth limits, array field handling, and optional filtering.
  • Documentation

    • Added comprehensive documentation for the graphLookup command, including syntax, parameters, usage examples, and feature limitations.

Walkthrough

Introduces a new graphLookup command for multi-hop graph traversal queries using breadth-first search. Includes AST node definitions, Calcite relational algebra nodes, PPL grammar extensions, executable Enumerable implementation, planner rules for optimization, comprehensive documentation, and test infrastructure with sample datasets and test cases.

Changes

Cohort / File(s) Summary
AST Node & Visitor Support
core/src/main/java/org/opensearch/sql/ast/tree/GraphLookup.java, core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java, core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
Adds GraphLookup AST node with Direction enum (UNI/BI) and configuration fields for graph traversal parameters. Implements visitor pattern across AbstractNodeVisitor and Analyzer with delegation to Calcite-only exception path.
Calcite Relational Algebra
core/src/main/java/org/opensearch/sql/calcite/plan/rel/GraphLookup.java, core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java, core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Defines abstract GraphLookup BiRel node with row type derivation and estimation logic. Implements LogicalGraphLookup factory and copy methods. CalciteRelNodeVisitor extracts parameters, materializes sources, and constructs LogicalGraphLookup RelNode with filter support.
PPL Grammar & Parser
ppl/src/main/antlr/OpenSearchPPLLexer.g4, ppl/src/main/antlr/OpenSearchPPLParser.g4, ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
Adds lexer tokens (GRAPHLOOKUP, START_FIELD, FROM_FIELD, etc.) and parser rules for graphLookupCommand with options. AstBuilder visitGraphLookupCommand validates required fields (fromField, toField) and constructs GraphLookup AST node.
OpenSearch Execution & Optimization
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java, opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableGraphLookupRule.java, opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
Implements high-performance BFS-based GraphLookupEnumerable with frontier traversal, depth tracking, optional bidirectional mode, and push-down filter support. EnumerableGraphLookupRule converts LogicalGraphLookup to CalciteEnumerableGraphLookup with OpenSearchIndex extraction. Registers rule in planner.
Support Classes & Accessors
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java, opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java, opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
Maps MULTISET SQL type to STRING ExprCoreType. Adds StructImpl value handling in processValue. Exposes indexName via getter. Increases visibility of PredicateAnalyzer.QueryExpression.create and LiteralExpression for external usage.
PPL Utilities & Tests
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java, ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java, ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java, ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
Adds GraphLookup anonymization support. Includes unit tests for AST parsing, semantic validation (missing fields), and Calcite integration with custom EmployeeTable for logical plan verification.
Integration Tests
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java
Comprehensive integration test suite with 23+ test cases covering employee hierarchy, airport connectivity, bidirectional traversal, filtering, batch mode, and edge cases with nested array assertions.
Test Infrastructure
integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java, integ-test/src/test/java/org/opensearch/sql/legacy/TestUtils.java, integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
Adds three test index definitions: GRAPH_EMPLOYEES, GRAPH_TRAVELERS, GRAPH_AIRPORTS with associated helper methods and constants.
Test Data & Mappings
integ-test/src/test/resources/graph_*.json, integ-test/src/test/resources/indexDefinitions/graph_*_index_mapping.json
Provides sample datasets (employees with hierarchy, travelers with nearest airports, airports with connections) and Elasticsearch index mappings defining field structures.
Documentation
docs/user/ppl/cmd/graphlookup.md, docs/user/ppl/index.md
User-facing documentation for graphLookup command with syntax, parameters, traversal logic, batch mode behavior, examples (employee hierarchy, airport networks), and limitations. Updates index to reference new command.

Sequence Diagram

sequenceDiagram
    participant User as User/Client
    participant Parser as PPL Parser
    participant AstBuilder as AstBuilder
    participant Analyzer as Analyzer
    participant CalciteRel as CalciteRelNodeVisitor
    participant Planner as Planner/Rules
    participant Engine as OpenSearchExecutionEngine
    participant ES as OpenSearch

    User->>Parser: graphLookup command (PPL)
    Parser->>AstBuilder: parse graphLookupCommand context
    AstBuilder->>AstBuilder: validate fromField, toField required
    AstBuilder->>Parser: return GraphLookup AST node
    
    Parser->>Analyzer: visit GraphLookup node
    Analyzer->>Analyzer: delegate to Calcite-only path
    Analyzer->>CalciteRel: LogicalPlan result
    
    CalciteRel->>CalciteRel: extract graph parameters
    CalciteRel->>CalciteRel: materialize source & lookup tables
    CalciteRel->>CalciteRel: construct LogicalGraphLookup RelNode
    CalciteRel->>Planner: LogicalGraphLookup plan
    
    Planner->>Planner: match EnumerableGraphLookupRule
    Planner->>Planner: extract OpenSearchIndex from lookup
    Planner->>Planner: convert to CalciteEnumerableGraphLookup
    Planner->>Engine: optimized executable plan
    
    Engine->>Engine: scan source rows
    Engine->>ES: query frontier values from lookup
    ES-->>Engine: lookup results
    
    Engine->>Engine: BFS traversal iteration
    Engine->>ES: query next frontier (visited tracking)
    ES-->>Engine: next level results
    
    Engine->>Engine: accumulate results / apply batchMode
    Engine->>User: return enriched rows with graph results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PR #4929: Modifies CalciteRelNodeVisitor join and subsearch limit handling in the same visitor class where graphLookup visitor method is added, indicating architectural alignment.

Suggested labels

calcite, enhancement

Suggested reviewers

  • LantaoJin
  • penghuo
  • anirudha
  • ps48
  • joshuali925
  • derek-ho
  • GumpacG
  • Swiddis
  • vamsimanohar
  • mengweieric
  • Yury-Fridlyand
  • MaxKsyunz
  • YANG-DB
  • dai-chen
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 18.31%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check (✅ Passed): the PR title clearly describes the main feature being added, support for a bi-directional graph traversal command called 'graphlookup', which directly aligns with the primary changes across multiple files.
  • Description check (✅ Passed): the PR description is comprehensive and directly related to the changeset, providing syntax, examples, architectural diagrams, and performance metrics for the graphlookup command being implemented.
  • Linked Issues check (✅ Passed): the PR implements the core objectives from issue #5088: it adds a PPL graphlookup command semantically equivalent to MongoDB's $graphLookup, supporting multi-hop queries with BFS, configurable parameters, and high-performance Enumerable-based traversal.
  • Out of Scope Changes check (✅ Passed): all changes are within scope and directly support the graphlookup implementation: AST nodes, Calcite plan nodes, parser/lexer updates, execution engine implementation, test data/mappings, documentation, and integration/unit tests.




coderabbitai bot left a comment
Actionable comments posted: 11

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java (1)

14-23: ⚠️ Potential issue | 🔴 Critical

StructImpl attributes should be recursively converted to handle nested structures.

StructImpl may contain nested maps, lists, or geo points. The current implementation at line 250 returns raw attributes without passing them through processValue, unlike the recursive handling for Maps (line 245) and Lists (line 256). This can leak unconverted types.

Replace the current logic with iteration and recursive conversion:

🛠️ Proposed fix
    if (value instanceof StructImpl) {
-     return Arrays.asList(((StructImpl) value).getAttributes());
+     Object[] attributes = ((StructImpl) value).getAttributes();
+     List<Object> converted = new ArrayList<>(attributes.length);
+     for (Object attribute : attributes) {
+       converted.add(processValue(attribute));
+     }
+     return converted;
    }

Note: ArrayList is already imported; only the logic needs updating.

🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java`:
- Around line 550-553: Add missing JavaDoc to the public method
Analyzer.visitGraphLookup(GraphLookup node, AnalysisContext context) documenting
`@param` node, `@param` context, `@return` (LogicalPlan), and `@throws` for the
Calcite-only exception thrown by getOnlyForCalciteException("graphlookup"); then
add a unit test in the AnalyzerTest (or equivalent test class) that calls
visitGraphLookup with a mocked/constructed GraphLookup and AnalysisContext and
asserts that the getOnlyForCalciteException is thrown (verify exception type and
message contains "graphlookup"); ensure the test and JavaDoc are committed
together.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2534-2536: The hard-coded builder.limit(0, 100) in
CalciteRelNodeVisitor (using context.relBuilder / builder) unconditionally
truncates graphLookup sources; remove this unconditional limit or replace it
with a configurable cap: delete the builder.limit(0, 100) call (or wrap it
behind a configuration check), add a new planner/config option (e.g.,
graphLookupSourceRowLimit) accessed from the visitor/context, and only apply
builder.limit(0, limit) when that config is set to a positive value so the
default preserves full source rows.

In
`@core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java`:
- Around line 129-146: The public method copy() in class LogicalGraphLookup
lacks the required JavaDoc and there are no accompanying unit tests for this new
logical node; add a JavaDoc block above copy() that documents the method purpose
and includes `@param` traitSet, `@param` inputs, `@return` RelNode and any `@throws` (if
applicable), then add unit tests for LogicalGraphLookup exercising copy(),
equality, and basic plan behavior (e.g., when given two inputs and various flags
like bidirectional/supportArray/batchMode) in the same commit to satisfy core
guidelines.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java`:
- Around line 115-152: Add boundary and invalid-parameter tests for graphLookup
by creating new test methods alongside testEmployeeHierarchyWithMaxDepth that
call executeQuery with invalid/edge values and assert proper validation errors:
(1) maxDepth=0 to verify minimum depth rejection, (2) negative maxDepth (e.g.,
-1) to verify numeric validation, (3) invalid direction value to verify
enum/domain validation, and (4) queries missing required fields (e.g., omit
startField or fromField) to verify required-field validation; use the same
helpers (executeQuery, verifySchema, verifyDataRows) where success is expected
and assert the returned JSONObject contains the appropriate error
messages/status when validation should fail.
- Around line 54-82: Add explicit NULL-input tests by creating two new test
methods in CalcitePPLGraphLookupIT: testEmployeeHierarchyNullStartValue and
testEmployeeHierarchyNullFieldParameters. For
testEmployeeHierarchyNullStartValue call executeQuery with the same graphLookup
but ensure it exercises documents whose startField value is NULL (use reportsTo
which already has a null row) and verify the reportingHierarchy for that row is
an empty array via verifyDataRows (use the existing rows/verify helpers). For
testEmployeeHierarchyNullFieldParameters call executeQuery with
invalid/empty/missing startField/fromField/toField names (e.g., empty string or
a non-existent field) and assert the current behavior: if the implementation
throws an exception assert that exception is raised, otherwise assert the query
returns rows with empty reportingHierarchy; reference the existing
testEmployeeHierarchyBasicTraversal, executeQuery, verifyDataRows, and schema
helpers to implement these checks.
- Around line 239-262: The graphLookup invocation in
testAirportConnectionsWithDepthField uses array-valued fields
(startField=connects and fromField=connects) but missing the supportArray=true
flag; update the query built in testAirportConnectionsWithDepthField (the
graphLookup operator call) to include supportArray=true so the graphLookup
optimization/visited-filter behavior is correct and consistent with other tests
that handle array-valued fields.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 371-429: The loop increments currentLevelDepth and unconditionally
compares it to graphLookup.maxDepth, but graphLookup.maxDepth uses -1 to mean
"unlimited", so change the break condition in CalciteEnumerableGraphLookup (the
BFS loop using currentLevelDepth and graphLookup.maxDepth) to only perform the
comparison when graphLookup.maxDepth is non-negative (e.g., if
(graphLookup.maxDepth >= 0 && ++currentLevelDepth > graphLookup.maxDepth)
break), ensuring negative maxDepth skips the depth check and preserves existing
increment semantics.
- Around line 393-425: The loop over forwardResults in
CalciteEnumerableGraphLookup currently only adds rows when nextValues is
non-empty, dropping matched rows that don't expand traversal; change it to
always add the current row to results (respecting graphLookup.depthField by
appending currentLevelDepth when needed) and then use nextValues solely to
enqueue new nodes and mark visitedNodes — i.e., move the results.add(...) /
rowWithDepth logic out of the if (!nextValues.isEmpty()) block so rows are
unconditionally preserved while still only offering non-null, unvisited
nextValues to queue.offer and visitedNodes.add.

In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`:
- Around line 1498-1555: The visitGraphLookupCommand method currently only
enforces fromField/toField but not startField, so add a validation check before
building the GraphLookup plan: if startField is null, throw a
SemanticCheckException with a clear message (e.g., "startField must be specified
for graphLookup"); update the same validation block where fromField/toField are
checked in visitGraphLookupCommand to include startField to ensure GraphLookup
(or any later use of startField) cannot receive a null value.

In
`@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java`:
- Around line 132-148: Add null, boundary, and error-condition test cases to
CalcitePPLGraphLookupTest alongside the existing testGraphLookupBidirectional:
create new `@Test` methods (e.g., testGraphLookupNullStartField,
testGraphLookupMissingRequiredField, testGraphLookupInvalidDirection,
testGraphLookupMaxDepthZero) that call getRelNode with crafted PPL strings
(null/empty startField, omit required fields, invalid direction value, explicit
maxDepth=0) and assert expected failures or specific logical plans using
verifyLogical or assertThrows as appropriate; reuse the existing helper methods
getRelNode and verifyLogical to validate correct error handling and boundary
behavior.
- Around line 150-170: The current CalcitePPLGraphLookupTest.config(...) only
sets up the schema and logical plan inputs but lacks assertions for SQL
generation and optimization; add test cases that exercise Calcite's SQL
generation and optimizer for the graphLookup scenario by invoking the
planner/validator to produce the generated SQL and the optimized plan from the
same setup (use the test class CalcitePPLGraphLookupTest and its config method
to build the SchemaPlus with the "employee" EmployeeTable), then assert the
generated SQL contains the expected graphLookup/recursive patterns and assert
the optimized plan (planner output or Programs.heuristicJoinOrder result)
contains expected optimization transformations (e.g., join order or pushed
predicates); ensure these assertions are added alongside existing logical-plan
checks so the test covers SQL generation and optimization paths.
🟡 Minor comments (11)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java-1816-1822 (1)

1816-1822: ⚠️ Potential issue | 🟡 Minor

Add a null guard now that LiteralExpression is public.

With a public constructor, passing null will fail later in less obvious places; fail fast here. Line 1820.

🔧 Suggested fix
 public LiteralExpression(RexLiteral literal) {
-  this.literal = literal;
+  this.literal = requireNonNull(literal, "literal");
 }
integ-test/src/test/resources/indexDefinitions/graph_travelers_index_mapping.json-1-12 (1)

1-12: ⚠️ Potential issue | 🟡 Minor

Address schema mismatch and incomplete edge case coverage in test data.

The test data file (graph_travelers.json) contains an extra index field not defined in the mapping. Additionally, while the data correctly includes null values for nearestAirport (3 of 6 documents), it lacks multi-valued entries to test array handling. Either update the mapping to include the index field or remove it from the test data, and add at least one document with an array value for nearestAirport to exercise boundary conditions.

integ-test/src/test/resources/graph_travelers.json-1-6 (1)

1-6: ⚠️ Potential issue | 🟡 Minor

Add an edge-case traveler for missing/unknown nearestAirport.

This dataset only includes valid single values, so null/missing start values aren’t exercised. Consider adding a record that lacks nearestAirport (or sets it to null) to cover boundary behavior.

➕ Suggested NDJSON addition
 {"index":{"_id":"3"}}
 {"name":"Jeff","nearestAirport":"BOS"}
+{"index":{"_id":"4"}}
+{"name":"Ava","nearestAirport":null}

As per coding guidelines, Ensure test data covers edge cases and boundary conditions.

core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java-480-482 (1)

480-482: ⚠️ Potential issue | 🟡 Minor

Add JavaDoc for the new public visitor method.

The new public visitGraphLookup lacks the required JavaDoc with @param/@return.

✍️ Suggested JavaDoc
+  /**
+   * Visit GraphLookup node.
+   *
+   * `@param` node GraphLookup node
+   * `@param` context Context
+   * `@return` Return Type
+   */
   public T visitGraphLookup(GraphLookup node, C context) {
     return visitChildren(node, context);
   }

As per coding guidelines, Public methods MUST have JavaDoc with @param, @return, and @throws.

integ-test/src/test/resources/graph_airports.json-1-10 (1)

1-10: ⚠️ Potential issue | 🟡 Minor

Add a leaf airport with an empty connects list.

All airports currently have at least one connection; adding a no‑outbound node helps test traversal boundaries.

➕ Suggested NDJSON addition
 {"index":{"_id":"5"}}
 {"airport":"LHR","connects":["PWM"]}
+{"index":{"_id":"6"}}
+{"airport":"SFO","connects":[]}

As per coding guidelines, Ensure test data covers edge cases and boundary conditions.

integ-test/src/test/resources/graph_employees.json-1-12 (1)

1-12: ⚠️ Potential issue | 🟡 Minor

Add a cyclic/self-loop record to cover visited-node edge cases.

The dataset is a simple chain; adding a self-loop (or small cycle) will exercise visited-node/loop-edge handling in graph traversal tests.

💡 Suggested test data addition
@@
 {"index":{"_id":"6"}}
 {"id":6,"name":"Dan","reportsTo":"Andrew"}
+{"index":{"_id":"7"}}
+{"id":7,"name":"Loop","reportsTo":"Loop"}
As per coding guidelines: Verify test data covers edge cases and boundary conditions; Verify test data is realistic and representative.
docs/user/ppl/cmd/graphlookup.md-33-36 (1)

33-36: ⚠️ Potential issue | 🟡 Minor

Fix typos/grammar in the parameter descriptions.

Spelling and hyphenation issues reduce clarity in the parameter table.

✍️ Suggested edits
@@
-| `startField=<startField>` | Required | The field in the source documents whose value is used to start the recursive search. The value of this field is matched against `toField` in the lookup index. We support both single value and array values as starting points.   |
+| `startField=<startField>` | Required | The field in the source documents whose value is used to start the recursive search. The value of this field is matched against `toField` in the lookup index. We support both single-value and array values as starting points.   |
@@
-| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth of hops. Default is `0`. A value of `0` means only the direct connections to the statr values are returned. A value of `1` means 1 hop connections (initial match plus one recursive step), and so on. |
+| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth of hops. Default is `0`. A value of `0` means only the direct connections to the start values are returned. A value of `1` means 1 hop connections (initial match plus one recursive step), and so on. |
As per coding guidelines: Check for consistent formatting and style.
core/src/main/java/org/opensearch/sql/calcite/plan/rel/GraphLookup.java-109-190 (1)

109-190: ⚠️ Potential issue | 🟡 Minor

Add JavaDoc tags for GraphLookup public APIs.

getSource/getLookup/copy/estimateRowCount/explainTerms are public but lack @param/@return/@throws tags required by core guidelines.
As per coding guidelines: Public methods MUST have JavaDoc with @param, @return, and @throws.

ppl/src/main/antlr/OpenSearchPPLParser.g4-94-95 (1)

94-95: ⚠️ Potential issue | 🟡 Minor

Add missing graphLookup option keywords to searchableKeyWord.

START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT are defined as lexer tokens and used in graphLookupOption (lines 636–651) but are not added to searchableKeyWord, unlike FROM_FIELD and TO_FIELD (lines 1716–1723). Since the lexer tokenizes them as keywords, field names with these values will fail or require quoting. Add all four to searchableKeyWord for consistency.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java-392-422 (1)

392-422: ⚠️ Potential issue | 🟡 Minor

Update Javadoc to match asserted behavior.

The comment says allConnections is empty due to array type, but the assertions expect non-empty results. Please align the comment with the actual expectation.

📝 Suggested comment update
-  /**
-   * Test 12: Bidirectional airport connections for ORD. Note: Currently returns empty
-   * allConnections array because the connects field is an array type.
-   */
+  /**
+   * Test 12: Bidirectional airport connections for ORD.
+   */
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java-666-700 (1)

666-700: ⚠️ Potential issue | 🟡 Minor

Fix maxDepth comment mismatch.

The comment says maxDepth=1, but the query uses maxDepth=3. Please update the comment to reflect the actual parameter.

📝 Suggested comment update
-   * travelers' nearest airports: JFK (Dev, Eliot), BOS (Jeff) Unified BFS from {JFK, BOS} with
-   * maxDepth=1 finds connected airports.
+   * travelers' nearest airports: JFK (Dev, Eliot), BOS (Jeff) Unified BFS from {JFK, BOS} with
+   * maxDepth=3 finds connected airports.
🧹 Nitpick comments (4)
core/src/main/java/org/opensearch/sql/ast/tree/GraphLookup.java (1)

29-83: Prefer AST immutability by removing class-level @Setter.
attach already mutates the child; exposing setChild weakens immutability without clear need.

♻️ Suggested change
-import lombok.Setter;
@@
-@Setter
 `@ToString`
As per coding guidelines: `core/src/main/java/org/opensearch/sql/ast/**/*.java` requires AST nodes to be immutable where possible.
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)

646-712: Add anonymizer coverage for Mongo-style graphLookup and usePIT.

The test only exercises fromField/toField syntax. The alternate startWith/connectFromField/connectToField form and the usePIT flag aren’t covered yet.

✅ Suggested test additions
     // graphLookup with compound filter
     assertEquals(
         "source=table | graphlookup table fromField=identifier toField=identifier"
             + " direction=uni filter=(identifier = *** and identifier > ***) as identifier",
         anonymize(
             "source=t | graphLookup employees fromField=manager toField=name"
                 + " filter=(status = 'active' AND id > 2) as reportingHierarchy"));
+
+    // graphLookup with Mongo-style syntax
+    assertEquals(
+        "source=table | graphlookup table startWith=*** connectFromField=identifier"
+            + " connectToField=identifier direction=uni as identifier",
+        anonymize(
+            "source=t | graphLookup employees startWith='CEO' connectFromField=manager"
+                + " connectToField=name as reportingHierarchy"));
+
+    // graphLookup with usePIT
+    assertEquals(
+        "source=table | graphlookup table fromField=identifier toField=identifier"
+            + " direction=uni usePIT=true as identifier",
+        anonymize(
+            "source=t | graphLookup employees fromField=manager toField=name"
+                + " usePIT=true as reportingHierarchy"));
ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java (1)

1648-1718: Add parser coverage for Mongo-style syntax and optional flags.

The test validates required fields and some options, but doesn’t cover the startWith/connectFromField/connectToField variant (or flags like supportArray, batchMode, usePIT). Adding at least one positive case for that syntax would better exercise the parser path.

As per coding guidelines, Include edge cases and boundary conditions in parser and grammar tests.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2530-2589: Extract visitGraphLookup into helper methods.

This method is now >50 lines; splitting parameter extraction, source/lookup materialization, and RelNode construction would keep the visitor readable.
As per coding guidelines: core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: Flag methods >50 lines - this file is known to be hard to read; Suggest extracting complex logic into helper methods.

Comment on lines +2534 to +2536
RelBuilder builder = context.relBuilder;
// TODO: Limit the number of source rows to 100 for now, make it configurable.
builder.limit(0, 100);
Contributor

@coderabbitai coderabbitai bot Feb 12, 2026


⚠️ Potential issue | 🟠 Major

Remove the hard-coded 100-row cap on graphLookup sources.

The unconditional limit truncates the start set and will silently drop traversal results once there are more than 100 source rows. Please make this configurable or remove it to preserve correctness.

🛠️ Suggested change
-    // TODO: Limit the number of source rows to 100 for now, make it configurable.
-    builder.limit(0, 100);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RelBuilder builder = context.relBuilder;
// TODO: Limit the number of source rows to 100 for now, make it configurable.
builder.limit(0, 100);
RelBuilder builder = context.relBuilder;
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 2534 - 2536, The hard-coded builder.limit(0, 100) in
CalciteRelNodeVisitor (using context.relBuilder / builder) unconditionally
truncates graphLookup sources; remove this unconditional limit or replace it
with a configurable cap: delete the builder.limit(0, 100) call (or wrap it
behind a configuration check), add a new planner/config option (e.g.,
graphLookupSourceRowLimit) accessed from the visitor/context, and only apply
builder.limit(0, limit) when that config is set to a positive value so the
default preserves full source rows.


@LantaoJin
Member Author

LantaoJin commented Feb 13, 2026

@penghuo can you review and merge this?

| [addcoltotals command](cmd/addcoltotals.md) | 3.5 | stable (since 3.5) | Adds column values and appends a totals row. |
| [transpose command](cmd/transpose.md) | 3.5 | stable (since 3.5) | Transpose rows to columns. |
| [mvcombine command](cmd/mvcombine.md) | 3.5 | stable (since 3.4) | Combines values of a specified field across rows identical on all other fields. |
| [graphlookup command](cmd/graphlookup.md) | 3.5 | experimental (since 3.5) | Performs recursive graph traversal on a collection using a BFS algorithm.|
Collaborator


since 3.6

The `graphLookup` command has the following syntax:

```syntax
graphLookup <lookupIndex> startField=<startField> fromField=<fromField> toField=<toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [direction=(uni | bi)] [supportArray=(true | false)] [batchMode=(true | false)] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>
Collaborator


Did we align on syntax? #5136 (comment)

cc: @qianheng-aws @model-collapse

rows(
"JFK",
List.of("BOS", "ORD"),
List.of(List.of("JFK", List.of("BOS", "ORD")), List.of("BOS", List.of("JFK", "PWM")))));
Collaborator


Corrected expectation would be:

  rows("JFK", List.of("BOS", "ORD"),
      List.of(
          List.of("JFK", List.of("BOS", "ORD")),
          List.of("BOS", List.of("JFK", "PWM")),
          List.of("ORD", List.of("JFK"))))

The point is: ORD is discovered by the BFS at Level 1, but then line 408's if (!nextValues.isEmpty()) drops it because ORD's only connection (JFK) was already visited. Is that dropping intentional?

Collaborator

@qianheng-aws qianheng-aws Feb 24, 2026


Yes, it's intentional not to include loop edges in the result in the current implementation, and the edge ORD -> JFK is a loop edge, which won't contribute to discovering new vertices. Actually this edge should be dropped in the search query, before line 408.

There are some reasons for doing that:

  1. To exclude loop edges, we can add another filter (target not in visited nodes) directly in our BFS search query (lines 462 & 474). It is more efficient, and that somehow explains why we are faster than MongoDB.
  2. If we really want to include loop edges in the results, we also have to distinguish them from duplicated edges based on a unique id like doc_id.
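The pruning described above can be sketched independently of OpenSearch. The following is a hypothetical, simplified model (an in-memory adjacency map stands in for the BFS search query against the lookup index; all names are illustrative, not the actual implementation) showing why an edge back into the visited set, such as ORD -> JFK, produces no next-level values:

```java
import java.util.*;

public class BfsVisitedFilterSketch {
  // Hypothetical model: adjacency map instead of an OpenSearch query.
  // Returns vertices discovered from `start`, pruning edges back into visited nodes.
  public static Set<String> traverse(Map<String, List<String>> edges, String start, int maxDepth) {
    Set<String> visited = new HashSet<>(Set.of(start));
    Deque<String> queue = new ArrayDeque<>(List.of(start));
    Set<String> discovered = new LinkedHashSet<>();
    for (int depth = 0; depth <= maxDepth && !queue.isEmpty(); depth++) {
      int levelSize = queue.size();
      for (int i = 0; i < levelSize; i++) {
        String node = queue.poll();
        // The search query itself filters out targets already in `visited`,
        // so a loop edge such as ORD -> JFK yields no next-level values.
        for (String next : edges.getOrDefault(node, List.of())) {
          if (!visited.contains(next)) {
            visited.add(next);
            discovered.add(next);
            queue.offer(next);
          }
        }
      }
    }
    return discovered;
  }

  public static void main(String[] args) {
    Map<String, List<String>> edges =
        Map.of(
            "JFK", List.of("BOS", "ORD"),
            "BOS", List.of("JFK", "PWM"),
            "ORD", List.of("JFK"));
    // ORD is found at level 1, but its only edge points back to the visited JFK,
    // so it contributes nothing to the next frontier.
    System.out.println(traverse(edges, "JFK", 2)); // [BOS, ORD, PWM]
  }
}
```

Note that ORD still appears in `discovered` here; the separate question raised above is whether the matched row should also be emitted when its expansion set comes back empty.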

// Get the source enumerator
if (graphLookup.getSource() instanceof Scannable scannable) {
Enumerable<?> sourceEnum = scannable.scan();
this.sourceEnumerator = (Enumerator<@Nullable Object>) sourceEnum.enumerator();
Collaborator


If an exception is thrown after line 209 but before the constructor finishes, the object is never fully constructed, so close() is never called by the caller. The sourceEnumerator leaks.
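A common fix for this pattern is to wrap the remainder of the constructor and close the already-acquired resource before letting the exception propagate. A minimal sketch, with hypothetical names (a toy `Enumerator` stands in for the Calcite one; this is not the actual constructor):

```java
// Sketch: exception-safe resource acquisition in a constructor.
public class ConstructorLeakSketch {
  interface Enumerator extends AutoCloseable {
    @Override
    void close(); // narrowed so close() throws no checked exception
  }

  static class SafeConsumer {
    private final Enumerator sourceEnumerator;

    SafeConsumer(Enumerator source) {
      this.sourceEnumerator = source; // resource acquired here (analogous to line 209)
      try {
        initRest(); // everything after acquisition that may throw
      } catch (RuntimeException e) {
        // The object never becomes visible to the caller, so release here.
        sourceEnumerator.close();
        throw e;
      }
    }

    private void initRest() {
      throw new IllegalStateException("simulated failure after acquisition");
    }
  }

  // Returns true iff the enumerator was closed despite construction failing.
  public static boolean demo() {
    final boolean[] closed = {false};
    try {
      new SafeConsumer(() -> closed[0] = true);
    } catch (IllegalStateException expected) {
      // construction failed, but the resource was released
    }
    return closed[0];
  }

  public static void main(String[] args) {
    System.out.println("closed on failed construction: " + demo()); // true
  }
}
```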

Signed-off-by: Heng Qian <qianheng@amazon.com>
…aphlookup

# Conflicts:
#	docs/user/ppl/index.md
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
…hlookup

# Conflicts:
#	core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
#	core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
#	docs/user/ppl/index.md
#	ppl/src/main/antlr/OpenSearchPPLParser.g4
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
scripts/build.sh (1)

71-81: ⚠️ Potential issue | 🟠 Major

Hardcoded FIPS-140-3 with no override mechanism will break Maven publishing in strict FIPS environments.

Two interconnected concerns:

  1. No CLI override: Every other tunable (VERSION, QUALIFIER, SNAPSHOT, OUTPUT) is exposed as a getopts flag. -Pcrypto.standard=FIPS-140-3 is hardcoded on all three Gradle invocations with no way to disable or customize it, preventing non-FIPS builds or selective use.

  2. Gradle Maven publish incompatibility: Gradle publishes MD5 and SHA-1 checksum sidecars by default (no supported knob to disable them). When the JVM is configured for FIPS-140-3 with a strict FIPS provider (e.g., Oracle's JVM FIPS or Bouncy Castle FIPS), MD5 is disabled in approved mode. This causes publishToMavenLocal and publishPluginZipPublicationToZipStagingRepository on lines 80–81 to fail when attempting checksum generation—a documented issue in FIPS-enabled CI environments.

Make crypto standard configurable with an optional flag and default to no crypto enforcement:

 echo -e "-o OUTPUT\t[Optional] Output path, default is 'artifacts'."
+echo -e "-c CRYPTO_STANDARD\t[Optional] Gradle crypto standard (e.g. FIPS-140-3). Omitted by default."
 echo -e "-h help"
-while getopts ":h:v:q:s:o:p:a:" arg; do
+while getopts ":h:v:q:s:o:p:a:c:" arg; do
     case $arg in
+        c)
+            CRYPTO_STANDARD=$OPTARG
+            ;;
+[[ ! -z "$CRYPTO_STANDARD" ]] && CRYPTO_FLAG="-Pcrypto.standard=$CRYPTO_STANDARD"

-./gradlew assemble ... -Pcrypto.standard=FIPS-140-3
+./gradlew assemble ... $CRYPTO_FLAG

-./gradlew publishToMavenLocal ... -Pcrypto.standard=FIPS-140-3
+./gradlew publishToMavenLocal ... $CRYPTO_FLAG

-./gradlew publishPluginZipPublicationToZipStagingRepository ... -Pcrypto.standard=FIPS-140-3
+./gradlew publishPluginZipPublicationToZipStagingRepository ... $CRYPTO_FLAG
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/build.sh` around lines 71 - 81, The script hardcodes
-Pcrypto.standard=FIPS-140-3 into all Gradle invocations; add a CLI flag (e.g.,
-c/--crypto or CRYPTO_STANDARD) with a default of empty and update getopts to
populate CRYPTO_STANDARD, then only append "-Pcrypto.standard=$CRYPTO_STANDARD"
to the ./gradlew commands (assemble, publishToMavenLocal,
publishPluginZipPublicationToZipStagingRepository) if CRYPTO_STANDARD is
non-empty so callers can disable or change the crypto standard and avoid
FIPS-related checksum failures.
core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java (1)

286-293: ⚠️ Potential issue | 🟡 Minor

Fix IndexOutOfBoundsException in resolveFieldAccess when accessing remaining path parts

Line 291 calculates the count of remaining path parts incorrectly. The third parameter to joinParts should account for the start offset:

Current vs. corrected code
- String itemName = joinParts(parts, length + start, parts.size() - length);
+ String itemName = joinParts(parts, length + start, parts.size() - start - length);

When start=1 and parts = ["alias", "field", "subfield"] with only length=1 matched, the current code calls joinParts(parts, 2, 2), which attempts to access parts[3]—out of bounds. The corrected count parts.size() - start - length ensures the range stays within valid indices.
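The index arithmetic can be checked in isolation. Below, `joinParts` is a stand-in with the signature the comment implies (list, start offset, count), not the actual method:

```java
import java.util.List;

public class JoinPartsSketch {
  // Stand-in for the assumed joinParts(parts, from, count): join `count` parts starting at `from`.
  static String joinParts(List<String> parts, int from, int count) {
    return String.join(".", parts.subList(from, from + count));
  }

  public static void main(String[] args) {
    List<String> parts = List.of("alias", "field", "subfield");
    int start = 1, length = 1;
    // Buggy count:  parts.size() - length         = 2 -> subList(2, 4) -> IndexOutOfBoundsException
    // Fixed count:  parts.size() - start - length = 1 -> subList(2, 3) -> "subfield"
    System.out.println(joinParts(parts, length + start, parts.size() - start - length));
  }
}
```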

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java`
around lines 286 - 293, In resolveFieldAccess, the call to joinParts computes
the count of remaining parts incorrectly causing IndexOutOfBoundsException;
change the third argument so it uses the remaining parts count accounting for
start (i.e. replace parts.size() - length with parts.size() - start - length) so
itemName = joinParts(parts, length + start, parts.size() - start - length)
before calling createItemAccess(field, itemName, context).
docs/category.json (1)

10-69: ⚠️ Potential issue | 🟡 Minor

Add graphlookup.md to the ppl_cli_calcite array in docs/category.json.

The file docs/user/ppl/cmd/graphlookup.md exists but is not registered in docs/category.json. It should be added to the ppl_cli_calcite array to be included in the doc integration test suite. Insert it in alphabetical order between grok.md and head.md.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/category.json` around lines 10 - 69, Add "user/ppl/cmd/graphlookup.md"
to the ppl_cli_calcite array in docs/category.json (the array named
ppl_cli_calcite) so the new entry appears alphabetically between
"user/ppl/cmd/grok.md" and "user/ppl/cmd/head.md"; ensure you insert it as a
proper JSON string with correct commas and preserve valid JSON formatting.
♻️ Duplicate comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java (2)

433-434: ⚠️ Potential issue | 🟠 Major

Handle maxDepth = -1 (unlimited) correctly.
The current break condition stops after the first level when maxDepth is negative. Skip the depth check for negative values.

🛠️ Suggested change
-        if (++currentLevelDepth > graphLookup.maxDepth) break;
+        currentLevelDepth++;
+        if (graphLookup.maxDepth >= 0 && currentLevelDepth > graphLookup.maxDepth) {
+          break;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
around lines 433 - 434, The loop increments currentLevelDepth and compares it to
graphLookup.maxDepth, but when maxDepth is negative (e.g., -1 for unlimited) the
current check (++currentLevelDepth > graphLookup.maxDepth) will always break;
update the condition in CalciteEnumerableGraphLookup so the depth check is
skipped when graphLookup.maxDepth < 0 — e.g., only increment/compare when
graphLookup.maxDepth >= 0 (or guard the comparison with graphLookup.maxDepth >=
0) so unlimited depth is handled correctly by allowing the loop to continue.

412-430: ⚠️ Potential issue | 🟠 Major

Include matched rows even when traversal doesn’t expand.
Rows are only added when nextValues is non-empty, which drops reachable docs when multiple starts or converging edges yield already-visited nodes. Add the row unconditionally; use nextValues only for queueing.

🛠️ Suggested change
-          // Add row to results if the nextValues is not empty
-          if (!nextValues.isEmpty()) {
-            if (graphLookup.depthField != null) {
-              Object[] rowWithDepth = new Object[rowArray.length + 1];
-              System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
-              rowWithDepth[rowArray.length] = currentLevelDepth;
-              results.add(rowWithDepth);
-            } else {
-              results.add(rowArray);
-            }
-
-            // Add unvisited non-null values to queue for next level traversal
-            for (Object val : nextValues) {
-              if (val != null) {
-                visitedNodes.add(val);
-                queue.offer(val);
-              }
-            }
-          }
+          // Always include the matched row; use nextValues only for expansion.
+          if (graphLookup.depthField != null) {
+            Object[] rowWithDepth = new Object[rowArray.length + 1];
+            System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
+            rowWithDepth[rowArray.length] = currentLevelDepth;
+            results.add(rowWithDepth);
+          } else {
+            results.add(rowArray);
+          }
+
+          // Add unvisited non-null values to queue for next level traversal
+          for (Object val : nextValues) {
+            if (val != null) {
+              visitedNodes.add(val);
+              queue.offer(val);
+            }
+          }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
around lines 412 - 430, CalciteEnumerableGraphLookup currently only adds rows to
results when nextValues is non-empty, which drops matched rows when traversal
doesn't expand; always add the constructed row (use graphLookup.depthField to
decide between rowArray and rowWithDepth using rowArray/currentLevelDepth) to
results unconditionally, and keep the subsequent loop that iterates over
nextValues only for enqueuing unvisited non-null values into visitedNodes and
queue (do not gate the results.add(...) behind nextValues.isEmpty()).
core/src/main/java/org/opensearch/sql/analysis/Analyzer.java (1)

552-565: ⚠️ Potential issue | 🟠 Major

Add required JavaDoc and unit tests for the new Calcite‑only visitor methods.

These new public overrides lack JavaDoc and accompanying unit tests that assert the Calcite‑only exception path (nomv/mvexpand/graphlookup).

📚 Example JavaDoc pattern (apply similarly to each method)
+  /**
+   * Reject nomv in the legacy analyzer (Calcite-only).
+   *
+   * `@param` node nomv AST node
+   * `@param` context analysis context
+   * `@return` never returns (throws)
+   * `@throws` RuntimeException when nomv is used in the legacy analyzer
+   */
  `@Override`
  public LogicalPlan visitNoMv(NoMv node, AnalysisContext context) {
    throw getOnlyForCalciteException("nomv");
  }
As per coding guidelines: Public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`; New functions MUST have unit tests in the same commit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java` around lines
552 - 565, The three new public visitor overrides (visitNoMv, visitMvExpand,
visitGraphLookup) are missing JavaDoc and unit tests; add JavaDoc to each method
following project pattern including `@param` AnalysisContext context, `@return`
LogicalPlan, and `@throws` for the Calcite-only exception, and add unit tests (in
the same commit) that invoke Analyzer.visitNoMv/visitMvExpand/visitGraphLookup
with a mock or real NoMv/MvExpand/GraphLookup node and assert that
getOnlyForCalciteException is thrown (and that the exception message matches
"nomv", "mvexpand", "graphlookup" respectively) to cover the Calcite-only path.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2545-2551: Hard-coded 100-row limit on graphLookup source is still present.

This was flagged in a prior review. The unconditional builder.limit(0, 100) silently truncates the start set and will drop traversal results for larger inputs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 2545 - 2551, The visitGraphLookup method in CalciteRelNodeVisitor
currently applies an unconditional hard-coded start-set limit via
builder.limit(0, 100) which truncates the graph lookup input; remove this
unconditional limit and instead respect a configurable setting or absence
thereof (e.g., read a graph lookup start-limit from the CalcitePlanContext or a
global config exposed to CalciteRelNodeVisitor such as
getGraphLookupStartLimit()), and only apply builder.limit when a positive
configured limit exists; update visitGraphLookup to use that config (or no limit
if not configured) and remove the silent 100-row truncation.
🧹 Nitpick comments (13)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5060.yml (1)

15-27: Add pre-test cleanup to avoid index-already-exists flakiness.

If a prior run fails before teardown, indices.create will 400 and this test will abort. Consider deleting the index (ignoring 404) before creating it.

♻️ Suggested setup tweak
 setup:
   - do:
       query.settings:
         body:
           transient:
             plugins.calcite.enabled: true
+  - do:
+      indices.delete:
+        index: test_join_ad_error_5133
+        ignore: 404
   - do:
       indices.create:
         index: test_join_ad_error_5133
         body:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5060.yml`
around lines 15 - 27, The test currently always calls indices.create for index
"test_join_ad_error_5133", which can fail if a previous run left the index
behind; add a pre-test cleanup step that calls indices.delete for
"test_join_ad_error_5133" and ignores a 404 before the indices.create step so
the create will not 400; place this delete call immediately before the existing
indices.create block (use the same index name) and ensure the delete is
configured to ignore-not-found/404 errors.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

1100-1111: Verify that the permissive IGNORE + CHARACTER overload doesn't cause unintended type-check bypasses.

The new overload at lines 1108–1111 uses SqlTypeFamily.IGNORE for the first operand, meaning it will match any type (e.g., INTEGER, BOOLEAN) combined with a CHARACTER key. Since the existing overload (lines 1100–1107) is more specific and is checked first, this acts as a catch-all fallback — which is fine for struct/object field access. However, it also silently accepts semantically invalid combinations like INTEGER_FIELD["key"], deferring the failure to runtime.

If the intent is to support struct/object field access by name, consider restricting the first operand to SqlTypeFamily.ANY or documenting the rationale inline so future maintainers understand why IGNORE was chosen over a narrower family.

#!/bin/bash
# Find all usages of INTERNAL_ITEM in the codebase to understand which types are passed as the first operand
rg -n --type=java -C3 'INTERNAL_ITEM'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java`
around lines 1100 - 1111, The permissive overload registered via
registerOperator(INTERNAL_ITEM, SqlStdOperatorTable.ITEM,
PPLTypeChecker.family(SqlTypeFamily.IGNORE, SqlTypeFamily.CHARACTER)) can match
any left-hand type and silently defer invalid cases to runtime; change the first
operand family from SqlTypeFamily.IGNORE to a narrower family such as
SqlTypeFamily.ANY (or a more specific family representing structs/objects if
available) so that type-checking fails earlier, and add an inline comment near
the registerOperator call documenting why this family was chosen to prevent
future regressions.
core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java (1)

110-112: Prefer JUnit 5's assertInstanceOf over assertNotNull + assertTrue(e instanceof ...)

assertNotNull(e) is redundant: a null reference would already make the instanceof check return false, failing the assertTrue. JUnit 5's assertInstanceOf consolidates both checks and produces a descriptive failure message (e.g., "expected instance of X but was null").

♻️ Proposed refactor
-assertNotNull(e);
-assertTrue(e instanceof UnsupportedOperationException);
-assertTrue(e.getMessage().contains("Calcite error"));
+UnsupportedOperationException uoe = assertInstanceOf(UnsupportedOperationException.class, e);
+assertTrue(uoe.getMessage().contains("Calcite error"));
-assertNotNull(e);
-assertTrue(e instanceof org.opensearch.sql.exception.CalciteUnsupportedException);
-assertTrue(e.getCause() instanceof AssertionError);
-assertTrue(e.getMessage().contains("Calcite assertion failed"));
+org.opensearch.sql.exception.CalciteUnsupportedException cue =
+    assertInstanceOf(org.opensearch.sql.exception.CalciteUnsupportedException.class, e);
+assertInstanceOf(AssertionError.class, cue.getCause());
+assertTrue(cue.getMessage().contains("Calcite assertion failed"));

Also applies to: 139-141, 168-172, 197-201

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java` around
lines 110 - 112, Replace the redundant null + instanceof checks in
QueryServiceTest by using JUnit 5's assertInstanceOf: for each occurrence (e.g.,
the block using assertNotNull(e); assertTrue(e instanceof
UnsupportedOperationException); assertTrue(e.getMessage().contains("Calcite
error")) in QueryServiceTest), remove assertNotNull and the instanceof assert
and call assertInstanceOf(UnsupportedOperationException.class, e) instead
(optionally assign the returned value to a variable to assert on the message),
then keep the existing message assertion (assertTrue on
e.getMessage().contains(...)) or use the assigned instance for that check; apply
the same change to the other similar blocks mentioned (lines around 139-141,
168-172, 197-201).
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5114.yml (1)

54-66: Consider adding datarows assertion — the sort is deterministic for this dataset

All id values (1–6) are positive, so abs(id) + 1 == id + 1 for every document. The sorted output of fields id | head 5 is fully deterministic: [[1],[2],[3],[4],[5]]. Adding a datarows assertion (mirroring the control test at line 97) would make this a stronger regression test — confirming not only that head is preserved but also that the rows are correct.

💡 Suggested addition
   - match: { total: 5 }
   - match: { schema: [ { name: id, type: int } ] }
+  - match: { datarows: [ [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ] ] }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5114.yml`
around lines 54 - 66, Add a datarows assertion to the "Issue 5114: head should
be preserved for non-order-equivalent deterministic sort expression" test to
verify the exact rows returned by the query `source=issue5114 | eval a = abs(id)
+ 1 | sort a | fields id | head 5`; specifically assert datarows:
[[1],[2],[3],[4],[5]] so the test confirms deterministic ordering in addition to
total and schema checks.
core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java (2)

10-10: @Nullable import source is inconsistent with the rest of the module.

SPath.java (the closest analogous file in this package) uses org.checkerframework.checker.nullness.qual.Nullable. Using javax.annotation.Nullable here diverges from that convention. Consider aligning to the project standard.

♻️ Proposed fix
-import javax.annotation.Nullable;
+import org.checkerframework.checker.nullness.qual.Nullable;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java` at line 10, Replace
the inconsistent nullable import in NoMv.java: change the import of
javax.annotation.Nullable to use
org.checkerframework.checker.nullness.qual.Nullable (the same source used in
SPath.java) and update any `@Nullable` annotations in the NoMv class to rely on
that Checker Framework annotation to keep nullability imports consistent across
the module.

37-40: Missing @Override on attach() — inconsistent with sibling AST nodes.

MvExpand.attach() (and other AST node classes in this package) annotate the attach() override with @Override. Without it the compiler won't catch accidental signature drift.

♻️ Proposed fix
+  `@Override`
   public NoMv attach(UnresolvedPlan child) {
     this.child = child;
     return this;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java` around lines 37 -
40, The attach() method in NoMv should be annotated with `@Override` to match
sibling AST nodes like MvExpand.attach() and ensure the compiler catches
signature drift; update the NoMv.attach(UnresolvedPlan child) method to add the
`@Override` annotation (keeping the same parameter type UnresolvedPlan and return
type NoMv) so it properly overrides the superclass/interface method.
core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java (1)

481-489: Javadoc @param/@return tags are escaped as literal text.

The backtick-wrapped `@param` and `@return` won't be parsed by Javadoc tooling. They should be standard Javadoc tags.

📝 Proposed fix
   /**
    * Build an MVEXPAND plan node and attach it to the input plan.
    *
-   * <p>`@param` input input plan `@param` field field to expand `@param` limit optional
-   * per-document limit `@return` MvExpand plan attached to the input
+   * `@param` input input plan
+   * `@param` field field to expand
+   * `@param` limit optional per-document limit
+   * `@return` MvExpand plan attached to the input
    */
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java` around lines 481 -
489, The Javadoc for mvexpand incorrectly uses backtick-escaped literals like
`@param` and `@return`; update the comment on the mvexpand(UnresolvedPlan input,
Field field, Integer limit) method (and mention of
MvExpand/UnresolvedPlan/Field) to use proper Javadoc tags: add real `@param` tags
for input, field, and limit and a proper `@return` describing the returned
UnresolvedPlan (MvExpand attached to input), ensuring each tag is on its own
line directly under the summary.
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)

523-531: Inconsistent field anonymization approach compared to sibling visitors.

visitMvExpand uses MASK_COLUMN directly, while the adjacent visitNoMv, visitExpand, and visitMvCombine all use visitExpression(node.getField()). The output is equivalent for simple fields, but visitExpression correctly handles meta-fields (e.g., _id → meta_identifier). For consistency and correctness with meta-fields:

♻️ Proposed fix
   `@Override`
   public String visitMvExpand(MvExpand node, String context) {
     String child = node.getChild().get(0).accept(this, context);
-    String field = MASK_COLUMN; // Always anonymize field names
+    String field = visitExpression(node.getField());
     if (node.getLimit() != null) {
       return StringUtils.format("%s | mvexpand %s limit=%s", child, field, MASK_LITERAL);
     }
     return StringUtils.format("%s | mvexpand %s", child, field);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java`
around lines 523 - 531, visitMvExpand currently hardcodes field anonymization
using MASK_COLUMN which diverges from visitNoMv/visitExpand/visitMvCombine that
call visitExpression(node.getField()) to properly handle meta-fields; update
visitMvExpand to compute the anonymized field via
visitExpression(node.getField()) and use that variable in both the limit and
non-limit return paths (keep MASK_LITERAL usage for limit value), ensuring
meta-field names like _id are converted to their masked forms consistently.
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)

646-712: Good comprehensive test coverage for graphLookup anonymization.

The 8 scenarios cover the key parameter combinations well. One minor gap: there's no test case exercising the usePIT=true anonymization path (the anonymizer conditionally appends usePIT=true at lines 253-255 of PPLQueryDataAnonymizer), though supportArray and batchMode each have dedicated cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java`
around lines 646 - 712, Add a test in PPLQueryDataAnonymizerTest that exercises
the anonymizer path which appends usePIT=true; call anonymize with a graphLookup
clause including "usePIT=true" (similar to other cases) and assert the output
contains "usePIT=true" in the anonymized graphlookup string, referencing the
anonymize helper used across this file so the test triggers the branch in
PPLQueryDataAnonymizer (the code around lines 253-255 that conditionally appends
usePIT=true).
integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1)

689-693: Prefer the shared test constant for the MVEXPAND index name.
Keeps the index name centralized and avoids drift if the constant changes.

♻️ Suggested update
-    MVEXPAND_EDGE_CASES(
-        "mvexpand_edge_cases",
+    MVEXPAND_EDGE_CASES(
+        TestsConstants.TEST_INDEX_MVEXPAND_EDGE_CASES,
         "mvexpand_edge_cases",
         getMappingFile("mvexpand_edge_cases_mapping.json"),
         "src/test/resources/mvexpand_edge_cases.json"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`
around lines 689 - 693, Replace the hard-coded index name "mvexpand_edge_cases"
in the MVEXPAND_EDGE_CASES enum entry with the project's shared test constant
for the MVEXPAND index (use the centralized test constant from the test
constants/class used across tests) so the enum MVEXPAND_EDGE_CASES(...) uses
that constant for the index name; update the second constructor argument
(currently the string literal) to reference the shared constant so index names
remain centralized and consistent with other tests.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java (1)

262-290: Deduplicate identical all‑null array tests.

testNoMvWithAllNulls and testNoMvArrayWithAllNulls run the same query with identical assertions, so one can be removed or repurposed to cover a distinct case.

♻️ Suggested cleanup
-  @Test
-  public void testNoMvArrayWithAllNulls() throws IOException {
-    String q =
-        "source="
-            + TEST_INDEX_BANK_WITH_NULL_VALUES
-            + " | where account_number = 25 | eval arr = array(age, age, age) | nomv arr | fields"
-            + " account_number, arr";
-
-    JSONObject result = executeQuery(q);
-
-    verifySchema(result, schema("account_number", null, "bigint"), schema("arr", null, "string"));
-
-    verifyDataRows(result, rows(25, ""));
-  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java`
around lines 262 - 290, Two test methods in
CalciteNoMvCommandIT—testNoMvWithAllNulls and testNoMvArrayWithAllNulls—are
identical; remove or repurpose one to avoid duplication. Locate the two test
methods (testNoMvWithAllNulls, testNoMvArrayWithAllNulls) and either delete one
of them or change its query/assertions to cover a different scenario (e.g.,
mixed nulls or single-element null array) so each test verifies a unique
behavior while keeping verifySchema/verifyDataRows usage consistent.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2545-2604: Missing JavaDoc on visitGraphLookup.

Per coding guidelines, public methods in core/src/main/java/**/*.java MUST have JavaDoc with @param, @return, and @throws. The other new visitor methods (visitNoMv, visitMvExpand) have proper JavaDoc, but visitGraphLookup does not.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 2545 - 2604, Add JavaDoc to the public method
visitGraphLookup(GraphLookup node, CalcitePlanContext context) describing its
behavior and include `@param` tags for "node" and "context", an `@return` tag
describing the returned RelNode, and `@throws` tags for any runtime/checked
exceptions the method may propagate (e.g., exceptions from rexVisitor.analyze(),
analyze(), or builder.build()). Ensure the JavaDoc style and wording match
existing visitor methods like visitNoMv and visitMvExpand so it conforms to the
project's Javadoc conventions.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java (1)

168-177: Misleading comment: "partial results" but expected output is empty {}

The comment on line 174 states "Malformed JSON returns partial results parsed before the error", but the expected result is {} — an empty map meaning nothing was successfully extracted. "Partial results" implies some fields were parsed; that description contradicts the assertion. Consider tightening the comment, e.g. "Malformed JSON: parser abandons extraction, returns empty map".

✏️ Suggested comment fix
-    // Malformed JSON returns partial results parsed before the error
+    // Malformed JSON: parser cannot complete extraction and returns an empty map
     verifySchema(result, schema("result", "struct"));
     verifyDataRows(result, rows(new JSONObject("{}")));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`
around lines 168 - 177, The inline comment in testSpathAutoExtractMalformedJson
incorrectly says "partial results parsed before the error" while the assertion
expects an empty map; update the comment near the call to executeQuery / the
test method testSpathAutoExtractMalformedJson to accurately describe behavior
(e.g., "Malformed JSON: parser abandons extraction and returns empty map {}") so
the comment matches the verifyDataRows expectation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/main/java/org/opensearch/sql/executor/QueryService.java`:
- Around line 67-83: Add unit tests for QueryService.propagateCalciteError to
cover the missing branches: (1) verify that passing a VirtualMachineError (e.g.,
new OutOfMemoryError()) causes propagateCalciteError to rethrow the
VirtualMachineError (test should assert the error is thrown and not passed to
the ResponseListener); (2) verify that passing an ExceptionInInitializerError
whose getException() returns an Exception results in listener.onFailure being
called with that underlying Exception (mock ResponseListener and assert
onFailure received the unwrapped Exception); (3) verify that passing an
ExceptionInInitializerError whose getException() returns a non-Exception (e.g.,
an Error) falls through to the final branch and results in listener.onFailure
being called with a CalciteUnsupportedException (assert the listener receives a
CalciteUnsupportedException and that its cause/message reflect the original
Throwable). Ensure tests reference propagateCalciteError via QueryService and
mock or spy ResponseListener to assert interactions.

In `@core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java`:
- Around line 94-121: Add boundary tests that pass Optional.empty() for the
calcite-error parameter to cover the no-prior-Calcite-error case: in
QueryServiceTest create equivalent test methods to the existing ones (e.g., for
executeWithLegacy and explainWithLegacy) that call
QueryService.executeWithLegacy and QueryService.explainWithLegacy with
Optional.empty() and assert the expected fallback behavior (whether it executes
V2, returns a specific error, or invokes onFailure with the V2 exception).
Reference the same mocked setup used in the current tests (analyzer,
executionEngine, planner, settings) and the existing responseListener assertions
but adjust expectations to match the empty-Optional path; add similar boundary
tests corresponding to the other test blocks noted (lines 123-151, 153-180,
182-210).
- Around line 116-120: Replace the lenient stub for settings/analyzer with a
strict stub and add an explicit verification that the legacy analysis path is
invoked: stop using lenient() for
settings.getSettingValue(Key.CALCITE_ENGINE_ENABLED) and change the analyzer
stub to throw the v2Exception via doThrow when analyzer.analyze(any(), any()) is
called, then after calling QueryService.executeWithLegacy(...) add
verify(analyzer).analyze(any(), any()) to assert the legacy analysis was
exercised; apply the same pattern to the other test blocks (around the other
mentioned ranges) and consider replacing the combined assertNotNull/instanceof
checks with assertInstanceOf and adding a test for Optional.empty() boundary
case.

In `@docs/user/ppl/cmd/mvexpand.md`:
- Around line 133-136: Update the "Expected output" example in
docs/user/ppl/cmd/mvexpand.md to use valid JSON (double-quoted keys and string
values) instead of Python-style single-quoted dicts; specifically replace the
Python-like "{'reason': ...}" object with a proper JSON object (e.g., {"reason":
"...", "details": "...", "type": "..."}) and ensure the subsequent error line
matches JSON formatting as needed so the example reflects actual OpenSearch JSON
error responses.

In `@docs/user/ppl/cmd/spath.md`:
- Around line 161-163: Example output is inconsistent: the row with
"tags":["python"] renders as scalar "python" while other rows show arrays like
"[java, sql]"; update the docs so example matches the documented flattening
rule. Either change the example's Row 2 to display [python] (so doc.tags{} shows
[python]) or update the auto-extract description to explicitly state
single-element arrays are flattened to scalars; adjust the example and the
auto-extract mode sentence that currently reads "arrays become strings like
'[java, sql]'" and ensure doc.tags{} examples and the flattening rule are
consistent.
- Around line 162-163: The example table in spath.md hides the difference
between a JSON null value and an absent key because both render as "null";
update the example to make the distinction explicit by adding an extra column or
an inline note that shows the underlying map value for doc.active (e.g., a
column that prints the raw map value or a cast/compare expression such as
doc.active IS NULL vs doc.active == "null"), and update the surrounding prose
that references doc.active, Line 38 and Line 173 behavior to call out that the
displayed "null" is visually identical but the map stores a string "null" when
the key existed with null versus a true absent key which returns null.
- Around line 148-152: The example uses a non-existent field doc_auto for the
spath input and also lists doc_auto in the fields clause; change the spath
invocation to use input=n and remove doc_auto from the fields list so the spath
call and fields reference the actual test-data field n (update the spath
parameter input=... and the fields clause to reference doc.user.name,
doc.user.age, doc.`tags{}`, doc.active instead of doc_auto).
- Around line 36-39: The docs for spath's flattening rules don't mention that in
auto-extract mode single-element arrays are unwrapped to scalars; update the
paragraph around the "Arrays use `{}` suffix" / "All values are stringified"
lines to add a rule stating that when auto-extract is enabled (spath
auto-extract mode), arrays with exactly one element are converted to the scalar
element (e.g., ["python"] -> "python") rather than preserved as a bracketed
string, and reference the example demonstrating this behavior (Example 5 Row 2)
so readers understand this exception to the array/stringification rule.
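
The single-element flattening rule this comment asks the docs to state explicitly can be sketched in isolation; the rendering logic below paraphrases the documented behavior and is not the actual spath implementation:

```java
import java.util.List;

public class FlattenSketch {
    // Sketch of the auto-extract stringification rule described above:
    // a single-element array unwraps to its scalar element, while larger
    // arrays render as a bracketed string like "[java, sql]".
    static String render(List<String> values) {
        if (values.size() == 1) {
            return values.get(0); // ["python"] -> "python"
        }
        return "[" + String.join(", ", values) + "]";
    }

    public static void main(String[] args) {
        if (!render(List.of("python")).equals("python")) throw new AssertionError();
        if (!render(List.of("java", "sql")).equals("[java, sql]")) throw new AssertionError();
        System.out.println(render(List.of("java", "sql")));
    }
}
```

Stating this exception next to the "arrays become strings" rule would make the Example 5 Row 2 output unsurprising.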

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around line 42-70: The comment about malformed JSON handling is misleading:
the test data uses the field "malformed_doc" in the Request autoExtractDoc
(index test_spath_auto) but the expected behavior is to return an empty JSON
object {} rather than "partial results parsed before the error"; update the
comment to state that malformed JSON yields an empty object, or if you intended
the former behavior adjust the test expectation/assertion accordingly; locate
the setup in CalcitePPLSpathCommandIT where autoExtractDoc and the
"malformed_doc" field are created and either change the comment text to describe
"malformed JSON returns empty result {}" or change the test assertions that
reference test_spath_auto/malformed_doc to expect partial parse if you actually
want that behavior.

In `@integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java`:
- Around line 259-273: Add a new NULL-input coverage case for the nomv function
by extending or adding a test in NewAddedCommandsIT (near
testNoMvUnsupportedInV2) that calls executeQuery with a query that sources the
mvexpand_edge_cases index (or the nullskills document) and applies "nomv" to a
field that is NULL to assert the expected behavior; update the test method or
add a new `@Test` (e.g., testNomvHandlesNullInput) to build the query string,
catch ResponseException the same way the existing tests do, and call
verifyQuery/result assertions to validate NULL handling for nomv.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 542-551: The reset() implementation must also clear batch-mode
state so re-iteration can produce results; update reset() to reset any
batch-mode flags and iterators (e.g., set batchCompleted/batchModeCompleted =
false, clear currentBatch or batchIterator, reset batchIndex/offset counters)
and ensure close() likewise nulls/cleans batch resources; locate these symbols
in CalciteEnumerableGraphLookup (reset(), close(), current, and any fields named
batchCompleted, batchIterator, currentBatch, batchIndex or similar) and reset
them when reset()/close() is called.
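
The reset() concern can be illustrated with a minimal enumerator sketch; the field names (batchIterator, batchCompleted) are hypothetical stand-ins for the batch-mode state the comment mentions, not the real CalciteEnumerableGraphLookup fields:

```java
import java.util.Iterator;
import java.util.List;

public class ResettableBatchEnumerator {
    private final List<int[]> batches;
    private Iterator<int[]> batchIterator;
    private boolean batchCompleted;
    private int[] current;

    ResettableBatchEnumerator(List<int[]> batches) {
        this.batches = batches;
        reset();
    }

    boolean moveNext() {
        if (batchCompleted) return false;
        if (batchIterator.hasNext()) {
            current = batchIterator.next();
            return true;
        }
        batchCompleted = true;
        return false;
    }

    int[] current() { return current; }

    // Without clearing batchCompleted and re-creating batchIterator here,
    // a second iteration pass would immediately report exhaustion.
    void reset() {
        batchIterator = batches.iterator();
        batchCompleted = false;
        current = null;
    }

    public static void main(String[] args) {
        ResettableBatchEnumerator e =
            new ResettableBatchEnumerator(List.of(new int[] {1}, new int[] {2}));
        int first = 0;
        while (e.moveNext()) first++;
        e.reset();
        int second = 0;
        while (e.moveNext()) second++;
        if (first != 2 || second != 2) throw new AssertionError(first + "/" + second);
        System.out.println(first + " " + second);
    }
}
```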

In `@ppl/src/main/antlr/OpenSearchPPLParser.g4`:
- Around line 1728-1734: The searchableKeyWord rule currently lists only
FROM_FIELD, TO_FIELD, MAX_DEPTH, DEPTH_FIELD, DIRECTION, UNI, and BI; add the
missing graphLookup tokens START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT
to searchableKeyWord so they are accepted as keywords that can be used as
identifiers via keywordsCanBeId. Update the OpenSearchPPLParser.g4
searchableKeyWord alternative list to include START_FIELD, SUPPORT_ARRAY,
BATCH_MODE, and USE_PIT alongside the existing tokens.

In `@ppl/src/test/java/org/opensearch/sql/ppl/utils/SPathRewriteTest.java`:
- Around line 80-86: Add tests in SPathRewriteTest to cover null/empty/error
cases for auto-extract mode: extend testSpathAutoExtractMode (or add new tests)
to assert behavior when the input field is null and when it is an empty string,
and assert that sp.rewriteAsEval() (the SPath auto-extract path using
json_extract_all) throws the expected exception type for malformed JSON; include
assertions for the exact exception class or message you expect (or adjust if the
parser throws a different exception) and use the same helpers used in the file
(plan(), relation("t"), let(), field("a"), function("json_extract_all",
field("a"))) so the tests exercise null, empty, and invalid-json/error paths.

---

Outside diff comments:
In `@core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java`:
- Around line 286-293: In resolveFieldAccess, the call to joinParts computes the
count of remaining parts incorrectly causing IndexOutOfBoundsException; change
the third argument so it uses the remaining parts count accounting for start
(i.e. replace parts.size() - length with parts.size() - start - length) so
itemName = joinParts(parts, length + start, parts.size() - start - length)
before calling createItemAccess(field, itemName, context).
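
The off-by-one can be reproduced with a self-contained sketch; `joinParts` here is a hypothetical stand-in with the (parts, offset, count) signature the comment implies:

```java
import java.util.Arrays;
import java.util.List;

public class JoinPartsSketch {
    // Hypothetical stand-in for the resolver's helper: joins `count`
    // parts starting at `offset` with dots.
    static String joinParts(List<String> parts, int offset, int count) {
        return String.join(".", parts.subList(offset, offset + count));
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c", "d");
        int start = 1;  // resolution began at parts[1]
        int length = 1; // one part consumed by the resolved field

        // Buggy count from the finding: parts.size() - length = 3, but only
        // two parts remain after offset 2, so subList(2, 5) throws.
        boolean threw = false;
        try {
            joinParts(parts, start + length, parts.size() - length);
        } catch (IndexOutOfBoundsException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected IndexOutOfBoundsException");

        // Corrected count also subtracts `start`, yielding exactly the tail parts.
        String itemName = joinParts(parts, start + length, parts.size() - start - length);
        if (!itemName.equals("c.d")) throw new AssertionError(itemName);
        System.out.println(itemName);
    }
}
```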

In `@docs/category.json`:
- Around line 10-69: Add "user/ppl/cmd/graphlookup.md" to the ppl_cli_calcite
array in docs/category.json (the array named ppl_cli_calcite) so the new entry
appears alphabetically between "user/ppl/cmd/grok.md" and
"user/ppl/cmd/head.md"; ensure you insert it as a proper JSON string with
correct commas and preserve valid JSON formatting.

In `@scripts/build.sh`:
- Around line 71-81: The script hardcodes -Pcrypto.standard=FIPS-140-3 into all
Gradle invocations; add a CLI flag (e.g., -c/--crypto or CRYPTO_STANDARD) with a
default of empty and update getopts to populate CRYPTO_STANDARD, then only
append "-Pcrypto.standard=$CRYPTO_STANDARD" to the ./gradlew commands (assemble,
publishToMavenLocal, publishPluginZipPublicationToZipStagingRepository) if
CRYPTO_STANDARD is non-empty so callers can disable or change the crypto
standard and avoid FIPS-related checksum failures.

---

Duplicate comments:
In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java`:
- Around line 552-565: The three new public visitor overrides (visitNoMv,
visitMvExpand, visitGraphLookup) are missing JavaDoc and unit tests; add JavaDoc
to each method following project pattern including `@param` AnalysisContext
context, `@return` LogicalPlan, and `@throws` for the Calcite-only exception, and
add unit tests (in the same commit) that invoke
Analyzer.visitNoMv/visitMvExpand/visitGraphLookup with a mock or real
NoMv/MvExpand/GraphLookup node and assert that getOnlyForCalciteException is
thrown (and that the exception message matches "nomv", "mvexpand", "graphlookup"
respectively) to cover the Calcite-only path.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2545-2551: The visitGraphLookup method in CalciteRelNodeVisitor
currently applies an unconditional hard-coded start-set limit via
builder.limit(0, 100) which truncates the graph lookup input; remove this
unconditional limit and instead respect a configurable setting or absence
thereof (e.g., read a graph lookup start-limit from the CalcitePlanContext or a
global config exposed to CalciteRelNodeVisitor such as
getGraphLookupStartLimit()), and only apply builder.limit when a positive
configured limit exists; update visitGraphLookup to use that config (or no limit
if not configured) and remove the silent 100-row truncation.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 433-434: The loop increments currentLevelDepth and compares it to
graphLookup.maxDepth, but when maxDepth is negative (e.g., -1 for unlimited) the
current check (++currentLevelDepth > graphLookup.maxDepth) will always break;
update the condition in CalciteEnumerableGraphLookup so the depth check is
skipped when graphLookup.maxDepth < 0 — e.g., only increment/compare when
graphLookup.maxDepth >= 0 (or guard the comparison with graphLookup.maxDepth >=
0) so unlimited depth is handled correctly by allowing the loop to continue.
- Around line 412-430: CalciteEnumerableGraphLookup currently only adds rows to
results when nextValues is non-empty, which drops matched rows when traversal
doesn't expand; always add the constructed row (use graphLookup.depthField to
decide between rowArray and rowWithDepth using rowArray/currentLevelDepth) to
results unconditionally, and keep the subsequent loop that iterates over
nextValues only for enqueuing unvisited non-null values into visitedNodes and
queue (do not gate the results.add(...) behind nextValues.isEmpty()).
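
Both fixes above (treating a negative maxDepth as unlimited, and recording a matched row even when it produces no further expansion) can be sketched with a minimal BFS; names and row encoding are illustrative, not the actual CalciteEnumerableGraphLookup implementation:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GraphLookupSketch {
    static List<String> lookup(Map<String, List<String>> edges, String start, int maxDepth) {
        List<String> results = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        visited.add(start);
        int depth = 0;
        while (!queue.isEmpty()) {
            // Skip the depth check entirely when maxDepth is negative (unlimited).
            if (maxDepth >= 0 && depth > maxDepth) break;
            int levelSize = queue.size();
            for (int i = 0; i < levelSize; i++) {
                String node = queue.poll();
                List<String> next = edges.getOrDefault(node, List.of());
                // Record the row unconditionally, not only when `next` is non-empty.
                results.add(node + "@" + depth);
                for (String n : next) {
                    if (visited.add(n)) queue.add(n);
                }
            }
            depth++;
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of("a", List.of("b"), "b", List.of("c"));
        // Unlimited depth: leaf "c" still appears even though it expands nothing.
        if (!lookup(edges, "a", -1).equals(List.of("a@0", "b@1", "c@2"))) throw new AssertionError();
        // maxDepth = 0: only the starting match is kept.
        if (!lookup(edges, "a", 0).equals(List.of("a@0"))) throw new AssertionError();
        System.out.println(lookup(edges, "a", -1));
    }
}
```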

---

Nitpick comments:
In `@core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java`:
- Around line 481-489: The Javadoc for mvexpand incorrectly uses
backtick-escaped literals like `@param` and `@return`; update the comment on the
mvexpand(UnresolvedPlan input, Field field, Integer limit) method (and mention
of MvExpand/UnresolvedPlan/Field) to use proper Javadoc tags: add real `@param`
tags for input, field, and limit and a proper `@return` describing the returned
UnresolvedPlan (MvExpand attached to input), ensuring each tag is on its own
line directly under the summary.

In `@core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java`:
- Line 10: Replace the inconsistent nullable import in NoMv.java: change the
import of javax.annotation.Nullable to use
org.checkerframework.checker.nullness.qual.Nullable (the same source used in
SPath.java) and update any `@Nullable` annotations in the NoMv class to rely on
that Checker Framework annotation to keep nullability imports consistent across
the module.
- Around line 37-40: The attach() method in NoMv should be annotated with
`@Override` to match sibling AST nodes like MvExpand.attach() and ensure the
compiler catches signature drift; update the NoMv.attach(UnresolvedPlan child)
method to add the `@Override` annotation (keeping the same parameter type
UnresolvedPlan and return type NoMv) so it properly overrides the
superclass/interface method.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2545-2604: Add JavaDoc to the public method
visitGraphLookup(GraphLookup node, CalcitePlanContext context) describing its
behavior and include `@param` tags for "node" and "context", an `@return` tag
describing the returned RelNode, and `@throws` tags for any runtime/checked
exceptions the method may propagate (e.g., exceptions from rexVisitor.analyze(),
analyze(), or builder.build()). Ensure the JavaDoc style and wording match
existing visitor methods like visitNoMv and visitMvExpand so it conforms to the
project's Javadoc conventions.

In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java`:
- Around line 1100-1111: The permissive overload registered via
registerOperator(INTERNAL_ITEM, SqlStdOperatorTable.ITEM,
PPLTypeChecker.family(SqlTypeFamily.IGNORE, SqlTypeFamily.CHARACTER)) can match
any left-hand type and silently defer invalid cases to runtime; change the first
operand family from SqlTypeFamily.IGNORE to a narrower family such as
SqlTypeFamily.ANY (or a more specific family representing structs/objects if
available) so that type-checking fails earlier, and add an inline comment near
the registerOperator call documenting why this family was chosen to prevent
future regressions.

In `@core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java`:
- Around line 110-112: Replace the redundant null + instanceof checks in
QueryServiceTest by using JUnit 5's assertInstanceOf: for each occurrence (e.g.,
the block using assertNotNull(e); assertTrue(e instanceof
UnsupportedOperationException); assertTrue(e.getMessage().contains("Calcite
error")) in QueryServiceTest), remove assertNotNull and the instanceof assert
and call assertInstanceOf(UnsupportedOperationException.class, e) instead
(optionally assign the returned value to a variable to assert on the message),
then keep the existing message assertion (assertTrue on
e.getMessage().contains(...)) or use the assigned instance for that check; apply
the same change to the other similar blocks mentioned (lines around 139-141,
168-172, 197-201).

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java`:
- Around line 262-290: Two test methods in
CalciteNoMvCommandIT—testNoMvWithAllNulls and testNoMvArrayWithAllNulls—are
identical; remove or repurpose one to avoid duplication. Locate the two test
methods (testNoMvWithAllNulls, testNoMvArrayWithAllNulls) and either delete one
of them or change its query/assertions to cover a different scenario (e.g.,
mixed nulls or single-element null array) so each test verifies a unique
behavior while keeping verifySchema/verifyDataRows usage consistent.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around line 168-177: The inline comment in testSpathAutoExtractMalformedJson
incorrectly says "partial results parsed before the error" while the assertion
expects an empty map; update the comment near the call to executeQuery / the
test method testSpathAutoExtractMalformedJson to accurately describe behavior
(e.g., "Malformed JSON: parser abandons extraction and returns empty map {}") so
the comment matches the verifyDataRows expectation.

In `@integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`:
- Around line 689-693: Replace the hard-coded index name "mvexpand_edge_cases"
in the MVEXPAND_EDGE_CASES enum entry with the project's shared test constant
for the MVEXPAND index (use the centralized test constant from the test
constants/class used across tests) so the enum MVEXPAND_EDGE_CASES(...) uses
that constant for the index name; update the second constructor argument
(currently the string literal) to reference the shared constant so index names
remain centralized and consistent with other tests.

In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5060.yml`:
- Around line 15-27: The test currently always calls indices.create for index
"test_join_ad_error_5133", which can fail if a previous run left the index
behind; add a pre-test cleanup step that calls indices.delete for
"test_join_ad_error_5133" and ignores a 404 before the indices.create step so
the create will not 400; place this delete call immediately before the existing
indices.create block (use the same index name) and ensure the delete is
configured to ignore-not-found/404 errors.

In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5114.yml`:
- Around line 54-66: Add a datarows assertion to the "Issue 5114: head should be
preserved for non-order-equivalent deterministic sort expression" test to verify
the exact rows returned by the query `source=issue5114 | eval a = abs(id) + 1 |
sort a | fields id | head 5`; specifically assert datarows:
[[1],[2],[3],[4],[5]] so the test confirms deterministic ordering in addition to
total and schema checks.

In `@ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java`:
- Around line 523-531: visitMvExpand currently hardcodes field anonymization
using MASK_COLUMN which diverges from visitNoMv/visitExpand/visitMvCombine that
call visitExpression(node.getField()) to properly handle meta-fields; update
visitMvExpand to compute the anonymized field via
visitExpression(node.getField()) and use that variable in both the limit and
non-limit return paths (keep MASK_LITERAL usage for limit value), ensuring
meta-field names like _id are converted to their masked forms consistently.

In
`@ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java`:
- Around line 646-712: Add a test in PPLQueryDataAnonymizerTest that exercises
the anonymizer path which appends usePIT=true; call anonymize with a graphLookup
clause including "usePIT=true" (similar to other cases) and assert the output
contains "usePIT=true" in the anonymized graphlookup string, referencing the
anonymize helper used across this file so the test triggers the branch in
PPLQueryDataAnonymizer (the code around lines 253-255 that conditionally appends
usePIT=true).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5eadcc5 and ef94e4f.

📒 Files selected for processing (64)
  • async-query-core/build.gradle
  • build.gradle
  • common/build.gradle
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java
  • core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java
  • core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java
  • core/src/main/java/org/opensearch/sql/ast/tree/SPath.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImpl.java
  • core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java
  • core/src/test/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImplTest.java
  • core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImplTest.java
  • datasources/build.gradle
  • docs/category.json
  • docs/user/ppl/cmd/mvexpand.md
  • docs/user/ppl/cmd/nomv.md
  • docs/user/ppl/cmd/spath.md
  • docs/user/ppl/index.md
  • doctest/test_data/structured.json
  • doctest/test_mapping/structured.json
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/standalone/JsonExtractAllFunctionIT.java
  • integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java
  • integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java
  • integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java
  • integ-test/src/test/java/org/opensearch/sql/security/CrossClusterTestBase.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_issue_5114_sort_expr_head_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nomv.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_nomv.yaml
  • integ-test/src/test/resources/indexDefinitions/mvexpand_edge_cases_mapping.json
  • integ-test/src/test/resources/mvexpand_edge_cases.json
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5060.yml
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5114.yml
  • legacy/build.gradle
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
  • ppl/build.gradle
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLFunctionTypeTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLSpathTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/SPathRewriteTest.java
  • scripts/build.sh
  • sql/build.gradle
✅ Files skipped from review due to trivial changes (4)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml
  • integ-test/src/test/resources/indexDefinitions/mvexpand_edge_cases_mapping.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nomv.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_nomv.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
  • docs/user/ppl/index.md

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

♻️ Duplicate comments (5)
ppl/src/main/antlr/OpenSearchPPLParser.g4 (1)

1728-1734: Add START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT to searchableKeyWord.

Only 7 of the 11 new graphLookup tokens are included in searchableKeyWord. The missing four (START_FIELD, SUPPORT_ARRAY, BATCH_MODE, USE_PIT) cannot be used as field identifiers, which is inconsistent with how the other graphLookup tokens are handled.

Proposed fix
    | FROM_FIELD
    | TO_FIELD
    | MAX_DEPTH
    | DEPTH_FIELD
    | DIRECTION
    | UNI
    | BI
+   | START_FIELD
+   | SUPPORT_ARRAY
+   | BATCH_MODE
+   | USE_PIT
    ;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/main/antlr/OpenSearchPPLParser.g4` around lines 1728 - 1734, The
parser rule searchableKeyWord is missing four graphLookup tokens (START_FIELD,
SUPPORT_ARRAY, BATCH_MODE, USE_PIT) so they cannot be used as field identifiers;
update the searchableKeyWord alternative list in OpenSearchPPLParser.g4 (the
searchableKeyWord rule) to include START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and
USE_PIT alongside FROM_FIELD, TO_FIELD, MAX_DEPTH, DEPTH_FIELD, DIRECTION, UNI,
and BI so these tokens are treated as searchable identifiers.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java (3)

542-546: ⚠️ Potential issue | 🟡 Minor

reset() does not clear batchModeCompleted, making re-iteration in batch mode impossible.

After reset(), batchModeCompleted remains true, so moveNextBatchMode() immediately returns false.

🛠️ Proposed fix
   `@Override`
   public void reset() {
     sourceEnumerator.reset();
     current = null;
+    batchModeCompleted = false;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
around lines 542 - 546, The reset() method currently resets sourceEnumerator and
current but fails to clear the batchModeCompleted flag, so subsequent iterations
in batch mode immediately stop; update reset() to also set batchModeCompleted =
false (or otherwise reinitialize batchModeCompleted) so moveNextBatchMode() can
run again after reset() — locate the reset() method in
CalciteEnumerableGraphLookup and update it alongside the existing references to
sourceEnumerator, current, and the batchModeCompleted flag.

433-433: ⚠️ Potential issue | 🔴 Critical

maxDepth = -1 (unlimited) is broken — BFS stops after the first level.

When maxDepth is -1 (documented as unlimited), `++currentLevelDepth > graphLookup.maxDepth` evaluates to `1 > -1`, which is `true`, so the loop breaks immediately after one level.

🛠️ Proposed fix
-        if (++currentLevelDepth > graphLookup.maxDepth) break;
+        currentLevelDepth++;
+        if (graphLookup.maxDepth >= 0 && currentLevelDepth > graphLookup.maxDepth) {
+          break;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
at line 433, The BFS depth check in CalciteEnumerableGraphLookup incorrectly
stops when graphLookup.maxDepth == -1 (unlimited); change the condition in the
loop to only compare and potentially break when maxDepth is non-negative (e.g.
if (graphLookup.maxDepth >= 0 && ++currentLevelDepth > graphLookup.maxDepth)
break;) so that a negative maxDepth (like -1) is treated as unlimited; update
the check that uses currentLevelDepth and graphLookup.maxDepth accordingly.

398-430: ⚠️ Potential issue | 🔴 Critical

Leaf nodes are silently dropped from BFS results.

When all fromField values of a matched document are already visited, nextValues is empty and the row is discarded (line 413). This loses valid reachable documents that happen to not expand the frontier further — e.g., at the maximum depth level or in converging graph structures.

The row should always be added to results; nextValues should only control queue expansion.

🛠️ Proposed fix
-          // Add row to results if the nextValues is not empty
-          if (!nextValues.isEmpty()) {
-            if (graphLookup.depthField != null) {
-              Object[] rowWithDepth = new Object[rowArray.length + 1];
-              System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
-              rowWithDepth[rowArray.length] = currentLevelDepth;
-              results.add(rowWithDepth);
-            } else {
-              results.add(rowArray);
-            }
-
-            // Add unvisited non-null values to queue for next level traversal
-            for (Object val : nextValues) {
-              if (val != null) {
-                visitedNodes.add(val);
-                queue.offer(val);
-              }
+          // Always add matched row to results
+          if (graphLookup.depthField != null) {
+            Object[] rowWithDepth = new Object[rowArray.length + 1];
+            System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
+            rowWithDepth[rowArray.length] = currentLevelDepth;
+            results.add(rowWithDepth);
+          } else {
+            results.add(rowArray);
+          }
+
+          // Add unvisited non-null values to queue for next level traversal
+          for (Object val : nextValues) {
+            if (val != null) {
+              visitedNodes.add(val);
+              queue.offer(val);
             }
           }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
around lines 398 - 430, The loop over forwardResults currently drops rows when
collectValues produces an empty nextValues (thus losing leaf nodes); change the
logic in CalciteEnumerableGraphLookup so that each processed row (from
forwardResults) is always added to results (respecting graphLookup.depthField by
appending currentLevelDepth via the same rowWithDepth logic) and then use
nextValues only to decide which values to add to visitedNodes and queue for
further BFS expansion; keep existing uses of collectValues, visitedNodes, queue,
results, forwardResults, and graphLookup.depthField but move the
results.add(...) block out of the if (!nextValues.isEmpty()) conditional so
queue expansion and result collection are handled independently.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2550-2551: Hard-coded 100-row cap on source rows already noted.
This matches a prior review concern; please address that feedback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 2550 - 2551, Replace the hard-coded limit call in
CalciteRelNodeVisitor (builder.limit(0, 100)) with a configurable source-row
cap: introduce or read a configuration/query option (e.g., getSourceRowLimit()
on the existing session/config object or a new constant
DEFAULT_SOURCE_ROW_LIMIT) and call builder.limit(0, sourceRowLimit) using that
value; ensure the value is validated (non-negative) and falls back to the
default when unset so the limit becomes configurable rather than fixed at 100.
🧹 Nitpick comments (4)
ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java (1)

1658-1728: Consider adding test cases for optional graphLookup parameters.

The positive tests cover fromField, toField, maxDepth, startField, depthField, and direction=bi. The error cases are solid (missing fromField, missing table, missing toField). However, per coding guidelines, parser tests should include edge cases and boundary conditions. Consider adding:

  1. A case with supportArray=true, batchMode=true, or filter=(...) to verify the AST builder correctly processes these options.
  2. A negative test for a missing as clause, since AS outputField is mandatory in the grammar rule.

As per coding guidelines: "Include boundary condition tests (min/max values, empty inputs) for all new functions" and "Include error condition tests (invalid inputs, exceptions) for all new functions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java` around
lines 1658 - 1728, Add positive and negative graphLookup test cases in
testGraphLookupCommand: include at least one positive case using
GraphLookup.builder()/plan(...) that sets optional flags supportArray=true,
batchMode=true and a filter=(...) expression to assert the AST contains those
options, and include a negative assertThrows(SyntaxCheckException.class or
SemanticCheckException.class, () -> plan(...)) for a graphLookup invocation
missing the mandatory "as" clause to validate parser error handling; locate
tests around the existing testGraphLookupCommand method and add assertions
similar to the existing ones so they exercise parsing and AstBuilder behavior
for these options.
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)

646-712: Good coverage of graphLookup anonymization variants.

The test covers basic, maxDepth, depthField, bidirectional, startField, supportArray, batchMode, filter, and compound filter scenarios. One option is missing: usePit. Consider adding a case for it for completeness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java`
around lines 646 - 712, Add a missing test case to testGraphLookup that verifies
anonymization of the graphLookup usePit option: call anonymize on a query like
"source=t | graphLookup employees fromField=manager toField=name usePit=true as
reportingHierarchy" and assert the expected anonymized string contains
"usePit=true" (e.g., "source=table | graphlookup table fromField=identifier
toField=identifier direction=uni usePit=true as identifier"); place the new
assert alongside the other graphLookup asserts in the testGraphLookup method and
use the existing anonymize helper to produce the output.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java (1)

351-437: Unbounded BFS can exhaust heap on dense graphs.

results, visitedNodes, and queue grow without bound. Once the maxDepth = -1 bug is fixed to allow truly unlimited traversal, a dense graph could produce millions of results and OOM the node. The TODO at line 356 acknowledges this, but consider also:

  • Adding a configurable maxResults cap as a safety valve.
  • Adding a traversal timeout to prevent runaway queries.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
around lines 351 - 437, performBfs can run unbounded and OOM on dense graphs;
add protective limits: introduce a configurable maxResults cap and a traversal
timeout check inside performBfs (use the existing results, visitedNodes, queue
and currentLevelDepth scope) and enforce them after each batch/queryLookupTable
call and before enqueuing new nodes; when maxResults is reached or the timeout
expires, stop traversal (break the loop), log a warning and return the partial
results (or throw a specific exception if caller should handle truncation), and
expose the new config fields (e.g., graphLookup.maxResults and a timeoutMillis)
so callers can tune safety limits.
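The depth-sentinel and safety-valve points raised in this review can be condensed into a minimal standalone sketch. This is not the actual CalciteEnumerableGraphLookup code: it assumes an in-memory adjacency map in place of the lookup-index queries, and the `maxResults` parameter is the hypothetical cap the review suggests adding.

```java
import java.util.*;

// Minimal BFS sketch illustrating the guards discussed in this review:
// a negative maxDepth means unlimited, every discovered node is recorded
// in results (leaves included), and a maxResults cap stops runaway
// traversal. Names here are illustrative, not the plugin's API.
class BfsSketch {
  static List<String> bfs(
      Map<String, List<String>> edges, String start, int maxDepth, int maxResults) {
    List<String> results = new ArrayList<>();
    Set<String> visited = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    visited.add(start);
    queue.offer(start);
    int depth = 0;
    while (!queue.isEmpty()) {
      depth++;
      // Only enforce the depth limit when it is non-negative,
      // so maxDepth = -1 is treated as unlimited.
      if (maxDepth >= 0 && depth > maxDepth) break;
      int levelSize = queue.size();
      for (int i = 0; i < levelSize; i++) {
        String node = queue.poll();
        for (String next : edges.getOrDefault(node, List.of())) {
          if (visited.contains(next)) continue;
          // Record the match even if it expands the frontier no further.
          results.add(next);
          if (results.size() >= maxResults) return results; // safety valve
          visited.add(next);
          queue.offer(next);
        }
      }
    }
    return results;
  }
}
```

A usage example: with edges `a -> [b, c]`, `b -> [d]`, `c -> [d]`, starting from `a` with unlimited depth yields `[b, c, d]` (node `d` appears once despite two inbound edges), while `maxDepth = 1` yields only `[b, c]`.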
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2545-2604: Extract helper methods to keep visitGraphLookup under 50 lines.
The new method is now >50 lines; consider extracting parameter parsing and RelNode construction helpers for readability and consistency.

As per coding guidelines: Flag methods >50 lines - Suggest extracting complex logic into helper methods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 2545 - 2604, The visitGraphLookup method exceeds 50 lines; extract
the parameter parsing and RelNode construction into small helpers to reduce
complexity: create a helper like parseGraphLookupParameters(GraphLookup node,
CalcitePlanContext context) that performs all Rex/Literal analysis and returns a
small DTO (startFieldName, fromFieldName, toFieldName, outputFieldName,
depthFieldName, maxDepthValue, bidirectional, supportArray, batchMode, usePIT,
filterRex), and a builder helper like buildGraphLookupRelNode(RelNode
sourceTable, RelNode lookupTable, ParsedGraphLookupParams params) that calls
LogicalGraphLookup.create(...) and pushes/returns the result; keep existing
calls to tryToRemoveMetaFields(context, true), visitChildren(node, context),
analyze(node.getFromTable(), context), rexVisitor.analyze(...) but move their
parsing logic into the new helpers and then have visitGraphLookup only
orchestrate the steps and delegate to parseGraphLookupParameters and
buildGraphLookupRelNode to bring visitGraphLookup under 50 lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2545-2546: Add a JavaDoc block to the public method
visitGraphLookup(GraphLookup node, CalcitePlanContext context) in class
CalciteRelNodeVisitor documenting both parameters with `@param`, the returned
RelNode with `@return`, and any runtime or checked exceptions the method can throw
using `@throws` (list the actual exceptions thrown by the implementation),
following the file's existing JavaDoc style; while here, refactor the long
method by extracting clear helpers such as extractGraphLookupParameters(),
buildSourceTable(), and buildLookupTable() to encapsulate parameter extraction
and table processing and update visitGraphLookup to call those helpers for
improved readability.
- Around line 2565-2568: The parser default for maxDepth should be changed from
Literal.ZERO to an explicit unbounded sentinel and the visitor should stop
coercing null to 0: update AstBuilder to set the default maxDepth to
Literal.of(-1) instead of Literal.ZERO, and in CalciteRelNodeVisitor (around
rexVisitor.analyze(node.getMaxDepth(), context) / node.getMaxDepth()) remove or
change the null-to-0 coercion (the line that does maxDepthValue = maxDepthValue
== null ? 0 : maxDepthValue) so that -1 is treated as the unbounded sentinel (or
simply assume non-null after the AstBuilder change); ensure references to
node.getMaxDepth, rexVisitor.analyze, and the maxDepthValue handling reflect the
new -1 semantics used by CalciteEnumerableGraphLookup.

In `@docs/user/ppl/index.md`:
- Line 87: Fix the markdown table row for the graphlookup command by adding a
single space before the closing pipe so it matches surrounding entries; locate
the row containing "[graphlookup command](cmd/graphlookup.md) | 3.6 |
experimental (since 3.6) | Performs recursive graph traversal on a collection
using a BFS algorithm." and ensure there is " ... BFS algorithm. |" (space
before final |) to keep table formatting consistent.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 452-477: The current query logic in CalciteEnumerableGraphLookup
is excluding whole documents by adding mustNot(getQueryBuilder(...,
visitedValues)) when graphLookup.supportArray is false; remove these mustNot
exclusions (both in the forward query using fromFieldIdx/visitedValues and in
the bidirectional backQuery using toFieldIdx/visitedValues) so matched documents
are returned even if their expansion would revisit nodes, and rely on the
in-memory visitedNodes logic in performBfs to prevent re-queuing; update
getQueryBuilder usage accordingly (leave calls to getQueryBuilder(toFieldIdx,
values) and getQueryBuilder(fromFieldIdx, values) intact) and add a brief
comment near graphLookup.supportArray handling explaining that de-duplication is
handled in performBfs.
- Around line 190-242: The constructor GraphLookupEnumerator currently mutates a
shared CalciteEnumerableIndexScan (lookupScan) by calling
lookupScan.pushDownContext.add for LIMIT and FILTER, which will accumulate
across enumerator instances; to fix, stop mutating the shared RelNode in the
enumerator: either move the push-down logic from GraphLookupEnumerator to the
owning Enumerable (GraphLookupEnumerable.enumerator() or its initializer) so
pushdowns are applied once per RelNode, or clone/copy the lookupScan at the
start of GraphLookupEnumerator (create a fresh CalciteEnumerableIndexScan
instance or use the same copy strategy as queryLookupTable) and apply
lookupScan.pushDownContext.add to that copy only; ensure references to
graphLookup.usePIT, lookupScan.pushDownContext.add, graphLookup.filter,
PredicateAnalyzer.analyze, and queryLookupTable are used to locate and implement
the change.
- Around line 215-221: The code does not validate that
lookupFields.indexOf(graphLookup.fromField) and indexOf(graphLookup.toField)
return valid indices; add checks after computing fromFieldIdx and toFieldIdx in
CalciteEnumerableGraphLookup to ensure each is >= 0 (and optionally <
lookupFields.size()), and if not throw a clear exception (e.g.,
IllegalArgumentException or a domain-specific validation exception) including
the missing field name and available lookupFields; mirror the existing
validation pattern used for startFieldIndex so later array access at lines that
use fromFieldIdx and toFieldIdx cannot cause ArrayIndexOutOfBoundsException.

In `@ppl/src/main/antlr/OpenSearchPPLParser.g4`:
- Around line 648-663: The visitor for graphLookupCommand
(visitGraphLookupCommand) currently allows duplicate options to silently
overwrite; mirror the pattern used in visitBinCommand by adding a Set<String>
seenParams in visitGraphLookupCommand, for each graphLookupOption extract a
unique key (e.g., option token name like START_FIELD, FROM_FIELD, TO_FIELD,
MAX_DEPTH, DEPTH_FIELD, DIRECTION, SUPPORT_ARRAY, BATCH_MODE, USE_PIT, FILTER)
and call checkParamDuplication(seenParams, paramName) before inserting it into
the command map/structure so duplicates cause the same IllegalArgumentException
as in visitBinCommand.
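The duplicate-option guard described above can be sketched as follows. The `checkParamDuplication` helper mirrors the Set-based pattern the review attributes to visitBinCommand; the signature and exception message here are illustrative, not the plugin's actual helper.

```java
import java.util.*;

// Hypothetical sketch of the duplicate-parameter guard suggested above:
// each option name is added to a Set, and a repeat raises an error
// instead of silently overwriting the earlier value.
class ParamGuard {
  static void checkParamDuplication(Set<String> seen, String param) {
    // Set.add returns false when the element is already present.
    if (!seen.add(param)) {
      throw new IllegalArgumentException("Duplicated parameter: " + param);
    }
  }
}
```

In the visitor, each parsed graphLookupOption would call this with a shared `seen` set before storing the value, so `maxDepth=2 maxDepth=5` fails fast rather than keeping only the last occurrence.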


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef94e4f and cd84d74.

📒 Files selected for processing (13)
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • docs/user/ppl/index.md
  • integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java
  • integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
🚧 Files skipped from review as they are similar to previous changes (5)
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java
  • integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java

Comment on lines +215 to +221
try {
List<String> sourceFields = graphLookup.getSource().getRowType().getFieldNames();
this.lookupFields = graphLookup.getLookup().getRowType().getFieldNames();
this.startFieldIndex = sourceFields.indexOf(graphLookup.getStartField());
this.fromFieldIdx = lookupFields.indexOf(graphLookup.fromField);
this.toFieldIdx = lookupFields.indexOf(graphLookup.toField);

@coderabbitai coderabbitai bot Feb 24, 2026

⚠️ Potential issue | 🟠 Major

No validation that fromFieldIdx / toFieldIdx resolve to valid indices.

lookupFields.indexOf(...) returns -1 if the field is not found. Unlike startFieldIndex (which is bounds-checked at lines 289/330), fromFieldIdx and toFieldIdx are used directly as array indices at lines 400 and 408, which would throw ArrayIndexOutOfBoundsException at runtime.

🛠️ Proposed fix
       this.fromFieldIdx = lookupFields.indexOf(graphLookup.fromField);
       this.toFieldIdx = lookupFields.indexOf(graphLookup.toField);
+      if (this.fromFieldIdx < 0) {
+        throw new IllegalArgumentException(
+            "fromField '" + graphLookup.fromField + "' not found in lookup schema: " + lookupFields);
+      }
+      if (this.toFieldIdx < 0) {
+        throw new IllegalArgumentException(
+            "toField '" + graphLookup.toField + "' not found in lookup schema: " + lookupFields);
+      }


Comment on lines +452 to +477
```java
// Forward direction query
QueryBuilder query;
if (graphLookup.supportArray) {
  // When supportArray is true, don't push down visited filter
  // because array fields may contain multiple values that need to be checked individually
  query = getQueryBuilder(toFieldIdx, values);
} else {
  query =
      boolQuery()
          .must(getQueryBuilder(toFieldIdx, values))
          .mustNot(getQueryBuilder(fromFieldIdx, visitedValues));
}

if (graphLookup.bidirectional) {
  // Also query fromField for bidirectional traversal
  QueryBuilder backQuery;
  if (graphLookup.supportArray) {
    backQuery = getQueryBuilder(fromFieldIdx, values);
  } else {
    backQuery =
        boolQuery()
            .must(getQueryBuilder(fromFieldIdx, values))
            .mustNot(getQueryBuilder(toFieldIdx, visitedValues));
  }
  query = QueryBuilders.boolQuery().should(query).should(backQuery);
}
```
@coderabbitai coderabbitai bot (Contributor) commented on Feb 24, 2026

⚠️ Potential issue | 🟠 Major

mustNot(fromField IN visitedValues) can exclude valid result documents.

When supportArray is false, the query excludes entire documents whose fromField value has already been visited (lines 460–462). This is an optimization to avoid re-traversing visited nodes, but it also prevents those documents from appearing in the result set at all. A document reachable via toField is a valid BFS result even if its fromField leads back to an already-visited node (i.e., a leaf or converging edge).

For MongoDB $graphLookup–compatible semantics, all matched documents should be returned regardless of whether they expand the frontier. Consider removing the mustNot exclusion (relying on the in-memory visitedNodes set in performBfs to prevent re-queueing) or documenting that supportArray=false intentionally trades result completeness for performance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java` around lines 452 - 477: the current query logic excludes whole documents by adding `mustNot(getQueryBuilder(..., visitedValues))` when `graphLookup.supportArray` is false. Remove these `mustNot` exclusions (both in the forward query using `fromFieldIdx`/`visitedValues` and in the bidirectional `backQuery` using `toFieldIdx`/`visitedValues`) so matched documents are returned even if their expansion would revisit nodes, and rely on the in-memory `visitedNodes` logic in `performBfs` to prevent re-queuing. Leave the calls to `getQueryBuilder(toFieldIdx, values)` and `getQueryBuilder(fromFieldIdx, values)` intact, and add a brief comment near the `graphLookup.supportArray` handling explaining that de-duplication is handled in `performBfs`.
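The semantics the review argues for — return every matched document, and use only an in-memory visited set to decide what gets re-queued — can be sketched in isolation. This is a simplified stand-in, not the PR's actual code: `edges` plays the role of the lookup-index query, and all names here are hypothetical.

```java
import java.util.*;

// Minimal BFS sketch of the suggested dedup approach: matched documents are
// always added to the results; the visited set only prevents re-queuing values
// for the next hop, so leaf/converging edges are not dropped from the output.
public class BfsDedupSketch {
  // edges stands in for the lookup-index query: toField value -> matching docs
  static List<Map<String, Object>> bfs(
      Map<String, List<Map<String, Object>>> edges, Object start, int maxDepth) {
    List<Map<String, Object>> results = new ArrayList<>();
    Set<Object> visited = new HashSet<>(List.of(start)); // seed with the start value
    Deque<Object> frontier = new ArrayDeque<>(List.of(start));
    for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
      Deque<Object> next = new ArrayDeque<>();
      for (Object value : frontier) {
        for (Map<String, Object> doc : edges.getOrDefault(value, List.of())) {
          results.add(doc); // matched documents are always returned
          Object from = doc.get("fromField");
          if (from != null && visited.add(from)) {
            next.add(from); // only unvisited values expand the frontier
          }
        }
      }
      frontier = next;
    }
    return results;
  }

  public static void main(String[] args) {
    Map<String, List<Map<String, Object>>> edges = new HashMap<>();
    edges.put("a", List.of(Map.of("name", "doc1", "fromField", "b")));
    edges.put("b", List.of(Map.of("name", "doc2", "fromField", "a"))); // edge back to "a"
    // doc2 is still returned even though its fromField ("a") was already visited
    System.out.println(bfs(edges, "a", 3).size()); // 2
  }
}
```

With a query-level `mustNot` on visited values, `doc2` above would be filtered out entirely; with dedup only at re-queue time, it appears in the results while the traversal still terminates.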


@qianheng-aws qianheng-aws merged commit e30b2f8 into main Feb 25, 2026
135 of 138 checks passed

Labels: feature, PPL (Piped processing language)

Projects: None yet

Development: successfully merging this pull request may close [RFC] Implement a graphLookup command to demonstrate multi-hop queries

3 participants