[Feature] Support bi-directional graph traversal command graphlookup #5138
qianheng-aws merged 10 commits into main from graphlookup
Conversation
* succeed to graph lookup single index
* Implement graph lookup RelNode
* Refine - remove depth from BFS node
* Support bidirectional mode
* Support anonymize graph lookup
* Fix UT
* Add IT
* Add limitation for GraphLookup
* Simplify GraphLookup param names
* Refine IT
* Support value of list; Support retrieve circle edges also
* Add documentation for graph lookup
* Don't include loop edges
* Refine code
* spotlessApply
* Refine code
* Filter visited nodes in search query
* Fix UT
* Add parameter supportArray for handling fields with array values
* Remove unused code
* Support batch mode
* Close lookup table scan
* refine code
* Add param usePIT

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
* Struct return array value instead of string
* Support filter in GraphLookup
* Fix IT
* Add experimental tag in doc

Signed-off-by: Heng Qian <qianheng@amazon.com>
📝 Walkthrough (Summary by CodeRabbit / Release Notes)
Sequence Diagram
```mermaid
sequenceDiagram
    participant User as User/Client
    participant Parser as PPL Parser
    participant AstBuilder as AstBuilder
    participant Analyzer as Analyzer
    participant CalciteRel as CalciteRelNodeVisitor
    participant Planner as Planner/Rules
    participant Engine as OpenSearchExecutionEngine
    participant ES as OpenSearch
    User->>Parser: graphLookup command (PPL)
    Parser->>AstBuilder: parse graphLookupCommand context
    AstBuilder->>AstBuilder: validate fromField, toField required
    AstBuilder->>Parser: return GraphLookup AST node
    Parser->>Analyzer: visit GraphLookup node
    Analyzer->>Analyzer: delegate to Calcite-only path
    Analyzer->>CalciteRel: LogicalPlan result
    CalciteRel->>CalciteRel: extract graph parameters
    CalciteRel->>CalciteRel: materialize source & lookup tables
    CalciteRel->>CalciteRel: construct LogicalGraphLookup RelNode
    CalciteRel->>Planner: LogicalGraphLookup plan
    Planner->>Planner: match EnumerableGraphLookupRule
    Planner->>Planner: extract OpenSearchIndex from lookup
    Planner->>Planner: convert to CalciteEnumerableGraphLookup
    Planner->>Engine: optimized executable plan
    Engine->>Engine: scan source rows
    Engine->>ES: query frontier values from lookup
    ES-->>Engine: lookup results
    Engine->>Engine: BFS traversal iteration
    Engine->>ES: query next frontier (visited tracking)
    ES-->>Engine: next level results
    Engine->>Engine: accumulate results / apply batchMode
    Engine->>User: return enriched rows with graph results
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 failed (1 warning)
Signed-off-by: Lantao Jin <ltjin@amazon.com>
bfd31e0 to 5eadcc5
7d49e0d to 5eadcc5
Actionable comments posted: 11
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java (1)
14-23: ⚠️ Potential issue | 🔴 Critical
StructImpl attributes should be recursively converted to handle nested structures.
`StructImpl` may contain nested maps, lists, or geo points. The current implementation at line 250 returns raw attributes without passing them through `processValue`, unlike the recursive handling for Maps (line 245) and Lists (line 256). This can leak unconverted types. Replace the current logic with iteration and recursive conversion:
🛠️ Proposed fix
```diff
 if (value instanceof StructImpl) {
-  return Arrays.asList(((StructImpl) value).getAttributes());
+  Object[] attributes = ((StructImpl) value).getAttributes();
+  List<Object> converted = new ArrayList<>(attributes.length);
+  for (Object attribute : attributes) {
+    converted.add(processValue(attribute));
+  }
+  return converted;
 }
```
Note: `ArrayList` is already imported; only the logic needs updating.
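A standalone sketch of the recursive conversion the comment asks for, using a hypothetical `Struct` record in place of Calcite's `StructImpl` (this is an illustration of the pattern, not the plugin's actual code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ValueConverter {
  // Hypothetical stand-in for Calcite's StructImpl: a struct exposing raw attributes.
  record Struct(Object[] attributes) {}

  /** Recursively convert structs, maps, and lists so no raw attribute leaks through. */
  static Object processValue(Object value) {
    if (value instanceof Struct struct) {
      List<Object> converted = new ArrayList<>(struct.attributes().length);
      for (Object attribute : struct.attributes()) {
        converted.add(processValue(attribute)); // recurse, as the review suggests
      }
      return converted;
    }
    if (value instanceof Map<?, ?> map) {
      Map<Object, Object> converted = new LinkedHashMap<>();
      map.forEach((k, v) -> converted.put(k, processValue(v)));
      return converted;
    }
    if (value instanceof List<?> list) {
      List<Object> converted = new ArrayList<>(list.size());
      for (Object item : list) {
        converted.add(processValue(item));
      }
      return converted;
    }
    return value; // scalars pass through unchanged
  }
}
```

The key point is that the struct branch recurses over each attribute instead of returning the raw array, mirroring the existing Map/List handling.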
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java`:
- Around line 550-553: Add missing JavaDoc to the public method
Analyzer.visitGraphLookup(GraphLookup node, AnalysisContext context) documenting
`@param` node, `@param` context, `@return` (LogicalPlan), and `@throws` for the
Calcite-only exception thrown by getOnlyForCalciteException("graphlookup"); then
add a unit test in the AnalyzerTest (or equivalent test class) that calls
visitGraphLookup with a mocked/constructed GraphLookup and AnalysisContext and
asserts that the getOnlyForCalciteException is thrown (verify exception type and
message contains "graphlookup"); ensure the test and JavaDoc are committed
together.
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2534-2536: The hard-coded builder.limit(0, 100) in
CalciteRelNodeVisitor (using context.relBuilder / builder) unconditionally
truncates graphLookup sources; remove this unconditional limit or replace it
with a configurable cap: delete the builder.limit(0, 100) call (or wrap it
behind a configuration check), add a new planner/config option (e.g.,
graphLookupSourceRowLimit) accessed from the visitor/context, and only apply
builder.limit(0, limit) when that config is set to a positive value so the
default preserves full source rows.
In
`@core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java`:
- Around line 129-146: The public method copy() in class LogicalGraphLookup
lacks the required JavaDoc and there are no accompanying unit tests for this new
logical node; add a JavaDoc block above copy() that documents the method purpose
and includes `@param` traitSet, `@param` inputs, `@return` RelNode and any `@throws` (if
applicable), then add unit tests for LogicalGraphLookup exercising copy(),
equality, and basic plan behavior (e.g., when given two inputs and various flags
like bidirectional/supportArray/batchMode) in the same commit to satisfy core
guidelines.
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java`:
- Around line 115-152: Add boundary and invalid-parameter tests for graphLookup
by creating new test methods alongside testEmployeeHierarchyWithMaxDepth that
call executeQuery with invalid/edge values and assert proper validation errors:
(1) maxDepth=0 to verify minimum depth rejection, (2) negative maxDepth (e.g.,
-1) to verify numeric validation, (3) invalid direction value to verify
enum/domain validation, and (4) queries missing required fields (e.g., omit
startField or fromField) to verify required-field validation; use the same
helpers (executeQuery, verifySchema, verifyDataRows) where success is expected
and assert the returned JSONObject contains the appropriate error
messages/status when validation should fail.
- Around line 54-82: Add explicit NULL-input tests by creating two new test
methods in CalcitePPLGraphLookupIT: testEmployeeHierarchyNullStartValue and
testEmployeeHierarchyNullFieldParameters. For
testEmployeeHierarchyNullStartValue call executeQuery with the same graphLookup
but ensure it exercises documents whose startField value is NULL (use reportsTo
which already has a null row) and verify the reportingHierarchy for that row is
an empty array via verifyDataRows (use the existing rows/verify helpers). For
testEmployeeHierarchyNullFieldParameters call executeQuery with
invalid/empty/missing startField/fromField/toField names (e.g., empty string or
a non-existent field) and assert the current behavior: if the implementation
throws an exception assert that exception is raised, otherwise assert the query
returns rows with empty reportingHierarchy; reference the existing
testEmployeeHierarchyBasicTraversal, executeQuery, verifyDataRows, and schema
helpers to implement these checks.
- Around line 239-262: The graphLookup invocation in
testAirportConnectionsWithDepthField uses array-valued fields
(startField=connects and fromField=connects) but missing the supportArray=true
flag; update the query built in testAirportConnectionsWithDepthField (the
graphLookup operator call) to include supportArray=true so the graphLookup
optimization/visited-filter behavior is correct and consistent with other tests
that handle array-valued fields.
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 371-429: The loop increments currentLevelDepth and unconditionally
compares it to graphLookup.maxDepth, but graphLookup.maxDepth uses -1 to mean
"unlimited", so change the break condition in CalciteEnumerableGraphLookup (the
BFS loop using currentLevelDepth and graphLookup.maxDepth) to only perform the
comparison when graphLookup.maxDepth is non-negative (e.g., if
(graphLookup.maxDepth >= 0 && ++currentLevelDepth > graphLookup.maxDepth)
break), ensuring negative maxDepth skips the depth check and preserves existing
increment semantics.
- Around line 393-425: The loop over forwardResults in
CalciteEnumerableGraphLookup currently only adds rows when nextValues is
non-empty, dropping matched rows that don't expand traversal; change it to
always add the current row to results (respecting graphLookup.depthField by
appending currentLevelDepth when needed) and then use nextValues solely to
enqueue new nodes and mark visitedNodes — i.e., move the results.add(...) /
rowWithDepth logic out of the if (!nextValues.isEmpty()) block so rows are
unconditionally preserved while still only offering non-null, unvisited
nextValues to queue.offer and visitedNodes.add.
In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`:
- Around line 1498-1555: The visitGraphLookupCommand method currently only
enforces fromField/toField but not startField, so add a validation check before
building the GraphLookup plan: if startField is null, throw a
SemanticCheckException with a clear message (e.g., "startField must be specified
for graphLookup"); update the same validation block where fromField/toField are
checked in visitGraphLookupCommand to include startField to ensure GraphLookup
(or any later use of startField) cannot receive a null value.
In
`@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java`:
- Around line 132-148: Add null, boundary, and error-condition test cases to
CalcitePPLGraphLookupTest alongside the existing testGraphLookupBidirectional:
create new `@Test` methods (e.g., testGraphLookupNullStartField,
testGraphLookupMissingRequiredField, testGraphLookupInvalidDirection,
testGraphLookupMaxDepthZero) that call getRelNode with crafted PPL strings
(null/empty startField, omit required fields, invalid direction value, explicit
maxDepth=0) and assert expected failures or specific logical plans using
verifyLogical or assertThrows as appropriate; reuse the existing helper methods
getRelNode and verifyLogical to validate correct error handling and boundary
behavior.
- Around line 150-170: The current CalcitePPLGraphLookupTest.config(...) only
sets up the schema and logical plan inputs but lacks assertions for SQL
generation and optimization; add test cases that exercise Calcite's SQL
generation and optimizer for the graphLookup scenario by invoking the
planner/validator to produce the generated SQL and the optimized plan from the
same setup (use the test class CalcitePPLGraphLookupTest and its config method
to build the SchemaPlus with the "employee" EmployeeTable), then assert the
generated SQL contains the expected graphLookup/recursive patterns and assert
the optimized plan (planner output or Programs.heuristicJoinOrder result)
contains expected optimization transformations (e.g., join order or pushed
predicates); ensure these assertions are added alongside existing logical-plan
checks so the test covers SQL generation and optimization paths.
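Two of the prompts above concern the BFS loop in CalciteEnumerableGraphLookup: the depth cap must treat a negative maxDepth as unlimited, and matched rows must be recorded even when they add no new frontier values. A simplified, self-contained sketch of both corrections (names are hypothetical, not the plugin's code; the depth semantics, where `maxDepth=0` returns only direct connections, follow the PR's documentation):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class BfsSketch {
  /**
   * BFS over an adjacency map. maxDepth == -1 means unlimited; maxDepth == 0
   * expands only the direct connections of the start value.
   */
  static List<String> traverse(Map<String, List<String>> edges, String start, int maxDepth) {
    List<String> results = new ArrayList<>();
    Set<String> visited = new HashSet<>();
    Queue<String> queue = new ArrayDeque<>();
    visited.add(start);
    queue.offer(start);
    int currentLevelDepth = 0;
    while (!queue.isEmpty()) {
      // Fix 1: only enforce the cap when maxDepth is non-negative, so -1 skips the check.
      if (maxDepth >= 0 && currentLevelDepth > maxDepth) {
        break;
      }
      currentLevelDepth++;
      int levelSize = queue.size();
      for (int i = 0; i < levelSize; i++) {
        String node = queue.poll();
        for (String next : edges.getOrDefault(node, List.of())) {
          if (visited.add(next)) {
            // Fix 2: record the matched value unconditionally on discovery; whether it
            // later yields new frontier values only affects enqueuing, not the result set.
            results.add(next);
            queue.offer(next);
          }
        }
      }
    }
    return results;
  }
}
```

On a chain a -> b -> c -> d, `traverse(edges, "a", 0)` yields only the direct connection, while `traverse(edges, "a", -1)` walks the whole chain.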
🟡 Minor comments (11)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java-1816-1822 (1)
1816-1822: ⚠️ Potential issue | 🟡 Minor
Add a null guard now that LiteralExpression is public.
With a public constructor, passing `null` will fail later in less obvious places; fail fast here (line 1820).
🔧 Suggested fix
```diff
 public LiteralExpression(RexLiteral literal) {
-  this.literal = literal;
+  this.literal = requireNonNull(literal, "literal");
 }
```
integ-test/src/test/resources/indexDefinitions/graph_travelers_index_mapping.json-1-12 (1)
1-12: ⚠️ Potential issue | 🟡 Minor
Address schema mismatch and incomplete edge-case coverage in test data.
The test data file (graph_travelers.json) contains an extra `index` field not defined in the mapping. Additionally, while the data correctly includes null values for `nearestAirport` (3 of 6 documents), it lacks multi-valued entries to test array handling. Either update the mapping to include the `index` field or remove it from the test data, and add at least one document with an array value for `nearestAirport` to exercise boundary conditions.
integ-test/src/test/resources/graph_travelers.json-1-6 (1)
1-6: ⚠️ Potential issue | 🟡 Minor
Add an edge-case traveler for missing/unknown nearestAirport.
This dataset only includes valid single values, so null/missing start values aren't exercised. Consider adding a record that lacks `nearestAirport` (or sets it to null) to cover boundary behavior.
➕ Suggested NDJSON addition
```diff
 {"index":{"_id":"3"}}
 {"name":"Jeff","nearestAirport":"BOS"}
+{"index":{"_id":"4"}}
+{"name":"Ava","nearestAirport":null}
```
As per coding guidelines, ensure test data covers edge cases and boundary conditions.
core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java-480-482 (1)
480-482: ⚠️ Potential issue | 🟡 Minor
Add JavaDoc for the new public visitor method.
The new public `visitGraphLookup` lacks the required JavaDoc with `@param`/`@return`.
✍️ Suggested JavaDoc
```diff
+  /**
+   * Visit GraphLookup node.
+   *
+   * @param node GraphLookup node
+   * @param context Context
+   * @return Return Type
+   */
   public T visitGraphLookup(GraphLookup node, C context) {
     return visitChildren(node, context);
   }
```
As per coding guidelines, public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`.
1-10: ⚠️ Potential issue | 🟡 Minor
Add a leaf airport with an empty `connects` list.
All airports currently have at least one connection; adding a no-outbound node helps test traversal boundaries.
➕ Suggested NDJSON addition
```diff
 {"index":{"_id":"5"}}
 {"airport":"LHR","connects":["PWM"]}
+{"index":{"_id":"6"}}
+{"airport":"SFO","connects":[]}
```
As per coding guidelines, ensure test data covers edge cases and boundary conditions.
integ-test/src/test/resources/graph_employees.json-1-12 (1)
1-12: ⚠️ Potential issue | 🟡 Minor
Add a cyclic/self-loop record to cover visited-node edge cases.
The dataset is a simple chain; adding a self-loop (or a small cycle) will exercise visited-node/loop-edge handling in graph traversal tests.
As per coding guidelines: verify test data covers edge cases and boundary conditions; verify test data is realistic and representative.
💡 Suggested test data addition
```diff
 @@
 {"index":{"_id":"6"}}
 {"id":6,"name":"Dan","reportsTo":"Andrew"}
+{"index":{"_id":"7"}}
+{"id":7,"name":"Loop","reportsTo":"Loop"}
```
docs/user/ppl/cmd/graphlookup.md-33-36 (1)
33-36: ⚠️ Potential issue | 🟡 Minor
Fix typos/grammar in the parameter descriptions.
Spelling and hyphenation issues reduce clarity in the parameter table.
As per coding guidelines: check for consistent formatting and style.
✍️ Suggested edits
```diff
-| `startField=<startField>` | Required | The field in the source documents whose value is used to start the recursive search. The value of this field is matched against `toField` in the lookup index. We support both single value and array values as starting points. |
+| `startField=<startField>` | Required | The field in the source documents whose value is used to start the recursive search. The value of this field is matched against `toField` in the lookup index. We support both single-value and array values as starting points. |
-| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth of hops. Default is `0`. A value of `0` means only the direct connections to the statr values are returned. A value of `1` means 1 hop connections (initial match plus one recursive step), and so on. |
+| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth of hops. Default is `0`. A value of `0` means only the direct connections to the start values are returned. A value of `1` means one-hop connections (initial match plus one recursive step), and so on. |
```
core/src/main/java/org/opensearch/sql/calcite/plan/rel/GraphLookup.java-109-190 (1)
109-190: ⚠️ Potential issue | 🟡 Minor
Add JavaDoc tags for GraphLookup public APIs.
`getSource`/`getLookup`/`copy`/`estimateRowCount`/`explainTerms` are public but lack the `@param`/`@return`/`@throws` tags required by core guidelines.
As per coding guidelines: public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`.
ppl/src/main/antlr/OpenSearchPPLParser.g4-94-95 (1)
94-95: ⚠️ Potential issue | 🟡 Minor
Add missing graphLookup option keywords to searchableKeyWord.
START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT are defined as lexer tokens and used in graphLookupOption (lines 636–651) but are not added to searchableKeyWord, unlike FROM_FIELD and TO_FIELD (lines 1716–1723). Since the lexer tokenizes them as keywords, field names with these values will fail or require quoting. Add all four to searchableKeyWord for consistency.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java-392-422 (1)
392-422: ⚠️ Potential issue | 🟡 Minor
Update Javadoc to match asserted behavior.
The comment says `allConnections` is empty due to array type, but the assertions expect non-empty results. Please align the comment with the actual expectation.
📝 Suggested comment update
```diff
-  /**
-   * Test 12: Bidirectional airport connections for ORD. Note: Currently returns empty
-   * allConnections array because the connects field is an array type.
-   */
+  /**
+   * Test 12: Bidirectional airport connections for ORD.
+   */
```
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java-666-700 (1)
666-700: ⚠️ Potential issue | 🟡 Minor
Fix maxDepth comment mismatch.
The comment says `maxDepth=1`, but the query uses `maxDepth=3`. Please update the comment to reflect the actual parameter.
📝 Suggested comment update
```diff
- * travelers' nearest airports: JFK (Dev, Eliot), BOS (Jeff) Unified BFS from {JFK, BOS} with
- * maxDepth=1 finds connected airports.
+ * travelers' nearest airports: JFK (Dev, Eliot), BOS (Jeff) Unified BFS from {JFK, BOS} with
+ * maxDepth=3 finds connected airports.
```
🧹 Nitpick comments (4)
core/src/main/java/org/opensearch/sql/ast/tree/GraphLookup.java (1)
29-83: Prefer AST immutability by removing class-level `@Setter`.
`attach` already mutates the child; exposing `setChild` weakens immutability without clear need.
As per coding guidelines: `core/src/main/java/org/opensearch/sql/ast/**/*.java` requires AST nodes to be immutable where possible.
♻️ Suggested change
```diff
-import lombok.Setter;
 @@
-@Setter
 @ToString
```
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)
646-712: Add anonymizer coverage for Mongo-style graphLookup and `usePIT`.
The test only exercises `fromField`/`toField` syntax. The alternate `startWith`/`connectFromField`/`connectToField` form and the `usePIT` flag aren't covered yet.
✅ Suggested test additions
```diff
     // graphLookup with compound filter
     assertEquals(
         "source=table | graphlookup table fromField=identifier toField=identifier"
             + " direction=uni filter=(identifier = *** and identifier > ***) as identifier",
         anonymize(
             "source=t | graphLookup employees fromField=manager toField=name"
                 + " filter=(status = 'active' AND id > 2) as reportingHierarchy"));
+
+    // graphLookup with Mongo-style syntax
+    assertEquals(
+        "source=table | graphlookup table startWith=*** connectFromField=identifier"
+            + " connectToField=identifier direction=uni as identifier",
+        anonymize(
+            "source=t | graphLookup employees startWith='CEO' connectFromField=manager"
+                + " connectToField=name as reportingHierarchy"));
+
+    // graphLookup with usePIT
+    assertEquals(
+        "source=table | graphlookup table fromField=identifier toField=identifier"
+            + " direction=uni usePIT=true as identifier",
+        anonymize(
+            "source=t | graphLookup employees fromField=manager toField=name"
+                + " usePIT=true as reportingHierarchy"));
```
ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java (1)
1648-1718: Add parser coverage for Mongo-style syntax and optional flags.
The test validates required fields and some options, but doesn't cover the `startWith`/`connectFromField`/`connectToField` variant (or flags like `supportArray`, `batchMode`, `usePIT`). Adding at least one positive case for that syntax would better exercise the parser path.
As per coding guidelines, include edge cases and boundary conditions in parser and grammar tests.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
2530-2589: Extract visitGraphLookup into helper methods.
This method is now >50 lines; splitting parameter extraction, source/lookup materialization, and RelNode construction would keep the visitor readable.
As per coding guidelines: core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: Flag methods >50 lines - this file is known to be hard to read; Suggest extracting complex logic into helper methods.
```java
RelBuilder builder = context.relBuilder;
// TODO: Limit the number of source rows to 100 for now, make it configurable.
builder.limit(0, 100);
```
Remove the hard-coded 100-row cap on graphLookup sources.
The unconditional limit truncates the start set and will silently drop traversal results once there are more than 100 source rows. Please make this configurable or remove it to preserve correctness.
🛠️ Suggested change
```diff
- // TODO: Limit the number of source rows to 100 for now, make it configurable.
- builder.limit(0, 100);
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 RelBuilder builder = context.relBuilder;
-// TODO: Limit the number of source rows to 100 for now, make it configurable.
-builder.limit(0, 100);
```
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 2534 - 2536, The hard-coded builder.limit(0, 100) in
CalciteRelNodeVisitor (using context.relBuilder / builder) unconditionally
truncates graphLookup sources; remove this unconditional limit or replace it
with a configurable cap: delete the builder.limit(0, 100) call (or wrap it
behind a configuration check), add a new planner/config option (e.g.,
graphLookupSourceRowLimit) accessed from the visitor/context, and only apply
builder.limit(0, limit) when that config is set to a positive value so the
default preserves full source rows.
core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java
...h/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
...h/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java
@penghuo can you review and merge this?
docs/user/ppl/index.md (Outdated)
| [addcoltotals command](cmd/addcoltotals.md) | 3.5 | stable (since 3.5) | Adds column values and appends a totals row. |
| [transpose command](cmd/transpose.md) | 3.5 | stable (since 3.5) | Transpose rows to columns. |
| [mvcombine command](cmd/mvcombine.md) | 3.5 | stable (since 3.4) | Combines values of a specified field across rows identical on all other fields. |
| [graphlookup command](cmd/graphlookup.md) | 3.5 | experimental (since 3.5) | Performs recursive graph traversal on a collection using a BFS algorithm. |
The `graphLookup` command has the following syntax:
```syntax
graphLookup <lookupIndex> startField=<startField> fromField=<fromField> toField=<toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [direction=(uni | bi)] [supportArray=(true | false)] [batchMode=(true | false)] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>
```
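As one illustration of the syntax, the index and field names below are borrowed from this PR's test fixtures (`reportsTo`, `name`, `reportingHierarchy`); the exact option values are illustrative, not taken from a specific test:

```syntax
source = graph_employees
| graphLookup graph_employees startField=reportsTo fromField=reportsTo toField=name maxDepth=3 direction=uni as reportingHierarchy
```

This starts from each document's `reportsTo` value, matches it against `name` in the lookup index, and recursively follows `reportsTo` up to three hops, emitting the traversal result into `reportingHierarchy`.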
```java
rows(
    "JFK",
    List.of("BOS", "ORD"),
    List.of(List.of("JFK", List.of("BOS", "ORD")), List.of("BOS", List.of("JFK", "PWM"))));
```
Corrected expectation would be:
```java
rows("JFK", List.of("BOS", "ORD"),
    List.of(
        List.of("JFK", List.of("BOS", "ORD")),
        List.of("BOS", List.of("JFK", "PWM")),
        List.of("ORD", List.of("JFK"))))
```
The point is: ORD is discovered by the BFS at Level 1, but then line 408's `if (!nextValues.isEmpty())` drops it because ORD's only connection (JFK) was already visited. Is that dropping intentional?
Yes, it's intentional not to include loop edges in the result in the current implementation. The edge ORD -> JFK is a loop edge, which won't contribute to discovering new vertices. Actually this edge should be dropped in the search query, before line 408.
There are some reasons for doing that:
- To exclude loop edges, we can add another filter `target not in visited nodes` directly in our BFS search query (lines 462 & 474). It will be more efficient, and that somehow explains why we are faster than MongoDB.
- If we really want to include loop edges in the results, we also have to distinguish them from duplicated edges based on a unique id like `doc_id`.
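The optimization described here amounts to set subtraction over the discovered values before the next search is issued. A minimal stand-in (the real implementation pushes this filter into the OpenSearch query itself, which is what makes it efficient):

```java
import java.util.HashSet;
import java.util.Set;

public class FrontierFilter {
  /**
   * Drop already-visited values from the next BFS frontier before querying, so loop
   * edges (like ORD -> JFK above) are excluded server-side. Simplified stand-in for
   * the query-level "target not in visited nodes" filter described in the thread.
   */
  static Set<String> nextFrontier(Set<String> discovered, Set<String> visited) {
    Set<String> frontier = new HashSet<>(discovered);
    frontier.removeAll(visited);
    return frontier;
  }
}
```

Because visited values never re-enter the frontier, the traversal terminates even on cyclic graphs without a separate cycle check.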
```java
// Get the source enumerator
if (graphLookup.getSource() instanceof Scannable scannable) {
  Enumerable<?> sourceEnum = scannable.scan();
  this.sourceEnumerator = (Enumerator<@Nullable Object>) sourceEnum.enumerator();
```
If an exception is thrown after line 209 but before the constructor finishes, the object is never fully constructed, so close() is never called by the caller. The sourceEnumerator leaks.
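A general pattern for the leak described here: acquire the resource, then wrap the remainder of the constructor so a failure closes what was already acquired. The sketch below is hypothetical and independent of the actual class; `remainingSetup` stands in for the rest of the real constructor body:

```java
public class GuardedConstruction {
  interface Enumerator extends AutoCloseable {
    @Override
    void close(); // narrow AutoCloseable's throws clause for the sketch
  }

  /** Hypothetical sketch: close already-acquired resources if later setup throws. */
  static final class LookupScan implements AutoCloseable {
    private final Enumerator sourceEnumerator;

    LookupScan(Enumerator source, Runnable remainingSetup) {
      this.sourceEnumerator = source;
      try {
        // Stands in for the rest of the real constructor (PIT creation, wiring, ...).
        remainingSetup.run();
      } catch (RuntimeException e) {
        // Without this, a failure here leaks the enumerator: the caller never gets
        // a reference to the object, so it can never invoke close().
        source.close();
        throw e;
      }
    }

    @Override
    public void close() {
      sourceEnumerator.close();
    }
  }
}
```

The same shape works for any constructor that acquires a closable resource before doing further fallible work.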
Signed-off-by: Heng Qian <qianheng@amazon.com>
…aphlookup
# Conflicts:
#	docs/user/ppl/index.md
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
…hlookup
# Conflicts:
#	core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
#	core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
#	docs/user/ppl/index.md
#	ppl/src/main/antlr/OpenSearchPPLParser.g4
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
scripts/build.sh (1)
71-81: ⚠️ Potential issue | 🟠 Major
Hardcoded FIPS-140-3 with no override mechanism will break Maven publishing in strict FIPS environments.
Two interconnected concerns:
1. No CLI override: Every other tunable (VERSION, QUALIFIER, SNAPSHOT, OUTPUT) is exposed as a `getopts` flag. `-Pcrypto.standard=FIPS-140-3` is hardcoded on all three Gradle invocations with no way to disable or customize it, preventing non-FIPS builds or selective use.
2. Gradle Maven publish incompatibility: Gradle publishes MD5 and SHA-1 checksum sidecars by default (no supported knob to disable them). When the JVM is configured for FIPS-140-3 with a strict FIPS provider (e.g., Oracle's JVM FIPS or Bouncy Castle FIPS), MD5 is disabled in approved mode. This causes `publishToMavenLocal` and `publishPluginZipPublicationToZipStagingRepository` on lines 80–81 to fail when attempting checksum generation, a documented issue in FIPS-enabled CI environments.
Make crypto standard configurable with an optional flag and default to no crypto enforcement:
```diff
 echo -e "-o OUTPUT\t[Optional] Output path, default is 'artifacts'."
+echo -e "-c CRYPTO_STANDARD\t[Optional] Gradle crypto standard (e.g. FIPS-140-3). Omitted by default."
 echo -e "-h help"

-while getopts ":h:v:q:s:o:p:a:" arg; do
+while getopts ":h:v:q:s:o:p:a:c:" arg; do
   case $arg in
+    c)
+      CRYPTO_STANDARD=$OPTARG
+      ;;

+[[ ! -z "$CRYPTO_STANDARD" ]] && CRYPTO_FLAG="-Pcrypto.standard=$CRYPTO_STANDARD"

-./gradlew assemble ... -Pcrypto.standard=FIPS-140-3
+./gradlew assemble ... $CRYPTO_FLAG

-./gradlew publishToMavenLocal ... -Pcrypto.standard=FIPS-140-3
+./gradlew publishToMavenLocal ... $CRYPTO_FLAG

-./gradlew publishPluginZipPublicationToZipStagingRepository ... -Pcrypto.standard=FIPS-140-3
+./gradlew publishPluginZipPublicationToZipStagingRepository ... $CRYPTO_FLAG
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/build.sh` around lines 71 - 81, The script hardcodes -Pcrypto.standard=FIPS-140-3 into all Gradle invocations; add a CLI flag (e.g., -c/--crypto or CRYPTO_STANDARD) with a default of empty and update getopts to populate CRYPTO_STANDARD, then only append "-Pcrypto.standard=$CRYPTO_STANDARD" to the ./gradlew commands (assemble, publishToMavenLocal, publishPluginZipPublicationToZipStagingRepository) if CRYPTO_STANDARD is non-empty so callers can disable or change the crypto standard and avoid FIPS-related checksum failures.
core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java (1)
286-293: ⚠️ Potential issue | 🟡 Minor

Fix `IndexOutOfBoundsException` in `resolveFieldAccess` when accessing remaining path parts.

Line 291 calculates the count of remaining path parts incorrectly. The third parameter to `joinParts` should account for the `start` offset:

Current vs. corrected code
```diff
- String itemName = joinParts(parts, length + start, parts.size() - length);
+ String itemName = joinParts(parts, length + start, parts.size() - start - length);
```

When `start = 1` and `parts = ["alias", "field", "subfield"]` with only `length = 1` matched, the current code calls `joinParts(parts, 2, 2)`, which attempts to access `parts[3]` — out of bounds. The corrected count `parts.size() - start - length` ensures the range stays within valid indices.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java` around lines 286 - 293, In resolveFieldAccess, the call to joinParts computes the count of remaining parts incorrectly causing IndexOutOfBoundsException; change the third argument so it uses the remaining parts count accounting for start (i.e. replace parts.size() - length with parts.size() - start - length) so itemName = joinParts(parts, length + start, parts.size() - start - length) before calling createItemAccess(field, itemName, context).

docs/category.json (1)
10-69: ⚠️ Potential issue | 🟡 Minor

Add `graphlookup.md` to the `ppl_cli_calcite` array in `docs/category.json`.

The file `docs/user/ppl/cmd/graphlookup.md` exists but is not registered in `docs/category.json`. It should be added to the `ppl_cli_calcite` array to be included in the doc integration test suite. Insert it between `grok.md` and `head.md`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/category.json` around lines 10 - 69, Add "user/ppl/cmd/graphlookup.md" to the ppl_cli_calcite array in docs/category.json (the array named ppl_cli_calcite) so the new entry appears between "user/ppl/cmd/grok.md" and "user/ppl/cmd/head.md"; ensure you insert it as a proper JSON string with correct commas and preserve valid JSON formatting.
♻️ Duplicate comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java (2)
433-434: ⚠️ Potential issue | 🟠 Major

Handle `maxDepth = -1` (unlimited) correctly.
The current break condition stops after the first level when `maxDepth` is negative. Skip the depth check for negative values.

🛠️ Suggested change
```diff
- if (++currentLevelDepth > graphLookup.maxDepth) break;
+ currentLevelDepth++;
+ if (graphLookup.maxDepth >= 0 && currentLevelDepth > graphLookup.maxDepth) {
+   break;
+ }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java` around lines 433 - 434, The loop increments currentLevelDepth and compares it to graphLookup.maxDepth, but when maxDepth is negative (e.g., -1 for unlimited) the current check (++currentLevelDepth > graphLookup.maxDepth) will always break; update the condition in CalciteEnumerableGraphLookup so the depth check is skipped when graphLookup.maxDepth < 0 — e.g., only increment/compare when graphLookup.maxDepth >= 0 (or guard the comparison with graphLookup.maxDepth >= 0) so unlimited depth is handled correctly by allowing the loop to continue.
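The suggested guard can be sanity-checked in isolation. The sketch below is not the plugin code — the graph shape, level bookkeeping, and names are invented for the example — but it shows the same rule: a negative `maxDepth` means "unlimited", so the depth comparison is skipped entirely rather than tripping on `depth > -1`.

```java
import java.util.*;

// Minimal BFS sketch: maxDepth < 0 means unlimited traversal.
public class BfsDepthSketch {
  public static List<String> traverse(
      Map<String, List<String>> edges, String start, int maxDepth) {
    List<String> reached = new ArrayList<>();
    Set<String> visited = new HashSet<>(List.of(start));
    Deque<String> queue = new ArrayDeque<>(List.of(start));
    int currentLevelDepth = 0;
    while (!queue.isEmpty()) {
      currentLevelDepth++;
      // Only enforce the limit when maxDepth is non-negative.
      if (maxDepth >= 0 && currentLevelDepth > maxDepth) break;
      int levelSize = queue.size();
      for (int i = 0; i < levelSize; i++) {
        String node = queue.poll();
        for (String next : edges.getOrDefault(node, List.of())) {
          if (visited.add(next)) {
            reached.add(next);
            queue.offer(next);
          }
        }
      }
    }
    return reached;
  }

  public static void main(String[] args) {
    Map<String, List<String>> g =
        Map.of("a", List.of("b"), "b", List.of("c"), "c", List.of("d"));
    System.out.println(traverse(g, "a", 1));  // [b]
    System.out.println(traverse(g, "a", -1)); // [b, c, d]
  }
}
```

With the unguarded `++currentLevelDepth > maxDepth` check, the `-1` case would break before (or right after) the first level instead of walking the whole chain.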
412-430: ⚠️ Potential issue | 🟠 Major

Include matched rows even when traversal doesn't expand.

Rows are only added when `nextValues` is non-empty, which drops reachable docs when multiple starts or converging edges yield already-visited nodes. Add the row unconditionally; use `nextValues` only for queueing.

🛠️ Suggested change
```diff
- // Add row to results if the nextValues is not empty
- if (!nextValues.isEmpty()) {
-   if (graphLookup.depthField != null) {
-     Object[] rowWithDepth = new Object[rowArray.length + 1];
-     System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
-     rowWithDepth[rowArray.length] = currentLevelDepth;
-     results.add(rowWithDepth);
-   } else {
-     results.add(rowArray);
-   }
-
-   // Add unvisited non-null values to queue for next level traversal
-   for (Object val : nextValues) {
-     if (val != null) {
-       visitedNodes.add(val);
-       queue.offer(val);
-     }
-   }
- }
+ // Always include the matched row; use nextValues only for expansion.
+ if (graphLookup.depthField != null) {
+   Object[] rowWithDepth = new Object[rowArray.length + 1];
+   System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
+   rowWithDepth[rowArray.length] = currentLevelDepth;
+   results.add(rowWithDepth);
+ } else {
+   results.add(rowArray);
+ }
+
+ // Add unvisited non-null values to queue for next level traversal
+ for (Object val : nextValues) {
+   if (val != null) {
+     visitedNodes.add(val);
+     queue.offer(val);
+   }
+ }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java` around lines 412 - 430, CalciteEnumerableGraphLookup currently only adds rows to results when nextValues is non-empty, which drops matched rows when traversal doesn't expand; always add the constructed row (use graphLookup.depthField to decide between rowArray and rowWithDepth using rowArray/currentLevelDepth) to results unconditionally, and keep the subsequent loop that iterates over nextValues only for enqueuing unvisited non-null values into visitedNodes and queue (do not gate the results.add(...) behind nextValues.isEmpty()).

core/src/main/java/org/opensearch/sql/analysis/Analyzer.java (1)
552-565: ⚠️ Potential issue | 🟠 Major

Add required JavaDoc and unit tests for the new Calcite-only visitor methods.

These new public overrides lack JavaDoc and accompanying unit tests that assert the Calcite-only exception path (`nomv`/`mvexpand`/`graphlookup`).

As per coding guidelines: Public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`; New functions MUST have unit tests in the same commit.

📚 Example JavaDoc pattern (apply similarly to each method)

```diff
+ /**
+  * Reject nomv in the legacy analyzer (Calcite-only).
+  *
+  * @param node nomv AST node
+  * @param context analysis context
+  * @return never returns (throws)
+  * @throws RuntimeException when nomv is used in the legacy analyzer
+  */
  @Override
  public LogicalPlan visitNoMv(NoMv node, AnalysisContext context) {
    throw getOnlyForCalciteException("nomv");
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java` around lines 552 - 565, The three new public visitor overrides (visitNoMv, visitMvExpand, visitGraphLookup) are missing JavaDoc and unit tests; add JavaDoc to each method following project pattern including `@param` AnalysisContext context, `@return` LogicalPlan, and `@throws` for the Calcite-only exception, and add unit tests (in the same commit) that invoke Analyzer.visitNoMv/visitMvExpand/visitGraphLookup with a mock or real NoMv/MvExpand/GraphLookup node and assert that getOnlyForCalciteException is thrown (and that the exception message matches "nomv", "mvexpand", "graphlookup" respectively) to cover the Calcite-only path.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
2545-2551: Hard-coded 100-row limit on graphLookup source is still present.

This was flagged in a prior review. The unconditional `builder.limit(0, 100)` silently truncates the start set and will drop traversal results for larger inputs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` around lines 2545 - 2551, The visitGraphLookup method in CalciteRelNodeVisitor currently applies an unconditional hard-coded start-set limit via builder.limit(0, 100) which truncates the graph lookup input; remove this unconditional limit and instead respect a configurable setting or absence thereof (e.g., read a graph lookup start-limit from the CalcitePlanContext or a global config exposed to CalciteRelNodeVisitor such as getGraphLookupStartLimit()), and only apply builder.limit when a positive configured limit exists; update visitGraphLookup to use that config (or no limit if not configured) and remove the silent 100-row truncation.
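The conditional shape the prompt describes can be sketched in isolation. The accessor name (`graphLookupStartLimit`) and its null-means-unset semantics are assumptions for illustration, not the plugin's actual configuration API; the point is only that truncation happens when a positive limit is configured, never unconditionally.

```java
import java.util.List;
import java.util.stream.IntStream;

// Sketch: apply a start-set limit only when one is configured, replacing
// the unconditional builder.limit(0, 100) truncation.
public class StartLimitSketch {
  public static List<Integer> applyStartLimit(List<Integer> startRows, Integer configuredLimit) {
    if (configuredLimit == null || configuredLimit <= 0) {
      return startRows; // no configured limit: keep the full start set
    }
    return startRows.subList(0, Math.min(configuredLimit, startRows.size()));
  }

  public static void main(String[] args) {
    List<Integer> rows = IntStream.range(0, 250).boxed().toList();
    System.out.println(applyStartLimit(rows, null).size()); // 250 (untruncated)
    System.out.println(applyStartLimit(rows, 100).size());  // 100
  }
}
```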
🧹 Nitpick comments (13)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5060.yml (1)
15-27: Add pre-test cleanup to avoid index-already-exists flakiness.

If a prior run fails before teardown, `indices.create` will 400 and this test will abort. Consider deleting the index (ignoring 404) before creating it.

♻️ Suggested setup tweak

```diff
 setup:
   - do:
       query.settings:
         body:
           transient:
             plugins.calcite.enabled: true
+  - do:
+      indices.delete:
+        index: test_join_ad_error_5133
+        ignore: 404
   - do:
       indices.create:
         index: test_join_ad_error_5133
         body:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5060.yml` around lines 15 - 27, The test currently always calls indices.create for index "test_join_ad_error_5133", which can fail if a previous run left the index behind; add a pre-test cleanup step that calls indices.delete for "test_join_ad_error_5133" and ignores a 404 before the indices.create step so the create will not 400; place this delete call immediately before the existing indices.create block (use the same index name) and ensure the delete is configured to ignore-not-found/404 errors.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
1100-1111: Verify that the permissive `IGNORE` + `CHARACTER` overload doesn't cause unintended type-check bypasses.

The new overload at lines 1108–1111 uses `SqlTypeFamily.IGNORE` for the first operand, meaning it will match any type (e.g., `INTEGER`, `BOOLEAN`) combined with a `CHARACTER` key. Since the existing overload (lines 1100–1107) is more specific and is checked first, this acts as a catch-all fallback — which is fine for struct/object field access. However, it also silently accepts semantically invalid combinations like `INTEGER_FIELD["key"]`, deferring the failure to runtime.

If the intent is to support struct/object field access by name, consider restricting the first operand to `SqlTypeFamily.ANY` or documenting the rationale inline so future maintainers understand why `IGNORE` was chosen over a narrower family.

```bash
#!/bin/bash
# Find all usages of INTERNAL_ITEM in the codebase to understand which types are passed as the first operand
rg -n --type=java -C3 'INTERNAL_ITEM'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java` around lines 1100 - 1111, The permissive overload registered via registerOperator(INTERNAL_ITEM, SqlStdOperatorTable.ITEM, PPLTypeChecker.family(SqlTypeFamily.IGNORE, SqlTypeFamily.CHARACTER)) can match any left-hand type and silently defer invalid cases to runtime; change the first operand family from SqlTypeFamily.IGNORE to a narrower family such as SqlTypeFamily.ANY (or a more specific family representing structs/objects if available) so that type-checking fails earlier, and add an inline comment near the registerOperator call documenting why this family was chosen to prevent future regressions.

core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java (1)
110-112: Prefer JUnit 5's `assertInstanceOf` over `assertNotNull` + `assertTrue(e instanceof ...)`.

`assertNotNull(e)` is redundant: a `null` reference would already make the `instanceof` check return `false`, failing the `assertTrue`. JUnit 5's `assertInstanceOf` consolidates both checks and produces a descriptive failure message (e.g., "expected instance of X but was null").

♻️ Proposed refactor

```diff
-assertNotNull(e);
-assertTrue(e instanceof UnsupportedOperationException);
-assertTrue(e.getMessage().contains("Calcite error"));
+UnsupportedOperationException uoe = assertInstanceOf(UnsupportedOperationException.class, e);
+assertTrue(uoe.getMessage().contains("Calcite error"));
```

```diff
-assertNotNull(e);
-assertTrue(e instanceof org.opensearch.sql.exception.CalciteUnsupportedException);
-assertTrue(e.getCause() instanceof AssertionError);
-assertTrue(e.getMessage().contains("Calcite assertion failed"));
+org.opensearch.sql.exception.CalciteUnsupportedException cue =
+    assertInstanceOf(org.opensearch.sql.exception.CalciteUnsupportedException.class, e);
+assertInstanceOf(AssertionError.class, cue.getCause());
+assertTrue(cue.getMessage().contains("Calcite assertion failed"));
```

Also applies to: 139-141, 168-172, 197-201
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java` around lines 110 - 112, Replace the redundant null + instanceof checks in QueryServiceTest by using JUnit 5's assertInstanceOf: for each occurrence (e.g., the block using assertNotNull(e); assertTrue(e instanceof UnsupportedOperationException); assertTrue(e.getMessage().contains("Calcite error")) in QueryServiceTest), remove assertNotNull and the instanceof assert and call assertInstanceOf(UnsupportedOperationException.class, e) instead (optionally assign the returned value to a variable to assert on the message), then keep the existing message assertion (assertTrue on e.getMessage().contains(...)) or use the assigned instance for that check; apply the same change to the other similar blocks mentioned (lines around 139-141, 168-172, 197-201).

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5114.yml (1)
54-66: Consider adding a `datarows` assertion — the sort is deterministic for this dataset.

All `id` values (1–6) are positive, so `abs(id) + 1 == id + 1` for every document. The sorted output of `fields id | head 5` is fully deterministic: `[[1],[2],[3],[4],[5]]`. Adding a `datarows` assertion (mirroring the control test at line 97) would make this a stronger regression test — confirming not only that `head` is preserved but also that the rows are correct.

💡 Suggested addition

```diff
   - match: { total: 5 }
   - match: { schema: [ { name: id, type: int } ] }
+  - match: { datarows: [ [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ] ] }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5114.yml` around lines 54 - 66, Add a datarows assertion to the "Issue 5114: head should be preserved for non-order-equivalent deterministic sort expression" test to verify the exact rows returned by the query `source=issue5114 | eval a = abs(id) + 1 | sort a | fields id | head 5`; specifically assert datarows: [[1],[2],[3],[4],[5]] so the test confirms deterministic ordering in addition to total and schema checks.

core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java (2)
10-10: `@Nullable` import source is inconsistent with the rest of the module.

`SPath.java` (the closest analogous file in this package) uses `org.checkerframework.checker.nullness.qual.Nullable`. Using `javax.annotation.Nullable` here diverges from that convention. Consider aligning to the project standard.

♻️ Proposed fix

```diff
-import javax.annotation.Nullable;
+import org.checkerframework.checker.nullness.qual.Nullable;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java` at line 10, Replace the inconsistent nullable import in NoMv.java: change the import of javax.annotation.Nullable to use org.checkerframework.checker.nullness.qual.Nullable (the same source used in SPath.java) and update any `@Nullable` annotations in the NoMv class to rely on that Checker Framework annotation to keep nullability imports consistent across the module.
37-40: Missing `@Override` on `attach()` — inconsistent with sibling AST nodes.

`MvExpand.attach()` (and other AST node classes in this package) annotate the `attach()` override with `@Override`. Without it the compiler won't catch accidental signature drift.

♻️ Proposed fix

```diff
+ @Override
  public NoMv attach(UnresolvedPlan child) {
    this.child = child;
    return this;
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java` around lines 37 - 40, The attach() method in NoMv should be annotated with `@Override` to match sibling AST nodes like MvExpand.attach() and ensure the compiler catches signature drift; update the NoMv.attach(UnresolvedPlan child) method to add the `@Override` annotation (keeping the same parameter type UnresolvedPlan and return type NoMv) so it properly overrides the superclass/interface method.

core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java (1)
481-489: Javadoc `@param`/`@return` tags are escaped as literal text.

The backtick-wrapped `@param` and `@return` won't be parsed by Javadoc tooling. They should be standard Javadoc tags.

📝 Proposed fix

```diff
 /**
  * Build an MVEXPAND plan node and attach it to the input plan.
  *
- * <p>`@param` input input plan `@param` field field to expand `@param` limit optional
- * per-document limit `@return` MvExpand plan attached to the input
+ * @param input input plan
+ * @param field field to expand
+ * @param limit optional per-document limit
+ * @return MvExpand plan attached to the input
  */
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java` around lines 481 - 489, The Javadoc for mvexpand incorrectly uses backtick-escaped literals like `@param` and `@return`; update the comment on the mvexpand(UnresolvedPlan input, Field field, Integer limit) method (and mention of MvExpand/UnresolvedPlan/Field) to use proper Javadoc tags: add real `@param` tags for input, field, and limit and a proper `@return` describing the returned UnresolvedPlan (MvExpand attached to input), ensuring each tag is on its own line directly under the summary.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)
523-531: Inconsistent field anonymization approach compared to sibling visitors.

`visitMvExpand` uses `MASK_COLUMN` directly, while the adjacent `visitNoMv`, `visitExpand`, and `visitMvCombine` all use `visitExpression(node.getField())`. The output is equivalent for simple fields, but `visitExpression` correctly handles meta-fields (e.g., `_id` → `meta_identifier`). For consistency and correctness with meta-fields:

♻️ Proposed fix

```diff
 @Override
 public String visitMvExpand(MvExpand node, String context) {
   String child = node.getChild().get(0).accept(this, context);
-  String field = MASK_COLUMN; // Always anonymize field names
+  String field = visitExpression(node.getField());
   if (node.getLimit() != null) {
     return StringUtils.format("%s | mvexpand %s limit=%s", child, field, MASK_LITERAL);
   }
   return StringUtils.format("%s | mvexpand %s", child, field);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java` around lines 523 - 531, visitMvExpand currently hardcodes field anonymization using MASK_COLUMN which diverges from visitNoMv/visitExpand/visitMvCombine that call visitExpression(node.getField()) to properly handle meta-fields; update visitMvExpand to compute the anonymized field via visitExpression(node.getField()) and use that variable in both the limit and non-limit return paths (keep MASK_LITERAL usage for limit value), ensuring meta-field names like _id are converted to their masked forms consistently.

ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)
646-712: Good comprehensive test coverage for `graphLookup` anonymization.

The 8 scenarios cover the key parameter combinations well. One minor gap: there's no test case exercising the `usePIT=true` anonymization path (the anonymizer conditionally appends `usePIT=true` at lines 253-255 of `PPLQueryDataAnonymizer`), though `supportArray` and `batchMode` each have dedicated cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java` around lines 646 - 712, Add a test in PPLQueryDataAnonymizerTest that exercises the anonymizer path which appends usePIT=true; call anonymize with a graphLookup clause including "usePIT=true" (similar to other cases) and assert the output contains "usePIT=true" in the anonymized graphlookup string, referencing the anonymize helper used across this file so the test triggers the branch in PPLQueryDataAnonymizer (the code around lines 253-255 that conditionally appends usePIT=true).

integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1)
689-693: Prefer the shared test constant for the MVEXPAND index name.
Keeps the index name centralized and avoids drift if the constant changes.

♻️ Suggested update

```diff
- MVEXPAND_EDGE_CASES(
-     "mvexpand_edge_cases",
+ MVEXPAND_EDGE_CASES(
+     TestsConstants.TEST_INDEX_MVEXPAND_EDGE_CASES,
      "mvexpand_edge_cases",
      getMappingFile("mvexpand_edge_cases_mapping.json"),
      "src/test/resources/mvexpand_edge_cases.json"),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java` around lines 689 - 693, Replace the hard-coded index name "mvexpand_edge_cases" in the MVEXPAND_EDGE_CASES enum entry with the project's shared test constant for the MVEXPAND index (use the centralized test constant from the test constants/class used across tests) so the enum MVEXPAND_EDGE_CASES(...) uses that constant for the index name; update the second constructor argument (currently the string literal) to reference the shared constant so index names remain centralized and consistent with other tests.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java (1)
262-290: Deduplicate identical all-null array tests.

`testNoMvWithAllNulls` and `testNoMvArrayWithAllNulls` are the same query/assertions, so one can be removed or repurposed to cover a distinct case.

♻️ Suggested cleanup

```diff
- @Test
- public void testNoMvArrayWithAllNulls() throws IOException {
-   String q =
-       "source="
-           + TEST_INDEX_BANK_WITH_NULL_VALUES
-           + " | where account_number = 25 | eval arr = array(age, age, age) | nomv arr | fields"
-           + " account_number, arr";
-
-   JSONObject result = executeQuery(q);
-
-   verifySchema(result, schema("account_number", null, "bigint"), schema("arr", null, "string"));
-
-   verifyDataRows(result, rows(25, ""));
- }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java` around lines 262 - 290, Two test methods in CalciteNoMvCommandIT—testNoMvWithAllNulls and testNoMvArrayWithAllNulls—are identical; remove or repurpose one to avoid duplication. Locate the two test methods (testNoMvWithAllNulls, testNoMvArrayWithAllNulls) and either delete one of them or change its query/assertions to cover a different scenario (e.g., mixed nulls or single-element null array) so each test verifies a unique behavior while keeping verifySchema/verifyDataRows usage consistent.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
2545-2604: Missing JavaDoc on `visitGraphLookup`.

Per coding guidelines, public methods in `core/src/main/java/**/*.java` must have JavaDoc with `@param`, `@return`, and `@throws`. The other new visitor methods (`visitNoMv`, `visitMvExpand`) have proper JavaDoc, but `visitGraphLookup` does not. As per coding guidelines: "Public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` around lines 2545 - 2604, Add JavaDoc to the public method visitGraphLookup(GraphLookup node, CalcitePlanContext context) describing its behavior and include `@param` tags for "node" and "context", an `@return` tag describing the returned RelNode, and `@throws` tags for any runtime/checked exceptions the method may propagate (e.g., exceptions from rexVisitor.analyze(), analyze(), or builder.build()). Ensure the JavaDoc style and wording match existing visitor methods like visitNoMv and visitMvExpand so it conforms to the project's Javadoc conventions.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java (1)
168-177: Misleading comment: "partial results" but expected output is empty `{}`.

The comment on line 174 states "Malformed JSON returns partial results parsed before the error", but the expected result is `{}` — an empty map meaning nothing was successfully extracted. "Partial results" implies some fields were parsed; that description contradicts the assertion. Consider tightening the comment, e.g. "Malformed JSON: parser abandons extraction, returns empty map".

✏️ Suggested comment fix

```diff
- // Malformed JSON returns partial results parsed before the error
+ // Malformed JSON: parser cannot complete extraction and returns an empty map
  verifySchema(result, schema("result", "struct"));
  verifyDataRows(result, rows(new JSONObject("{}")));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java` around lines 168 - 177, The inline comment in testSpathAutoExtractMalformedJson incorrectly says "partial results parsed before the error" while the assertion expects an empty map; update the comment near the call to executeQuery / the test method testSpathAutoExtractMalformedJson to accurately describe behavior (e.g., "Malformed JSON: parser abandons extraction and returns empty map {}") so the comment matches the verifyDataRows expectation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/src/main/java/org/opensearch/sql/executor/QueryService.java`:
- Around line 67-83: Add unit tests for QueryService.propagateCalciteError to
cover the missing branches: (1) verify that passing a VirtualMachineError (e.g.,
new OutOfMemoryError()) causes propagateCalciteError to rethrow the
VirtualMachineError (test should assert the error is thrown and not passed to
the ResponseListener); (2) verify that passing an ExceptionInInitializerError
whose getException() returns an Exception results in listener.onFailure being
called with that underlying Exception (mock ResponseListener and assert
onFailure received the unwrapped Exception); (3) verify that passing an
ExceptionInInitializerError whose getException() returns a non-Exception (e.g.,
an Error) falls through to the final branch and results in listener.onFailure
being called with a CalciteUnsupportedException (assert the listener receives a
CalciteUnsupportedException and that its cause/message reflect the original
Throwable). Ensure tests reference propagateCalciteError via QueryService and
mock or spy ResponseListener to assert interactions.
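The three branches those tests should pin down can be sketched independently of the mocking setup. The sketch below is an illustration of the described contract only, not the actual `propagateCalciteError` code: it returns a label for which branch a throwable takes (the real method rethrows, unwraps to the listener, or wraps in `CalciteUnsupportedException`).

```java
// Standalone sketch of the branch selection the tests above should cover.
public class CalciteErrorSketch {
  /** Returns "rethrow", "unwrapped", or "wrapped" for the branch a throwable takes. */
  public static String classify(Throwable t) {
    if (t instanceof VirtualMachineError) {
      return "rethrow"; // e.g. OutOfMemoryError must propagate, never reach the listener
    }
    if (t instanceof ExceptionInInitializerError eiie
        && eiie.getException() instanceof Exception) {
      return "unwrapped"; // listener.onFailure gets the underlying Exception
    }
    return "wrapped"; // listener.onFailure gets a CalciteUnsupportedException
  }

  public static void main(String[] args) {
    System.out.println(classify(new OutOfMemoryError()));                                  // rethrow
    System.out.println(classify(new ExceptionInInitializerError(new RuntimeException()))); // unwrapped
    System.out.println(classify(new ExceptionInInitializerError(new AssertionError())));   // wrapped
  }
}
```

Note the second branch checks `instanceof Exception`, so an `ExceptionInInitializerError` carrying an `Error` (like `AssertionError`) falls through to the fallback, matching scenario (3).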
In `@core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java`:
- Around line 94-121: Add boundary tests that pass Optional.empty() for the
calcite-error parameter to cover the no-prior-Calcite-error case: in
QueryServiceTest create equivalent test methods to the existing ones (e.g., for
executeWithLegacy and explainWithLegacy) that call
QueryService.executeWithLegacy and QueryService.explainWithLegacy with
Optional.empty() and assert the expected fallback behavior (whether it executes
V2, returns a specific error, or invokes onFailure with the V2 exception).
Reference the same mocked setup used in the current tests (analyzer,
executionEngine, planner, settings) and the existing responseListener assertions
but adjust expectations to match the empty-Optional path; add similar boundary
tests corresponding to the other test blocks noted (lines 123-151, 153-180,
182-210).
- Around line 116-120: Replace the lenient stub for settings/analyzer with a
strict stub and add an explicit verification that the legacy analysis path is
invoked: stop using lenient() for
settings.getSettingValue(Key.CALCITE_ENGINE_ENABLED) and change the analyzer
stub to throw the v2Exception via doThrow when analyzer.analyze(any(), any()) is
called, then after calling QueryService.executeWithLegacy(...) add
verify(analyzer).analyze(any(), any()) to assert the legacy analysis was
exercised; apply the same pattern to the other test blocks (around the other
mentioned ranges) and consider replacing the combined assertNotNull/instanceof
checks with assertInstanceOf and adding a test for Optional.empty() boundary
case.
In `@docs/user/ppl/cmd/mvexpand.md`:
- Around line 133-136: Update the "Expected output" example in
docs/user/ppl/cmd/mvexpand.md to use valid JSON (double-quoted keys and string
values) instead of Python-style single-quoted dicts; specifically replace the
Python-like "{'reason': ...}" object with a proper JSON object (e.g., {"reason":
"...", "details": "...", "type": "..."}) and ensure the subsequent error line
matches JSON formatting as needed so the example reflects actual OpenSearch JSON
error responses.
In `@docs/user/ppl/cmd/spath.md`:
- Around line 161-163: Example output is inconsistent: the row with
"tags":["python"] renders as scalar "python" while other rows show arrays like
"[java, sql]"; update the docs so example matches the documented flattening
rule. Either change the example's Row 2 to display [python] (so doc.tags{} shows
[python]) or update the auto-extract description to explicitly state
single-element arrays are flattened to scalars; adjust the example and the
auto-extract mode sentence that currently reads "arrays become strings like
'[java, sql]'" and ensure doc.tags{} examples and the flattening rule are
consistent.
- Around line 162-163: The example table in spath.md hides the difference
between a JSON null value and an absent key because both render as "null";
update the example to make the distinction explicit by adding an extra column or
an inline note that shows the underlying map value for doc.active (e.g., a
column that prints the raw map value or a cast/compare expression such as
doc.active IS NULL vs doc.active == "null"), and update the surrounding prose
that references doc.active, Line 38 and Line 173 behavior to call out that the
displayed "null" is visually identical but the map stores a string "null" when
the key existed with null versus a true absent key which returns null.
- Around line 148-152: The example uses a non-existent field doc_auto for the
spath input and also lists doc_auto in the fields clause; change the spath
invocation to use input=n and remove doc_auto from the fields list so the spath
call and fields reference the actual test-data field n (update the spath
parameter input=... and the fields clause to reference doc.user.name,
doc.user.age, doc.`tags{}`, doc.active instead of doc_auto).
- Around line 36-39: The docs for spath's flattening rules don't mention that in
auto-extract mode single-element arrays are unwrapped to scalars; update the
paragraph around the "Arrays use `{}` suffix" / "All values are stringified"
lines to add a rule stating that when auto-extract is enabled (spath
auto-extract mode), arrays with exactly one element are converted to the scalar
element (e.g., ["python"] -> "python") rather than preserved as a bracketed
string, and reference the example demonstrating this behavior (Example 5 Row 2)
so readers understand this exception to the array/stringification rule.
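The flattening rule those bullets describe amounts to the following. This is a hedged sketch assumed from the documented examples (single-element array unwraps to scalar, multi-element array stringifies as `[a, b]`), not the spath implementation itself:

```java
import java.util.List;

// Sketch of the auto-extract array rendering rule described in the docs.
public class SpathArraySketch {
  public static String render(List<String> values) {
    if (values.size() == 1) {
      return values.get(0); // ["python"] -> "python" (unwrapped to scalar)
    }
    return "[" + String.join(", ", values) + "]"; // ["java","sql"] -> "[java, sql]"
  }

  public static void main(String[] args) {
    System.out.println(render(List.of("python")));      // python
    System.out.println(render(List.of("java", "sql"))); // [java, sql]
  }
}
```

Stating this rule explicitly in the docs (and making Example 5 Row 2 consistent with it) resolves the scalar-vs-`[python]` discrepancy the bullets flag.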
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around line 42-70: The comment about malformed JSON handling is misleading:
the test data uses the field "malformed_doc" in the Request autoExtractDoc
(index test_spath_auto) but the expected behavior is to return an empty JSON
object {} rather than "partial results parsed before the error"; update the
comment to state that malformed JSON yields an empty object, or if you intended
the former behavior adjust the test expectation/assertion accordingly; locate
the setup in CalcitePPLSpathCommandIT where autoExtractDoc and the
"malformed_doc" field are created and either change the comment text to describe
"malformed JSON returns empty result {}" or change the test assertions that
reference test_spath_auto/malformed_doc to expect partial parse if you actually
want that behavior.
In `@integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java`:
- Around line 259-273: Add a new NULL-input coverage case for the nomv function
by extending or adding a test in NewAddedCommandsIT (near
testNoMvUnsupportedInV2) that calls executeQuery with a query that sources the
mvexpand_edge_cases index (or the nullskills document) and applies "nomv" to a
field that is NULL to assert the expected behavior; update the test method or
add a new `@Test` (e.g., testNomvHandlesNullInput) to build the query string,
catch ResponseException the same way the existing tests do, and call
verifyQuery/result assertions to validate NULL handling for nomv.
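A sketch of the NULL semantics such a test could pin down. The nomv here is a hypothetical model (join a multi-value field into one string, propagate NULL), not the actual PPL implementation; the proposed IT would confirm which behavior the real command has:

```java
import java.util.List;

public class NoMvNullDemo {
    // Hypothetical model of nomv. Assumption to be verified by the proposed
    // test: a NULL input yields NULL, not the string "null" or "".
    static String nomv(List<String> values) {
        if (values == null) {
            return null;
        }
        return String.join(",", values);
    }

    public static void main(String[] args) {
        System.out.println(nomv(null));                   // null
        System.out.println(nomv(List.of("java", "sql"))); // java,sql
    }
}
```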
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 542-551: The reset() implementation must also clear batch-mode
state so re-iteration can produce results; update reset() to reset any
batch-mode flags and iterators (e.g., set batchCompleted/batchModeCompleted =
false, clear currentBatch or batchIterator, reset batchIndex/offset counters)
and ensure close() likewise nulls/cleans batch resources; locate these symbols
in CalciteEnumerableGraphLookup (reset(), close(), current, and any fields named
batchCompleted, batchIterator, currentBatch, batchIndex or similar) and reset
them when reset()/close() is called.
In `@ppl/src/main/antlr/OpenSearchPPLParser.g4`:
- Around line 1728-1734: The searchableKeyWord rule currently lists only
FROM_FIELD, TO_FIELD, MAX_DEPTH, DEPTH_FIELD, DIRECTION, UNI, and BI; add the
missing graphLookup tokens START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT
to searchableKeyWord so they are accepted as keywords that can be used as
identifiers via keywordsCanBeId. Update the OpenSearchPPLParser.g4
searchableKeyWord alternative list to include START_FIELD, SUPPORT_ARRAY,
BATCH_MODE, and USE_PIT alongside the existing tokens.
In `@ppl/src/test/java/org/opensearch/sql/ppl/utils/SPathRewriteTest.java`:
- Around line 80-86: Add tests in SPathRewriteTest to cover null/empty/error
cases for auto-extract mode: extend testSpathAutoExtractMode (or add new tests)
to assert behavior when the input field is null and when it is an empty string,
and assert that sp.rewriteAsEval() (the SPath auto-extract path using
json_extract_all) throws the expected exception type for malformed JSON; include
assertions for the exact exception class or message you expect (or adjust if the
parser throws a different exception) and use the same helpers used in the file
(plan(), relation("t"), let(), field("a"), function("json_extract_all",
field("a"))) so the tests exercise null, empty, and invalid-json/error paths.
---
Outside diff comments:
In `@core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java`:
- Around line 286-293: In resolveFieldAccess, the call to joinParts computes the
count of remaining parts incorrectly causing IndexOutOfBoundsException; change
the third argument so it uses the remaining parts count accounting for start
(i.e. replace parts.size() - length with parts.size() - start - length) so
itemName = joinParts(parts, length + start, parts.size() - start - length)
before calling createItemAccess(field, itemName, context).
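The off-by-one can be reproduced with a simplified stand-in for joinParts; the helper name and signature here are assumptions based on the prompt, not the actual QualifiedNameResolver code:

```java
import java.util.List;

public class JoinPartsDemo {
    // Simplified stand-in: join `count` parts starting at index `from`.
    static String joinParts(List<String> parts, int from, int count) {
        return String.join(".", parts.subList(from, from + count));
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c", "d");
        int start = 1;   // resolution began at parts[1]
        int length = 1;  // one part consumed by the resolved field

        // Buggy count: parts.size() - length = 3, but only 2 parts remain
        // after index (length + start) = 2, so subList(2, 5) would throw
        // IndexOutOfBoundsException. The corrected count subtracts start too:
        String itemName =
            joinParts(parts, length + start, parts.size() - start - length);
        System.out.println(itemName); // c.d
    }
}
```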
In `@docs/category.json`:
- Around line 10-69: Add "user/ppl/cmd/graphlookup.md" to the ppl_cli_calcite
array in docs/category.json (the array named ppl_cli_calcite) so the new entry
appears alphabetically between "user/ppl/cmd/grok.md" and
"user/ppl/cmd/head.md"; ensure you insert it as a proper JSON string with
correct commas and preserve valid JSON formatting.
In `@scripts/build.sh`:
- Around line 71-81: The script hardcodes -Pcrypto.standard=FIPS-140-3 into all
Gradle invocations; add a CLI flag (e.g., -c/--crypto or CRYPTO_STANDARD) with a
default of empty and update getopts to populate CRYPTO_STANDARD, then only
append "-Pcrypto.standard=$CRYPTO_STANDARD" to the ./gradlew commands (assemble,
publishToMavenLocal, publishPluginZipPublicationToZipStagingRepository) if
CRYPTO_STANDARD is non-empty so callers can disable or change the crypto
standard and avoid FIPS-related checksum failures.
---
Duplicate comments:
In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java`:
- Around line 552-565: The three new public visitor overrides (visitNoMv,
visitMvExpand, visitGraphLookup) are missing JavaDoc and unit tests; add JavaDoc
to each method following project pattern including `@param` AnalysisContext
context, `@return` LogicalPlan, and `@throws` for the Calcite-only exception, and
add unit tests (in the same commit) that invoke
Analyzer.visitNoMv/visitMvExpand/visitGraphLookup with a mock or real
NoMv/MvExpand/GraphLookup node and assert that getOnlyForCalciteException is
thrown (and that the exception message matches "nomv", "mvexpand", "graphlookup"
respectively) to cover the Calcite-only path.
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2545-2551: The visitGraphLookup method in CalciteRelNodeVisitor
currently applies an unconditional hard-coded start-set limit via
builder.limit(0, 100) which truncates the graph lookup input; remove this
unconditional limit and instead respect a configurable setting or absence
thereof (e.g., read a graph lookup start-limit from the CalcitePlanContext or a
global config exposed to CalciteRelNodeVisitor such as
getGraphLookupStartLimit()), and only apply builder.limit when a positive
configured limit exists; update visitGraphLookup to use that config (or no limit
if not configured) and remove the silent 100-row truncation.
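A minimal sketch of the "only limit when configured" behavior suggested above; planWithOptionalLimit and the string-based plan are hypothetical stand-ins for the RelBuilder call, not the project's API:

```java
public class StartLimitDemo {
    // Hypothetical: `limit` models a graph-lookup start-limit setting; null or
    // a non-positive value means "not configured", i.e. apply no truncation.
    static String planWithOptionalLimit(String plan, Integer limit) {
        if (limit != null && limit > 0) {
            return plan + " LIMIT " + limit;
        }
        return plan; // no silent 100-row cap when unconfigured
    }

    public static void main(String[] args) {
        System.out.println(planWithOptionalLimit("scan(source)", null)); // scan(source)
        System.out.println(planWithOptionalLimit("scan(source)", 100)); // scan(source) LIMIT 100
    }
}
```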
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 433-434: The loop increments currentLevelDepth and compares it to
graphLookup.maxDepth, but when maxDepth is negative (e.g., -1 for unlimited) the
current check (++currentLevelDepth > graphLookup.maxDepth) will always break;
update the condition in CalciteEnumerableGraphLookup so the depth check is
skipped when graphLookup.maxDepth < 0 — e.g., only increment/compare when
graphLookup.maxDepth >= 0 (or guard the comparison with graphLookup.maxDepth >=
0) so unlimited depth is handled correctly by allowing the loop to continue.
- Around line 412-430: CalciteEnumerableGraphLookup currently only adds rows to
results when nextValues is non-empty, which drops matched rows when traversal
doesn't expand; always add the constructed row (use graphLookup.depthField to
decide between rowArray and rowWithDepth using rowArray/currentLevelDepth) to
results unconditionally, and keep the subsequent loop that iterates over
nextValues only for enqueuing unvisited non-null values into visitedNodes and
queue (do not gate the results.add(...) behind nextValues.isEmpty()).
---
Nitpick comments:
In `@core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java`:
- Around line 481-489: The Javadoc for mvexpand incorrectly uses
backtick-escaped literals like `@param` and `@return`; update the comment on the
mvexpand(UnresolvedPlan input, Field field, Integer limit) method (and mention
of MvExpand/UnresolvedPlan/Field) to use proper Javadoc tags: add real `@param`
tags for input, field, and limit and a proper `@return` describing the returned
UnresolvedPlan (MvExpand attached to input), ensuring each tag is on its own
line directly under the summary.
In `@core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java`:
- Line 10: Replace the inconsistent nullable import in NoMv.java: change the
import of javax.annotation.Nullable to use
org.checkerframework.checker.nullness.qual.Nullable (the same source used in
SPath.java) and update any `@Nullable` annotations in the NoMv class to rely on
that Checker Framework annotation to keep nullability imports consistent across
the module.
- Around line 37-40: The attach() method in NoMv should be annotated with
`@Override` to match sibling AST nodes like MvExpand.attach() and ensure the
compiler catches signature drift; update the NoMv.attach(UnresolvedPlan child)
method to add the `@Override` annotation (keeping the same parameter type
UnresolvedPlan and return type NoMv) so it properly overrides the
superclass/interface method.
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2545-2604: Add JavaDoc to the public method
visitGraphLookup(GraphLookup node, CalcitePlanContext context) describing its
behavior and include `@param` tags for "node" and "context", an `@return` tag
describing the returned RelNode, and `@throws` tags for any runtime/checked
exceptions the method may propagate (e.g., exceptions from rexVisitor.analyze(),
analyze(), or builder.build()). Ensure the JavaDoc style and wording match
existing visitor methods like visitNoMv and visitMvExpand so it conforms to the
project's Javadoc conventions.
In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java`:
- Around line 1100-1111: The permissive overload registered via
registerOperator(INTERNAL_ITEM, SqlStdOperatorTable.ITEM,
PPLTypeChecker.family(SqlTypeFamily.IGNORE, SqlTypeFamily.CHARACTER)) can match
any left-hand type and silently defer invalid cases to runtime; change the first
operand family from SqlTypeFamily.IGNORE to a narrower family such as
SqlTypeFamily.ANY (or a more specific family representing structs/objects if
available) so that type-checking fails earlier, and add an inline comment near
the registerOperator call documenting why this family was chosen to prevent
future regressions.
In `@core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java`:
- Around line 110-112: Replace the redundant null + instanceof checks in
QueryServiceTest by using JUnit 5's assertInstanceOf: for each occurrence (e.g.,
the block using assertNotNull(e); assertTrue(e instanceof
UnsupportedOperationException); assertTrue(e.getMessage().contains("Calcite
error")) in QueryServiceTest), remove assertNotNull and the instanceof assert
and call assertInstanceOf(UnsupportedOperationException.class, e) instead
(optionally assign the returned value to a variable to assert on the message),
then keep the existing message assertion (assertTrue on
e.getMessage().contains(...)) or use the assigned instance for that check; apply
the same change to the other similar blocks mentioned (lines around 139-141,
168-172, 197-201).
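For illustration, here is a self-contained stand-in that mirrors the semantics of JUnit 5's Assertions.assertInstanceOf (the real method lives in org.junit.jupiter.api.Assertions). It shows why one call replaces the assertNotNull/instanceof pair and also yields a typed reference for the follow-up message check:

```java
public class AssertInstanceOfDemo {
    // Minimal stand-in mirroring JUnit 5 semantics: checks the type
    // (a null value fails the check too) and returns the narrowed instance.
    static <T> T assertInstanceOf(Class<T> expected, Object actual) {
        if (!expected.isInstance(actual)) {
            throw new AssertionError("Expected " + expected + " but was " + actual);
        }
        return expected.cast(actual);
    }

    public static void main(String[] args) {
        Exception e = new UnsupportedOperationException("Calcite error: plan failed");

        // Before: assertNotNull(e); assertTrue(e instanceof UnsupportedOperationException);
        // After: one call, plus a typed reference for the message assertion.
        UnsupportedOperationException uoe =
            assertInstanceOf(UnsupportedOperationException.class, e);
        System.out.println(uoe.getMessage().contains("Calcite error")); // true
    }
}
```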
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java`:
- Around line 262-290: Two test methods in
CalciteNoMvCommandIT—testNoMvWithAllNulls and testNoMvArrayWithAllNulls—are
identical; remove or repurpose one to avoid duplication. Locate the two test
methods (testNoMvWithAllNulls, testNoMvArrayWithAllNulls) and either delete one
of them or change its query/assertions to cover a different scenario (e.g.,
mixed nulls or single-element null array) so each test verifies a unique
behavior while keeping verifySchema/verifyDataRows usage consistent.
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around line 168-177: The inline comment in testSpathAutoExtractMalformedJson
incorrectly says "partial results parsed before the error" while the assertion
expects an empty map; update the comment near the call to executeQuery / the
test method testSpathAutoExtractMalformedJson to accurately describe behavior
(e.g., "Malformed JSON: parser abandons extraction and returns empty map {}") so
the comment matches the verifyDataRows expectation.
In `@integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`:
- Around line 689-693: Replace the hard-coded index name "mvexpand_edge_cases"
in the MVEXPAND_EDGE_CASES enum entry with the project's shared test constant
for the MVEXPAND index (use the centralized test constant from the test
constants/class used across tests) so the enum MVEXPAND_EDGE_CASES(...) uses
that constant for the index name; update the second constructor argument
(currently the string literal) to reference the shared constant so index names
remain centralized and consistent with other tests.
In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5060.yml`:
- Around line 15-27: The test currently always calls indices.create for index
"test_join_ad_error_5133", which can fail if a previous run left the index
behind; add a pre-test cleanup step that calls indices.delete for
"test_join_ad_error_5133" and ignores a 404 before the indices.create step so
the create will not 400; place this delete call immediately before the existing
indices.create block (use the same index name) and ensure the delete is
configured to ignore-not-found/404 errors.
In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5114.yml`:
- Around line 54-66: Add a datarows assertion to the "Issue 5114: head should be
preserved for non-order-equivalent deterministic sort expression" test to verify
the exact rows returned by the query `source=issue5114 | eval a = abs(id) + 1 |
sort a | fields id | head 5`; specifically assert datarows:
[[1],[2],[3],[4],[5]] so the test confirms deterministic ordering in addition to
total and schema checks.
In `@ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java`:
- Around line 523-531: visitMvExpand currently hardcodes field anonymization
using MASK_COLUMN which diverges from visitNoMv/visitExpand/visitMvCombine that
call visitExpression(node.getField()) to properly handle meta-fields; update
visitMvExpand to compute the anonymized field via
visitExpression(node.getField()) and use that variable in both the limit and
non-limit return paths (keep MASK_LITERAL usage for limit value), ensuring
meta-field names like _id are converted to their masked forms consistently.
In
`@ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java`:
- Around line 646-712: Add a test in PPLQueryDataAnonymizerTest that exercises
the anonymizer path which appends usePIT=true; call anonymize with a graphLookup
clause including "usePIT=true" (similar to other cases) and assert the output
contains "usePIT=true" in the anonymized graphlookup string, referencing the
anonymize helper used across this file so the test triggers the branch in
PPLQueryDataAnonymizer (the code around lines 253-255 that conditionally appends
usePIT=true).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (64)
- async-query-core/build.gradle
- build.gradle
- common/build.gradle
- core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
- core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
- core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java
- core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java
- core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java
- core/src/main/java/org/opensearch/sql/ast/tree/SPath.java
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
- core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
- core/src/main/java/org/opensearch/sql/executor/QueryService.java
- core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
- core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java
- core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
- core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImpl.java
- core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java
- core/src/test/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImplTest.java
- core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImplTest.java
- datasources/build.gradle
- docs/category.json
- docs/user/ppl/cmd/mvexpand.md
- docs/user/ppl/cmd/nomv.md
- docs/user/ppl/cmd/spath.md
- docs/user/ppl/index.md
- doctest/test_data/structured.json
- doctest/test_mapping/structured.json
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/standalone/JsonExtractAllFunctionIT.java
- integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java
- integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
- integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java
- integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java
- integ-test/src/test/java/org/opensearch/sql/security/CrossClusterTestBase.java
- integ-test/src/test/resources/expectedOutput/calcite/explain_issue_5114_sort_expr_head_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_nomv.yaml
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_nomv.yaml
- integ-test/src/test/resources/indexDefinitions/mvexpand_edge_cases_mapping.json
- integ-test/src/test/resources/mvexpand_edge_cases.json
- integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5060.yml
- integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5114.yml
- legacy/build.gradle
- opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java
- opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
- ppl/build.gradle
- ppl/src/main/antlr/OpenSearchPPLLexer.g4
- ppl/src/main/antlr/OpenSearchPPLParser.g4
- ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
- ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
- ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLFunctionTypeTest.java
- ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java
- ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java
- ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLSpathTest.java
- ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java
- ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
- ppl/src/test/java/org/opensearch/sql/ppl/utils/SPathRewriteTest.java
- scripts/build.sh
- sql/build.gradle
✅ Files skipped from review due to trivial changes (4)
- integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml
- integ-test/src/test/resources/indexDefinitions/mvexpand_edge_cases_mapping.json
- integ-test/src/test/resources/expectedOutput/calcite/explain_nomv.yaml
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_nomv.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
- docs/user/ppl/index.md
Actionable comments posted: 7
♻️ Duplicate comments (5)
ppl/src/main/antlr/OpenSearchPPLParser.g4 (1)
1728-1734: Add START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT to searchableKeyWord.
Only 7 of the 11 new graphLookup tokens are included in searchableKeyWord. The missing four (START_FIELD, SUPPORT_ARRAY, BATCH_MODE, USE_PIT) cannot be used as field identifiers, which is inconsistent with how the other graphLookup tokens are handled.
Proposed fix:
  | FROM_FIELD | TO_FIELD | MAX_DEPTH | DEPTH_FIELD | DIRECTION | UNI | BI
+ | START_FIELD
+ | SUPPORT_ARRAY
+ | BATCH_MODE
+ | USE_PIT
  ;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ppl/src/main/antlr/OpenSearchPPLParser.g4` around lines 1728 - 1734, The parser rule searchableKeyWord is missing four graphLookup tokens (START_FIELD, SUPPORT_ARRAY, BATCH_MODE, USE_PIT) so they cannot be used as field identifiers; update the searchableKeyWord alternative list in OpenSearchPPLParser.g4 (the searchableKeyWord rule) to include START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT alongside FROM_FIELD, TO_FIELD, MAX_DEPTH, DEPTH_FIELD, DIRECTION, UNI, and BI so these tokens are treated as searchable identifiers.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java (3)
542-546: ⚠️ Potential issue | 🟡 Minor
reset() does not clear batchModeCompleted, making re-iteration in batch mode impossible. After reset(), batchModeCompleted remains true, so moveNextBatchMode() immediately returns false.
🛠️ Proposed fix:
  @Override
  public void reset() {
    sourceEnumerator.reset();
    current = null;
+   batchModeCompleted = false;
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java` around lines 542 - 546, The reset() method currently resets sourceEnumerator and current but fails to clear the batchModeCompleted flag, so subsequent iterations in batch mode immediately stop; update reset() to also set batchModeCompleted = false (or otherwise reinitialize batchModeCompleted) so moveNextBatchMode() can run again after reset() — locate the reset() method in CalciteEnumerableGraphLookup and update it alongside the existing references to sourceEnumerator, current, and the batchModeCompleted flag.
433-433: ⚠️ Potential issue | 🔴 Critical
maxDepth = -1 (unlimited) is broken — BFS stops after the first level. When maxDepth is -1 (documented as unlimited), ++currentLevelDepth > graphLookup.maxDepth evaluates to 1 > -1 → true, breaking immediately after one level.
🛠️ Proposed fix:
- if (++currentLevelDepth > graphLookup.maxDepth) break;
+ currentLevelDepth++;
+ if (graphLookup.maxDepth >= 0 && currentLevelDepth > graphLookup.maxDepth) {
+   break;
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java` at line 433, The BFS depth check in CalciteEnumerableGraphLookup incorrectly stops when graphLookup.maxDepth == -1 (unlimited); change the condition in the loop to only compare and potentially break when maxDepth is non-negative (e.g. if (graphLookup.maxDepth >= 0 && ++currentLevelDepth > graphLookup.maxDepth) break;) so that a negative maxDepth (like -1) is treated as unlimited; update the check that uses currentLevelDepth and graphLookup.maxDepth accordingly.
398-430: ⚠️ Potential issue | 🔴 Critical
Leaf nodes are silently dropped from BFS results.
When all fromField values of a matched document are already visited, nextValues is empty and the row is discarded (line 413). This loses valid reachable documents that happen to not expand the frontier further — e.g., at the maximum depth level or in converging graph structures. The row should always be added to results; nextValues should only control queue expansion.
🛠️ Proposed fix:
- // Add row to results if the nextValues is not empty
- if (!nextValues.isEmpty()) {
-   if (graphLookup.depthField != null) {
-     Object[] rowWithDepth = new Object[rowArray.length + 1];
-     System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
-     rowWithDepth[rowArray.length] = currentLevelDepth;
-     results.add(rowWithDepth);
-   } else {
-     results.add(rowArray);
-   }
-
-   // Add unvisited non-null values to queue for next level traversal
-   for (Object val : nextValues) {
-     if (val != null) {
-       visitedNodes.add(val);
-       queue.offer(val);
-     }
+ // Always add matched row to results
+ if (graphLookup.depthField != null) {
+   Object[] rowWithDepth = new Object[rowArray.length + 1];
+   System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
+   rowWithDepth[rowArray.length] = currentLevelDepth;
+   results.add(rowWithDepth);
+ } else {
+   results.add(rowArray);
+ }
+
+ // Add unvisited non-null values to queue for next level traversal
+ for (Object val : nextValues) {
+   if (val != null) {
+     visitedNodes.add(val);
+     queue.offer(val);
    }
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java` around lines 398 - 430, The loop over forwardResults currently drops rows when collectValues produces an empty nextValues (thus losing leaf nodes); change the logic in CalciteEnumerableGraphLookup so that each processed row (from forwardResults) is always added to results (respecting graphLookup.depthField by appending currentLevelDepth via the same rowWithDepth logic) and then use nextValues only to decide which values to add to visitedNodes and queue for further BFS expansion; keep existing uses of collectValues, visitedNodes, queue, results, forwardResults, and graphLookup.depthField but move the results.add(...) block out of the if (!nextValues.isEmpty()) conditional so queue expansion and result collection are handled independently.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
2550-2551: Hard-coded 100-row cap on source rows already noted.
This matches a prior review concern; please address that feedback.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` around lines 2550 - 2551, Replace the hard-coded limit call in CalciteRelNodeVisitor (builder.limit(0, 100)) with a configurable source-row cap: introduce or read a configuration/query option (e.g., getSourceRowLimit() on the existing session/config object or a new constant DEFAULT_SOURCE_ROW_LIMIT) and call builder.limit(0, sourceRowLimit) using that value; ensure the value is validated (non-negative) and falls back to the default when unset so the limit becomes configurable rather than fixed at 100.
🧹 Nitpick comments (4)
ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java (1)
1658-1728: Consider adding test cases for optional graphLookup parameters.
The positive tests cover fromField, toField, maxDepth, startField, depthField, and direction=bi. The error cases are solid (missing fromField, missing table, missing toField). However, per coding guidelines, parser tests should include edge cases and boundary conditions. Consider adding:
- A case with supportArray=true, batchMode=true, or filter=(...) to verify the AST builder correctly processes these options.
- A negative test for a missing as clause, since AS outputField is mandatory in the grammar rule.
As per coding guidelines: "Include boundary condition tests (min/max values, empty inputs) for all new functions" and "Include error condition tests (invalid inputs, exceptions) for all new functions."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java` around lines 1658 - 1728, Add positive and negative graphLookup test cases in testGraphLookupCommand: include at least one positive case using GraphLookup.builder()/plan(...) that sets optional flags supportArray=true, batchMode=true and a filter=(...) expression to assert the AST contains those options, and include a negative assertThrows(SyntaxCheckException.class or SemanticCheckException.class, () -> plan(...)) for a graphLookup invocation missing the mandatory "as" clause to validate parser error handling; locate tests around the existing testGraphLookupCommand method and add assertions similar to the existing ones so they exercise parsing and AstBuilder behavior for these options.
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)
646-712: Good coverage of graphLookup anonymization variants.
The test covers basic, maxDepth, depthField, bidirectional, startField, supportArray, batchMode, filter, and compound filter scenarios. One option is missing: usePit. Consider adding a case for it for completeness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java` around lines 646 - 712, Add a missing test case to testGraphLookup that verifies anonymization of the graphLookup usePit option: call anonymize on a query like "source=t | graphLookup employees fromField=manager toField=name usePit=true as reportingHierarchy" and assert the expected anonymized string contains "usePit=true" (e.g., "source=table | graphlookup table fromField=identifier toField=identifier direction=uni usePit=true as identifier"); place the new assert alongside the other graphLookup asserts in the testGraphLookup method and use the existing anonymize helper to produce the output.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java (1)
351-437: Unbounded BFS can exhaust heap on dense graphs.
results, visitedNodes, and queue grow without bound. Once the maxDepth = -1 bug is fixed to allow truly unlimited traversal, a dense graph could produce millions of results and OOM the node. The TODO at line 356 acknowledges this, but consider also:
- Adding a configurable maxResults cap as a safety valve.
- Adding a traversal timeout to prevent runaway queries.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java` around lines 351 - 437, performBfs can run unbounded and OOM on dense graphs; add protective limits: introduce a configurable maxResults cap and a traversal timeout check inside performBfs (use the existing results, visitedNodes, queue and currentLevelDepth scope) and enforce them after each batch/queryLookupTable call and before enqueuing new nodes; when maxResults is reached or the timeout expires, stop traversal (break the loop), log a warning and return the partial results (or throw a specific exception if caller should handle truncation), and expose the new config fields (e.g., graphLookup.maxResults and a timeoutMillis) so callers can tune safety limits.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
2545-2604: Extract helper methods to keep visitGraphLookup under 50 lines.
The new method is now >50 lines; consider extracting parameter parsing and RelNode construction helpers for readability and consistency. As per coding guidelines: Flag methods >50 lines - Suggest extracting complex logic into helper methods.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` around lines 2545 - 2604, The visitGraphLookup method exceeds 50 lines; extract the parameter parsing and RelNode construction into small helpers to reduce complexity: create a helper like parseGraphLookupParameters(GraphLookup node, CalcitePlanContext context) that performs all Rex/Literal analysis and returns a small DTO (startFieldName, fromFieldName, toFieldName, outputFieldName, depthFieldName, maxDepthValue, bidirectional, supportArray, batchMode, usePIT, filterRex), and a builder helper like buildGraphLookupRelNode(RelNode sourceTable, RelNode lookupTable, ParsedGraphLookupParams params) that calls LogicalGraphLookup.create(...) and pushes/returns the result; keep existing calls to tryToRemoveMetaFields(context, true), visitChildren(node, context), analyze(node.getFromTable(), context), rexVisitor.analyze(...) but move their parsing logic into the new helpers and then have visitGraphLookup only orchestrate the steps and delegate to parseGraphLookupParameters and buildGraphLookupRelNode to bring visitGraphLookup under 50 lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2545-2546: Add a JavaDoc block to the public method
visitGraphLookup(GraphLookup node, CalcitePlanContext context) in class
CalciteRelNodeVisitor documenting both parameters with `@param`, the returned
RelNode with `@return`, and any runtime or checked exceptions the method can throw
using `@throws` (list the actual exceptions thrown by the implementation),
following the file's existing JavaDoc style; while here, refactor the long
method by extracting clear helpers such as extractGraphLookupParameters(),
buildSourceTable(), and buildLookupTable() to encapsulate parameter extraction
and table processing and update visitGraphLookup to call those helpers for
improved readability.
- Around line 2565-2568: The parser default for maxDepth should be changed from
Literal.ZERO to an explicit unbounded sentinel and the visitor should stop
coercing null to 0: update AstBuilder to set the default maxDepth to
Literal.of(-1) instead of Literal.ZERO, and in CalciteRelNodeVisitor (around
rexVisitor.analyze(node.getMaxDepth(), context) / node.getMaxDepth()) remove or
change the null-to-0 coercion (the line that does maxDepthValue = maxDepthValue
== null ? 0 : maxDepthValue) so that -1 is treated as the unbounded sentinel (or
simply assume non-null after the AstBuilder change); ensure references to
node.getMaxDepth, rexVisitor.analyze, and the maxDepthValue handling reflect the
new -1 semantics used by CalciteEnumerableGraphLookup.
In `@docs/user/ppl/index.md`:
- Line 87: Fix the markdown table row for the graphlookup command by adding a
single space before the closing pipe so it matches surrounding entries; locate
the row containing "[graphlookup command](cmd/graphlookup.md) | 3.6 |
experimental (since 3.6) | Performs recursive graph traversal on a collection
using a BFS algorithm." and ensure there is " ... BFS algorithm. |" (space
before final |) to keep table formatting consistent.
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 452-477: The current query logic in CalciteEnumerableGraphLookup
is excluding whole documents by adding mustNot(getQueryBuilder(...,
visitedValues)) when graphLookup.supportArray is false; remove these mustNot
exclusions (both in the forward query using fromFieldIdx/visitedValues and in
the bidirectional backQuery using toFieldIdx/visitedValues) so matched documents
are returned even if their expansion would revisit nodes, and rely on the
in-memory visitedNodes logic in performBfs to prevent re-queuing; update
getQueryBuilder usage accordingly (leave calls to getQueryBuilder(toFieldIdx,
values) and getQueryBuilder(fromFieldIdx, values) intact) and add a brief
comment near graphLookup.supportArray handling explaining that de-duplication is
handled in performBfs.
- Around line 190-242: The constructor GraphLookupEnumerator currently mutates a
shared CalciteEnumerableIndexScan (lookupScan) by calling
lookupScan.pushDownContext.add for LIMIT and FILTER, which will accumulate
across enumerator instances; to fix, stop mutating the shared RelNode in the
enumerator: either move the push-down logic from GraphLookupEnumerator to the
owning Enumerable (GraphLookupEnumerable.enumerator() or its initializer) so
pushdowns are applied once per RelNode, or clone/copy the lookupScan at the
start of GraphLookupEnumerator (create a fresh CalciteEnumerableIndexScan
instance or use the same copy strategy as queryLookupTable) and apply
lookupScan.pushDownContext.add to that copy only; ensure references to
graphLookup.usePIT, lookupScan.pushDownContext.add, graphLookup.filter,
PredicateAnalyzer.analyze, and queryLookupTable are used to locate and implement
the change.
- Around line 215-221: The code does not validate that
lookupFields.indexOf(graphLookup.fromField) and indexOf(graphLookup.toField)
return valid indices; add checks after computing fromFieldIdx and toFieldIdx in
CalciteEnumerableGraphLookup to ensure each is >= 0 (and optionally <
lookupFields.size()), and if not throw a clear exception (e.g.,
IllegalArgumentException or a domain-specific validation exception) including
the missing field name and available lookupFields; mirror the existing
validation pattern used for startFieldIndex so later array access at lines that
use fromFieldIdx and toFieldIdx cannot cause ArrayIndexOutOfBoundsException.
In `@ppl/src/main/antlr/OpenSearchPPLParser.g4`:
- Around line 648-663: The visitor for graphLookupCommand
(visitGraphLookupCommand) currently allows duplicate options to silently
overwrite; mirror the pattern used in visitBinCommand by adding a Set<String>
seenParams in visitGraphLookupCommand, for each graphLookupOption extract a
unique key (e.g., option token name like START_FIELD, FROM_FIELD, TO_FIELD,
MAX_DEPTH, DEPTH_FIELD, DIRECTION, SUPPORT_ARRAY, BATCH_MODE, USE_PIT, FILTER)
and call checkParamDuplication(seenParams, paramName) before inserting it into
the command map/structure so duplicates cause the same IllegalArgumentException
as in visitBinCommand.
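A minimal sketch of the seen-params guard this comment describes, using only the JDK. The class and method names below are illustrative stand-ins for the visitor code, not the actual AstBuilder implementation:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Duplicate-option guard mirroring the visitBinCommand pattern the review cites:
// track every option name already seen and fail fast when one repeats.
public class GraphLookupOptionCheck {
    private final Set<String> seenParams = new LinkedHashSet<>();
    private final Map<String, String> options = new LinkedHashMap<>();

    // Throws if the same graphlookup option (e.g. MAX_DEPTH) appears twice.
    private void checkParamDuplication(String paramName) {
        if (!seenParams.add(paramName)) {
            throw new IllegalArgumentException("Duplicate option: " + paramName);
        }
    }

    public void putOption(String paramName, String value) {
        checkParamDuplication(paramName);
        options.put(paramName, value);
    }

    public static void main(String[] args) {
        GraphLookupOptionCheck cmd = new GraphLookupOptionCheck();
        cmd.putOption("FROM_FIELD", "manager");
        cmd.putOption("TO_FIELD", "name");
        try {
            cmd.putOption("FROM_FIELD", "boss"); // duplicate: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The key point is that the check happens before insertion, so a duplicate surfaces as a parse-time error rather than silently overwriting the earlier value.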
---
Duplicate comments:
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2550-2551: Replace the hard-coded limit call in
CalciteRelNodeVisitor (builder.limit(0, 100)) with a configurable source-row
cap: introduce or read a configuration/query option (e.g., getSourceRowLimit()
on the existing session/config object or a new constant
DEFAULT_SOURCE_ROW_LIMIT) and call builder.limit(0, sourceRowLimit) using that
value; ensure the value is validated (non-negative) and falls back to the
default when unset so the limit becomes configurable rather than fixed at 100.
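The validated fallback this comment asks for could look like the following sketch; `DEFAULT_SOURCE_ROW_LIMIT` and the resolver are suggested names, not existing config fields:

```java
// Replace the hard-coded builder.limit(0, 100) with a validated, configurable
// cap: negative or unset values fall back to the default.
public class SourceRowLimit {
    static final int DEFAULT_SOURCE_ROW_LIMIT = 100;

    public static int resolve(Integer configured) {
        if (configured == null || configured < 0) {
            return DEFAULT_SOURCE_ROW_LIMIT; // invalid or unset: use default
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));  // 100 (fallback)
        System.out.println(resolve(-5));    // 100 (invalid, fallback)
        System.out.println(resolve(5000));  // 5000 (tuned)
    }
}
```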
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 542-546: The reset() method currently resets sourceEnumerator and
current but fails to clear the batchModeCompleted flag, so subsequent iterations
in batch mode immediately stop; update reset() to also set batchModeCompleted =
false (or otherwise reinitialize batchModeCompleted) so moveNextBatchMode() can
run again after reset() — locate the reset() method in
CalciteEnumerableGraphLookup and update it alongside the existing references to
sourceEnumerator, current, and the batchModeCompleted flag.
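The stale-flag bug can be illustrated with a toy enumerator; the class below is a hypothetical stand-in for GraphLookupEnumerator, showing that reset() must reinitialize every piece of per-iteration state, including the batch-completion flag:

```java
import java.util.Iterator;
import java.util.List;

// Toy resettable enumerator: without clearing batchModeCompleted in reset(),
// a second pass would return false immediately, which is the reported bug.
public class ResettableBatchEnumerator {
    private final List<String> rows;
    private Iterator<String> it;
    private String current;
    private boolean batchModeCompleted; // was left stale before the fix

    public ResettableBatchEnumerator(List<String> rows) {
        this.rows = rows;
        reset();
    }

    public boolean moveNext() {
        if (batchModeCompleted || !it.hasNext()) {
            batchModeCompleted = true; // batch mode emits everything once
            return false;
        }
        current = it.next();
        return true;
    }

    public String current() {
        return current;
    }

    public void reset() {
        it = rows.iterator();
        current = null;
        batchModeCompleted = false; // the missing line the review points out
    }

    public static void main(String[] args) {
        ResettableBatchEnumerator e = new ResettableBatchEnumerator(List.of("a", "b"));
        while (e.moveNext()) { /* drain */ }
        e.reset();
        System.out.println(e.moveNext()); // true: iteration restarts after reset
    }
}
```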
- Line 433: The BFS depth check in CalciteEnumerableGraphLookup incorrectly
stops when graphLookup.maxDepth == -1 (unlimited); change the condition in the
loop to only compare and potentially break when maxDepth is non-negative (e.g.
if (graphLookup.maxDepth >= 0 && ++currentLevelDepth > graphLookup.maxDepth)
break;) so that a negative maxDepth (like -1) is treated as unlimited; update
the check that uses currentLevelDepth and graphLookup.maxDepth accordingly.
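The corrected sentinel semantics can be isolated in a small predicate; the helper below is illustrative, not the actual class:

```java
// A negative maxDepth (the -1 sentinel) means "unlimited"; only a
// non-negative value caps BFS depth, exactly as the review suggests.
public class MaxDepthSentinel {
    public static final int UNBOUNDED = -1;

    // Returns true when traversal should stop expanding at this depth.
    public static boolean depthExceeded(int maxDepth, int currentLevelDepth) {
        return maxDepth >= 0 && currentLevelDepth > maxDepth;
    }

    public static void main(String[] args) {
        System.out.println(depthExceeded(UNBOUNDED, 1_000_000)); // false: unlimited
        System.out.println(depthExceeded(2, 3));                 // true: cap exceeded
        System.out.println(depthExceeded(2, 2));                 // false: at the cap
    }
}
```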
- Around line 398-430: The loop over forwardResults currently drops rows when
collectValues produces an empty nextValues (thus losing leaf nodes); change the
logic in CalciteEnumerableGraphLookup so that each processed row (from
forwardResults) is always added to results (respecting graphLookup.depthField by
appending currentLevelDepth via the same rowWithDepth logic) and then use
nextValues only to decide which values to add to visitedNodes and queue for
further BFS expansion; keep existing uses of collectValues, visitedNodes, queue,
results, forwardResults, and graphLookup.depthField but move the
results.add(...) block out of the if (!nextValues.isEmpty()) conditional so
queue expansion and result collection are handled independently.
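The separation of result collection from frontier expansion can be sketched with an in-memory graph; the adjacency map below stands in for queryLookupTable and all names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Toy BFS illustrating the fix: every traversed edge row is recorded in
// results, and the visited set only controls whether a node is re-queued.
public class LeafPreservingBfs {
    // graph maps a node to the nodes its edges point at (no entry = leaf).
    public static List<String> bfs(Map<String, List<String>> graph, String start) {
        List<String> results = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : graph.getOrDefault(node, List.of())) {
                // Always record the edge, even if `next` is a leaf or already
                // visited -- dropping it here was the reported bug.
                results.add(node + "->" + next);
                // Expansion is decided independently of result collection.
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g =
            Map.of("a", List.of("b", "c"), "b", List.of("c"));
        System.out.println(bfs(g, "a")); // edge b->c into visited "c" is kept
    }
}
```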
In `@ppl/src/main/antlr/OpenSearchPPLParser.g4`:
- Around line 1728-1734: The parser rule searchableKeyWord is missing four
graphLookup tokens (START_FIELD, SUPPORT_ARRAY, BATCH_MODE, USE_PIT) so they
cannot be used as field identifiers; update the searchableKeyWord alternative
list in OpenSearchPPLParser.g4 (the searchableKeyWord rule) to include
START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT alongside FROM_FIELD,
TO_FIELD, MAX_DEPTH, DEPTH_FIELD, DIRECTION, UNI, and BI so these tokens are
treated as searchable identifiers.
---
Nitpick comments:
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2545-2604: The visitGraphLookup method exceeds 50 lines; extract
the parameter parsing and RelNode construction into small helpers to reduce
complexity: create a helper like parseGraphLookupParameters(GraphLookup node,
CalcitePlanContext context) that performs all Rex/Literal analysis and returns a
small DTO (startFieldName, fromFieldName, toFieldName, outputFieldName,
depthFieldName, maxDepthValue, bidirectional, supportArray, batchMode, usePIT,
filterRex), and a builder helper like buildGraphLookupRelNode(RelNode
sourceTable, RelNode lookupTable, ParsedGraphLookupParams params) that calls
LogicalGraphLookup.create(...) and pushes/returns the result; keep existing
calls to tryToRemoveMetaFields(context, true), visitChildren(node, context),
analyze(node.getFromTable(), context), rexVisitor.analyze(...) but move their
parsing logic into the new helpers and then have visitGraphLookup only
orchestrate the steps and delegate to parseGraphLookupParameters and
buildGraphLookupRelNode to bring visitGraphLookup under 50 lines.
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 351-437: performBfs can run unbounded and OOM on dense graphs; add
protective limits: introduce a configurable maxResults cap and a traversal
timeout check inside performBfs (use the existing results, visitedNodes, queue
and currentLevelDepth scope) and enforce them after each batch/queryLookupTable
call and before enqueuing new nodes; when maxResults is reached or the timeout
expires, stop traversal (break the loop), log a warning and return the partial
results (or throw a specific exception if caller should handle truncation), and
expose the new config fields (e.g., graphLookup.maxResults and a timeoutMillis)
so callers can tune safety limits.
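The shape of these safety limits can be sketched as follows. The `maxResults` and `timeoutMillis` parameters are the suggested (not yet existing) config fields, and the in-memory graph replaces queryLookupTable:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Bounded BFS: a result cap and a wall-clock deadline are checked before each
// expansion and mid-batch, and partial results are returned on truncation.
public class BoundedBfs {
    public static List<String> bfs(
            Map<String, List<String>> graph, String start,
            int maxResults, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> results = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            // Enforce safety limits before expanding the next node.
            if (results.size() >= maxResults
                    || System.currentTimeMillis() > deadline) {
                // The real code would log a warning about truncation here.
                break;
            }
            String node = queue.poll();
            for (String next : graph.getOrDefault(node, List.of())) {
                if (results.size() >= maxResults) {
                    break; // cap reached mid-batch: stop collecting
                }
                results.add(next);
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g =
            Map.of("a", List.of("b", "c"), "b", List.of("d"), "c", List.of("e"));
        System.out.println(bfs(g, "a", 3, 1000)); // truncated after 3 nodes
    }
}
```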
In `@ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java`:
- Around line 1658-1728: Add positive and negative graphLookup test cases in
testGraphLookupCommand: include at least one positive case using
GraphLookup.builder()/plan(...) that sets optional flags supportArray=true,
batchMode=true and a filter=(...) expression to assert the AST contains those
options, and include a negative assertThrows(SyntaxCheckException.class or
SemanticCheckException.class, () -> plan(...)) for a graphLookup invocation
missing the mandatory "as" clause to validate parser error handling; locate
tests around the existing testGraphLookupCommand method and add assertions
similar to the existing ones so they exercise parsing and AstBuilder behavior
for these options.
In
`@ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java`:
- Around line 646-712: Add a missing test case to testGraphLookup that verifies
anonymization of the graphLookup usePit option: call anonymize on a query like
"source=t | graphLookup employees fromField=manager toField=name usePit=true as
reportingHierarchy" and assert the expected anonymized string contains
"usePit=true" (e.g., "source=table | graphlookup table fromField=identifier
toField=identifier direction=uni usePit=true as identifier"); place the new
assert alongside the other graphLookup asserts in the testGraphLookup method and
use the existing anonymize helper to produce the output.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
- core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
- docs/user/ppl/index.md
- integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java
- integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
- opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
- ppl/src/main/antlr/OpenSearchPPLLexer.g4
- ppl/src/main/antlr/OpenSearchPPLParser.g4
- ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
- ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
- ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java
- ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
🚧 Files skipped from review as they are similar to previous changes (5)
- ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
- integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java
- integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
- core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
- core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
```java
try {
  List<String> sourceFields = graphLookup.getSource().getRowType().getFieldNames();
  this.lookupFields = graphLookup.getLookup().getRowType().getFieldNames();
  this.startFieldIndex = sourceFields.indexOf(graphLookup.getStartField());
  this.fromFieldIdx = lookupFields.indexOf(graphLookup.fromField);
  this.toFieldIdx = lookupFields.indexOf(graphLookup.toField);
```
No validation that fromFieldIdx / toFieldIdx resolve to valid indices.
lookupFields.indexOf(...) returns -1 if the field is not found. Unlike startFieldIndex (which is bounds-checked at lines 289/330), fromFieldIdx and toFieldIdx are used directly as array indices at lines 400 and 408, which would throw ArrayIndexOutOfBoundsException at runtime.
🛠️ Proposed fix

```diff
  this.fromFieldIdx = lookupFields.indexOf(graphLookup.fromField);
  this.toFieldIdx = lookupFields.indexOf(graphLookup.toField);
+ if (this.fromFieldIdx < 0) {
+   throw new IllegalArgumentException(
+       "fromField '" + graphLookup.fromField + "' not found in lookup schema: " + lookupFields);
+ }
+ if (this.toFieldIdx < 0) {
+   throw new IllegalArgumentException(
+       "toField '" + graphLookup.toField + "' not found in lookup schema: " + lookupFields);
+ }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
try {
  List<String> sourceFields = graphLookup.getSource().getRowType().getFieldNames();
  this.lookupFields = graphLookup.getLookup().getRowType().getFieldNames();
  this.startFieldIndex = sourceFields.indexOf(graphLookup.getStartField());
  this.fromFieldIdx = lookupFields.indexOf(graphLookup.fromField);
  this.toFieldIdx = lookupFields.indexOf(graphLookup.toField);
  if (this.fromFieldIdx < 0) {
    throw new IllegalArgumentException(
        "fromField '" + graphLookup.fromField + "' not found in lookup schema: " + lookupFields);
  }
  if (this.toFieldIdx < 0) {
    throw new IllegalArgumentException(
        "toField '" + graphLookup.toField + "' not found in lookup schema: " + lookupFields);
  }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
around lines 215 - 221, The code does not validate that
lookupFields.indexOf(graphLookup.fromField) and indexOf(graphLookup.toField)
return valid indices; add checks after computing fromFieldIdx and toFieldIdx in
CalciteEnumerableGraphLookup to ensure each is >= 0 (and optionally <
lookupFields.size()), and if not throw a clear exception (e.g.,
IllegalArgumentException or a domain-specific validation exception) including
the missing field name and available lookupFields; mirror the existing
validation pattern used for startFieldIndex so later array access at lines that
use fromFieldIdx and toFieldIdx cannot cause ArrayIndexOutOfBoundsException.
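The validation pattern this prompt describes can be factored into a small helper; the class below is a self-contained sketch with illustrative names, not the actual scan code:

```java
import java.util.List;

// Resolve a field name to its index and fail with a descriptive message
// instead of letting a -1 index surface later as
// ArrayIndexOutOfBoundsException, mirroring the startFieldIndex check.
public class FieldIndexValidation {
    public static int requireFieldIndex(
            List<String> lookupFields, String field, String role) {
        int idx = lookupFields.indexOf(field);
        if (idx < 0) {
            throw new IllegalArgumentException(
                role + " '" + field + "' not found in lookup schema: " + lookupFields);
        }
        return idx;
    }

    public static void main(String[] args) {
        List<String> lookupFields = List.of("name", "manager", "dept");
        System.out.println(requireFieldIndex(lookupFields, "manager", "fromField")); // 1
        try {
            requireFieldIndex(lookupFields, "boss", "toField");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // names field and schema
        }
    }
}
```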
```java
// Forward direction query
QueryBuilder query;
if (graphLookup.supportArray) {
  // When supportArray is true, don't push down visited filter
  // because array fields may contain multiple values that need to be checked individually
  query = getQueryBuilder(toFieldIdx, values);
} else {
  query =
      boolQuery()
          .must(getQueryBuilder(toFieldIdx, values))
          .mustNot(getQueryBuilder(fromFieldIdx, visitedValues));
}

if (graphLookup.bidirectional) {
  // Also query fromField for bidirectional traversal
  QueryBuilder backQuery;
  if (graphLookup.supportArray) {
    backQuery = getQueryBuilder(fromFieldIdx, values);
  } else {
    backQuery =
        boolQuery()
            .must(getQueryBuilder(fromFieldIdx, values))
            .mustNot(getQueryBuilder(toFieldIdx, visitedValues));
  }
  query = QueryBuilders.boolQuery().should(query).should(backQuery);
}
```
mustNot(fromField IN visitedValues) can exclude valid result documents.
When supportArray is false, the query excludes entire documents whose fromField value has already been visited (lines 460–462). This is an optimization to avoid re-traversing visited nodes, but it also prevents those documents from appearing in the result set at all. A document reachable via toField is a valid BFS result even if its fromField leads back to an already-visited node (i.e., a leaf or converging edge).
For MongoDB $graphLookup–compatible semantics, all matched documents should be returned regardless of whether they expand the frontier. Consider removing the mustNot exclusion (relying on the in-memory visitedNodes set in performBfs to prevent re-queueing) or documenting that supportArray=false intentionally trades result completeness for performance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
around lines 452 - 477, The current query logic in CalciteEnumerableGraphLookup
is excluding whole documents by adding mustNot(getQueryBuilder(...,
visitedValues)) when graphLookup.supportArray is false; remove these mustNot
exclusions (both in the forward query using fromFieldIdx/visitedValues and in
the bidirectional backQuery using toFieldIdx/visitedValues) so matched documents
are returned even if their expansion would revisit nodes, and rely on the
in-memory visitedNodes logic in performBfs to prevent re-queuing; update
getQueryBuilder usage accordingly (leave calls to getQueryBuilder(toFieldIdx,
values) and getQueryBuilder(fromFieldIdx, values) intact) and add a brief
comment near graphLookup.supportArray handling explaining that de-duplication is
handled in performBfs.
Description

The `graphlookup` command performs recursive graph traversal on OpenSearch indices using a breadth-first search (BFS) algorithm.

Syntax

Option 1

Example Query

Option 2 (follows MongoDB naming convention)

Example Query

Performance

The OpenSearch `graphlookup` command implements several features that make it well suited to RAG scenarios and adaptable to different requirements:
- With `direction=uni` and `usePIT=true`, OpenSearch `graphlookup` produces the same result as MongoDB's `$graphLookup` aggregation command, 5-6x faster on the SNB SF-1 dataset and 29x faster on SNB SF-10. OpenSearch finishes all benchmarks on SNB SF-10 in 734 ms and on SNB SF-30 in 8000 ms, while MongoDB fails to complete.
- It supports `batchMode`, which fits the requirements of RAG more closely. Besides deduplicating the final results, it is 58x faster than MongoDB on the SNB SF-1 dataset, and OpenSearch finishes the multiple-start-value benchmark on SNB SF-10 in 3508 ms and on SNB SF-30 in 10992 ms, while MongoDB fails to complete.
- It supports both `PIT` and `non-PIT` modes; the latter performs better on large datasets at the cost of accuracy, running 5x faster than PIT mode on SNB SF-30 while missing 27% of nodes.

Related Issues

Resolves #5088

Check List

- Commits are signed per the DCO using `--signoff` or `-s`.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.