feat: Implement PPL convert command with 5 conversion functions#5157

Open
ritvibhatt wants to merge 37 commits into opensearch-project:main from ritvibhatt:ppl-convert

Conversation


@ritvibhatt ritvibhatt commented Feb 19, 2026

Description

Copy of #5034, resolves merge conflicts

Related Issues

Partially resolves #5001 (only numeric and memory conversions in this PR)

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

- Added convert command syntax and AST nodes (Convert, ConvertFunction)
- Implemented 5 conversion functions: auto, num, rmcomma, rmunit, none
- Full Calcite pushdown support via CalciteRelNodeVisitor
- Logical and physical operators (LogicalConvert, ConvertOperator)
- Comprehensive test coverage:
  * 13 unit tests (CalcitePPLConvertTest)
  * 8 integration tests with pushdown (ConvertCommandIT)
  * 8 non-pushdown tests (CalciteConvertCommandIT)
  * 3 explain tests (ExplainIT)
  * 2 cross-cluster tests (CrossClusterSearchIT)
  * Anonymizer and v2 unsupported tests
- User documentation (docs/user/ppl/cmd/convert.md)
- Code cleanup: removed dead code, simplified javadocs
- Version: 3.5 (experimental)

All tests compile and pass successfully.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Centralized common conversion function logic into BaseConversionUDF base class,
reducing code from 219 to 162 lines (26% reduction) while maintaining all functionality.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
…d separating concerns into each respective function class

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Replace ConvertFunction class with standard Let expressions to
handle none() at AST level and reduce code complexity.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
…tCommand

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Add testNumConvertNullAndEmpty() to ensure NumConvertFunction.convert()
properly returns null for null, empty string, and whitespace-only inputs,
matching the coverage pattern of other conversion functions.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Add testRmunitConvertNullAndEmpty() to ensure RmunitConvertFunction.convert()
properly returns null for null, empty string, and whitespace-only inputs,
completing the test coverage pattern across all conversion functions.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Update testConvertWithAlias to verify that using AS clause preserves
the original field alongside the converted field. Test now checks both
balance (bigint) and balance_num (double) are present in results.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 17

🧹 Nitpick comments (9)
integ-test/src/test/java/org/opensearch/sql/security/CrossClusterSearchIT.java (1)

217-222: Assertions only verify column presence — consider asserting converted values.

Both tests call only verifyColumn(result, columnName(...)), which confirms the output column exists but does not validate that auto(balance) actually produced correct converted values. Compare with the existing testCrossClusterSortWithTypeCasting (line 162), which uses verifyDataRows to assert specific numeric values after a similar conversion. A column-name-only check would pass even if the conversion silently returned nulls or wrong types for every row.

Consider adding a verifyDataRows call with representative expected balance values (the same 7 rows already expected in testCrossClusterSortWithTypeCasting), or at minimum assert that the result is non-empty and the converted column is of numeric type.

As per coding guidelines, "Check test assertions are meaningful and specific."

Also applies to: 231-236

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@integ-test/src/test/java/org/opensearch/sql/security/CrossClusterSearchIT.java`
around lines 217 - 222, Current assertions in CrossClusterSearchIT call
verifyColumn(result, columnName("balance")) after executeQuery(...) which only
checks column presence; update the tests to assert actual converted numeric
values by invoking verifyDataRows(result, expectedRows...) using the same 7
expected balance values used in testCrossClusterSortWithTypeCasting, or at
minimum add assertions that the result is non-empty and that each value in the
converted "balance" column is numeric; locate the call sites around
executeQuery/String.format(...) with TEST_INDEX_BANK_REMOTE and replace or
extend the verifyColumn(...) checks with verifyDataRows(...) (or additional
numeric-type/non-empty assertions) to make the test meaningful.
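To make the reviewer's point concrete, here is a small Python sketch (not the project's Java test harness; the `schema`/`datarows` response shape, the helper names, and the sample balance values are illustrative assumptions) showing how a column-presence check passes even when the conversion silently produced nulls, while a row-value check catches it:

```python
# Mock of a query response where the converted column exists but every
# value came back null -- the failure mode the reviewer describes.
result = {
    "schema": [{"name": "balance", "type": "double"}],
    "datarows": [[None], [None]],
}

def verify_column(result, name):
    """Column-presence check: only inspects the schema, not the data."""
    assert any(col["name"] == name for col in result["schema"])

def verify_data_rows(result, expected):
    """Row-value check: compares the actual converted values."""
    assert result["datarows"] == expected

verify_column(result, "balance")  # passes despite the all-null data
try:
    verify_data_rows(result, [[39225.0], [5686.0]])
except AssertionError:
    print("value assertion catches the regression")
```

The same contrast motivates adding `verifyDataRows` alongside `verifyColumn` in the cross-cluster tests.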
integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java (1)

450-459: Consider adding a verifyDataRows call for a more specific assertion.

Both new tests assert only column presence and schema type, which is consistent with the existing pattern in this class (e.g., testCrossClusterFieldsSpaceDelimited). However, the coding guidelines ask that integration test assertions be "meaningful and specific." Adding at least one row-value check would strengthen confidence that actual values are being converted correctly across the cluster boundary, not just that the projected type is correct.

💡 Optional: add row-value assertion
  @Test

  public void testCrossClusterConvertWithAlias() throws IOException {
    JSONObject result =
        executeQuery(
            String.format(
-               "search source=%s | convert auto(balance) AS balance_num | fields balance_num",
+               "search source=%s | where firstname='Hattie' | convert auto(balance) AS balance_num | fields balance_num",
                TEST_INDEX_BANK_REMOTE));
    verifyColumn(result, columnName("balance_num"));
    verifySchema(result, schema("balance_num", "double"));
+   verifyDataRows(result, rows(5686.0));
  }

The same pattern could apply to testCrossClusterConvert.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java`
around lines 450 - 459, Add a concrete row-value assertion to
testCrossClusterConvertWithAlias: after the existing verifyColumn(result,
columnName("balance_num")) and verifySchema(result, schema("balance_num",
"double")), call verifyDataRows(result, rows("balance_num",
"<expected_double>")) to assert at least one converted numeric value is correct
(replace <expected_double> with the known balance value from
TEST_INDEX_BANK_REMOTE as a double); reference symbols:
testCrossClusterConvertWithAlias, executeQuery, verifyColumn, verifySchema,
verifyDataRows.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLConvertTest.java (1)

1-272: Missing negative test cases required by PPL guidelines

This file path falls under **/ppl/**/*.java, which mandates verifying grammar tests with positive and negative cases. The entire suite is positive-path only. There are no tests verifying parser/planner behavior for:

  • An unrecognised conversion function name (e.g., convert unknown_fn(SAL))
  • A missing argument (e.g., convert auto())
  • A non-existent source field (e.g., convert auto(NONEXISTENT_COL))
  • none() called with an alias that shadows an existing column

At minimum, an exception/error-propagation test for an invalid function name should be added to confirm the parser rejects it cleanly.

As per coding guidelines, For PPL parser changes, verify grammar tests with positive/negative cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLConvertTest.java`
around lines 1 - 272, The test suite in CalcitePPLConvertTest only covers
positive convert cases; add negative grammar tests to validate error handling
(at minimum a test that asserts an invalid conversion function is rejected). In
CalcitePPLConvertTest add a new `@Test` method (e.g., testConvertInvalidFunction)
that calls getRelNode("source=EMP | convert unknown_fn(SAL)") and asserts an
exception is thrown (use assertThrows or the existing test harness), and
similarly add tests for missing argument ("convert auto()"), non-existent field
("convert auto(NONEXISTENT_COL)"), and alias shadowing with none() (e.g.,
"convert none(EMPNO) AS EMPNO") to verify the parser/planner raises/propagates
errors instead of succeeding; use getRelNode and the existing verification
helpers (verifyLogical/verifyPPLToSparkSQL) pattern for positive vs negative
assertions.
core/src/main/java/org/opensearch/sql/expression/function/udf/BaseConversionUDF.java (1)

25-53: Missing JavaDoc on public API methods.

Per coding guidelines, public methods in core/src/main/java/**/*.java must have JavaDoc with @param, @return, and @throws. The public methods getReturnTypeInference(), getOperandMetadata(), and convertValue() (which is the core template method) lack documentation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/BaseConversionUDF.java`
around lines 25 - 53, Add JavaDoc for the public API methods: add Javadoc
comments to BaseConversionUDF#getReturnTypeInference() and
BaseConversionUDF#getOperandMetadata() including a brief description and
appropriate `@return` tags; also locate the core template method convertValue(...)
(in this class or ConversionImplementor) and add Javadoc describing its purpose,
parameters (`@param` for each argument), return value (`@return`) and any exceptions
it can throw (`@throws`). Ensure the Javadoc text follows project conventions and
documents nullability/behavior where relevant.
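As an illustration of the template-method shape being discussed (a Python sketch with hypothetical names, not the actual `BaseConversionUDF` Java API), the shared null/empty handling lives once in the base class while each conversion function overrides a single hook:

```python
import math

class BaseConversion:
    """Template method: common input handling here; subclasses supply
    only their specific string-to-value conversion."""

    def convert(self, value):
        if value is None:
            return None
        text = str(value).strip()
        if not text:
            return None  # empty or whitespace-only input -> NULL
        return self.convert_value(text)

    def convert_value(self, text):
        raise NotImplementedError

class NumConversion(BaseConversion):
    """Sketch of num(): strip grouping commas and parse as a float."""

    def convert_value(self, text):
        try:
            result = float(text.replace(",", ""))
            # Reject Infinity/NaN so special-value strings map to NULL.
            return result if math.isfinite(result) else None
        except ValueError:
            return None

print(NumConversion().convert("  "))     # None, handled by the base class
print(NumConversion().convert("1,234"))  # 1234.0
```

This mirrors the refactor described in the commit log, where the common logic was centralized into a base class and each function class kept only its own conversion.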
core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (1)

425-427: Optional: Add JavaDoc to visitConvert.

All other visitor method stubs in this class lack JavaDoc; this new method follows that same pattern. However, the core/src/main/java/**/*.java guideline requires public methods to have JavaDoc with @param, @return, and @throws. Consider adding it here (and ideally addressing the systemic gap in a follow-up).

📝 Suggested JavaDoc
+  /**
+   * Visit Convert node.
+   *
+   * @param node {@link Convert}
+   * @param context Context
+   * @return Return Type.
+   */
   public T visitConvert(Convert node, C context) {
     return visitChildren(node, context);
   }

As per coding guidelines: "Public methods MUST have JavaDoc with @param, @return, and @throws" (applicable to core/src/main/java/**/*.java).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java` around
lines 425 - 427, Add JavaDoc to the public method visitConvert in
AbstractNodeVisitor: document the method purpose and include `@param` for the
Convert node and context (C context), `@return` describing the returned generic T,
and `@throws` for any runtime exceptions that may be propagated; ensure the
JavaDoc style matches other methods in AbstractNodeVisitor and complies with the
core/src/main/java/**/*.java requirement for public methods to have `@param`,
`@return`, and `@throws` tags.
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)

1046-1062: memk() conversion function has no anonymizer test coverage.

All other conversion functions (auto, num, rmcomma, rmunit, none) have at least one assertion in testConvertCommand, but memk is absent. Add a case such as:

+    assertEquals(
+        "source=table | convert memk(identifier)",
+        anonymize("source=t | convert memk(memory)"));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java`
around lines 1046 - 1062, The testConvertCommand in PPLQueryDataAnonymizerTest
is missing coverage for the memk() conversion; add an assertion that calls
anonymize with an input using memk (e.g., "source=t | convert memk(field)") and
expects the anonymized output to map the field to the generic identifier form
consistent with other cases (e.g., "source=table | convert memk(identifier)" or
with AS when appropriate). Update the testConvertCommand method to include this
new assertEquals using the existing anonymize helper so memk conversion is
validated alongside auto, num, rmcomma, rmunit, and none.
core/src/main/java/org/opensearch/sql/ast/tree/Convert.java (1)

19-26: Avoid public setters to keep the AST node immutable.

@Setter exposes mutable state outside attach, which weakens immutability guarantees for AST nodes. Consider removing the class-level @Setter to keep mutation scoped.

♻️ Proposed change
-@Setter
 @ToString
 @EqualsAndHashCode(callSuper = false)
 @RequiredArgsConstructor
 public class Convert extends UnresolvedPlan {
As per coding guidelines: `core/src/main/java/org/opensearch/sql/ast/**/*.java`: AST nodes must be immutable where possible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/ast/tree/Convert.java` around lines 19
- 26, Remove the class-level `@Setter` to prevent broad mutation of the AST node;
keep @Getter/@ToString/@EqualsAndHashCode/@RequiredArgsConstructor. Replace the
broad setter by adding a narrowly scoped mutator (e.g., a package-private method
like attach(UnresolvedPlan child) or attachChild(UnresolvedPlan child)) that
sets the private field child so only callers in the same package (or the
intended attach logic) can modify it; leave conversions as final and immutable
via the existing field.
core/src/test/java/org/opensearch/sql/expression/function/udf/ConversionFunctionsTest.java (2)

316-327: RmcommaConvertFunction returns null for all scientific-notation extreme values.

Lines 318–324 assert null for strings like "1.7976931348623157E308" and "1e309" because the E/e character triggers containsLetter(). This means rmcomma silently rejects any scientific-notation input. If this is intentional, a brief comment in the test explaining why would help future maintainers — containsLetter treats e as alphabetic, so all exponential representations are rejected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/org/opensearch/sql/expression/function/udf/ConversionFunctionsTest.java`
around lines 316 - 327, The test currently asserts null for scientific-notation
strings because RmcommaConvertFunction.convert uses containsLetter(), which
treats 'e'/'E' as alphabetic and thus rejects exponential notation; update the
test method testRmcommaConvertNumericExtremes to include a brief inline comment
(near the assertions referencing RmcommaConvertFunction.convert) stating that
scientific/exponential formats are intentionally rejected due to
containsLetter() treating 'e' as a letter, so these inputs return null, to help
future maintainers understand the behavior.
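The rejection behavior described here can be sketched in Python (the function and helper names are illustrative; the rules — strip commas, reject any input containing a letter, which excludes all scientific notation because 'e'/'E' count as letters — are taken from the review text, not from reading the Java source):

```python
def contains_letter(text):
    # 'e'/'E' are alphabetic, so "1.7976931348623157E308" and
    # "1e309" are rejected along with unit-suffixed strings.
    return any(ch.isalpha() for ch in text)

def rmcomma_convert(value):
    """Sketch of rmcomma(): remove grouping commas and parse, but
    return None for any input that contains a letter."""
    if value is None:
        return None
    text = str(value).strip()
    if not text or contains_letter(text):
        return None
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None

print(rmcomma_convert("1,234,567"))              # 1234567.0
print(rmcomma_convert("1.7976931348623157E308")) # None: 'E' is a letter
```

An inline comment like the one above is exactly what the reviewer asks for in the test, so the null assertions read as intentional rather than accidental.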

228-236: Special-value tests (∞, Infinity, NaN) only cover Num and Auto.

RmunitConvertFunction, RmcommaConvertFunction, and MemkConvertFunction are not tested against these special value strings. Per guidelines, error condition tests should be included for all new functions. Consider extending this test (or adding a sibling) to assert the expected behavior for the remaining three functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/org/opensearch/sql/expression/function/udf/ConversionFunctionsTest.java`
around lines 228 - 236, Extend the special-value test so RmunitConvertFunction,
RmcommaConvertFunction, and MemkConvertFunction are asserted to behave the same
as NumConvertFunction/AutoConvertFunction for the strings "∞", "Infinity", and
"NaN"; update ConversionFunctionsTest.testSpecialValues (or add a sibling test)
to call RmunitConvertFunction.convert(...), RmcommaConvertFunction.convert(...),
and MemkConvertFunction.convert(...) for each special string and assertNull on
each result to cover all new functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java`:
- Around line 303-313: The enum entries CTIME, MKTIME, DUR2SEC, and MSTIME in
BuiltinFunctionName are declared but have no implementations or registrations
and will fail at runtime; either implement corresponding ConvertFunction classes
and register them in PPLFuncImpTable (similar to AUTO/NUM/MEMK/RMUNIT/RMCOMMA)
and add unit tests, or remove these four enum entries from BuiltinFunctionName
to avoid exposing unimplemented functions (note NONE is handled as a
special-case in AstBuilder.java and can remain). Ensure any new implementations
follow the existing ConvertFunction pattern and are included in test coverage
and the function dispatch table.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/AutoConvertFunction.java`:
- Around line 11-17: Add JavaDoc comments for the public AutoConvertFunction()
constructor and the public static convert(Object value) entrypoint: for
AutoConvertFunction(), add a short description and an `@throws` tag if the
constructor can propagate any runtime exceptions (or explicitly state none are
thrown); for convert(Object) add a description, an `@param` value explaining
accepted input, an `@return` describing the conversion result, and an `@throws`
documenting the runtime exception(s) that may be raised (e.g., if
convertValue(value) fails). Reference the constructor AutoConvertFunction(), the
static method convert(Object), and the instance helper convertValue(Object) in
the comments so maintainers can trace behavior.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/MemkConvertFunction.java`:
- Around line 11-17: Add JavaDoc comments for the public MemkConvertFunction()
constructor and the public static convert(Object value) entrypoint: for
MemkConvertFunction() document its purpose and any side effects and include
`@throws` if it can throw (or explicitly state none), and for convert(Object
value) document the parameter (`@param` value the input to convert), the return
value (`@return` the converted object or null) and any exceptions it can propagate
(`@throws` ... or state none); reference the convertValue(Object) helper in the
convert method description to clarify behavior.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/NumConvertFunction.java`:
- Around line 11-17: Add JavaDoc for the public API: document the
NumConvertFunction() constructor and the static method convert(Object value)
with `@param` describing the value argument, `@return` describing the returned
converted numeric/object result, and `@throws` for any runtime exceptions that
convertValue may propagate; reference the NumConvertFunction.convert(Object)
method and the NumConvertFunction() constructor and ensure the JavaDoc text
matches coding guidelines for public methods.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/RmcommaConvertFunction.java`:
- Around line 11-17: Add JavaDoc to the public no-arg constructor
RmcommaConvertFunction() and the static entrypoint convert(Object value): for
the constructor, add a short description of the class initialization and include
any `@throws` if construction can fail; for the static convert method, add a
description, an `@param` value describing accepted input types, an `@return`
describing the converted value (or null) and an `@throws` documenting the
exceptions that may be raised during conversion (e.g., IllegalArgumentException
for invalid input or RuntimeException for conversion errors). Reference the
convertValue(Object) call inside convert(Object) so the JavaDoc reflects that
the static method delegates to convertValue for the actual conversion.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/RmunitConvertFunction.java`:
- Around line 11-17: Add JavaDoc comments for the public constructor
RmunitConvertFunction() and the static entrypoint method convert(Object) that
document parameters, return value and possible exceptions; for the constructor
include a brief description and note any thrown runtime exceptions if
applicable, and for convert(Object) add `@param` value describing accepted inputs,
`@return` describing the returned converted object (or null) and `@throws` entries
for any runtime exceptions propagated by convertValue(Object) so the JavaDoc
satisfies the project's @param/@return/@throws requirement while referencing the
convertValue(Object) behavior.

In
`@core/src/test/java/org/opensearch/sql/expression/function/udf/ConversionFunctionsTest.java`:
- Around line 175-215: Add missing zero-boundary and numeric-extreme test cases
for MemkConvertFunction in ConversionFunctionsTest: create a test method (e.g.,
testMemkConvertZeroBoundaries) that asserts convert("0"), convert("0k"),
convert("0m"), convert("0g"), convert("+0") and convert("-0") return 0.0 (or
-0.0 as appropriate), and another method (e.g., testMemkConvertNumericExtremes)
that asserts convert(String.valueOf(Double.MAX_VALUE)) returns Double.MAX_VALUE
and that overflow/invalid huge values like "1e309" and "1e309k" return null; use
MemkConvertFunction.convert references to locate where to add these tests.
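The boundary cases requested here can be modeled with a short Python sketch (the k/m/g suffix multipliers below are assumptions — the review only pins down the zero and overflow expectations, not the base unit):

```python
import math

# Assumed multipliers with kilobytes as the base unit.
MULTIPLIERS = {"k": 1.0, "m": 1024.0, "g": 1024.0 * 1024.0}

def memk_convert(value):
    """Sketch of memk(): parse a number with an optional k/m/g suffix;
    return None for missing, empty, unparseable, or non-finite input."""
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    multiplier = 1.0
    if text[-1].lower() in MULTIPLIERS:
        multiplier = MULTIPLIERS[text[-1].lower()]
        text = text[:-1]
    try:
        result = float(text) * multiplier
    except ValueError:
        return None
    # "1e309" and "1e309k" overflow to infinity -> None
    return result if math.isfinite(result) else None

print(memk_convert("0k"))     # 0.0
print(memk_convert("1e309"))  # None: overflow
```

Tests like `testMemkConvertZeroBoundaries` and `testMemkConvertNumericExtremes` would assert exactly these edge behaviors against the real `MemkConvertFunction.convert`.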

In `@docs/user/ppl/cmd/convert.md`:
- Around line 84-100: The examples use non-existent indices
(source=system_metrics in the convert example and source=metrics in the other
example) so update both examples to use the existing test index source=accounts
and simulate the intended fields with eval; locate the convert example
containing the pipeline with "source=system_metrics | eval memory='100m' |
convert memk(memory)" and the other example referencing "source=metrics" and
replace their source clauses with "source=accounts" and add an eval that creates
the memory/metric field values (e.g., eval memory='100m' or eval
<field>=<value>) so the examples run against the standard test data while
keeping the rest of the pipeline unchanged.
- Around line 60-78: The example's displayed row count is wrong for the query
`source=accounts | eval price='1,234' | convert num(price) | fields price`: it
will produce one result row per document in the index (all with price=1234.0).
Either append `| head 1` to the example query to legitimately show "fetched rows
/ total rows = 1/1", or update the shown result block to reflect the full-row
counts (e.g., fetched rows / total rows = N/N) and keep the original query;
modify the example in convert.md accordingly so the query and displayed result
are consistent.
- Around line 237-240: Fix the markdown table closing fence by replacing the
trailing pipe character in the closing separator string
"+------------+-----------+----------|" with a plus so it becomes
"+------------+-----------+----------+"; look for the table block that contains
the rows with "6 | Hattie | Bond" and "13 | Nanette | Bates" and update the
final separator line accordingly to correct the table formatting.
- Around line 19-21: The docs currently list `*` as a supported value for the
`<field>` parameter in the convert command table, but wildcard support is not
implemented (see the "when wildcard support is implemented" comment later);
update the `<field>` row in the parameter table to either remove the `*` example
or explicitly mark it as "not yet supported" (e.g., change "Single field name to
convert, or `*` to convert all fields." to "Single field name to convert;
wildcard `*` not yet supported."). Ensure the text mentions `*` and `<field>` so
reviewers can find and verify the change, and keep the rest of the table
unchanged.

In `@integ-test/src/test/java/org/opensearch/sql/ppl/ConvertCommandIT.java`:
- Around line 72-75: The test method verifyQueryThrowsCalciteError uses
assertThrows but the JUnit 5 static import is missing; add a static import for
org.junit.jupiter.api.Assertions.assertThrows at the top of the test file so the
call in verifyQueryThrowsCalciteError(...) resolves correctly when using JUnit 5
`@Test`.

In `@integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java`:
- Around line 229-238: The test testConvertCommand currently only calls
verifyQuery(result) in the ResponseException catch path, so when executeQuery
succeeds (Calcite enabled) nothing is asserted; change the flow to capture the
successful response into the existing JSONObject result inside the try (assign
the executeQuery return into result) and move a single verifyQuery(result) call
outside the try/catch so it runs for both success and failure; keep the catch
building result from TestUtils.getResponseBody(e.getResponse()) as it does now
so verifyQuery sees the error response when an exception occurs.

In
`@integ-test/src/test/java/org/opensearch/sql/security/CrossClusterSearchIT.java`:
- Around line 213-239: Both testCrossClusterConvert and
testCrossClusterConvertWithAlias enable Calcite with enableCalcite() but call
disableCalcite() only on the happy path; wrap the body of each test in a try {
... } finally { disableCalcite(); } so disableCalcite() always runs even if
executeQuery(...) or verifyColumn(...) throws; locate the two test methods
(testCrossClusterConvert and testCrossClusterConvertWithAlias) and move their
enableCalcite() to before the try and ensure the executeQuery/verifyColumn calls
are inside the try block with disableCalcite() in the finally.

In `@ppl/src/main/antlr/OpenSearchPPLLexer.g4`:
- Line 163: The TIMEFORMAT lexer token is unused; either remove the TIMEFORMAT
token definition from OpenSearchPPLLexer.g4 to eliminate dead code, or if it's
intended, add corresponding parser grammar rules and/or Java references that
consume TIMEFORMAT (e.g., incorporate it into relevant parser rules that parse
time formats or make the token part of an existing rule), then regenerate the
lexer/parser and update any tests; search for TIMEFORMAT in the project to
confirm no existing usage before deleting or integrate it where parser rules
expect a time format token.

In `@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLConvertTest.java`:
- Around line 257-271: The test method testConvertAutoWithMemoryField is
misnamed because it uses auto(JOB) aliased as memory_size while JOB is a string
job-title field; rename the test to reflect aliasing behavior (e.g.,
testConvertAutoAliasOnStringField or testConvertAutoWithAlias) or change the
input to use a memory-like field (e.g., replace JOB with a numeric/string
memory_field that contains values like "10k") so the name matches behavior;
update the method name testConvertAutoWithMemoryField and any associated
expected strings (retain auto(JOB) and memory_size if renaming, or replace
auto(JOB)/memory_size occurrences if you change the field) to keep assertions
consistent.
- Around line 1-272: The test class CalcitePPLConvertTest lacks required
NULL-input, boundary-condition, and error-condition tests for the five
conversion functions (auto, num, rmcomma, rmunit, memk); add unit tests that
exercise each function with (1) a nullable field (NULL-input) e.g., use COMM or
another nullable column and assert getRelNode(ppl) + verifyLogical(root, ...)
and verifyPPLToSparkSQL(root, ...), (2) a boundary/empty input case (e.g.,
rmcomma applied to fields that can be empty or empty-string equivalent) and
assert plan/SQL, and (3) an error-condition test that passes an invalid function
or argument and expects a parse/validation exception (use `@Test(expected =
SomeParseException.class)` around getRelNode(ppl)); implement these tests in
CalcitePPLConvertTest using the existing helpers getRelNode, verifyLogical, and
verifyPPLToSparkSQL and name them clearly like testConvertAutoWithNullInput,
testConvertNumBoundaryEmpty, testConvertInvalidFunctionThrows, etc.

---

Nitpick comments:
In `@core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java`:
- Around line 425-427: Add JavaDoc to the public method visitConvert in
AbstractNodeVisitor: document the method purpose and include `@param` for the
Convert node and context (C context), `@return` describing the returned generic T,
and `@throws` for any runtime exceptions that may be propagated; ensure the
JavaDoc style matches other methods in AbstractNodeVisitor and complies with the
core/src/main/java/**/*.java requirement for public methods to have `@param`,
`@return`, and `@throws` tags.

In `@core/src/main/java/org/opensearch/sql/ast/tree/Convert.java`:
- Around line 19-26: Remove the class-level `@Setter` to prevent broad mutation of
the AST node; keep
`@Getter/`@ToString/@EqualsAndHashCode/@RequiredArgsConstructor. Replace the broad
setter by adding a narrowly scoped mutator (e.g., a package-private method like
attach(UnresolvedPlan child) or attachChild(UnresolvedPlan child)) that sets the
private field child so only callers in the same package (or the intended attach
logic) can modify it; leave conversions as final and immutable via the existing
field.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/BaseConversionUDF.java`:
- Around line 25-53: Add JavaDoc for the public API methods: add Javadoc
comments to BaseConversionUDF#getReturnTypeInference() and
BaseConversionUDF#getOperandMetadata() including a brief description and
appropriate `@return` tags; also locate the core template method convertValue(...)
(in this class or ConversionImplementor) and add Javadoc describing its purpose,
parameters (`@param` for each argument), return value (`@return`) and any exceptions
it can throw (`@throws`). Ensure the Javadoc text follows project conventions and
documents nullability/behavior where relevant.

In
`@core/src/test/java/org/opensearch/sql/expression/function/udf/ConversionFunctionsTest.java`:
- Around line 316-327: The test currently asserts null for scientific-notation
strings because RmcommaConvertFunction.convert uses containsLetter(), which
treats 'e'/'E' as alphabetic and thus rejects exponential notation; update the
test method testRmcommaConvertNumericExtremes to include a brief inline comment
(near the assertions referencing RmcommaConvertFunction.convert) stating that
scientific/exponential formats are intentionally rejected due to
containsLetter() treating 'e' as a letter, so these inputs return null, to help
future maintainers understand the behavior.
- Around line 228-236: Extend the special-value test so RmunitConvertFunction,
RmcommaConvertFunction, and MemkConvertFunction are asserted to behave the same
as NumConvertFunction/AutoConvertFunction for the strings "∞", "Infinity", and
"NaN"; update ConversionFunctionsTest.testSpecialValues (or add a sibling test)
to call RmunitConvertFunction.convert(...), RmcommaConvertFunction.convert(...),
and MemkConvertFunction.convert(...) for each special string and assertNull on
each result to cover all new functions.
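
The containsLetter() behavior behind the first finding can be illustrated with a small stand-in (this mirrors only the documented check, not the real conversion code):

```java
class ContainsLetterSketch {
  /** Mirrors the documented check: any alphabetic character rejects the input. */
  static boolean containsLetter(String s) {
    return s.chars().anyMatch(Character::isLetter);
  }

  public static void main(String[] args) {
    // "1,234.5" has no letters, so rmcomma() can proceed to parse it.
    System.out.println(containsLetter("1,234.5"));
    // "1.5e3" contains 'e', which Character.isLetter treats as alphabetic,
    // so the conversion is rejected and convert() returns null.
    System.out.println(containsLetter("1.5e3"));
  }
}
```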

In
`@integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java`:
- Around line 450-459: Add a concrete row-value assertion to
testCrossClusterConvertWithAlias: after the existing verifyColumn(result,
columnName("balance_num")) and verifySchema(result, schema("balance_num",
"double")), call verifyDataRows(result, rows("balance_num",
"<expected_double>")) to assert at least one converted numeric value is correct
(replace <expected_double> with the known balance value from
TEST_INDEX_BANK_REMOTE as a double); reference symbols:
testCrossClusterConvertWithAlias, executeQuery, verifyColumn, verifySchema,
verifyDataRows.

In
`@integ-test/src/test/java/org/opensearch/sql/security/CrossClusterSearchIT.java`:
- Around line 217-222: Current assertions in CrossClusterSearchIT call
verifyColumn(result, columnName("balance")) after executeQuery(...) which only
checks column presence; update the tests to assert actual converted numeric
values by invoking verifyDataRows(result, expectedRows...) using the same 7
expected balance values used in testCrossClusterSortWithTypeCasting, or at
minimum add assertions that the result is non-empty and that each value in the
converted "balance" column is numeric; locate the call sites around
executeQuery/String.format(...) with TEST_INDEX_BANK_REMOTE and replace or
extend the verifyColumn(...) checks with verifyDataRows(...) (or additional
numeric-type/non-empty assertions) to make the test meaningful.

In `@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLConvertTest.java`:
- Around line 1-272: The test suite in CalcitePPLConvertTest only covers
positive convert cases; add negative grammar tests to validate error handling
(at minimum a test that asserts an invalid conversion function is rejected). In
CalcitePPLConvertTest add a new `@Test` method (e.g., testConvertInvalidFunction)
that calls getRelNode("source=EMP | convert unknown_fn(SAL)") and asserts an
exception is thrown (use assertThrows or the existing test harness), and
similarly add tests for missing argument ("convert auto()"), non-existent field
("convert auto(NONEXISTENT_COL)"), and alias shadowing with none() (e.g.,
"convert none(EMPNO) AS EMPNO") to verify the parser/planner raises/propagates
errors instead of succeeding; use getRelNode and the existing verification
helpers (verifyLogical/verifyPPLToSparkSQL) pattern for positive vs negative
assertions.

In
`@ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java`:
- Around line 1046-1062: The testConvertCommand in PPLQueryDataAnonymizerTest is
missing coverage for the memk() conversion; add an assertion that calls
anonymize with an input using memk (e.g., "source=t | convert memk(field)") and
expects the anonymized output to map the field to the generic identifier form
consistent with other cases (e.g., "source=table | convert memk(identifier)" or
with AS when appropriate). Update the testConvertCommand method to include this
new assertEquals using the existing anonymize helper so memk conversion is
validated alongside auto, num, rmcomma, rmunit, and none.

Comment on lines +11 to +17
public AutoConvertFunction() {
super(AutoConvertFunction.class);
}

public static Object convert(Object value) {
return new AutoConvertFunction().convertValue(value);
}

⚠️ Potential issue | 🟡 Minor

Add JavaDoc for the public constructor and convert entrypoint.

These public methods currently lack JavaDoc with @param/@return/@throws. Please add the required JavaDoc for the constructor and static convert method.

As per coding guidelines: "core/src/main/java/**/*.java: Public methods MUST have JavaDoc with @param, @return, and @throws".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/AutoConvertFunction.java`
around lines 11 - 17, Add JavaDoc comments for the public AutoConvertFunction()
constructor and the public static convert(Object value) entrypoint: for
AutoConvertFunction(), add a short description and an `@throws` tag if the
constructor can propagate any runtime exceptions (or explicitly state none are
thrown); for convert(Object) add a description, an `@param` value explaining
accepted input, an `@return` describing the conversion result, and an `@throws`
documenting the runtime exception(s) that may be raised (e.g., if
convertValue(value) fails). Reference the constructor AutoConvertFunction(), the
static method convert(Object), and the instance helper convertValue(Object) in
the comments so maintainers can trace behavior.

Comment on lines +11 to +17
public MemkConvertFunction() {
super(MemkConvertFunction.class);
}

public static Object convert(Object value) {
return new MemkConvertFunction().convertValue(value);
}

⚠️ Potential issue | 🟡 Minor

Add JavaDoc for the public constructor and convert entrypoint.

These public methods currently lack JavaDoc with @param/@return/@throws. Please add the required JavaDoc for the constructor and static convert method.

As per coding guidelines: "core/src/main/java/**/*.java: Public methods MUST have JavaDoc with @param, @return, and @throws".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/MemkConvertFunction.java`
around lines 11 - 17, Add JavaDoc comments for the public MemkConvertFunction()
constructor and the public static convert(Object value) entrypoint: for
MemkConvertFunction() document its purpose and any side effects and include
`@throws` if it can throw (or explicitly state none), and for convert(Object
value) document the parameter (`@param` value the input to convert), the return
value (`@return` the converted object or null) and any exceptions it can propagate
(`@throws` ... or state none); reference the convertValue(Object) helper in the
convert method description to clarify behavior.

Comment on lines +11 to +17
public NumConvertFunction() {
super(NumConvertFunction.class);
}

public static Object convert(Object value) {
return new NumConvertFunction().convertValue(value);
}

⚠️ Potential issue | 🟡 Minor

Document the public convert API.

The public convert method (and constructor if kept public) should have JavaDoc with @param/@return.

✍️ Example JavaDoc
+  /**
+   * Convert a value using num() semantics.
+   *
+   * @param value input value
+   * @return converted double, or null if not convertible
+   */
   public static Object convert(Object value) {
     return new NumConvertFunction().convertValue(value);
   }
As per coding guidelines: `core/src/main/java/**/*.java`: Public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/NumConvertFunction.java`
around lines 11 - 17, Add JavaDoc for the public API: document the
NumConvertFunction() constructor and the static method convert(Object value)
with `@param` describing the value argument, `@return` describing the returned
converted numeric/object result, and `@throws` for any runtime exceptions that
convertValue may propagate; reference the NumConvertFunction.convert(Object)
method and the NumConvertFunction() constructor and ensure the JavaDoc text
matches coding guidelines for public methods.

Comment on lines +11 to +17
public RmcommaConvertFunction() {
super(RmcommaConvertFunction.class);
}

public static Object convert(Object value) {
return new RmcommaConvertFunction().convertValue(value);
}

⚠️ Potential issue | 🟡 Minor

Add JavaDoc for the public constructor and convert entrypoint.

These public methods currently lack JavaDoc with @param/@return/@throws. Please add the required JavaDoc for the constructor and static convert method.

As per coding guidelines: "core/src/main/java/**/*.java: Public methods MUST have JavaDoc with @param, @return, and @throws".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/RmcommaConvertFunction.java`
around lines 11 - 17, Add JavaDoc to the public no-arg constructor
RmcommaConvertFunction() and the static entrypoint convert(Object value): for
the constructor, add a short description of the class initialization and include
any `@throws` if construction can fail; for the static convert method, add a
description, an `@param` value describing accepted input types, an `@return`
describing the converted value (or null) and an `@throws` documenting the
exceptions that may be raised during conversion (e.g., IllegalArgumentException
for invalid input or RuntimeException for conversion errors). Reference the
convertValue(Object) call inside convert(Object) so the JavaDoc reflects that
the static method delegates to convertValue for the actual conversion.

Comment on lines +213 to +239
@Test
public void testCrossClusterConvert() throws IOException {
enableCalcite();

JSONObject result =
executeQuery(
String.format(
"search source=%s | convert auto(balance) | fields balance",
TEST_INDEX_BANK_REMOTE));
verifyColumn(result, columnName("balance"));

disableCalcite();
}

@Test
public void testCrossClusterConvertWithAlias() throws IOException {
enableCalcite();

JSONObject result =
executeQuery(
String.format(
"search source=%s | convert auto(balance) AS balance_num | fields balance_num",
TEST_INDEX_BANK_REMOTE));
verifyColumn(result, columnName("balance_num"));

disableCalcite();
}

⚠️ Potential issue | 🟠 Major

disableCalcite() must be called in a finally block to prevent Calcite state leaking into subsequent tests.

Both testCrossClusterConvert and testCrossClusterConvertWithAlias enable Calcite at the top but only disable it on the happy path. Any exception thrown from executeQuery or verifyColumn (e.g., an assertion failure, an IOException, or a network error) will skip disableCalcite(), leaving Calcite enabled for every test that runs afterwards. This violates test independence and can cause cascading, order-dependent failures in the rest of the suite.

🛡️ Proposed fix: wrap in try-finally
   @Test
   public void testCrossClusterConvert() throws IOException {
     enableCalcite();
-
-    JSONObject result =
-        executeQuery(
-            String.format(
-                "search source=%s | convert auto(balance) | fields balance",
-                TEST_INDEX_BANK_REMOTE));
-    verifyColumn(result, columnName("balance"));
-
-    disableCalcite();
+    try {
+      JSONObject result =
+          executeQuery(
+              String.format(
+                  "search source=%s | convert auto(balance) | fields balance",
+                  TEST_INDEX_BANK_REMOTE));
+      verifyColumn(result, columnName("balance"));
+    } finally {
+      disableCalcite();
+    }
   }

   @Test
   public void testCrossClusterConvertWithAlias() throws IOException {
     enableCalcite();
-
-    JSONObject result =
-        executeQuery(
-            String.format(
-                "search source=%s | convert auto(balance) AS balance_num | fields balance_num",
-                TEST_INDEX_BANK_REMOTE));
-    verifyColumn(result, columnName("balance_num"));
-
-    disableCalcite();
+    try {
+      JSONObject result =
+          executeQuery(
+              String.format(
+                  "search source=%s | convert auto(balance) AS balance_num | fields balance_num",
+                  TEST_INDEX_BANK_REMOTE));
+      verifyColumn(result, columnName("balance_num"));
+    } finally {
+      disableCalcite();
+    }
   }

As per coding guidelines, "Tests must not rely on execution order; ensure test independence" and "Ensure tests are independent and can run in any order."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@integ-test/src/test/java/org/opensearch/sql/security/CrossClusterSearchIT.java`
around lines 213 - 239, Both testCrossClusterConvert and
testCrossClusterConvertWithAlias enable Calcite with enableCalcite() but call
disableCalcite() only on the happy path; wrap the body of each test in a try {
... } finally { disableCalcite(); } so disableCalcite() always runs even if
executeQuery(...) or verifyColumn(...) throws; locate the two test methods
(testCrossClusterConvert and testCrossClusterConvertWithAlias) and move their
enableCalcite() to before the try and ensure the executeQuery/verifyColumn calls
are inside the try block with disableCalcite() in the finally.

OTHERSTR: 'OTHERSTR';
NULLSTR: 'NULLSTR';
TIMEFIELD: 'TIMEFIELD';
TIMEFORMAT: 'TIMEFORMAT';

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if TIMEFORMAT is referenced in any parser grammar or Java source files
echo "=== References in .g4 parser files ==="
rg -n 'TIMEFORMAT' -t g4 2>/dev/null || echo "No g4 type found, trying without type filter..."
rg -n 'TIMEFORMAT' --type-add 'g4:*.g4' -t g4 || rg -n 'TIMEFORMAT' -g '*.g4'

echo ""
echo "=== References in Java files ==="
rg -n 'TIMEFORMAT' -t java 2>/dev/null | head -20

echo ""
echo "=== All references in repo ==="
rg -n 'TIMEFORMAT' | head -30

Repository: opensearch-project/sql

Length of output: 766


Remove the unused TIMEFORMAT token or verify it's needed elsewhere.

The TIMEFORMAT token at line 163 is defined in the lexer but has no references anywhere in the parser grammar or Java codebase. If it is intended for future use, integrate it into the parser rules; otherwise remove it to avoid dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/main/antlr/OpenSearchPPLLexer.g4` at line 163, The TIMEFORMAT lexer
token is unused; either remove the TIMEFORMAT token definition from
OpenSearchPPLLexer.g4 to eliminate dead code, or if it's intended, add
corresponding parser grammar rules and/or Java references that consume
TIMEFORMAT (e.g., incorporate it into relevant parser rules that parse time
formats or make the token part of an existing rule), then regenerate the
lexer/parser and update any tests; search for TIMEFORMAT in the project to
confirm no existing usage before deleting or integrate it where parser rules
expect a time format token.

Comment on lines +1 to +272
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.sql.ppl.calcite;

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.test.CalciteAssert;
import org.junit.Test;

/** Unit tests for PPL convert command. */
public class CalcitePPLConvertTest extends CalcitePPLAbstractTest {

public CalcitePPLConvertTest() {
super(CalciteAssert.SchemaSpec.SCOTT_WITH_TEMPORAL);
}

@Test
public void testConvertBasic() {
String ppl = "source=EMP | convert auto(SAL)";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[AUTO($5)],"
+ " COMM=[$6], DEPTNO=[$7])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, AUTO(`SAL`) `SAL`, `COMM`, `DEPTNO`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertWithAlias() {
String ppl = "source=EMP | convert auto(SAL) AS salary_num";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5],"
+ " COMM=[$6], DEPTNO=[$7], salary_num=[AUTO($5)])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, `SAL`, `COMM`, `DEPTNO`, AUTO(`SAL`)"
+ " `salary_num`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertMultipleFunctions() {
String ppl = "source=EMP | convert auto(SAL), num(COMM)";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[AUTO($5)],"
+ " COMM=[NUM($6)], DEPTNO=[$7])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, AUTO(`SAL`) `SAL`, NUM(`COMM`) `COMM`,"
+ " `DEPTNO`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertMultipleWithAliases() {
String ppl = "source=EMP | convert auto(SAL) AS salary, num(COMM) AS commission";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5],"
+ " COMM=[$6], DEPTNO=[$7], salary=[AUTO($5)], commission=[NUM($6)])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, `SAL`, `COMM`, `DEPTNO`, AUTO(`SAL`)"
+ " `salary`, NUM(`COMM`) `commission`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertWithFields() {
String ppl = "source=EMP | convert auto(SAL) AS salary_num | fields EMPNO, ENAME, salary_num";
RelNode root = getRelNode(ppl);
// Calcite optimizes the two projections into one - this is more efficient
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], salary_num=[AUTO($5)])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, AUTO(`SAL`) `salary_num`\n" + "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertNumFunction() {
String ppl = "source=EMP | convert num(SAL)";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[NUM($5)],"
+ " COMM=[$6], DEPTNO=[$7])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, NUM(`SAL`) `SAL`, `COMM`, `DEPTNO`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertRmcommaFunction() {
String ppl = "source=EMP | convert rmcomma(ENAME)";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[RMCOMMA($1)], JOB=[$2], MGR=[$3], HIREDATE=[$4],"
+ " SAL=[$5], COMM=[$6], DEPTNO=[$7])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, RMCOMMA(`ENAME`) `ENAME`, `JOB`, `MGR`, `HIREDATE`, `SAL`, `COMM`,"
+ " `DEPTNO`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertRmunitFunction() {
String ppl = "source=EMP | convert rmunit(ENAME)";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[RMUNIT($1)], JOB=[$2], MGR=[$3], HIREDATE=[$4],"
+ " SAL=[$5], COMM=[$6], DEPTNO=[$7])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, RMUNIT(`ENAME`) `ENAME`, `JOB`, `MGR`, `HIREDATE`, `SAL`, `COMM`,"
+ " `DEPTNO`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertMemkFunction() {
String ppl = "source=EMP | convert memk(ENAME)";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[MEMK($1)], JOB=[$2], MGR=[$3], HIREDATE=[$4],"
+ " SAL=[$5], COMM=[$6], DEPTNO=[$7])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, MEMK(`ENAME`) `ENAME`, `JOB`, `MGR`, `HIREDATE`, `SAL`, `COMM`,"
+ " `DEPTNO`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertNoneFunction() {
String ppl = "source=EMP | convert none(ENAME)";
RelNode root = getRelNode(ppl);
String expectedLogical = "LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql = "SELECT *\n" + "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertWithWhere() {
String ppl = "source=EMP | where DEPTNO = 10 | convert auto(SAL)";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[AUTO($5)],"
+ " COMM=[$6], DEPTNO=[$7])\n"
+ " LogicalFilter(condition=[=($7, 10)])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, AUTO(`SAL`) `SAL`, `COMM`, `DEPTNO`\n"
+ "FROM `scott`.`EMP`\n"
+ "WHERE `DEPTNO` = 10";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertWithSort() {
String ppl = "source=EMP | convert auto(SAL) AS salary_num | sort - salary_num | head 3";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalSort(sort0=[$8], dir0=[DESC-nulls-last], fetch=[3])\n"
+ " LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],"
+ " SAL=[$5], COMM=[$6], DEPTNO=[$7], salary_num=[AUTO($5)])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, `SAL`, `COMM`, `DEPTNO`, AUTO(`SAL`)"
+ " `salary_num`\n"
+ "FROM `scott`.`EMP`\n"
+ "ORDER BY 9 DESC\n"
+ "LIMIT 3";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertWithStats() {
String ppl = "source=EMP | convert auto(SAL) AS salary_num | stats avg(salary_num) by DEPTNO";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(avg(salary_num)=[$1], DEPTNO=[$0])\n"
+ " LogicalAggregate(group=[{0}], avg(salary_num)=[AVG($1)])\n"
+ " LogicalProject(DEPTNO=[$7], salary_num=[AUTO($5)])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT AVG(AUTO(`SAL`)) `avg(salary_num)`, `DEPTNO`\n"
+ "FROM `scott`.`EMP`\n"
+ "GROUP BY `DEPTNO`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertAllFunctions() {
String ppl =
"source=EMP | convert auto(SAL) AS sal_auto, num(COMM) AS comm_num, rmcomma(ENAME) AS"
+ " name_clean, rmunit(JOB) AS job_clean, none(EMPNO) AS empno_same";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5],"
+ " COMM=[$6], DEPTNO=[$7], sal_auto=[AUTO($5)], comm_num=[NUM($6)],"
+ " name_clean=[RMCOMMA($1)], job_clean=[RMUNIT($2)], empno_same=[$0])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, `SAL`, `COMM`, `DEPTNO`, AUTO(`SAL`)"
+ " `sal_auto`, NUM(`COMM`) `comm_num`, RMCOMMA(`ENAME`) `name_clean`, RMUNIT(`JOB`)"
+ " `job_clean`, `EMPNO` `empno_same`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

@Test
public void testConvertAutoWithMemoryField() {
String ppl = "source=EMP | convert auto(JOB) AS memory_size";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5],"
+ " COMM=[$6], DEPTNO=[$7], memory_size=[AUTO($2)])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, `SAL`, `COMM`, `DEPTNO`, AUTO(`JOB`)"
+ " `memory_size`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}
}

⚠️ Potential issue | 🟠 Major

Missing required NULL-input, boundary-condition, and error-condition tests for all five conversion functions

The coding guidelines for **/+(*Test|*IT).java files explicitly mandate:

  • NULL input tests for all new functions
  • Boundary condition tests (min/max values, empty inputs)
  • Error condition tests (invalid inputs, exceptions)

None of these are present in this class. The five conversion functions (auto, num, rmcomma, rmunit, memk) each need at least one of each category. Examples of what should be added:

// NULL input — verifying SQL generation/plan doesn't blow up on nullable field
@Test
public void testConvertNumWithNullInput() {
  // COMM is a nullable field in the SCOTT EMP schema
  String ppl = "source=EMP | convert num(COMM)";
  RelNode root = getRelNode(ppl);
  // assert plan handles null-bearing field correctly
  verifyLogical(root, ...);
  verifyPPLToSparkSQL(root, ...);
}

// Boundary condition — empty-string equivalent: applying rmcomma to a field
// with potentially empty-string values
@Test
public void testConvertRmcommaWithEmptyString() { ... }

// Error condition — converting a field with a nonsensical function argument
// or verifying an exception is thrown on illegal input
@Test(expected = SomeParseException.class)
public void testConvertInvalidFunctionThrows() {
  String ppl = "source=EMP | convert unknown_fn(SAL)";
  getRelNode(ppl);
}

As per coding guidelines, NULL input tests must be included for all new functions and boundary condition tests (min/max values, empty inputs) for all new functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLConvertTest.java`
around lines 1 - 272, The test class CalcitePPLConvertTest lacks required
NULL-input, boundary-condition, and error-condition tests for the five
conversion functions (auto, num, rmcomma, rmunit, memk); add unit tests that
exercise each function with (1) a nullable field (NULL-input) e.g., use COMM or
another nullable column and assert getRelNode(ppl) + verifyLogical(root, ...)
and verifyPPLToSparkSQL(root, ...), (2) a boundary/empty input case (e.g.,
rmcomma applied to fields that can be empty or empty-string equivalent) and
assert plan/SQL, and (3) an error-condition test that passes an invalid function
or argument and expects a parse/validation exception (use `@Test(expected =
SomeParseException.class)` around getRelNode(ppl)); implement these tests in
CalcitePPLConvertTest using the existing helpers getRelNode, verifyLogical, and
verifyPPLToSparkSQL and name them clearly like testConvertAutoWithNullInput,
testConvertNumBoundaryEmpty, testConvertInvalidFunctionThrows, etc.

Comment on lines +257 to +271
public void testConvertAutoWithMemoryField() {
String ppl = "source=EMP | convert auto(JOB) AS memory_size";
RelNode root = getRelNode(ppl);
String expectedLogical =
"LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5],"
+ " COMM=[$6], DEPTNO=[$7], memory_size=[AUTO($2)])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
verifyLogical(root, expectedLogical);

String expectedSparkSql =
"SELECT `EMPNO`, `ENAME`, `JOB`, `MGR`, `HIREDATE`, `SAL`, `COMM`, `DEPTNO`, AUTO(`JOB`)"
+ " `memory_size`\n"
+ "FROM `scott`.`EMP`";
verifyPPLToSparkSQL(root, expectedSparkSql);
}

⚠️ Potential issue | 🟡 Minor

Misleading test name: testConvertAutoWithMemoryField

The test applies auto(JOB) with alias memory_size, but JOB is a text job-title field in the SCOTT schema — not a memory-related field. The name implies memory-specific behavior (e.g., "10k", "1.5M" strings) is being exercised, which it isn't. This is misleading for anyone trying to understand the test intent.

Consider renaming to testConvertAutoWithAlias (since testConvertWithAlias already covers SAL with auto, this could be testConvertAutoAliasOnStringField) or updating the test to actually use a field resembling a memory value.

✏️ Suggested rename
-  public void testConvertAutoWithMemoryField() {
-    String ppl = "source=EMP | convert auto(JOB) AS memory_size";
+  public void testConvertAutoAliasOnStringField() {
+    String ppl = "source=EMP | convert auto(JOB) AS job_converted";

As per coding guidelines, Test naming should follow pattern test<Functionality><Condition>.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLConvertTest.java`
around lines 257 - 271, The test method testConvertAutoWithMemoryField is
misnamed because it uses auto(JOB) aliased as memory_size while JOB is a string
job-title field; rename the test to reflect aliasing behavior (e.g.,
testConvertAutoAliasOnStringField or testConvertAutoWithAlias) or change the
input to use a memory-like field (e.g., replace JOB with a numeric/string
memory_field that contains values like "10k") so the name matches behavior;
update the method name testConvertAutoWithMemoryField and any associated
expected strings (retain auto(JOB) and memory_size if renaming, or replace
auto(JOB)/memory_size occurrences if you change the field) to keep assertions
consistent.

Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

984-1093: Consider validating allowed convert functions.

If convert is meant to accept only the conversion UDFs (auto/num/rmcomma/rmunit/none/memk), this currently permits any single-argument function name. Consider an allowlist check (or explicitly document that arbitrary UDFs are accepted).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 984 - 1093, The convert implementation accepts any single-argument
function name; restrict it to the allowed conversion UDFs by adding an allowlist
check in processFunctionConversion (before calling
PPLFuncImpTable.INSTANCE.resolve): define the set of permitted names (e.g.,
"auto","num","rmcomma","rmunit","none","memk") and if functionName is not in
that set throw a SemanticCheckException with a clear message; keep the rest of
processFunctionConversion logic (replacements/additions) intact and update any
related tests or docs if you intend to allow additional functions.
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (1)

1206-1214: Prefer Objects::nonNull over inline null check; confirm empty-list behavior

Objects is already imported. The inline lambda .filter(conversion -> conversion != null) should be .filter(Objects::nonNull). Additionally, if every conversion is a none(field) with the same source and target, conversions will be empty. Verify that a Convert node with an empty conversion list is handled gracefully downstream (analyzer, physical operator, Calcite planner).

♻️ Proposed refactor
-    List<Let> conversions =
-        ctx.convertFunction().stream()
-            .map(this::buildConversion)
-            .filter(conversion -> conversion != null)
-            .collect(Collectors.toList());
+    List<Let> conversions =
+        ctx.convertFunction().stream()
+            .map(this::buildConversion)
+            .filter(Objects::nonNull)
+            .collect(Collectors.toList());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java` around lines
1206 - 1214, Replace the inline null-check filter in visitConvertCommand by
using Objects::nonNull (i.e., change .filter(conversion -> conversion != null)
to .filter(Objects::nonNull)) so the stream is cleaner; while here, confirm
behavior when buildConversion yields no conversions (e.g., all none(field)) by
verifying downstream handling of a Convert constructed with an empty conversions
list—trace usages of Convert and the conversions list through the analyzer,
physical operator builders, and Calcite planner and either ensure they accept an
empty list or add a guard in visitConvertCommand to produce an appropriate
no-op/validation/error if empty conversions are invalid.
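A minimal, self-contained illustration of the method-reference form and the empty-list edge case flagged above (plain strings stand in for Let nodes; purely illustrative):

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

public class NonNullFilterDemo {
    public static void main(String[] args) {
        // buildConversion returns null for a no-op none(field); the filter drops it.
        List<String> conversions =
            Stream.of("num(SAL)", null, "rmcomma(COMM)")
                .filter(Objects::nonNull)
                .toList();
        System.out.println(conversions); // [num(SAL), rmcomma(COMM)]

        // If every conversion is a no-op, the list comes out empty -- the case
        // whose downstream handling the comment asks to verify.
        List<String> empty = Stream.of((String) null).filter(Objects::nonNull).toList();
        System.out.println(empty.isEmpty()); // true
    }
}
```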
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java`:
- Around line 460-461: Add a missing closing brace to end the method
testCrossClusterConvertWithAlias and add the JUnit `@Test` annotation immediately
before the method declaration of testCrossClusterMvExpandBasic so the first
method is properly closed and the second is recognized as a test; locate the
methods testCrossClusterConvertWithAlias and testCrossClusterMvExpandBasic in
CalciteCrossClusterSearchIT and insert a single closing '}' after the
verifySchema(...) call and add '@Test' on the line directly above the public
void testCrossClusterMvExpandBasic() declaration.

In `@ppl/src/main/antlr/OpenSearchPPLParser.g4`:
- Around line 543-549: Add negative/grammar-level tests for
convertCommand/convertFunction in CalcitePPLConvertTest: create test cases that
feed malformed convert syntaxes (e.g., missing field argument inside
LT_PRTHS...RT_PRTHS, malformed function names, missing LT_PRTHS/RT_PRTHS,
stray/extra COMMA, or misplaced AS alias) and assert the parser fails (throws a
RecognitionException or records parse errors) rather than building an AST;
reference convertCommand, convertFunction, visitConvertCommand, and
buildConversion to ensure the tests target parsing failures (not just
semantic/AST logic) and cover edge cases alongside the existing 14 positive
tests.

In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`:
- Around line 1225-1229: The current check for the "none" no-op uses toString()
comparison between fieldArg and targetField.getField(), which is fragile;
instead detect the no-op by checking the function context alias (funcCtx.alias
== null) because determineTargetField returns the same field when no alias is
provided. Replace the ternary that returns null or new Let(targetField,
fieldArg) with a condition that if "none".equalsIgnoreCase(functionName) and
funcCtx.alias == null then return null, otherwise return new Let(targetField,
fieldArg); update the block referencing functionName, fieldArg, targetField (and
Let) accordingly so the redundant new Let(field, field) is not created when
alias is absent.
- Around line 1221-1231: The parser extracts functionName via
funcCtx.functionName.getText() but passes the raw string into AstDSL.function;
keep the equalsIgnoreCase("none") check as-is but normalize the name before
creating the function node: compute a normalizedName (e.g.,
functionName.toLowerCase()) and use that when calling
AstDSL.function(normalizedName, fieldArg) so the Let creation and
AstDSL.function use a consistent normalized function name (references:
functionName, funcCtx.functionName.getText(), AstDSL.function,
determineTargetField, Let).

In `@ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java`:
- Around line 463-495: In visitConvert, handle the case where
conversion.getExpression() is a Field (the "none(field) AS alias" case) so
functionName isn’t left empty: if conversion.getExpression() is a Field, set
functionName = "none", set actualSourceField to that Field's string, and set
fields to MASK_COLUMN (so the resulting format becomes "none(***) AS ***"); keep
the existing logic for Function nodes unchanged and preserve the existing
asClause/targetField logic that compares targetField to actualSourceField.

---

Duplicate comments:
In `@integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java`:
- Around line 232-241: The test currently only verifies the result inside the
catch block, so when executeQuery succeeds the test does not assert anything;
modify testConvertCommand to capture the successful response into the existing
JSONObject result and call verifyQuery(result) on the success path as well:
inside the try block assign the response from executeQuery to result (or parse
the returned response body into result using TestUtils.getResponseBody if
executeQuery returns a response object) and then call verifyQuery(result); keep
the existing catch block for ResponseException to preserve the error-path
behavior.

In `@ppl/src/main/antlr/OpenSearchPPLLexer.g4`:
- Line 164: The TIMEFORMAT lexer token is currently unused in the grammar;
either remove its definition from OpenSearchPPLLexer.g4 or wire it into parser
rules that consume it (e.g., add a parser rule like timeFormatSpec : TIMEFORMAT
... or use TIMEFORMAT in existing rules such as functionCall, literal, or format
clauses). Locate the TIMEFORMAT token in OpenSearchPPLLexer.g4 and decide:
delete the 'TIMEFORMAT' token line and run a project-wide search for TIMEFORMAT
to ensure no references remain, or add the appropriate parser rule(s) and update
any callers (e.g., format-related rules) so the token is actually consumed by
the parser.


Comment on lines +460 to 461
verifySchema(result, schema("balance_num", "double"));
public void testCrossClusterMvExpandBasic() throws IOException {

⚠️ Potential issue | 🔴 Critical


Compile error: missing closing } for testCrossClusterConvertWithAlias; @Test annotation also missing from testCrossClusterMvExpandBasic.

The file is missing two required lines between line 460 and line 461:

  1. A closing brace } to end testCrossClusterConvertWithAlias — without it the method declaration testCrossClusterMvExpandBasic is parsed as a statement inside testCrossClusterConvertWithAlias's body, causing a Java compilation error.
  2. The @Test annotation before testCrossClusterMvExpandBasic — without it the method will not be recognized as a test and will not run.
Proposed fix
     verifySchema(result, schema("balance_num", "double"));
+  }
+
+  @Test
   public void testCrossClusterMvExpandBasic() throws IOException {

Comment on lines +543 to +549
convertCommand
: CONVERT convertFunction (COMMA? convertFunction)*
;

convertFunction
: functionName = ident LT_PRTHS fieldExpression RT_PRTHS (AS alias = fieldExpression)?
;

⚠️ Potential issue | 🟡 Minor

Add negative/grammar-level tests for convertCommand syntax validation.

AST builder support is confirmed—visitConvertCommand properly handles convertFunction contexts and aliasing via buildConversion. However, CalcitePPLConvertTest covers only positive cases (14 test methods). Add tests for invalid syntax (e.g., missing field argument, malformed function names, edge cases) at the parser/grammar level to validate error handling during parsing.

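A parameterized negative test along those lines might look like the following; the regex is a deliberately rough stand-in for the real convertFunction ANTLR rule, used only so the sketch is self-contained:

```java
import java.util.List;
import java.util.regex.Pattern;

public class ConvertSyntaxDemo {
    // Rough stand-in for:
    //   convertFunction : ident LT_PRTHS fieldExpression RT_PRTHS (AS alias)?
    static final Pattern CONVERT_FN =
        Pattern.compile("\\w+\\(\\w+\\)(\\s+AS\\s+\\w+)?");

    static boolean parses(String s) {
        return CONVERT_FN.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Positive cases mirroring the existing tests.
        System.out.println(parses("num(SAL)"));
        System.out.println(parses("auto(JOB) AS job_converted"));

        // Malformed inputs the review asks to cover as parser failures:
        // missing field argument, missing parenthesis, dangling AS.
        for (String bad : List.of("num()", "num SAL", "num(SAL", "num(SAL) AS")) {
            System.out.println(bad + " -> " + parses(bad)); // all false
        }
    }
}
```

In the real suite these malformed strings would be fed to the PPL parser and asserted to raise a syntax error, rather than matched against a regex.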

Comment on lines +1221 to +1231
String functionName = funcCtx.functionName.getText();
UnresolvedExpression fieldArg = internalVisitExpression(funcCtx.fieldExpression(0));
Field targetField = determineTargetField(funcCtx, fieldArg);

if ("none".equalsIgnoreCase(functionName)) {
return fieldArg.toString().equals(targetField.getField().toString())
? null
: new Let(targetField, fieldArg);
}

return new Let(targetField, AstDSL.function(functionName, fieldArg));

⚠️ Potential issue | 🟡 Minor

Normalize function name to lowercase before passing to AstDSL.function

The function name from funcCtx.functionName.getText() is passed to AstDSL.function() without normalization. While the "none" check correctly uses equalsIgnoreCase(), the function node itself uses the raw, un-normalized string. PPL's convert functions are registered in uppercase (AUTO, NUM, RMCOMMA, RMUNIT, MEMK) in PPLBuiltinOperators. Although downstream case-insensitive resolution may mask this issue, normalizing at parse time ensures consistency and prevents potential failures if the resolver behavior changes.

🛠️ Suggested fix
-    String functionName = funcCtx.functionName.getText();
+    String functionName = funcCtx.functionName.getText().toLowerCase(Locale.ROOT);
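One reason to prefer an explicit locale when normalizing: `String.toLowerCase()` with no argument is locale-sensitive. A small standalone demonstration, unrelated to the actual parser code:

```java
import java.util.Locale;

public class NormalizeDemo {
    public static void main(String[] args) {
        String functionName = "RMUNIT";

        // Locale.ROOT gives stable, locale-independent results.
        System.out.println(functionName.toLowerCase(Locale.ROOT)); // rmunit

        // Under tr-TR the uppercase 'I' lowercases to dotless U+0131,
        // so a lookup keyed on "rmunit" would silently miss.
        String turkish = functionName.toLowerCase(Locale.forLanguageTag("tr-TR"));
        System.out.println(turkish.equals("rmunit")); // false
    }
}
```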

Comment on lines +1225 to +1229
if ("none".equalsIgnoreCase(functionName)) {
return fieldArg.toString().equals(targetField.getField().toString())
? null
: new Let(targetField, fieldArg);
}

⚠️ Potential issue | 🟡 Minor

Use funcCtx.alias == null to detect none no-op instead of fragile toString() comparison

The Field class uses Lombok's @ToString annotation, which generates a representation like Field(field=..., fieldArgs=[...]). When fieldArg is a Field and targetField.getField() is a QualifiedName (which has a custom toString() returning the plain field name), these string representations are fundamentally different and will never be equal. This means the null branch is never taken, and none(field) without an alias always produces a redundant new Let(field, field) instead of being filtered out.

Since determineTargetField returns the same field object when no alias is provided, the correct no-op condition is simply:

🛠️ Proposed fix
     if ("none".equalsIgnoreCase(functionName)) {
-      return fieldArg.toString().equals(targetField.getField().toString())
-          ? null
-          : new Let(targetField, fieldArg);
+      // When no alias, none(field) is a no-op (source == target); drop it.
+      return funcCtx.alias == null ? null : new Let(targetField, fieldArg);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java` around lines
1225 - 1229, The current check for the "none" no-op uses toString() comparison
between fieldArg and targetField.getField(), which is fragile; instead detect
the no-op by checking the function context alias (funcCtx.alias == null) because
determineTargetField returns the same field when no alias is provided. Replace
the ternary that returns null or new Let(targetField, fieldArg) with a condition
that if "none".equalsIgnoreCase(functionName) and funcCtx.alias == null then
return null, otherwise return new Let(targetField, fieldArg); update the block
referencing functionName, fieldArg, targetField (and Let) accordingly so the
redundant new Let(field, field) is not created when alias is absent.

Comment on lines +463 to +495
@Override
public String visitConvert(Convert node, String context) {
String child = node.getChild().get(0).accept(this, context);
String conversions =
node.getConversions().stream()
.map(
conversion -> {
String functionName = "";
String fields = MASK_COLUMN;
String actualSourceField = "";

if (conversion.getExpression() instanceof Function) {
Function func = (Function) conversion.getExpression();
functionName = func.getFuncName().toLowerCase(Locale.ROOT);
if (!func.getFuncArgs().isEmpty()
&& func.getFuncArgs().get(0) instanceof Field) {
actualSourceField = ((Field) func.getFuncArgs().get(0)).getField().toString();
}
fields =
func.getFuncArgs().stream()
.map(arg -> MASK_COLUMN)
.collect(Collectors.joining(","));
}

String targetField = conversion.getVar().getField().toString();

String asClause =
!targetField.equals(actualSourceField) ? " AS " + MASK_COLUMN : "";
return StringUtils.format("%s(%s)%s", functionName, fields, asClause);
})
.collect(Collectors.joining(","));
return StringUtils.format("%s | convert %s", child, conversions);
}
Contributor

⚠️ Potential issue | 🟡 Minor

none(field) AS alias conversions produce an empty function name in the anonymized output

When buildConversion in AstBuilder handles a none function with an alias, it stores the raw fieldArg (Field) — not a Function — as the Let expression. Here in visitConvert, the instanceof Function guard is never satisfied for that case, so functionName remains "" and the output is "(***) AS ***" instead of "none(***) AS ***".

🛠️ Suggested fix in the anonymizer
                if (conversion.getExpression() instanceof Function) {
                  Function func = (Function) conversion.getExpression();
                  functionName = func.getFuncName().toLowerCase(Locale.ROOT);
                  if (!func.getFuncArgs().isEmpty()
                      && func.getFuncArgs().get(0) instanceof Field) {
                    actualSourceField = ((Field) func.getFuncArgs().get(0)).getField().toString();
                  }
                  fields =
                      func.getFuncArgs().stream()
                          .map(arg -> MASK_COLUMN)
                          .collect(Collectors.joining(","));
+               } else if (conversion.getExpression() instanceof Field sourceField) {
+                 // none(field) AS alias — expression is the raw field, function name is "none"
+                 functionName = "none";
+                 actualSourceField = sourceField.getField().toString();
                }

Alternatively, in AstBuilder.buildConversion, always wrap the none-with-alias case in AstDSL.function("none", fieldArg) (matching the other functions), so the anonymizer sees a uniform Function node.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java`
around lines 463 - 495, In visitConvert, handle the case where
conversion.getExpression() is a Field (the "none(field) AS alias" case) so
functionName isn’t left empty: if conversion.getExpression() is a Field, set
functionName = "none", set actualSourceField to that Field's string, and set
fields to MASK_COLUMN (so the resulting format becomes "none(***) AS ***"); keep
the existing logic for Function nodes unchanged and preserve the existing
asClause/targetField logic that compares targetField to actualSourceField.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@penghuo penghuo added PPL Piped processing language feature labels Feb 23, 2026
}

public static Object convert(Object value) {
return new AutoConvertFunction().convertValue(value);
Collaborator

Each static convert() method creates a new UDF instance per row (new AutoConvertFunction().convertValue(value)). Additionally, AutoConvertFunction.applyConversion creates new NumConvertFunction() per row at line 26. For large datasets this produces 2 object allocations per row per convert call. Since convertValue and applyConversion are stateless, use a cached singleton instance instead (e.g. private static final AutoConvertFunction INSTANCE = new AutoConvertFunction()). Same pattern applies to all 5 UDF classes.

Contributor Author

Updated to use cached singleton instance instead of creating new instance every row for the 5 UDF classes
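For readers following along, a minimal sketch of the cached-singleton shape described above — the class name and the body of convertValue are illustrative placeholders, not the actual PR code:

```java
public class AutoConvertFunctionSketch {
    // One shared stateless instance instead of a fresh allocation per row.
    private static final AutoConvertFunctionSketch INSTANCE = new AutoConvertFunctionSketch();

    private AutoConvertFunctionSketch() {}

    // Static per-row entrypoint delegates to the cached singleton.
    public static Object convert(Object value) {
        return INSTANCE.convertValue(value);
    }

    // Placeholder for the real stateless conversion logic.
    Object convertValue(Object value) {
        return value == null ? null : value.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(convert(" 42 ")); // 42
    }
}
```

Since convertValue holds no state, sharing one instance across all rows is safe and avoids the per-row allocations noted above.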

Pattern.compile("^([+-]?(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][+-]?\\d+)?)(.*)$");
private static final Pattern CONTAINS_LETTER_PATTERN = Pattern.compile(".*[a-zA-Z].*");
private static final Pattern STARTS_WITH_SIGN_OR_DIGIT = Pattern.compile("^[+-]?[\\d.].*");
private static final Pattern MEMK_PATTERN = Pattern.compile("^([+-]?\\d+\\.?\\d*)([kmgKMG])?$");
Collaborator

memk("512 m") (with a space before the unit) returns null. Is this the intended Splunk-compatible behavior?

Contributor Author

Yes, Splunk also returns null when there is a space between the value and unit
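A small standalone sketch of why the space makes the match fail — the MEMK_PATTERN here is copied from the diff above, while tryMemk and its k/m/g multipliers are simplified stand-ins for the real implementation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MemkPatternDemo {
    // Pattern copied from the diff above.
    private static final Pattern MEMK_PATTERN =
        Pattern.compile("^([+-]?\\d+\\.?\\d*)([kmgKMG])?$");

    // Simplified memk: returns KB, or null when the input does not match.
    public static Double tryMemk(String s) {
        Matcher m = MEMK_PATTERN.matcher(s);
        if (!m.matches()) {
            return null;
        }
        double number = Double.parseDouble(m.group(1));
        String unit = m.group(2);
        if (unit == null || unit.equalsIgnoreCase("k")) {
            return number;
        }
        return unit.equalsIgnoreCase("m") ? number * 1024 : number * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(tryMemk("512m"));  // 524288.0
        System.out.println(tryMemk("512 m")); // null — the space breaks the match
    }
}
```

The pattern anchors the optional unit immediately after the digits, so any whitespace between value and unit fails matches() outright.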

throw new IllegalArgumentException("Convert function requires a field argument");
}

String functionName = funcCtx.functionName.getText();
Collaborator

convert currently accepts any identifier as a function name and forwards it without semantic validation, so unsupported, non-convert functions (for example convert lower(field)) are accepted even though the command contract documents a fixed set (auto/num/rmcomma/rmunit/memk/none). This makes convert behave like a generic single-arg eval and can silently produce non-numeric output types; function names should be rejected early unless they are in the supported conversion list.

Contributor Author

Updated to only accept supported convert functions
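The early validation described above could look roughly like this — the class and method names are hypothetical; only the supported-function set comes from the documented command contract:

```java
import java.util.Locale;
import java.util.Set;

public class ConvertFunctionValidator {
    // Supported conversion functions per the documented convert command contract.
    private static final Set<String> SUPPORTED_CONVERSION_FUNCTIONS =
        Set.of("auto", "num", "rmcomma", "rmunit", "memk", "none");

    // Rejects unsupported names early and returns the normalized (lowercase) name.
    public static String validate(String functionName) {
        String normalized = functionName.toLowerCase(Locale.ROOT);
        if (!SUPPORTED_CONVERSION_FUNCTIONS.contains(normalized)) {
            throw new IllegalArgumentException(
                String.format(
                    "Unsupported conversion function '%s'. Supported functions are: %s",
                    functionName, SUPPORTED_CONVERSION_FUNCTIONS));
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(validate("AUTO")); // auto
    }
}
```

Returning the normalized name also addresses the earlier review point: the same lowercase string is then used both for validation and when building the AST node.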

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (5)
core/src/main/java/org/opensearch/sql/expression/function/udf/MemkConvertFunction.java (1)

13-18: ⚠️ Potential issue | 🟡 Minor

Add JavaDoc for the public constructor and convert entrypoint.

These public methods still lack JavaDoc with @param, @return, and @throws.
As per coding guidelines: "core/src/main/java/**/*.java: Public methods MUST have JavaDoc with @param, @return, and @throws".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/MemkConvertFunction.java`
around lines 13 - 18, Add JavaDoc for the public constructor
MemkConvertFunction() and the public static entrypoint convert(Object value):
document the constructor and the convert method with a brief description,
include `@param` for "value", `@return` describing the returned Object (and its
semantics), and `@throws` for any exceptions the method may propagate (e.g.,
IllegalArgumentException or any specific runtime exceptions thrown by
INSTANCE.convertValue). Place the JavaDoc immediately above the constructor and
the convert method declarations so they comply with the project's "Public
methods MUST have JavaDoc" rule.
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (2)

1225-1244: ⚠️ Potential issue | 🟡 Minor

Normalize conversion function names before building the function node.

You validate using toLowerCase() but pass the raw functionName into AstDSL.function(...). If resolution is case-sensitive, CONVERT AUTO(field) could fail or diverge from registered operator naming. Normalize once and use the normalized name in the AST.

🛠️ Suggested fix
-    String functionName = funcCtx.functionName.getText();
+    String functionName = funcCtx.functionName.getText();
+    String normalizedFunctionName = functionName.toLowerCase(Locale.ROOT);
...
-    if (!SUPPORTED_CONVERSION_FUNCTIONS.contains(functionName.toLowerCase())) {
+    if (!SUPPORTED_CONVERSION_FUNCTIONS.contains(normalizedFunctionName)) {
       throw new IllegalArgumentException(
           String.format(
               "Unsupported conversion function '%s'. Supported functions are: %s",
               functionName, SUPPORTED_CONVERSION_FUNCTIONS));
     }
...
-    return new Let(targetField, AstDSL.function(functionName, fieldArg));
+    return new Let(targetField, AstDSL.function(normalizedFunctionName, fieldArg));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java` around lines
1225 - 1244, The code validates functionName using toLowerCase() against
SUPPORTED_CONVERSION_FUNCTIONS but still uses the raw functionName when building
the AST; normalize the name once (e.g., lowercased normalizedName) after reading
funcCtx.functionName.getText(), use normalizedName for the contains check, for
the "none" branch comparison, and pass normalizedName into AstDSL.function(...)
when constructing the Let node (leave usage of internalVisitExpression and
determineTargetField unchanged).

1238-1242: ⚠️ Potential issue | 🟡 Minor

Fix none no-op detection to avoid redundant Let.

The toString() comparison between fieldArg and targetField.getField() is fragile and will never match for Field vs QualifiedName. This makes none(field) always emit new Let(field, field) instead of being a no-op. Use the alias presence to detect no-op instead.

🛠️ Suggested fix
-    if ("none".equalsIgnoreCase(functionName)) {
-      return fieldArg.toString().equals(targetField.getField().toString())
-          ? null
-          : new Let(targetField, fieldArg);
-    }
+    if ("none".equalsIgnoreCase(functionName)) {
+      // When no alias is provided, none(field) is a no-op.
+      return funcCtx.alias == null ? null : new Let(targetField, fieldArg);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java` around lines
1238 - 1242, The current "none" branch in AstBuilder uses a fragile toString()
comparison between fieldArg and targetField.getField() and thus emits new
Let(...) even for no-ops; change the check to detect a true no-op by inspecting
fieldArg's alias instead: in the if ("none".equalsIgnoreCase(functionName))
branch, return null when fieldArg has an alias that matches the target field
(i.e., detect alias presence on fieldArg and compare it to the
targetField.getField() identity/name) and only create new Let(targetField,
fieldArg) when that alias-based no-op check fails.
core/src/main/java/org/opensearch/sql/expression/function/udf/AutoConvertFunction.java (1)

13-18: ⚠️ Potential issue | 🟡 Minor

Add JavaDoc for the public constructor and convert entrypoint.

These public methods still lack JavaDoc with @param, @return, and @throws.
As per coding guidelines: "core/src/main/java/**/*.java: Public methods MUST have JavaDoc with @param, @return, and @throws".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/AutoConvertFunction.java`
around lines 13 - 18, Add JavaDoc comments for the public constructor
AutoConvertFunction() and the public static entrypoint convert(Object value):
document the parameters with `@param`, the return value with `@return`, and any
possible exceptions thrown with `@throws` (e.g., if convertValue can throw
IllegalArgumentException or other runtime exceptions) following the project's
JavaDoc guidelines; place the JavaDoc immediately above the
AutoConvertFunction() constructor and above the convert(Object) method and
ensure wording describes purpose, parameter semantics, return type, and
conditions that lead to exceptions.
core/src/main/java/org/opensearch/sql/expression/function/udf/RmcommaConvertFunction.java (1)

13-18: ⚠️ Potential issue | 🟡 Minor

Add JavaDoc for the public constructor and convert entrypoint.

These public methods still lack JavaDoc with @param, @return, and @throws.
As per coding guidelines: "core/src/main/java/**/*.java: Public methods MUST have JavaDoc with @param, @return, and @throws".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/RmcommaConvertFunction.java`
around lines 13 - 18, Add JavaDoc for the public RmcommaConvertFunction()
constructor and the public static convert(Object value) entrypoint: document
parameters with `@param` (describe "value"), describe the return value with
`@return`, and declare any runtime exceptions thrown (or document that none are
thrown) with `@throws`; reference the class RmcommaConvertFunction, the
constructor RmcommaConvertFunction(), and the convert(Object) method (which
delegates to INSTANCE.convertValue) so future readers know the delegation to
INSTANCE.convertValue and any exceptions it may propagate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/AutoConvertFunction.java`:
- Around line 13-18: Add JavaDoc comments for the public constructor
AutoConvertFunction() and the public static entrypoint convert(Object value):
document the parameters with `@param`, the return value with `@return`, and any
possible exceptions thrown with `@throws` (e.g., if convertValue can throw
IllegalArgumentException or other runtime exceptions) following the project's
JavaDoc guidelines; place the JavaDoc immediately above the
AutoConvertFunction() constructor and above the convert(Object) method and
ensure wording describes purpose, parameter semantics, return type, and
conditions that lead to exceptions.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/MemkConvertFunction.java`:
- Around line 13-18: Add JavaDoc for the public constructor
MemkConvertFunction() and the public static entrypoint convert(Object value):
document the constructor and the convert method with a brief description,
include `@param` for "value", `@return` describing the returned Object (and its
semantics), and `@throws` for any exceptions the method may propagate (e.g.,
IllegalArgumentException or any specific runtime exceptions thrown by
INSTANCE.convertValue). Place the JavaDoc immediately above the constructor and
the convert method declarations so they comply with the project's "Public
methods MUST have JavaDoc" rule.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/RmcommaConvertFunction.java`:
- Around line 13-18: Add JavaDoc for the public RmcommaConvertFunction()
constructor and the public static convert(Object value) entrypoint: document
parameters with `@param` (describe "value"), describe the return value with
`@return`, and declare any runtime exceptions thrown (or document that none are
thrown) with `@throws`; reference the class RmcommaConvertFunction, the
constructor RmcommaConvertFunction(), and the convert(Object) method (which
delegates to INSTANCE.convertValue) so future readers know the delegation to
INSTANCE.convertValue and any exceptions it may propagate.

In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`:
- Around line 1225-1244: The code validates functionName using toLowerCase()
against SUPPORTED_CONVERSION_FUNCTIONS but still uses the raw functionName when
building the AST; normalize the name once (e.g., lowercased normalizedName)
after reading funcCtx.functionName.getText(), use normalizedName for the
contains check, for the "none" branch comparison, and pass normalizedName into
AstDSL.function(...) when constructing the Let node (leave usage of
internalVisitExpression and determineTargetField unchanged).
- Around line 1238-1242: The current "none" branch in AstBuilder uses a fragile
toString() comparison between fieldArg and targetField.getField() and thus emits
new Let(...) even for no-ops; change the check to detect a true no-op by
inspecting fieldArg's alias instead: in the if
("none".equalsIgnoreCase(functionName)) branch, return null when fieldArg has an
alias that matches the target field (i.e., detect alias presence on fieldArg and
compare it to the targetField.getField() identity/name) and only create new
Let(targetField, fieldArg) when that alias-based no-op check fails.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3293010 and ac93662.

📒 Files selected for processing (7)
  • core/src/main/java/org/opensearch/sql/expression/function/udf/AutoConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/MemkConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/NumConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/RmcommaConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/RmunitConvertFunction.java
  • docs/user/ppl/cmd/convert.md
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
🚧 Files skipped from review as they are similar to previous changes (3)
  • core/src/main/java/org/opensearch/sql/expression/function/udf/NumConvertFunction.java
  • docs/user/ppl/cmd/convert.md
  • core/src/main/java/org/opensearch/sql/expression/function/udf/RmunitConvertFunction.java

penghuo previously approved these changes Feb 26, 2026
Collaborator

@penghuo penghuo left a comment

Thanks!

Please fix merge conflict.

Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
@github-actions
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add UDF conversion functions (auto, num, rmcomma, rmunit, memk)

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/udf/BaseConversionUDF.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/AutoConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/NumConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/RmcommaConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/RmunitConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/MemkConvertFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
  • core/src/test/java/org/opensearch/sql/expression/function/udf/ConversionFunctionsTest.java

Sub-PR theme: Implement PPL convert command parser, AST, and Calcite visitor

Relevant files:

  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • core/src/main/java/org/opensearch/sql/ast/tree/Convert.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLConvertTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteConvertCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/ConvertCommandIT.java
  • docs/user/ppl/cmd/convert.md
  • docs/user/ppl/index.md

⚡ Recommended focus areas for review

Incorrect Pattern

The MEMK_PATTERN regex ^([+-]?\d+\.?\d*)([kmgKMG])?$ handles ordinary decimals like "1.5g", but it does not support scientific notation and, more critically, does not allow a leading dot (e.g., ".5g"), while LEADING_NUMBER_WITH_UNIT_PATTERN does allow \.\d+. This inconsistency between the two patterns could cause unexpected null returns for inputs that the other conversion paths would accept.

private static final Pattern MEMK_PATTERN = Pattern.compile("^([+-]?\\d+\\.?\\d*)([kmgKMG])?$");
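To make the inconsistency concrete, a quick sketch comparing the two patterns — both regexes are copied from the review context; the wrapper class and helper methods are illustrative:

```java
import java.util.regex.Pattern;

public class PatternInconsistencyDemo {
    // Both patterns copied from the review context.
    private static final Pattern MEMK_PATTERN =
        Pattern.compile("^([+-]?\\d+\\.?\\d*)([kmgKMG])?$");
    private static final Pattern LEADING_NUMBER_WITH_UNIT_PATTERN =
        Pattern.compile("^([+-]?(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][+-]?\\d+)?)(.*)$");

    public static boolean memkMatches(String s) {
        return MEMK_PATTERN.matcher(s).matches();
    }

    public static boolean leadingMatches(String s) {
        return LEADING_NUMBER_WITH_UNIT_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        // ".5g" is accepted by the leading-number pattern but rejected by MEMK_PATTERN,
        // because the latter requires at least one digit before the optional dot.
        System.out.println(leadingMatches(".5g")); // true
        System.out.println(memkMatches(".5g"));    // false
    }
}
```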
Comma Handling Logic

In applyConversion, when a string contains commas, tryConvertWithCommaRemoval is called first: it removes ALL commas and tries to parse. If the string also has a unit suffix (e.g., "1,234 kg"), comma removal produces "1234 kg", which fails to parse as a double, and the leading-number extraction is then tried on the original string. For an input like "1,234.56 units", that extraction stops at the first comma and yields "1" as the leading number, so the ordering of these checks can produce an unexpectedly small result (or null) instead of 1234.56.

if (preprocessedValue.contains(",")) {
  result = tryConvertWithCommaRemoval(preprocessedValue);
  if (result != null) {
    return result;
  }
}

String leadingNumber = extractLeadingNumber(preprocessedValue);
if (hasValidUnitSuffix(preprocessedValue, leadingNumber)) {
  return tryParseDouble(leadingNumber);
}

return null;
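A simplified sketch of the ordering described above — it omits the hasValidUnitSuffix gate and the other preprocessing, so it only approximates the real applyConversion — showing how "1,234.56 units" falls through comma removal and then has its leading number cut off at the first comma:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommaHandlingDemo {
    // Simplified leading-number pattern, mirroring LEADING_NUMBER_WITH_UNIT_PATTERN.
    private static final Pattern LEADING_NUMBER =
        Pattern.compile("^([+-]?(?:\\d+\\.?\\d*|\\.\\d+))(.*)$");

    static Double tryParseDouble(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Mirrors the ordering above: comma removal first, then leading-number extraction.
    // The real code additionally gates the extraction behind hasValidUnitSuffix.
    public static Double applyConversion(String value) {
        if (value.contains(",")) {
            Double result = tryParseDouble(value.replace(",", ""));
            if (result != null) {
                return result;
            }
        }
        Matcher m = LEADING_NUMBER.matcher(value);
        if (m.matches()) {
            return tryParseDouble(m.group(1));
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(applyConversion("1,234"));          // 1234.0 — comma removal succeeds
        System.out.println(applyConversion("1,234.56 units")); // 1.0 — extraction stops at the comma
    }
}
```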
Null Filtering

In visitConvertCommand, conversions that return null (from buildConversion when none() is used without an alias on the same field) are silently filtered out. This means a Convert node could be created with an empty conversions list if all conversions are none() without aliases. While visitConvert handles empty conversions gracefully, this silent filtering may hide user errors or unexpected behavior.

    ctx.convertFunction().stream()
        .map(this::buildConversion)
        .filter(conversion -> conversion != null)
        .collect(Collectors.toList());
return new Convert(conversions);
Anonymization Bug

In visitConvert, when the expression is not a Function (e.g., for none() conversions that are represented as Field in the AST after the buildConversion logic), functionName remains an empty string and actualSourceField remains empty. This results in output like "(identifier) AS identifier" for none() with alias, which is inconsistent with the expected format. The anonymizer test confirms this behavior but it looks like a formatting issue in the anonymized output.

  String functionName = "";
  String fields = MASK_COLUMN;
  String actualSourceField = "";

  if (conversion.getExpression() instanceof Function) {
    Function func = (Function) conversion.getExpression();
    functionName = func.getFuncName().toLowerCase(Locale.ROOT);
    if (!func.getFuncArgs().isEmpty()
        && func.getFuncArgs().get(0) instanceof Field) {
      actualSourceField = ((Field) func.getFuncArgs().get(0)).getField().toString();
    }
    fields =
        func.getFuncArgs().stream()
            .map(arg -> MASK_COLUMN)
            .collect(Collectors.joining(","));
  }

  String targetField = conversion.getVar().getField().toString();

  String asClause =
      !targetField.equals(actualSourceField) ? " AS " + MASK_COLUMN : "";
  return StringUtils.format("%s(%s)%s", functionName, fields, asClause);
})
Unused Enums

Several enum values are added (CTIME, MKTIME, DUR2SEC, MSTIME, NONE) that are not implemented or registered as operators in this PR. These appear to be placeholders for future work but are not documented as such, and NONE in particular is referenced in the grammar but not registered as an operator (which is intentional since none() is a no-op, but could be confusing).

AUTO(FunctionName.of("auto")),
NUM(FunctionName.of("num")),
CTIME(FunctionName.of("ctime")),
MKTIME(FunctionName.of("mktime")),
DUR2SEC(FunctionName.of("dur2sec")),
MEMK(FunctionName.of("memk")),
MSTIME(FunctionName.of("mstime")),
RMUNIT(FunctionName.of("rmunit")),
RMCOMMA(FunctionName.of("rmcomma")),
NONE(FunctionName.of("none")),

@github-actions
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Fix incorrect duplicate field detection for alias conversions

The seenFields set tracks source fields to prevent duplicate conversions, but in
processFieldCopyConversion (used for none() with alias), the source field is added
to seenFields. This means if a user writes convert none(balance) AS balance_copy,
auto(balance), the auto(balance) conversion will be incorrectly rejected with "Field
'balance' cannot be converted more than once", even though the original field is not
being overwritten. The seenFields check should only apply when the source field is
being replaced (i.e., target.equals(source)).

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1024-1038]

 private void processFieldCopyConversion(
     String target, Field field, ConversionState state, CalcitePlanContext context) {
   String source = field.getField().toString();
-
-  if (state.seenFields.contains(source)) {
-    throw new SemanticCheckException(
-        String.format("Field '%s' cannot be converted more than once", source));
-  }
-  state.seenFields.add(source);
 
   if (!target.equals(source)) {
     RexNode sourceField = context.relBuilder.field(source);
     state.additions.add(Pair.of(target, context.relBuilder.alias(sourceField, target)));
   }
+  // Only track as "seen" when overwriting the source field
 }
Suggestion importance[1-10]: 5


Why: The suggestion identifies a real logical issue where seenFields tracking in processFieldCopyConversion could incorrectly block valid conversions like convert none(balance) AS balance_copy, auto(balance). However, the improved code removes the seenFields tracking entirely for copy conversions, which may let genuine duplicate-copy cases go undetected.

Low
Fix regex pattern to support leading plus sign

The MEMK_PATTERN only matches digits without a sign prefix
(^([+-]?\d+\.?\d*)([kmgKMG])?$), but the test testMemkConvert expects
MemkConvertFunction.convert("+100") to return 100.0 and
MemkConvertFunction.convert("+50m") to return 51200.0. The pattern \d+ does not
match a leading + sign, so these cases will return null instead of the expected
values. The pattern should be updated to allow an optional leading sign.

core/src/main/java/org/opensearch/sql/expression/function/udf/BaseConversionUDF.java [34]

-protected Double tryConvertMemoryUnit(String str) {
-  Matcher matcher = MEMK_PATTERN.matcher(str);
-  if (!matcher.matches()) {
-    return null;
-  }
+private static final Pattern MEMK_PATTERN = Pattern.compile("^([+-]?\\d+\\.?\\d*)([kmgKMG])?$");
 
-  Double number = tryParseDouble(matcher.group(1));
-  if (number == null) {
-    return null;
-  }
-
-  String unit = matcher.group(2);
-  if (unit == null || unit.equalsIgnoreCase("k")) {
-    return number;
-  }
-
-  double multiplier =
-      switch (unit.toLowerCase()) {
-        case "m" -> MB_TO_KB;
-        case "g" -> GB_TO_KB;
-        default -> 1.0;
-      };
-
-  return number * multiplier;
-}
-
Suggestion importance[1-10]: 2

Why: The MEMK_PATTERN already includes `[+-]?` in the pattern (`^([+-]?\\d+\\.?\\d*)([kmgKMG])?$`), so the suggestion's claim that `+` is not supported is incorrect. The 'improved_code' is identical to the existing code, making this suggestion invalid.

Impact: Low
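The "Why" note's point can be checked directly: the existing pattern already accepts an optional leading sign. A standalone sketch of the memory-unit conversion (constants and names reconstructed from the suggestion, not the actual class) shows the signed inputs from testMemkConvert converting as expected:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MemkPatternCheck {
    // Pattern as quoted in the review: optional sign, number, optional k/m/g unit.
    static final Pattern MEMK_PATTERN = Pattern.compile("^([+-]?\\d+\\.?\\d*)([kmgKMG])?$");
    static final double MB_TO_KB = 1024.0;
    static final double GB_TO_KB = 1024.0 * 1024.0;

    /** Converts a memory string to kilobytes, or null if it does not match. */
    static Double convert(String str) {
        Matcher m = MEMK_PATTERN.matcher(str);
        if (!m.matches()) return null;
        double number = Double.parseDouble(m.group(1));
        String unit = m.group(2);
        if (unit == null || unit.equalsIgnoreCase("k")) return number;
        return number * (unit.equalsIgnoreCase("m") ? MB_TO_KB : GB_TO_KB);
    }

    public static void main(String[] args) {
        System.out.println(convert("+100")); // 100.0
        System.out.println(convert("+50m")); // 51200.0
        System.out.println(convert("abc"));  // null
    }
}
```

Both `+100` and `+50m` match because `[+-]?` precedes `\d+`, confirming that the suggested "fix" changes nothing.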
General
Avoid relying on package-private instance field

AutoConvertFunction accesses NumConvertFunction.INSTANCE which has package-private
visibility (static final NumConvertFunction INSTANCE). While this works within the
same package, it creates a tight coupling and bypasses the public API. The auto()
function should delegate to NumConvertFunction.convert() or use its own logic to
avoid relying on package-private state.

core/src/main/java/org/opensearch/sql/expression/function/udf/AutoConvertFunction.java [22-29]

 @Override
 protected Object applyConversion(String preprocessedValue) {
   Double result = tryConvertMemoryUnit(preprocessedValue);
   if (result != null) {
     return result;
   }
 
-  return NumConvertFunction.INSTANCE.applyConversion(preprocessedValue);
+  return new NumConvertFunction().applyConversion(preprocessedValue);
 }
Suggestion importance[1-10]: 3

Why: The suggestion correctly identifies that NumConvertFunction.INSTANCE has package-private visibility, creating tight coupling. However, the proposed fix of creating a new NumConvertFunction() each time is not ideal either; a better approach would be to make INSTANCE public or use the static convert method.
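The alternative the "Why" note prefers, delegating through a public static entry point instead of reaching into package-private state, can be sketched as follows. Class and method names mirror the suggestion but the bodies are simplified placeholders, not the real implementations:

```java
class NumConvertFunction {
    // Singleton stays package-private; callers go through the static method.
    private static final NumConvertFunction INSTANCE = new NumConvertFunction();

    /** Public, stateless entry point that delegates to the singleton. */
    public static Object convert(String value) {
        return INSTANCE.applyConversion(value);
    }

    Object applyConversion(String value) {
        try {
            return Double.parseDouble(value); // placeholder numeric conversion
        } catch (NumberFormatException e) {
            return null;
        }
    }
}

class AutoConvertFunction {
    Object applyConversion(String value) {
        // memory-unit handling omitted; fall through to numeric conversion
        // via the public API rather than NumConvertFunction.INSTANCE
        return NumConvertFunction.convert(value);
    }
}

public class DelegationSketch {
    public static void main(String[] args) {
        System.out.println(new AutoConvertFunction().applyConversion("42")); // 42.0
    }
}
```

This keeps the singleton an implementation detail while avoiding a fresh allocation per call.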
Log unexpected non-numeric conversion results

The toDoubleOrNull method silently returns null for non-Number objects, but the
convert static methods in subclasses (e.g., AutoConvertFunction.convert) can return
non-Number objects (e.g., a String). This means if applyConversion accidentally
returns a non-numeric type, the error will be silently swallowed and null will be
returned instead of surfacing the bug. Consider adding a log warning or assertion
for unexpected non-null, non-Number return values.

core/src/main/java/org/opensearch/sql/expression/function/udf/BaseConversionUDF.java [181-186]

 public static Double toDoubleOrNull(Object value) {
+  if (value == null) {
+    return null;
+  }
   if (value instanceof Number) {
     return ((Number) value).doubleValue();
   }
+  log.warn("Unexpected non-numeric value returned from conversion function: {}", value.getClass());
   return null;
 }
Suggestion importance[1-10]: 2

Why: This is a minor defensive logging suggestion. The subclass applyConversion methods are designed to return Object (either Double or null), and the toDoubleOrNull method is a final safety net. Adding a warning log is a minor improvement but has low impact on correctness.

Impact: Low
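A dependency-free version of the defensive check can be sketched with java.util.logging (the real class may use a different logging facade; names here are illustrative only):

```java
import java.util.logging.Logger;

public class ToDoubleSketch {
    private static final Logger log = Logger.getLogger(ToDoubleSketch.class.getName());

    /** Safety net: returns a Double for Numbers, null otherwise, warning on surprises. */
    public static Double toDoubleOrNull(Object value) {
        if (value == null) {
            return null; // expected: conversion produced no result
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        // Unexpected: a conversion function returned a non-numeric object.
        log.warning("Unexpected non-numeric value from conversion function: " + value.getClass());
        return null;
    }

    public static void main(String[] args) {
        System.out.println(toDoubleOrNull(3));      // 3.0
        System.out.println(toDoubleOrNull("oops")); // null, with a warning logged
    }
}
```

The behavior is unchanged for the null and Number paths; only the unexpected branch gains visibility.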

Labels: feature, PPL (Piped processing language)
