Skip to content

Fix #5176: Return actual null from JSON_EXTRACT for missing/null paths#5196

Open
opensearchpplteam wants to merge 2 commits intoopensearch-project:mainfrom
opensearchpplteam:fix/issue-5176
Open

Fix #5176: Return actual null from JSON_EXTRACT for missing/null paths#5196
opensearchpplteam wants to merge 2 commits intoopensearch-project:mainfrom
opensearchpplteam:fix/issue-5176

Conversation

@opensearchpplteam
Copy link

Description

Root cause: JsonExtractFunctionImpl.doJsonize() explicitly returned the string "null" when the candidate object was null. When JSON_EXTRACT encounters a missing path or an explicit JSON null value, both Calcite's jsonQuery() and jsonValue() return Java null, and doJsonize(null) converted this to string "null" instead of propagating actual null.

Fix: Changed doJsonize() to return null (Java null) when candidate is null. The function's return type is already declared as STRING_FORCE_NULLABLE (nullable VARCHAR), so Calcite's expression framework correctly handles null returns. This is consistent with other JSON UDFs in the codebase (e.g., JsonExtractAllFunctionImpl).

Testing:

  • Unit tests: Added tests for single-path null (missing path, explicit JSON null), multi-path with missing paths, and multi-path all missing
  • Integration tests: Added end-to-end tests in CalcitePPLJsonBuiltinFunctionIT verifying null-return behavior through the REST API for both missing paths and explicit JSON null values, plus multi-path extraction with missing paths

Related Issues

#5176

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

… missing/null paths

The doJsonize() method in JsonExtractFunctionImpl was explicitly converting
null values to the string "null". When JSON_EXTRACT encounters a missing
path or an explicit JSON null value, it should return actual null instead
of the string "null".

Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
…arch-project#5176)

Address review feedback:
- Add integration test for null-return behavior (missing path and explicit null)
- Add integration test for multi-path extraction with missing path
- Add unit tests for multi-path extraction where some/all paths are missing

Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
@github-actions
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fix doJsonize null handling with unit tests

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java
  • core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImplTest.java

Sub-PR theme: Add integration tests for JSON_EXTRACT null behavior

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJsonBuiltinFunctionIT.java

⚡ Recommended focus areas for review

Test Coverage Gap

The test testMultiPathWithMissingPath verifies that a missing path in a multi-path extraction returns [\"John\",null]. However, there is no test verifying that an explicit JSON null value in a multi-path extraction (e.g., {"name": null}) also returns [null] or [null, null]. This edge case should be covered to ensure consistency between single-path and multi-path behavior for explicit nulls.

public void testMultiPathWithMissingPath() {
  Object result = JsonExtractFunctionImpl.eval("{\"name\": \"John\"}", "name", "age");
  assertEquals("[\"John\",null]", result);
}
Redundant head

The integration test queries use both | head 1 before the eval and | head 1 after | fields. The second | head 1 is redundant since the first already limits to one row. This is a minor style issue but could cause confusion about intent.

"source=%s | head 1 | eval a = json_extract('{}', 'name'),"
    + " b = json_extract('{\\\"name\\\": null}', 'name')"
    + " | fields a, b | head 1",
TEST_INDEX_PEOPLE2));

@github-actions
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Verify null handling in multi-path array serialization

The multi-path tests expect null values to be serialized as the JSON string null
within the array (e.g., "["John",null]"). However, since doJsonize now returns
Java null for missing/null values, the array serialization logic must explicitly
handle null elements and render them as the JSON token null rather than omitting
them or throwing a NullPointerException. Verify that the array-building code in eval
correctly handles null entries from doJsonize.

core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImplTest.java [69-78]

 @Test
 public void testMultiPathWithMissingPath() {
   Object result = JsonExtractFunctionImpl.eval("{\"name\": \"John\"}", "name", "age");
+  // null entries in multi-path results should be serialized as JSON null token, not omitted
   assertEquals("[\"John\",null]", result);
 }
 
 @Test
 public void testMultiPathAllMissing() {
   Object result = JsonExtractFunctionImpl.eval("{}", "name", "age");
   assertEquals("[null,null]", result);
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid concern about whether the array-building code in eval correctly handles Java null returned by doJsonize and serializes it as the JSON token null. However, the improved_code is functionally identical to existing_code (only adds a comment), so it doesn't actually fix the potential issue.

Low
General
Clarify expected numeric return type in test

The test expects the numeric value 30 to be returned as the string "30", but this
assumption should be verified against the actual behavior of the implementation. If
the underlying JSON parser returns an Integer or Long object, calling toString() on
it would produce "30", but if the implementation changes, this test could give a
false positive. Consider adding a comment or asserting the type explicitly to
document the expected behavior.

core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImplTest.java [45-47]

 Object result = JsonExtractFunctionImpl.eval("{\"age\": 30}", "age");
-assertEquals("30", result);
+// Numeric values are returned as their string representation via toString()
+assertEquals("30", result, "Numeric JSON values should be returned as their string representation");
Suggestion importance[1-10]: 2

__

Why: The suggestion only adds a comment to the test, which is a minor documentation improvement. The existing_code and improved_code are functionally equivalent, and the suggestion score criteria penalizes adding comments.

Low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant