[Feature] Calcite PPL search result highlighting by RyanL1997 · Pull Request #5141 · opensearch-project/sql

RyanL1997 · 2026-02-12T21:25:20Z

Description.

For ease of review, please reference the design doc in issue #5156.

Add request-level highlight API for PPL queries. Callers (OSD, API, CLI) can include an optional highlight JSON object in the PPL request body, which the backend forwards as-is to OpenSearch.
Highlight config is carried across the thread boundary (REST handler → sql-worker) via AbstractPlan, then set as a ThreadLocal for Calcite planning and execution.
Highlight metadata from OpenSearch hits is collected via a side-channel ThreadLocal in OpenSearchIndexEnumerator and merged back into the JDBC/SimpleJson response as a highlights array parallel to datarows.

Related Issues

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

coderabbitai · 2026-02-12T21:25:29Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds per-request PPL highlighting end-to-end: request parsing, plan-level propagation via ThreadLocal, conditional hidden _highlight column in Calcite schema, attaching HighlightBuilder to OpenSearch requests, extracting highlights from responses, and exposing per-row highlights through QueryResult/formatters with unit and integration tests.

Changes

Cohort / File(s)	Summary
Plan context & propagation `core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java`, `core/src/main/java/org/opensearch/sql/executor/execution/AbstractPlan.java`	Add ThreadLocal `highlightConfig` and plan-level `highlightConfig` field with accessors to carry highlight settings across thread boundary.
Plan execution flow `core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java`, `core/src/main/java/org/opensearch/sql/executor/execution/ExplainPlan.java`	Set/clear CalcitePlanContext ThreadLocal around execute/explain to ensure worker thread sees plan highlight config.
PPL request surface `ppl/src/main/java/org/opensearch/sql/ppl/PPLService.java`, `ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java`	Parse `highlight` from PPL request JSON and attach it to created plan before submission.
Calcite schema / planning `opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java`, `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`	When highlight config exists, add hidden `_highlight` column (ANY) to row type and preserve it through projection/visitor logic.
OpenSearch request building `opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java`, `opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java`	Add `applyHighlightConfig` to build/attach OpenSearch HighlightBuilder from CalcitePlanContext and apply it before executing requests; include in explain output.
Response handling & enumerator `opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java`, `opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java`, `opensearch/src/main/java/org/opensearch/sql/opensearch/response/OpenSearchResponse.java`	Carry `_highlight` as an opaque per-row value (excluded from schema), special-case resolution in enumerator, and centralize highlight field constant usage.
Protocol & formatting `protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java`, `protocol/src/main/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatter.java`, `protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java`	Add `QueryResult.highlights()` and include highlights in JDBC/Simple JSON outputs when present.
Expressions & constants `core/src/main/java/org/opensearch/sql/expression/HighlightExpression.java`	Introduce `HIGHLIGHT_FIELD` constant and use it instead of hard-coded string.
Tests (unit & integration) `core/src/test/.../ExplainPlanTest.java`, `core/src/test/.../QueryPlanTest.java`, `opensearch/src/test/.../AbstractCalciteIndexScanHighlightTest.java`, `protocol/src/test/.../*`, `ppl/src/test/.../PPLQueryRequestTest.java`, `integ-test/src/test/java/.../CalcitePPLHighlightIT.java`, `integ-test/src/yamlRestTest/resources/rest-api-spec/test/ppl_highlight.yml`	Add unit and integration tests for propagation, request-building, highlighter config application, enumerator extraction, QueryResult highlights, and formatted outputs.
Docs `docs/user/ppl/interfaces/endpoint.md`	Add Highlight section documenting `highlight` request object, parameters, examples, response format, and limitations.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PPLService
    participant Planner as CalcitePlanner
    participant Worker as ExecThread
    participant CalciteCtx as CalcitePlanContext
    participant OpenSearch as OpenSearchNode
    participant Enumerator as OpenSearchIndexEnumerator
    participant Protocol as QueryResult/Formatter

    Client->>PPLService: POST /_plugins/_ppl (with optional "highlight")
    PPLService->>Planner: build execution plan
    PPLService->>Planner: set plan.highlightConfig
    Planner->>Worker: submit plan (worker thread)
    Worker->>CalciteCtx: setHighlightConfig(plan.highlightConfig)
    Worker->>OpenSearch: execute search (HighlightBuilder attached)
    OpenSearch-->>Enumerator: return hits with highlight fragments
    Enumerator->>Worker: produce rows including hidden _highlight value
    Worker->>Protocol: build QueryResult (extract highlights())
    Worker->>CalciteCtx: clearHighlightConfig()
    Protocol-->>Client: JSON response (optional "highlights" array)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Support spath with dynamic fields #5058 — touches projection/visitor logic in CalciteRelNodeVisitor, overlapping with hidden-field preservation changes.

Suggested labels

calcite

Suggested reviewers

LantaoJin
penghuo
ps48
kavithacm
derek-ho
joshuali925
anirudha
Swiddis
yuancu
GumpacG

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.81% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title '[Feature] Calcite PPL search result highlighting' directly and clearly describes the main feature being added—PPL result highlighting via Calcite.
Description check	✅ Passed	The description is related to the changeset, explaining the highlighting feature, thread-local propagation, response formatting, and referencing design docs and related issues.
Linked Issues check	✅ Passed	The PR successfully implements all key objectives from issues `#5156` and `#5059`: request-level highlight API, thread-local config propagation, conditional _highlight column, HighlightBuilder integration, highlight metadata collection, and response formatting with highlights array.
Out of Scope Changes check	✅ Passed	All changes are scoped to highlighting feature implementation: config propagation (AbstractPlan, ThreadLocal), Calcite integration (_highlight column, projection handling), OpenSearch request building, response extraction/formatting, and comprehensive tests. No unrelated modifications detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 411-418: When highlight is enabled
(CalcitePlanContext.getHighlightConfig() != null) the highlight preservation
block in CalciteRelNodeVisitor can add HIGHLIGHT_FIELD twice into expandedFields
(duplicate of expandProjectFields' AllFields expansion); modify the logic around
currentFields/index hlIndex and expandedFields to first scan expandedFields for
an existing RexInputRef that references the same input index (or any existing
projection that already corresponds to HIGHLIGHT_FIELD) and only add
context.relBuilder.field(hlIndex) if no matching RexInputRef exists, so the
projection is not duplicated and downstream tuple.get(HIGHLIGHT_FIELD) continues
to work.

coderabbitai · 2026-02-20T04:18:09Z

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

+      // When highlight is active, include _highlight in the projection so it survives
+      // through the Calcite pipeline. This matches DSL behavior where _source filtering
+      // does not affect highlights.
+      if (CalcitePlanContext.getHighlightConfig() != null) {
+        int hlIndex = currentFields.indexOf(HIGHLIGHT_FIELD);
+        if (hlIndex >= 0) {
+          expandedFields.add(context.relBuilder.field(hlIndex));
+        }


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's see the full context of the visitProject method and the code around lines 411-418 cd core/src/main/java/org/opensearch/sql/calcite && wc -l CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 97

🏁 Script executed:

# Read the visitProject method context (expanded range to understand the logic) cd core/src/main/java/org/opensearch/sql/calcite && sed -n '380,430p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 1792

🏁 Script executed:

# Search for how _highlight is used downstream (extraction/handling) rg "HIGHLIGHT_FIELD|_highlight|getHighlight" -n -A 3 -B 1 --type java

Repository: opensearch-project/sql

Length of output: 50378

🏁 Script executed:

# Check if there are similar guards against duplicate projection in the codebase rg "expandedFields|alreadyProjected|anyMatch.*RexInputRef" -n --type java

Repository: opensearch-project/sql

Length of output: 1744

🏁 Script executed:

# Search for tests related to highlight projection rg "highlight.*[Pp]roject|visitProject.*highlight" -n --type java

Repository: opensearch-project/sql

Length of output: 365

🏁 Script executed:

# Check if _highlight can be explicitly selected in user queries # Look for test cases with explicit _highlight selection rg "SELECT.*_highlight|project.*_highlight" -n --type java -B 2 -A 2 | head -50

Repository: opensearch-project/sql

Length of output: 1538

🏁 Script executed:

# Check expandProjectFields to see if it could include _highlight from user input cd core/src/main/java/org/opensearch/sql/calcite && sed -n '440,495p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 2319

🏁 Script executed:

# Search for how currentFields is populated and whether _highlight would be present cd core/src/main/java/org/opensearch/sql/calcite && sed -n '393,425p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 1328

🏁 Script executed:

# Check if there are tests specifically covering _highlight in projections with other fields rg "highlight.*project|project.*highlight" -n --type java -A 5 | head -60

Repository: opensearch-project/sql

Length of output: 3601

🏁 Script executed:

# Check if _highlight is classified as a metadata field rg "isMetadataField|METADATA|_highlight" -n --type java -B 2 -A 2 | grep -A 5 "isMetadataField"

Repository: opensearch-project/sql

Length of output: 4271

🏁 Script executed:

# Get the isMetadataField implementation cd core/src/main/java/org/opensearch/sql/calcite && rg "isMetadataField" -A 5 CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 1952

🏁 Script executed:

# Check if _highlight is in METADATAFIELD_TYPE_MAP rg "METADATAFIELD_TYPE_MAP|_highlight" -n --type java | grep -B 5 -A 5 "METADATAFIELD_TYPE_MAP"

Repository: opensearch-project/sql

Length of output: 7296

🏁 Script executed:

# Find where METADATAFIELD_TYPE_MAP is defined fd "OpenSearchConstants.java" --type f | head -1 | xargs cat | head -100

Repository: opensearch-project/sql

Length of output: 1137

🏁 Script executed:

# Verify that _highlight is NOT in the metadata field map and would be included in wildcard projections cd core/src/main/java/org/opensearch/sql/opensearch/storage && cat -n OpenSearchIndex.java | sed -n '65,75p'

Repository: opensearch-project/sql

Length of output: 172

🏁 Script executed:

# Check if there are any existing tests that combine SELECT * with highlight rg "SELECT \*|AllFields" -n --type java integ-test/src/test/java/org/opensearch/sql/sql/HighlightFunctionIT.java

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Find the correct path for OpenSearchIndex.java fd "OpenSearchIndex.java" --type f

Repository: opensearch-project/sql

Length of output: 243

🏁 Script executed:

# Check the METADATAFIELD_TYPE_MAP definition to confirm _highlight is NOT in it rg "METADATAFIELD_TYPE_MAP\|_highlight" opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java -A 10 -B 2

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Check if tests verify SELECT * with highlight scenarios rg "SELECT.*\*.*highlight|highlight.*SELECT" integ-test/ -n --type java -A 3 -B 1

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Look for expandWildcardPattern to understand if _highlight gets included rg "expandWildcardPattern|WildcardUtils" -n --type java -A 5 | head -50

Repository: opensearch-project/sql

Length of output: 5426

🏁 Script executed:

# Verify the exact scenario: when ALL fields are selected, does _highlight get added to expandedFields? # Trace through: AllFields case adds all non-metadata fields # _highlight is in currentFields (added by CalciteLogicalIndexScan when highlight config is set) # _highlight is NOT in METADATAFIELD_TYPE_MAP, so !isMetadataField returns true # So _highlight WOULD be added to expandedFields via the AllFields expansion # Then the highlight logic adds it AGAIN at line 417 # Confirm _highlight is added to schema when highlight is active rg "buildSchema|HIGHLIGHT_FIELD" opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java -B 3 -A 3

Repository: opensearch-project/sql

Length of output: 1710

🏁 Script executed:

# Check if there are integration tests that cover SELECT * with highlight fd "HighlightFunctionIT.java" --type f -exec cat {} \; | head -150

Repository: opensearch-project/sql

Length of output: 253

🏁 Script executed:

# Read the HighlightFunctionIT.java test file cat integ-test/src/test/java/org/opensearch/sql/sql/HighlightFunctionIT.java | head -200

Repository: opensearch-project/sql

Length of output: 7756

🏁 Script executed:

# Let me verify the exact duplicate scenario one more time by checking both code paths: # Path 1: AllFields expansion (lines 479-482) # Path 2: Explicit highlight logic (lines 414-417) # Confirm that expandProjectFields is called BEFORE the highlight logic cd core/src/main/java/org/opensearch/sql/calcite && sed -n '393,425p' CalciteRelNodeVisitor.java | cat -n

Repository: opensearch-project/sql

Length of output: 1559

🏁 Script executed:

# Now trace through a hypothetical SELECT * query with highlight: # 1. Project node has AllFields as projectList # 2. visitProject is called # 3. isSingleAllFieldsProject returns true # 4. handleAllFieldsProject is called (not visitProject logic at 411-420) # Check handleAllFieldsProject cd core/src/main/java/org/opensearch/sql/calcite && sed -n '424,445p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 951

🏁 Script executed:

# Check what happens in the non-AllFields-only case # If someone does: SELECT name, * with highlight # Then isSingleAllFieldsProject returns FALSE # Then expandProjectFields is called # Then highlight logic runs # Verify: when is expandProjectFields called? cd core/src/main/java/org/opensearch/sql/calcite && sed -n '395,404p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 397

🏁 Script executed:

# Final verification: construct the exact scenario that would trigger the bug # SELECT name, * with highlight enabled # Trace: # 1. Project has [Field("name"), AllFields()] # 2. isSingleAllFieldsProject returns FALSE (size=2, not 1) # 3. expandProjectFields processes both: # - Field "name" → added to expandedFields # - AllFields → expands to all currentFields (which includes _highlight) # 4. Then highlight logic adds _highlight AGAIN # Check if there are tests covering SELECT <field>, * scenario rg "SELECT.*,.*\*" integ-test/ core/src/test -n --type java | head -20

Repository: opensearch-project/sql

Length of output: 3138

Guard against duplicate _highlight projection when using SELECT with mixed explicit fields and wildcards.

When highlight is enabled and a user queries with mixed explicit fields and AllFields (e.g., SELECT name, *), the code can add _highlight twice to the projection:

expandProjectFields includes _highlight via AllFields expansion (since _highlight is not in METADATAFIELD_TYPE_MAP)

The highlight preservation logic at lines 414-417 adds it again unconditionally

Calcite may rename duplicate columns (e.g., _highlight0, _highlight1), which breaks downstream extraction that uses exact name matching via tuple.get(HIGHLIGHT_FIELD). The suggested guard checking for existing RexInputRef matches prevents this duplication.

🔧 Suggested guard

if (CalcitePlanContext.getHighlightConfig() != null) { int hlIndex = currentFields.indexOf(HIGHLIGHT_FIELD); - if (hlIndex >= 0) { - expandedFields.add(context.relBuilder.field(hlIndex)); + if (hlIndex >= 0) { + boolean alreadyProjected = + expandedFields.stream() + .anyMatch( + rex -> + (rex instanceof RexInputRef ref && ref.getIndex() == hlIndex) + || (rex instanceof RexCall call + && call.getKind() == SqlKind.AS + && call.getOperands().get(0) instanceof RexInputRef ref + && ref.getIndex() == hlIndex)); + if (!alreadyProjected) { + expandedFields.add(context.relBuilder.field(hlIndex)); + } } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// When highlight is active, include _highlight in the projection so it survives

// through the Calcite pipeline. This matches DSL behavior where _source filtering

// does not affect highlights.

if (CalcitePlanContext.getHighlightConfig() != null) {

int hlIndex = currentFields.indexOf(HIGHLIGHT_FIELD);

if (hlIndex >= 0) {

expandedFields.add(context.relBuilder.field(hlIndex));

}

// When highlight is active, include _highlight in the projection so it survives

// through the Calcite pipeline. This matches DSL behavior where _source filtering

// does not affect highlights.

if (CalcitePlanContext.getHighlightConfig() != null) {

int hlIndex = currentFields.indexOf(HIGHLIGHT_FIELD);

if (hlIndex >= 0) {

boolean alreadyProjected =

expandedFields.stream()

.anyMatch(

rex ->

(rex instanceof RexInputRef ref && ref.getIndex() == hlIndex)

|| (rex instanceof RexCall call

&& call.getKind() == SqlKind.AS

&& call.getOperands().get(0) instanceof RexInputRef ref

&& ref.getIndex() == hlIndex));

if (!alreadyProjected) {

expandedFields.add(context.relBuilder.field(hlIndex));

}

}

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` around lines 411 - 418, When highlight is enabled (CalcitePlanContext.getHighlightConfig() != null) the highlight preservation block in CalciteRelNodeVisitor can add HIGHLIGHT_FIELD twice into expandedFields (duplicate of expandProjectFields' AllFields expansion); modify the logic around currentFields/index hlIndex and expandedFields to first scan expandedFields for an existing RexInputRef that references the same input index (or any existing projection that already corresponds to HIGHLIGHT_FIELD) and only add context.relBuilder.field(hlIndex) if no matching RexInputRef exists, so the projection is not duplicated and downstream tuple.get(HIGHLIGHT_FIELD) continues to work.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

docs/user/ppl/interfaces/endpoint.md (2)
107-116: Clarify dual text and keyword highlighting behavior.

Line 108 shows both "firstname": ["<em>Holmes</em>"] and "firstname.keyword": ["<em>Holmes</em>"] in the highlights. While line 76 mentions that wildcard matching includes .keyword subfields, users may find it surprising to see both the analyzed text field and the keyword subfield highlighted for the same match.

Consider adding a brief note in the example explanation (after line 116) to clarify why both appear and when users might want to filter this (e.g., by specifying explicit fields instead of "*").
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/user/ppl/interfaces/endpoint.md` around lines 107 - 116, Add a brief
explanatory note after the example that clarifies why both analyzed and keyword
subfield highlights appear: explain that using a wildcard ("*") matches both the
analyzed field (e.g., "firstname") and its unmapped keyword subfield (e.g.,
"firstname.keyword"), causing duplicate highlight entries for the same token,
and advise that users can avoid this by specifying explicit fields to highlight
(instead of "*") or by excluding ".keyword" fields when they only want
analyzed-text highlights; reference the "highlights" example showing "firstname"
and "firstname.keyword" and mention the wildcard behavior.
136-138: Consider expanding abbreviated example output.

Using ... for brevity in Example 2 is acceptable, but it may be clearer to either:

Show the minimal complete schema (just the fields that matter for this example), or

Add a note indicating what's omitted (e.g., // schema and datarows same as Example 1).

This helps readers understand the complete structure without needing to reference Example 1.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/user/ppl/interfaces/endpoint.md` around lines 136 - 138, Example 2 in
endpoint.md uses "schema" and "datarows" with "..." which hides structure;
update the Example 2 JSON so readers can understand it by either (a) replacing
the "..." under "schema" and "datarows" with a minimal complete schema and one
or two representative datarows (include the actual field names used in Example
1), or (b) add an explicit comment/note beneath the JSON such as "// schema and
datarows same as Example 1" that points to the source of the omitted content;
locate the block containing "schema" and "datarows" in Example 2 and apply one
of these two changes to make the example self-contained.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/user/ppl/interfaces/endpoint.md`:
- Around line 102-106: Several response examples in the docs are inconsistent
about including a "status" field (the snippet with "datarows" and the very first
POST example lack it while other examples include "status": 200); update all
example response JSON blocks to include a "status": 200 field so every example
(including the initial POST example and the block containing "datarows" and the
examples around lines referenced in the review) follows the same format,
ensuring the inserted key is placed at the same level as other top-level
response fields for uniformity.
- Around line 97-106: Example 1's query ("search source=accounts \"Holmes\"") is
missing an explicit fields clause, leaving the returned columns ambiguous;
update that example to include a fields clause listing the schema fields
(firstname, lastname, address) so the expected output is deterministic and
matches the earlier POST example pattern.
- Line 163: Update the sentence to explicitly state that highlighting is
supported only for single-source queries and that its behavior with joins,
subqueries, or multi-source queries (e.g., using the operator "| join") is
unsupported and untested: clarify that the system does not guarantee an error or
consistent output in those cases and that highlighting may be silently omitted
or produce unexpected results, so users should avoid combining highlighting with
joins, subqueries, or multi-source queries.

---

Nitpick comments:
In `@docs/user/ppl/interfaces/endpoint.md`:
- Around line 107-116: Add a brief explanatory note after the example that
clarifies why both analyzed and keyword subfield highlights appear: explain that
using a wildcard ("*") matches both the analyzed field (e.g., "firstname") and
its unmapped keyword subfield (e.g., "firstname.keyword"), causing duplicate
highlight entries for the same token, and advise that users can avoid this by
specifying explicit fields to highlight (instead of "*") or by excluding
".keyword" fields when they only want analyzed-text highlights; reference the
"highlights" example showing "firstname" and "firstname.keyword" and mention the
wildcard behavior.
- Around line 136-138: Example 2 in endpoint.md uses "schema" and "datarows"
with "..." which hides structure; update the Example 2 JSON so readers can
understand it by either (a) replacing the "..." under "schema" and "datarows"
with a minimal complete schema and one or two representative datarows (include
the actual field names used in Example 1), or (b) add an explicit comment/note
beneath the JSON such as "// schema and datarows same as Example 1" that points
to the source of the omitted content; locate the block containing "schema" and
"datarows" in Example 2 and apply one of these two changes to make the example
self-contained.

coderabbitai · 2026-02-20T11:06:25Z

docs/user/ppl/interfaces/endpoint.md

+  "schema": [
+    { "name": "firstname", "type": "string" },
+    { "name": "lastname", "type": "string" },
+    { "name": "address", "type": "string" }
+  ],
+  "datarows": [
+    ["Holmes", "Morgan", "123 Main St"],
+    ["Jane", "Holmes", "456 Oak Ave"],
+    ["John", "Smith", "880 Holmes Lane"]
+  ],


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n docs/user/ppl/interfaces/endpoint.md | head -150

Repository: opensearch-project/sql

Length of output: 5348

Add explicit field selection to make the query output consistent.

The query search source=accounts "Holmes" omits a fields clause but the schema shows only three fields (firstname, lastname, address). This creates ambiguity about what fields an unrestricted query returns. Add the fields clause:

- "query": "search source=accounts \"Holmes\"", + "query": "search source=accounts \"Holmes\" | fields firstname, lastname, address",

This aligns Example 1 with the explicit field selection pattern shown in the earlier POST example at line 16 and makes the expected output complete and unambiguous.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/user/ppl/interfaces/endpoint.md` around lines 97 - 106, Example 1's query ("search source=accounts \"Holmes\"") is missing an explicit fields clause, leaving the returned columns ambiguous; update that example to include a fields clause listing the schema fields (firstname, lastname, address) so the expected output is deterministic and matches the earlier POST example pattern.

coderabbitai · 2026-02-20T11:06:25Z

docs/user/ppl/interfaces/endpoint.md

+  "datarows": [
+    ["Holmes", "Morgan", "123 Main St"],
+    ["Jane", "Holmes", "456 Oak Ave"],
+    ["John", "Smith", "880 Holmes Lane"]
+  ],


⚠️ Potential issue | 🟡 Minor

Response format inconsistency: status field.

Example 1 (line 114) includes "status": 200 in the response, but the initial POST example (lines 51-53) and Example 2 (line 146) also include "status": 200, while the very first example (lines 21-53) does not show a status field at all.

For consistency and to avoid user confusion, ensure all example responses follow the same format. Either include status in all examples or document when it appears.

As per coding guidelines, documentation should follow existing patterns and structure consistently.

Also applies to: 139-143

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/user/ppl/interfaces/endpoint.md` around lines 102 - 106, Several response examples in the docs are inconsistent about including a "status" field (the snippet with "datarows" and the very first POST example lack it while other examples include "status": 200); update all example response JSON blocks to include a "status": 200 field so every example (including the initial POST example and the block containing "datarows" and the examples around lines referenced in the review) follows the same format, ensuring the inserted key is placed at the same level as other top-level response fields for uniformity.

coderabbitai · 2026-02-20T11:06:25Z

docs/user/ppl/interfaces/endpoint.md

+- Highlighting is supported only in the Calcite engine.
+- The backend forwards the highlight config as-is to OpenSearch. The same highlighting behavior and limitations as [OpenSearch's highlighting API](https://opensearch.org/docs/latest/search-plugins/searching-data/highlight/) apply.
+- Piped commands (`where`, `sort`, `head`, `dedup`) narrow or reorder the result set but do not affect which terms are highlighted.
+- Highlighting works with **single-source queries only**, consistent with DSL where highlighting is inherently single-index per request. Behavior with joins (`| join`), subqueries, and multi-source queries is not validated.


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "*.java" -o -name "*.kt" | head -30

Repository: opensearch-project/sql

Length of output: 2727

🏁 Script executed:

# Search for test files related to highlighting fd -e java -path "*/test/*" "highlight" 2>/dev/null | head -20

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Look for highlighting implementation and how it handles joins/multi-source rg -l "highlight" --type=java -i | head -20

Repository: opensearch-project/sql

Length of output: 1766

🏁 Script executed:

# Search for join-related tests fd -e java -path "*/test/*" "join" 2>/dev/null | head -10

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

cat ./sql/src/test/java/org/opensearch/sql/sql/antlr/HighlightTest.java

Repository: opensearch-project/sql

Length of output: 1162

🏁 Script executed:

# Search for highlighting implementation in PPL/SQL parsers rg -C5 "highlight" --type=java ./ppl ./sql | head -100

Repository: opensearch-project/sql

Length of output: 10904

🏁 Script executed:

# Search for join-related highlighting or multi-source validation rg -C3 "highlight" --type=java | rg -C3 "join|multi.*source|subquery" | head -50

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Look at the PPL parser/domain to see highlight handling cat ./ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java 2>/dev/null | head -200

Repository: opensearch-project/sql

Length of output: 3849

🏁 Script executed:

# Search for join implementation in SQL/PPL execution rg -C5 "join" --type=java -i | grep -C3 "highlight" | head -50

Repository: opensearch-project/sql

Length of output: 638

🏁 Script executed:

# Look for any validation or error handling when processing highlights rg -C3 "highlight" --type=java | grep -C2 "validate|error|fail|check" | head -50

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Search for how multi-source or subquery scenarios handle highlighting fd -e java | xargs rg "getHighlight|highlight" | head -30

Repository: opensearch-project/sql

Length of output: 4339

🏁 Script executed:

# Check if there's any join-related test with highlight find . -name "*JoinTest.java" -o -name "*MultiSourceTest.java" | head -10

Repository: opensearch-project/sql

Length of output: 140

🏁 Script executed:

# Examine the join test file to see if highlight is tested with joins cat ./ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.java | head -200

Repository: opensearch-project/sql

Length of output: 9103

🏁 Script executed:

# Look for the highlight integration test to see what it covers cat ./integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLHighlightIT.java | head -300

Repository: opensearch-project/sql

Length of output: 6838

🏁 Script executed:

# Search for any restrictions or validation related to highlight and joins rg "highlight" --type=java -C3 | grep -C3 "restrict|join|single" | head -40

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Check the documentation file mentioned in the review cat ./docs/user/ppl/interfaces/endpoint.md | sed -n '155,175p'

Repository: opensearch-project/sql

Length of output: 1699

🏁 Script executed:

# Search for any validation that rejects or restricts multi-source highlights rg "highlight" --type=java -C5 | grep -C3 "SemanticCheck|validate|exception|error|throw" | head -40

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Check if there's any multi-source query validation for highlight rg "multi.*source|subquery" --type=java -C3 | grep -C2 "highlight" | head -30

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Check if there's validation in query execution or planner that prevents multi-source with highlight rg "pushDownHighlight|highlight" --type=java -C5 | grep -C3 "source|join|table" | head -50

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Look for any semantic checks or validation rules for highlight fd -e java -path "*/semantic*" -o -path "*/validator*" | head -10

Repository: opensearch-project/sql

Length of output: 234

🏁 Script executed:

# Search for where highlight is actually processed/validated rg "class.*Highlight" --type=java | head -20

Repository: opensearch-project/sql

Length of output: 1525

🏁 Script executed:

# Examine HighlightAnalyzer to see if there's validation for multi-source scenarios cat ./core/src/main/java/org/opensearch/sql/analysis/HighlightAnalyzer.java

Repository: opensearch-project/sql

Length of output: 1586

🏁 Script executed:

# Check the test that throws SemanticCheckException for highlight cat ./opensearch/src/test/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilderTest.java | grep -C15 "SemanticCheckException.*highlight"

Repository: opensearch-project/sql

Length of output: 48

🏁 Script executed:

# Look at SQL highlight integration tests to see what's tested cat ./integ-test/src/test/java/org/opensearch/sql/sql/HighlightFunctionIT.java

Repository: opensearch-project/sql

Length of output: 7756

🏁 Script executed:

# Get more context on the SemanticCheckException test for highlight cat ./opensearch/src/test/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilderTest.java | grep -B20 -A5 "SemanticCheckException" | grep -B20 -A5 "highlight"

Repository: opensearch-project/sql

Length of output: 2170

🏁 Script executed:

# Search for any validation that explicitly prevents multi-source highlight rg "pushDownHighlight" --type=java -B5 -A10 | head -80

Repository: opensearch-project/sql

Length of output: 9815

🏁 Script executed:

# Search for how logical plans are executed when joins are involved with highlight rg "LogicalHighlight|LogicalJoin" --type=java -C3 | head -60

Repository: opensearch-project/sql

Length of output: 7701

🏁 Script executed:

# Check if there's any rule or optimizer that handles highlight with joins fd -e java -path "*/optimizer/rule*" | xargs rg "highlight" 2>/dev/null | head -20

Repository: opensearch-project/sql

Length of output: 2768

Clarify what happens when highlighting is used with joins or multi-source queries.

The phrase "Behavior with joins (| join), subqueries, and multi-source queries is not validated" is ambiguous. Based on the codebase, highlighting with multi-source queries is genuinely untested. Clarify whether:

The system rejects these queries with an error,

Highlighting is silently omitted from results, or

Behavior is undefined and may produce unexpected results.

This helps users avoid confusion when combining highlight with joins or subqueries.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/user/ppl/interfaces/endpoint.md` at line 163, Update the sentence to explicitly state that highlighting is supported only for single-source queries and that its behavior with joins, subqueries, or multi-source queries (e.g., using the operator "| join") is unsupported and untested: clarify that the system does not guarantee an error or consistent output in those cases and that highlighting may be silently omitted or produce unexpected results, so users should avoid combining highlighting with joins, subqueries, or multi-source queries.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

docs/user/ppl/interfaces/endpoint.md

dai-chen

Although we only expose highlight via the PPL API today, internally can we make it generic:

Make the highlight part of PPL query as early as possible behind the API, as discussed inhttps://github.com//issues/5156#issuecomment-3924993361 (highlight function/command or search command parameter?)
If not possible, can we accept arbitrary DSL and merge them with the SearchSourceBuilder already constructed in the index scan operator? That way future extensions don't each need their own end-to-end pipeline.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-02-26T02:21:56Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 798396c.

Path	Line	Severity	Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java	128	medium	Unchecked type casting of user-provided highlight config without validation. Malformed JSON could cause ClassCastException or allow injection of unexpected data types into OpenSearch query builder. While using OpenSearch's type-safe API reduces injection risk, lack of schema validation on user input (fields, pre_tags, post_tags, fragment_size) could allow malformed requests to reach the database layer.
ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java	120	medium	User-provided JSON highlight object is extracted and passed through the system without validation or sanitization. The raw user input flows from HTTP request → Plan → ThreadLocal → OpenSearch request builder. While the feature appears intentional, the complete lack of input validation (structure, types, bounds) creates potential for unexpected behavior or exploitation of downstream OpenSearch highlighter vulnerabilities.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 2 | Low: 0

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-02-26T02:22:49Z

PR Reviewer Guide 🔍

(Review updated until commit `b290e06`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Thread-local context propagation infrastructure for extra search source Relevant files: core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java core/src/main/java/org/opensearch/sql/executor/execution/AbstractPlan.java core/src/main/java/org/opensearch/sql/executor/execution/ExplainPlan.java core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java core/src/test/java/org/opensearch/sql/executor/execution/ExplainPlanTest.java core/src/test/java/org/opensearch/sql/executor/execution/QueryPlanTest.java ppl/src/main/java/org/opensearch/sql/ppl/PPLService.java ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java ppl/src/test/java/org/opensearch/sql/ppl/domain/PPLQueryRequestTest.java Sub-PR theme: Calcite plan integration for highlight column flow Relevant files: opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java opensearch/src/test/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScanExtraSearchSourceTest.java core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java Sub-PR theme: Response formatting and highlight data extraction Relevant files: core/src/main/java/org/opensearch/sql/expression/HighlightExpression.java opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java opensearch/src/main/java/org/opensearch/sql/opensearch/response/OpenSearchResponse.java protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java protocol/src/main/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatter.java protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java protocol/src/test/java/org/opensearch/sql/protocol/response/QueryResultTest.java protocol/src/test/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatterTest.java protocol/src/test/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatterTest.java
⚡ Recommended focus areas for review String-based highlight detection The method `hasHighlightInExtraSearchSource()` uses a simple string contains check for "highlight" which could produce false positives if the JSON contains the word "highlight" in a comment, field value, or other context. Consider parsing the JSON or using a more robust detection method. public static boolean hasHighlightInExtraSearchSource() { String extra = extraSearchSource.get(); return extra != null && extra.contains("\"highlight\""); } Exception handling The catch block logs a warning but continues execution silently when extra search source JSON parsing fails. Consider whether this should fail more explicitly or provide better feedback to the caller about why highlighting was skipped. } catch (Exception e) { LOG.warn("Failed to parse extra search source JSON, skipping: {}", e.getMessage()); } Schema modification timing The `_highlight` column is added to the schema during scan construction based on ThreadLocal state. If the ThreadLocal is not properly set before scan construction or is cleared prematurely, the schema will be inconsistent with the actual data flow. Verify the lifecycle management is robust across all execution paths. * Build the initial schema for the scan, conditionally adding a hidden {@code _highlight} column * when highlight configuration is present. This allows highlight data to flow through all Calcite * operators (filter, sort, dedup) as a regular column, avoiding positional misalignment issues. / private static RelDataType buildInitialSchema(RelOptCluster cluster, RelOptTable table) { RelDataType base = table.getRowType(); if (!CalcitePlanContext.hasHighlightInExtraSearchSource()) { return base; } RelDataTypeFactory.Builder builder = cluster.getTypeFactory().builder(); builder.addAll(base.getFieldList()); builder.add(HIGHLIGHT_FIELD, SqlTypeName.ANY); return builder.build(); } Type safety concern* The code casts `Object` to `ExprValue` without validation. If the ResultSet contains unexpected data types for the `_highlight` column, this could cause ClassCastException. Consider adding type checking before the cast. field -> { // _highlight is carried as an opaque ExprValue through the Calcite row pipeline. // It uses SqlTypeName.ANY so Calcite passes it through without conversion. if (HIGHLIGHT_FIELD.equals(field)) { Map<String, ExprValue> tuple = ExprValueUtils.getTupleValue(current); Unchecked cast warning Line 126 contains an unnecessary explicit cast `(Map<String, Object>) hlMap` since `hlMap` is already declared as `Map<String, Object>`. This cast is redundant and could be removed. })

github-actions · 2026-02-26T02:23:24Z

PR Code Suggestions ✨

Latest suggestions up to b290e06

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Close parser resource properly The parser resource is not closed after use, which can lead to resource leaks. Wrap the parser creation in a try-with-resources block to ensure proper cleanup even when exceptions occur during parsing. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [143-152] -try { - XContentParser parser = - XContentType.JSON.xContent().createParser(X_CONTENT_REGISTRY, IGNORE_DEPRECATIONS, json); +try (XContentParser parser = + XContentType.JSON.xContent().createParser(X_CONTENT_REGISTRY, IGNORE_DEPRECATIONS, json)) { SearchSourceBuilder extra = SearchSourceBuilder.fromXContent(parser); SearchSourceBuilder target = requestBuilder.getSourceBuilder(); if (extra.highlighter() != null) { target.highlighter(extra.highlighter()); } } catch (Exception e) { LOG.warn("Failed to parse extra search source JSON, skipping: {}", e.getMessage()); } Suggestion importance[1-10]: 8 __ Why: The `XContentParser` is a closeable resource that should be properly closed to prevent resource leaks. Using try-with-resources ensures cleanup even when exceptions occur, which is a critical best practice for resource management.	Medium
General	Parse JSON instead of substring check Using string contains check for JSON field detection is fragile and can produce false positives if the substring appears in field values or comments. Parse the JSON properly to verify the presence of the highlight field. core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java [69-72] public static boolean hasHighlightInExtraSearchSource() { String extra = extraSearchSource.get(); - return extra != null && extra.contains("\"highlight\""); + if (extra == null) { + return false; + } + try { + JSONObject json = new JSONObject(extra); + return json.has("highlight"); + } catch (Exception e) { + return false; + } } Suggestion importance[1-10]: 7 __ Why: The substring check `extra.contains("\"highlight\"")` is fragile and could produce false positives if the string appears in field values. However, the current implementation is simple and performant for a quick check. Proper JSON parsing would be more robust but adds overhead.	Medium
General	Use consistent ResultSet access method The code retrieves `resultSet.getObject(i)` for highlight and then `resultSet.getObject(columnName)` for regular columns. This inconsistency could cause issues if column indices and names don't align. Use consistent access method throughout the loop. opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java [272-283] for (int i = 1; i <= columnCount; i++) { String columnName = metaData.getColumnName(i); + Object value = resultSet.getObject(i); // _highlight flows through the Calcite pipeline as a hidden column (SqlTypeName.ANY). // Extract it as an opaque ExprValue and embed it in the row tuple directly. if (HighlightExpression.HIGHLIGHT_FIELD.equals(columnName)) { - Object value = resultSet.getObject(i); if (value instanceof ExprValue hl && !hl.isMissing()) { row.put(HighlightExpression.HIGHLIGHT_FIELD, hl); } continue; } - Object value = resultSet.getObject(columnName); Suggestion importance[1-10]: 6 __ Why: The code uses `resultSet.getObject(i)` for `_highlight` but `resultSet.getObject(columnName)` for other columns. While both approaches work, using consistent access (by index) throughout the loop is cleaner and avoids potential confusion. The impact is moderate as the current code functions correctly.	Low

Previous suggestions

Suggestions up to commit 5bda888

Category	Suggestion	Impact
Possible issue	Close XContentParser to prevent resource leak The XContentParser should be explicitly closed after use to prevent resource leaks. Wrap the parser creation and usage in a try-with-resources block to ensure proper cleanup even when exceptions occur during parsing. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [137-153] protected static void applyExtraSearchSource(OpenSearchRequestBuilder requestBuilder) { String json = CalcitePlanContext.getExtraSearchSource(); if (json == null) { return; } - try { - XContentParser parser = - XContentType.JSON.xContent().createParser(X_CONTENT_REGISTRY, IGNORE_DEPRECATIONS, json); + try (XContentParser parser = + XContentType.JSON.xContent().createParser(X_CONTENT_REGISTRY, IGNORE_DEPRECATIONS, json)) { SearchSourceBuilder extra = SearchSourceBuilder.fromXContent(parser); SearchSourceBuilder target = requestBuilder.getSourceBuilder(); if (extra.highlighter() != null) { target.highlighter(extra.highlighter()); } } catch (Exception e) { LOG.warn("Failed to parse extra search source JSON, skipping: {}", e.getMessage()); } } Suggestion importance[1-10]: 8 __ Why: `XContentParser` implements `Closeable` and should be closed after use to prevent resource leaks. Using try-with-resources ensures proper cleanup even when exceptions occur.	Medium
General	Use robust JSON parsing for field detection Using string contains for JSON field detection is fragile and can produce false positives if "highlight" appears in field values or comments. Consider parsing the JSON or using a more robust check to verify the presence of the highlight configuration object. core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java [69-72] public static boolean hasHighlightInExtraSearchSource() { String extra = extraSearchSource.get(); - return extra != null && extra.contains("\"highlight\""); + if (extra == null) { + return false; + } + try { + JSONObject json = new JSONObject(extra); + return json.has("highlight"); + } catch (Exception e) { + return false; + } } Suggestion importance[1-10]: 6 __ Why: Using `contains("\"highlight\"")` for JSON field detection is fragile and can produce false positives. However, the current implementation is simple and works for the intended use case. Parsing JSON adds overhead but is more robust.	Low
General	Avoid capturing schema in lambda closure The lambda captures `newSchema` which may lead to unintended retention of the schema object. Consider extracting the filtered field names into a local list before creating the lambda to avoid potential memory retention issues. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [299-305] +List<String> filteredFields = newSchema.getFieldNames().stream() + .filter(name -> !HIGHLIGHT_FIELD.equals(name)) + .collect(Collectors.toList()); action = (OSRequestBuilderAction) requestBuilder -> - requestBuilder.pushDownProjectStream( - newSchema.getFieldNames().stream() - .filter(name -> !HIGHLIGHT_FIELD.equals(name))); + requestBuilder.pushDownProjectStream(filteredFields.stream()); Suggestion importance[1-10]: 3 __ Why: While extracting filtered fields before the lambda could reduce memory retention, the current implementation is straightforward and the schema object is likely short-lived in this context. The improvement is marginal.	Low

Suggestions up to commit 798396c

Category	Suggestion	Impact
Possible issue	Prevent NPE on tuple access Verify that `ExprValueUtils.getTupleValue(current)` does not return null before calling `get()` on it. If `current` is not a tuple or is null, this could throw a `NullPointerException`. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java [94-106] return fields.stream() .map( field -> { - // _highlight is carried as an opaque ExprValue through the Calcite row pipeline. - // It uses SqlTypeName.ANY so Calcite passes it through without conversion. if (HIGHLIGHT_FIELD.equals(field)) { Map<String, ExprValue> tuple = ExprValueUtils.getTupleValue(current); + if (tuple == null) { + return null; + } ExprValue hl = tuple.get(HIGHLIGHT_FIELD); return (hl != null && !hl.isMissing()) ? (Object) hl : null; } return resolveForCalcite(current, field); }) .toArray(); Suggestion importance[1-10]: 6 __ Why: The suggestion adds a null check for `tuple` before calling `get()`. This is a valid defensive improvement since `ExprValueUtils.getTupleValue()` could theoretically return null if `current` is not a tuple. The check prevents a potential `NullPointerException` and improves robustness, though the likelihood of this occurring in normal operation is low.	Low
Possible issue	Add null-safety for highlight extraction Add null-safety checks before calling `hl.tupleValue()` and `entry.getValue().collectionValue()`. If `hl.tupleValue()` returns null or if `entry.getValue()` is null, the code will throw a `NullPointerException`. protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [108-128] public List<Map<String, Object>> highlights() { return exprValues.stream() .map(ExprValueUtils::getTupleValue) .map( tuple -> { ExprValue hl = tuple.get(HIGHLIGHT_FIELD); if (hl == null \|\| hl.isMissing() \|\| hl.isNull()) { return null; } + Map<String, ExprValue> tupleValue = hl.tupleValue(); + if (tupleValue == null) { + return null; + } Map<String, Object> hlMap = new LinkedHashMap<>(); - for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) { - hlMap.put( - entry.getKey(), - entry.getValue().collectionValue().stream() - .map(ExprValue::stringValue) - .collect(Collectors.toList())); + for (Map.Entry<String, ExprValue> entry : tupleValue.entrySet()) { + ExprValue value = entry.getValue(); + if (value != null && !value.isMissing()) { + hlMap.put( + entry.getKey(), + value.collectionValue().stream() + .map(ExprValue::stringValue) + .collect(Collectors.toList())); + } } return (Map<String, Object>) hlMap; }) .collect(Collectors.toList()); } Suggestion importance[1-10]: 6 __ Why: The suggestion adds null checks for `tupleValue` and `entry.getValue()` before accessing their methods. These are reasonable defensive checks that prevent potential `NullPointerException` if the highlight data structure is malformed. The checks for `!value.isMissing()` before processing collection values improve robustness, though the impact is moderate since highlight data is typically well-formed.	Low
General	Add null-safety for highlight fields Add null-safety checks before casting and iterating over the `fields` map. If `fieldsObj` is null or the cast fails, the method should handle it gracefully to prevent `NullPointerException` or `ClassCastException` at runtime. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [128-159] @SuppressWarnings("unchecked") protected static void applyHighlightConfig(OpenSearchRequestBuilder requestBuilder) { Map<String, Object> config = CalcitePlanContext.getHighlightConfig(); if (config == null) { return; } HighlightBuilder highlightBuilder = new HighlightBuilder(); Object fieldsObj = config.get("fields"); if (fieldsObj instanceof Map) { Map<String, Object> fields = (Map<String, Object>) fieldsObj; - for (String fieldName : fields.keySet()) { - highlightBuilder.field(new HighlightBuilder.Field(fieldName)); + if (fields != null) { + for (String fieldName : fields.keySet()) { + if (fieldName != null) { + highlightBuilder.field(new HighlightBuilder.Field(fieldName)); + } + } } } ... } Suggestion importance[1-10]: 5 __ Why: The suggestion adds defensive null checks for `fields` map and `fieldName`. While the `fields != null` check is redundant (already verified by `instanceof Map`), checking `fieldName != null` before creating `HighlightBuilder.Field` is a reasonable defensive practice. However, the impact is moderate since the config comes from a controlled source (PPLService) and malformed configs are unlikely in practice.	Low

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-02-26T04:32:02Z

Persistent review updated to latest commit 5bda888

RyanL1997 · 2026-02-26T07:37:19Z

Although we only expose highlight via the PPL API today, internally can we make it generic:
Make the highlight part of PPL query as early as possible behind the API, as discussed inhttps://github.com/#5156 (comment) (highlight function/command or search command parameter?)
If not possible, can we accept arbitrary DSL and merge them with the SearchSourceBuilder already constructed in the index scan operator? That way future extensions don't each need their own end-to-end pipeline.

Hi @dai-chen. Thanks for the suggestion. After checking, I went with option 2 — accept arbitrary DSL and merge it with the SearchSourceBuilder at the index scan level.

Why not option 1 (make highlight part of PPL syntax)?

The main issue is the response format. The V2 highlight() function returns highlights inline as columns in datarows:

  {
    "schema": [
      { "name": "Tags", "type": "text" },
      { "name": "highlight('Tags')", "type": "nested" }
    ],
    "datarows": [
      ["yeast home-brew", ["<em>yeast</em> home-brew"]]
    ]
  }

OSD Explore already knows how to consume highlights from DSL, where they come back as a separate metadata array parallel to the hits. Our current approach matches that shape:

  {
    "schema": [{ "name": "Tags", "type": "text" }],
    "datarows": [["yeast home-brew"]],
    "highlights": [{ "Tags": ["<em>yeast</em> home-brew"] }]
  }

If we went with the highlight() function, OSD would need to handle a fundamentally different response shape — highlights mixed into datarows as columns instead of a separate highlights array it already consumes today. The request-body approach keeps the schema/datarows untouched and gives the caller (OSD, API users, CLI) full control over highlight config without changing the PPL query text.

Current design:

The plumbing is now generic. Instead of a highlight-specific pipeline (Map<String, Object> → manual HighlightBuilder construction), we:

PPLQueryRequest.getExtraSearchSource() wraps caller-provided fields into a search-source-compatible JSON string ({"highlight": {...}})
This String flows through AbstractPlan → ThreadLocal → index scan as an opaque blob
At the index scan level, applyExtraSearchSource() parses it via SearchSourceBuilder.fromXContent() and selectively merges recognized clauses (currently: if(extra.highlighter() != null) target.highlighter(extra.highlighter()))

To add a future extension (e.g. suggest, rescore, post_filter), you would:

Add the new key to the wrapper in getExtraSearchSource()
Add one merge line in applyExtraSearchSource()

No new end-to-end plumbing needed. So please take a look again thanks!

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-02-26T07:45:46Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit b290e06.

Path	Line	Severity	Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java	137	medium	User-controlled JSON from request body is parsed using SearchSourceBuilder.fromXContent() without schema validation. While currently only highlighter is extracted, the full SearchSourceBuilder is parsed which could process additional OpenSearch DSL clauses if code is extended. Recommend explicit schema validation before parsing.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java	142	low	Highlight configuration from user input is applied directly to OpenSearch requests without validation. Depending on OpenSearch script security settings, custom highlighters could potentially execute scripts. Recommend validating highlight configuration structure and disabling script-based highlighters if not required.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-02-26T07:46:53Z

Persistent review updated to latest commit b290e06

RyanL1997 added enhancement New feature or request PPL Piped processing language feature labels Feb 12, 2026

RyanL1997 force-pushed the syntax-highlighting branch from 333a24b to a4d156e Compare February 18, 2026 23:04

RyanL1997 marked this pull request as ready for review February 19, 2026 00:10

RyanL1997 changed the title ~~[WIP][Feature] Calcite PPL search result highlighting~~ eiifcbncngeijlvvcecdekikvhrgrfneuccvrhjrbkju[Feature] Calcite PPL search result highlighting Feb 19, 2026

RyanL1997 changed the title ~~eiifcbncngeijlvvcecdekikvhrgrfneuccvrhjrbkju[Feature] Calcite PPL search result highlighting~~ [Feature] Calcite PPL search result highlighting Feb 19, 2026

coderabbitai bot reviewed Feb 20, 2026

View reviewed changes

RyanL1997 added 11 commits February 20, 2026 11:45

Calcite PPL search result highlighting

ac13a8a

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

[review-ready] highligting with ppl api pass-through design

3b59ea1

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

Add tests

5c1557b

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

style cleanup

40cee36

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

add another test case

fcf44a9

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix tests

71faa1f

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

peng - _highlight hiden column

1b7d1a4

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

peng - add doc

0283f28

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

peng - hiden col solution 2

97a8418

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

update doc

cc6101a

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

update docs

583b870

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

RyanL1997 force-pushed the syntax-highlighting branch from 0b4e450 to 583b870 Compare February 20, 2026 19:46

RyanL1997 added 4 commits February 23, 2026 11:53

enhance highlight IT

b95f333

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix tests

ddd292b

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

coderabbit - enhancement of tests

12672d6

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix doc tests

063f7d0

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

dai-chen reviewed Feb 25, 2026

View reviewed changes

docs/user/ppl/interfaces/endpoint.md Outdated Show resolved Hide resolved

dai-chen reviewed Feb 25, 2026

View reviewed changes

peng - fix tests with more assertions

798396c

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

Generalize highlight plumbing to generic DSL merge mechanism

5bda888

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

chen - fix doc

b290e06

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

Conversation

RyanL1997 commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description.

Related Issues

Check List

Uh oh!

coderabbitai bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dai-chen left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 26, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit b290e06)

Uh oh!

github-actions bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Feb 26, 2026

Uh oh!

RyanL1997 commented Feb 26, 2026

Why not option 1 (make highlight part of PPL syntax)?

Current design:

Uh oh!

github-actions bot commented Feb 26, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Feb 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RyanL1997 commented Feb 12, 2026 •

edited

Loading

coderabbitai bot commented Feb 12, 2026 •

edited

Loading

dai-chen left a comment •

edited

Loading

github-actions bot commented Feb 26, 2026 •

edited

Loading

(Review updated until commit `b290e06`)

github-actions bot commented Feb 26, 2026 •

edited

Loading