PPL `transpose` lowers via `RelBuilder.unpivot()` + `pivot()`. The unpivot
synthesizes a VALUES leaf carrying axis literals (the input field names),
e.g. `VALUES('firstname'), ('age'), ('balance')`. Calcite types each
RexLiteral as `CHAR(literalLen)` and types the VALUES column as
`CHAR(maxAxisLiteralLen)` — the longest literal wins on column-level
type inference.
This bites the analytics-engine route end-to-end:
1. After unpivot the `column` axis column is `CHAR(9)` (from "firstname").
Through Calcite's TRIM (`TO_VARYING`) it becomes `VARCHAR(9)`.
2. The `_value_transpose_` value column is built from
`CAST(input_field AS VARCHAR)` — unbounded VARCHAR.
3. `MAX(_value_transpose_)` aggCall is created with declared return type
= unbounded VARCHAR (inferred from arg 0 at call-construction time).
4. The downstream non-prefix groupSet aggregate
(`group=[{1}], MAX($0)`) splits into PARTIAL/FINAL on the analytics
path. PARTIAL hoists group keys to the output prefix, so FINAL's
`argList=[0]` reads the group-key slot — `VARCHAR(9)` — instead of
the agg-state slot. Calcite's `Aggregate.<init>` then runs
`typeMatchesInferred` and rejects the plan: declared `VARCHAR` ≠
inferred `VARCHAR(9)`.
5. Even when the aggregate validation passes, the substrait/Arrow path
sees `FixedChar(maxAxisLiteralLen)` schema vs runtime arrays whose
actual values are shorter (e.g. "age" with length 3) and trips
`Row field type (FixedChar{length=3}) does not match schema field
type (FixedChar{length=9})`.
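Step 4 can be sketched with a toy model. This is not Calcite code: the class and method names below are hypothetical, types are modeled as plain strings, and the only behavior assumed is the one stated above, namely that MAX infers its return type from argument 0.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: type names are plain strings, not Calcite RelDataType objects.
final class SplitAggregateTypeCheck {

    // MAX takes its return type from its argument, so the inferred type is
    // whatever type sits in the slot that argIndex points at.
    static String inferMaxType(List<String> inputRowTypes, int argIndex) {
        return inputRowTypes.get(argIndex);
    }

    // Hypothetical stand-in for the declared-vs-inferred comparison.
    static boolean declaredMatchesInferred(
            String declared, List<String> inputRowTypes, int argIndex) {
        return declared.equals(inferMaxType(inputRowTypes, argIndex));
    }

    public static void main(String[] args) {
        // Unbounded VARCHAR, fixed when MAX(_value_transpose_) was built.
        String declared = "VARCHAR";

        // PARTIAL output: group key hoisted to slot 0, aggregate state in slot 1.
        List<String> partialBeforeFix = Arrays.asList("VARCHAR(9)", "VARCHAR");
        List<String> partialAfterFix = Arrays.asList("VARCHAR", "VARCHAR");

        // FINAL keeps argList=[0], so it reads the group-key slot.
        System.out.println(declaredMatchesInferred(declared, partialBeforeFix, 0)); // false
        System.out.println(declaredMatchesInferred(declared, partialAfterFix, 0));  // true
    }
}
```

The second call illustrates why casting the group key to unbounded VARCHAR is enough: the check passes whichever slot FINAL ends up reading.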
Two fixes, both in the lowering site:
* Build every axis literal at the same `CHAR(maxAxisLiteralLen)` type.
Calcite then space-pads the shorter literals at value-construction
time, so the runtime CHAR vector and the declared schema both have
the same fixed length. The downstream TRIM strips the padding.
* Wrap the trimmed-axis group key in an explicit
`CAST(... AS VARCHAR)` to unbounded VARCHAR. This makes the group
key type match `_value_transpose_`'s unbounded VARCHAR end-to-end,
so the aggregate's row-type check sees consistent types regardless
of which side the analytics-engine split rule places the group key
on.
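The first fix can be illustrated with a small, self-contained sketch. It is a toy model rather than the actual lowering code: plain Java strings stand in for Calcite CHAR literals, and the class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: plain strings stand in for Calcite CHAR literals.
final class AxisLiteralPadding {

    // Length of the longest axis literal, i.e. the n in CHAR(n).
    static int maxAxisLiteralLen(List<String> fieldNames) {
        int max = 0;
        for (String name : fieldNames) {
            max = Math.max(max, name.length());
        }
        return max;
    }

    // Right-pad with spaces to a fixed width, mimicking CHAR(n) values.
    static String padToChar(String literal, int width) {
        StringBuilder sb = new StringBuilder(literal);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Strip trailing spaces, standing in for the downstream TRIM.
    static String trimPadding(String padded) {
        int end = padded.length();
        while (end > 0 && padded.charAt(end - 1) == ' ') {
            end--;
        }
        return padded.substring(0, end);
    }

    public static void main(String[] args) {
        List<String> axis = Arrays.asList("firstname", "age", "balance");
        int width = maxAxisLiteralLen(axis);
        for (String name : axis) {
            String padded = padToChar(name, width);
            // Every padded literal has the same length; TRIM recovers the name.
            System.out.println("'" + padded + "' (" + padded.length() + ") -> '"
                    + trimPadding(padded) + "'");
        }
    }
}
```

With every literal padded to the same width, the runtime values and the declared fixed-length schema agree, and trimming recovers the original field names.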
These have to live in the sql plugin, not in the analytics-engine planner:
the typing decisions are made by Calcite's `RelBuilder.unpivot()`
implementation when it constructs the VALUES leaf — long before any
analytics-engine rule sees the plan. By the time the plan reaches the
analytics-engine route, the precision drift is already baked into the
RelDataType chain. Fixing it downstream would require pattern-matching
on transpose-shaped sub-trees inside the planner, which is fragile and
mis-attributes the root cause. The lowering author owns the type
contract for the operators it emits.
Adds:
- `testTransposeColumnAxisUsesUnboundedVarchar` regression assertion
pinning the output `column` field's type to unbounded VARCHAR. Catches
any future change that re-introduces axis-literal precision into the
group key.
- Updated plan-shape assertions across the existing transpose tests to
reflect the padded axis literals (`'cnt '`, `'COMM '`, etc.) and
the `CAST(TRIM(...) AS VARCHAR)` group key.
Verified end-to-end: `CalciteTransposeCommandIT` 5/5 pass with
`tests.analytics.parquet_indices=true`.
Signed-off-by: Songkan Tang <songkant@amazon.com>
Summary
PPL `transpose` lowers via `RelBuilder.unpivot()` + `pivot()`. Calcite's `unpivot` synthesizes a `LogicalValues` leaf carrying axis literals (the input field names) and types them as `CHAR(maxAxisLiteralLen)`: the longest literal wins on column-level type inference. That precision propagates through the rest of the lowering and breaks the analytics-engine route end-to-end.

This PR makes the lowering own its type contract:

* Build every axis literal at the same `CHAR(maxAxisLiteralLen)` type. Calcite then space-pads the shorter literals at value-construction time, so the runtime CHAR vector and the declared schema both have the same fixed length. The downstream TRIM strips the padding.
* Wrap the trimmed-axis group key in an explicit `CAST(... AS VARCHAR)` to unbounded VARCHAR. This makes the group key type match `_value_transpose_`'s unbounded VARCHAR end-to-end, so the aggregate's row-type check sees consistent types.

Why this has to be in sql plugin (not analytics-engine planner)
The typing decisions happen inside Calcite's `RelBuilder.unpivot()` when it constructs the VALUES leaf, long before any analytics-engine rule sees the plan. By the time the plan reaches the analytics-engine route, the precision drift is already baked into the `RelDataType` chain. Fixing it downstream would require pattern-matching on transpose-shaped sub-trees inside the planner, which is fragile and mis-attributes the root cause. The lowering author owns the type contract for the operators it emits.

What goes wrong on the analytics-engine route without this fix
Concrete case: the plan for `source=accounts | head N | fields firstname, age, balance | transpose`.

Type chain:
* `column` (group key): `CHAR(9)` from the longest literal `'firstname'`, then `VARCHAR(9)` after TRIM
* `_value_transpose_`: `CAST(... AS VARCHAR)`, unbounded
* `MAX(_value_transpose_).type`: unbounded VARCHAR (inferred from arg 0 at construction)

Two failure modes downstream:
1. `Aggregate.<init>` rejects the plan. Non-prefix `groupSet={1}` aggregates split into PARTIAL+FINAL on the analytics path. PARTIAL hoists group keys to its output prefix, so FINAL's `argList=[0]` ends up reading the group-key slot (`VARCHAR(9)`) instead of the agg-state slot. Calcite's `typeMatchesInferred` check then fires because the declared `VARCHAR` does not match the inferred `VARCHAR(9)`.
2. `FixedChar` length mismatch at execution time. Even if the aggregate validation passes, the `CHAR(maxAxisLiteralLen)` propagates through isthmus → DataFusion as a `FixedChar(maxAxisLiteralLen)` schema. Runtime arrays carry the actual values, whose lengths are shorter (e.g. `"age"` is length 3, not 9), so DataFusion rejects them: `Row field type (FixedChar{length=3}) does not match schema field type (FixedChar{length=9})`.

Both failures trace back to the same root cause, uneven typing in the unpivot/pivot lowering, and disappear once the lowering emits consistent types.
What changes
`core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`, `visitTranspose`: the two lowering changes listed in the Summary.

Tests

- `CalcitePPLTransposeTest`: added `testTransposeColumnAxisUsesUnboundedVarchar` regression assertion that pins the output `column` field's type to unbounded VARCHAR. This catches any future change that re-introduces axis-literal precision into the group key.
- Updated plan-shape assertions across the existing transpose tests to reflect:
  - padded axis literals (`'cnt '`, `'COMM '`, `'JOB '`, `'SAL '`, etc.)
  - the `CAST(TRIM(...) AS VARCHAR)` shape on the group key
  - ``GROUP BY CAST(TRIM(`column`) AS STRING)``

End-to-end verification:
`CalciteTransposeCommandIT` 5/5 pass with `-Dtests.analytics.parquet_indices=true`.

Test plan

- `./gradlew :ppl:test --tests "*CalcitePPLTransposeTest"`
- `./gradlew spotlessCheck`
- `CalciteTransposeCommandIT` 5/5 (vs 0/5 before)