Skip to content

[VL] Fix native crash in toVeloxExpr for nested field reference into array/map#12290

Open
felipepessoto wants to merge 1 commit into
apache:mainfrom
felipepessoto:fix-fieldref-nested-array-crash
Open

[VL] Fix native crash in toVeloxExpr for nested field reference into array/map#12290
felipepessoto wants to merge 1 commit into
apache:mainfrom
felipepessoto:fix-fieldref-nested-array-crash

Conversation

@felipepessoto

@felipepessoto felipepessoto commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

What changes are proposed in this pull request?

TL;DR fix this: https://github.com/apache/gluten/actions/runs/27487184961/job/81246596702?pr=12278
image

PR/CI with test only, before the fix: #12291 / https://github.com/apache/gluten/actions/runs/27490867910/job/81255583450?pr=12291

image

SubstraitVeloxExprConverter::toVeloxExpr(const ::substrait::Expression::FieldReference&, const RowTypePtr&) resolves a nested struct_field reference by walking the chain and descending one level at a time with inputColumnType = asRowType(inputColumnType->childAt(idx)).

When the field path traverses a non-struct child — for example a field nested under an array — asRowType() returns nullptr, and the next loop iteration dereferenced that null RowType, crashing the whole (forked) JVM with a SIGSEGV in libvelox.so:

# C  [libvelox.so+0x214951c]  gluten::SubstraitVeloxExprConverter::toVeloxExpr(substrait::Expression_FieldReference const&, std::shared_ptr<facebook::velox::RowType const> const&)+0x9c

Because a SIGSEGV is not a catchable C++ exception, SubstraitToVeloxPlanValidator (which wraps expression conversion in try/catch (VeloxException)) could not catch it and fall back — the process died outright.

This PR replaces the unchecked navigation with VELOX_USER_CHECK guards that throw a VeloxUserError:

  • the field-reference index is within range (the raw RowType::childAt/nameOf use unchecked operator[]), and
  • the child being descended into is actually a struct/row.

With these checks, an unsupported nested reference makes plan validation fail cleanly and the query falls back to vanilla Spark instead of crashing the JVM.

How was this patch tested?

Added cpp/velox/tests/SubstraitVeloxExprConverterTest.cc, a unit test that builds the exact crashing FieldReference — a reference descending into an array column — and asserts the converter now throws a VeloxUserError (via VELOX_ASSERT_THROW) instead of crashing the process; it also covers an out-of-range field index. Before this change the same input aborts the JVM with the SIGSEGV shown above.

This is the same code path hit by Delta Lake's UpdateSuite test "nested data support - analysis error - updating array type" (an UPDATE whose field path descends through an array), which is exercised by the Gluten Delta Spark UT CI: before this change the forked test JVM aborts with the SIGSEGV; after it, validation falls back to vanilla Spark and the query is handled correctly.

clang-format-15 reports no changes.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: GitHub Copilot CLI (claude-opus-4.8)

…array/map

SubstraitVeloxExprConverter::toVeloxExpr(FieldReference) resolves a nested
struct_field chain by calling asRowType() on each child to descend a level.
When the reference traverses a non-struct child -- e.g. a field nested under
an array, as exercised by Delta's
"nested data support - analysis error - updating array type" UpdateSuite test
-- asRowType() returns null and the next loop iteration dereferenced that null
RowType, crashing the whole forked JVM with a SIGSEGV in libvelox.so
(gluten::SubstraitVeloxExprConverter::toVeloxExpr). A SIGSEGV is not catchable,
so plan validation could not fall back and the test process died outright.

Replace the unchecked navigation with VELOX_USER_CHECK guards (field-index
bounds + non-null struct child) that throw a VeloxUserError. The plan validator
already wraps expression conversion in try/catch(VeloxException), so the query
now falls back to vanilla execution cleanly instead of crashing.

Add SubstraitVeloxExprConverterTest with a regression repro: a field reference
descending into an array column, and an out-of-range field index, both of which
must now throw instead of crashing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 14, 2026 06:34
@github-actions github-actions Bot added the VELOX label Jun 14, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds regression coverage and hardens Substrait field-reference conversion to fail gracefully (VeloxUserError) instead of crashing (SIGSEGV) on invalid nested paths and out-of-range indices.

Changes:

  • Add bounds checking for direct struct_field indices during field-access conversion.
  • Reject nested field descent into non-row types with a user error (instead of nullptr deref).
  • Add unit tests covering both failure modes and wire them into the Velox test target.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
cpp/velox/tests/SubstraitVeloxExprConverterTest.cc New regression tests for non-struct nested field references and out-of-range field indices.
cpp/velox/tests/CMakeLists.txt Registers the new test file in the Velox test suite.
cpp/velox/substrait/SubstraitToVeloxExpr.cc Adds range checks and non-row descent checks to prevent SIGSEGV and throw clean user errors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +220 to +224
VELOX_USER_CHECK(
idx >= 0 && static_cast<uint32_t>(idx) < inputColumnType->size(),
"Field reference index {} is out of range for the {}-field row type.",
idx,
inputColumnType->size());
::substrait::Expression::FieldReference fieldReference;
fieldReference.mutable_direct_reference()->mutable_struct_field()->set_field(5);

VELOX_ASSERT_THROW(SubstraitVeloxExprConverter::toVeloxExpr(fieldReference, inputType), "out of range");
@felipepessoto felipepessoto changed the title [VL] Fix native crash in toVeloxExpr for nested field reference into array/map [WIP] [VL] Fix native crash in toVeloxExpr for nested field reference into array/map Jun 14, 2026
@felipepessoto felipepessoto changed the title [WIP] [VL] Fix native crash in toVeloxExpr for nested field reference into array/map [VL] Fix native crash in toVeloxExpr for nested field reference into array/map Jun 14, 2026
@felipepessoto

felipepessoto commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

@zhztheplayer @philo-he could you review this PR? I found the issue when running the Delta CI #12278.

I'm not familiar with this code, but with the fix the SIGSEGV are gone. I also created a PR without the fix and I could repro it: #12291 / https://github.com/apache/gluten/actions/runs/27490867910/job/81255583450?pr=12291.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants