fix: support CHAR/VARCHAR types in SQL DDL write path #456
Open
LuciferYang wants to merge 5 commits into lance-format:main from
Conversation
- Add BaseCharVarcharRoundtripTest with SQL DDL E2E tests through the V2 catalog (CHAR, VARCHAR, and mixed columns with NULL handling)
- Add CharVarcharRoundtripTest subclasses for the Spark 3.4 and 3.5 modules
- Add unit tests for CharType/VarcharType → Arrow Utf8/LargeUtf8 mapping in LanceArrowUtilsSuite
- Remove old char/varchar tests from BaseSparkDataTypeRoundtripTest that used the DataFrame API path (which normalizes these types to StringType)
- Fix an inaccurate comment about V2 write path behavior
The LanceArrowWriter.createFieldWriter bug (value-equality pattern `case (StringType, ...)` does not match CharType/VarcharType) affects all Spark versions, not just 3.4/3.5. Add E2E test subclasses for the 4.0 and 4.1 modules to match.
…ules

These modules already compile 3.5's test sources via build-helper-maven-plugin (add-java-test-source), so adding a second copy causes "already defined" errors.
What
`CREATE TABLE ... (v CHAR(n))` or `VARCHAR(n)` via a V2 catalog blows up on the write path. Two separate match misses, different blast radii.

Why
Two Scala pattern styles with different semantics:

- `toArrowType` uses `case _: StringType` (a type test). On Spark 4.0+, `CharType`/`VarcharType` extend `StringType`, so they match fine. On 3.4/3.5 they extend `AtomicType` instead: no match, and the call throws `unsupportedDataTypeError`.
- `createFieldWriter` uses `case (StringType, ...)` (a value comparison). `CharType(10).equals(StringType)` is `false` on every Spark version: `CharType` carries a `FixedLength` constraint while the `StringType` singleton has `NoConstraint`. So even when `toArrowType` succeeds, the writer still fails.
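The difference between the two pattern styles can be reproduced with a minimal sketch. These are simplified stand-in classes, not Spark's real `DataType` hierarchy; `CharType` here extends `StringType` the way it does on Spark 4.0+:

```scala
// Stand-in types (NOT Spark's real classes) modeling the 4.0+ hierarchy,
// where CharType extends StringType.
sealed abstract class DataType
class StringType extends DataType
object StringType extends StringType                  // singleton used in value patterns
case class CharType(length: Int) extends StringType

object PatternDemo {
  // Type-test pattern, like toArrowType: any StringType subclass matches.
  def typeTest(dt: DataType): String = dt match {
    case _: StringType => "utf8"
    case _             => "unsupported"
  }

  // Value-equality pattern, like createFieldWriter: only the singleton
  // instance itself matches, so CharType(10) falls through.
  def valueEquality(dt: DataType): String = dt match {
    case StringType => "utf8"
    case _          => "unsupported"
  }

  def main(args: Array[String]): Unit = {
    println(typeTest(CharType(10)))      // utf8: the type test accepts the subclass
    println(valueEquality(CharType(10))) // unsupported: the equality check misses it
  }
}
```

Even with the 4.0+ hierarchy, where the type test succeeds, the equality check still fails, which is why the `createFieldWriter` bug affects every Spark version.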
Summary: `toArrowType` matches by type (`_: StringType`); `createFieldWriter` matches by value (`StringType`).

How
- `LanceArrowUtils.toArrowType`: add `case _: CharType | _: VarcharType` branches mapping to Utf8 / LargeUtf8 (respecting `largeVarTypes`). On 4.0+ these branches are dead code.
- `LanceArrowWriter.createFieldWriter`: add `case (_: CharType | _: VarcharType, vector)` branches dispatching to StringWriter / LargeStringWriter.

Length constraints are discarded: Arrow has no varchar concept.
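The shape of the fix can be sketched with stand-in types (hypothetical names, not the real Spark/Arrow classes): an alternative pattern routes CHAR and VARCHAR to the same branch as plain strings, and the declared length is simply dropped.

```scala
// Stand-in types (NOT the real Spark/Arrow classes) modeling 3.4/3.5, where
// CharType/VarcharType do not extend StringType.
sealed abstract class DataType
case object StringType extends DataType
case class CharType(length: Int) extends DataType
case class VarcharType(length: Int) extends DataType

object FixSketch {
  // Mirrors the shape of the toArrowType fix: CHAR/VARCHAR fold into the same
  // Arrow string type as StringType; the length is discarded because Arrow
  // has no varchar concept.
  def toArrowTypeName(dt: DataType, largeVarTypes: Boolean): String = dt match {
    case StringType | _: CharType | _: VarcharType =>
      if (largeVarTypes) "LargeUtf8" else "Utf8"
    case other =>
      throw new UnsupportedOperationException(s"unsupported data type: $other")
  }
}
```

With `largeVarTypes` off, `CharType(10)`, `VarcharType(20)`, and `StringType` all map to Utf8; with it on, all three map to LargeUtf8.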
Tests
- `LanceArrowUtilsSuite`: 4 unit tests covering CharType/VarcharType → Arrow type mapping with `largeVarTypes` on and off
- `BaseCharVarcharRoundtripTest`: E2E via the V2 catalog (DDL create + insert + read back), covering CHAR, VARCHAR, mixed columns, and NULLs
- `CharVarcharRoundtripTest` subclasses in the Spark 3.4 and 3.5 modules; Spark 4.0 and 4.1 inherit them automatically via shared test-source configuration