
fix: support CHAR/VARCHAR types in SQL DDL write path #456

Open
LuciferYang wants to merge 5 commits into lance-format:main from LuciferYang:fix/chartype-varchar-support


LuciferYang commented Apr 18, 2026

What

CREATE TABLE ... (v CHAR(n)) or (v VARCHAR(n)) through a V2 catalog fails on the write path. There are two separate pattern-match misses, each with a different blast radius.

Why

Two Scala pattern styles with different semantics:

case _: StringType   →  isInstanceOf (type test)
case StringType      →  equals()     (value comparison)
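The difference between the two pattern styles can be reproduced without Spark. This is a minimal sketch built on a hypothetical mock (PatternStyles, typeTest, and valueTest are illustrative names). The mock mirrors the Spark 4.0 layout: a StringType class, a same-named companion object, and CharType as a subclass. These are not Spark's real definitions.

```scala
object PatternStyles {
  // Hypothetical mock: a class and a companion object that share the name
  // StringType, with CharType as a subclass. Not Spark's real classes.
  class StringType
  case object StringType extends StringType
  final class CharType(val length: Int) extends StringType

  // Typed pattern: compiled to an isInstanceOf check against the class.
  def typeTest(t: Any): Boolean = t match {
    case _: StringType => true
    case _             => false
  }

  // Stable-identifier pattern: compiled to StringType == t, i.e. equals().
  def valueTest(t: Any): Boolean = t match {
    case StringType => true
    case _          => false
  }

  def main(args: Array[String]): Unit = {
    val c = new CharType(10)
    println(typeTest(c))  // true: CharType is a subclass of the StringType class
    println(valueTest(c)) // false: c is not equal to the StringType object
  }
}
```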

toArrowType uses case _: StringType (type test)

On Spark 4.0+, CharType/VarcharType extend StringType, so they match fine.
On 3.4/3.5 they extend AtomicType instead, so the type test does not match and toArrowType throws unsupportedDataTypeError.
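The 3.4/3.5 failure can be sketched with a hypothetical mock of that hierarchy, in which CharType is a sibling of StringType under AtomicType rather than a subtype. Spark35Layout and toArrowTypeName are illustrative names. The function returns an Arrow type name instead of a real ArrowType and does not reproduce LanceArrowUtils.

```scala
object Spark35Layout {
  // Hypothetical mock of the Spark 3.4/3.5 hierarchy; names mirror Spark's,
  // but these are not Spark's classes.
  abstract class DataType
  abstract class AtomicType extends DataType
  class StringType extends AtomicType
  object StringType extends StringType
  final case class CharType(length: Int) extends AtomicType // sibling, not a subtype

  // Hypothetical stand-in for toArrowType's type-test dispatch.
  def toArrowTypeName(dt: DataType): String = dt match {
    case _: StringType => "Utf8"
    case other =>
      throw new UnsupportedOperationException(s"Unsupported data type: $other")
  }

  def main(args: Array[String]): Unit = {
    println(toArrowTypeName(StringType)) // Utf8
    val msg =
      try toArrowTypeName(CharType(10))
      catch { case e: UnsupportedOperationException => e.getMessage }
    println(msg) // Unsupported data type: CharType(10)
  }
}
```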

createFieldWriter uses case (StringType, ...) (value comparison)

CharType(10).equals(StringType) is false on every Spark version:

  • 3.4/3.5: completely different type hierarchy
  • 4.0+: same hierarchy, but CharType carries a FixedLength constraint while the StringType singleton has NoConstraint

So even when toArrowType succeeds, the writer still fails.
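The 4.0+ writer failure can be sketched with a hypothetical, simplified mock. In this mock, StringType's equals() compares the string constraint. This reflects the behavior described above; the real Spark class also carries a collation id. Spark40Layout and writerFor are illustrative names, and writerFor stands in for the value-comparison dispatch in createFieldWriter.

```scala
object Spark40Layout {
  // Hypothetical, simplified mock of Spark 4.0+ string types; not Spark's code.
  sealed trait StringConstraint
  case object NoConstraint extends StringConstraint
  final case class FixedLength(length: Int) extends StringConstraint

  class StringType(val constraint: StringConstraint) {
    // equals() compares constraints, so a FixedLength subtype is not equal
    // to the NoConstraint singleton.
    override def equals(obj: Any): Boolean = obj match {
      case s: StringType => s.constraint == constraint
      case _             => false
    }
    override def hashCode: Int = constraint.hashCode
  }
  object StringType extends StringType(NoConstraint)
  final class CharType(length: Int) extends StringType(FixedLength(length))

  // Hypothetical stand-in for createFieldWriter's value-comparison branch.
  def writerFor(dt: Any): String = dt match {
    case StringType => "StringWriter" // evaluates StringType == dt
    case other =>
      throw new UnsupportedOperationException(s"Unsupported data type: $other")
  }

  def main(args: Array[String]): Unit = {
    val c = new CharType(10)
    println(c.isInstanceOf[StringType]) // true: the type test still matches
    println(StringType == c)            // false: constraints differ
  }
}
```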

Summary:

Location                         3.4/3.5   4.0+
toArrowType (_: StringType)      broken    fine
createFieldWriter (StringType)   broken    broken

How

  • LanceArrowUtils.toArrowType: add case _: CharType | _: VarcharType branches mapping to Utf8 / LargeUtf8 (respecting largeVarTypes). On 4.0+ these branches are never reached, because CharType/VarcharType already match the existing case _: StringType.
  • LanceArrowWriter.createFieldWriter: add case (_: CharType | _: VarcharType, vector) branches dispatching to StringWriter / LargeStringWriter.
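The two added branches can be sketched against a mock of the 3.4/3.5 hierarchy. This is a minimal sketch under stated assumptions. CharVarcharFix, toArrowTypeName, and writerName are hypothetical names. The largeVector flag is a hypothetical stand-in for the vector argument that the real writer dispatches on. The sketch illustrates only the alternative-pattern branches; it is not the code in LanceArrowUtils or LanceArrowWriter.

```scala
object CharVarcharFix {
  // Hypothetical mock of the Spark 3.4/3.5 hierarchy (not Spark's classes),
  // where CharType/VarcharType do not extend StringType.
  abstract class DataType
  abstract class AtomicType extends DataType
  class StringType extends AtomicType
  object StringType extends StringType
  final case class CharType(length: Int) extends AtomicType
  final case class VarcharType(length: Int) extends AtomicType

  private def utf8(large: Boolean): String = if (large) "LargeUtf8" else "Utf8"

  // Hypothetical stand-in for toArrowType, returning an Arrow type name.
  def toArrowTypeName(dt: DataType, largeVarTypes: Boolean): String = dt match {
    case _: StringType                => utf8(largeVarTypes)
    // Added branch: the declared length is dropped, so Arrow strings are unbounded.
    case _: CharType | _: VarcharType => utf8(largeVarTypes)
    case other =>
      throw new UnsupportedOperationException(s"Unsupported data type: $other")
  }

  // Hypothetical stand-in for createFieldWriter; largeVector models whether the
  // target is a large (64-bit offset) string vector.
  def writerName(dt: DataType, largeVector: Boolean): String = {
    val writer = if (largeVector) "LargeStringWriter" else "StringWriter"
    dt match {
      case StringType                   => writer // existing value comparison
      case _: CharType | _: VarcharType => writer // added type-test branch
      case other =>
        throw new UnsupportedOperationException(s"Unsupported data type: $other")
    }
  }

  def main(args: Array[String]): Unit = {
    println(toArrowTypeName(CharType(10), largeVarTypes = false))   // Utf8
    println(toArrowTypeName(VarcharType(20), largeVarTypes = true)) // LargeUtf8
    println(writerName(VarcharType(20), largeVector = false))       // StringWriter
  }
}
```

The new types get their own branch, so the existing StringType arms stay untouched.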

Length constraints are discarded, since Arrow has no varchar concept: CHAR(n) and VARCHAR(n) columns map to plain Utf8 / LargeUtf8.

Tests

  • LanceArrowUtilsSuite: 4 unit tests covering CharType/VarcharType → Arrow type mapping with largeVarTypes on/off
  • BaseCharVarcharRoundtripTest: E2E via V2 catalog — DDL create + insert + read back, covering CHAR, VARCHAR, mixed columns, and NULLs
  • CharVarcharRoundtripTest subclass in Spark 3.4 and 3.5 modules; Spark 4.0 and 4.1 inherit it automatically via shared test-source configuration

- Add BaseCharVarcharRoundtripTest with SQL DDL E2E tests through V2 catalog
  (CHAR, VARCHAR, and mixed columns with NULL handling)
- Add CharVarcharRoundtripTest subclasses for Spark 3.4 and 3.5 modules
- Add unit tests for CharType/VarcharType → Arrow Utf8/LargeUtf8 mapping
  in LanceArrowUtilsSuite
- Remove old char/varchar tests from BaseSparkDataTypeRoundtripTest that
  used the DataFrame API path (which normalizes types to StringType)
- Fix inaccurate comment about V2 write path behavior

The LanceArrowWriter.createFieldWriter bug (value-equality pattern
`case (StringType, ...)` does not match CharType/VarcharType) affects
all Spark versions, not just 3.4/3.5. Add E2E test subclasses for
the 4.0 and 4.1 modules to match.
github-actions added the bug (Something isn't working) label Apr 18, 2026
…ules

These modules already compile 3.5's test sources via build-helper-maven-plugin
(add-java-test-source), so adding a second copy causes "already defined" errors.