
@paultiq (Contributor) commented Dec 17, 2025

Fixes #5759

This PR maps string_view types to string and binary_view types to binary, so that to_substrait no longer raises ArrowNotImplementedError when constructing the Substrait expression.

Vortex supports expressions over view types and Arrow compute doesn't, but to_substrait raises ArrowNotImplementedError based on which Arrow compute kernels exist, regardless of the backend that will actually evaluate the expression.

I, Paul Timmins <paul@iqmo.com>, hereby add my Signed-off-by to this commit: 8ae0c04

Signed-off-by: Paul Timmins <paul@iqmo.com>
@connortsui20 connortsui20 requested a review from danking December 17, 2025 23:26
codecov bot commented Dec 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.95%. Comparing base (41bb2bf) to head (efdf7fe).
⚠️ Report is 1 commits behind head on develop.

@danking (Contributor) left a comment

This is great and we should merge! Can you follow up with a test that writes a Vortex file containing strings and reads it back through DuckDB? I think paultiq included one at some point in one of his GitHub issues.

@paultiq (Contributor, Author) commented Dec 18, 2025

@danking Do you want that test case in this PR or in a separate one? Happy to do either.

@danking (Contributor) commented Dec 18, 2025

Separate is fine! Sorry I referred to you in the third person! The GitHub mobile UI had me turned around, haha.

  I, Paul Timmins <paul@iqmo.com>, hereby add my Signed-off-by to this commit: de97c5b

  Signed-off-by: Paul Timmins <paul@iqmo.com>
I, Paul Timmins <paul@iqmo.com>, hereby add my Signed-off-by to this commit: de97c5b
I, Paul Timmins <paul@iqmo.com>, hereby add my Signed-off-by to this commit: 5214824

Signed-off-by: Paul Timmins <paul@iqmo.com>
@joseph-isaacs (Contributor)

@paultiq think we can get this over the line

@paultiq (Contributor, Author) commented Jan 5, 2026

> @paultiq think we can get this over the line

AFAIK, this is ready to merge. Was there something else needed for this PR?

@joseph-isaacs (Contributor)

The CI / Python lint check is failing; we need to fix that.

auto-merge was automatically disabled January 7, 2026 14:23

Head branch was pushed to by a user without write access

@joseph-isaacs (Contributor)

Why did you ignore all the warnings?

@paultiq (Contributor, Author) commented Jan 7, 2026

> Why did you ignore all the warnings?

The root cause is the lack of type stubs for pyarrow, which is being addressed in apache/arrow#32609.

The ignores are consistent with the existing code:

substrait_object = ExtendedExpression() # pyright: ignore[reportUnknownVariableType]
substrait_object.ParseFromString(arrow_expression.to_substrait(schema)) # pyright: ignore[reportUnknownMemberType]
expressions = extended_expression(substrait_object) # pyright: ignore[reportUnknownArgumentType]

And with existing test cases:

for field in vxf.dtype.to_arrow_schema() # pyright: ignore[reportUnknownVariableType]
if _has_mean(field.type) # pyright: ignore[reportUnknownMemberType, reportUnknownArgumentType]

(I'll fix that DCO in a sec...)

@paultiq (Contributor, Author) commented Jan 7, 2026

The one open question is whether directly testing _schema_for_substrait is OK (hence the reportPrivateUsage ignore).
The reportPrivateUsage ignore is used in a few places in the project, though not in any test cases, so that's a judgement call.

test_expression.py tests both paths: the "private" _schema_for_substrait, as well as the "public" arrow_to_vortex path.

So the choice is either to allow the reportPrivateUsage ignore or to drop the direct test of _schema_for_substrait.

I, Paul Timmins <paul@iqmo.com>, hereby add my Signed-off-by to this commit: d80d54d

Signed-off-by: Paul Timmins <paul@iqmo.com>
I, Paul Timmins <paul@iqmo.com>, hereby add my Signed-off-by to this commit: d53be9c

Signed-off-by: Paul Timmins <paul@iqmo.com>
@paultiq (Contributor, Author) commented Jan 7, 2026

(sorry for the multiple DCO commits; signing off is just not one of my normal habits)

The remaining DCO warning is for 3107e20, @danking's commit.

@danking danking merged commit 8ffca2a into vortex-data:develop Jan 7, 2026
46 of 47 checks passed
@danking (Contributor) commented Jan 7, 2026

Thank you @paultiq! Sorry for the friction in our process currently. I'll keep an eye on the pyarrow types issue, thanks for flagging.


Development: successfully merging this pull request may close these issues.

Arrow Expressions on Vortex Datasets raise ArrowNotImplementedError on string_views

3 participants