Description
Hi Vortex, I am not sure this is the desired behavior. For example, if we compress a LargeBinary or LargeUtf8 Arrow array into Vortex's ConstantArray and then canonicalize it back, we get a Binary or Utf8 Arrow array instead. This is because VarBinArray::from_iter always uses the u32 offsets builder.
This can be reproduced by running the round_trip_arrow_compressed test. The test is currently ignored, but Arrow now supports comparing Structs:
https://github.com/spiraldb/vortex/blob/e75606de2624a9c5b73ee0176fb56582fad9aebe/bench-vortex/src/lib.rs#L264-L268
The taxi dataset has a field, store_and_fwd_flag, whose value is mostly N. It is reasonable for a ConstantArray on its own to use u32 offsets, but if a ChunkedArray has a Constant first chunk and a non-Constant second chunk, could the output RecordBatches end up with inconsistent Arrow schemas? (Arguably this is a consequence of Arrow lacking a logical type.)
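
To make the concern concrete, here is a minimal toy model of the chunked case, using only the standard library. The `Chunk` and `ArrowType` enums and the `canonical_type` and `schema_is_consistent` functions are hypothetical names made up for illustration; they are not Vortex or arrow-rs APIs. The sketch only assumes the behavior described above, namely that a constant chunk canonicalizes with narrow offsets regardless of the source type:

```rust
// Toy model only: hypothetical types, not the Vortex or arrow-rs API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowType {
    Utf8,      // 32-bit offsets
    LargeUtf8, // 64-bit offsets
}

enum Chunk {
    // A constant-encoded chunk; it does not remember the source offset width.
    Constant,
    // A regular variable-length chunk that does remember it.
    VarBin { large_offsets: bool },
}

// Assumed behavior from the report: constant chunks always come back
// with narrow offsets, while other chunks keep their original width.
fn canonical_type(chunk: &Chunk) -> ArrowType {
    match chunk {
        Chunk::Constant => ArrowType::Utf8,
        Chunk::VarBin { large_offsets: true } => ArrowType::LargeUtf8,
        Chunk::VarBin { large_offsets: false } => ArrowType::Utf8,
    }
}

// True when every chunk canonicalizes to the same Arrow type.
fn schema_is_consistent(chunks: &[Chunk]) -> bool {
    let mut types = chunks.iter().map(canonical_type);
    match types.next() {
        None => true,
        Some(first) => types.all(|t| t == first),
    }
}

fn main() {
    // A LargeUtf8 column whose first chunk happened to compress to Constant.
    let chunks = vec![Chunk::Constant, Chunk::VarBin { large_offsets: true }];
    for chunk in &chunks {
        println!("{:?}", canonical_type(chunk));
    }
    println!("consistent: {}", schema_is_consistent(&chunks));
}
```

In this model the two chunks of what was originally a single LargeUtf8 column come back with different types, which is the per-RecordBatch schema mismatch I am worried about.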