
File written by polars.DataFrame.write_ipc read incorrectly #540

@ForceBru


Python code that writes the file:

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
import polars as pl

pl.DataFrame({'text': "this is some text".split()}).write_ipc("data.arrow")
```

Polars can read this file:

```pycon
>>> import polars as pl
>>> pl.read_ipc("data.arrow")
shape: (4, 1)
┌──────┐
│ text │
│ ---  │
│ str  │
╞══════╡
│ this │
│ is   │
│ some │
│ text │
└──────┘
```

Arrow.jl reads garbage:

```julia
julia> import Pkg; Pkg.status()
Status `~/tmp/Project.toml`
  [69666777] Arrow v2.8.0
  [a93c6f00] DataFrames v1.7.0

julia> using DataFrames; import Arrow

julia> DataFrame(Arrow.Table("./data.arrow"))
4×1 DataFrame
 Row │ text
     │ String?
─────┼──────────
   1 │ W1\0\0
   2 │ \xf2\xff
   3 │ \v\0\b\0
   4 │ \b\0\b\0
```

This is not at all what Polars wrote to the file.


Other data types are read correctly; only the string column is affected:

```console
> cat arrow_bug.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
from datetime import date
import polars as pl

pl.DataFrame({
    'text': "this is some text".split(),
    'date': [date(2025,1,i+1) for i in range(4)],
    'float': [float(i) for i in range(4)],
    'int': list(range(4))
}).write_ipc("dates.arrow")
> ./arrow_bug.py
> julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.3 (2025-01-21)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using DataFrames; import Arrow

julia> DataFrame(Arrow.Table("dates.arrow"))
4×4 DataFrame
 Row │ text      date        float     int
     │ String?   Date?       Float64?  Int64?
─────┼────────────────────────────────────────
   1 │ W1\0\0    2025-01-01       0.0       0
   2 │ \xf2\xff  2025-01-02       1.0       1
   3 │ \v\0\b\0  2025-01-03       2.0       2
   4 │ \b\0\b\0  2025-01-04       3.0       3
```
