dataset write api #1025
Why is Dataset designed to write a single Parquet file? It throws an error if I write to a non-empty directory in S3, but tables usually consist of many files. Am I doing something wrong? I do not see an option to write multiple Parquet files into a directory using Dataset.
Replies: 1 comment 7 replies
Hi @dbelozerovx1, arrow-java's Dataset implementation supports writing a dataset out to multiple files, as you expect. See https://github.com/apache/arrow-java/blob/main/dataset/src/test/java/org/apache/arrow/dataset/file/TestDatasetFileWriter.java#L66 for an example. If you want to share the code that isn't working, I can take a look.
PR review capacity is pretty limited at the moment, but PRs are always welcome. Supporting all options at once would probably be a better approach than picking and choosing.