
Append method [in memory vs disk size] #1994

@Vasampa23

Description


Apache Iceberg version

0.8.1

Please describe the bug 🐞

While exploring the append() method, I found that the execution flow is roughly: append() -> _dataframe_to_data_files() -> bin_pack_arrow_table()

  1. In bin_pack_arrow_table, the table's size is measured with .nbytes, i.e. its in-memory size.
  2. The write.target-file-size-bytes property is therefore compared against the in-memory size rather than the on-disk size.
  3. When I loaded a 300MB Parquet file as a PyArrow table, it occupied 21637095875 bytes (~20GB) in memory. After ingestion there were 50 Parquet files in MinIO (~20GB / 512MB, roughly 50 files).

It would be better to target the on-disk (serialized, compressed) size of the files to be written rather than the in-memory size. Since Parquet files are typically much smaller than their in-memory representation, sizing bins by memory can result in many small files.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
