Iceberg-rust 0.3.0
The main objective of 0.3.0 is to have a working read path (non-exhaustive list :)
- field_summary: Skipping data on the highest level by pruning away manifests:
- ManifestEvaluator, used to filter manifests in table scans #322
- TableScan in flight by @sdd in Implement manifest filtering in TableScan #323
- 102: partition struct
- ExpressionEvaluator #358
- partition-spec schema to the 102: partition struct and evaluates it.
- TableScan
- InclusiveMetricsEvaluator #347
- ManifestEvaluator, used to filter manifests in table scans #322
- partition_filters from ManifestEvaluator #360
- fn plan_files() #362
- TableScan
Blocking issues:
- org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.5.0 #338
- field-id's missing in generated Avro files #353
- Null instead of -1 #352
Nice to have (related to the query plan optimizations above):
- DELETE manifests that contain unrelated delete files.
- (Tracking issues of aligning storage support with iceberg-java #408)
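To make the manifest pruning item above more concrete, here is a rough sketch of how a field summary could be used to skip manifests. It is not the iceberg-rust API: `FieldSummary`, `Predicate`, `manifest_might_match` and `prune_manifests` are hypothetical names, and bounds are plain `i64` values instead of the serialized bytes that real summaries carry. The only rule it relies on is that a manifest may be skipped when the predicate cannot fall inside the field's lower and upper bound; whenever bounds are missing it conservatively keeps the manifest.

```rust
/// Simplified, hypothetical stand-in for the summary of one partition field
/// in a manifest. Bounds are i64 here purely for brevity.
#[derive(Debug, Clone)]
struct FieldSummary {
    contains_null: bool,
    lower_bound: Option<i64>,
    upper_bound: Option<i64>,
}

/// Hypothetical predicate on a single partition field.
#[derive(Debug, Clone, Copy)]
enum Predicate {
    Eq(i64),
    LessThan(i64),
    GreaterThan(i64),
    IsNull,
}

/// Returns true if the manifest might contain matching rows and must be read.
/// Returning false means the manifest can be skipped.
fn manifest_might_match(summary: &FieldSummary, predicate: Predicate) -> bool {
    match predicate {
        Predicate::IsNull => summary.contains_null,
        Predicate::Eq(v) => match (summary.lower_bound, summary.upper_bound) {
            (Some(lo), Some(hi)) => lo <= v && v <= hi,
            // Bounds unknown: be conservative and keep the manifest.
            _ => true,
        },
        Predicate::LessThan(v) => summary.lower_bound.map_or(true, |lo| lo < v),
        Predicate::GreaterThan(v) => summary.upper_bound.map_or(true, |hi| hi > v),
    }
}

/// Keeps only the manifest paths whose summary might match the predicate.
fn prune_manifests<'a>(
    manifests: &'a [(String, FieldSummary)],
    predicate: Predicate,
) -> Vec<&'a str> {
    manifests
        .iter()
        .filter(|(_, summary)| manifest_might_match(summary, predicate))
        .map(|(path, _)| path.as_str())
        .collect()
}
```

The same "might match" shape applies one level down when skipping individual data files, only with per-file partition values or column metrics instead of per-manifest summaries.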
State of catalog integration:
For the release after that, I think the commit path is going to be important.
Iceberg-rust 0.4.0 and beyond
Nice to have for the 0.3.0 release, but not required. Of course, open for debate.
Commit path
The commit path entails writing a new metadata JSON.
Metadata tables
Metadata tables are used to inspect the table. Having these tables also makes the maintenance procedures straightforward to implement, since you can list all the snapshots and expire the ones that are older than a certain threshold.
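As an illustration of why a snapshots metadata table helps with maintenance, here is a minimal sketch of selecting snapshots to expire. `SnapshotRow` and `snapshots_to_expire` are hypothetical names, not part of iceberg-rust, and the rule is deliberately simplified: a real expire procedure would also have to respect branches, tags and retention settings.

```rust
/// Hypothetical row of a snapshots metadata table.
#[derive(Debug, Clone, PartialEq)]
struct SnapshotRow {
    snapshot_id: i64,
    committed_at_ms: i64,
}

/// Returns the ids of snapshots committed before `older_than_ms`.
/// The current snapshot is never selected, regardless of its age.
fn snapshots_to_expire(
    rows: &[SnapshotRow],
    current_snapshot_id: i64,
    older_than_ms: i64,
) -> Vec<i64> {
    rows.iter()
        .filter(|s| s.committed_at_ms < older_than_ms)
        .filter(|s| s.snapshot_id != current_snapshot_id)
        .map(|s| s.snapshot_id)
        .collect()
}
```

With the listing available as plain rows, the expiry decision becomes a filter over that listing; the remaining work is removing the selected snapshots from the metadata and cleaning up files that are no longer referenced.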
Write support
Most of the work in write support is in generating correct Iceberg metadata. Some scoping decisions can be made to keep the first version small, for example supporting only FastAppends and only V2 metadata at first.
It is common to have multiple snapshots in a single commit to the catalog. For example, overwriting a partition can be expressed as a delete followed by an append. This makes the implementation easier, since you can separate the problems and tackle them one by one. It also helps the roadmap, since these operations can be developed in parallel.
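A minimal sketch of the delete + append idea, under stated assumptions: `Operation`, `Snapshot`, `TableMetadata`, `stage` and `overwrite_partition` are hypothetical and heavily simplified, and snapshot ids are sequential here only to keep the example readable. The point is that two snapshots are staged on top of each other and then published together as one new metadata state.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operation {
    Append,
    Delete,
}

#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    snapshot_id: i64,
    parent_id: Option<i64>,
    operation: Operation,
}

/// Hypothetical, heavily reduced table metadata.
#[derive(Debug, Clone, Default)]
struct TableMetadata {
    snapshots: Vec<Snapshot>,
    current_snapshot_id: Option<i64>,
}

/// Adds one snapshot on top of the current one and makes it current.
fn stage(metadata: &mut TableMetadata, snapshot_id: i64, operation: Operation) {
    let parent_id = metadata.current_snapshot_id;
    metadata.snapshots.push(Snapshot { snapshot_id, parent_id, operation });
    metadata.current_snapshot_id = Some(snapshot_id);
}

/// Builds the metadata for a partition overwrite as a delete followed by an append.
/// The base metadata is left untouched; the returned value is what would be
/// written out as one new metadata JSON in a single catalog commit.
fn overwrite_partition(base: &TableMetadata, first_new_id: i64) -> TableMetadata {
    let mut updated = base.clone();
    stage(&mut updated, first_new_id, Operation::Delete);
    stage(&mut updated, first_new_id + 1, Operation::Append);
    updated
}
```

Because each step only goes through `stage`, the delete and append logic can be built and tested separately before being combined.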
Future topics
Contribute
If you want to contribute to the upcoming milestone, feel free to comment on this issue. If there is anything unclear or missing, feel free to reach out here as well 👍