Fix Azure-incompatible file paths in PyArrowFile #2683
NikitaMatskevich wants to merge 1 commit into apache:main from
hey @NikitaMatskevich, maybe we should open an issue and move the discussion there :) I'm not sure I understand the underlying issue and what is not working. Could you provide some more details?
Hi @kevinjqliu, thanks for looking into it! I copy-pasted the description into issue #2698 and added a concrete example of what happens and why it is clearly a bug.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Rationale for this change
Starting from version 20, PyArrow supports Azure filesystems.
Azure table locations are typically of this format: "abfss://<bucket_name>@<account_name>.<dfs|blob>.core.windows.net//
/<file_path>". When creating a PyArrowFile, we simply take the table location and append the table-relative path to it. The resulting path still contains the "@<account_name>.<dfs|blob>.core.windows.net" part, and PyArrow cannot read or write a file at such a path. This part has to be stripped from Azure URIs. The proposed fix is mainly meant to start a conversation around the issue; I am not 100% sure how and where this should be fixed.
We know this issue does not occur with fsspec.
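The stripping step described above can be sketched in plain Python. This sketch makes several assumptions. The location is an abfs/abfss/wasb/wasbs URI whose host has the form "<container>@<account>.<dfs|blob>.core.windows.net", where the container is what the description calls <bucket_name>. It also assumes that PyArrow's Azure filesystem expects paths of the form "<container>/<path>", with the account name configured on the filesystem rather than in the path. The helpers naive_path and to_pyarrow_azure_path are hypothetical names for illustration, not PyIceberg's API. naive_path only approximates the current "location + relative path" behaviour.

```python
from urllib.parse import urlparse

# Schemes treated as Azure in this sketch (an assumption, not an exhaustive list).
AZURE_SCHEMES = ("abfs", "abfss", "wasb", "wasbs")


def naive_path(location: str) -> str:
    """Hypothetical: approximates the current behaviour, host + path kept as-is."""
    uri = urlparse(location)
    return f"{uri.netloc}{uri.path}"


def to_pyarrow_azure_path(location: str) -> str:
    """Hypothetical helper: map an Azure URI to "<container>/<path>".

    Drops the "@<account>.<dfs|blob>.core.windows.net" suffix of the host,
    assuming the account name is configured on the filesystem object instead.
    """
    uri = urlparse(location)
    if uri.scheme not in AZURE_SCHEMES:
        raise ValueError(f"not an Azure URI: {location!r}")
    # partition() yields the whole host as the container if there is no "@".
    container, _sep, _account_host = uri.netloc.partition("@")
    return f"{container}{uri.path}"


if __name__ == "__main__":
    loc = "abfss://mycontainer@myaccount.dfs.core.windows.net/warehouse/db/tbl/data/f.parquet"
    print(naive_path(loc))
    # mycontainer@myaccount.dfs.core.windows.net/warehouse/db/tbl/data/f.parquet
    print(to_pyarrow_azure_path(loc))
    # mycontainer/warehouse/db/tbl/data/f.parquet
```

With a PyArrow Azure filesystem created for "myaccount", the second string is the kind of path one would pass to a method such as open_input_file. Where PyIceberg should apply such a conversion remains the open question raised above.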
Are these changes tested?
This is hard to test: against Azurite it works fine because, unlike "real" Azure, Azurite URIs do not contain this part. Do you have an integration test in mind?