WIP: Initial exploration for migration from minio to seaweedfs#50104
raulcd wants to merge 1 commit into
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also:
Pull request overview
This WIP PR starts an initial migration of Arrow’s CI/test infrastructure from MinIO to SeaweedFS for S3-compatible filesystem testing, primarily by swapping installation steps and adjusting the Python S3 test server fixture.
Changes:
- Add `ci/scripts/install_seaweedfs.sh` and switch multiple CI jobs/docker images to install SeaweedFS (pinned to 4.31) instead of MinIO.
- Update Python S3 test fixture to start `weed server` and adjust a few S3 policy constants/comments to be less MinIO-specific.
- Update C++ CI gating logic for S3 tests and related workflows (though C++ S3 tests still appear MinIO-backed today).
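As a rough illustration of the fixture change, the following is a minimal sketch of how a test helper might launch SeaweedFS with its S3 gateway enabled. It is not the code from this PR: `build_weed_command` and `start_weed_server` are hypothetical names, and the `-dir`, `-s3`, and `-s3.port` flags are taken from the SeaweedFS command-line documentation as an assumption about what the fixture needs.

```python
import shutil
import subprocess


def build_weed_command(data_dir, s3_port, weed_binary="weed"):
    """Build the argument list for a single-process SeaweedFS with S3 enabled.

    Hypothetical helper: the flag set is an assumption, not the PR's fixture.
    """
    return [
        weed_binary,
        "server",
        f"-dir={data_dir}",      # where SeaweedFS stores its data
        "-s3",                   # enable the S3-compatible gateway
        f"-s3.port={s3_port}",   # port the S3 gateway listens on
    ]


def start_weed_server(data_dir, s3_port):
    """Start SeaweedFS, or return None when `weed` is not on PATH."""
    if shutil.which("weed") is None:
        return None
    return subprocess.Popen(
        build_weed_command(data_dir, s3_port),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

A real fixture would likely also need to pick free ports for the other SeaweedFS components started by `weed server` and wait until the S3 endpoint accepts connections before yielding; both are left out of this sketch.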
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/pyarrow/tests/util.py | Adds TODO for SeaweedFS migration in limited-user S3 helper (still MinIO/mc-based). |
| python/pyarrow/tests/test_fs.py | Renames MinIO-specific policy constant to a generic name. |
| python/pyarrow/tests/test_dataset.py | Renames policy constant and updates a comment to reference SeaweedFS. |
| python/pyarrow/tests/conftest.py | Switches the Python S3 test server fixture from MinIO to SeaweedFS (weed). |
| docs/source/developers/continuous_integration/docker.rst | Updates CI docker docs to mention SeaweedFS installer script. |
| dev/tasks/python-wheels/github.osx.yml | Switches macOS wheel CI setup from MinIO install to SeaweedFS install. |
| ci/scripts/r_install_system_dependencies.sh | Switches R CI dependency install from MinIO to SeaweedFS. |
| ci/scripts/install_seaweedfs.sh | New installer script for SeaweedFS binaries across platforms/architectures. |
| ci/scripts/cpp_test.sh | Changes S3 test exclusion gating from minio to weed. |
| ci/docker/ubuntu-24.04-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/ubuntu-24.04-cpp-minimal.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/ubuntu-22.04-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/ubuntu-22.04-cpp-minimal.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/linux-r.dockerfile | Copies SeaweedFS installer instead of MinIO installer into the R image build context. |
| ci/docker/linux-apt-r.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/fedora-42-r-clang.dockerfile | Copies SeaweedFS installer instead of MinIO installer into the R image build context. |
| ci/docker/fedora-42-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/debian-experimental-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/debian-13-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/conda-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/alpine-linux-3.22-r.dockerfile | Copies SeaweedFS installer instead of MinIO installer into the R image build context. |
| ci/docker/alpine-linux-3.22-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| .github/workflows/r.yml | Updates workflow paths and install step to use SeaweedFS installer. |
| .github/workflows/r_extra.yml | Updates workflow paths to use SeaweedFS installer. |
| .github/workflows/python.yml | Switches macOS Python CI setup from MinIO install to SeaweedFS install. |
| .github/workflows/cpp.yml | Switches workflow path filters and macOS install step to SeaweedFS; Windows MinGW job still downloads MinIO with a TODO. |
| .github/workflows/cpp_extra.yml | Updates workflow path filters to use SeaweedFS installer. |
Comments suppressed due to low confidence (1)
python/pyarrow/tests/util.py:380
- With the S3 test server switching away from MinIO, this helper can now try to run `mc admin ...` against a non-MinIO backend (SeaweedFS) when `minio`/`mc` happen to be installed locally, leading to hard test failures instead of a clear skip. It would be safer to explicitly detect that the spawned server process is MinIO before attempting MinIO admin configuration, and otherwise skip with an explicit reason until SeaweedFS policy/user setup is implemented.
# TODO: Migrate this to match seaweedfs not minio.
def _configure_s3_limited_user(s3_server, policy, username, password):
"""
Attempts to use the mc command to configure the minio server
with a special user limited:limited123 which does not have
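The detection suggested in the suppressed comment could look roughly like the sketch below. It assumes the helper can get at the argument list of the spawned server process (for example `Popen.args`); `is_minio_process` and `limited_user_skip_reason` are hypothetical names, not functions in `python/pyarrow/tests/util.py`.

```python
import os


def is_minio_process(process_args):
    """Return True when the spawned server's executable is named `minio`.

    `process_args` is assumed to be the argv list used to start the server.
    """
    if not process_args:
        return False
    executable = os.path.basename(str(process_args[0]))
    name, _ext = os.path.splitext(executable)  # tolerate `minio.exe`
    return name.lower() == "minio"


def limited_user_skip_reason(process_args):
    """Return None if `mc admin` setup may proceed, else a skip reason."""
    if is_minio_process(process_args):
        return None
    return "limited-user S3 setup is only implemented for MinIO (mc admin)"
```

A caller could pass the returned reason to `pytest.skip(...)` before invoking any `mc` command, so that a SeaweedFS-backed run reports a skip rather than a failure.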
if ! type weed >/dev/null 2>&1; then
  exclude_tests+=("arrow-s3fs-test")
fi

if [ "$ARROW_S3" == "ON" ] && [ -f "${ARROW_SOURCE_HOME}/ci/scripts/install_seaweedfs.sh" ] && [ "`which wget`" ]; then
  "${ARROW_SOURCE_HOME}/ci/scripts/install_seaweedfs.sh" 4.31 /usr/local
fi
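For readers more at home in the Python test suite, the `type weed` gate can be restated as the sketch below. It is only an illustration of the same rule (exclude `arrow-s3fs-test` when no `weed` binary is found); `s3_tests_to_exclude` is a hypothetical name and the lookup function is injectable purely to make the example testable.

```python
import shutil


def s3_tests_to_exclude(which=shutil.which):
    """Return C++ test names to exclude when `weed` is not on PATH."""
    exclude = []
    if which("weed") is None:
        # No SeaweedFS binary available, so the S3 filesystem test cannot run.
        exclude.append("arrow-s3fs-test")
    return exclude
```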
TBD
This is just an initial CI check to see failures
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?