WIP: Initial exploration for migration from minio to seaweedfs #50104

Draft
raulcd wants to merge 1 commit into apache:main from raulcd:initial-exploration-seaweedfs

Member

@raulcd raulcd commented Jun 5, 2026

TBD

This is just an initial CI check to see failures

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

Copilot AI review requested due to automatic review settings June 5, 2026 10:27

github-actions Bot commented Jun 5, 2026

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

Contributor

Copilot AI left a comment

Pull request overview

This WIP PR starts an initial migration of Arrow’s CI/test infrastructure from MinIO to SeaweedFS for S3-compatible filesystem testing, primarily by swapping installation steps and adjusting the Python S3 test server fixture.

Changes:

  • Add ci/scripts/install_seaweedfs.sh and switch multiple CI jobs/docker images to install SeaweedFS (pinned to 4.31) instead of MinIO.
  • Update Python S3 test fixture to start weed server and adjust a few S3 policy constants/comments to be less MinIO-specific.
  • Update C++ CI gating logic for S3 tests and related workflows (though C++ S3 tests still appear MinIO-backed today).
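
To make the fixture change above concrete, here is a minimal sketch of how a test fixture might assemble the command line for `weed server` with the S3 gateway enabled. The helper name `build_weed_server_command`, the data directory, and the port are hypothetical and are not taken from the PR. The `-dir`, `-s3`, and `-s3.port` flags are assumed to behave as described in the SeaweedFS documentation and should be checked against `weed server -h` for the pinned version.

```python
import shlex


def build_weed_server_command(data_dir, s3_port, executable="weed"):
    """Return an argv list for a single-process SeaweedFS server with S3 enabled.

    Hypothetical helper: the PR's actual fixture may pass different options.
    """
    return [
        executable,
        "server",
        f"-dir={data_dir}",      # assumed flag: where SeaweedFS stores its data
        "-s3",                   # assumed flag: enable the S3-compatible gateway
        f"-s3.port={s3_port}",   # assumed flag: port the S3 gateway listens on
    ]


# Build the command only; nothing is started here.
cmd = build_weed_server_command("/tmp/seaweedfs-data", 9000)
print(shlex.join(cmd))
```

A fixture could hand this list to `subprocess.Popen` and terminate the process on teardown, much as the existing MinIO fixture manages its server process.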

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| python/pyarrow/tests/util.py | Adds TODO for SeaweedFS migration in limited-user S3 helper (still MinIO/mc-based). |
| python/pyarrow/tests/test_fs.py | Renames MinIO-specific policy constant to a generic name. |
| python/pyarrow/tests/test_dataset.py | Renames policy constant and updates a comment to reference SeaweedFS. |
| python/pyarrow/tests/conftest.py | Switches the Python S3 test server fixture from MinIO to SeaweedFS (weed). |
| docs/source/developers/continuous_integration/docker.rst | Updates CI docker docs to mention SeaweedFS installer script. |
| dev/tasks/python-wheels/github.osx.yml | Switches macOS wheel CI setup from MinIO install to SeaweedFS install. |
| ci/scripts/r_install_system_dependencies.sh | Switches R CI dependency install from MinIO to SeaweedFS. |
| ci/scripts/install_seaweedfs.sh | New installer script for SeaweedFS binaries across platforms/architectures. |
| ci/scripts/cpp_test.sh | Changes S3 test exclusion gating from minio to weed. |
| ci/docker/ubuntu-24.04-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/ubuntu-24.04-cpp-minimal.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/ubuntu-22.04-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/ubuntu-22.04-cpp-minimal.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/linux-r.dockerfile | Copies SeaweedFS installer instead of MinIO installer into the R image build context. |
| ci/docker/linux-apt-r.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/fedora-42-r-clang.dockerfile | Copies SeaweedFS installer instead of MinIO installer into the R image build context. |
| ci/docker/fedora-42-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/debian-experimental-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/debian-13-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/conda-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| ci/docker/alpine-linux-3.22-r.dockerfile | Copies SeaweedFS installer instead of MinIO installer into the R image build context. |
| ci/docker/alpine-linux-3.22-cpp.dockerfile | Switches installed S3 test server dependency from MinIO to SeaweedFS. |
| .github/workflows/r.yml | Updates workflow paths and install step to use SeaweedFS installer. |
| .github/workflows/r_extra.yml | Updates workflow paths to use SeaweedFS installer. |
| .github/workflows/python.yml | Switches macOS Python CI setup from MinIO install to SeaweedFS install. |
| .github/workflows/cpp.yml | Switches workflow path filters and macOS install step to SeaweedFS; Windows MinGW job still downloads MinIO with a TODO. |
| .github/workflows/cpp_extra.yml | Updates workflow path filters to use SeaweedFS installer. |

Comments suppressed due to low confidence (1)

python/pyarrow/tests/util.py:380

  • With the S3 test server switching away from MinIO, this helper can now try to run mc admin ... against a non-MinIO backend (SeaweedFS) when minio/mc happen to be installed locally, leading to hard test failures instead of a clear skip. It would be safer to explicitly detect that the spawned server process is MinIO before attempting MinIO admin configuration, and otherwise skip with an explicit reason until SeaweedFS policy/user setup is implemented.
# TODO: Migrate this to match seaweedfs not minio.
def _configure_s3_limited_user(s3_server, policy, username, password):
    """
    Attempts to use the mc command to configure the minio server
    with a special user limited:limited123 which does not have
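
The suppressed comment above suggests checking that the spawned server really is MinIO before running `mc admin`, and skipping otherwise. A rough sketch of that check follows. The helper names `is_minio_server` and `limited_user_skip_reason` are hypothetical, and the sketch assumes the fixture exposes the server's argv (for example via `Popen.args`), which may not match the PR's actual fixture.

```python
import os
import shutil


def is_minio_server(argv):
    """Return True if the server command's executable is named `minio`."""
    if not argv:
        return False
    name = os.path.basename(str(argv[0])).lower()
    if name.endswith(".exe"):
        name = name[:-4]
    return name == "minio"


def limited_user_skip_reason(argv, which=shutil.which):
    """Return a skip reason string, or None if MinIO admin setup can proceed.

    `which` is injectable so the decision can be tested without `mc` installed.
    """
    if not is_minio_server(argv):
        return "S3 test server is not MinIO; limited-user setup is not implemented"
    if which("mc") is None:
        return "`mc` command not found"
    return None


# Example: a SeaweedFS-backed server yields a skip reason instead of a failure.
print(limited_user_skip_reason(["weed", "server", "-s3"]))
```

In a pytest helper, a non-`None` reason could be passed to `pytest.skip(reason)` so that runs against SeaweedFS report a clear skip rather than a hard failure from `mc admin`.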

Comment thread ci/scripts/cpp_test.sh
Comment on lines +53 to 55
if ! type weed >/dev/null 2>&1; then
  exclude_tests+=("arrow-s3fs-test")
fi
Comment on lines +56 to 58
if [ "$ARROW_S3" == "ON" ] && [ -f "${ARROW_SOURCE_HOME}/ci/scripts/install_seaweedfs.sh" ] && [ "`which wget`" ]; then
  "${ARROW_SOURCE_HOME}/ci/scripts/install_seaweedfs.sh" 4.31 /usr/local
fi