Spark: Add Spark 4.2 module (copy of Spark 4.1) #16751
Open
rahulsmahadev wants to merge 1 commit into
Collaborator
@rahulsmahadev, @manuzhang has been working on this in #14984. Please coordinate with that PR so we don't duplicate work.
Contributor
Author
Ah, I see. I didn't realize there was already a PR when I spoke to @szehon-ho today.
Collaborator
Your PR description references it. Please coordinate with @manuzhang to help add any missing pieces or help with reviews. Thanks.
Summary
First step toward Spark 4.2 support: this PR adds `spark/v4.2` as a mechanical, byte-identical copy of `spark/v4.1`, with zero content changes, no version bumps, and no build wiring. `spark/v4.1` is untouched.

Why a copy-only PR
This is intentionally split into two PRs so the follow-up PR containing the actual Spark-4.2-specific changes (version bumps, API fixes, build wiring) has a small, reviewable diff instead of being buried in ~150k lines of copied code.
Because the copy is byte-identical, git's copy/rename detection (`git log --follow -C`, `git blame -C -C`) links every `spark/v4.2` file back to its full `v4.1`/`v4.0`/`v3.5` history. This holds even under squash-merge, which does not preserve the `git mv` + copy-back commit pairs used previously. Verified on this branch: `git blame -C -C` on `spark/v4.2/.../SparkCatalog.java` attributes lines to the original 2020–2024 commits, not to the copy commit.

Precedent note: Spark 4.0 (#13059) and Spark 4.1 (#14155) were introduced as `Move X as Y` + `Copy back Y as X` + `Initial support` commit triplets, rebase-merged to preserve the rename pair. This PR deliberately uses a plain byte-identical copy instead, which achieves the same history preservation independent of merge strategy; the "initial support" content will come as the follow-up PR. Related: #14984 takes the established single-PR approach for 4.2.0 (RC). Happy to coordinate or defer to whichever structure maintainers prefer.

Build impact: none
The new directory is invisible to the build until explicitly registered:
- `gradle.properties` gates versions via `systemProp.knownSparkVersions=3.5,4.0,4.1` (and `defaultSparkVersions=4.1`); `4.2` is not listed.
- `settings.gradle` only includes spark subprojects inside explicit `if (sparkVersions.contains("X"))` blocks; there is no globbing of `spark/*`.
- `spark/build.gradle` only does `apply from: file("$projectDir/vX/build.gradle")` for enabled versions, so `spark/v4.2/build.gradle` is never applied.
- `spark: ['3.5', '4.0', '4.1']`.

CI and releases are therefore unaffected until the follow-up PR wires the module up. The RAT license check passes since every file is an identical copy of an already-licensed file (`dev/.rat-excludes` is glob-based, not path-specific).

Verification
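
The checks reported in this section could be reproduced locally with something like the sketch below. It is not the script used for this PR: the function names (`trees_identical`, `count_files`, `count_symlinks`, `count_tracked`, `blame_with_copies`) are hypothetical helpers introduced here for illustration, and it assumes a POSIX shell with `diff`, `find`, and `git` available, run from the repository root.

```shell
# Sketch only: helper names are illustrative, not part of the repository.

# Succeeds only if the two directory trees have the same files with the same bytes.
trees_identical() {
  diff -r "$1" "$2" > /dev/null
}

# Prints the number of regular files under a directory.
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Prints the number of symlinks under a directory (0 expected for a plain copy).
count_symlinks() {
  find "$1" -type l | wc -l | tr -d ' '
}

# Prints the number of files under a directory that git tracks.
# If this equals count_files for the same directory, every file is tracked.
count_tracked() {
  git ls-files -- "$1" | wc -l | tr -d ' '
}

# Shows blame for one file with copy detection enabled, so lines can be
# attributed to commits that predate the copy.
blame_with_copies() {
  git blame -C -C -- "$1"
}
```

For example, `trees_identical spark/v4.1 spark/v4.2 && echo identical` covers the first check, and comparing `count_files spark/v4.2` with `count_tracked spark/v4.2` covers the "all tracked" claim.
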
- `diff -r spark/v4.1 spark/v4.2` → empty (exit 0)
- 627 files in `spark/v4.1`, 627 in `spark/v4.2`, no symlinks, all tracked
- `spark/v4.2/**` (627 files, +149,822 lines)

Follow-up PR
Spark-4.2-specific changes come next, mirroring the v4.1 "initial support" commit: add `4.2` to `knownSparkVersions`/`defaultSparkVersions`, `settings.gradle` + `spark/build.gradle` + `jmh.gradle` wiring, `gradle/libs.versions.toml` entries, `.github/workflows/spark-ci.yml` matrix, `.gitignore` benchmark paths, `dev/stage-binaries.sh`, version-string bumps inside `spark/v4.2`, and any API fixes Spark 4.2 requires.

This pull request and its description were written by Isaac.
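
Until the follow-up wiring lands, one way to confirm that a version is still unregistered is to look it up in the `systemProp.knownSparkVersions` list quoted earlier. The sketch below is a rough illustration under that assumption; `is_known_spark_version` is a hypothetical helper, not something in the repository, and it only inspects `gradle.properties`, not `settings.gradle` or the CI matrix.

```shell
# Sketch only: is_known_spark_version is an illustrative helper.
# Usage: is_known_spark_version VERSION [PROPERTIES_FILE]
# Succeeds if VERSION is an entry of the comma-separated
# systemProp.knownSparkVersions list in the properties file.
is_known_spark_version() {
  version="$1"
  props="${2:-gradle.properties}"
  # Take the value after '=' on the first matching property line.
  list=$(grep -E '^systemProp\.knownSparkVersions=' "$props" | head -n 1 | cut -d= -f2)
  # Wrap both sides in commas so that only whole entries match.
  case ",$list," in
    *",$version,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With the property value quoted in this description, `is_known_spark_version 4.2` would fail and `is_known_spark_version 4.1` would succeed; after the follow-up PR adds `4.2`, the first call would succeed as well.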