
[AURON #2221] Remove hard-coded Iceberg scan class name detection using type-based check #2226

Open
guixiaowen wants to merge 4 commits into apache:master from guixiaowen:foreIceberg_2221

@guixiaowen guixiaowen commented May 3, 2026

…erg table types.

Which issue does this PR close?

Closes #2221

Rationale for this change

This PR removes string-based detection and replaces it with type-based checking.

The current implementation relies on hard-coded class name strings to detect Iceberg scan types:

```scala
if (!scanClassName.startsWith("org.apache.iceberg.spark.source.")) {
  return None
}

if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
  return None
}

if (className != "org.apache.iceberg.spark.source.SparkInputPartition") {
  return None
}
```

This approach introduces tight coupling to Iceberg internal class naming and has several drawbacks:

- Fragile to upstream refactoring (class/package rename)
- Lacks type safety
- Hard to maintain and extend
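The first two drawbacks can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in (`Scan`, `BatchScan`, and the pre-rename name `LegacyBatchScan` are not Iceberg or Auron types); it only shows how a stale name string keeps compiling while silently never matching, whereas a `classOf` reference is checked by the compiler.

```scala
// Hypothetical stand-ins; none of these are real Iceberg or Auron types.
object NameVsTypeCheck {
  trait Scan

  // Imagine upstream renamed this class from "LegacyBatchScan" to "BatchScan".
  class BatchScan extends Scan

  // String-based gate: still carries the pre-rename name.
  // It compiles without complaint but no longer matches anything.
  def acceptByName(scan: Scan): Boolean =
    scan.getClass.getName.endsWith("LegacyBatchScan")

  // Type-based gate: refers to the class symbol itself, so a rename
  // either updates this reference or fails at compile time.
  def acceptByType(scan: Scan): Boolean =
    scan.getClass == classOf[BatchScan]
}
```

With these stand-ins, `NameVsTypeCheck.acceptByName(new NameVsTypeCheck.BatchScan)` is `false` while `NameVsTypeCheck.acceptByType(new NameVsTypeCheck.BatchScan)` is `true`.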

Notes on Changelog Scan

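`SparkChangelogScan` is not a subclass of `SparkBatchQueryScan`, so a type check against `SparkBatchQueryScan` also rejects changelog scans without naming the changelog class explicitly.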
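A small sketch of why a single check can cover the changelog case, assuming a class layout like the one described in this note. The `Fake*` classes are hypothetical stand-ins, not the Iceberg classes. The sketch also contrasts exact class equality with a subtype-tolerant pattern match, since the two behave differently for subclasses.

```scala
// Hypothetical stand-ins that mimic the relationship described above.
object ChangelogGate {
  trait Scan
  class FakeBatchQueryScan extends Scan
  class FakeChangelogScan extends Scan               // sibling, not a subclass
  class FakeBatchSubclass extends FakeBatchQueryScan // subclass of the batch scan

  // Exact-class gate: accepts only FakeBatchQueryScan itself.
  def exactGate(scan: Scan): Option[Scan] =
    if (scan.getClass == classOf[FakeBatchQueryScan]) Some(scan) else None

  // Subtype-tolerant gate: also accepts subclasses of FakeBatchQueryScan.
  def subtypeGate(scan: Scan): Option[Scan] = scan match {
    case s: FakeBatchQueryScan => Some(s)
    case _                     => None
  }
}
```

Both gates return `None` for `FakeChangelogScan`. They differ only for `FakeBatchSubclass`, which `exactGate` rejects and `subtypeGate` accepts, so the choice depends on whether subclasses of the batch scan should take the native path.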

What changes are included in this PR?

Three conditional checks were changed to compare against class objects instead of hard-coded class-name strings.

Benefits

- Removes hard-coded class name dependency
- Improves type safety and readability
- More robust against Iceberg internal refactoring
- Cleaner and more maintainable logic

Are there any user-facing changes?

No changes.

How was this patch tested?

Relies on the existing unit tests.


Copilot AI left a comment


Pull request overview

This PR updates Auron’s Iceberg integration to avoid brittle string-based detection of Iceberg scan/partition classes, moving toward type-based checks to better tolerate Iceberg refactoring.

Changes:

  • Replaced hard-coded scan.getClass.getName prefix/equality checks with a class-based gate for identifying Iceberg batch scans.
  • Replaced hard-coded SparkInputPartition class-name equality with a class-based check.
  • Added a small utility under org.apache.iceberg.spark.source to expose Class[_] handles for (likely) package-private Iceberg classes.
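The helper pattern in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the nested object `source` stands in for a third-party package, and `HiddenPartition`, `SourceUtil`, and `isHiddenPartition` are hypothetical names. A scope-restricted class cannot be named from outside, but a helper inside the same scope can hand out its `Class[_]` handle for comparison.

```scala
// Hypothetical sketch; `source` stands in for a third-party package.
object PackagePrivateHandle {
  object source {
    // Visible only inside `source`, similar to a package-private class.
    private[source] class HiddenPartition

    // Helper living in the same scope as the hidden class.
    object SourceUtil {
      // Exposes only the Class handle, never the type itself.
      def classOfHiddenPartition: Class[_] = classOf[HiddenPartition]

      // Factory used here only so the sketch can create an instance.
      def newHiddenPartition(): AnyRef = new HiddenPartition
    }
  }

  // Caller outside `source`: it cannot write `HiddenPartition`,
  // but it can still compare runtime classes through the helper.
  def isHiddenPartition(p: AnyRef): Boolean =
    p.getClass == source.SourceUtil.classOfHiddenPartition
}
```

The trade-off is that the helper has to live under the third-party package name, which couples the build to that package's layout even though the string comparison is gone.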

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala | Swaps string-based Iceberg scan/partition detection to class-based checks. |
| thirdparty/auron-iceberg/src/main/scala/org/apache/iceberg/spark/source/AuronIcebergSourceUtil.scala | Introduces helper to access Class objects for Iceberg source types. |



```diff
  // Changelog scan carries row-level changes; not supported by native COW-only path.
- if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
+ if (!(scan.getClass == AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan)) {
```
Comment on lines 196 to 200
```diff
  val className = partition.getClass.getName
  // Only accept Iceberg SparkInputPartition to access task groups.
- if (className != "org.apache.iceberg.spark.source.SparkInputPartition") {
+ if (partition.getClass
+   != AuronIcebergSourceUtil.getClassOfSparkInputPartition()) {
    return None
```
Comment on lines 50 to 53
```diff
- // Only handle Iceberg scans; other sources must stay on Spark's path.
- if (!scanClassName.startsWith("org.apache.iceberg.spark.source.")) {
-   return None
- }

  // Changelog scan carries row-level changes; not supported by native COW-only path.
- if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
+ if (!(scan.getClass == AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan)) {
    return None
  }
```