
fix: route file-not-found errors through SparkError JSON path#3699

Open
andygrove wants to merge 10 commits into apache:main from andygrove:fix/file-not-found-error-handling


@andygrove
Member

Which issue does this PR close?

Closes #3314.

Rationale for this change

When the native_datafusion scan encounters missing files with ignoreMissingFiles=false, it throws a CometNativeException with a message like "Object at location ... not found". CometExecIterator pattern-matches this message and wraps it in a SparkException whose cause is a plain FileNotFoundException. However, Spark SQL tests expect the cause to be a SparkFileNotFoundException (which is private[spark]), so their .asInstanceOf[SparkFileNotFoundException] cast fails.

What changes are included in this PR?

  • Rust: Add FileNotFound variant to SparkError enum. Detect file-not-found errors in throw_exception() (both the DataFusionError::External branch and the generic catch-all) and convert them to SparkError::FileNotFound, which is serialized as JSON and thrown as CometQueryExecutionException.
  • Shim layer: Add a "FileNotFound" case to all three ShimSparkErrorConverter implementations (Spark 3.4, 3.5, 4.0) that produces a proper SparkFileNotFoundException. Because QueryExecutionErrors.readCurrentFileNotFoundError() was removed in Spark 4.0, the exception is constructed directly rather than through that helper.
  • CometExecIterator: Remove the old fileNotFoundPattern regex matching from the CometNativeException handler since file-not-found errors are now handled through the JSON/shim path.
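
A minimal sketch of the native-side routing described in the Rust bullet above. `SparkError` here is a hypothetical two-variant stand-in for Comet's enum, and `is_file_not_found` and `classify_error` are hypothetical helpers; the substring check is an assumption based on the "Object at location ... not found" message quoted in the rationale, not Comet's actual matching logic.

```rust
/// Hypothetical, trimmed-down stand-in for Comet's `SparkError` enum:
/// only the new `FileNotFound` variant and a catch-all are modeled.
#[derive(Debug, PartialEq)]
enum SparkError {
    FileNotFound { message: String },
    Other { message: String },
}

/// True if the text looks like an object store "not found" error,
/// e.g. "Object at location /tmp/a.parquet not found: ...".
/// (Assumed heuristic for illustration only.)
fn is_file_not_found(msg: &str) -> bool {
    msg.contains("Object at location") && msg.contains(" not found")
}

/// Route a raw native error message to the matching SparkError variant.
/// The PR describes this detection in both the DataFusionError::External
/// branch and the generic catch-all of throw_exception(); this sketch
/// collapses both into one function.
fn classify_error(msg: &str) -> SparkError {
    if is_file_not_found(msg) {
        SparkError::FileNotFound { message: msg.to_string() }
    } else {
        SparkError::Other { message: msg.to_string() }
    }
}

fn main() {
    let msg = "Object at location /tmp/data/part-0.parquet not found: No such file or directory (os error 2)";
    match classify_error(msg) {
        SparkError::FileNotFound { message } => println!("FileNotFound: {message}"),
        SparkError::Other { message } => println!("Other: {message}"),
    }
}
```

Factoring the check into a single helper, as assumed here, lets both branches share it so the two detection paths cannot drift apart.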

How are these changes tested?

The fix allows existing Spark SQL tests (FileBasedDataSourceSuite and SimpleSQLViewSuite) that were previously skipped for native_datafusion to pass. Later commits in this PR remove the corresponding diff workarounds (the assume() skip and the IgnoreCometNativeDataFusion tag); the SPARK-25207 test remains tagged with IgnoreCometNativeDataFusion because native DataFusion produces a different schema error for it, which a separate PR will fix.

Detect file-not-found errors from DataFusion's object store on the
native side and convert them to SparkError::FileNotFound, which is
serialized as JSON via CometQueryExecutionException. The shim layer
then creates a proper SparkFileNotFoundException using
QueryExecutionErrors.readCurrentFileNotFoundError(), producing the
exact exception type that Spark tests expect.
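
To illustrate the "serialized as JSON" step, here is a dependency-free Rust sketch of what such a payload might look like. The field names `errorType` and `message` and the helper names `json_escape` and `error_to_json` are assumptions for illustration; only the `"FileNotFound"` tag comes from the shim change described above. Real code would more likely use a serialization library than hand-written escaping.

```rust
/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON payload whose "errorType" a JVM-side converter could switch on.
/// The schema is hypothetical, not Comet's actual wire format.
fn error_to_json(error_type: &str, message: &str) -> String {
    format!(
        "{{\"errorType\":\"{}\",\"message\":\"{}\"}}",
        json_escape(error_type),
        json_escape(message)
    )
}

fn main() {
    let payload = error_to_json(
        "FileNotFound",
        "Object at location /tmp/data/part-0.parquet not found: No such file or directory (os error 2)",
    );
    println!("{payload}");
}
```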

Previously, file-not-found errors arrived as CometNativeException and
were pattern-matched in CometExecIterator to create a SparkException
with a plain FileNotFoundException cause. Tests that cast the cause
to SparkFileNotFoundException (which is private[spark]) would fail.

Closes apache#3314

Remove the assume() skip for native_datafusion in
FileBasedDataSourceSuite and the IgnoreCometNativeDataFusion tag
in SimpleSQLViewSuite, since file-not-found errors now produce
the correct SparkFileNotFoundException type.

Remove CometConf import from FileBasedDataSourceSuite and
IgnoreCometNativeDataFusion import from SQLViewSuite that
became unused after removing the test skips.

Extract file path from native error message and format it as
"File <path> does not exist" to match the Hadoop FileNotFoundException
message format that Spark tests expect.
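
A sketch of the path extraction, assuming the native message keeps the "Object at location <path> not found: <detail>" layout quoted in the rationale. The function names are hypothetical, and falling back to the raw message when the layout does not match is a choice of this sketch, not necessarily what the PR does. The commit does not say which side performs the extraction; Rust is used here only for consistency with the other examples.

```rust
/// Prefix and marker of the object store "not found" message quoted in this PR.
/// Treating this exact layout as stable is an assumption of the sketch.
const PREFIX: &str = "Object at location ";
const MARKER: &str = " not found";

/// Pull the path out of "Object at location <path> not found: <detail>".
/// Returns None if the message does not have that shape.
fn extract_missing_path(msg: &str) -> Option<&str> {
    let start = msg.find(PREFIX)? + PREFIX.len();
    let rest = &msg[start..];
    let end = rest.find(MARKER)?;
    Some(&rest[..end])
}

/// Format the message like Hadoop's FileNotFoundException ("File <path> does not exist"),
/// falling back to the raw message if no path can be extracted.
fn file_not_found_message(msg: &str) -> String {
    match extract_missing_path(msg) {
        Some(path) => format!("File {} does not exist", path),
        None => msg.to_string(),
    }
}

fn main() {
    let raw = "Object at location /tmp/data/part-0.parquet not found: No such file or directory (os error 2)";
    // prints "File /tmp/data/part-0.parquet does not exist"
    println!("{}", file_not_found_message(raw));
}
```

Because `find` stops at the first " not found" after the prefix, a path that itself contains that text would be truncated; the sketch accepts this limitation.
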

readCurrentFileNotFoundError was removed in Spark 4.0. Construct
SparkFileNotFoundException directly instead, which is accessible
from the shim package.

….5.8 diff

The SPARK-25207 test expects a specific error message for duplicate fields
in case-insensitive mode, but native DataFusion produces a different
schema error. Update the test to accept either message format.

Add IgnoreCometNativeDataFusion tag to SPARK-25207 test instead of
trying to accept both error messages. A separate PR will fix the
underlying issue.
andygrove marked this pull request as ready for review on March 15, 2026 03:45

Development

Successfully merging this pull request may close these issues.

[native_datafusion] [Spark SQL Tests] Missing files error handling differs from Spark
