AWS: Handle S3 Table Bucket purge gracefully in GlueCatalog (#14449)#16073
AWS: Handle S3 Table Bucket purge gracefully in GlueCatalog (#14449)#16073yadavay-amzn wants to merge 1 commit intoapache:mainfrom
Conversation
…4449) When calling GlueCatalog.dropTable() with purge=true on a table in an S3 Table Bucket, the purge fails because S3 Table Buckets do not allow manual file deletion. This change wraps CatalogUtil.dropTableData() in a try-catch so that purge failures are logged as warnings instead of propagating and failing the entire drop operation. Closes apache#14449
840236c to
e996c80
Compare
| LOG.info("Glue table {} data purged", identifier); | ||
| try { | ||
| CatalogUtil.dropTableData(ops.io(), lastMetadata); | ||
| LOG.info("Glue table {} data purged", identifier); |
There was a problem hiding this comment.
Is there any way to check whether the target table exists in S3 Table bucket?
There was a problem hiding this comment.
The location could be checked for S3 Table Bucket ARN patterns, but catching the exception is more robust as it handles any case where purge fails (permissions, bucket policies, etc.) without needing to enumerate all possible URI formats.
Looks like this also aligns with the Trino approach you linked!
Happy to add a URI check if you'd prefer a more targeted approach though.
There was a problem hiding this comment.
I was wondering if we could do both (S3 Table check + try-catch) to avoid redundant S3 requests and warning logs. I think we should keep the try-catch regardless of S3 Table because it may fail for other reasons.
The "Enumerate all possible URI formats" approach doesn't look straightforward. Only adding try-catch looks good to me.
| LOG.info("Glue table {} data purged", identifier); | ||
| } catch (Exception e) { | ||
| LOG.warn( | ||
| "Failed to purge data for table: {}, continuing drop without purge", identifier, e); |
There was a problem hiding this comment.
The table has already been dropped by the time we reach this line, so this change makes sense to me.
The Trino Iceberg connector also suppresses failures when it cannot delete data using the Glue catalog:
| try { | ||
| CatalogUtil.dropTableData(ops.io(), lastMetadata); | ||
| LOG.info("Glue table {} data purged", identifier); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
is it possible to catch the Specific exception rather than catch all Exception ?
When calling
GlueCatalog.dropTable()withpurge=trueon a table in an S3 Table Bucket, the purge fails because S3 Table Buckets do not allow manual file deletion.This change wraps
CatalogUtil.dropTableData()in a try-catch so that purge failures are logged as warnings instead of propagating and failing the entire drop operation. The table is still successfully dropped from the Glue catalog.Closes #14449