AWS: Handle S3 Table Bucket purge gracefully in GlueCatalog (#14449) #16073
```diff
@@ -356,8 +356,13 @@ public boolean dropTable(TableIdentifier identifier, boolean purge) {
             .build());
     LOG.info("Successfully dropped table {} from Glue", identifier);
     if (purge && lastMetadata != null) {
-      CatalogUtil.dropTableData(ops.io(), lastMetadata);
-      LOG.info("Glue table {} data purged", identifier);
+      try {
+        CatalogUtil.dropTableData(ops.io(), lastMetadata);
+        LOG.info("Glue table {} data purged", identifier);
```
**Member:** Is there any way to check whether the target table exists in an S3 Table bucket?

**Contributor (author):** The location could be checked for S3 Table Bucket ARN patterns, but catching the exception is more robust: it handles any case where purge fails (permissions, bucket policies, etc.) without needing to enumerate all possible URI formats. Happy to add a URI check if you'd prefer a more targeted approach, though.

**Member:** I was wondering if we could do both (the S3 Table check plus the try-catch) to avoid redundant S3 requests and warning logs. I think we should keep the try-catch regardless of the S3 Table check, because purge may fail for other reasons. The "enumerate all possible URI formats" approach doesn't look straightforward, so adding only the try-catch looks good to me.
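As a point of reference for the targeted check discussed above, an ARN-prefix test could look something like this. This is a sketch only: the class and helper names are hypothetical, and the `arn:aws:s3tables:` prefix is an assumption based on the S3 Tables service namespace, not code from the PR.

```java
// Hypothetical helper for the "check the location for S3 Table Bucket ARN
// patterns" idea from the review thread. S3 Table Bucket ARNs use the
// s3tables service namespace (an assumption of this sketch), while regular
// warehouse locations use s3:// URIs.
public class S3TablesLocationCheck {

  static boolean isS3TablesLocation(String location) {
    return location != null && location.startsWith("arn:aws:s3tables:");
  }

  public static void main(String[] args) {
    // A regular S3 warehouse path is not an S3 Table Bucket.
    System.out.println(isS3TablesLocation("s3://my-bucket/warehouse/db/t1"));
    // An ARN-style S3 Table Bucket location.
    System.out.println(
        isS3TablesLocation("arn:aws:s3tables:us-east-1:123456789012:bucket/demo"));
  }
}
```

As the thread concludes, even with such a check the try-catch would still be needed, since purge can fail for unrelated reasons (permissions, bucket policies).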
```diff
+      } catch (Exception e) {
```
**Contributor:** Is it possible to catch a specific exception rather than catching all `Exception`?

**Contributor (author):** Intentionally catching broad

**Member:** I'm hesitant about catching this broad an exception. We should be strict about what we catch here; this is a purge path that other use cases could hit.
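The narrowing the reviewers ask for would mean enumerating the failure types purge is expected to hit rather than catching `Exception`. A rough sketch of that shape, with stand-in exception types and a stand-in `dropData()` call (in the real PR the call is `CatalogUtil.dropTableData(ops.io(), lastMetadata)`, and plausible candidates would be AWS SDK or Iceberg IO exceptions):

```java
// Sketch of narrowing the catch with Java multi-catch. The exception types
// and the dropData() stand-in are illustrative, not taken from the PR.
public class NarrowCatchSketch {

  // Stand-in for the purge call that fails against an S3 Table Bucket.
  static void dropData() {
    throw new UnsupportedOperationException("Cannot delete files from S3 Table Bucket");
  }

  static boolean purgeQuietly() {
    try {
      dropData();
      return true;
    } catch (UnsupportedOperationException | IllegalStateException e) {
      // Multi-catch swallows only the expected failure modes; anything
      // unexpected (e.g. a NullPointerException) still propagates.
      System.out.println("purge failed, continuing drop: " + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) {
    // The stand-in always fails here, so the warning path is exercised.
    System.out.println(purgeQuietly());
  }
}
```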
```diff
+        LOG.warn(
+            "Failed to purge data for table: {}, continuing drop without purge", identifier, e);
```
**Member:** The table has already been dropped by the time we reach this line, so this change makes sense to me. The Trino Iceberg connector also suppresses failures when it cannot delete data using the Glue catalog.
```diff
+      }
     }
     LOG.info("Dropped table: {}", identifier);
     return true;
```
```diff
@@ -26,7 +26,10 @@
 import java.util.UUID;
 import java.util.concurrent.atomic.AtomicInteger;
 import org.apache.iceberg.BaseMetastoreTableOperations;
+import org.apache.iceberg.CatalogUtil;
 import org.apache.iceberg.Schema;
+import org.apache.iceberg.TableMetadata;
+import org.apache.iceberg.TableOperations;
 import org.apache.iceberg.aws.AwsProperties;
 import org.apache.iceberg.aws.s3.S3FileIOProperties;
 import org.apache.iceberg.catalog.Namespace;
@@ -40,6 +43,7 @@
 import org.apache.iceberg.util.LockManagers;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
+import org.mockito.MockedStatic;
 import org.mockito.Mockito;
 import org.mockito.invocation.InvocationOnMock;
 import org.mockito.stubbing.Answer;
@@ -312,6 +316,44 @@ public void testDropTable() {
     glueCatalog.dropTable(TableIdentifier.of("db1", "t1"));
   }
```
```diff
+  @Test
+  public void testDropTableWithPurgeFailure() {
```
**Member:** Were you able to test this functionality against a real s3tables catalog?
```diff
+    Map<String, String> properties = Maps.newHashMap();
+    properties.put(
+        BaseMetastoreTableOperations.TABLE_TYPE_PROP,
+        BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE);
+    Mockito.doReturn(
+            GetTableResponse.builder()
+                .table(
+                    Table.builder().databaseName("db1").name("t1").parameters(properties).build())
+                .build())
+        .when(glue)
+        .getTable(Mockito.any(GetTableRequest.class));
+    Mockito.doReturn(
+            GetDatabaseResponse.builder().database(Database.builder().name("db1").build()).build())
+        .when(glue)
+        .getDatabase(Mockito.any(GetDatabaseRequest.class));
+    Mockito.doReturn(DeleteTableResponse.builder().build())
+        .when(glue)
+        .deleteTable(Mockito.any(DeleteTableRequest.class));
+
+    TableMetadata mockMetadata = Mockito.mock(TableMetadata.class);
+    TableOperations mockOps = Mockito.mock(TableOperations.class);
+    Mockito.when(mockOps.current()).thenReturn(mockMetadata);
+
+    GlueCatalog spyCatalog = Mockito.spy(glueCatalog);
+    Mockito.doReturn(mockOps).when(spyCatalog).newTableOps(Mockito.any(TableIdentifier.class));
+
+    try (MockedStatic<CatalogUtil> mockedCatalogUtil = Mockito.mockStatic(CatalogUtil.class)) {
+      mockedCatalogUtil
+          .when(() -> CatalogUtil.dropTableData(Mockito.any(), Mockito.any()))
+          .thenThrow(new RuntimeException("Cannot delete files from S3 Table Bucket"));
+
+      boolean result = spyCatalog.dropTable(TableIdentifier.of("db1", "t1"), true);
+      assertThat(result).isTrue();
+    }
+  }
+
   @Test
   public void testRenameTable() {
     AtomicInteger counter = new AtomicInteger(1);
```
Technically, s3tables is purging data behind the scenes, right? So in the case of `dropTable(identifier, false)` we should throw and force the user to be specific about purging.
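The behavior this last comment proposes could be sketched roughly as follows. Everything here is hypothetical: the class, the `isS3TablesLocation` helper, the `arn:aws:s3tables:` prefix assumption, and the exception choice are illustrative, not part of the PR or of Iceberg's API.

```java
// Hypothetical sketch of the reviewer's proposal: S3 Tables removes data
// behind the scenes on drop, so dropTable(identifier, false) cannot actually
// preserve data there. Rejecting it forces the caller to opt in to purge.
public class DropPolicySketch {

  static boolean isS3TablesLocation(String location) {
    // S3 Table Bucket locations are ARN-based (an assumption of this sketch).
    return location != null && location.startsWith("arn:aws:s3tables:");
  }

  static void validateDrop(String tableLocation, boolean purge) {
    if (!purge && isS3TablesLocation(tableLocation)) {
      throw new IllegalArgumentException(
          "Table data in an S3 Table Bucket is purged on drop; "
              + "call dropTable(identifier, true) to acknowledge the purge");
    }
  }

  public static void main(String[] args) {
    // A regular bucket allows a non-purge drop.
    validateDrop("s3://warehouse/db/t1", false);
    // An S3 Table Bucket location rejects it.
    try {
      validateDrop("arn:aws:s3tables:us-east-1:123456789012:bucket/demo", false);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```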