@@ -356,8 +356,13 @@ public boolean dropTable(TableIdentifier identifier, boolean purge) {
.build());
LOG.info("Successfully dropped table {} from Glue", identifier);
if (purge && lastMetadata != null) {
CatalogUtil.dropTableData(ops.io(), lastMetadata);
Member: Technically, s3tables is purging data behind the scenes, right? So in the case of dropTable(identifier, false) we should throw and force the user to be specific about purging.

LOG.info("Glue table {} data purged", identifier);
try {
CatalogUtil.dropTableData(ops.io(), lastMetadata);
LOG.info("Glue table {} data purged", identifier);
Member: Is there any way to check whether the target table exists in an S3 Table bucket?

Contributor Author: The location could be checked for S3 Table Bucket ARN patterns, but catching the exception is more robust: it handles any case where the purge fails (permissions, bucket policies, etc.) without needing to enumerate all possible URI formats. This also looks aligned with the Trino approach you linked!

Happy to add a URI check if you'd prefer a more targeted approach, though.

Member: I was wondering if we could do both (the S3 Table check plus the try-catch) to avoid redundant S3 requests and warning logs. We should keep the try-catch regardless of S3 Table, since the purge may fail for other reasons. Enumerating all possible URI formats doesn't look straightforward, though, so adding only the try-catch looks good to me.

} catch (Exception e) {
Contributor: Is it possible to catch a specific exception rather than catching all of Exception?

Contributor Author: Intentionally catching the broad Exception here: the table metadata has already been dropped by this point, so the try-catch is a safety net that prevents any unexpected failure from blocking the drop operation. Narrowing to a specific exception risks missing edge cases from different IO implementations (S3, GCS, HDFS, etc.). This also aligns with the Trino approach that @ebyhr linked.

Member: I'm hesitant about catching this broad an exception. We should be strict about what we catch here; this is a purge that other use cases could hit.

LOG.warn(
"Failed to purge data for table: {}, continuing drop without purge", identifier, e);
Member: The table has already been dropped by the time we reach this line, so this change makes sense to me.

The Trino Iceberg connector also suppresses failures when it cannot delete data using the Glue catalog:

https://github.com/trinodb/trino/blob/5a116341b53f9f3a3b29b8b405773010e307e40b/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/TrinoGlueCatalog.java#L676-L696

}
}
LOG.info("Dropped table: {}", identifier);
return true;
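The URI check floated in the thread above was ultimately set aside in favor of the try-catch, but a naive sketch shows both what it would look like and why it is brittle. This is illustration only, not code from the PR; the "arn:aws:s3tables:" prefix is an assumption about how an S3 Table bucket location might be spelled, not a documented format:

```java
// Sketch only: a naive check for S3 Table bucket locations, as discussed in
// the review thread. The ARN prefix below is an assumption; real locations
// may take other forms, which is exactly why the PR keeps the try-catch
// instead of pattern matching on the location string.
public class S3TablesLocationCheck {

  static boolean isS3TablesLocation(String location) {
    return location != null && location.contains("arn:aws:s3tables:");
  }

  public static void main(String[] args) {
    // A location embedding an S3 Tables ARN matches; a plain S3 path does not.
    System.out.println(isS3TablesLocation(
        "s3://arn:aws:s3tables:us-east-1:111122223333:bucket/demo/metadata.json"));
    System.out.println(isS3TablesLocation("s3://my-bucket/warehouse/db/t1"));
  }
}
```

Any location format this predicate misses would slip through silently, which is the brittleness the thread calls out; the try-catch has no such gap.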
42 changes: 42 additions & 0 deletions aws/src/test/java/org/apache/iceberg/aws/glue/TestGlueCatalog.java
@@ -26,7 +26,10 @@
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.iceberg.BaseMetastoreTableOperations;
import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.Schema;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableOperations;
import org.apache.iceberg.aws.AwsProperties;
import org.apache.iceberg.aws.s3.S3FileIOProperties;
import org.apache.iceberg.catalog.Namespace;
@@ -40,6 +43,7 @@
import org.apache.iceberg.util.LockManagers;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.mockito.MockedStatic;
import org.mockito.Mockito;
import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;
@@ -312,6 +316,44 @@ public void testDropTable() {
glueCatalog.dropTable(TableIdentifier.of("db1", "t1"));
}

@Test
public void testDropTableWithPurgeFailure() {
Member: Were you able to test this functionality against a real s3tables catalog?

Map<String, String> properties = Maps.newHashMap();
properties.put(
BaseMetastoreTableOperations.TABLE_TYPE_PROP,
BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE);
Mockito.doReturn(
GetTableResponse.builder()
.table(
Table.builder().databaseName("db1").name("t1").parameters(properties).build())
.build())
.when(glue)
.getTable(Mockito.any(GetTableRequest.class));
Mockito.doReturn(
GetDatabaseResponse.builder().database(Database.builder().name("db1").build()).build())
.when(glue)
.getDatabase(Mockito.any(GetDatabaseRequest.class));
Mockito.doReturn(DeleteTableResponse.builder().build())
.when(glue)
.deleteTable(Mockito.any(DeleteTableRequest.class));

TableMetadata mockMetadata = Mockito.mock(TableMetadata.class);
TableOperations mockOps = Mockito.mock(TableOperations.class);
Mockito.when(mockOps.current()).thenReturn(mockMetadata);

GlueCatalog spyCatalog = Mockito.spy(glueCatalog);
Mockito.doReturn(mockOps).when(spyCatalog).newTableOps(Mockito.any(TableIdentifier.class));

try (MockedStatic<CatalogUtil> mockedCatalogUtil = Mockito.mockStatic(CatalogUtil.class)) {
mockedCatalogUtil
.when(() -> CatalogUtil.dropTableData(Mockito.any(), Mockito.any()))
.thenThrow(new RuntimeException("Cannot delete files from S3 Table Bucket"));

boolean result = spyCatalog.dropTable(TableIdentifier.of("db1", "t1"), true);
assertThat(result).isTrue();
}
}
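On the broad-vs-narrow catch debate in the review above, a narrower variant would suppress only failures the purge path is known to surface. The sketch below is hypothetical, not code from the PR: it assumes deletion failures surface as UncheckedIOException, which is precisely the assumption the author argues cannot be relied on across IO implementations:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class NarrowCatchSketch {

  // Hypothetical stand-in for CatalogUtil.dropTableData: assumes I/O failures
  // surface as UncheckedIOException, which real IO implementations are not
  // guaranteed to do.
  static void purgeData(boolean fail) {
    if (fail) {
      throw new UncheckedIOException(new IOException("cannot delete files"));
    }
  }

  static boolean dropTable(boolean purgeFails) {
    try {
      purgeData(purgeFails);
    } catch (UncheckedIOException e) {
      // Only I/O failures are suppressed here; any other exception would
      // still propagate and block the drop, unlike the PR's catch-all.
      System.out.println("purge failed, continuing drop without purge");
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(dropTable(true));
  }
}
```

The trade-off is the one the thread lands on: the narrow catch is stricter, but any IO implementation that wraps failures differently would break the drop after the Glue table is already gone.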

@Test
public void testRenameTable() {
AtomicInteger counter = new AtomicInteger(1);