Pangolin supports cataloging external table formats like Delta Lake, Apache Hudi, and Apache Paimon as Generic Assets. This allows you to maintain a unified discovery layer across your entire data estate, even if some tables are managed by different engines (e.g., Databricks, EMR, Flink).
Unlike Iceberg tables, which Pangolin manages natively (handling commits, snapshots, and manifest files), other modern table formats are treated as governed pointers.
What Pangolin Does:
- ✅ Discovery: Tables appear in search results with their specific type (e.g., `DeltaTable`).
- ✅ Governance: Apply RBAC policies and tags to these tables.
- ✅ Lineage: Track them as sources or sinks in your data pipelines.
- ✅ Metadata: Store custom properties (e.g., `managed_by: databricks`, `last_compacted: 2023-10-01`).
What Pangolin Does NOT Do:
- ❌ Transaction Management: Pangolin does not process Delta/Hudi commits. The query engine (Spark/Flink/Trino) handles the ACID guarantees.
- ❌ Schema Evolution: You cannot change the schema of a Delta table via Pangolin's API.
Delta Lake:
Developed by Databricks, widely used in Spark ecosystems.
- Root Path: The directory containing the `_delta_log` folder.
- Best Practice: Point Pangolin to the table root, not the `_delta_log` folder itself.
  - ✅ `s3://bucket/warehouse/db/my_table`
  - ❌ `s3://bucket/warehouse/db/my_table/_delta_log`
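A small guard in your registration tooling can catch the mistake above before it hits the API. A minimal sketch (the `normalize_delta_root` helper is hypothetical, not part of Pangolin):

```shell
#!/bin/sh
# Hypothetical helper: normalize a user-supplied Delta path before registering.
# If someone pastes the _delta_log directory, strip it back to the table root.
normalize_delta_root() {
  path="${1%/}"                                           # drop trailing slash
  case "$path" in
    */_delta_log) printf '%s\n' "${path%/_delta_log}" ;;  # strip the log dir
    *)            printf '%s\n' "$path" ;;
  esac
}

normalize_delta_root "s3://bucket/warehouse/db/my_table/_delta_log"
# -> s3://bucket/warehouse/db/my_table
```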
Apache Hudi:
Popular for streaming and upsert-heavy workloads.
- Root Path: The directory containing the `.hoodie` folder.
- Best Practice: Use properties to note the table type (Copy-on-Write vs. Merge-on-Read).
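Because Hudi keeps its metadata in a `.hoodie` folder at the table root, you can find the correct registration path by walking up from any partition directory inside the table. A sketch for local filesystem paths (an object store would need the same loop against listings; `find_hoodie_root` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: walk up from a partition directory until we reach the
# directory that contains .hoodie -- that directory is the Hudi table root.
find_hoodie_root() {
  dir="$1"
  while [ -n "$dir" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
    if [ -d "$dir/.hoodie" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    dir="$(dirname "$dir")"
  done
  return 1
}
```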
Apache Paimon:
A streaming data lake format (formerly Flink Table Store).
- Root Path: The root directory of the Paimon table.
Scenario: You have existing pipelines writing Delta tables from Databricks, and you want these tables to be discoverable by analysts in Pangolin alongside your native Iceberg tables.
- Navigate to the Data Explorer.
- Drill down into the Catalog and Namespace where you want to register the table.
- Click the "Register Asset" button (top-right).
- Fill in the details:
  - Name: `customer_churn_prediction`
  - Type: Select `Delta Lake Table` (or Hudi/Paimon).
  - Location: `s3://finance-data/delta/churn_preds/`
- (Optional) Add Properties: `owner: data-science-team`, `update_frequency: daily`
- Click Register.
```bash
# Register a Delta Table
curl -X POST http://localhost:8080/api/v1/catalogs/analytics/namespaces/gold/assets \
  -H "Authorization: Bearer <token>" \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "customer_churn_prediction",
    "kind": "DeltaTable",
    "location": "s3://finance-data/delta/churn_preds/",
    "properties": {
      "provider": "delta",
      "managed_by": "databricks-job-123"
    }
  }'
```

If you use tools like Apache XTable to translate metadata between formats (e.g., Delta -> Iceberg), you can register the same data location twice with different semantics:
- Primary: Register as `DeltaTable` (the source of truth).
- Read-Replica: Register as `IcebergTable` (using the XTable-generated metadata) if you want Pangolin to serve it to Iceberg clients.
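The dual registration can be scripted against the same assets endpoint shown earlier. A sketch that only builds the payloads (pipe each one to `curl` as in the example; if the XTable-generated metadata lives elsewhere, point the `IcebergTable` entry's location there instead):

```shell
#!/bin/sh
# Emit one registration payload per personality of the same underlying data.
# Only "kind" differs; "location" stays the same for an XTable-translated table.
dual_register_payloads() {
  location="$1"
  for kind in DeltaTable IcebergTable; do
    printf '{"name": "customer_churn_prediction", "kind": "%s", "location": "%s"}\n' \
      "$kind" "$location"
  done
}

dual_register_payloads "s3://finance-data/delta/churn_preds/"
```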
If your Delta tables use UniForm (Universal Format) to generate Iceberg metadata:
- Register the table as an `IcebergTable` in Pangolin, pointing to the `metadata` directory generated by UniForm, if you want full Iceberg client compatibility.
- Registering it as a `DeltaTable` is still useful for discovery, but Pangolin will then make no attempt to serve the Iceberg metadata directly.
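Since UniForm writes its Iceberg metadata under the Delta table root, the `IcebergTable` registration location can be derived from the Delta one. A sketch assuming the default `metadata/` directory layout (the helper name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: derive the UniForm Iceberg metadata location from the
# Delta table root, assuming the default metadata/ directory layout.
uniform_iceberg_location() {
  printf '%s/metadata\n' "${1%/}"   # strip trailing slash, append metadata
}

uniform_iceberg_location "s3://finance-data/delta/churn_preds/"
# -> s3://finance-data/delta/churn_preds/metadata
```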
Q: Can I run SQL queries on these tables via Pangolin?
A: No. Pangolin is a catalog, not a query engine. Use Dremio, Trino, Spark, or StarRocks to query the tables; Pangolin ensures you can find them and know where they are.
Q: Do these tables show up in Iceberg clients?
A: No. Generic assets are filtered out of the standard Iceberg REST API responses (`loadTable`) because they don't have valid Iceberg metadata. They only appear in Pangolin's asset search and discovery APIs.