Pangolin is not just for Iceberg tables. It serves as a Universal Data Catalog, allowing you to govern, tag, and discover any file-based asset in your data lake—from ML models and video datasets to raw CSVs and PDF documentation.
Generic Assets are managed catalog entities that point to a storage location (S3, Azure, or GCS) but do not have to follow the Iceberg table format. They share the same governance features as tables:
- RBAC: Control who can view or modify the asset.
- Discovery: Full-text search and tagging.
- Lineage: Track relationships (e.g., "This ML Model was trained on Table X").
- Branching: Include generic assets in your data branches.
Pangolin natively supports the following types:
- Data Files: ParquetTable, CsvTable, JsonTable
- AI/ML: MlModel
- Modern Formats: DeltaTable, HudiTable, ApachePaimon, Lance, Vortex, Nimble
- Media: VideoFile, ImageFile
- System: View, DbConnString, Directory, Other
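Because the kind sent when registering an asset has to be one of these types, a client script can check the value before calling the API. A minimal POSIX shell sketch, assuming the list above is complete for your Pangolin version and that kind names are case-sensitive; `is_supported_kind` is a hypothetical helper, not part of Pangolin:

```shell
# Hypothetical helper: succeeds only for the generic asset kinds listed above.
# Matching is case-sensitive (an assumption; the page does not say otherwise).
is_supported_kind() {
  case "$1" in
    ParquetTable|CsvTable|JsonTable) return 0 ;;                      # Data Files
    MlModel) return 0 ;;                                              # AI/ML
    DeltaTable|HudiTable|ApachePaimon|Lance|Vortex|Nimble) return 0 ;; # Modern Formats
    VideoFile|ImageFile) return 0 ;;                                  # Media
    View|DbConnString|Directory|Other) return 0 ;;                    # System
    *) return 1 ;;
  esac
}
```

A script could then guard a registration call with `is_supported_kind "$kind" || { echo "unknown kind: $kind" >&2; exit 1; }`.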
Currently, Generic Assets are managed primarily via the REST API.
Endpoint: POST /api/v1/catalogs/{catalog}/namespaces/{namespace}/assets
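The `{catalog}` and `{namespace}` path parameters can be filled in by a small helper so that scripts do not assemble URLs by hand. The sketch below assumes catalog and namespace names contain only URL-safe characters; `asset_endpoint`, `register_asset`, and the `PANGOLIN_TOKEN` and `PANGOLIN_TENANT` environment variables are hypothetical names chosen for illustration:

```shell
# Hypothetical helper: build the asset-registration URL for a catalog and namespace.
# Assumes the names need no percent-encoding.
asset_endpoint() {
  base_url=$1 catalog=$2 namespace=$3
  printf '%s/api/v1/catalogs/%s/namespaces/%s/assets' "$base_url" "$catalog" "$namespace"
}

# Hypothetical wrapper: POST a JSON payload file ($4) to that endpoint.
# PANGOLIN_TOKEN and PANGOLIN_TENANT are assumed to be set by the caller.
register_asset() {
  curl -X POST "$(asset_endpoint "$1" "$2" "$3")" \
    -H "Authorization: Bearer $PANGOLIN_TOKEN" \
    -H "X-Pangolin-Tenant: $PANGOLIN_TENANT" \
    -H "Content-Type: application/json" \
    --data-binary @"$4"
}
```

For example, `register_asset http://localhost:8080 analytics ml_models model.json` would send the contents of `model.json` to the `ml_models` namespace of the `analytics` catalog.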
Catalog a trained model artifact stored in S3.
curl -X POST http://localhost:8080/api/v1/catalogs/analytics/namespaces/ml_models/assets \
  -H "Authorization: Bearer <token>" \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "fraud_detection_v2",
    "kind": "MlModel",
    "location": "s3://my-datalake/models/fraud_v2.tar.gz",
    "properties": {
      "framework": "pytorch",
      "version": "2.1.0",
      "accuracy": "0.985",
      "training_date": "2023-11-15",
      "trained_by": "alice@example.com"
    }
  }'
Catalog a folder of raw video footage for ingestion.
curl -X POST http://localhost:8080/api/v1/catalogs/raw_data/namespaces/videos/assets \
  -H "Authorization: Bearer <token>" \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "cctv_logs_2023",
    "kind": "VideoFile",
    "location": "s3://raw-zone/cctv/2023/",
    "properties": {
      "format": "mp4",
      "codec": "h264",
      "retention": "30_days"
    }
  }'
Catalog an existing Delta Lake table for discovery (without managing it via Iceberg).
curl -X POST http://localhost:8080/api/v1/catalogs/analytics/namespaces/silver/assets \
  -H "Authorization: Bearer <token>" \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "marketing_leads",
    "kind": "DeltaTable",
    "location": "abfss://data@account.dfs.core.windows.net/delta/leads",
    "properties": {
      "managed_by": "databricks"
    }
  }'
Generic assets appear alongside Iceberg tables in search results.
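Search terms that contain spaces or punctuation need to be percent-encoded before they are placed in the `q` parameter of `GET /api/v1/assets/search`. A minimal POSIX shell sketch, assuming ASCII-only input; `urlencode` and `search_url` are hypothetical helper names, not part of Pangolin:

```shell
# Hypothetical helper: percent-encode every character outside the
# RFC 3986 unreserved set. Handles ASCII input only.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s' "$out"
}

# Hypothetical helper: build the search URL for a base URL and a query string.
search_url() {
  printf '%s/api/v1/assets/search?q=%s' "$1" "$(urlencode "$2")"
}
```

The result can be passed straight to curl, for example `curl -H "Authorization: Bearer <token>" "$(search_url http://localhost:8080 'fraud model')"`.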
Search API:
GET /api/v1/assets/search?q=fraud
Response:
[
  {
    "id": "uuid...",
    "name": "fraud_detection_v2",
    "kind": "MlModel",
    "catalog": "analytics",
    "namespace": "ml_models"
  },
  {
    "id": "uuid...",
    "name": "fraud_transactions",
    "kind": "IcebergTable",
    "catalog": "analytics",
    "namespace": "finance"
  }
]
You can interactively register and manage generic assets directly from the Data Explorer.
- Navigate: Go to Data Explorer and drill down into a Catalog and Namespace.
- Register: Click the "Register Asset" button in the top-right corner.
- Configure:
  - Name: Enter a unique name for the asset.
  - Type: Select from the dropdown (e.g., ML Model, Video File, Delta Table).
  - Location: Provide the full S3/Azure/GCS URI (e.g., s3://my-bucket/path/to/asset).
  - Properties: Add key-value pairs for metadata (e.g., accuracy: 0.99, owner: data-team).
- Submit: Click Register. The asset will now be discoverable in search.
Related guides:
- Modern Table Formats: Specific guide for cataloging Delta Lake, Hudi, and Paimon.
- Asset Management: General governance guide.
Generic Assets leverage the same permission model as tables.
- READ: Required to view metadata or search for the asset.
- WRITE: Required to update properties or location.
Example Policy: Grant Data Scientists access to register models.
{
  "effect": "Allow",
  "action": "RegisterAsset",
  "resource": "catalog/analytics/namespace/ml_models/*"
}
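To reason about which assets a resource pattern like this one covers, it can help to test candidate paths against it locally. The sketch below rests on two assumptions that this page does not confirm: that the trailing `*` behaves like a shell-style wildcard, and that an individual asset is addressed as `catalog/<catalog>/namespace/<namespace>/<asset>`. Pangolin's actual policy evaluation may differ, and `resource_matches` is a hypothetical helper:

```shell
# Hypothetical helper: succeeds when the resource path matches the policy pattern.
# Assumes shell-style wildcard semantics for "*" (not confirmed for Pangolin).
resource_matches() {
  pattern=$1 resource=$2
  case "$resource" in
    # $pattern is deliberately unquoted so that "*" acts as a wildcard.
    $pattern) return 0 ;;
    *) return 1 ;;
  esac
}
```

Under those assumptions, `resource_matches 'catalog/analytics/namespace/ml_models/*' 'catalog/analytics/namespace/ml_models/fraud_detection_v2'` succeeds, while a path in the `finance` namespace does not match.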