Digital Twin Builder for Databricks
OntoBricks is a web application that transforms Databricks tables into a materialized knowledge graph. It lets you design ontologies (OWL), map them to Unity Catalog tables via R2RML, materialize triples into a Delta or LadybugDB triple store, reason over the graph (OWL 2 RL, SWRL, SHACL), and query it through an auto-generated GraphQL API. The entire pipeline — from metadata import to a queryable knowledge graph — can run in four clicks using LLM-powered automation.
Please note that all projects in the /databrickslabs github account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.
Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.
OntoBricks uses `uv` for dependency management. All dependencies are declared in `pyproject.toml`.
```bash
# Clone the repository
git clone <repository-url>
cd OntoBricks

# Install dependencies (uv resolves them from pyproject.toml)
uv sync

# Or use the setup script
scripts/setup.sh
```

- Python 3.10 or higher
- Databricks workspace access with a Personal Access Token
- A SQL Warehouse ID
- A Unity Catalog Volume for the domain registry
```bash
# Configure credentials
cp .env.example .env
# Edit .env with your Databricks host, token, and warehouse ID

# Start the application
scripts/start.sh
# Open http://localhost:8000
```

```bash
# Install and configure the Databricks CLI
pip install databricks-cli
databricks configure --token

# Deploy
make deploy
# Or: scripts/deploy.sh
```

After deployment, bind the sql-warehouse and volume resources in the Databricks Apps UI (Compute > Apps > ontobricks > Resources). If the registry volume is empty, open the app and click Settings > Registry > Initialize.
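The authoritative variable names live in `.env.example`; the sketch below uses common Databricks conventions and is only an assumption about what that file contains, so copy from the example file rather than from here:

```bash
# .env (illustrative only; variable names are assumptions, check .env.example)
DATABRICKS_HOST=https://my-workspace.cloud.databricks.com
DATABRICKS_TOKEN=dapi-your-personal-access-token
DATABRICKS_WAREHOUSE_ID=your-sql-warehouse-id
```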
First deploy only:
`make deploy` runs `scripts/bootstrap-app-permissions.sh` automatically, which grants each app's service principal `CAN_MANAGE` on itself. Without that grant the middleware cannot read the app's own ACL and every first-time visitor — including the deploying `CAN_MANAGE` user — lands on the access-denied page. If you deploy via `databricks bundle deploy` directly, run `make bootstrap-perms` once afterwards (it is idempotent).
See Deployment Guide for the full checklist including resource configuration and permissions.
- Ensure all tests pass:

  ```bash
  make test
  ```

- Update the version in `pyproject.toml`
- Commit, tag, and push:

  ```bash
  git add -A && git commit -m "Release vX.Y.Z"
  git tag vX.Y.Z
  git push origin main --tags
  ```

- Deploy the new version:

  ```bash
  make deploy
  ```
| Step | Action | What Happens |
|---|---|---|
| 1 | Import Metadata (Domain > Metadata) | Fetches table and column metadata from Unity Catalog |
| 2 | Generate Ontology (Ontology > Wizard) | LLM designs entities, relationships, and attributes from your metadata |
| 3 | Auto-Map (Mapping > Auto-Map) | LLM generates SQL mappings for every entity and relationship |
| 4 | Synchronize (Digital Twin > Status) | Executes mappings and populates the triple store |
- Design an ontology visually using the OntoViz canvas, or import OWL/RDFS/industry standards (FIBO, CDISC, IOF)
- Map ontology entities to Databricks tables with column-level precision
- Build the Digital Twin — materializes triples into the triple store (incremental by default)
- Query through the GraphQL playground or explore the interactive knowledge graph
- Reason over the graph — run OWL 2 RL inference, SWRL rules, SHACL validation, and constraint checks
- Two-phase search — preview matching entities in a flat list, then select specific ones to expand into the full graph with relationships and neighbors
- Configurable search depth — control the maximum traversal depth and entity cap for graph expansion
- Bridge navigation — follow cross-domain bridges to automatically switch domains and focus on the target entity in the knowledge graph
- Data cluster detection — detect communities in the knowledge graph using Louvain, Label Propagation, or Greedy Modularity algorithms; available client-side (Graphology) for the visible subgraph and server-side (NetworkX) for the full graph; cluster results can be visualized with color-by-cluster mode and collapsed into super-nodes
- Data quality violation limits — cap the number of violations displayed per rule (configurable via dropdown, default 10) for faster quality checks
- Per-rule progress tracking — SWRL inference and data quality checks report progress for each individual rule
The Ontology Model view includes a floating AI Assistant (bottom-right of the canvas) that lets you modify your ontology through natural language commands — add entities, remove orphans, list relationships, and more. Conversation history is maintained within the session.
- Deep-linked sidebar sections — shareable URLs, browser Back/Forward support
- Breadcrumb navigation — always see your position (Registry > Domain > Ontology > Section)
- Keyboard shortcuts —
Cmd/Ctrl+Ssave,Cmd/Ctrl+Ksearch,?help overlay - SQL connection pooling — reusable database connections, no per-query TLS handshake
- CSRF protection — double-submit cookie for all state-changing requests
- Structured JSON logging — set
LOG_FORMAT=jsonfor production-grade observability
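In the double-submit-cookie pattern, the server accepts a state-changing request only when a request header echoes the CSRF token it previously set as a cookie; since cross-site attackers cannot read the cookie, they cannot forge the header. A minimal illustrative sketch of that check (not OntoBricks' actual middleware):

```python
import hmac
import secrets

def issue_csrf_token() -> str:
    # Set this value as a cookie; the client must echo it in a request header.
    return secrets.token_urlsafe(32)

def csrf_ok(cookie_token: str, header_token: str) -> bool:
    # Constant-time comparison; request is allowed only if header == cookie.
    return bool(cookie_token) and hmac.compare_digest(cookie_token, header_token)

token = issue_csrf_token()
print(csrf_ok(token, token), csrf_ok(token, "forged-value"))  # True False
```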
OntoBricks exposes the knowledge graph to LLM agents via the Model Context Protocol. Deploy the companion mcp-ontobricks app and connect from Cursor, Claude Desktop, or the Databricks Playground.
Promote domains between Databricks environments with the `scripts/registry_transfer.sh` command-line tool — export a curated subset of domains/versions from a source registry into a .zip, then preview and commit it into the target registry. No UI, no HTTP endpoint. See Registry Import / Export (CLI) for the full reference and examples.
Full documentation is available in docs/. For a comprehensive feature list and architecture details, see INFO.md.