Strata

Content-addressed notebooks for ML and data workflows.

Strata Notebook is an interactive notebook where every cell output is an artifact. Same code + same inputs = instant cache hit. Change one cell, and only that cell and its dependents re-execute — everything else is served from the artifact store in milliseconds.

Docs: forge-labs-dev.github.io/strata

Quick Start

# Docker (recommended)
docker compose up -d --build
# Then open http://localhost:8765

# Or from source
uv sync
cd frontend && npm ci && npm run build && cd ..
uv run strata-server
# Then open http://localhost:8765

Install as a dependency

PyPI publishing is pending. Until then, install directly from Git:

# Strata core (materialization, artifact store, Iceberg scanning):
pip install "strata @ git+https://github.com/forge-labs-dev/strata.git"

# Strata Notebook adds DataFrame/Series/ndarray serialization, display
# outputs, and the cloudpickle-backed object codec:
pip install "strata[notebook] @ git+https://github.com/forge-labs-dev/strata.git"

# Or with uv:
uv add "strata[notebook] @ git+https://github.com/forge-labs-dev/strata.git"

# Pin to a specific commit for reproducibility:
pip install "strata @ git+https://github.com/forge-labs-dev/strata.git@<sha>"

Notebook Features

Content-addressed caching — same code + same inputs = instant cache hit, zero recomputation
Automatic dependency tracking — DAG built from variable analysis, no manual wiring
Cascade execution — change upstream code, downstream cells auto-invalidate
Distributed workers — annotate @worker gpu-fly and the cell dispatches to a remote GPU
Prompt cells — LLM-powered cells with {{ variable }} template injection
AI assistant — streaming chat with conversation memory, agent mode for autonomous notebook building
Environment management — per-notebook Python venvs via uv, isolated from each other
Rich outputs — DataFrames, matplotlib plots, markdown, images
Cell operations — reorder, duplicate, fold, keyboard shortcuts
Headless runner — strata run ./my-notebook for CI and scheduled execution
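
To give a feel for what "DAG built from variable analysis" can mean, here is a minimal sketch that uses Python's standard ast module. It is an illustration only, not Strata's actual analyzer: the helper names cell_names and build_dag are hypothetical, and the sketch ignores scoping (names assigned inside functions count as cell-level definitions).

```python
import ast


def cell_names(source: str) -> tuple[set[str], set[str]]:
    """Return (defined, referenced) names found anywhere in one cell's source."""
    defined: set[str] = set()
    referenced: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                referenced.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
    return defined, referenced


def build_dag(cells: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell id to the ids of earlier cells whose variables it reads."""
    last_writer: dict[str, str] = {}
    deps: dict[str, set[str]] = {}
    for cell_id, source in cells.items():  # cells are visited in notebook order
        defined, referenced = cell_names(source)
        deps[cell_id] = {last_writer[n] for n in referenced if n in last_writer}
        for name in defined:
            last_writer[name] = cell_id
    return deps


cells = {
    "load": "import pandas as pd\nraw = pd.read_csv('data.csv')",
    "clean": "clean = raw.dropna()",
    "train": "model = fit(clean)",
    "report": "print(model, len(raw))",
}
dag = build_dag(cells)
# dag["report"] is {"train", "load"}: it reads model (from train) and raw (from load)
```

Nothing is executed here; the cells are only parsed, so the example runs without pandas installed.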

The Cache Advantage

Most notebook platforms re-execute everything from scratch when you change one cell. Strata doesn't. The artifact store deduplicates by provenance hash: if a cell's code and inputs haven't changed, its result is served from the store instead of being recomputed.

First run:     load data (10s) → clean (3s) → train (20s) → evaluate (1s)  = 34s
Change model:  load data (✓)   → clean (✓)  → train (20s) → evaluate (1s)  = 21s
Re-run:        load data (✓)   → clean (✓)  → train (✓)   → evaluate (✓)   = <1s

This is not a feature bolted on — it's the architecture. Every cell execution is a materialize(inputs, transform) → artifact operation. The cache is correct by construction because it's keyed on content, not time.
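
The timings above can be reproduced with a small simulation of content-keyed caching. This is a sketch under stated assumptions, not Strata's implementation: provenance_key and run_pipeline are hypothetical helpers, SHA-256 is an assumed hash, and the per-step costs are the illustrative seconds from the table.

```python
import hashlib


def provenance_key(code: str, input_keys: list[str]) -> str:
    """Hash a step's code together with the keys of its inputs."""
    h = hashlib.sha256()
    h.update(code.encode())
    for key in input_keys:
        h.update(b"\0")  # separator so adjacent fields cannot run together
        h.update(key.encode())
    return h.hexdigest()


def run_pipeline(steps, store: dict) -> int:
    """Run steps in order and return the simulated seconds spent on cache misses.

    Each step is (name, code, cost_seconds, upstream_names).
    """
    keys: dict[str, str] = {}
    spent = 0
    for name, code, cost, upstream in steps:
        key = provenance_key(code, [keys[u] for u in upstream])
        if key not in store:  # miss: "execute" the step and persist the artifact
            spent += cost
            store[key] = f"artifact for {name}"
        keys[name] = key
    return spent


steps = [
    ("load", "read_csv('events.csv')", 10, []),
    ("clean", "dropna()", 3, ["load"]),
    ("train", "fit(depth=3)", 20, ["clean"]),
    ("evaluate", "score()", 1, ["train"]),
]
store: dict = {}

first = run_pipeline(steps, store)    # 34: everything is a miss
steps[2] = ("train", "fit(depth=5)", 20, ["clean"])
changed = run_pipeline(steps, store)  # 21: train and evaluate re-run
rerun = run_pipeline(steps, store)    # 0: every key is already in the store
```

Because evaluate's key includes train's key, editing train changes both keys, while load and clean keep theirs and stay cached. No timestamps or explicit invalidation are involved.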

Distributed Execution

Each cell can declare which worker it runs on via a single annotation:

# @worker my-gpu
embeddings = model.encode(abstracts, batch_size=256)

You define workers in notebook.toml — each one points at an HTTP endpoint that implements the Strata executor protocol. A worker can be a GPU box on RunPod, a DataFusion cluster on Fly, a beefy EC2 instance, or anything else that speaks HTTP. The notebook routes the cell to the declared worker at execution time, and the UI shows a live "dispatching → my-gpu" badge while it runs.

No deployment code, no infrastructure glue. Bring your own compute, one annotation per cell.

Strata Core

The notebook is built on Strata Core, a standalone materialization and artifact layer that can be used independently as a Python library and REST API:

from strata import StrataClient

client = StrataClient()
artifact = client.materialize(
    inputs=["file:///warehouse#db.events"],
    transform={"executor": "scan@v1", "params": {"columns": ["id", "value"]}},
)
table = client.fetch(artifact.uri)  # Arrow table, cached by provenance

Core provides: provenance-based deduplication, immutable versioned artifacts, lineage tracking, Iceberg table scanning with row-group caching, pluggable blob storage (local/S3/GCS/Azure), multi-tenancy, trusted proxy auth, and an executor protocol for external compute.

Core documentation →

Architecture

┌─────────────────────────────────────────────┐
│ Notebook UI (Vue.js + WebSocket)            │
│ cells, DAG view, AI assistant, workers      │
└─────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────┐
│ Notebook Backend (FastAPI)                  │
│ session, cascade, executor, prompt cells    │
└─────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────┐
│ Strata Core                                 │
│ materialize, artifacts, lineage, dedupe     │
└─────────────────────────────────────────────┘

The notebook is an orchestration layer over Core. It decides what to run next (cascade planning, staleness tracking). The cell harness is an executor. Core decides whether results already exist and persists them.
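
The "decides what to run next" part can be pictured as a walk over the cell DAG. The sketch below is one possible way to plan a cascade, using the standard library's graphlib; plan_cascade and the example graph are hypothetical and are not taken from Strata's code.

```python
from graphlib import TopologicalSorter


def plan_cascade(deps: dict[str, set[str]], changed: str) -> list[str]:
    """Return the changed cell plus all transitive dependents, in run order.

    deps maps each cell id to the ids of the cells it depends on.
    """
    order = list(TopologicalSorter(deps).static_order())
    stale = {changed}
    for cell in order:
        # Dependencies come earlier in topological order, so one pass suffices.
        if deps.get(cell, set()) & stale:
            stale.add(cell)
    return [cell for cell in order if cell in stale]


deps = {
    "load": set(),
    "clean": {"load"},
    "train": {"clean"},
    "evaluate": {"train"},
    "plot": {"clean"},
}

plan = plan_cascade(deps, "train")  # ["train", "evaluate"]
```

In this picture the planner only marks cells as stale and orders them. Whether a stale cell actually re-executes is still up to Core, which may already hold an artifact for the same code and inputs.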

Development

uv sync                          # Install deps + build Rust extension
uv run pytest                    # Run all tests
pre-commit run --all-files       # Lint + format
cd frontend && npm run dev       # Frontend dev server (hot reload)

License

Apache 2.0
