phlohouse/phlo

Phlo

Modern data lakehouse platform. Plugin-driven. Storage-agnostic.

Python 3.11+

Features

  • Template-first projects: phlo init creates focused starters for CSV, REST APIs, dbt medallion projects, Sling replication, and observability demos
  • Decorator-driven ingestion: phlo-dlt and phlo-sling register assets without hand-written Dagster boilerplate
  • Type-safe quality contracts: phlo-pandera schemas validate data before it lands in managed tables
  • Capability plugins: packages contribute services, CLI commands, assets, resources, catalogs, hooks, and Observatory surfaces through Python entry points
  • Storage-agnostic data plane: Iceberg, Delta, ClickHouse, Trino, MinIO, RustFS, Nessie, and PostgreSQL can be composed as needed
  • Operator surfaces: phlo-api, Observatory, MCP, PostgREST, Hasura, Superset, pgweb, and observability packages expose runtime state and actions
  • Local-first operations: phlo services generates and runs the project stack through Docker or Podman

What It Looks Like

from pathlib import Path

import pandas as pd
import pandera.pandas as pa
from pandera.typing import Series

from phlo_dlt.decorator import phlo_ingestion


class EventsSchema(pa.DataFrameModel):
    id: Series[int] = pa.Field(ge=1)
    name: Series[str]
    value: Series[int] = pa.Field(ge=0)

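    class Config:
        strict = True  # reject columns that are not declared in the schema
        coerce = True  # cast columns to the declared dtypes before validating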


@phlo_ingestion(
    table_name="events",
    unique_key="id",
    validation_schema=EventsSchema,
    group="demo",
    freshness_hours=(1, 24),
)
def events(partition_date: str):
    return pd.read_csv(Path("data/events.csv"))
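
To show the general idea behind decorator-driven registration, here is a minimal, standard-library sketch of a decorator that records a function and its metadata in a registry at import time. It is an illustration only and not how `phlo_ingestion` is implemented; `AssetSpec`, `REGISTRY`, and `ingestion` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AssetSpec:
    """Metadata captured when a function is decorated (hypothetical shape)."""
    table_name: str
    unique_key: str
    group: str
    fn: Callable[[str], object]


# Hypothetical in-process registry; a real orchestrator would consume these specs.
REGISTRY: dict[str, AssetSpec] = {}


def ingestion(table_name: str, unique_key: str, group: str = "default"):
    """Return a decorator that registers the wrapped function under `table_name`."""
    def decorator(fn: Callable[[str], object]) -> Callable[[str], object]:
        REGISTRY[table_name] = AssetSpec(table_name, unique_key, group, fn)
        return fn  # the function itself is left unchanged
    return decorator


@ingestion(table_name="events", unique_key="id", group="demo")
def events(partition_date: str):
    # Stand-in rows; the README example reads a CSV file instead.
    return [{"id": 1, "name": "signup", "value": 3}]


spec = REGISTRY["events"]
rows = spec.fn("2026-05-04")
print(spec.group, len(rows))  # prints: demo 1
```

Because registration happens as a side effect of decoration, importing the module is enough for a host application to learn which assets exist.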

Prerequisites

  • uv — Python package manager
  • Docker — Container runtime

Quick Start

# Install with default plugins
uv pip install "phlo[defaults]"  # quotes keep shells such as zsh from globbing the brackets

# Initialize a runnable local starter
phlo init my-project --template csv-batch
cd my-project

# Generate service configuration, start services, and materialize
phlo services init
phlo services start
phlo materialize dlt_events --partition 2026-05-04
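
If you would rather script the last three steps, the sketch below wraps the same commands with `subprocess`. It uses only the commands shown above; the helper names `quick_start_commands` and `run_all` are hypothetical, and the default dry-run mode only renders the commands without executing them.

```python
import shlex
import subprocess
from pathlib import Path


def quick_start_commands(asset: str, partition: str) -> list[list[str]]:
    """Return the Quick Start commands as argv lists, in execution order."""
    return [
        ["phlo", "services", "init"],
        ["phlo", "services", "start"],
        ["phlo", "materialize", asset, "--partition", partition],
    ]


def run_all(commands: list[list[str]], cwd: Path, dry_run: bool = True) -> list[str]:
    """Render each command; execute it inside `cwd` unless dry_run is set."""
    rendered = []
    for argv in commands:
        rendered.append(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)
    return rendered


plan = run_all(quick_start_commands("dlt_events", "2026-05-04"), cwd=Path("my-project"))
print("\n".join(plan))
```

Pass `dry_run=False` to run the steps for real from inside the project directory.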

Documentation

The documentation source lives in docs/, starting at docs/index.md. The published site is generated with pymdx, matching the GitHub Pages workflow:

pymdx generate src/phlo --docs docs --output docs-site
pymdx build docs-site

Development

uv pip install -e .    # Install Phlo in dev mode
make check             # Lint, format, typecheck, and test (parallel)

# Services
phlo services start    # Start infrastructure
phlo services stop     # Stop services
phlo services logs -f  # View logs

# Individual gates
uv run ruff check .    # Lint
uv run ruff format .   # Format
uv run ty check        # Typecheck
uv run pytest          # Test

Architecture

Phlo is a monorepo of composable packages — install only what you need:

| Layer | Packages |
| --- | --- |
| Orchestration | phlo-dagster |
| Ingestion | phlo-dlt, phlo-sling |
| Quality | phlo-pandera |
| Transforms | phlo-dbt |
| Table formats | phlo-iceberg, phlo-delta, phlo-clickhouse |
| Infrastructure | phlo-traefik, phlo-postgres, phlo-oauth2-proxy |
| Storage | phlo-minio, phlo-rustfs |
| Catalog | phlo-nessie, phlo-openmetadata |
| Query | phlo-trino |
| Observability | phlo-otel, phlo-clickstack, phlo-grafana, phlo-prometheus, phlo-loki, phlo-alloy, phlo-alerting |
| UI | phlo-observatory, phlo-pgweb, phlo-superset |
| API | phlo-api, phlo-mcp, phlo-hasura, phlo-postgrest |
| Dev/Test | phlo-testing |
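
Since the packages are installed individually, it can be handy to check which of them are present in the current environment. The following is a small standard-library sketch, assuming the distributions are published under the `phlo-` names listed above; `installed_packages` is a hypothetical helper, not part of Phlo.

```python
from importlib.metadata import distributions


def installed_packages(prefix: str = "phlo-") -> list[str]:
    """Return sorted, normalized names of installed distributions starting with `prefix`."""
    names: set[str] = set()
    for dist in distributions():
        raw = dist.metadata.get("Name")
        if not raw:
            continue  # skip distributions with missing metadata
        # Normalize case and underscores so "Phlo_DLT" and "phlo-dlt" compare equal.
        name = raw.lower().replace("_", "-")
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


print(installed_packages())
```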

About

Modern data lakehouse platform — plugin-driven, storage-agnostic, with decorator-driven ingestion and write-audit-publish patterns
