Primary repository for the Data Engineering team at NYC Department of City Planning (DCP). We build and maintain geospatial and tabular data products for internal and external use.
Also maintained: Product Metadata — specifications for DCP datasets.
| Path | Purpose |
|---|---|
| `dcpy/` | Core Python package: lifecycle orchestration, connectors, utilities |
| `products/` | One folder per data product: code, dbt models, recipe files, README |
| `ingest_templates/` | YAML specs for extracting and archiving source datasets |
| `apps/` | Docker Compose services: Dagster, QA app, notebook server |
| `docs/` | Technical reference (see below) |
| `experimental/` | Sandbox for prototyping; not production code |
Each product lives under products/<name>/ and follows a standard pipeline from source data to public distribution:
Ingest → Build → Draft → QA → Publish
- Ingest: extract source datasets from APIs or files and archive to `edm-recipes` (S3)
- Build: load archived data into Postgres, run dbt/SQL transforms
- Draft: promote build output to the S3 `draft` folder; run automated QA checks
- QA: domain experts and GIS team review; address issues and rebuild as needed
- Publish: promote approved draft to the `publish` folder for distribution
For the full workflow including GIS team review and issue tracking conventions, see docs/data-update-workflow.md.
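
The stage order above, including the rebuild loop from QA, can be pictured as a small state machine. The sketch below is illustrative only: `Stage`, `TRANSITIONS`, `advance`, and `happy_path` are hypothetical names chosen for this example and are not part of `dcpy`; the real orchestration may be organized quite differently.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages named in this README (hypothetical model, not dcpy code)."""

    INGEST = "ingest"
    BUILD = "build"
    DRAFT = "draft"
    QA = "qa"
    PUBLISH = "publish"


# Assumed transitions: each stage feeds the next, and QA may send a
# product back to BUILD when reviewers find issues that need a rebuild.
TRANSITIONS = {
    Stage.INGEST: {Stage.BUILD},
    Stage.BUILD: {Stage.DRAFT},
    Stage.DRAFT: {Stage.QA},
    Stage.QA: {Stage.BUILD, Stage.PUBLISH},
    Stage.PUBLISH: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def happy_path() -> list[Stage]:
    """Walk a product from ingest to publish with no rebuilds."""
    path = [Stage.INGEST]
    for target in (Stage.BUILD, Stage.DRAFT, Stage.QA, Stage.PUBLISH):
        path.append(advance(path[-1], target))
    return path
```

Modeling the QA-to-Build edge explicitly reflects the "rebuild as needed" step: a product can cycle through Build, Draft, and QA several times before it is published.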
See the Developer Setup wiki page for environment setup (Docker dev container recommended; manual uv/venv also documented).
- dbt project conventions: model layers, materialization, geometry standards, linting
- dcpy package structure: module layers and import rules
- Bash scripts & CLI tools: available utilities on `PATH`
- Data update workflow: full build-to-publish lifecycle
The wiki covers team and operational content: About Us · Cloud Infrastructure · Data Catalog · Data Glossary · Developer Conventions · Environment Management · Product pages
