NYCPlanning/data-engineering

Data Engineering Team

Primary repository for the Data Engineering team at NYC Department of City Planning (DCP). We build and maintain geospatial and tabular data products for internal and external use.

Also maintained: Product Metadata — specifications for DCP datasets.

Repo structure

Path               Purpose
dcpy/              Core Python package: lifecycle orchestration, connectors, utilities
products/          One folder per data product: code, dbt models, recipe files, README
ingest_templates/  YAML specs for extracting and archiving source datasets
apps/              Docker Compose services: Dagster, QA app, notebook server
docs/              Technical reference (see below)
experimental/      Sandbox for prototyping; not production code
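
Because products/ holds one folder per data product, the set of products can be enumerated from a local checkout. The snippet below is a minimal sketch of that idea; `list_products` is a hypothetical helper written for this README, not a function provided by dcpy, and it assumes every subdirectory of products/ is a product.

```python
from pathlib import Path


def list_products(repo_root: str) -> list[str]:
    """Return the names of the folders directly under <repo_root>/products.

    Hypothetical helper: it assumes each subdirectory is one data product
    and does not filter out hidden or helper folders.
    """
    products_dir = Path(repo_root) / "products"
    return sorted(p.name for p in products_dir.iterdir() if p.is_dir())
```

Calling `list_products(".")` from the repository root would return the product folder names in alphabetical order.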

Data products

Each product lives under products/<name>/ and follows a standard pipeline from source data to public distribution:

Ingest → Build → Draft → QA → Publish

  1. Ingest — extract source datasets from APIs or files and archive to edm-recipes (S3)
  2. Build — load archived data into Postgres, run dbt/SQL transforms
  3. Draft — promote build output to the S3 draft folder; run automated QA checks
  4. QA — domain experts and GIS team review; address issues and rebuild as needed
  5. Publish — promote approved draft to the publish folder for distribution
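
The stage order above, including the rebuild loop out of QA, can be sketched as a small state model. This is illustrative only: `Stage`, `ORDER`, and `allowed_next` are hypothetical names, not part of dcpy, and the actual orchestration in this repository may be structured differently.

```python
from enum import Enum


class Stage(Enum):
    INGEST = "Ingest"
    BUILD = "Build"
    DRAFT = "Draft"
    QA = "QA"
    PUBLISH = "Publish"


# Forward order of the lifecycle, as listed in this README.
ORDER = list(Stage)


def allowed_next(stage: Stage) -> list[Stage]:
    """Return the stages that may follow `stage` in this simplified model."""
    idx = ORDER.index(stage)
    following = ORDER[idx + 1 : idx + 2]  # empty for the last stage
    if stage is Stage.QA:
        # Issues found in QA can send the product back for a rebuild.
        following = [Stage.BUILD] + following
    return following
```

In this model `allowed_next(Stage.QA)` yields both Build and Publish, reflecting that a draft is either approved or sent back for a rebuild, while `allowed_next(Stage.PUBLISH)` is empty.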

Workflow diagram

For the full workflow including GIS team review and issue tracking conventions, see docs/data-update-workflow.md.

Getting started

See the Developer Setup wiki page for environment setup (Docker dev container recommended; manual uv/venv also documented).

Technical reference (docs/)

Documentation & team resources (wiki)

The wiki covers team and operational content: About Us · Cloud Infrastructure · Data Catalog · Data Glossary · Developer Conventions · Environment Management · Product pages
