
Add calculate variable usage analytics #1499

Merged
anth-volk merged 17 commits into main from codex/calculate-variable-usage-analytics
May 12, 2026

Conversation

@anth-volk
Collaborator

@anth-volk anth-volk commented May 8, 2026

Fixes #1498
Fixes #1500

Summary

Adds privacy-safe variable usage analytics for authenticated /calculate requests and introduces Alembic-managed schema changes for the analytics database. The existing visits request-count analytics remains intact.

Calculate variable analytics

  • Adds calculate_requests linked to visits.id.
  • Adds grouped calculate_request_variables rows for variable name, source, entity type, period granularity, counts, and availability status.
  • Extracts variables before validation and deprecated-input dropping so unsupported and deprecated allowlisted variables are captured.
  • Avoids storing household values, entity IDs, member relationships, axis bounds, exact period keys, request bodies, or response bodies.
  • Stores canonical entity types, such as person, tax_unit, or household, and collapses unknown caller-supplied groups to unknown.
  • Caps stored analytics variable names at 250 characters followed by a literal "..." suffix, and stores variable_name_truncated=true when capping occurs.
  • Flattens variable extraction helpers so entity-group walking, variable filtering, and accumulator writes are separated.
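A minimal sketch of the extraction rules listed above, for reviewers who want the shape of the stored data. This is not the code in this PR: the function names, the plural-key mapping, and the period-shape classifier are all assumptions made for illustration. Only the 250-character cap, the "..." suffix, the truncation flag, and the unknown fallback come from the description.

```python
from collections import Counter

MAX_NAME_LENGTH = 250  # cap described in the PR
TRUNCATION_SUFFIX = "..."

# Hypothetical mapping from plural payload keys to canonical entity types.
CANONICAL_ENTITY_TYPES = {
    "people": "person",
    "tax_units": "tax_unit",
    "households": "household",
}


def cap_variable_name(name: str) -> tuple[str, bool]:
    """Return (stored_name, truncated) with long names capped plus a suffix."""
    if len(name) <= MAX_NAME_LENGTH:
        return name, False
    return name[:MAX_NAME_LENGTH] + TRUNCATION_SUFFIX, True


def period_granularity(period_key: str) -> str:
    """Classify a period key by shape only; the key itself is never stored."""
    parts = period_key.split("-")
    if all(part.isdigit() for part in parts):
        return {1: "year", 2: "month", 3: "day"}.get(len(parts), "unknown")
    return "unknown"


def summarize_variable_usage(household: dict) -> Counter:
    """Count (name, truncated, entity_type, granularity) groups.

    Values, entity IDs, and exact period keys are read but never recorded.
    """
    usage = Counter()
    for group_key, members in household.items():
        if not isinstance(members, dict):
            continue  # skips non-entity keys such as an axes list
        entity_type = CANONICAL_ENTITY_TYPES.get(group_key, "unknown")
        for variables in members.values():  # member IDs are ignored
            if not isinstance(variables, dict):
                continue
            for variable_name, periods in variables.items():
                if not isinstance(periods, dict):
                    continue  # skips relationship lists such as members
                stored_name, truncated = cap_variable_name(str(variable_name))
                for period_key in periods:
                    granularity = period_granularity(str(period_key))
                    usage[(stored_name, truncated, entity_type, granularity)] += 1
    return usage
```

Grouping by a tuple of non-sensitive attributes keeps the stored rows aggregate by construction: nothing that identifies a household reaches the counter.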

Required analytics behavior

  • ANALYTICS__ENABLED=false skips analytics entirely.
  • ANALYTICS__ENABLED=true makes analytics a required runtime dependency.
  • API startup fails if the analytics DB is unreachable, missing required tables/columns, or not stamped at the configured Alembic head.
  • Request-time analytics write failures propagate like other API failures instead of silently continuing.
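The startup gate described above can be sketched as follows. This is a simplified illustration against SQLite using only the standard library, not the check in analytics_setup.py. The function and exception names are hypothetical. The alembic_version table and its version_num column are Alembic's defaults.

```python
import sqlite3


class AnalyticsSchemaError(RuntimeError):
    """Raised when the analytics DB does not match the expected schema."""


def verify_analytics_schema(
    conn: sqlite3.Connection,
    expected_head: str,
    required_columns: dict[str, list[str]],
) -> None:
    """Fail fast unless the DB is stamped at expected_head and has the columns."""
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    }
    if "alembic_version" not in tables:
        raise AnalyticsSchemaError("analytics DB has no alembic_version table")

    row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    current = row[0] if row else None
    if current != expected_head:
        raise AnalyticsSchemaError(
            f"analytics DB is at {current!r}, expected {expected_head!r}"
        )

    for table, columns in required_columns.items():
        if table not in tables:
            raise AnalyticsSchemaError(f"missing table: {table}")
        # Table names come from trusted application config, not user input.
        present = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}
        missing = sorted(set(columns) - present)
        if missing:
            raise AnalyticsSchemaError(f"{table} is missing columns: {missing}")
```

Raising instead of logging is the point of the design: with ANALYTICS__ENABLED=true, a mismatched schema stops the API from starting rather than surfacing later as request-time write failures.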

Alembic migration infrastructure

  • Adds Alembic config, env wiring, and ordered revisions for the analytics database.
  • Adds a hand-authored baseline revision for the existing production visits table.
  • Regenerates the calculate variable usage schema migration with uv run alembic revision --autogenerate --rev-id 20260508_0002 -m "calculate variable usage" against a temp DB upgraded to the baseline revision.
  • Adds 20260512_0003 with uv run alembic revision --autogenerate --rev-id 20260512_0003 -m "cap analytics variable names", then adjusts the generated boolean column to backfill existing rows safely.
  • Adds model-agnostic AI guidance: AGENTS.md is the shared entry point, CLAUDE.md and .github/copilot-instructions.md are thin adapters, and database migration guidance lives at docs/engineering/skills/database-migrations.md.
  • Requires future schema migrations to be created with uv run alembic revision --autogenerate -m "Description", with only the documented existing-production-schema baseline exception.
  • Removes production reliance on db.create_all() for analytics schema changes.
  • Requires the analytics database to be stamped at the current Alembic head before the API starts with analytics enabled.
  • Adds deploy workflow migration steps before App Engine staging/production deploys.
  • Fails staging/production deploys before App Engine deployment if required analytics DB secrets are missing while ANALYTICS__ENABLED=true.
  • Enables analytics for staging deploys so staging exercises the migrated runtime path.
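The description does not show how 20260512_0003 was adjusted to backfill existing rows. One common approach, sketched here as an assumption, is to give the new NOT NULL boolean column a server-side default so that rows written before the migration read as false. The example uses plain SQL through sqlite3 rather than Alembic's op API, and the table definition in the usage is illustrative only.

```python
import sqlite3


def add_truncated_flag(conn: sqlite3.Connection) -> None:
    """Add variable_name_truncated so pre-existing rows default to false."""
    conn.execute(
        "ALTER TABLE calculate_request_variables "
        "ADD COLUMN variable_name_truncated BOOLEAN NOT NULL DEFAULT 0"
    )
    conn.commit()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE calculate_request_variables ("
        "id INTEGER PRIMARY KEY, variable_name TEXT NOT NULL)"
    )
    demo.execute(
        "INSERT INTO calculate_request_variables (variable_name) "
        "VALUES ('employment_income')"
    )
    add_truncated_flag(demo)
    print(demo.execute("SELECT * FROM calculate_request_variables").fetchall())
```

Without the default, adding a NOT NULL column to a table that already holds rows fails on most databases, which is why an autogenerated revision for this kind of column usually needs a manual adjustment.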

Request validation

  • Invalid axis names now return HTTP 400 before variable validation can raise Python membership errors.
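To illustrate the failure mode: a membership test such as name in known_variables raises TypeError when name is unhashable (for example a list) and known_variables is a set or dict. A hedged sketch of validating axis names first follows. The function name and the nested-list axes shape are assumptions, not the code in this PR.

```python
from typing import Optional


def validate_axis_names(axes: object, known_variables: set[str]) -> Optional[str]:
    """Return an error message for the first invalid axis name, else None.

    The caller is expected to turn a non-None result into a 400 response.
    """
    if not isinstance(axes, list):
        return "axes must be a list"
    for group in axes:
        if not isinstance(group, list):
            return "each axes entry must be a list of axis objects"
        for axis in group:
            name = axis.get("name") if isinstance(axis, dict) else None
            # Check the type before any membership test so an unhashable
            # value can never reach the `in` operator.
            if not isinstance(name, str):
                return "each axis must have a string 'name'"
            if name not in known_variables:
                return "axis name is not a known variable"
    return None
```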

Test structure

  • Moves reusable patching and mock setup from touched test files into tests/fixtures/.
  • Registers those reusable fixture modules from the root tests/conftest.py, avoiding nested pytest plugin loaders.
  • Keeps endpoint analytics assertions in test files while fixture modules own the database/JWT/tracer patching.

Manual rollout required

Existing analytics databases that already have visits but no alembic_version table must be stamped exactly once before the workflow migration step can run successfully. Run this manually before the first deploy that includes Alembic; otherwise uv run alembic upgrade head will try to create the existing visits table and fail:

uv run alembic stamp 20260508_0001
uv run alembic upgrade head

This should be done first for staging, then for production after staging verification. Staging and production must configure USER_ANALYTICS_DB_CONNECTION_NAME, USER_ANALYTICS_DB_USERNAME, and USER_ANALYTICS_DB_PASSWORD; missing secrets fail the deploy before the App Engine version is created.
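The secret check can be pictured as a small pre-deploy guard. The sketch below is illustrative only: the function name is hypothetical and the actual workflow step may be written differently. Only the environment variable names are taken from this description.

```python
from typing import Mapping

REQUIRED_ANALYTICS_SECRETS = (
    "USER_ANALYTICS_DB_CONNECTION_NAME",
    "USER_ANALYTICS_DB_USERNAME",
    "USER_ANALYTICS_DB_PASSWORD",
)


def missing_analytics_secrets(env: Mapping[str, str]) -> list[str]:
    """Return required secrets that are unset or empty.

    When analytics is disabled, nothing is required and the list is empty.
    """
    if env.get("ANALYTICS__ENABLED", "").strip().lower() != "true":
        return []
    return [name for name in REQUIRED_ANALYTICS_SECRETS if not env.get(name)]
```

A deploy step could call missing_analytics_secrets(os.environ) and exit non-zero when the list is non-empty, so the failure happens before any App Engine version is created.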

Deferred

Daily rollups are intentionally left out of this PR and should be handled separately with offline aggregation or atomic database upsert semantics.
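For reference, "atomic database upsert semantics" means incrementing a counter in a single statement instead of reading and then writing, which can lose updates under concurrent requests. A minimal sketch using SQLite's ON CONFLICT clause follows. The daily_variable_usage table and both function names are hypothetical, since no rollup schema is part of this PR.

```python
import sqlite3


def create_rollup_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical daily rollup table keyed by day and variable."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_variable_usage ("
        "day TEXT NOT NULL, "
        "variable_name TEXT NOT NULL, "
        "request_count INTEGER NOT NULL, "
        "PRIMARY KEY (day, variable_name))"
    )


def record_daily_usage(
    conn: sqlite3.Connection, day: str, variable_name: str, count: int = 1
) -> None:
    """Insert a new row or add to the existing count in one statement."""
    conn.execute(
        "INSERT INTO daily_variable_usage (day, variable_name, request_count) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT(day, variable_name) DO UPDATE SET "
        "request_count = request_count + excluded.request_count",
        (day, variable_name, count),
    )
    conn.commit()
```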

Tests

  • uv run ruff format --check .
  • uv run ruff check alembic policyengine_household_api/data/analytics_setup.py policyengine_household_api/decorators/analytics.py policyengine_household_api/utils/config_loader.py tests/unit/data/test_alembic_migrations.py
  • ANALYTICS_DATABASE_URL=sqlite:////private/tmp/household_api_alembic_smoke.db uv run alembic upgrade head
  • uv run pytest -q tests/unit/data/test_alembic_migrations.py tests/unit/data/test_analytics_setup.py
  • uv run pytest -q tests/unit/decorators/test_analytics.py tests/unit/endpoints/test_household.py
  • uv run pytest -q --confcutdir=tests/unit/utils tests/unit/utils/test_variable_usage_analytics.py
  • uv run pytest -q tests/unit/data/test_alembic_migrations.py tests/unit/utils/test_config_loader.py
  • uv run pytest -q --timeout=150 .github/scripts tests/to_refactor tests/unit
  • uv run pytest -q tests/unit/data/test_analytics_setup.py tests/unit/data/test_alembic_migrations.py tests/unit/decorators/test_analytics.py
  • uv run ruff format --check policyengine_household_api/data/analytics_setup.py tests/unit/data/test_analytics_setup.py
  • uv run ruff check policyengine_household_api/data/analytics_setup.py
  • uv run pytest -q tests/unit/data/test_analytics_setup.py tests/unit/data/test_alembic_migrations.py tests/unit/decorators/test_analytics.py tests/unit/endpoints/test_household.py tests/unit/utils/test_variable_usage_analytics.py
  • uv run pytest -q tests/unit/data/test_analytics_setup.py tests/unit/decorators/test_analytics.py tests/unit/endpoints/test_household.py tests/unit/data/test_alembic_migrations.py tests/unit/utils/test_variable_usage_analytics.py
  • uv run ruff format --check policyengine_household_api/data/analytics_setup.py policyengine_household_api/decorators/analytics.py tests/unit/data/test_analytics_setup.py tests/unit/decorators/test_analytics.py
  • uv run ruff check policyengine_household_api/data/analytics_setup.py policyengine_household_api/decorators/analytics.py
  • ANALYTICS_DATABASE_URL=sqlite:////tmp/household_api_alembic_smoke_autogen.db uv run alembic upgrade head
  • uv run pytest -q tests/unit/data/test_alembic_migrations.py tests/unit/data/test_analytics_setup.py tests/unit/decorators/test_analytics.py tests/unit/endpoints/test_household.py tests/unit/utils/test_variable_usage_analytics.py
  • uv run ruff check alembic/versions/20260508_0002_calculate_variable_usage.py policyengine_household_api/data/analytics_setup.py policyengine_household_api/decorators/analytics.py
  • uv run ruff format --check alembic/versions/20260508_0002_calculate_variable_usage.py policyengine_household_api/data/analytics_setup.py policyengine_household_api/decorators/analytics.py
  • uv run pytest -q tests/unit/data/test_alembic_migrations.py
  • uv run ruff check policyengine_household_api/utils/variable_usage_analytics.py tests/fixtures/data/analytics_setup.py tests/fixtures/data/analytics_setup_patches.py tests/fixtures/decorators/analytics_patches.py tests/fixtures/endpoints/household.py tests/conftest.py
  • uv run ruff check tests/conftest.py tests/unit/data/test_analytics_setup.py tests/unit/decorators/test_analytics.py tests/unit/endpoints/test_household.py
  • uv run pytest -q tests/unit/data/test_analytics_setup.py tests/unit/decorators/test_analytics.py tests/unit/endpoints/test_household.py tests/unit/utils/test_variable_usage_analytics.py tests/unit/data/test_alembic_migrations.py
  • uv run ruff check policyengine_household_api/endpoints/household.py policyengine_household_api/utils/variable_validation.py policyengine_household_api/utils/variable_usage_analytics.py policyengine_household_api/decorators/analytics.py policyengine_household_api/data/models.py policyengine_household_api/data/analytics_setup.py alembic/versions/20260512_0003_cap_analytics_variable_names.py tests/fixtures/data/analytics_setup_patches.py tests/unit/data/test_alembic_migrations.py tests/unit/endpoints/test_household.py tests/unit/utils/test_variable_usage_analytics.py
  • ANALYTICS_DATABASE_URL=sqlite:////private/tmp/household_api_alembic_verify_20260512_0003.db uv run alembic upgrade head
  • uv run pytest -q tests/unit/endpoints/test_household.py tests/unit/utils/test_variable_validation.py tests/unit/utils/test_variable_usage_analytics.py tests/unit/data/test_alembic_migrations.py tests/unit/data/test_analytics_setup.py tests/unit/decorators/test_analytics.py
  • git diff --check

@anth-volk anth-volk requested a review from hua7450 May 12, 2026 16:51
@anth-volk anth-volk marked this pull request as ready for review May 12, 2026 20:39
@anth-volk anth-volk merged commit a367069 into main May 12, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

  • Manage analytics database schema with Alembic
  • Track privacy-safe variable usage for calculate requests

2 participants