Structured error reporting + parallel fetch for zeeker 0.7.0 #7
Merged
Resource failures now surface with full tracebacks and per-resource
attribution instead of flat error strings, and the output is designed
for both humans (rich table, colored streaming lines) and AI agents
(--json, stable [OK]/[FAIL]/[SKIP] prefixes, CWD-relative paths in
tracebacks).

Core changes:
- types.py: add ResourceOutcome / BuildReport dataclasses; extend
  ValidationResult additively with tracebacks, report, records
- processor.py, deployer.py: capture traceback.format_exc() at every
  except, so real Python stacks survive to the CLI
- builder.py: aggregate a BuildReport with per-resource timings and
  status; accept an optional progress_callback so the CLI can drive
  streaming output
- helpers.py: render_build_report / render_resource_event; JSON,
  rich-table TTY, and plain non-TTY outputs share a single schema
- cli.py: -v/--verbose and --json on build and deploy; documented exit
  codes (0 success, 1 resource failure, 2 fatal)

Based on user feedback from a week of real builds, two watchability
features go in with the refactor:
- --fail-on-empty: treats resources that returned [] as failures
  (exit 1). Addresses the silent-success case where a resource swallows
  its own exception and produces zero rows.
- --progress-file <path>: atomically overwrites a JSON snapshot of
  BuildReport after each resource finishes. Lets trigger-and-wait
  callers distinguish "host asleep" from "build running" without
  parsing stdout.

Test suite updated (54 CLI tests, 353 total) and rich>=13.0 added.
Cross-resource parallelism and --post-hook explicitly left for
follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
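For reviewers wondering what "atomically overwrites" can look like in practice, here is a minimal sketch. It is not taken from the zeeker source: the helper name write_progress_snapshot and the shape of the snapshot dict are assumptions for illustration. It relies only on the standard guarantee that os.replace swaps a file in a single step when source and destination are on the same filesystem.

```python
import json
import os
import tempfile


def write_progress_snapshot(path: str, snapshot: dict) -> None:
    """Atomically replace `path` with a JSON dump of `snapshot` (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(snapshot, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # A reader polling `path` sees either the old snapshot or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A trigger-and-wait caller can then poll the file and parse it with json.load without ever observing a half-written document.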
Targets four friction points from a week of real builds:
1. Cross-resource parallelism (--parallel N)
DatabaseBuilder.build_database gains max_parallel. When > 1, fetch_data()
calls for all resources are pre-warmed under asyncio.gather +
Semaphore(N) before the existing sequential insert loop runs. Sync
fetchers run in the default ThreadPoolExecutor so they participate
alongside async ones. DB writes stay strictly sequential to avoid
SQLite contention. A 4-resource project sleeping 0.3s per fetch drops
from ~1.2s to <1.0s with --parallel 4 (test_builder_parallel.py).
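The pre-warm phase described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in builder.py: prefetch_all, async_fetch, and sync_fetch are hypothetical names, and error handling is omitted (an exception in any fetcher would propagate out of gather here).

```python
import asyncio
import inspect
import time


async def prefetch_all(fetchers: dict, max_parallel: int) -> dict:
    """Run every fetcher with at most `max_parallel` in flight (hypothetical helper)."""
    semaphore = asyncio.Semaphore(max_parallel)
    loop = asyncio.get_running_loop()

    async def run_one(name, fetch):
        async with semaphore:
            if inspect.iscoroutinefunction(fetch):
                return name, await fetch()
            # Sync fetchers run in the loop's default ThreadPoolExecutor.
            return name, await loop.run_in_executor(None, fetch)

    pairs = await asyncio.gather(*(run_one(n, f) for n, f in fetchers.items()))
    return dict(pairs)


async def async_fetch():
    await asyncio.sleep(0.1)
    return [{"id": 1}]


def sync_fetch():
    time.sleep(0.1)
    return [{"id": 2}]


if __name__ == "__main__":
    fetchers = {"a": async_fetch, "b": sync_fetch}
    results = asyncio.run(prefetch_all(fetchers, max_parallel=4))
    # Inserts would then happen one resource at a time over `results`.
    for name, rows in results.items():
        print(name, len(rows))
```

Keeping the insert loop outside the gather is what lets the fetches overlap while the SQLite writes stay sequential.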
2. Mid-pipeline hook (--post-hook CMD)
New commands/post_hook.py runs a shell command after a successful
build with ZEEKER_DB_PATH, ZEEKER_DB_NAME, ZEEKER_PROJECT_PATH,
ZEEKER_BUILD_STATUS, and ZEEKER_BUILD_REPORT env vars set. Non-zero
hook exit propagates to CLI exit 1. Outcome is part of BuildReport
(and therefore JSON output and --progress-file).
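A hook runner along these lines would satisfy the description above. It is a sketch, not the contents of commands/post_hook.py: the function names and parameters are assumptions, and the message does not say whether ZEEKER_BUILD_REPORT carries JSON text or a file path, so the sketch passes a string through unchanged.

```python
import os
import subprocess


def run_post_hook(command, db_path, db_name, project_path, build_status, build_report):
    """Run `command` through the shell with ZEEKER_* variables set; return its exit code."""
    env = dict(os.environ)
    env.update(
        {
            "ZEEKER_DB_PATH": db_path,
            "ZEEKER_DB_NAME": db_name,
            "ZEEKER_PROJECT_PATH": project_path,
            "ZEEKER_BUILD_STATUS": build_status,
            # Assumption: the report is handed over as an opaque string.
            "ZEEKER_BUILD_REPORT": build_report,
        }
    )
    completed = subprocess.run(command, shell=True, env=env, check=False)
    return completed.returncode


def cli_exit_code(hook_returncode: int) -> int:
    """Collapse any non-zero hook exit to CLI exit 1, as the commit message describes."""
    return 0 if hook_returncode == 0 else 1
```

For example, run_post_hook('echo "$ZEEKER_DB_NAME"', ...) would print the database name from inside the hook's shell.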
3. Safer --sync-from-s3 (--force-sync)
Sync refuses to download when a local DB already exists; the user
must pass --force-sync to opt in. Byte-level hash comparisons were
considered but every build writes to meta tables, so "diverged"
detection is unreliable. Conservative existence check matches the
user's ask ("making this explicit would make the workflow safer").
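The conservative existence check amounts to something like the sketch below. The names ensure_sync_allowed and SyncRefusedError are hypothetical and chosen for illustration; only the rule itself (refuse when a local DB exists unless force-sync is set) comes from the description above.

```python
import tempfile
from pathlib import Path


class SyncRefusedError(RuntimeError):
    """Raised when a local DB exists and force-sync was not requested (hypothetical)."""


def ensure_sync_allowed(local_db: Path, force_sync: bool = False) -> None:
    """Refuse an S3 download that would overwrite an existing local DB."""
    if local_db.exists() and not force_sync:
        raise SyncRefusedError(
            f"{local_db} already exists; pass --force-sync to overwrite it"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "demo.db"
        ensure_sync_allowed(db)  # no local DB yet: allowed
        db.write_bytes(b"")
        ensure_sync_allowed(db, force_sync=True)  # explicit opt-in: allowed
        try:
            ensure_sync_allowed(db)  # local DB present, no opt-in: refused
        except SyncRefusedError as exc:
            print(exc)
```

The check deliberately ignores file contents, which sidesteps the meta-table problem that made hash comparison unreliable.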
4. Transform traceback hygiene
_apply_transformation now returns (data, traceback_str). When a
transform raises, the full Python stack flows into result.tracebacks
and is surfaced via -v / --json.
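The tuple return shape can be illustrated with a small stand-in. This is not the real _apply_transformation: the signature is an assumption, and returning None for the data on failure is a guess, since the message only states that the second element carries the traceback string.

```python
import traceback
from typing import Callable, Optional, Tuple


def apply_transformation(
    transform: Callable[[list], list], data: list
) -> Tuple[Optional[list], Optional[str]]:
    """Return (transformed_data, None) on success or (None, traceback_str) on failure."""
    try:
        return transform(data), None
    except Exception:
        # format_exc() keeps the full Python stack instead of a generic message.
        return None, traceback.format_exc()


if __name__ == "__main__":
    rows, tb = apply_transformation(lambda items: [i * 2 for i in items], [1, 2])
    print(rows, tb)
    rows, tb = apply_transformation(lambda items: items[0]["missing"], [{}])
    print(rows is None, "KeyError" in tb)
```

The caller can append the second element to result.tracebacks whenever it is not None, which is how the stack reaches -v and --json output.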
Also converts two silent except-pass blocks in builder.py
(schema-sample fetch and fragments-context fetch) into warnings so
users can see why those paths bailed.
Four new test modules added; two existing tests updated for the tuple
return shape and the force-sync default.
Agent-team attempt note: four Wave-1 streams were planned to run as
parallel worktree-isolated agents, but the worktrees fired from a
pre-structured-reporting commit and hit a read-before-edit hook that
wasn't working across their contexts. Executed all four streams plus
Wave 2 integration inline from main instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minor release covering:
- Structured error reporting for build/deploy (BuildReport, --json,
  --verbose, --progress-file, --fail-on-empty)
- Cross-resource parallel fetch phase (--parallel N)
- Post-build shell hook (--post-hook CMD) with ZEEKER_* env vars
- Safer --sync-from-s3 via explicit --force-sync opt-in
- Transform traceback preservation and quieter silent-failure surfaces

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Releasing zeeker 0.7.0 with a large CLI UX upgrade driven by a week of real-world build feedback. Three commits:
- 6139382: Structured error reporting infrastructure (BuildReport, ResourceOutcome, tracebacks end-to-end)
- bc97738: Parallel fetch, post-hook, force-sync, transform traceback
- b04f51e: Version bump to 0.7.0

New CLI flags on zeeker build
- -v/--verbose: show full tracebacks
- --json: emit BuildReport JSON (agent-friendly)
- --progress-file PATH: atomically overwrite a JSON snapshot of BuildReport after each resource finishes
- --fail-on-empty: treat empty fetch_data() returns as failures (exit 1)
- --parallel N: fetch resources concurrently, up to N at a time (DB writes stay sequential)
- --post-hook CMD: run a shell command after a successful build, with ZEEKER_DB_PATH, ZEEKER_BUILD_REPORT, etc. set
- --force-sync: allow --sync-from-s3 to overwrite an existing local DB (now refused by default)

zeeker deploy also gains -v/--verbose and --json.

Documented exit codes
- 0: all succeeded
- 1: resource failure, FTS setup failure, or post-hook non-zero
- 2: fatal (schema conflict, DB open, config error, diverged local DB)

Behavioural fixes
- _apply_transformation now returns (data, traceback), so a transform that raises no longer silently produces a generic "transformation failed".
- Two except: pass blocks in builder.py (schema-sample fetch, fragments-context fetch) now surface as warnings rather than swallowing the error.

Performance example
4-resource project with await asyncio.sleep(0.3) per fetch: ~1.2s sequential → <1.0s with --parallel 4 (verified in test_builder_parallel.py).

Backwards compatibility
- ValidationResult extended additively (tracebacks, report, records); existing consumers untouched.
- build_database(...) kwargs all default-compatible (max_parallel=1, force_sync=False, progress_callback=None).
- Adds the rich>=13.0 dependency.

Test plan
- uv run pytest packages/zeeker/tests/ --no-cov: 273 passed, 1 skipped
- uv run pytest across all three packages: 373 passed
- uv run black packages/zeeker/: clean
- uv run zeeker build --help shows all new flags
- test_builder_parallel.py (4 tests)
- test_post_hook.py (4 tests)
- test_processor_transform.py (3 tests)
- test_s3_sync_divergence.py (5 tests)
- sg-gov-newsrooms-zeeker with --parallel 4, to confirm the 900s timeout is gone
- zeeker build --json | jq on a partial-failure project: verify schema
- zeeker build --post-hook 'echo "$ZEEKER_DB_NAME" > /tmp/hook.log': verify env vars reach the hook
- zeeker build --sync-from-s3 on a project with a local DB: verify it refuses; --force-sync proceeds

Release steps after merge
1. Tag v0.7.0 on merged commit.
2. Create a GitHub release (v0.7.0 — Structured error reporting + parallel fetch).
3. The publish workflow (publish-zeeker.yml) fires on release creation and pushes to PyPI.

🤖 Generated with Claude Code