Add deliverer, OpenAPI, and v2 decision metrics #78

Merged
chrisbliss18 merged 11 commits into v2 from stack-06-deliverer-openapi-ops on Apr 28, 2026

@chrisbliss18
Contributor

Stacked PR 6 of 9.

Base: stack-05-docs-tooling-coverage
Head: stack-06-deliverer-openapi-ops
Previous PR: #77

Summary:

  • Guards delivery workers behind an owner host and introduces the standalone deliverer entrypoint.
  • Claims deliveries with row locks.
  • Measures the v2 downtime decision flow and related projection/parity signals.
  • Blesses JSON-over-HTTP for Veriflier transport.
  • Publishes the route-driven OpenAPI contract, component schemas, and client-generation smoke validation.
  • Documents outbound credential encryption and deliverer rollout policy.

Review notes:
This PR moves the project from single-process delivery assumptions toward the planned deliverer split and adds the API contract artifacts needed by downstream users.

Chris Jean added 11 commits April 27, 2026 19:07
Add DELIVERY_OWNER_HOST so API_PORT can be enabled without automatically making every API-capable monitor an outbound delivery owner. The old behavior remains when the owner is unset, but startup and validate-config now warn about that fallback.

Wire the guard through webhook and alert-contact worker startup, keep alert-contact send-test dispatchers available to the API, and document the production deployment shape while row-level delivery claiming remains deferred.
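The owner guard described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it assumes `DELIVERY_OWNER_HOST` and `API_PORT` are readable as environment variables, that ownership is decided by a case-insensitive hostname match, and the `deliveryDecision` type and `decideDeliveryOwner` function are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// deliveryDecision is a hypothetical result type, not taken from the PR.
type deliveryDecision struct {
	Owner    bool // whether this process should start delivery workers
	Fallback bool // true when no owner is configured and legacy behavior applies
}

// decideDeliveryOwner sketches the guard. With no owner configured, any
// API-enabled host delivers (the legacy fallback that should be warned
// about). Otherwise only the host matching the configured owner delivers.
// The case-insensitive comparison is an assumption.
func decideDeliveryOwner(ownerHost, hostname string, apiEnabled bool) deliveryDecision {
	if !apiEnabled {
		return deliveryDecision{}
	}
	owner := strings.TrimSpace(ownerHost)
	if owner == "" {
		return deliveryDecision{Owner: true, Fallback: true}
	}
	return deliveryDecision{Owner: strings.EqualFold(owner, strings.TrimSpace(hostname))}
}

func main() {
	hostname, err := os.Hostname()
	if err != nil {
		fmt.Fprintln(os.Stderr, "hostname lookup failed:", err)
		os.Exit(1)
	}
	// Reading these settings from the environment is an assumption.
	d := decideDeliveryOwner(os.Getenv("DELIVERY_OWNER_HOST"), hostname, os.Getenv("API_PORT") != "")
	if d.Fallback {
		fmt.Println("warning: DELIVERY_OWNER_HOST is unset; this API host will run delivery workers")
	}
	fmt.Println("delivery workers enabled:", d.Owner)
}
```

Keeping the decision in a pure function lets both startup and a config-validation command reuse it and emit the same fallback warning.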
Emit StatsD counters and timers around the local-failure to Seems Down path, Veriflier escalation, per-Veriflier RPC and vote outcomes, quorum decisions, confirmed down, probe-cleared recovery, and false alarms.

This gives production v2 the timing and outcome data needed to compare the current main-server-plus-Veriflier design against the future probe-agent candidates without changing eventstore rows or notification semantics.
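For readers unfamiliar with the StatsD wire format, the counters and timers mentioned above reduce to short text lines. The sketch below only formats and prints such lines; the metric names are hypothetical and the PR's actual client and naming are not shown in this conversation.

```go
package main

import (
	"fmt"
	"time"
)

// counterLine formats a StatsD counter update, for example "name:1|c".
func counterLine(name string, delta int64) string {
	return fmt.Sprintf("%s:%d|c", name, delta)
}

// timerLine formats a StatsD timer in whole milliseconds, for example
// "name:250|ms". Sub-millisecond precision is truncated.
func timerLine(name string, d time.Duration) string {
	return fmt.Sprintf("%s:%d|ms", name, d.Milliseconds())
}

func main() {
	start := time.Now()
	// All metric names below are hypothetical.
	fmt.Println(counterLine("jetmon.v2.decision.seems_down", 1))
	fmt.Println(counterLine("jetmon.v2.decision.veriflier.vote_down", 1))
	fmt.Println(timerLine("jetmon.v2.decision.time_to_confirmed_down", time.Since(start)))
}
```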
Count active sites in the owned bucket range whose legacy site_status projection no longer matches the authoritative open HTTP event, then emit a gauge plus warning and error counters while the shadow projection is enabled.

This keeps v2 production drift visible before legacy readers move fully to the event tables and the projection can be disabled.
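The drift count described above amounts to a filtered comparison. The following sketch assumes a flattened, hypothetical `site` record that already carries both the legacy projection value and the status implied by the authoritative event; how the PR loads those values and which thresholds drive the warning and error counters are not stated here.

```go
package main

import "fmt"

// site is a hypothetical, flattened view of one monitored site.
type site struct {
	ID           int64
	Bucket       int
	Active       bool
	LegacyStatus int // value stored in the legacy site_status projection
	EventStatus  int // status implied by the authoritative open HTTP event
}

// countDrift returns how many active sites in [bucketMin, bucketMax] have a
// legacy projection that disagrees with the authoritative event status.
func countDrift(sites []site, bucketMin, bucketMax int) int {
	drift := 0
	for _, s := range sites {
		if !s.Active || s.Bucket < bucketMin || s.Bucket > bucketMax {
			continue
		}
		if s.LegacyStatus != s.EventStatus {
			drift++
		}
	}
	return drift
}

func main() {
	sites := []site{
		{ID: 1, Bucket: 3, Active: true, LegacyStatus: 1, EventStatus: 1},
		{ID: 2, Bucket: 4, Active: true, LegacyStatus: 1, EventStatus: 2},
		{ID: 3, Bucket: 40, Active: true, LegacyStatus: 1, EventStatus: 2}, // outside the owned range
	}
	// Emit the count as a StatsD gauge line; the metric name is hypothetical.
	fmt.Printf("jetmon.v2.projection.drift:%d|g\n", countDrift(sites, 0, 9))
}
```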
Move the existing webhook and alert-contact worker wiring behind an internal deliverer package and add a jetmon-deliverer command that can run outbound delivery without the monitor, API, dashboard, or bucket ownership loop.

The embedded jetmon2 path still uses the same workers through the shared package, so this creates the process boundary without changing delivery semantics. The standalone binary remains single-active per database cluster until row claiming replaces the current soft locks.

Wrap webhook and alert-contact ClaimReady in short transactions that select ready rows FOR UPDATE, write the in-flight lease before commit, and then let workers dispatch outside the transaction.

This prevents active-active delivery workers from claiming the same pending row while preserving the existing retry-leasing behavior and the project’s MySQL 5.7 compatibility target.
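The claim pattern described above (lock ready rows, write the lease, commit, dispatch afterwards) might look roughly like this with `database/sql`. The table name, column names, and status values are hypothetical, and the function is a sketch of the general technique rather than the PR's `ClaimReady`. It uses plain `FOR UPDATE` because `SKIP LOCKED` is only available from MySQL 8.0.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// Hypothetical schema: a deliveries table with status, next_attempt_at,
// and lease_expires_at columns.
const (
	selectReadySQL = `SELECT id FROM deliveries
WHERE status = 'pending' AND next_attempt_at <= ?
ORDER BY id LIMIT ? FOR UPDATE`
	leaseSQL = `UPDATE deliveries SET status = 'in_flight', lease_expires_at = ? WHERE id = ?`
)

// claimReady locks up to limit ready rows, marks them in flight with a lease,
// and commits. Callers dispatch the returned IDs outside the transaction.
func claimReady(ctx context.Context, db *sql.DB, now time.Time, limit int, lease time.Duration) ([]int64, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return nil, fmt.Errorf("begin claim tx: %w", err)
	}
	defer tx.Rollback() // returns sql.ErrTxDone after Commit, which is ignored

	rows, err := tx.QueryContext(ctx, selectReadySQL, now, limit)
	if err != nil {
		return nil, fmt.Errorf("select ready rows: %w", err)
	}
	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			rows.Close()
			return nil, fmt.Errorf("scan ready row: %w", err)
		}
		ids = append(ids, id)
	}
	if err := rows.Err(); err != nil {
		rows.Close()
		return nil, fmt.Errorf("iterate ready rows: %w", err)
	}
	if err := rows.Close(); err != nil {
		return nil, fmt.Errorf("close ready rows: %w", err)
	}

	for _, id := range ids {
		if _, err := tx.ExecContext(ctx, leaseSQL, now.Add(lease), id); err != nil {
			return nil, fmt.Errorf("lease row %d: %w", id, err)
		}
	}
	if err := tx.Commit(); err != nil {
		return nil, fmt.Errorf("commit claim: %w", err)
	}
	return ids, nil
}

func main() {
	// No SQL driver is imported here, so this only prints the statements.
	// A real caller would open a MySQL pool and pass it to claimReady.
	fmt.Println(selectReadySQL)
	fmt.Println(leaseSQL)
}
```

Under this pattern a second worker issuing the same locking read should wait for the first transaction to commit and then see the rows as already leased, which is why the transaction is kept short and dispatch happens after commit.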
Make port and VERIFLIER_PORT the canonical names for the v2 Veriflier transport while keeping grpc_port and VERIFLIER_GRPC_PORT as compatibility aliases for existing configs and Docker environments.

The protoc toolchain is not part of the current build path, so the proto file remains a schema reference instead of a v2 production dependency.
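One way to honor a canonical name while keeping compatibility aliases is a small lookup helper like the one below. It is a sketch only: the precedence (canonical name first, then aliases) is an assumption, and `lookupWithAlias` is a hypothetical helper rather than a function from the PR.

```go
package main

import (
	"fmt"
	"os"
)

// lookupWithAlias returns the first non-empty value among the canonical name
// and its compatibility aliases, plus the name that supplied it. Both results
// are empty when nothing is set.
func lookupWithAlias(getenv func(string) string, canonical string, aliases ...string) (value, source string) {
	if v := getenv(canonical); v != "" {
		return v, canonical
	}
	for _, alias := range aliases {
		if v := getenv(alias); v != "" {
			return v, alias
		}
	}
	return "", ""
}

func main() {
	value, source := lookupWithAlias(os.Getenv, "VERIFLIER_PORT", "VERIFLIER_GRPC_PORT")
	if source == "" {
		fmt.Println("no Veriflier port configured")
		return
	}
	fmt.Printf("Veriflier port %s (from %s)\n", value, source)
}
```

Returning the source name makes it straightforward to log a deprecation notice when an alias supplied the value.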
Move API route registration into a shared route table so the mux and OpenAPI generator use the same source of truth.

Expose GET /api/v1/openapi.json with route metadata for methods, scopes, idempotency, path parameters, and generic responses while leaving detailed schemas as the next contract-hardening step.
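A shared route table of the kind described above could be shaped like this. The `route` fields, the `x-required-scope` extension key, and the sample routes are all hypothetical. The registration relies on the method-prefixed patterns that `net/http.ServeMux` supports from Go 1.22, whose `{id}` wildcard syntax happens to match OpenAPI path templates.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// route is a hypothetical registry entry shared by the mux and the generator.
type route struct {
	Method      string
	Path        string // for example /api/v1/sites/{id}
	OperationID string
	Scope       string
	Handler     http.HandlerFunc
}

// registerRoutes wires every table entry into the mux (Go 1.22+ patterns).
func registerRoutes(mux *http.ServeMux, routes []route) {
	for _, r := range routes {
		mux.HandleFunc(r.Method+" "+r.Path, r.Handler)
	}
}

// openAPIPaths builds the OpenAPI "paths" object from the same table, so the
// served routes and the published contract cannot list different endpoints.
func openAPIPaths(routes []route) map[string]map[string]any {
	paths := make(map[string]map[string]any)
	for _, r := range routes {
		if paths[r.Path] == nil {
			paths[r.Path] = make(map[string]any)
		}
		paths[r.Path][strings.ToLower(r.Method)] = map[string]any{
			"operationId":      r.OperationID,
			"x-required-scope": r.Scope, // hypothetical extension field
		}
	}
	return paths
}

func main() {
	ok := func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) }
	routes := []route{
		{Method: "GET", Path: "/api/v1/sites/{id}", OperationID: "getSite", Scope: "read", Handler: ok},
		{Method: "GET", Path: "/api/v1/openapi.json", OperationID: "getOpenAPI", Scope: "read", Handler: ok},
	}
	mux := http.NewServeMux()
	registerRoutes(mux, routes)

	doc := map[string]any{
		"openapi": "3.0.3",
		"info":    map[string]any{"title": "example", "version": "0"},
		"paths":   openAPIPaths(routes),
	}
	out, err := json.MarshalIndent(doc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```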
Capture the migration path from the current plaintext v2 credential model to envelope-style encryption at rest for webhook secrets and alert-contact destinations.

Also trim completed implementation items out of the active roadmap so the remaining queue reflects post-v2 hardening work.
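As background for the envelope-style approach mentioned above, here is a generic sketch of envelope encryption with AES-256-GCM: a random data key encrypts the secret, and a key-encryption key wraps the data key. It is not the design captured in the PR's document. The function names, the nonce-prefixed blob layout, and the in-process key handling are all assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// seal encrypts plaintext with AES-GCM and returns nonce || ciphertext.
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the key is wrong or the blob was modified.
func open(key, blob []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	return gcm.Open(nil, blob[:gcm.NonceSize()], blob[gcm.NonceSize():], nil)
}

// envelopeEncrypt encrypts plaintext under a fresh data key (DEK) and wraps
// that DEK with the key-encryption key (KEK). Both outputs are stored.
func envelopeEncrypt(kek, plaintext []byte) (wrappedDEK, ciphertext []byte, err error) {
	dek := make([]byte, 32)
	if _, err = rand.Read(dek); err != nil {
		return nil, nil, err
	}
	if ciphertext, err = seal(dek, plaintext); err != nil {
		return nil, nil, err
	}
	if wrappedDEK, err = seal(kek, dek); err != nil {
		return nil, nil, err
	}
	return wrappedDEK, ciphertext, nil
}

// envelopeDecrypt unwraps the DEK with the KEK, then decrypts the payload.
func envelopeDecrypt(kek, wrappedDEK, ciphertext []byte) ([]byte, error) {
	dek, err := open(kek, wrappedDEK)
	if err != nil {
		return nil, fmt.Errorf("unwrap data key: %w", err)
	}
	return open(dek, ciphertext)
}

func main() {
	kek := make([]byte, 32) // placeholder; a real KEK would come from a secret store
	if _, err := rand.Read(kek); err != nil {
		panic(err)
	}
	wrapped, ct, err := envelopeEncrypt(kek, []byte("example-webhook-secret"))
	if err != nil {
		panic(err)
	}
	pt, err := envelopeDecrypt(kek, wrapped, ct)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(pt))
}
```

The appeal of the envelope shape for a migration is that rotating the KEK only requires re-wrapping the small data keys, not re-encrypting every stored secret.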
Add an operator runbook for moving outbound delivery from embedded API workers to the standalone jetmon-deliverer process.

The runbook calls out the DELIVERY_OWNER_HOST guard, single-owner rollback path, active-active config constraints, and rollout checks so the remaining roadmap work is deployment packaging rather than policy design.

Extend the route registry with request and response schema names so the OpenAPI document can reference concrete component schemas instead of generic objects.

The schema generator now derives components from the handler structs, including list envelopes and write request bodies, while tests ensure every routed schema reference is present.
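Deriving component schemas from handler structs is typically done with reflection over `json` tags. The sketch below shows the general idea on two hypothetical types (`siteResponse` and a `siteList` envelope). It is deliberately incomplete: it ignores required fields, maps, embedded structs, and `[]byte`, and it would recurse forever on self-referential types. The PR's generator may work differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// Hypothetical handler structs used only for illustration.
type siteResponse struct {
	ID     int64  `json:"id"`
	URL    string `json:"url"`
	Active bool   `json:"active"`
}

type siteList struct {
	Data []siteResponse `json:"data"`
	Next string         `json:"next_cursor,omitempty"`
}

// schemaFor maps a Go type to a minimal JSON-Schema-style object.
func schemaFor(t reflect.Type) map[string]any {
	switch t.Kind() {
	case reflect.Ptr:
		return schemaFor(t.Elem())
	case reflect.String:
		return map[string]any{"type": "string"}
	case reflect.Bool:
		return map[string]any{"type": "boolean"}
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return map[string]any{"type": "integer"}
	case reflect.Float32, reflect.Float64:
		return map[string]any{"type": "number"}
	case reflect.Slice, reflect.Array:
		return map[string]any{"type": "array", "items": schemaFor(t.Elem())}
	case reflect.Struct:
		props := map[string]any{}
		for i := 0; i < t.NumField(); i++ {
			f := t.Field(i)
			if !f.IsExported() {
				continue
			}
			// Use the json tag name when present, skipping fields tagged "-".
			name := strings.Split(f.Tag.Get("json"), ",")[0]
			if name == "-" {
				continue
			}
			if name == "" {
				name = f.Name
			}
			props[name] = schemaFor(f.Type)
		}
		return map[string]any{"type": "object", "properties": props}
	default:
		return map[string]any{} // unsupported kinds become an unconstrained schema
	}
}

// schemaOf is a convenience wrapper that takes a value instead of a type.
func schemaOf(v any) map[string]any {
	return schemaFor(reflect.TypeOf(v))
}

func main() {
	out, err := json.MarshalIndent(map[string]any{"SiteList": schemaOf(siteList{})}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```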
Add OpenAPI tests that recursively resolve component refs and type-check an in-memory Go client built from published operation IDs and component schema names.

Update the API and roadmap docs to describe this repo-local CI guard while leaving consumer-specific generator validation as a future follow-up.
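The ref-resolution half of such a guard can be approximated by walking the decoded document and checking every `$ref` against `components.schemas`. The sketch below is an illustration under that assumption. It covers only local schema refs, does not attempt the Go client type-check mentioned above, and `collectRefs` and `missingRefs` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

const schemaRefPrefix = "#/components/schemas/"

// collectRefs walks decoded JSON and records every string-valued "$ref".
// Because it descends into components too, refs nested inside component
// schemas are collected along with refs in the paths section.
func collectRefs(node any, out map[string]bool) {
	switch v := node.(type) {
	case map[string]any:
		for k, child := range v {
			if s, ok := child.(string); ok && k == "$ref" {
				out[s] = true
				continue
			}
			collectRefs(child, out)
		}
	case []any:
		for _, child := range v {
			collectRefs(child, out)
		}
	}
}

// missingRefs returns, in sorted order, every ref that does not point at an
// existing entry under components.schemas.
func missingRefs(doc []byte) ([]string, error) {
	var root map[string]any
	if err := json.Unmarshal(doc, &root); err != nil {
		return nil, fmt.Errorf("decode OpenAPI document: %w", err)
	}
	refs := map[string]bool{}
	collectRefs(root, refs)

	schemas := map[string]any{}
	if c, ok := root["components"].(map[string]any); ok {
		if s, ok := c["schemas"].(map[string]any); ok {
			schemas = s
		}
	}

	var missing []string
	for ref := range refs {
		if !strings.HasPrefix(ref, schemaRefPrefix) {
			missing = append(missing, ref)
			continue
		}
		if _, found := schemas[strings.TrimPrefix(ref, schemaRefPrefix)]; !found {
			missing = append(missing, ref)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	doc := []byte(`{
	  "paths": {"/x": {"get": {"responses": {"200": {"content": {"application/json":
	    {"schema": {"$ref": "#/components/schemas/A"}}}}}}}},
	  "components": {"schemas": {"A": {"type": "object",
	    "properties": {"b": {"$ref": "#/components/schemas/B"}}}}}
	}`)
	missing, err := missingRefs(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println("unresolved refs:", missing)
}
```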
@chrisbliss18 chrisbliss18 changed the base branch from stack-05-docs-tooling-coverage to v2 April 28, 2026 14:54
@chrisbliss18 chrisbliss18 merged commit 3c76947 into v2 Apr 28, 2026
@chrisbliss18 chrisbliss18 deleted the stack-06-deliverer-openapi-ops branch April 28, 2026 15:08