diff --git a/CLAUDE.md b/CLAUDE.md
index a00e398..4659903 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,6 +1,6 @@
 # IT Service Health Dashboard
 
-Internal web dashboard aggregating real-time health of ~30 SaaS services in an enterprise IT environment. Polls Statuspage.io JSON API, a cloud productivity suite's JSON feed, a chat vendor's native status API, and RSS/Atom feeds. Enriches with dependency mapping and templated impact statements. Displays a unified status board with timeline view, posts alerts to Slack. Deployed on a Mac Mini on the internal network.
+Private web dashboard aggregating real-time health of ~30 SaaS services in an enterprise IT environment. Polls Statuspage.io JSON API, a cloud productivity suite's JSON feed, a chat vendor's native status API, and RSS/Atom feeds. Enriches with dependency mapping and templated impact statements. Displays a unified status board with timeline view and posts alerts to Slack. Designed for self-hosted, private-network deployment.
 
 ## Roadmap
 
@@ -14,7 +14,7 @@ Historical v1 spec: [IMPLEMENTATION-ROADMAP.md](./IMPLEMENTATION-ROADMAP.md) —
 - **Frontend:** React 19 (Vite 8+) + Tailwind CSS 4+; FastAPI serves the built static files
 - **Observability:** structlog (JSON), prometheus-client, sentry-sdk[fastapi], Healthchecks.io
 - **Resilience:** stamina (retries) + purgatory (per-host circuit breakers)
-- **Production process manager:** launchd (macOS); Caddy in front for HTTPS + header auth
+- **Production process manager:** OS service manager; reverse proxy in front for HTTPS + header auth
 
 ## Build / Test / Run
 
@@ -58,14 +58,14 @@ Open `http://localhost:8000`.
 | Cloud productivity suite | Custom JSON feed + RSS | Has its own status dashboard, not Statuspage.io |
 | Chat vendor status | Vendor JSON status endpoint | Dedicated JSON status API |
 | Database | SQLite + Litestream | Demo-scale + ~1s RPO; Postgres deferred to >100 writes/s |
-| Auth | Bearer token on admin endpoints; internal-network-only for reads | Bearer token required for write endpoints — the internal network alone is insufficient |
-| Hosting | Mac Mini + Caddy | Always-on, internal-network-accessible; Caddy adds HTTPS + header auth |
+| Auth | Bearer token on admin endpoints; private access controls for reads | Bearer token required for write endpoints — read-path access controls alone are insufficient |
+| Hosting | Self-hosted private deployment | Always-on private access; reverse proxy adds HTTPS + header auth |
 | Dep graph layout | Force-directed (react-force-graph-2d) | Dagre hierarchical layout is deferred; force-directed is current default |
 | LLM layer | Deferred (post-Phase-7) | Template-based summaries sufficient for v2 |
 
 ## Feature Gates (off by default)
 
-Phase 7 code is in-tree but gated. Flip only when a public endpoint (Cloudflare Tunnel / Caddy allowlist / ngrok) is available:
+Phase 7 code is in-tree but gated. Flip only when a signed callback endpoint is available:
 
 - `WEBHOOKS_ENABLED` — `POST /api/webhooks/statuspage/{service_id}` (HMAC-SHA256; `backend/app/router_webhooks.py`). Bypasses flap suppression; writes directly through the alerting pipeline.
 - `SLACK_ACK_ENABLED` — `POST /api/slack/interactivity` (v0 signing-secret; `backend/app/router_slack.py`).
@@ -84,7 +84,7 @@ All new work must map to an active phase in PRODUCTION-ROADMAP.md. Splunk, Thous
 
 ## What This Project Is
 
-Internal web dashboard that aggregates real-time health status of ~30 SaaS services supported by an enterprise IT team. Polls vendor status pages via Statuspage.io JSON API, a cloud productivity suite's JSON feed, a chat vendor's native status API, and RSS/Atom feeds. Enriches with dependency mapping and templated impact statements. Displays a unified status board with timeline view and posts alerts to Slack. Deployed on a Mac Mini on the internal network. Designed for IT engineers (deep triage) and IT leadership / company-wide visibility (situational awareness).
+Private web dashboard that aggregates real-time health status of ~30 SaaS services supported by an enterprise IT team. Polls vendor status pages via Statuspage.io JSON API, a cloud productivity suite's JSON feed, a chat vendor's native status API, and RSS/Atom feeds. Enriches with dependency mapping and templated impact statements. Displays a unified status board with timeline view and posts alerts to Slack. Designed for self-hosted private deployment and for IT engineers (deep triage) plus IT leadership / company-wide visibility (situational awareness).
 
 ## Current State
 
@@ -99,7 +99,7 @@ Main also includes a parallel UX sprint that shipped alongside Phase 5:
 **Phase 7 partially landed:**
 - **Statuspage inbound webhook** (`POST /api/webhooks/statuspage/{service_id}`, HMAC-SHA256, optional replay protection) — code in `backend/app/router_webhooks.py`, gated by `WEBHOOKS_ENABLED` (default false). Writes directly through the alerting pipeline, bypassing flap suppression.
 - **Slack ack flow** (`POST /api/slack/interactivity`, v0 signing-secret) — code in `backend/app/router_slack.py`, gated by `SLACK_ACK_ENABLED` (default false). Block Kit messages only include the Acknowledge button when the flag is true.
-- Both features require a public endpoint (Cloudflare Tunnel / Caddy allowlist / ngrok) before flipping the flag. They ship off-by-default so the main app is unaffected.
+- Both features require a signed callback endpoint before flipping the flag. They ship off-by-default so the main app is unaffected.
 
 **Phase 7 further landed** — postmortem automation (`POSTMORTEMS_ENABLED`), SLO fuel-gauge view + multi-burn-rate alerting (`SLO_BURN_RATE_ENABLED`), and Slack `/itstatus` slash command (`SLACK_SLASH_ENABLED`) all shipped, feature-gated off by default. **Still open:** LLM-layer impact statements, Splunk/JSM/ThousandEyes integration.
 
@@ -115,7 +115,7 @@ Main also includes a parallel UX sprint that shipped alongside Phase 5:
 - **Config:** PyYAML 6.0+
 - **Data validation:** Pydantic 2.10+
 - **Frontend:** React 19 (Vite 8+) + Tailwind CSS 4+
-- **Process manager:** launchd (macOS) for production
+- **Process manager:** OS service manager for production
 
 ## How To Run
 
@@ -144,7 +144,7 @@ Open `http://localhost:8000` in your browser.
 - Do not start work that isn't in a PRODUCTION-ROADMAP.md phase. If it doesn't fit, discuss first.
 - Do not integrate Splunk, ThousandEyes, Datadog, or JSM — those are Phase 7+.
 - Do not build an LLM integration yet — post-Phase-7.
-- Do not remove the bearer-token auth on admin endpoints once added. The internal network is not sufficient for write endpoints.
+- Do not remove the bearer-token auth on admin endpoints once added. Read-path access controls are not sufficient for write endpoints.
 - Do not use synchronous I/O — all network calls must be async.
 - Do not hardcode service definitions in Python — they live in services.yaml.
 - Do not use slack-sdk — use raw httpx POST for webhook simplicity.
diff --git a/README.md b/README.md
index aefadc4..d726921 100644
--- a/README.md
+++ b/README.md
@@ -5,8 +5,8 @@ Real-time status monitoring dashboard for ~30 SaaS services used across an enter
 ## Project status
 
 - **v1 (demo-ready) — SHIPPED.** All original spec delivered: polling, normalization, change detection, Slack alerting, React UI, dependency graph, timeline, SLA tracking, incident clustering, auto reports.
-- **v2 (production-ready) — SHIPPED.** Phases 0–6 of the production roadmap complete: bearer-token auth, vendor resilience (stamina + purgatory), alert quality (flap suppression, dedup, tier routing, dependency correlation, maintenance windows, flapping-badge UI), observability (structlog, Prometheus `/metrics`, Sentry, Healthchecks.io dead-man's switch), data lifecycle (production pragmas, retention, Litestream streaming + daily `VACUUM INTO` snapshot), UX productionization (severity-sorted grid, distinct poller-broken state, a11y + keyboard nav, Executive/Engineer view toggle, PWA, `recharts` SLA trend), and platform polish (CI, pre-commit, hardened launchd plist, Caddy, Keychain secrets). **378 tests passing.**
-- **v2 Phase 2B + Phase 7 — in tree, gated off.** Statuspage inbound webhook receiver (`WEBHOOKS_ENABLED`), Slack ack flow (`SLACK_ACK_ENABLED`), postmortem drafts (`POSTMORTEMS_ENABLED`), SLO fuel-gauge + multi-burn-rate alerting (`SLO_BURN_RATE_ENABLED`), and Slack `/itstatus` slash command (`SLACK_SLASH_ENABLED`) all shipped with tests but default off. Flip each flag once its prerequisites are in place (public endpoint for Slack features; postmortems need only a writable `POSTMORTEMS_DIR`).
+- **v2 (production-ready) — SHIPPED.** Phases 0–6 of the production roadmap complete: bearer-token auth, vendor resilience (stamina + purgatory), alert quality (flap suppression, dedup, tier routing, dependency correlation, maintenance windows, flapping-badge UI), observability (structlog, Prometheus `/metrics`, Sentry, Healthchecks.io dead-man's switch), data lifecycle (production pragmas, retention, Litestream streaming + daily `VACUUM INTO` snapshot), UX productionization (severity-sorted grid, distinct poller-broken state, a11y + keyboard nav, Executive/Engineer view toggle, PWA, `recharts` SLA trend), and platform polish (CI, pre-commit, service supervision, reverse-proxy posture, OS-backed secret storage). **378 tests passing.**
+- **v2 Phase 2B + Phase 7 — in tree, gated off.** Statuspage inbound webhook receiver (`WEBHOOKS_ENABLED`), Slack ack flow (`SLACK_ACK_ENABLED`), postmortem drafts (`POSTMORTEMS_ENABLED`), SLO fuel-gauge + multi-burn-rate alerting (`SLO_BURN_RATE_ENABLED`), and Slack `/itstatus` slash command (`SLACK_SLASH_ENABLED`) all shipped with tests but default off. Flip each flag only after the deployment has the required signed callback reachability; postmortems need only a writable `POSTMORTEMS_DIR`.
 - **v2 Phase 7 remainder — optional.** LLM-layer impact statements; log-aggregation / ITSM / synthetic-monitoring integrations. Not on a fixed schedule; add as demand emerges.
 
 **Active roadmap:** [PRODUCTION-ROADMAP.md](./PRODUCTION-ROADMAP.md) — exit-criteria detail for every phase.
@@ -60,13 +60,11 @@ Open `http://localhost:8000` in your browser.
 
 ## Accessing the Dashboard
 
-The dashboard runs on a Mac Mini on the internal network. Access it at:
-
-```
-http://:8000
-```
-
-No authentication required — internal-network access is the security boundary.
+For local development, open `http://localhost:8000` after starting the backend.
+For a private deployment, serve the read dashboard behind your organization's
+normal access controls and keep admin writes protected by bearer-token auth. The
+public repo intentionally describes the deployment shape, not a real host,
+machine, or network boundary.
 
 ## Service Categories
 
@@ -96,7 +94,7 @@ finance, sales, marketing, networking, support).
 For services without automated polling (e.g. an identity provider, an HR system, or any service with no public status API), update status via curl. **Admin endpoints require a bearer token** (set `ADMIN_API_TOKEN` in your env).
 
 ```bash
-export TOKEN="your-admin-token"
+export TOKEN=""
 
 # Set a service to degraded
 curl -X POST http://localhost:8000/api/admin/status \
@@ -126,7 +124,7 @@ Valid statuses: `operational`, `degraded`, `partial_outage`, `major_outage`, `un
 | `SLACK_WEBHOOK_URL` | _(none)_ | Slack incoming webhook URL for ops-alert channel notifications |
 | `DATABASE_PATH` | `data.db` | SQLite database file path |
 | `POLL_INTERVAL_SECONDS` | `60` | How often to poll vendor status pages (1–3600) |
-| `HOST` | `127.0.0.1` | Server bind address (`0.0.0.0` for network access) |
+| `HOST` | `127.0.0.1` | Server bind address; override only for a controlled private deployment |
 | `PORT` | `8000` | Server port |
 | `LOG_LEVEL` | `INFO` | Logging level |
 | `ADMIN_API_TOKEN` | _(none)_ | Bearer token required for `/api/admin/*` endpoints. If unset, admin endpoints refuse all requests. |
@@ -190,85 +188,39 @@ cd frontend && npm run dev
 
 Frontend dev server at `localhost:5173` proxies `/api/*` to `localhost:8000`.
 
-## Production Deployment (Mac Mini)
-
-```bash
-# 1. Clone and set up (same as Quick Start steps 1-4)
-
-# 2. Configure environment
-cp .env.example backend/.env
-# Edit backend/.env: set HOST=0.0.0.0, SLACK_WEBHOOK_URL=
-
-# 3. Update plist paths
-# Edit com.company.it-health-dashboard.plist:
-# - Replace /path/to/ with actual project path
-# - Add SLACK_WEBHOOK_URL
-
-# 4. Install launchd service
-sudo cp com.company.it-health-dashboard.plist /Library/LaunchDaemons/
-sudo launchctl bootstrap system /Library/LaunchDaemons/com.company.it-health-dashboard.plist
-
-# 5. Verify
-curl http://localhost:8000/api/health
-
-# 6. Open firewall (if needed)
-sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add $(which python3)
-```
+## Private Deployment Notes
 
-Manage the service:
-```bash
-# Stop
-sudo launchctl bootout system/com.company.it-health-dashboard
+The production path is intentionally self-hosted and private-network oriented:
 
-# Start
-sudo launchctl bootstrap system /Library/LaunchDaemons/com.company.it-health-dashboard.plist
+- Run the FastAPI process under an OS service manager.
+- Put a reverse proxy in front for TLS and request headers.
+- Store tokens and webhook secrets in the host secret manager, not in git.
+- Keep read access behind the organization access controls.
+- Require bearer-token auth for every admin/write endpoint.
+- Monitor `/api/health`, `/healthz`, `/metrics`, and the heartbeat job.
 
-# View logs
-tail -f /var/log/it-health-dashboard.log
-```
+Exact host paths, service-manager commands, firewall posture, and log locations
+belong in a private runbook, not in the public README.
 
 ## Backup & Disaster Recovery (Litestream)
 
-SQLite is the primary store; [Litestream](https://litestream.io) streams WAL frames to an external replica (S3, SFTP, or a second disk) so the dashboard survives a Mac Mini failure.
+SQLite is the primary store; [Litestream](https://litestream.io) streams WAL
+frames to an external replica so the dashboard can recover from host failure.
 
 ### Setup
 
-```bash
-# 1. Install the binary
-brew install benbjohnson/litestream/litestream
-
-# 2. Customize the config template (pick one replica destination)
-cp deploy/litestream.yml.example /opt/it-health/deploy/litestream.yml
-$EDITOR /opt/it-health/deploy/litestream.yml
-
-# 3. Validate the config before loading it
-litestream validate -config /opt/it-health/deploy/litestream.yml
-
-# 4. Install the sidecar launchd daemon
-cp deploy/com.company.it-health-dashboard-litestream.plist.example \
-    /Library/LaunchDaemons/com.company.it-health-dashboard-litestream.plist
-sudo launchctl bootstrap system /Library/LaunchDaemons/com.company.it-health-dashboard-litestream.plist
+Use the checked-in config template as a starting point, keep the real replica
+destination out of git, validate the config before enabling the sidecar, and
+monitor snapshots as part of routine operations.
 
-# 5. Confirm replication is working
-litestream snapshots -config /opt/it-health/deploy/litestream.yml
-```
-
-Litestream RPO is ~1 second — after the initial snapshot, every WAL frame ships as it's written.
+Litestream RPO is ~1 second — after the initial snapshot, every WAL frame ships as it is written.
 
 ### Restore
 
-```bash
-# 1. Stop the main app so the DB isn't being written to
-sudo launchctl bootout system/com.company.it-health-dashboard
-
-# 2. Restore from replica (picks up the latest snapshot + WAL frames)
-litestream restore -config /opt/it-health/deploy/litestream.yml \
-    -o /opt/it-health/data.db \
-    /opt/it-health/data.db
-
-# 3. Start the app — it applies pending migrations on boot and resumes polling
-sudo launchctl bootstrap system /Library/LaunchDaemons/com.company.it-health-dashboard.plist
-```
+Restore procedure: stop writers, restore the latest snapshot plus WAL frames
+into the configured database location, restart the service, and let startup
+migrations run. Keep the exact command sequence in a private runbook because it
+depends on the host service manager, paths, and replica destination.
 
 ### Data retention
 
@@ -297,7 +249,7 @@ The retention job runs every `RETENTION_INTERVAL_HOURS` (default 168 = weekly) a
 | `/api/services/graph` | GET | Service dependency graph (nodes + links) for visualization |
 | `/api/services/slo` | GET | Per-service SLO snapshot: error-budget remaining + active burn-rate breaches |
 | `/api/admin/status` | POST | Manual status update (requires `Authorization: Bearer $ADMIN_API_TOKEN`) |
-| `/healthz` | GET | Dead-man's switch — 200 fresh / 503 stale. Hit by launchd + Healthchecks.io. |
+| `/healthz` | GET | Dead-man's switch — 200 fresh / 503 stale. Hit by the service supervisor + Healthchecks.io. |
 | `/metrics` | GET | Prometheus text exposition. |
 | `/api/webhooks/statuspage/{id}` | POST | Inbound Statuspage subscriber webhook, HMAC-verified. 404 unless `WEBHOOKS_ENABLED=true`. |
 | `/api/slack/interactivity` | POST | Slack block-actions receiver (ack button). 404 unless `SLACK_ACK_ENABLED=true`. |
diff --git a/docs/PORTFOLIO-DISPOSITION.md b/docs/PORTFOLIO-DISPOSITION.md
index e33f614..f22d141 100644
--- a/docs/PORTFOLIO-DISPOSITION.md
+++ b/docs/PORTFOLIO-DISPOSITION.md
@@ -45,7 +45,7 @@ Only `origin` (`saagpatel/ITServiceHealth`). Clean migration state.
   maintenance windows, observability (structlog + Prometheus
   `/metrics` + Sentry + Healthchecks.io dead-man's switch),
   Litestream streaming + daily `VACUUM INTO` snapshot, PWA,
-  hardened launchd plist, Caddy reverse proxy, Keychain secrets
+  hardened service supervision, reverse proxy posture, OS-backed secrets
 - **356 tests passing**
 - `PRODUCTION-ROADMAP.md` + `IMPLEMENTATION-ROADMAP.md` on canonical
   main
@@ -73,7 +73,7 @@ automation + SLO views + multi-burn-rate alerting + Slack slash
 command + per-service webhook overrides all shipped in the last 6
 merges). **356 tests passing.** Phase 2B + Phase 7 webhooks
 (Statuspage inbound + Slack ack) are gated off pending a public
-reachability path (Cloudflare Tunnel / Caddy allowlist).
+signed callback reachability path.
 ---
 
@@ -81,14 +81,14 @@ reachability path (Cloudflare Tunnel / Caddy allowlist).
 
 Joins **self-hosted service cluster** as the second member.
 RedditSentimentAnalyzer (R10) founded the cluster with personal
-self-hosted infrastructure (launchd + nginx). ITServiceHealth
+self-hosted infrastructure. ITServiceHealth
 extends:
 
 | Aspect | RedditSentimentAnalyzer | **ITServiceHealth** |
 |---|---|---|
 | Audience | Operator-personal | **Operator's employer (enterprise IT)** |
-| Reachability | launchd + nginx | **launchd + Caddy + Cloudflare Tunnel (planned)** |
-| Secrets | Standard | **macOS Keychain** |
+| Reachability | Private deployment | **signed callback path (planned)** |
+| Secrets | Standard | **OS-backed secret storage** |
 | Observability | Basic | **structlog + Prometheus `/metrics` + Sentry + Healthchecks.io dead-man's switch** |
 | Data lifecycle | Standard SQLite | **Litestream streaming + daily VACUUM INTO snapshot** |
 | Alerting | Reddit polling | **5-state vendor polling + Slack Block Kit + dependency-graph impact statements** |
@@ -130,8 +130,8 @@ Operational concerns:
 
 1. **Phase 2B + Phase 7 webhooks reachability** — Statuspage
    inbound webhook receiver + Slack ack flow shipped with HMAC
-   verification but gated off. Flip when Cloudflare Tunnel or
-   Caddy allowlist is in place.
+   verification but gated off. Flip when signed callback
+   reachability is in place.
 2. **Phase 7 remainder polish** — postmortem variants, SLO views,
    multi-burn-rate alerting all shipped; remainder is operator-
    cadence demand-driven.
@@ -141,10 +141,10 @@ Operational concerns:
    Healthchecks.io dead-man's switch catches silent breakage.
 4. **Litestream snapshot verification** — daily `VACUUM INTO`
    provides recovery; verify restore path periodically.
-5. **Keychain secret rotation** — bearer tokens + Slack
-   credentials + Sentry DSN + vendor API keys all in Keychain;
+5. **Secret rotation** — bearer tokens + Slack
+   credentials + Sentry DSN + vendor API keys all in OS-backed storage;
    document rotation cadence.
-6. **launchd plist hardening + Caddy config** verification on
+6. **Service supervision + reverse proxy config** verification on
    major macOS updates.
 
 No public unblock — this serves the operator's employer
@@ -158,13 +158,13 @@ internally.
 |---|---|
 | Portfolio status | `Active (self-hosted service, corporate-context)` |
 | Audience | **Enterprise IT** (operator's employer) |
-| Distribution model | **Self-hosted on operator infrastructure** (launchd + Caddy + Cloudflare Tunnel) |
+| Distribution model | **Self-hosted on operator infrastructure** with signed callback reachability planned |
 | Review cadence | Active — Phase 7 polish + Phase 2B gating + operational maintenance |
-| Resurface conditions | (a) Phase 2B webhook gating decision, (b) vendor API breakage, (c) macOS update breaks launchd or Caddy, (d) Keychain secret rotation cadence, (e) v3 scope packet |
+| Resurface conditions | (a) Phase 2B webhook gating decision, (b) vendor API breakage, (c) OS/runtime drift breaks service supervision or reverse proxy, (d) secret rotation cadence, (e) v3 scope packet |
 | Co-batch with | Self-hosted service cluster — **now 2 repos** (personal + corporate-context) |
 | Sub-shape | **Corporate-context self-hosted service** (new) |
 | Special concern | **Vendor status API breakage monitoring.** Healthchecks.io dead-man's switch is the load-bearing observability layer. |
-| Special concern | **Phase 2B webhook reachability** — gated off until Cloudflare Tunnel or Caddy allowlist in place. |
+| Special concern | **Phase 2B webhook reachability** — gated off until signed callback reachability is in place. |
 | Special concern | **Litestream + VACUUM INTO snapshot** — verify restore path periodically. |
 | Special concern | **Corporate context** — operator's employer relies on this; ship discipline higher than personal projects. |
 
@@ -176,7 +176,7 @@ internally.
 2. Working tree clean — no stash needed.
 3. **Re-read `PRODUCTION-ROADMAP.md`** for current Phase 7 state.
 4. Run `pytest` — expect 356 tests passing.
-5. Verify launchd plist + Caddy config still functional.
+5. Verify service supervision + reverse proxy config still functional.
 6. Verify Healthchecks.io dead-man's switch is being pinged.
 7. Check Litestream stream + most recent daily snapshot.
 8. Verify vendor status API integrations (Statuspage / chat platform /
@@ -191,7 +191,7 @@ internally.
 | `origin/main` tip | `cc15c9a` perf(logging): offload file I/O to QueueListener thread (#28) |
 | Last substantive feat | `4932246` feat(alerting): vendor_incident_id extraction + per-service webhook override (#27) |
 | Default branch | `main` |
-| Build system | Python + FastAPI + SQLite + React + Caddy reverse proxy + launchd + macOS Keychain |
+| Build system | Python + FastAPI + SQLite + React + reverse proxy + service supervision + OS-backed secrets |
 | Service count | ~30 SaaS services monitored |
 | Test count | **356 tests passing** |
 | Audience | **Enterprise IT (operator's employer)** — corporate-context self-hosted |
@@ -200,4 +200,4 @@ internally.
 | Data lifecycle | Litestream streaming + daily `VACUUM INTO` snapshot |
 | Active arc | Phase 7 polish + Phase 2B webhook gating |
 | Migration state | No `legacy-origin` remote |
-| Distinguishing feature | **Second self-hosted service cluster member; introduces corporate-context sub-shape.** Substantially more operational maturity than RedditSentimentAnalyzer (Keychain + Litestream + Caddy + observability stack). Active Phase 7 cadence. |
+| Distinguishing feature | **Second self-hosted service cluster member; introduces corporate-context sub-shape.** Substantially more operational maturity than RedditSentimentAnalyzer (OS-backed secrets + Litestream + reverse proxy + observability stack). Active Phase 7 cadence. |
diff --git a/docs/architecture-diagram/build_diagram.py b/docs/architecture-diagram/build_diagram.py
index eda2445..37f9d36 100644
--- a/docs/architecture-diagram/build_diagram.py
+++ b/docs/architecture-diagram/build_diagram.py
@@ -434,10 +434,10 @@ def arrow(x1, y1, x2, y2, *, color=TEXT_MUTED, lw=1.3):
     ha="center",
 )
 edge_lines = [
-    "Caddy · HTTPS + header auth",
-    "internal-network-only read path",
+    "reverse proxy · HTTPS + header auth",
+    "private read path",
     "Bearer-token write path",
-    "launchd · Mac Mini 24/7",
+    "private self-hosted runtime",
 ]
 for i, line in enumerate(edge_lines):
     text(edge_x + 1.0, 25.2 - i * 1.4, line, size=6.8, color=TEXT_SECONDARY)
diff --git a/docs/case-study/pulse.html b/docs/case-study/pulse.html
index cbf94c2..5b136f3 100644
--- a/docs/case-study/pulse.html
+++ b/docs/case-study/pulse.html
@@ -521,8 +521,8 @@

03 · What production-ready meantSeven phases of un
Phase 6 · Platform polish
 GitHub Actions CI with ruff and mypy
- --strict, a hardened launchd plist, Caddy in front for
- HTTPS plus header auth, and Keychain for secrets on the Mac
+ --strict, hardened service supervision, a reverse proxy in front for
+ HTTPS plus header auth, and OS-backed secret storage
 Mini it runs on.
@@ -693,7 +693,7 @@

05 · Alert hygieneOne message per real incident. N

 The ack flow lives behind a feature flag: off by default because it
 requires a public reachability path for Slack's interactivity
- endpoint, which the internal-network-only deployment does not yet provide. The
+ endpoint, which the private deployment does not yet provide. The
 code is in the tree; the HMAC verification is implemented; the tests
 pass; the switch flips when the network path does.

@@ -741,7 +741,7 @@

06 · ObservabilityIf the dashboard were down, I wo
 every thirty seconds. If the beats stop, an external observer
 notices — even if the dashboard's own monitoring has broken. The
 /healthz endpoint returns 503 if the last heartbeat
- was more than a hundred and twenty seconds ago, so launchd and
+ was more than a hundred and twenty seconds ago, so service supervision and
 any external liveness probe agree on what "alive" means.

diff --git a/docs/design-brief/brief.md b/docs/design-brief/brief.md
index 80b0a65..0dc3177 100644
--- a/docs/design-brief/brief.md
+++ b/docs/design-brief/brief.md
@@ -2,7 +2,7 @@
 
 ## What Pulse is
 
-An internal status dashboard that aggregates real-time health of ~30 SaaS services used by Enterprise IT (identity, collaboration, productivity, CRM, video, telephony, and ITSM tools). Polls vendor status endpoints every 60 seconds, detects state changes, fires Slack alerts, renders a unified board with timeline + dependency graph + SLA history. Served from a Mac Mini on the internal network at `http://:8000`.
+A private status dashboard that aggregates real-time health of ~30 SaaS services used by Enterprise IT (identity, collaboration, productivity, CRM, video, telephony, and ITSM tools). Polls vendor status endpoints every 60 seconds, detects state changes, fires Slack alerts, renders a unified board with timeline + dependency graph + SLA history. Designed for self-hosted private deployment rather than a public SaaS surface.
 
 ## Who uses it
 
@@ -91,7 +91,7 @@ Claude Design should produce proposals for each:
 ## Tech constraints for the handoff
 
 - **React 19 + Vite + Tailwind 4** — Tailwind is v4 syntax (`@theme`, not `tailwind.config.js`). Handoff CSS variables map cleanly to `@theme` tokens.
-- **IBM Plex via `@fontsource`** self-hosted — no CDN fonts (deploy is internal-network-only). If the new typography is a commercial/Google font, we need a self-hostable equivalent or a Fontsource package.
+- **IBM Plex via `@fontsource`** self-hosted — no CDN fonts (deploy is private/self-hosted). If the new typography is a commercial/Google font, we need a self-hostable equivalent or a Fontsource package.
 - **Lucide icons** — easy to keep or swap.
 - **`recharts`** for SLA trend — any chart restyling should fit inside recharts' props, not require migration.
 - **No new component framework** — we're not adopting shadcn/ui or MUI just for this refresh. Tailwind + plain React components only.
diff --git a/docs/executive-view-redesign/IMPLEMENTATION-ROADMAP.md b/docs/executive-view-redesign/IMPLEMENTATION-ROADMAP.md
index 5fa5636..1aebcf6 100644
--- a/docs/executive-view-redesign/IMPLEMENTATION-ROADMAP.md
+++ b/docs/executive-view-redesign/IMPLEMENTATION-ROADMAP.md
@@ -152,7 +152,7 @@ Project uses JSX + JSDoc, not TypeScript. Provide JSDoc typedefs at the top of `
 
 | Service | Endpoint | Method | Auth | Rate Limit | Pagination | Purpose |
 |---------|----------|--------|------|------------|------------|---------|
-| Pulse backend | `/api/summary` | GET | none (internal network boundary) | n/a — local | none | overall_status, active_incidents[], counts |
+| Pulse backend | `/api/summary` | GET | private access controls | n/a — local | none | overall_status, active_incidents[], counts |
 | Pulse backend | `/api/services` | GET | none | n/a — local | none | per-service status, poller_health, category |
 | Pulse backend | `/api/services/sla` | GET | none | n/a — local | none | uptime_24h / uptime_7d / uptime_30d per service |
 | Pulse backend | `/api/services/sla/history?days=30` | GET | none | n/a — local | none | daily uptime points per service, 30-day window |
@@ -183,7 +183,7 @@ Expected (already installed): `recharts@^3.8.1`, `lucide-react@^1.8.0`, `@tailwi
 
 **Out of scope:**
 - Engineer view (`ServiceGrid`, `ServiceDetail`, `DependencyGraph`, `Timeline`, `ServiceTile`)
-- Backend endpoints, pollers, alerting, observability, SQLite, launchd, Caddy, Litestream
+- Backend endpoints, pollers, alerting, observability, SQLite, service supervision, reverse proxy, Litestream
 - `ViewContext` itself — reused as-is
 - PWA manifest, service worker, reload prompt
 - Any work gated behind `WEBHOOKS_ENABLED` or `SLACK_ACK_ENABLED`
@@ -195,9 +195,9 @@ Expected (already installed): `recharts@^3.8.1`, `lucide-react@^1.8.0`, `@tailwi
 
 ## Security & Credentials
 
-- Credential storage: not applicable — all endpoints are read-only and the internal network is the security boundary per root `CLAUDE.md`.
+- Credential storage: not applicable for this read-only view; private access controls are covered by the root `CLAUDE.md`.
 - Data boundaries: nothing leaves the browser except the existing same-origin XHRs.
-- Encryption: n/a — same-origin HTTPS once Caddy terminates TLS in production; not a concern for this feature.
+- Encryption: n/a — same-origin HTTPS once the reverse proxy terminates TLS in production; not a concern for this feature.
 - Token rotation: n/a — no admin endpoints touched.
 
 ---
diff --git a/docs/pitch-deck/build_deck.py b/docs/pitch-deck/build_deck.py
index d3157ab..7625ac5 100644
--- a/docs/pitch-deck/build_deck.py
+++ b/docs/pitch-deck/build_deck.py
@@ -403,7 +403,7 @@ def slide_what():
         ),
         (
             "v2 · phase 5-6",
-            "TanStack-style polling · Executive/Engineer toggle · PWA · a11y · CI · Caddy · Keychain",
+            "TanStack-style polling · Executive/Engineer toggle · PWA · a11y · CI · reverse proxy · OS-backed secrets",
         ),
     ]
     row_h = Inches(0.72)