saagpatel · saagpatel · Jun 14, 2026 · Jun 14, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -1,6 +1,6 @@
 # IT Service Health Dashboard
 
-Internal web dashboard aggregating real-time health of ~30 SaaS services in an enterprise IT environment. Polls Statuspage.io JSON API, a cloud productivity suite's JSON feed, a chat vendor's native status API, and RSS/Atom feeds. Enriches with dependency mapping and templated impact statements. Displays a unified status board with timeline view, posts alerts to Slack. Deployed on a Mac Mini on the internal network.
+Private web dashboard aggregating real-time health of ~30 SaaS services in an enterprise IT environment. Polls Statuspage.io JSON API, a cloud productivity suite's JSON feed, a chat vendor's native status API, and RSS/Atom feeds. Enriches with dependency mapping and templated impact statements. Displays a unified status board with timeline view and posts alerts to Slack. Designed for self-hosted, private-network deployment.
 
 ## Roadmap
 
@@ -14,7 +14,7 @@ Historical v1 spec: [IMPLEMENTATION-ROADMAP.md](./IMPLEMENTATION-ROADMAP.md) —
 - **Frontend:** React 19 (Vite 8+) + Tailwind CSS 4+; FastAPI serves the built static files
 - **Observability:** structlog (JSON), prometheus-client, sentry-sdk[fastapi], Healthchecks.io
 - **Resilience:** stamina (retries) + purgatory (per-host circuit breakers)
-- **Production process manager:** launchd (macOS); Caddy in front for HTTPS + header auth
+- **Production process manager:** OS service manager; reverse proxy in front for HTTPS + header auth
 
 ## Build / Test / Run
 
@@ -58,14 +58,14 @@ Open `http://localhost:8000`.
 | Cloud productivity suite | Custom JSON feed + RSS | Has its own status dashboard, not Statuspage.io |
 | Chat vendor status | Vendor JSON status endpoint | Dedicated JSON status API |
 | Database | SQLite + Litestream | Demo-scale + ~1s RPO; Postgres deferred to >100 writes/s |
-| Auth | Bearer token on admin endpoints; internal-network-only for reads | Bearer token required for write endpoints — the internal network alone is insufficient |
-| Hosting | Mac Mini + Caddy | Always-on, internal-network-accessible; Caddy adds HTTPS + header auth |
+| Auth | Bearer token on admin endpoints; private access controls for reads | Bearer token required for write endpoints — read-path access controls alone are insufficient |
+| Hosting | Self-hosted private deployment | Always-on private access; reverse proxy adds HTTPS + header auth |
 | Dep graph layout | Force-directed (react-force-graph-2d) | Dagre hierarchical layout is deferred; force-directed is current default |
 | LLM layer | Deferred (post-Phase-7) | Template-based summaries sufficient for v2 |
 
 ## Feature Gates (off by default)
 
-Phase 7 code is in-tree but gated. Flip only when a public endpoint (Cloudflare Tunnel / Caddy allowlist / ngrok) is available:
+Phase 7 code is in-tree but gated. Flip only when a signed callback endpoint is available:
 
 - `WEBHOOKS_ENABLED` — `POST /api/webhooks/statuspage/{service_id}` (HMAC-SHA256; `backend/app/router_webhooks.py`). Bypasses flap suppression; writes directly through the alerting pipeline.
 - `SLACK_ACK_ENABLED` — `POST /api/slack/interactivity` (v0 signing-secret; `backend/app/router_slack.py`).
@@ -84,7 +84,7 @@ All new work must map to an active phase in PRODUCTION-ROADMAP.md. Splunk, Thous
 
 ## What This Project Is
 
-Internal web dashboard that aggregates real-time health status of ~30 SaaS services supported by an enterprise IT team. Polls vendor status pages via Statuspage.io JSON API, a cloud productivity suite's JSON feed, a chat vendor's native status API, and RSS/Atom feeds. Enriches with dependency mapping and templated impact statements. Displays a unified status board with timeline view and posts alerts to Slack. Deployed on a Mac Mini on the internal network. Designed for IT engineers (deep triage) and IT leadership / company-wide visibility (situational awareness).
+Private web dashboard that aggregates real-time health status of ~30 SaaS services supported by an enterprise IT team. Polls vendor status pages via Statuspage.io JSON API, a cloud productivity suite's JSON feed, a chat vendor's native status API, and RSS/Atom feeds. Enriches with dependency mapping and templated impact statements. Displays a unified status board with timeline view and posts alerts to Slack. Designed for self-hosted private deployment and for IT engineers (deep triage) plus IT leadership / company-wide visibility (situational awareness).
 
 ## Current State
 
@@ -99,7 +99,7 @@ Main also includes a parallel UX sprint that shipped alongside Phase 5:
 **Phase 7 partially landed:**
 - **Statuspage inbound webhook** (`POST /api/webhooks/statuspage/{service_id}`, HMAC-SHA256, optional replay protection) — code in `backend/app/router_webhooks.py`, gated by `WEBHOOKS_ENABLED` (default false). Writes directly through the alerting pipeline, bypassing flap suppression.
 - **Slack ack flow** (`POST /api/slack/interactivity`, v0 signing-secret) — code in `backend/app/router_slack.py`, gated by `SLACK_ACK_ENABLED` (default false). Block Kit messages only include the Acknowledge button when the flag is true.
-- Both features require a public endpoint (Cloudflare Tunnel / Caddy allowlist / ngrok) before flipping the flag. They ship off-by-default so the main app is unaffected.
+- Both features require a signed callback endpoint before flipping the flag. They ship off-by-default so the main app is unaffected.
 
 **Phase 7 further landed** — postmortem automation (`POSTMORTEMS_ENABLED`), SLO fuel-gauge view + multi-burn-rate alerting (`SLO_BURN_RATE_ENABLED`), and Slack `/itstatus` slash command (`SLACK_SLASH_ENABLED`) all shipped, feature-gated off by default. **Still open:** LLM-layer impact statements, Splunk/JSM/ThousandEyes integration.
 
@@ -115,7 +115,7 @@ Main also includes a parallel UX sprint that shipped alongside Phase 5:
 - **Config:** PyYAML 6.0+
 - **Data validation:** Pydantic 2.10+
 - **Frontend:** React 19 (Vite 8+) + Tailwind CSS 4+
-- **Process manager:** launchd (macOS) for production
+- **Process manager:** OS service manager for production
 
 ## How To Run
 
@@ -144,7 +144,7 @@ Open `http://localhost:8000` in your browser.
 - Do not start work that isn't in a PRODUCTION-ROADMAP.md phase. If it doesn't fit, discuss first.
 - Do not integrate Splunk, ThousandEyes, Datadog, or JSM — those are Phase 7+.
 - Do not build an LLM integration yet — post-Phase-7.
-- Do not remove the bearer-token auth on admin endpoints once added. The internal network is not sufficient for write endpoints.
+- Do not remove the bearer-token auth on admin endpoints once added. Read-path access controls are not sufficient for write endpoints.
 - Do not use synchronous I/O — all network calls must be async.
 - Do not hardcode service definitions in Python — they live in services.yaml.
 - Do not use slack-sdk — use raw httpx POST for webhook simplicity.

diff --git a/README.md b/README.md
@@ -5,8 +5,8 @@ Real-time status monitoring dashboard for ~30 SaaS services used across an enter
 ## Project status
 
 - **v1 (demo-ready) — SHIPPED.** All original spec delivered: polling, normalization, change detection, Slack alerting, React UI, dependency graph, timeline, SLA tracking, incident clustering, auto reports.
-- **v2 (production-ready) — SHIPPED.** Phases 0–6 of the production roadmap complete: bearer-token auth, vendor resilience (stamina + purgatory), alert quality (flap suppression, dedup, tier routing, dependency correlation, maintenance windows, flapping-badge UI), observability (structlog, Prometheus `/metrics`, Sentry, Healthchecks.io dead-man's switch), data lifecycle (production pragmas, retention, Litestream streaming + daily `VACUUM INTO` snapshot), UX productionization (severity-sorted grid, distinct poller-broken state, a11y + keyboard nav, Executive/Engineer view toggle, PWA, `recharts` SLA trend), and platform polish (CI, pre-commit, hardened launchd plist, Caddy, Keychain secrets). **378 tests passing.**
-- **v2 Phase 2B + Phase 7 — in tree, gated off.** Statuspage inbound webhook receiver (`WEBHOOKS_ENABLED`), Slack ack flow (`SLACK_ACK_ENABLED`), postmortem drafts (`POSTMORTEMS_ENABLED`), SLO fuel-gauge + multi-burn-rate alerting (`SLO_BURN_RATE_ENABLED`), and Slack `/itstatus` slash command (`SLACK_SLASH_ENABLED`) all shipped with tests but default off. Flip each flag once its prerequisites are in place (public endpoint for Slack features; postmortems need only a writable `POSTMORTEMS_DIR`).
+- **v2 (production-ready) — SHIPPED.** Phases 0–6 of the production roadmap complete: bearer-token auth, vendor resilience (stamina + purgatory), alert quality (flap suppression, dedup, tier routing, dependency correlation, maintenance windows, flapping-badge UI), observability (structlog, Prometheus `/metrics`, Sentry, Healthchecks.io dead-man's switch), data lifecycle (production pragmas, retention, Litestream streaming + daily `VACUUM INTO` snapshot), UX productionization (severity-sorted grid, distinct poller-broken state, a11y + keyboard nav, Executive/Engineer view toggle, PWA, `recharts` SLA trend), and platform polish (CI, pre-commit, service supervision, reverse-proxy posture, OS-backed secret storage). **378 tests passing.**
+- **v2 Phase 2B + Phase 7 — in tree, gated off.** Statuspage inbound webhook receiver (`WEBHOOKS_ENABLED`), Slack ack flow (`SLACK_ACK_ENABLED`), postmortem drafts (`POSTMORTEMS_ENABLED`), SLO fuel-gauge + multi-burn-rate alerting (`SLO_BURN_RATE_ENABLED`), and Slack `/itstatus` slash command (`SLACK_SLASH_ENABLED`) all shipped with tests but default off. Flip each flag only after the deployment has the required signed callback reachability; postmortems need only a writable `POSTMORTEMS_DIR`.
 - **v2 Phase 7 remainder — optional.** LLM-layer impact statements; log-aggregation / ITSM / synthetic-monitoring integrations. Not on a fixed schedule; add as demand emerges.
 
 **Active roadmap:** [PRODUCTION-ROADMAP.md](./PRODUCTION-ROADMAP.md) — exit-criteria detail for every phase.
@@ -60,13 +60,11 @@ Open `http://localhost:8000` in your browser.
 
 ## Accessing the Dashboard
 
-The dashboard runs on a Mac Mini on the internal network. Access it at:
-
-```
-http://<host>:8000
-```
-
-No authentication required — internal-network access is the security boundary.
+For local development, open `http://localhost:8000` after starting the backend.
+For a private deployment, serve the read dashboard behind your organization's
+normal access controls and keep admin writes protected by bearer-token auth. The
+public repo intentionally describes the deployment shape, not a real host,
+machine, or network boundary.
 
 ## Service Categories
 
@@ -96,7 +94,7 @@ finance, sales, marketing, networking, support).
 For services without automated polling (e.g. an identity provider, an HR system, or any service with no public status API), update status via curl. **Admin endpoints require a bearer token** (set `ADMIN_API_TOKEN` in your env).
 
 ```bash
-export TOKEN="your-admin-token"
+export TOKEN="<demo-admin-token>"
 
 # Set a service to degraded
 curl -X POST http://localhost:8000/api/admin/status \
@@ -126,7 +124,7 @@ Valid statuses: `operational`, `degraded`, `partial_outage`, `major_outage`, `un
 | `SLACK_WEBHOOK_URL` | _(none)_ | Slack incoming webhook URL for ops-alert channel notifications |
 | `DATABASE_PATH` | `data.db` | SQLite database file path |
 | `POLL_INTERVAL_SECONDS` | `60` | How often to poll vendor status pages (1–3600) |
-| `HOST` | `127.0.0.1` | Server bind address (`0.0.0.0` for network access) |
+| `HOST` | `127.0.0.1` | Server bind address; override only for a controlled private deployment |
 | `PORT` | `8000` | Server port |
 | `LOG_LEVEL` | `INFO` | Logging level |
 | `ADMIN_API_TOKEN` | _(none)_ | Bearer token required for `/api/admin/*` endpoints. If unset, admin endpoints refuse all requests. |
@@ -190,85 +188,39 @@ cd frontend && npm run dev
 
 Frontend dev server at `localhost:5173` proxies `/api/*` to `localhost:8000`.
 
-## Production Deployment (Mac Mini)
-
-```bash
-# 1. Clone and set up (same as Quick Start steps 1-4)
-
-# 2. Configure environment
-cp .env.example backend/.env
-# Edit backend/.env: set HOST=0.0.0.0, SLACK_WEBHOOK_URL=<your-url>
-
-# 3. Update plist paths
-# Edit com.company.it-health-dashboard.plist:
-#   - Replace /path/to/ with actual project path
-#   - Add SLACK_WEBHOOK_URL
-
-# 4. Install launchd service
-sudo cp com.company.it-health-dashboard.plist /Library/LaunchDaemons/
-sudo launchctl bootstrap system /Library/LaunchDaemons/com.company.it-health-dashboard.plist
-
-# 5. Verify
-curl http://localhost:8000/api/health
-
-# 6. Open firewall (if needed)
-sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add $(which python3)
-```
+## Private Deployment Notes
 
-Manage the service:
-```bash
-# Stop
-sudo launchctl bootout system/com.company.it-health-dashboard
+The production path is intentionally self-hosted and private-network oriented:
 
-# Start
-sudo launchctl bootstrap system /Library/LaunchDaemons/com.company.it-health-dashboard.plist
+- Run the FastAPI process under an OS service manager.
+- Put a reverse proxy in front for TLS and request headers.
+- Store tokens and webhook secrets in the host secret manager, not in git.
+- Keep read access behind the organization access controls.
+- Require bearer-token auth for every admin/write endpoint.
+- Monitor `/api/health`, `/healthz`, `/metrics`, and the heartbeat job.
 
-# View logs
-tail -f /var/log/it-health-dashboard.log
-```
+Exact host paths, service-manager commands, firewall posture, and log locations
+belong in a private runbook, not in the public README.
 
 ## Backup & Disaster Recovery (Litestream)
 
-SQLite is the primary store; [Litestream](https://litestream.io) streams WAL frames to an external replica (S3, SFTP, or a second disk) so the dashboard survives a Mac Mini failure.
+SQLite is the primary store; [Litestream](https://litestream.io) streams WAL
+frames to an external replica so the dashboard can recover from host failure.
 
 ### Setup
 
-```bash
-# 1. Install the binary
-brew install benbjohnson/litestream/litestream
-
-# 2. Customize the config template (pick one replica destination)
-cp deploy/litestream.yml.example /opt/it-health/deploy/litestream.yml
-$EDITOR /opt/it-health/deploy/litestream.yml
-
-# 3. Validate the config before loading it
-litestream validate -config /opt/it-health/deploy/litestream.yml
-
-# 4. Install the sidecar launchd daemon
-cp deploy/com.company.it-health-dashboard-litestream.plist.example \
-   /Library/LaunchDaemons/com.company.it-health-dashboard-litestream.plist
-sudo launchctl bootstrap system /Library/LaunchDaemons/com.company.it-health-dashboard-litestream.plist
+Use the checked-in config template as a starting point, keep the real replica
+destination out of git, validate the config before enabling the sidecar, and
+monitor snapshots as part of routine operations.
 
-# 5. Confirm replication is working
-litestream snapshots -config /opt/it-health/deploy/litestream.yml
-```
-
-Litestream RPO is ~1 second — after the initial snapshot, every WAL frame ships as it's written.
+Litestream RPO is ~1 second — after the initial snapshot, every WAL frame ships as it is written.
 
 ### Restore
 
-```bash
-# 1. Stop the main app so the DB isn't being written to
-sudo launchctl bootout system/com.company.it-health-dashboard
-
-# 2. Restore from replica (picks up the latest snapshot + WAL frames)
-litestream restore -config /opt/it-health/deploy/litestream.yml \
-                   -o /opt/it-health/data.db \
-                   /opt/it-health/data.db
-
-# 3. Start the app — it applies pending migrations on boot and resumes polling
-sudo launchctl bootstrap system /Library/LaunchDaemons/com.company.it-health-dashboard.plist
-```
+Restore procedure: stop writers, restore the latest snapshot plus WAL frames
+into the configured database location, restart the service, and let startup
+migrations run. Keep the exact command sequence in a private runbook because it
+depends on the host service manager, paths, and replica destination.
 
 ### Data retention
 
@@ -297,7 +249,7 @@ The retention job runs every `RETENTION_INTERVAL_HOURS` (default 168 = weekly) a
 | `/api/services/graph` | GET | Service dependency graph (nodes + links) for visualization |
 | `/api/services/slo` | GET | Per-service SLO snapshot: error-budget remaining + active burn-rate breaches |
 | `/api/admin/status` | POST | Manual status update (requires `Authorization: Bearer $ADMIN_API_TOKEN`) |
-| `/healthz` | GET | Dead-man's switch — 200 fresh / 503 stale. Hit by launchd + Healthchecks.io. |
+| `/healthz` | GET | Dead-man's switch — 200 fresh / 503 stale. Hit by the service supervisor + Healthchecks.io. |
 | `/metrics` | GET | Prometheus text exposition. |
 | `/api/webhooks/statuspage/{id}` | POST | Inbound Statuspage subscriber webhook, HMAC-verified. 404 unless `WEBHOOKS_ENABLED=true`. |
 | `/api/slack/interactivity` | POST | Slack block-actions receiver (ack button). 404 unless `SLACK_ACK_ENABLED=true`. |