Add lan_sender.py: simulated LAN-based car for stack testing without hardware#70
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c331604884
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ports: | ||
| - "8080:8080/tcp" # Status HTTP server | ||
| - "9080:9080/tcp" # WebSocket bridge |
There was a problem hiding this comment.
Publish UDP port 5005 from the base test container
This compose path is paired with a host-side sender (python3 src/lan_sender.py 127.0.0.1 5005 in .github/workflows/telemetry-ci.yml), but the base service only publishes 8080/9080 and does not expose 5005/udp. On bridge networking, UDP sent to host 127.0.0.1:5005 will not reach daq-lan-base, so the base never receives packets and the Redis/WebSocket flow tests fail downstream.
Useful? React with 👍 / 👎.
| - REMOTE_IP=127.0.0.1 # Will be overridden by lan_sender — base listens on all interfaces | ||
| - UDP_PORT=5005 | ||
| - TCP_PORT=5006 | ||
| - REDIS_URL=redis://base-redis:***@timescaledb:5432/wfr |
There was a problem hiding this comment.
Point REDIS_URL to the Redis service endpoint
REDIS_URL is configured as redis://base-redis:***@timescaledb:5432/wfr, which targets the TimescaleDB host/port instead of Redis (base-redis:6379). Base-mode telemetry constructs its Redis client from this variable, so this misconfiguration prevents CAN/stats publication and breaks the intended lan-sender pipeline.
Useful? React with 👍 / 👎.
| - UDP_PORT=5005 | ||
| - TCP_PORT=5006 | ||
| - REDIS_URL=redis://base-redis:***@timescaledb:5432/wfr | ||
| - TIMESCALE_TABLE=wfr26 |
There was a problem hiding this comment.
Enable Timescale logging for the lan-sender test stack
The new test suite asserts that timescale:status reports increasing row counts, but this compose environment only sets TIMESCALE_TABLE and never enables Timescale logging (ENABLE_TIMESCALE_LOGGING=true) or provides a DSN override. Since main.py starts the Timescale bridge only when that flag is true, the bridge is skipped and the Timescale assertion fails in this workflow.
Useful? React with 👍 / 👎.
…hardware Introduces a self-contained UDP sender that emits properly-formatted CAN batches (matching data.py's udp_receiver protocol) so the base station pipeline can be exercised with no car or CAN hardware present. - src/lan_sender.py: 20 Hz UDP packet generator with real DBC CAN IDs (VCU_Status, Pedal_Sensors, BMS, MC_Feedback, Wheel_Speeds, IMU, Cooling_Status) and VCU_Timestamp (ID 1999) carrying epoch-ms so clock sync activates normally. Default target 127.0.0.1:5005. - lan_sender.sh: thin shell wrapper, no venv activation needed - deploy/docker-compose.lan-sender-test.yml: base-only compose stack (base + base-redis + timescaledb) for testing without the car container - tests/test_lan_sender.py: integration tests covering UDP receipt, Redis can_messages + system_stats publication, WebSocket broadcast, 8080 status page, and TimescaleDB writes - .github/workflows/telemetry-ci.yml: new lan-sender CI job (parallel with integration/vcan), added to publish-telemetry gate Signed-off-by: Haorui Zhou <haorui2002@gmail.com>
get_sync_client() now retries up to 10 times with exponential backoff (1s, 1.5s, 2.25s, ...) instead of failing immediately. This handles the race where Redis is still starting when data.py boots. Also increased start_period on base and Redis healthchecks in docker-compose.lan-sender-test.yml to give TimescaleDB and Redis adequate warm-up time before data.py connects. Signed-off-by: Haorui Zhou <haorui2002@gmail.com>
data.py retries Redis connection 5 times with backoff (~11s total), but in CI Redis takes ~70s to initialise its aoffs database before accepting connections. With an 80s sleep after docker compose up, Redis is ready by the time data.py starts connecting. Signed-off-by: Haorui Zhou <haorui2002@gmail.com>
REDIS_URL was set to redis://base-redis:***@timescaledb:5432/wfr which points at the TimescaleDB PostgreSQL port instead of the Redis port. Redis has no password, so the URL is simply redis://base-redis:6379/0. Signed-off-by: Haorui Zhou <haorui2002@gmail.com>
lan_sender.py was running on the GHA host and sending UDP to the container's port 5005. Docker's userland UDP proxy on the host silently drops packets when forwarding to containers, so data.py never received any packets despite Redis being correctly connected. Fix: run 'docker exec -d daq-lan-base python3 /app/src/lan_sender.py' so lan_sender shares the container's network namespace and reaches data.py's UDP socket directly. Also add lan_sender.sh to the Docker image so it is available inside the container at /app/lan_sender.sh. Signed-off-by: Haorui Zhou <haorui2002@gmail.com>
TimescaleBridge publishes timescale:status every 10 seconds. The previous 3-second fixed sleep was insufficient. Now polls once per second for up to 15 seconds. Signed-off-by: Haorui Zhou <haorui2002@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b1685eb9f5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def test_timescaledb_writing(self, redis_helper): | ||
| import os | ||
| if os.getenv("ENABLE_TIMESCALE_LOGGING", "").lower() != "true": | ||
| pytest.skip("ENABLE_TIMESCALE_LOGGING not set — TimescaleBridge disabled") |
There was a problem hiding this comment.
Add an assertion path for Timescale write verification
When ENABLE_TIMESCALE_LOGGING=true, this test currently performs no checks and returns without any assertions, so it passes even if Timescale ingestion is broken. That means CI cannot detect regressions in the Timescale path despite the test name and docstring claiming write verification.
Useful? React with 👍 / 👎.
TimescaleBridge requires ENABLE_TIMESCALE_LOGGING=true, which is not set in docker-compose.lan-sender-test.yml. The test was always going to fail in this configuration. Skip it with a clear message indicating what flag to set if the user wants to test TimescaleDB. Signed-off-by: Haorui Zhou <haorui2002@gmail.com>
…tion (#71) * Pulse health mosaic on advance Update status page to visually pulse the newest column when the health mosaic advances and add a unit test for the behavior. * Add setting to toggle plot axis range source DBC and Dynamic y axis selection * Attempt to fix packet indicator; stream overlay update * Accurate speed calc, rpm alias, and loss recovery Overlay speed calculation updated to derive vehicle speed from wheel RPM using gear ratio and wheel radius (replaces naive rpm/500). Telemetry-store rpm/motor_speed signal aliases switched to INV_Fast_Motor_Speed. universal-telemetry-software: add missing_seq_seconds to track when gaps occurred, clear on sequence reset, cap removal of oldest missing entries, record recovery timestamps and use them to mark per-second status correctly, and use a robust TCP read loop that accumulates all chunks. * Add healthy field to /api/health response for Upptime compatibility * Add /api/timescale-health endpoint exposing TimescaleDB data lag for newest season table * Updated docs to reflect the car-side move to systemd * Consolidate WebSocket protocol and add runtime notes * Add agent orientation Update docs * Update docs * Add relay worker and update flight-recorder docs Add a Cloudflare Wrangler/Durable-Object relay backend under flight-recorder/relay-worker (package.json, src/index.ts, tsconfig.json, wrangler.toml) and a README describing deployment and usage as an optional public live-relay for remote viewers. Update flight-recorder documentation and top-level README to mark the app as internal (protected by Cloudflare Zero Trust), clarify store-and-forward behavior, relationship to the old lte-relay approach, optional live-relay semantics (public/tokenless worker, 1 Hz synthetic heartbeat, demo-source behavior), and DBC access/build notes. Also add a LiveRelayService and update WebSocketService and App.tsx to expose guarded controls for WS Relay vs DB forward so live relay can be used independently of database sync. * Add vitest script and LiveRelayService tests * Mark lap-detector as tabled until GPS hardware * Route heartbeat frames and clean session URL TelemetryHandler: treat both UTS (1999) and flight-relay (0x7FD) CAN IDs as diagnostic heartbeats and ingest them under DIAG_MSG_IDS.HEARTBEAT. RadioStatChips now uses the DIAG_MSG_IDS constant instead of a hardcoded '1999' string. Added TelemetryHandler tests to assert UTS and relay heartbeat frames are routed to the heartbeat diagnostic message. LiveRelayService: strip search and hash when building the /session URL to avoid stale query/hash data; updated its test to stub global fetch and ensure sessions can be created from older ingest URLs. Misc: test cleanup unstubs globals after each run. * Move rag_viz.py and rag_vectors.png to server/installer/sandbox * add file-uploader, lap-detector, slackbot to package cleanup matrix * offline bundle: fix image refs from daq-radio to data-acquisition namespace * Add /health endpoint and system health UI Introduce a unified system health endpoint and UI backed by an atomic health snapshot. - README and MACBOOK_DEPLOY: document the new /health endpoint and troubleshooting steps. - src/data.py: add HEALTH_FILE env var; compute detailed health snapshot (components, timescale, redis, UDP listener, car state, version, clock, stats); perform Redis ping and timescale parsing; record uptime and UDP listener bound state; write snapshot atomically to HEALTH_FILE. - src/status_server.py: add /health handler, helpers to load, finalize and synthesize unknown/stale health payloads, and mark stale snapshots when producer_ts is old; introduce HEALTH_STALE_SECONDS config. - status/index.html: replace pipeline WebSocket-driven status with polling /health, add system health UI (CSS/JS) and rendering logic, update metrics/status text and polling cadence. - tests/test_status_server.py: add tests for loading valid, missing, malformed and stale health snapshots and retain 404 test for unknown endpoints. This change enables a simple JSON health probe for monitoring and makes the status page reflect aggregated infrastructure state even when the car is powered off. * Update package references for data-acquisition repo * Use .env.macbook and opt-in compose profiles Switch MacBook deployment to use the committed deploy/.env.macbook via --env-file and make TimescaleDB, media, and cloudflared optional Docker Compose profiles. Add ENABLE_TIMESCALE_LOGGING (default false) to the env file and update docker-compose.macbook-base.yml to honor the env file and include profiles for timescale/media/tunnel. Update docs (top-level README, universal-telemetry-software/README.md, MACBOOK_DEPLOY.md, WHICH_ONE.md, and offline README) to describe the new minimal default stack, show example profile commands, and note which images are optional for offline saves. * Add auto TimescaleDB logging detection Introduce an "auto" mode for ENABLE_TIMESCALE_LOGGING so the service will probe the configured POSTGRES_DSN and enable the Timescale writer only when the DB is reachable. * Add lan_sender.py: simulated LAN-based car for stack testing without hardware (#70) * Optimize DataStore versioning and reduce allocations. Fix increment _version before notifyAll so subscribers see correct version. (#68) * Add one-command macOS installer for base station * Show internal vs external WebSocket client counts Track WebSocket bridge client counts through Redis as total, internal, and external clients. Treat loopback connections such as the local relay upstream as internal so the status page does not count them as dashboard viewers. Refresh the count periodically while the bridge is running, preventing the health snapshot from falling back to "Client count unavailable" after the Redis key expires. Update the status UI to show external/internal counts and add focused regression coverage. * Add MacBook LAN-sender CI job and tests Add a new GitHub Actions job (macbook-lan-sender) to smoke-test the MacBook one-click stack using deploy/docker-compose.macbook-base.yml. The job loads built Docker images, pulls base images, sets up Python, brings up the MacBook stack, starts lan_sender inside the telemetry container, runs pytest against a new test file, and collects/logs artifacts on failure. Also add tests/universal-telemetry-software/tests/test_macbook_base_lan_sender.py which exercise container health, status page, PECAN, telemetry logs, WebSocket CAN messages, and health metrics. Update CI compose validation to include the macbook compose file and add the new job as a dependency for publish-telemetry. * Update telemetry-ci.yml * Make installer cross-platform (Linux/macOS) Add full Linux support and an interactive --hotspot mode to the one-line base-station installer; keep macOS support. The installer now bootstraps git and Docker on Debian/Ubuntu/Raspberry Pi OS, can enable a NetworkManager-based Wi‑Fi AP (wfr-hotspot) and installs a systemd unit for it. CI workflow updated to validate compose from the repo root and run a linux e2e job that syntax-checks the script, validates compose, runs the installer, asserts core containers, checks service ports, verifies idempotency, and ensures non-interactive hotspot behaviour. Documentation (README and MACBOOK_DEPLOY.md) and WHICH_ONE.md updated with Linux/RPi instructions and hotspot notes. Fix .env macbook DBC_HOST_PATH to be repo-root relative and add wfr-hotspot.service file. * Slackbot missing def, ci * Increase batch size and speed up row counting * Add daily DAQ stats report to Lappy Adds !stats command and a 9 AM ET daily auto-post showing rows logged, CAN message count, and testing duration for yesterday and the past 7 days. Pulls from TimescaleDB wfr26 and monitoring tables directly. Also adds psycopg2-binary to slackbot requirements. * Add light/dark theme and toggle to data-downloadaer * Revert "Increase batch size and speed up row counting" This reverts commit 4621ab3. * Align sandbox prompt-guide.txt.example with the active guide and main-branch slicks API - Use generic placeholders (<season_table>, <signal_1>, etc.) instead of team-specific table and signal names so the example is shareable. - Drop the 'monitoring table holds infra metrics' note (team-internal). - Drop the verified Grafana query patterns section (team-internal sensor lists and table names). - Update env-var references: TIMESCALE_TABLE (with TIMESCALE_SEASON fallback) replaces POSTGRES_TABLE; remove INFLUX_* references. - Update slicks examples to match the merged main API: * discover_sensors() now requires start_time and end_time * connect_timescaledb() now has (dsn, schema, table) signature * env-var auto-connect note uses POSTGRES_DSN + TIMESCALE_TABLE - Same overall structure as the active (gitignored) prompt-guide.txt, but generic. * Wire slicks into sandbox via uv and bump default model to MiniMax-M3 Sandbox slicks wiring (matches the slicks v0.3.0 TimescaleDB backend on the timescaledb-migration branch that was merged to main): - server/installer/sandbox/Dockerfile.sandbox: install uv, then use it for both requirements-docker.txt and the editable slicks install from /slicks_src (a docker compose additional_context pointing at the sibling slicks checkout). uv keeps the image layer smaller than pip and the install deterministic. Added a build-time smoke test that imports slicks to catch future regressions early. - server/installer/docker-compose.yml: * sandbox service: pass slicks source as a build-time additional_context and bind-mount the same path at runtime so live source edits are picked up on the next container recreate without an image rebuild (controlled by SLICKS_HOST_PATH env var, default points at /home/ubuntu/projects/slicks). * sandbox service: export POSTGRES_DSN, TIMESCALE_TABLE, TIMESCALE_SEASON, and POSTGRES_TABLE so slicks auto-connects on import. The published PyPI slicks (0.2.3) is the InfluxDB backend and would not be importable as TimescaleDB-aware code; install from local source instead. - server/installer/sandbox/requirements-docker.txt: drop the 'slicks>=0.2.0' PyPI pin. Slicks is now installed editable from the local source above; the pin was pulling the InfluxDB-only release. - server/installer/sandbox/requirements.txt: add psycopg2-binary (slicks depends on it; listing explicitly so the layer order is stable across slicks extras changes). Code-generator Dockerfile fix: - server/installer/sandbox/Dockerfile: the previous version only COPY'd prompt-guide.txt.example, so the active (gitignored) prompt-guide.txt was silently being ignored at every build. Switched to COPY prompt-guide.txt* ./ (with the existing fallback to copy the example when the active file is missing). This restores the team's customization workflow where the active prompt-guide lives in the build context and gets shipped to the container. Model bump: - server/installer/.env.example: ANTHROPIC_MODEL default MiniMax-M2.7 -> MiniMax-M3. - server/installer/sandbox/code_generator.py: same bump in the os.getenv default so a missing env var falls back to M3. Tested end-to-end: code-generator /api/generate-code returns result.status=success with a PNG, slicks.fetch_telemetry() returns 407k rows from a real TimescaleDB window, and the slackbot -> code-gen -> sandbox chain still works. * root: pin npm overrides to address quick-win Dependabot alerts Adds an npm 'overrides' block to the root package.json to force patched versions of five transitive dependencies that had Dependabot alerts: lodash -> ^4.18.0 (CVE-2026-4800, CVE-2026-2950) qs -> ^6.15.2 (CVE-2026-8723) picomatch -> ^2.3.2 (CVE-2026-32233) fast-uri -> ^3.1.2 (CVE-2026-6321, CVE-2026-6322) serialize-javascript -> ^7.0.5 (CVE-2026-34043) Regenerated package-lock.json with 'npm install --package-lock-only' (no node_modules write, no network risk beyond what 'install' needs to resolve). Verified the lock now pins: picomatch 2.3.2 (remaining 4 packages aren't pulled into this lock). * pecan: bump react-router to ^7.15.0 and add npm overrides for quick-win CVEs react-router was a direct dep pinned at ^7.9.3; multiple CVEs (DoS via path expansion, RCE via turbo-stream, XSS via redirects) require >= 7.15.0. Bumping the direct dep range is cleaner than an override (npm errors out on an override that conflicts with a direct dep). Also adds an npm 'overrides' block to force patched transitive deps: lodash -> ^4.18.0 qs -> ^6.15.2 picomatch -> ^2.3.2 fast-uri -> ^3.1.2 serialize-javascript -> ^7.0.5 Regenerated package-lock.json. Verified the lock now pins: lodash 4.18.1 qs 6.15.2 picomatch 2.3.2 fast-uri 3.1.2 serialize-javascript 7.0.5 react-router 7.16.0 * flight-recorder: bump react-router-dom to ^7.15.0 and add npm overrides react-router-dom was a direct dep pinned at ^7.9.3; same CVE class as pecan's react-router (DoS, RCE, XSS via redirects) requires >= 7.15.0. Bumped the direct dep range. Also adds an npm 'overrides' block to force patched transitive deps: lodash -> ^4.18.0 qs -> ^6.15.2 picomatch -> ^2.3.2 fast-uri -> ^3.1.2 serialize-javascript -> ^7.0.5 Regenerated package-lock.json. Verified the lock now pins: lodash 4.18.1 picomatch 2.3.2 fast-uri 3.1.2 serialize-javascript 7.0.5 react-router-dom 7.16.0 * grafana-bridge: add npm overrides for quick-win Dependabot alerts grafana-bridge is a minimal Express app (single dep) but still pulls in the same vulnerable transitive packages. Adds an 'overrides' block to force patched versions: lodash -> ^4.18.0 qs -> ^6.15.2 picomatch -> ^2.3.2 fast-uri -> ^3.1.2 serialize-javascript -> ^7.0.5 Regenerated package-lock.json. Verified qs pinned at 6.15.2. * charging-dashboard: self-updating Slack TUI from Pecan accumulator Adds a charging dashboard that posts one self-updating Slack message (post once, chat.update every 5s, like the file-uploader) while charging the accumulator via the Kvaser bridge on the internal (pecan-dev) build. Bot (server/installer/slackbot): - charge_dashboard.py: TUI renderer (SoC bar, pack/cell/temp/alert lines, per-module sparkline + stats), per-session manager, stdlib HTTP receiver (POST /charging/state behind CF Zero Trust + shared-token auth), heartbeat staleness. - soc_model.py: data-derived SOC/ETA from the wfr26 audit. PackCurrent/BMS SOC are dead sentinels and PackVoltage is absent, so SOC is an OCV->SOC lookup on the limiting cell and ETA integrates a CV taper; phase is from the voltage trend. - Wired into slack_bot.py __main__; compose port/env; Dockerfile. - unittests: 17 (renderer, formatters, session updates, HTTP auth, model). Pecan: - chargingSnapshot.ts: pure, mockable snapshot builder mirroring the accumulator page; sentinel-guards the dead signals; derives pack voltage from the series-cell sum. - useChargingDashboard.ts: 5s poster, gated on VITE_INTERNAL + live Kvaser source (:9081). Never broadcasts the demo relay or replay data. - vitest: 15 (aggregation, alert chips, dead-signal guards, broadcast gate). Co-authored-by: Haorui Zhou <haorui2002@gmail.com> * Address PR review: fix slackbot CI smoke-test + harden charge receiver - slackbot-ci: stub SocketModeRequest/Response on the slack_sdk stub modules (import was failing with 'cannot import name SocketModeRequest') and provision /app/logs so the module-level mkdir doesn't fail on the runner. - slack_bot: require CHARGE_RELAY_TOKEN before starting the charge-dashboard receiver; the port is host-published and must not accept unauthenticated POSTs. - docker-compose.lan-sender-test: set POSTGRES_DSN to the timescaledb service (default DSN points at localhost, which is the base container here). - websocket_bridge: aclose the Redis client via try/finally so a failing set doesn't leak the connection. - redis_utils: correct get_sync_client docstring (default retries=5, ~8s). --------- Co-authored-by: Haorui Zhou <haorui2002@gmail.com>
Introduces a self-contained UDP sender that emits properly-formatted CAN batches (matching data.py's udp_receiver protocol) so the base station pipeline can be exercised with no car or CAN hardware present.
What changed
CI job
The new
lan-senderjob runs the base-only stack and lan_sender.py directly on the GHA runner, providing a fast smoke test that doesn't require the full car simulation container.