Skip to content

Optimize DataStore versioning and reduce allocations#68

Merged
haoruizhou merged 1 commit into
mainfrom
pecan-fixes
May 21, 2026
Merged

Optimize DataStore versioning and reduce allocations#68
haoruizhou merged 1 commit into
mainfrom
pecan-fixes

Conversation

@haoruizhou

Copy link
Copy Markdown
Contributor

Add a monotonic _version counter to DataStore (getVersion) and increment it on all mutating operations so subscribers can skip redundant work. Reduce allocations when rounding sensor readings by only cloning signal objects if rounding changes the value (preserving identity for common cases). Update useAllLatestMessages to snapshot the version and avoid unnecessary Map rebuilds, and batch cold-store state into a single object to cut re-renders. Simplify Trace memo usage that was a no-op. Add tests for version tick and ingestMessage data handling (rounding, immutability, defaults). Files changed: DataStore.ts, useDataStore.ts, Trace.tsx, and DataStore.test.ts.

@haoruizhou

Copy link
Copy Markdown
Contributor Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07bd4107cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pecan/src/lib/DataStore.ts Outdated
Comment on lines +617 to +618
this.notifyAll(msgID);
this._version++;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Move the version bump before notifying subscribers

When a live frame is ingested after useAllLatestMessages has subscribed, the subscriber runs during notifyAll while getVersion() still returns the old value, so the hook's currentVersion !== lastVersion guard skips rebuilding the latest-message Map. A dashboard that is mounted before the first/only CAN frame will not show it until a later write, and live data is always one notification behind; the same ordering appears in batch/clear paths, so the version needs to be incremented before notifying.

Useful? React with 👍 / 👎.

Comment on lines +572 to +574
roundedData[key] = rounded === signal.sensorReading
? signal
: { ...signal, sensorReading: rounded };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Copy signals even when rounding is a no-op

For any signal whose reading is already rounded, the stored sample now keeps the caller's signal object by reference. If the decoded message object is reused or mutated after ingestMessage, historical data in DataStore changes as well; the previous implementation always spread each signal and the new immutability test only covers the changed-rounding case, so no-op values still alias the caller.

Useful? React with 👍 / 👎.

@haoruizhou haoruizhou requested a review from Ellaharding1 May 13, 2026 17:06
@haoruizhou

Copy link
Copy Markdown
Contributor Author

@Ellaharding1 can you review these optimizations please, thank you

@Ellaharding1

Copy link
Copy Markdown
Contributor

@haoruizhou just a thought on this since _version is incremented after notifyAll, I wonder if the version check in the subscriber callback might always see the old value and skip the update? Could potentially mean live data doesn't come through. Might be worth swapping those two lines just to be safe

…version before notifyAll so subscribers see correct version.
@haoruizhou haoruizhou merged commit 3d29966 into main May 21, 2026
23 checks passed
@haoruizhou haoruizhou deleted the pecan-fixes branch May 21, 2026 14:17
haoruizhou added a commit that referenced this pull request May 21, 2026
…version before notifyAll so subscribers see correct version. (#68)
haoruizhou added a commit that referenced this pull request Jun 4, 2026
…tion (#71)

* Pulse health mosaic on advance

Update status page to visually pulse the newest column when the health mosaic advances and add a unit test for the behavior.

* Add setting to toggle plot axis range source

DBC and Dynamic y axis selection

* Attempt to fix packet indicator; stream overlay update

* Accurate speed calc, rpm alias, and loss recovery

Overlay speed calculation updated to derive vehicle speed from wheel RPM using gear ratio and wheel radius (replaces naive rpm/500). Telemetry-store rpm/motor_speed signal aliases switched to INV_Fast_Motor_Speed. universal-telemetry-software: add missing_seq_seconds to track when gaps occurred, clear on sequence reset, cap removal of oldest missing entries, record recovery timestamps and use them to mark per-second status correctly, and use a robust TCP read loop that accumulates all chunks.

* Add healthy field to /api/health response for Upptime compatibility

* Add /api/timescale-health endpoint exposing TimescaleDB data lag for newest season table

* Updated docs to reflect the car-side move to systemd

* Consolidate WebSocket protocol and add runtime notes

* Add agent orientation

Update docs

* Update docs

* Add relay worker and update flight-recorder docs

Add a Cloudflare Wrangler/Durable-Object relay backend under flight-recorder/relay-worker (package.json, src/index.ts, tsconfig.json, wrangler.toml) and a README describing deployment and usage as an optional public live-relay for remote viewers. Update flight-recorder documentation and top-level README to mark the app as internal (protected by Cloudflare Zero Trust), clarify store-and-forward behavior, relationship to the old lte-relay approach, optional live-relay semantics (public/tokenless worker, 1 Hz synthetic heartbeat, demo-source behavior), and DBC access/build notes. Also add a LiveRelayService and update WebSocketService and App.tsx to expose guarded controls for WS Relay vs DB forward so live relay can be used independently of database sync.

* Add vitest script and LiveRelayService tests

* Mark lap-detector as tabled until GPS hardware

* Route heartbeat frames and clean session URL

TelemetryHandler: treat both UTS (1999) and flight-relay (0x7FD) CAN IDs as diagnostic heartbeats and ingest them under DIAG_MSG_IDS.HEARTBEAT. RadioStatChips now uses the DIAG_MSG_IDS constant instead of a hardcoded '1999' string. Added TelemetryHandler tests to assert UTS and relay heartbeat frames are routed to the heartbeat diagnostic message. LiveRelayService: strip search and hash when building the /session URL to avoid stale query/hash data; updated its test to stub global fetch and ensure sessions can be created from older ingest URLs. Misc: test cleanup unstubs globals after each run.

* Move rag_viz.py and rag_vectors.png to server/installer/sandbox

* add file-uploader, lap-detector, slackbot to package cleanup matrix

* offline bundle: fix image refs from daq-radio to data-acquisition namespace

* Add /health endpoint and system health UI

Introduce a unified system health endpoint and UI backed by an atomic health snapshot.

- README and MACBOOK_DEPLOY: document the new /health endpoint and troubleshooting steps.
- src/data.py: add HEALTH_FILE env var; compute detailed health snapshot (components, timescale, redis, UDP listener, car state, version, clock, stats); perform Redis ping and timescale parsing; record uptime and UDP listener bound state; write snapshot atomically to HEALTH_FILE.
- src/status_server.py: add /health handler, helpers to load, finalize and synthesize unknown/stale health payloads, and mark stale snapshots when producer_ts is old; introduce HEALTH_STALE_SECONDS config.
- status/index.html: replace pipeline WebSocket-driven status with polling /health, add system health UI (CSS/JS) and rendering logic, update metrics/status text and polling cadence.
- tests/test_status_server.py: add tests for loading valid, missing, malformed and stale health snapshots and retain 404 test for unknown endpoints.

This change enables a simple JSON health probe for monitoring and makes the status page reflect aggregated infrastructure state even when the car is powered off.

* Update package references for data-acquisition repo

* Use .env.macbook and opt-in compose profiles

Switch MacBook deployment to use the committed deploy/.env.macbook via --env-file and make TimescaleDB, media, and cloudflared optional Docker Compose profiles. Add ENABLE_TIMESCALE_LOGGING (default false) to the env file and update docker-compose.macbook-base.yml to honor the env file and include profiles for timescale/media/tunnel. Update docs (top-level README, universal-telemetry-software/README.md, MACBOOK_DEPLOY.md, WHICH_ONE.md, and offline README) to describe the new minimal default stack, show example profile commands, and note which images are optional for offline saves.

* Add auto TimescaleDB logging detection

Introduce an "auto" mode for ENABLE_TIMESCALE_LOGGING so the service will probe the configured POSTGRES_DSN and enable the Timescale writer only when the DB is reachable.

* Add lan_sender.py: simulated LAN-based car for stack testing without hardware (#70)

* Optimize DataStore versioning and reduce allocations. Fix increment _version before notifyAll so subscribers see correct version. (#68)

* Add one-command macOS installer for base station

* Show internal vs external WebSocket client counts

Track WebSocket bridge client counts through Redis as total, internal, and external clients. Treat loopback connections such as the local relay upstream as internal so the status page does not count them as dashboard viewers.

Refresh the count periodically while the bridge is running, preventing the health snapshot from falling back to "Client count unavailable" after the Redis key expires. Update the status UI to show external/internal counts and add focused regression coverage.

* Add MacBook LAN-sender CI job and tests

Add a new GitHub Actions job (macbook-lan-sender) to smoke-test the MacBook one-click stack using deploy/docker-compose.macbook-base.yml. The job loads built Docker images, pulls base images, sets up Python, brings up the MacBook stack, starts lan_sender inside the telemetry container, runs pytest against a new test file, and collects/logs artifacts on failure. Also add tests/universal-telemetry-software/tests/test_macbook_base_lan_sender.py which exercise container health, status page, PECAN, telemetry logs, WebSocket CAN messages, and health metrics. Update CI compose validation to include the macbook compose file and add the new job as a dependency for publish-telemetry.

* Update telemetry-ci.yml

* Make installer cross-platform (Linux/macOS)

Add full Linux support and an interactive --hotspot mode to the one-line base-station installer; keep macOS support. The installer now bootstraps git and Docker on Debian/Ubuntu/Raspberry Pi OS, can enable a NetworkManager-based Wi‑Fi AP (wfr-hotspot) and installs a systemd unit for it. CI workflow updated to validate compose from the repo root and run a linux e2e job that syntax-checks the script, validates compose, runs the installer, asserts core containers, checks service ports, verifies idempotency, and ensures non-interactive hotspot behaviour. Documentation (README and MACBOOK_DEPLOY.md) and WHICH_ONE.md updated with Linux/RPi instructions and hotspot notes. Fix .env macbook DBC_HOST_PATH to be repo-root relative and add wfr-hotspot.service file.

* Slackbot missing def, ci

* Increase batch size and speed up row counting

* Add daily DAQ stats report to Lappy

Adds !stats command and a 9 AM ET daily auto-post showing rows logged,
CAN message count, and testing duration for yesterday and the past 7
days. Pulls from TimescaleDB wfr26 and monitoring tables directly.
Also adds psycopg2-binary to slackbot requirements.

* Add light/dark theme and toggle to data-downloadaer

* Revert "Increase batch size and speed up row counting"

This reverts commit 4621ab3.

* Align sandbox prompt-guide.txt.example with the active guide and main-branch slicks API

- Use generic placeholders (<season_table>, <signal_1>, etc.) instead of
  team-specific table and signal names so the example is shareable.
- Drop the 'monitoring table holds infra metrics' note (team-internal).
- Drop the verified Grafana query patterns section (team-internal sensor
  lists and table names).
- Update env-var references: TIMESCALE_TABLE (with TIMESCALE_SEASON
  fallback) replaces POSTGRES_TABLE; remove INFLUX_* references.
- Update slicks examples to match the merged main API:
    * discover_sensors() now requires start_time and end_time
    * connect_timescaledb() now has (dsn, schema, table) signature
    * env-var auto-connect note uses POSTGRES_DSN + TIMESCALE_TABLE
- Same overall structure as the active (gitignored) prompt-guide.txt,
  but generic.

* Wire slicks into sandbox via uv and bump default model to MiniMax-M3

Sandbox slicks wiring (matches the slicks v0.3.0 TimescaleDB backend on
the timescaledb-migration branch that was merged to main):

- server/installer/sandbox/Dockerfile.sandbox: install uv, then use it
  for both requirements-docker.txt and the editable slicks install from
  /slicks_src (a docker compose additional_context pointing at the
  sibling slicks checkout). uv keeps the image layer smaller than pip
  and the install deterministic. Added a build-time smoke test that
  imports slicks to catch future regressions early.

- server/installer/docker-compose.yml:
  * sandbox service: pass slicks source as a build-time additional_context
    and bind-mount the same path at runtime so live source edits are
    picked up on the next container recreate without an image rebuild
    (controlled by SLICKS_HOST_PATH env var, default points at
    /home/ubuntu/projects/slicks).
  * sandbox service: export POSTGRES_DSN, TIMESCALE_TABLE,
    TIMESCALE_SEASON, and POSTGRES_TABLE so slicks auto-connects on
    import. The published PyPI slicks (0.2.3) is the InfluxDB backend
    and would not be importable as TimescaleDB-aware code; install from
    local source instead.

- server/installer/sandbox/requirements-docker.txt: drop the
  'slicks>=0.2.0' PyPI pin. Slicks is now installed editable from the
  local source above; the pin was pulling the InfluxDB-only release.

- server/installer/sandbox/requirements.txt: add psycopg2-binary
  (slicks depends on it; listing explicitly so the layer order is
  stable across slicks extras changes).

Code-generator Dockerfile fix:

- server/installer/sandbox/Dockerfile: the previous version only COPY'd
  prompt-guide.txt.example, so the active (gitignored) prompt-guide.txt
  was silently being ignored at every build. Switched to COPY
  prompt-guide.txt* ./ (with the existing fallback to copy the example
  when the active file is missing). This restores the team's
  customization workflow where the active prompt-guide lives in the
  build context and gets shipped to the container.

Model bump:

- server/installer/.env.example: ANTHROPIC_MODEL default MiniMax-M2.7
  -> MiniMax-M3.
- server/installer/sandbox/code_generator.py: same bump in the
  os.getenv default so a missing env var falls back to M3.

Tested end-to-end: code-generator /api/generate-code returns
result.status=success with a PNG, slicks.fetch_telemetry() returns
407k rows from a real TimescaleDB window, and the slackbot -> code-gen
-> sandbox chain still works.

* root: pin npm overrides to address quick-win Dependabot alerts

Adds an npm 'overrides' block to the root package.json to force
patched versions of five transitive dependencies that had
Dependabot alerts:

  lodash               -> ^4.18.0   (CVE-2026-4800, CVE-2026-2950)
  qs                   -> ^6.15.2   (CVE-2026-8723)
  picomatch            -> ^2.3.2    (CVE-2026-32233)
  fast-uri             -> ^3.1.2    (CVE-2026-6321, CVE-2026-6322)
  serialize-javascript -> ^7.0.5    (CVE-2026-34043)

Regenerated package-lock.json with 'npm install --package-lock-only'
(no node_modules write, no network risk beyond what 'install' needs
to resolve). Verified the lock now pins:
  picomatch 2.3.2
(remaining 4 packages aren't pulled into this lock).

* pecan: bump react-router to ^7.15.0 and add npm overrides for quick-win CVEs

react-router was a direct dep pinned at ^7.9.3; multiple CVEs (DoS via
path expansion, RCE via turbo-stream, XSS via redirects) require
>= 7.15.0. Bumping the direct dep range is cleaner than an override
(npm errors out on an override that conflicts with a direct dep).

Also adds an npm 'overrides' block to force patched transitive deps:

  lodash               -> ^4.18.0
  qs                   -> ^6.15.2
  picomatch            -> ^2.3.2
  fast-uri             -> ^3.1.2
  serialize-javascript -> ^7.0.5

Regenerated package-lock.json. Verified the lock now pins:

  lodash                4.18.1
  qs                    6.15.2
  picomatch             2.3.2
  fast-uri              3.1.2
  serialize-javascript  7.0.5
  react-router          7.16.0

* flight-recorder: bump react-router-dom to ^7.15.0 and add npm overrides

react-router-dom was a direct dep pinned at ^7.9.3; same CVE class as
pecan's react-router (DoS, RCE, XSS via redirects) requires >= 7.15.0.
Bumped the direct dep range.

Also adds an npm 'overrides' block to force patched transitive deps:

  lodash               -> ^4.18.0
  qs                   -> ^6.15.2
  picomatch            -> ^2.3.2
  fast-uri             -> ^3.1.2
  serialize-javascript -> ^7.0.5

Regenerated package-lock.json. Verified the lock now pins:

  lodash                4.18.1
  picomatch             2.3.2
  fast-uri              3.1.2
  serialize-javascript  7.0.5
  react-router-dom      7.16.0

* grafana-bridge: add npm overrides for quick-win Dependabot alerts

grafana-bridge is a minimal Express app (single dep) but still pulls
in the same vulnerable transitive packages. Adds an 'overrides'
block to force patched versions:

  lodash               -> ^4.18.0
  qs                   -> ^6.15.2
  picomatch            -> ^2.3.2
  fast-uri             -> ^3.1.2
  serialize-javascript -> ^7.0.5

Regenerated package-lock.json. Verified qs pinned at 6.15.2.

* charging-dashboard: self-updating Slack TUI from Pecan accumulator

Adds a charging dashboard that posts one self-updating Slack message
(post once, chat.update every 5s, like the file-uploader) while charging
the accumulator via the Kvaser bridge on the internal (pecan-dev) build.

Bot (server/installer/slackbot):
- charge_dashboard.py: TUI renderer (SoC bar, pack/cell/temp/alert lines,
  per-module sparkline + stats), per-session manager, stdlib HTTP receiver
  (POST /charging/state behind CF Zero Trust + shared-token auth), heartbeat
  staleness.
- soc_model.py: data-derived SOC/ETA from the wfr26 audit. PackCurrent/BMS
  SOC are dead sentinels and PackVoltage is absent, so SOC is an OCV->SOC
  lookup on the limiting cell and ETA integrates a CV taper; phase is from
  the voltage trend.
- Wired into slack_bot.py __main__; compose port/env; Dockerfile.
- unittests: 17 (renderer, formatters, session updates, HTTP auth, model).

Pecan:
- chargingSnapshot.ts: pure, mockable snapshot builder mirroring the
  accumulator page; sentinel-guards the dead signals; derives pack voltage
  from the series-cell sum.
- useChargingDashboard.ts: 5s poster, gated on VITE_INTERNAL + live Kvaser
  source (:9081). Never broadcasts the demo relay or replay data.
- vitest: 15 (aggregation, alert chips, dead-signal guards, broadcast gate).

Co-authored-by: Haorui Zhou <haorui2002@gmail.com>

* Address PR review: fix slackbot CI smoke-test + harden charge receiver

- slackbot-ci: stub SocketModeRequest/Response on the slack_sdk stub modules
  (import was failing with 'cannot import name SocketModeRequest') and
  provision /app/logs so the module-level mkdir doesn't fail on the runner.
- slack_bot: require CHARGE_RELAY_TOKEN before starting the charge-dashboard
  receiver; the port is host-published and must not accept unauthenticated POSTs.
- docker-compose.lan-sender-test: set POSTGRES_DSN to the timescaledb service
  (default DSN points at localhost, which is the base container here).
- websocket_bridge: aclose the Redis client via try/finally so a failing set
  doesn't leak the connection.
- redis_utils: correct get_sync_client docstring (default retries=5, ~8s).

---------

Co-authored-by: Haorui Zhou <haorui2002@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants