uts: stop the base-station feed from silently dying #73

Closed
haoruizhou wants to merge 1 commit into main from claude/uts-telemetry-audit-GxIZZ

haoruizhou commented Jun 6, 2026

Two fixes for the "telemetry stops after a few random minutes" failure on
the MacBook base station.

`websocket_bridge.redis_listener`: wrap the pub/sub loop in a reconnect loop
with `health_check_interval`. Previously a single Redis connection blip (idle
timeout, transient Docker-bridge hiccup, Redis restart) made the listener
coroutine return for good while the WebSocket server kept running: PECAN
stayed connected but never received another frame, with no error surfaced.
`ws_relay` already reconnects this way; `redis_listener` now matches.
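
The reconnect pattern described above can be sketched roughly as follows. This is not the code from the PR: `listen_with_reconnect`, `redis_messages`, the `stop` event, and the `retry_on` parameter are hypothetical names chosen for illustration, and the Redis helper assumes the redis-py asyncio client, where `health_check_interval` is a connection option.

```python
import asyncio
import logging

log = logging.getLogger("redis_listener_sketch")


async def listen_with_reconnect(subscribe, on_message, stop, *,
                                retry_delay=1.0,
                                retry_on=(ConnectionError, TimeoutError, OSError)):
    """Run a subscription until `stop` is set, resubscribing after failures.

    `subscribe` is a zero-argument callable that returns a fresh async
    iterator of messages on every call (hypothetical interface).
    """
    while not stop.is_set():
        try:
            async for message in subscribe():
                await on_message(message)
            # The iterator finished without raising: treat that as a lost
            # subscription too, rather than letting the coroutine return.
            log.warning("subscription ended without error; resubscribing")
        except retry_on as exc:
            # Surface the failure in the logs, then fall through to retry.
            log.warning("subscription lost (%r); retrying in %.1fs",
                        exc, retry_delay)
        if not stop.is_set():
            await asyncio.sleep(retry_delay)


async def redis_messages(url, channel, health_check_interval=30):
    """Yield payloads from one Redis pub/sub connection (needs redis-py).

    The URL, channel, and 30-second interval are assumptions for this sketch.
    """
    import redis.asyncio as aioredis  # lazy import: the loop above needs no Redis

    async with aioredis.from_url(
        url, health_check_interval=health_check_interval
    ) as client:
        async with client.pubsub() as pubsub:
            await pubsub.subscribe(channel)
            async for message in pubsub.listen():
                if message["type"] == "message":
                    yield message["data"]
```

With redis-py, `subscribe` could be `functools.partial(redis_messages, url, channel)`, and `retry_on` would likely need `redis.exceptions.ConnectionError` and `redis.exceptions.TimeoutError` added, since those do not derive from the built-in exception classes.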

`main.py`: the child-process monitor only logged "Process X died!" once per
second forever and never recovered. Because the parent stayed alive, neither
Docker's `restart: unless-stopped` nor systemd's `Restart=always` ever saw the
failure. Now a dead child tears down the surviving children and exits non-zero
so the supervisor restarts the whole stack cleanly.
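
A minimal sketch of the fail-fast monitor described above, assuming the children are `subprocess.Popen` objects (the PR text does not say how `main.py` starts them). `supervise`, `poll_interval`, and `grace` are hypothetical names for illustration.

```python
import subprocess
import sys
import time


def supervise(children, poll_interval=1.0, grace=5.0):
    """Block until any child exits, stop the others, and return exit code 1.

    `children` maps a label to an already-started subprocess.Popen.
    Every child is expected to run forever, so any exit counts as a failure.
    """
    while True:
        dead = [name for name, proc in children.items()
                if proc.poll() is not None]
        if dead:
            break
        time.sleep(poll_interval)

    print(f"child {dead[0]!r} exited; stopping remaining children",
          file=sys.stderr)

    # Ask the survivors to exit (SIGTERM on POSIX).
    for proc in children.values():
        if proc.poll() is None:
            proc.terminate()

    # Give each one `grace` seconds, then force-kill anything still running.
    for proc in children.values():
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()

    return 1
```

The parent would then end with something like `sys.exit(supervise(children))`, so that the restart policy in Docker or systemd sees a failed process and brings the stack back up.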

haoruizhou force-pushed the claude/uts-telemetry-audit-GxIZZ branch from 07176ad to 60fd029 on June 6, 2026 17:06
haoruizhou requested a review from Copilot on June 6, 2026 17:07

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

haoruizhou closed this on Jun 7, 2026
haoruizhou deleted the claude/uts-telemetry-audit-GxIZZ branch on June 7, 2026 17:37