uts: stop the base-station feed from silently dying #73
Closed
haoruizhou wants to merge 1 commit into
07176ad to 60fd029
Two fixes for the "telemetry stops after a few random minutes" failure on
the MacBook base station.
websocket_bridge.redis_listener: wrap the pub/sub loop in a reconnect loop
with health_check_interval. Previously a single Redis connection blip (idle
timeout, transient Docker-bridge hiccup, Redis restart) made the listener
coroutine return for good while the WebSocket server kept running — PECAN
stayed connected but never received another frame, with no error surfaced.
ws_relay already reconnects this way; redis_listener now matches.
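A minimal sketch of the reconnect pattern described above. It assumes the listener receives a hypothetical `subscribe` factory that opens a fresh subscription and yields raw payloads, plus a hypothetical `forward` coroutine that pushes each payload to the connected WebSocket clients. With redis-py's asyncio client, such a factory would typically build `redis.asyncio.Redis(..., health_check_interval=30)`, call `await pubsub.subscribe(...)`, and yield the `data` of `message`-type entries from `pubsub.listen()`. The interval, the backoff values, and all names here are illustrative assumptions, not the actual code in `websocket_bridge`.

```python
import asyncio
import logging
from typing import AsyncIterator, Awaitable, Callable

log = logging.getLogger("websocket_bridge")

# Hypothetical hooks (not the PR's real API):
#   subscribe() opens a *fresh* Redis subscription and yields message payloads;
#   forward(payload) pushes one payload to the connected WebSocket clients.
Subscribe = Callable[[], AsyncIterator[bytes]]
Forward = Callable[[bytes], Awaitable[None]]


async def redis_listener(subscribe: Subscribe, forward: Forward,
                         retry_delay: float = 1.0, max_delay: float = 30.0) -> None:
    """Relay Redis pub/sub messages forever, reconnecting after any failure."""
    delay = retry_delay
    while True:
        try:
            async for payload in subscribe():
                delay = retry_delay  # traffic is flowing again: reset the backoff
                await forward(payload)
            # A subscription that ends cleanly is still a lost feed.
            log.warning("Redis subscription ended; reconnecting in %.1fs", delay)
        except Exception as exc:
            # Catch broadly: redis-py's ConnectionError/TimeoutError derive from
            # RedisError, not from the built-in exceptions of the same name.
            # asyncio.CancelledError is a BaseException (Python 3.8+), so a
            # shutdown still cancels this task instead of being swallowed here.
            log.warning("Redis listener failed (%r); reconnecting in %.1fs", exc, delay)
        await asyncio.sleep(delay)
        delay = min(delay * 2, max_delay)
```

Resetting the delay on every delivered message keeps recovery fast after a brief blip while still backing off if Redis stays down. In this sketch an exception raised by `forward` also triggers a resubscribe; a real bridge may prefer to isolate per-client send errors.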
main.py: the child-process monitor only logged "Process X died!" once per
second forever and never recovered. Because the parent stayed alive, neither
Docker's `restart: unless-stopped` nor systemd's `Restart=always` ever saw
the failure. Now a dead child tears down the surviving children and exits non-zero
so the supervisor restarts the whole stack cleanly.
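The fail-fast monitor can be sketched as follows, assuming the children are `multiprocessing.Process` objects (any object with `name`, `exitcode`, `is_alive()`, `terminate()`, `kill()` and `join()` works). The `monitor` function, the one-second poll, and the five-second grace period are illustrative assumptions; the PR's actual `main.py` may be structured differently.

```python
import logging
import time
from typing import Optional, Protocol, Sequence

log = logging.getLogger("main")


class Child(Protocol):
    """The subset of multiprocessing.Process that the monitor relies on."""

    name: str
    exitcode: Optional[int]

    def is_alive(self) -> bool: ...
    def terminate(self) -> None: ...
    def kill(self) -> None: ...
    def join(self, timeout: Optional[float] = None) -> None: ...


def monitor(children: Sequence[Child], poll_interval: float = 1.0,
            grace: float = 5.0) -> int:
    """Block until any child exits, stop the survivors, return an exit code."""
    # Poll until at least one child has exited.
    while True:
        dead = [c for c in children if not c.is_alive()]
        if dead:
            break
        time.sleep(poll_interval)

    for c in dead:
        log.error("Process %s died (exitcode=%s); shutting down stack", c.name, c.exitcode)

    # Ask the surviving children to stop (SIGTERM on POSIX)...
    for c in children:
        if c.is_alive():
            c.terminate()
    # ...give each one time to exit, then force-kill any straggler.
    for c in children:
        c.join(timeout=grace)
        if c.is_alive():
            c.kill()
            c.join()

    # Always non-zero: a partial stack is a failure even if the child exited 0.
    return 1
```

In `main.py` this would end with `sys.exit(monitor(procs))`, so the parent process actually exits and the container or unit supervisor gets a chance to restart it.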