fix(logging): never delete .log files held by live processes#145

Merged
yishuiliunian merged 1 commit into
mainfrom
worktree-shiny-gathering-scone
May 12, 2026
Conversation

@yishuiliunian
Contributor

Summary

  • cleanup_old_logs was unlinking the still-open log files of long-lived hub / agent-server processes, stranding kilobytes of logs behind unreadable write-only fds and making classifier failures undebuggable.
  • Two-part fix: open new log fds as O_RDWR (so an unlinked inode can still be re-opened via /dev/fd/N), and teach cleanup to skip any .log whose filename PID is still alive.
  • Investigation context: hub-only sessions ran --decision=classifier; users reported repeated classifier failures with no extractable log evidence. Root cause was that each ephemeral sub-agent's startup re-ran cleanup and deleted the hub/server's older-mtime log.
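
A minimal sketch of the first half of the fix, assuming a Unix-like system: a log handle opened with `.read(true)` in addition to `.append(true)` stays readable after its path is unlinked. The helper names `open_log` and `write_unlink_read` are hypothetical and are not taken from `src/log_writer.rs`. Re-opening the inode from another process via /dev/fd/N is not shown here.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Open a log file for appending while keeping the descriptor readable
/// (O_RDWR | O_APPEND instead of O_WRONLY | O_APPEND).
fn open_log(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .create(true)
        .append(true)
        .read(true)
        .open(path)
}

/// Write one line, unlink the path, then read the data back through the
/// handle that is still open. Hypothetical helper for illustration only.
fn write_unlink_read(path: &Path, line: &str) -> io::Result<String> {
    let mut file = open_log(path)?;
    writeln!(file, "{line}")?;

    // The directory entry goes away, but the inode lives on while the
    // descriptor is open.
    fs::remove_file(path)?;

    // Appends always land at the end of the file; reads follow the file
    // offset, so rewind before reading.
    file.seek(SeekFrom::Start(0))?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}
```

With a write-only handle the `read_to_string` call would fail, which is the situation the old O_WRONLY|O_APPEND flags left the hub and agent-server logs in.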

Changes

  • src/log_writer.rs
    • RotatingFileWriter::new and rotation path: OpenOptions adds .read(true).
    • Refactor cleanup_old_logs to delegate to cleanup_with_alive_filter(impl Fn(u32) -> bool); production wiring passes bootstrap::is_alive.
    • New helper pid_from_log_filename parses both loopal-{ts}-{pid}.log and rotated *.N.log shapes.
    • 3 inline tests covering filename parsing, alive-PID protection under count overflow, and the new read-capable fd.
  • src/bootstrap/mod.rs
    • pub(crate) use discovery::is_alive so log_writer can reach it.
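
The filename parsing and the alive-PID filter described above could look roughly like the sketch below. It is not the merged code. It assumes the PID is the last `-`-separated field before `.log`, that a rotated file only adds a numeric `.N` before the extension, and that cleanup trims the oldest entries beyond a keep count. `select_for_deletion` is a hypothetical stand-in for the decision made inside cleanup_with_alive_filter, and the timestamps in any example filenames are placeholders.

```rust
/// Extract the writer PID from `loopal-{ts}-{pid}.log` or a rotated
/// `loopal-{ts}-{pid}.N.log` name. Returns None for anything else.
fn pid_from_log_filename(name: &str) -> Option<u32> {
    let stem = name.strip_suffix(".log")?;
    // Drop an optional rotation index: "loopal-ts-pid.3" -> "loopal-ts-pid".
    let stem = match stem.rsplit_once('.') {
        Some((head, idx)) if !idx.is_empty() && idx.bytes().all(|b| b.is_ascii_digit()) => head,
        _ => stem,
    };
    let rest = stem.strip_prefix("loopal-")?;
    let (_ts, pid) = rest.rsplit_once('-')?;
    pid.parse().ok()
}

/// Given (filename, mtime) pairs, pick the files to delete: the oldest
/// entries beyond `keep`, minus any whose writer PID is still alive.
/// Assumption: protected files are simply skipped, so the directory may
/// temporarily hold more than `keep` logs.
fn select_for_deletion(
    mut entries: Vec<(String, u64)>,
    keep: usize,
    is_alive: impl Fn(u32) -> bool,
) -> Vec<String> {
    entries.sort_by_key(|(_, mtime)| *mtime); // oldest first
    let excess = entries.len().saturating_sub(keep);
    entries
        .into_iter()
        .take(excess)
        .filter(|(name, _)| !pid_from_log_filename(name).map_or(false, |pid| is_alive(pid)))
        .map(|(name, _)| name)
        .collect()
}
```

Passing the liveness check as a closure is what makes the count-overflow case testable without spawning real processes: a test can pass `|pid| pid == 100` instead of the production check.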

Test plan

  • bazel build //:loopal passes
  • bazel build //:loopal --config=clippy zero warnings
  • bazel build //:loopal --config=rustfmt clean
  • bazel test //:loopal-unit-test — 3 new tests pass alongside 40 existing
  • CI green

cleanup_old_logs sorted by mtime and trimmed oldest unconditionally.
In hub-only mode the long-lived hub and its root agent-server fall
idle between writes, so their .log mtime stays older than short-lived
sub-agents'. Each sub-agent startup re-runs cleanup and unlinks the
hub/server's still-open log, stranding kilobytes of classifier failure
logs in unreadable write-only fds.

- OpenOptions opens with .read(true) so the inode can be re-opened
  via /dev/fd/N even after unlink (was O_WRONLY|O_APPEND, now O_RDWR).
- cleanup_with_alive_filter parses each filename's PID and skips
  deletion whenever kill(pid, 0) reports the writer is still alive.
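
For reference, a kill(pid, 0) liveness probe can be sketched as follows on a Unix-like system. This is an illustration, not the body of bootstrap::is_alive. It declares `kill` by hand to avoid an external crate (the `unsafe extern` form needs a recent Rust toolchain), hard-codes EPERM as 1 (its value on Linux and macOS), and assumes that a permission error should count as "alive".

```rust
use std::io;

unsafe extern "C" {
    // POSIX kill(2), declared manually so the sketch has no dependencies.
    fn kill(pid: i32, sig: i32) -> i32;
}

// Assumption: EPERM is 1, which holds on Linux and macOS.
const EPERM: i32 = 1;

/// Report whether a process with this PID currently exists.
fn is_alive(pid: u32) -> bool {
    let Ok(pid) = i32::try_from(pid) else {
        return false;
    };
    if pid <= 0 {
        // 0 and negative values address process groups, not one process.
        return false;
    }
    // SAFETY: signal 0 delivers nothing; the kernel only performs the
    // existence and permission checks.
    let rc = unsafe { kill(pid, 0) };
    // EPERM means the process exists but belongs to another user.
    rc == 0 || io::Error::last_os_error().raw_os_error() == Some(EPERM)
}
```

PID reuse is a known limit of this kind of check: a log whose original writer has exited can stay protected if an unrelated process has since been assigned the same PID.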
@yishuiliunian yishuiliunian merged commit cbd55d3 into main May 12, 2026
4 checks passed
@yishuiliunian yishuiliunian deleted the worktree-shiny-gathering-scone branch May 12, 2026 05:47