
[upstream-take #7219] fix(shutdown): kill detached agent processes on server shutdown (#7218) #176

Merged
AnilChinchawaleXDC merged 27 commits into master from chore/take-upstream-pr-7219-kill-detached-agents-on-shutdown on Jun 1, 2026

Conversation

@AnilChinchawaleXDC

Collaborator

No description provided.

mrveiss and others added 27 commits May 28, 2026 12:58
Co-Authored-By: Paperclip <noreply@paperclip.ing>
When running in Paperclip, interactive permission prompts cannot be answered,
making it critical for all executions (including subagents) to have permissions
pre-approved via --dangerously-skip-permissions.

This fix forces dangerouslySkipPermissions=true when executing within a
Paperclip context (detected via context.paperclipWorkspace), ensuring that:

1. Subagents spawned by the Agent tool inherit Bash and other permissions
2. Main session permission settings propagate to all child executions
3. Non-interactive runs never get blocked on permission approval prompts
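
A minimal sketch of the override described above, not the actual patch. Only `dangerouslySkipPermissions`, `context.paperclipWorkspace`, and the `--dangerously-skip-permissions` flag come from the commit message; the interfaces and function names are hypothetical.

```typescript
// Hypothetical shapes for illustration; the real types are not shown in this PR.
interface ExecutionContext {
  paperclipWorkspace?: unknown;
}

interface ExecutionConfig {
  dangerouslySkipPermissions?: boolean;
}

// Inside a Paperclip context nobody can answer a prompt, so the flag is
// forced on regardless of what the caller configured.
function resolveSkipPermissions(
  config: ExecutionConfig,
  context: ExecutionContext,
): boolean {
  const inPaperclip =
    context.paperclipWorkspace !== undefined &&
    context.paperclipWorkspace !== null;
  return inPaperclip ? true : config.dangerouslySkipPermissions === true;
}

// Translate the resolved setting into CLI arguments for the child execution.
function buildPermissionArgs(
  config: ExecutionConfig,
  context: ExecutionContext,
): string[] {
  return resolveSkipPermissions(config, context)
    ? ["--dangerously-skip-permissions"]
    : [];
}
```

Outside a Paperclip context the caller's own setting is left untouched, so interactive sessions keep their normal prompts.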

Fixes: MVA-392 (subagent Bash permission denied in /team-implement)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…exhausted

Add a pre-flight quota check to `tickTimers` that calls `fetchAllQuotaWindows()`
before dispatching any timer-based agent wakeups. If any window reports ≥95% usage,
all timer wakes in that tick are skipped and a structured warn is logged with the
reset timestamp.

Fails open: a quota-check error is logged but does not block wakeups.
The `tickTimers` return value now includes `quotaBlocked` and `quotaResetAt` fields,
and `index.ts` logs a distinct `quota_blocked` warn when they are set.
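
The fail-open gate described above could look roughly like the following sketch. The names `fetchAllQuotaWindows`, `quotaBlocked`, `quotaResetAt`, and the `quota_blocked` event come from the commit message; the `QuotaWindow` shape, the helper names, the `quota_check_failed` event, and the choice to report the latest reset time among exhausted windows are assumptions.

```typescript
// Hypothetical window shape; the real return type of fetchAllQuotaWindows()
// is not shown in this PR.
interface QuotaWindow {
  name: string;
  usedPercent: number;
  resetsAt: Date | null;
}

interface QuotaGate {
  quotaBlocked: boolean;
  quotaResetAt: Date | null;
}

const QUOTA_BLOCK_THRESHOLD_PERCENT = 95;

// Pure decision: block when any window is at or above the threshold.
// Assumption: report the latest reset among the exhausted windows.
function evaluateQuotaWindows(windows: QuotaWindow[]): QuotaGate {
  const exhausted = windows.filter(
    (w) => w.usedPercent >= QUOTA_BLOCK_THRESHOLD_PERCENT,
  );
  if (exhausted.length === 0) {
    return { quotaBlocked: false, quotaResetAt: null };
  }
  let latest: Date | null = null;
  for (const w of exhausted) {
    if (
      w.resetsAt !== null &&
      (latest === null || w.resetsAt.getTime() > latest.getTime())
    ) {
      latest = w.resetsAt;
    }
  }
  return { quotaBlocked: true, quotaResetAt: latest };
}

// Fail-open wrapper: an error from the quota check is logged and wakeups proceed.
async function preflightQuotaCheck(
  fetchAllQuotaWindows: () => Promise<QuotaWindow[]>,
  logWarn: (event: string, fields: Record<string, unknown>) => void,
): Promise<QuotaGate> {
  try {
    const gate = evaluateQuotaWindows(await fetchAllQuotaWindows());
    if (gate.quotaBlocked) {
      logWarn("quota_blocked", {
        quotaResetAt: gate.quotaResetAt ? gate.quotaResetAt.toISOString() : null,
      });
    }
    return gate;
  } catch (err) {
    logWarn("quota_check_failed", { error: String(err) });
    return { quotaBlocked: false, quotaResetAt: null };
  }
}
```

Keeping the threshold decision in a pure function makes the 95% boundary easy to unit-test without mocking the quota fetch.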

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Extend the final-disposition checklist in the Paperclip skill to explicitly
require removal of any git worktrees and temp clones created during the work
before the issue can be set to `done`. Adds concrete commands for both the
AutoBot worktree path and the Paperclip dev-clone path.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
…pt logging

Agents must now:
- Post a structured "Work artifact" comment whenever they open a PR,
  push commits, or create a branch
- Log every fix attempt outcome (success or failure) before exiting
  the heartbeat, including approach taken, error output, and next step

Also documents the periodic orphaned-work sweep the CEO should run:
- Unlinked open PRs (branch name matches issue-NNNN but no active Paperclip issue)
- Stale worktrees whose issue is done/cancelled
- Dead in_review issues whose linked PR was already merged or closed

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… meta-project

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Onboarding now reads README, docs, dependency manifests, env templates,
and CI config to build a structured inventory of what the project needs
to run, then creates setup tasks from any gaps found.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Onboarding now sweeps open issues (classifying critical vs backlog) and
open PRs (failing CI, stale approved, needs review) and creates Paperclip
tasks for each actionable item found.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kickoff

Every discovery item (setup gaps, docs gaps, critical GH issues, PR
actions) becomes a child issue of the kickoff issue, grouped under
category parent issues. Kickoff auto-closes when all children are done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Every created issue must have a label and correct projectId. A sweep
check runs before moving past Step 3. Labels also applied back to GH
where missing (e.g. unlabelled bugs get the bug label).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Every project gets a permanent [Docs] hub issue with 5 required documents:
prd, tech-stack, access-guide, architecture, runbooks. Populated from
repo discovery; unknown sections stubbed as _Not yet documented_.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
User creates empty project with repo URL → PM detects no docs hub →
runs full onboarding: reads repo, extracts content into 5 hub documents,
structures all findings as sub-issue tree, labels everything.
Content is extracted from existing repo docs, not just stubbed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6-task plan: GH labels, dispatch AGENTS.md section, 30-min routine,
onboarding AGENTS.md section, dispatch smoke test, onboarding smoke test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
…rclipai#7218)

Export signalRunningProcess from adapter-utils and call it for all
runningProcesses entries in shutdownAppServices(). Previously, agent
subprocesses spawned with detached:true escaped the systemd cgroup and
survived server restarts, accumulating as orphans that consumed memory
and left routine execution issues stuck in_progress.
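
As an illustration of the shutdown step, the sketch below signals every tracked process and targets the whole process group for detached children. It does not reproduce `signalRunningProcess`, whose signature is not shown here; `TrackedProcess`, `signalAllRunning`, and the injected `kill` function are hypothetical.

```typescript
// Injected so the sketch stays testable; see the usage note below.
type KillFn = (pid: number, signal: string) => void;

// Hypothetical entry shape for the runningProcesses map.
interface TrackedProcess {
  pid: number;
  detached: boolean;
}

// Signal every tracked process and return the run IDs that were signaled.
function signalAllRunning(
  running: Map<string, TrackedProcess>,
  signal: "SIGTERM" | "SIGKILL",
  kill: KillFn,
): string[] {
  const signaled: string[] = [];
  for (const [runId, proc] of running) {
    try {
      // A detached child leads its own process group on POSIX, so a negative
      // PID addresses the group and reaches its descendants as well.
      kill(proc.detached ? -proc.pid : proc.pid, signal);
      signaled.push(runId);
    } catch {
      // The process has most likely exited already; nothing left to signal.
    }
  }
  return signaled;
}
```

In a Node server the `kill` argument would typically be `(pid, signal) => process.kill(pid, signal)`. Negative PIDs are a POSIX convention and do not apply on Windows.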
Write oom_score_adj=500 to /proc/<pid>/oom_score_adj immediately after
spawning each detached agent subprocess. This makes the kernel prefer
killing agent processes over the Paperclip server when memory is tight,
preventing OOM-kill of the server and embedded postgres mid-query.

Companion to systemd OOMScoreAdjust=-500 on the service side.
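
A best-effort version of the `oom_score_adj` write might be sketched as follows. The path and the value 500 come from the commit message; the helper names and the injected `write` function are hypothetical, and the range check reflects the kernel's documented limits of -1000 to 1000.

```typescript
// Injected so the sketch runs without touching /proc; see the usage note below.
type WriteFn = (path: string, data: string) => void;

function oomScoreAdjPath(pid: number): string {
  return `/proc/${pid}/oom_score_adj`;
}

// Best-effort: returns true if the write succeeded, false if it was skipped
// or failed (for example on a non-Linux host, or if the child already exited).
function raiseOomScore(
  pid: number | undefined,
  write: WriteFn,
  score: number = 500,
): boolean {
  if (pid === undefined) {
    return false; // spawn did not yield a PID
  }
  if (!Number.isInteger(score) || score < -1000 || score > 1000) {
    throw new RangeError("oom_score_adj must be an integer in [-1000, 1000]");
  }
  try {
    write(oomScoreAdjPath(pid), String(score));
    return true;
  } catch {
    return false;
  }
}
```

In Node the `write` argument could be `writeFileSync` from `node:fs`, called right after `spawn` returns. Swallowing write errors is a design assumption here: a failed adjustment should probably not abort the agent run.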
Adds the same Claude Code discipline used in AutoBot:
- .claude/hooks/: block-dangerous-commands.sh (28 tests pass), protect-files.sh,
  scan-secrets.sh — all adapted for paperclip's master branch + pnpm stack
- .claude/settings.json: wires hooks to PreToolUse/PostToolUse/Notification events;
  PostToolUse runs pnpm -r typecheck on .ts/.tsx edits
- .claude/skills/: commit, bugfix, implement, research, parallel — all rewritten
  for paperclip's branch strategy, AGENTS.md contracts, and pnpm commands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… PID

When the server crashes/restarts before persistRunProcessMetadata writes
the child PID to the database, reapOrphanedRuns() cannot check whether
the process is still alive. Previously it immediately treated the run as
dead, called releaseIssueExecutionAndPromote + startNextQueuedRunForAgent,
and dispatched a replacement — while the original process kept running as
an untracked orphan. On each subsequent restart this repeated, leading to
exponential orphan accumulation that exhausted the WSL memory cap and froze
the host.

Fix: when tracksLocalChild=true but both processPid and processGroupId are
null, skip the run if it was updated within the last 10 minutes. The slot
stays marked "running" and countRunningRunsForAgent() counts it, so the
concurrency cap remains enforced. After the grace window the slot is
reclaimed as before. The startup reaper (staleThresholdMs=0) and the
periodic reaper (5-min threshold) both benefit.

Two new tests cover the grace-period boundary:
  - recent null-PID run: reaped=0, status=running (grace period active)
  - old null-PID run:    reaped=1, status=failed  (grace period expired)

All 46 process-recovery tests pass.
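
The grace-period condition can be sketched as a small predicate. The field names `tracksLocalChild`, `processPid`, and `processGroupId` and the 10-minute window come from the commit message; the `RunRecord` shape, the `updatedAt` field, the function name, and the strict `<` comparison at the boundary are assumptions.

```typescript
const NULL_PID_GRACE_MS = 10 * 60 * 1000; // 10 minutes

// Hypothetical subset of a run row; only the fields the check needs.
interface RunRecord {
  tracksLocalChild: boolean;
  processPid: number | null;
  processGroupId: number | null;
  updatedAt: Date;
}

// True when the reaper should skip the run: the child PID was never persisted
// and the run was touched recently, so the process may still be alive.
function isWithinNullPidGrace(run: RunRecord, now: Date): boolean {
  if (!run.tracksLocalChild) {
    return false;
  }
  if (run.processPid !== null || run.processGroupId !== null) {
    return false; // a PID is known, so the normal liveness check applies
  }
  return now.getTime() - run.updatedAt.getTime() < NULL_PID_GRACE_MS;
}
```

A reaper loop would call this before marking a run as failed, leaving the slot counted as running until the window expires.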

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AnilChinchawaleXDC AnilChinchawaleXDC merged commit 6c39725 into master Jun 1, 2026
1 of 13 checks passed


3 participants