Skip to content

fix(evaluation): gc.collect() after each instance to reclaim cyclic RemoteConversation objects#702

Closed
simonrosenberg wants to merge 2 commits intomainfrom
fix/gc-collect-after-instance
Closed

fix(evaluation): gc.collect() after each instance to reclaim cyclic RemoteConversation objects#702
simonrosenberg wants to merge 2 commits intomainfrom
fix/gc-collect-after-instance

Conversation

@simonrosenberg
Copy link
Copy Markdown
Collaborator

@simonrosenberg simonrosenberg commented May 1, 2026

Summary

Add gc.collect() after each instance result is persisted, so the cyclic GC runs once per instance rather than on an unpredictable threshold.

Why

RemoteConversation has a circular reference via its WebSocket callback closure that prevents CPython's reference counter from freeing it immediately. Objects with large _cached_events lists accumulate across instances and can OOM the eval-container. Fixes OpenHands/evaluation#540.

Companion PR

OpenHands/software-agent-sdk#3031 eliminates the cycle at the source with a weakref. This gc.collect() is a belt-and-suspenders safety net.

…emoteConversation objects

RemoteConversation has a circular reference that prevents CPython's
reference counter from freeing it when evaluate_instance() returns:

  conversation → _ws_client (WebSocketCallbackClient)
               → callback (composed_callback)
               → run_complete_callback closure → self._terminal_status_queue

CPython's cyclic GC only runs on a threshold, not on demand. With many
workers completing ACP instances rapidly, circular-ref objects accumulate
before the threshold fires. ACP agents make this severe because they have
no condenser, so each conversation's _cached_events list grows large
(100+ tool calls × MB-sized raw_output on complex JS projects).

Adding gc.collect() in the per-instance callback ensures the cyclic GC
runs immediately after each result is persisted, keeping memory flat
regardless of how many instances have completed.

This is a belt-and-suspenders fix alongside the SDK-side weakref change
(OpenHands/software-agent-sdk), which eliminates the cycle at the source.

Fixes OpenHands/evaluation#540

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Simple, pragmatic fix for a real OOM issue

[TESTING GAPS]

  • [PR description] No Evidence: The test plan items are unchecked. Add an Evidence section showing that you ran swebenchmultimodal with acp-codex and confirmed it completes without OOMKill at the previously failing instance count (~40/68). Include the command used and confirmation that instances completed successfully.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW

This is a defensive memory management fix with minimal risk. It adds a single gc.collect() call to force cyclic garbage collection after each instance completes, preventing memory accumulation from circular references in RemoteConversation objects. The change is small, focused, and doesn't modify core logic or data structures. The root cause is being addressed in the SDK layer with weakref, making this a belt-and-suspenders safety net against future cycles.

VERDICT:
Worth merging: Simple, pragmatic fix for a real production issue. Would benefit from evidence that the fix was validated.

KEY INSIGHT:
Calling gc.collect() explicitly trades a small performance cost for preventing OOM failures—a justified trade-off when circular references prevent timely cleanup.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@simonrosenberg simonrosenberg self-assigned this May 1, 2026
@simonrosenberg
Copy link
Copy Markdown
Collaborator Author

Closing because not needed, this fixes it OpenHands/software-agent-sdk#3033

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants