Skip to content

fix: avoid 500s when a2a cleanup is cancelled#1268

Open
lithammer wants to merge 1 commit intokagent-dev:mainfrom
lithammer:handle-cancellation-during-cleanup
Open

fix: avoid 500s when a2a cleanup is cancelled#1268
lithammer wants to merge 1 commit intokagent-dev:mainfrom
lithammer:handle-cancellation-during-cleanup

Conversation

@lithammer
Copy link

We hit a case where a client retried on 5xx responses, and the agent's Slack tool had already executed before the server returned 500. The request itself was effectively successful, but cleanup raised CancelledError, which bubbled into a 500 and triggered the retry. That led to duplicate Slack messages.

This change treats CancelledError during _cleanup_producer as non‑fatal so a completed request doesn't get turned into a 500 during cleanup. The result is that clients don't retry after work has already been done, avoiding duplicate side effects.

@lithammer lithammer marked this pull request as ready for review February 5, 2026 13:57
Copilot AI review requested due to automatic review settings February 5, 2026 13:57
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents duplicate side effects when clients retry on 5xx responses by treating CancelledError during A2A cleanup as non-fatal. Previously, if cleanup was cancelled after a request completed successfully, the server would return a 500 error, triggering client retries and causing duplicate operations (e.g., multiple Slack messages).

Changes:

  • Introduced SafeRequestHandler that catches CancelledError during cleanup and performs best-effort resource cleanup instead of propagating the error
  • Replaced DefaultRequestHandler with SafeRequestHandler in the A2A application
  • Added unit tests verifying that cleanup handles both cancelled and successful task scenarios

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
python/packages/kagent-adk/src/kagent/adk/_safe_request_handler.py New handler class that catches CancelledError during cleanup and performs manual resource cleanup
python/packages/kagent-adk/src/kagent/adk/_a2a.py Updated to use SafeRequestHandler instead of DefaultRequestHandler
python/packages/kagent-adk/tests/unittests/test_safe_request_handler.py Added tests for both cancelled and normal cleanup scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Swallow CancelledError during cleanup so successful requests don't turn
into 5xx responses. This prevents clients retrying after side effects
already happened.

Signed-off-by: Peter Lithammer <peter.lithammer@embark-studios.com>
@lithammer lithammer force-pushed the handle-cancellation-during-cleanup branch from 0d98064 to abeed0d Compare February 5, 2026 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant