fix(github): harden connected-account and list_commits against Unhandled Lambda crash#334
Tram-Nguyen87 wants to merge 3 commits into
…tch fails

Root cause: GitHubConnectedAccountHandler.get_account_info called GitHubAPI.get_user(context) without any error handling. The SDK's Integration.get_connected_account() also does not catch exceptions from the handler, so any HTTPError (revoked/expired token, 5xx outage, etc.) propagated out of the Lambda and AWS reported "Unhandled" -- matching the Raygun crash for integration-github. Unlike actions (which are wrapped by @handle_github_errors), the connected-account handler had no safety net.

Fix: wrap the GitHub API call in try/except, log the failure, and return an empty ConnectedAccountInfo so the Lambda returns normally. Auth failures still surface to the user the next time they invoke an action -- those paths return ActionError cleanly.

Tests: 3 new unit tests covering populated info, HTTPError -> empty info (regression for this crash), and generic Exception -> empty info.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔍 Integration Validation Results
Commit:
Changed directories:
✅ Structure Check
✅ Code Check
✅ Tests Check
✅ README Check
✅ Version Check
Required by version check CI on PR #334. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ProRedCat left a comment:
The error did not come from the auth connection, though this does fix another potential issue. The instance you were investigating was caused by the list_commits action throwing an unhandled exception.
Two additional defensive changes to address the recurring Raygun
"Unhandled" Lambda crash, which Risheet flagged as actually triggered
inside list_commits (not the connected-account handler).
list_commits / get_commit
- Previously accessed commit["commit"]["author"]["name"] directly.
GitHub returns commit.author = null for commits whose email isn't
tied to a GitHub user (deleted accounts, bot commits, etc.), which
raised TypeError mid-action.
- New helper _commit_summary builds the response using .get() chains
and treats null author/committer as {name: null, email: null,
date: null}. Schema updated to allow date = null to match.
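
A minimal sketch of the null-safe shaping described in the two bullets above. The helper names and the output shape (`sha`, `message`, `author`, `committer`) are assumptions for illustration; the PR's actual `_commit_summary` is not shown on this page.

```python
from typing import Any, Dict, Optional


def _person(raw: Optional[Dict[str, Any]]) -> Dict[str, Optional[str]]:
    """Normalise an author/committer object that may be null."""
    raw = raw or {}  # treat null as an empty object
    return {
        "name": raw.get("name"),
        "email": raw.get("email"),
        "date": raw.get("date"),
    }


def commit_summary(item: Dict[str, Any]) -> Dict[str, Any]:
    """Build a response row without indexing into possibly-null fields."""
    commit = item.get("commit") or {}
    return {
        "sha": item.get("sha"),
        "message": commit.get("message"),
        "author": _person(commit.get("author")),
        "committer": _person(commit.get("committer")),
    }


# A commit whose author is null no longer raises TypeError.
row = commit_summary({"sha": "abc123", "commit": {"message": "fix", "author": None}})
print(row["author"])
```

Using `x.get(key) or {}` rather than `x.get(key, {})` matters here: the default only applies when the key is missing, not when it is present with a null value.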
handle_github_errors decorator
- Now also catches asyncio.CancelledError. CancelledError inherits
from BaseException (not Exception), so a bare `except Exception`
let it slip through and the Lambda returned "Unhandled". The most
likely vector for the reported crash is a Lambda timeout on large
paginated repos; this ensures we return a clean ActionError instead
of crashing.
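
The class-hierarchy point above can be checked with a small sketch. The decorator below is a simplified stand-in, not the real `handle_github_errors`; the returned dict is a placeholder for `ActionError`, and cancellation is triggered with `task.cancel()` purely for demonstration.

```python
import asyncio
import functools


def handle_errors(func):
    """Simplified stand-in for an action decorator (hypothetical)."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except asyncio.CancelledError:
            # Not a subclass of Exception on Python 3.8+, so it needs
            # its own clause.
            return {"error": "cancelled"}
        except Exception as exc:
            return {"error": str(exc)}
    return wrapper


@handle_errors
async def slow_action():
    await asyncio.sleep(10)
    return {"ok": True}


async def demo():
    task = asyncio.create_task(slow_action())
    await asyncio.sleep(0)  # let the task start and reach its first await
    task.cancel()           # Python-side cancellation of the task
    return await task


print(issubclass(asyncio.CancelledError, Exception))
print(asyncio.run(demo()))
```

This only demonstrates cancellation delivered to the coroutine by the event loop. It says nothing about what happens when the Lambda runtime itself stops the process.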
Note: the actual Python traceback from Raygun is not available (only
the .NET wrapper side is forwarded), so the exact cause of the
production crash is not fully confirmed. Both changes harden real
fragility regardless.
Tests
- 4 new cases under tests/test_github_branches_commits_unit.py covering
null author, null committer, and CancelledError handling.
Thanks for catching that the Raygun instance was actually triggered in I've expanded this PR to cover both. Latest commit (
This is directionally good and the null-author/committer handling looks like a real fix for a plausible …
Lambda hard timeouts won’t normally raise a catchable Python …
I’d try to add a more deterministic mitigation for long paginated calls, e.g. page/limit controls or an invocation time budget. The connected-account and commit-shaping fixes look good. Also, it'd be interesting to log some more details in there to see what's really going on.
        return await func(self, inputs, context)
    except asyncio.CancelledError:
Would this really catch a timeout from asyncio? Looking at the docs, asyncio has its own TimeoutError, and I'm not sure a timeout would actually be exposed here as a CancelledError.
As a more specific recommendation, I'd change the comment to this:
# Defensive: convert Python-side task cancellation into an
# ActionError if cancellation reaches the action coroutine.
# This does not catch Lambda hard timeouts or caller-side
# invocation cancellation.
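
On the question of where an asyncio timeout shows up, a small sketch may help. It assumes the timeout is applied with `asyncio.wait_for`, which is an assumption about how a caller might bound the action, not something this PR shows. In that case the wrapped coroutine sees `CancelledError` and the caller sees `TimeoutError`.

```python
import asyncio
from typing import List


async def action(seen: List[str]) -> None:
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        seen.append("CancelledError inside the coroutine")
        raise  # re-raise so the caller still observes the timeout


async def caller(seen: List[str]) -> None:
    try:
        await asyncio.wait_for(action(seen), timeout=0.01)
    except asyncio.TimeoutError:
        seen.append("TimeoutError in the caller")


def run_demo() -> List[str]:
    seen: List[str] = []
    asyncio.run(caller(seen))
    return seen


print(run_demo())
```

So an `except asyncio.CancelledError` inside the decorator would see a Python-side timeout of this kind, which is the case the suggested comment describes, and not a hard timeout imposed from outside the interpreter.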
    }

    @staticmethod
    async def paginated_fetch(
One point I'd like to raise:
If we do think this is connected to timeouts, one reason could be that the fetches here are essentially unbounded.
Right now GitHub pagination loops until GitHub returns fewer than per_page items. For large repos, list_commits can keep fetching pages until the Lambda dies. A safer shape would be:
    async def paginated_fetch(
        context: ExecutionContext,
        url: str,
        params: Optional[Dict[str, Any]] = None,
        data_key: Optional[str] = None,
        max_pages: int = 10,
    ) -> List[Dict[str, Any]]:
        ...
        pages_fetched = 0
        while True:
            # Stop before requesting another page once the cap is reached.
            if pages_fetched >= max_pages:
                raise TimeoutError(
                    f"GitHub pagination stopped after {max_pages} pages. "
                    "Use filters such as sha, path, since, or until to narrow the request."
                )
            fetch_result = await context.fetch(url, params=params, headers=headers)
            pages_fetched += 1
            ...
Then for list_commits, either hardcode a conservative max or expose it in config.json:
    commits = await GitHubAPI.get_commits(
        context,
        inputs["owner"],
        inputs["repo"],
        sha=inputs.get("sha"),
        path=inputs.get("path"),
        since=inputs.get("since"),
        until=inputs.get("until"),
        max_pages=inputs.get("max_pages", 10),
    )
That would actually prevent long unbounded pagination from reaching runtime cancellation.
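
A runnable variant of the same idea, combining the page cap with the invocation time budget mentioned earlier in the review. This is a sketch under assumptions: `fetch_page` is a hypothetical callable standing in for `context.fetch`, the default limits are arbitrary, and unlike the suggestion above it returns partial results with a flag instead of raising.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List, Tuple

FetchPage = Callable[[int], Awaitable[List[Dict[str, Any]]]]


async def bounded_fetch(
    fetch_page: FetchPage,
    per_page: int = 100,
    max_pages: int = 10,
    budget_seconds: float = 20.0,
) -> Tuple[List[Dict[str, Any]], bool]:
    """Fetch pages until a short page, the page cap, or the time budget.

    Returns (items, maybe_more); maybe_more is True when a limit stopped
    the loop, so further pages may exist.
    """
    deadline = time.monotonic() + budget_seconds
    items: List[Dict[str, Any]] = []
    for page in range(1, max_pages + 1):
        if time.monotonic() >= deadline:
            return items, True          # time budget exhausted
        batch = await fetch_page(page)
        items.extend(batch)
        if len(batch) < per_page:
            return items, False         # short page: nothing left to fetch
    return items, True                  # page cap reached


async def fake_fetch(page: int) -> List[Dict[str, Any]]:
    # Five fake commits served two per page.
    data = [{"n": n} for n in range(5)]
    start = (page - 1) * 2
    return data[start:start + 2]


print(asyncio.run(bounded_fetch(fake_fetch, per_page=2, max_pages=2)))
```

Whether to raise or to return a truncated list is a product decision; the flag keeps the action's output usable while still telling the caller to narrow the request.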
Summary
This PR closes three known fragility holes that can produce the `Unhandled` Lambda error seen in Raygun (integration-github:14). The reviewer noted the original Raygun instance was triggered inside `list_commits`, not the connected-account path, so this PR has been expanded to cover both, plus a defensive widening of the action decorator.

1. Connected-account handler (original scope)
`GitHubConnectedAccountHandler.get_account_info` had no error handling. The SDK's `Integration.get_connected_account()` also doesn't catch handler exceptions, so any `HTTPError` (revoked/expired token, 5xx outage) propagated out of the Lambda.
Fix: wrap `GitHubAPI.get_user(context)` in `try`/`except`, log a warning, and return an empty `ConnectedAccountInfo` so the Lambda returns normally.

2. `list_commits` / `get_commit` null-author crash (new)
The result-shaping code accessed `commit["commit"]["author"]["name"]` directly. GitHub returns `commit.author = null` for commits whose email isn't tied to a GitHub user (deleted accounts, bot commits), which raised `TypeError` mid-action.
Fix: new `_commit_summary` helper uses `.get()` chains and treats null author/committer as `{name: null, email: null, date: null}`. Output schema updated to allow `date = null` to match.

3. `handle_github_errors` decorator: broaden to async cancellation (new)
The decorator's `except Exception` did not catch `asyncio.CancelledError` (a `BaseException`), which is the most plausible vector for Lambda timeouts on large paginated requests.
Fix: explicit `except asyncio.CancelledError` returns a clean `ActionError` so the Lambda no longer returns `Unhandled` on timeout.

Honest caveat on root cause
The actual Python traceback for the production `Unhandled` events is not available: the .NET wrapper that talks to the Lambda only forwards the AWS `FunctionError: Unhandled` header to Raygun, not the `LogResult` payload that contains the Python traceback. The three fixes above each close a real, independently demonstrable fragility, but the exact production cause is inferred from code reading + the reviewer's pointer to `list_commits`. A follow-up to forward `LogResult` to Raygun would make future investigations much faster.

Test plan (verified locally)
- … `HTTPError` before the connected-account fix; finishes cleanly after.
- `TestConnectedAccount` × 3: populated info, HTTPError, generic exception
- `TestListCommits::test_null_author_does_not_crash`
- `TestListCommits::test_null_committer_does_not_crash`
- `TestListCommits::test_cancelled_error_returns_action_error`
- `TestGetCommit::test_null_author_does_not_crash`
- `validate_integration.py github`: passed
- `check_code.py github`: passed (lint, format, bandit, pip-audit, config sync, fetch patterns)

Production verification (for reviewer / QA)
For the connected-account path:
- … `Unhandled` event in Raygun.

For the `list_commits` path:
- … `torvalds/linux` style projects).
- … `list_commits` on it.
- … `author: null` fields rather than crashing.

Before the connected-account fix, the repro script crashes with an uncaught `HTTPError`:

After the fix, the same script with the same bad token finishes cleanly:
