Skip to content

fix(github): harden connected-account and list_commits against Unhandled Lambda crash#334

Open
Tram-Nguyen87 wants to merge 3 commits into
masterfrom
fix/github-unhandled-lambda-connected-account
Open

fix(github): harden connected-account and list_commits against Unhandled Lambda crash#334
Tram-Nguyen87 wants to merge 3 commits into
masterfrom
fix/github-unhandled-lambda-connected-account

Conversation

@Tram-Nguyen87
Copy link
Copy Markdown
Contributor

@Tram-Nguyen87 Tram-Nguyen87 commented May 22, 2026

Summary

This PR closes three known fragility holes that can produce the Unhandled Lambda error seen in Raygun (integration-github:14). The reviewer noted the original Raygun instance was triggered inside list_commits, not the connected-account path — so this PR has been expanded to cover both, plus a defensive widening of the action decorator.

1. Connected-account handler (original scope)

GitHubConnectedAccountHandler.get_account_info had no error handling. The SDK's Integration.get_connected_account() also doesn't catch handler exceptions, so any HTTPError (revoked/expired token, 5xx outage) propagated out of the Lambda.

Fix: wrap GitHubAPI.get_user(context) in try/except, log a warning, and return an empty ConnectedAccountInfo so the Lambda returns normally.

2. list_commits / get_commit null-author crash (new)

The result-shaping code accessed commit["commit"]["author"]["name"] directly. GitHub returns commit.author = null for commits whose email isn't tied to a GitHub user (deleted accounts, bot commits), which raised TypeError mid-action.

Fix: new _commit_summary helper uses .get() chains and treats null author/committer as {name: null, email: null, date: null}. Output schema updated to allow date = null to match.

3. handle_github_errors decorator — broaden to async cancellation (new)

The decorator's except Exception did not catch asyncio.CancelledError (a BaseException), which is the most plausible vector for Lambda timeouts on large paginated requests.

Fix: explicit except asyncio.CancelledError returns a clean ActionError so the Lambda no longer returns Unhandled on timeout.

Honest caveat on root cause

The actual Python traceback for the production Unhandled events is not available — the .NET wrapper that talks to the Lambda only forwards the AWS FunctionError: Unhandled header to Raygun, not the LogResult payload that contains the Python traceback. The three fixes above each close a real, independently demonstrable fragility, but the exact production cause is inferred from code reading + the reviewer's pointer to list_commits. A follow-up to forward LogResult to Raygun would make future investigations much faster.

Test plan (verified locally)

  • Local repro with deliberately invalid token reproduces the original uncaught HTTPError before the connected-account fix; finishes cleanly after.
  • 7 new unit tests:
    • TestConnectedAccount × 3 — populated info, HTTPError, generic exception
    • TestListCommits::test_null_author_does_not_crash
    • TestListCommits::test_null_committer_does_not_crash
    • TestListCommits::test_cancelled_error_returns_action_error
    • TestGetCommit::test_null_author_does_not_crash
  • 89 unit tests pass (82 existing + 7 new)
  • validate_integration.py github — passed
  • check_code.py github — passed (lint, format, bandit, pip-audit, config sync, fetch patterns)

Production verification (for reviewer / QA)

For the connected-account path:

  1. Connect a GitHub account normally.
  2. Revoke Autohive's access in GitHub → Settings → Applications → Authorized OAuth Apps.
  3. Trigger the connected-account refresh (reload Integrations page).
  4. Expected: integration card stays usable; no new Unhandled event in Raygun.

For the list_commits path:

  1. Pick an old open-source repo with commits authored by deleted GitHub accounts (e.g., very old repos in torvalds/linux style projects).
  2. Run list_commits on it.
  3. Expected: returns commits with author: null fields rather than crashing.

Before the connected-account fix — repro script crashes with an uncaught HTTPError:
Screenshot 2026-05-22 121911

After the fix — same script, same bad token, finishes cleanly:
Screenshot 2026-05-22 120647

…tch fails

Root cause: GitHubConnectedAccountHandler.get_account_info called
GitHubAPI.get_user(context) without any error handling. The SDK's
Integration.get_connected_account() also does not catch exceptions from
the handler, so any HTTPError (revoked/expired token, 5xx outage, etc.)
propagated out of the Lambda and AWS reported "Unhandled" -- matching
the Raygun crash for integration-github.

Unlike actions (which are wrapped by @handle_github_errors), the
connected-account handler had no safety net.

Fix: wrap the GitHub API call in try/except, log the failure, and
return an empty ConnectedAccountInfo so the Lambda returns normally.
Auth failures still surface to the user the next time they invoke an
action -- those paths return ActionError cleanly.

Tests: 3 new unit tests covering populated info, HTTPError -> empty
info (regression for this crash), and generic Exception -> empty info.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

🔍 Integration Validation Results

Commit: 8a27edec043d50cd7f784f89eaac3660e2a61259 · fix(github): harden list_commits and catch async cancellation
Updated: 2026-05-22T02:52:05Z

Changed directories: github

Check Result
Structure ✅ Passed
Code ✅ Passed
Tests ✅ Passed
README ✅ Passed
Version ✅ Passed
✅ Structure Check output
Validating 1 integration(s)...

============================================================
Integration: github
============================================================
✅ All checks passed!

============================================================
SUMMARY
============================================================
Integrations validated: 1
Total errors: 0
Total warnings: 0

✅ All validations passed!
✅ Code Check output
----------------------------------------
Checking: github
----------------------------------------

📦 Installing dependencies...

🐍 Checking Python syntax...
   ✅ Syntax OK

📥 Checking imports...
   ✅ Imports OK

📄 Checking JSON files...
   ✅ JSON files OK

🔍 Linting with ruff...
   ✅ Lint OK

🎨 Checking formatting with ruff...
   ✅ Formatting OK

🔒 Scanning for security issues with bandit...
   ✅ Security OK

🛡️ Checking dependencies for vulnerabilities with pip-audit...
   ✅ Dependencies OK

🔗 Checking config-code sync...
   ✅ Config-code sync OK

🔄 Checking fetch patterns...
   ✅ Fetch patterns OK

========================================
✅ CODE CHECK PASSED
========================================
✅ Tests Check output

Integration   Tests  Coverage        Status
-------------------------------------------
github     89/89       86%      ✅ Passed
-------------------------------------------
Total      89/89            ✅ All passed

✅ Tests passed: github
✅ README Check output
========================================
✅ README CHECK PASSED
========================================
✅ Version Check output
✅ github: 2.3.0 → 2.4.0 (minor bump)

========================================
✅ VERSION CHECK PASSED
========================================

Required by version check CI on PR #334.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@ProRedCat ProRedCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error did not come from the auth connection, though this does fix another potential issue. The instance that you were investigating was caused by list_commits action throwing an unhandled exception

Two additional defensive changes to address the recurring Raygun
"Unhandled" Lambda crash that Risheet flagged was actually triggered
inside list_commits (not the connected-account handler).

list_commits / get_commit
- Previously accessed commit["commit"]["author"]["name"] directly.
  GitHub returns commit.author = null for commits whose email isn't
  tied to a GitHub user (deleted accounts, bot commits, etc.), which
  raised TypeError mid-action.
- New helper _commit_summary builds the response using .get() chains
  and treats null author/committer as {name: null, email: null,
  date: null}. Schema updated to allow date = null to match.

handle_github_errors decorator
- Now also catches asyncio.CancelledError. CancelledError inherits
  from BaseException (not Exception), so a bare `except Exception`
  let it slip through and the Lambda returned "Unhandled". The most
  likely vector for the reported crash is a Lambda timeout on large
  paginated repos; this ensures we return a clean ActionError instead
  of crashing.

Note: the actual Python traceback from Raygun is not available (only
the .NET wrapper side is forwarded), so the exact cause of the
production crash is not fully confirmed. Both changes harden real
fragility regardless.

Tests
- 4 new cases under tests/test_github_branches_commits_unit.py covering
  null author, null committer, and CancelledError handling.
@Tram-Nguyen87 Tram-Nguyen87 changed the title fix(github): prevent Unhandled Lambda crash in connected-account handler fix(github): harden connected-account and list_commits against Unhandled Lambda crash May 22, 2026
@Tram-Nguyen87
Copy link
Copy Markdown
Contributor Author

Tram-Nguyen87 commented May 22, 2026

Thanks for catching that the Raygun instance was actually triggered in list_commits, not the connected-account path.

I've expanded this PR to cover both. Latest commit (8a27ede) adds:

  1. Defensive _commit_summary helper so a null author/committer in commit["commit"] no longer raises TypeError.
  2. handle_github_errors now also catches asyncio.CancelledError (a BaseException that bypasses except Exception) — most plausible vector for a Lambda timeout returning Unhandled.

@Tram-Nguyen87 Tram-Nguyen87 requested a review from ProRedCat May 22, 2026 03:05
@TheRealAgentK TheRealAgentK self-requested a review May 22, 2026 06:00
@TheRealAgentK
Copy link
Copy Markdown
Collaborator

TheRealAgentK commented May 22, 2026

This is directionally good and the null-author/committer handling looks like a real fix for a plausible list_commits crash. I’d like one clarification/change before approving: the asyncio.CancelledError path doesn’t seem like it actually proves or fixes Lambda timeout behavior.

Lambda hard timeouts won’t normally raise a catchable Python CancelledError, and asyncio.wait_for timeout would surface as TimeoutError, already covered by except Exception.

I’d try to add a more deterministic mitigation for long paginated calls, e.g. page/limit controls or an invocation time budget. The connected-account and commit-shaping fixes look good.

Also, it'd be interesting to log some more details in there to see what's really going on.

Comment thread github/github.py

return await func(self, inputs, context)

except asyncio.CancelledError:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this really catch a timeout from asyncio? It has its own TimeoutException from looking at the docs and I'm not sure a timeout would actually be exposed there via asyncio.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a more specific recommendation, I'd change the comment to this:

# Defensive: convert Python-side task cancellation into an
# ActionError if cancellation reaches the action coroutine.
# This does not catch Lambda hard timeouts or caller-side
# invocation cancellation.

Comment thread github/github.py
}

@staticmethod
async def paginated_fetch(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One point I'd like to raise:

If we do think this is connected to timeouts, one reason could be that the fetches here are essentially unbound.

Right now GitHub pagination loops until GitHub returns fewer than per_page items. For large repos, list_commits can keep fetching pages until the Lambda dies. A safer shape would be:

    async def paginated_fetch(
        context: ExecutionContext,
        url: str,
        params: Dict[str, Any] = None,
        data_key: str = None,
        max_pages: int = 10,
    ) -> List[Dict[str, Any]]:
        ...
        pages_fetched = 0
        while True:
            if pages_fetched >= max_pages:
                raise TimeoutError(
                    f"GitHub pagination stopped after {max_pages} pages. "
                    "Use filters such as sha, path, since, or until to narrow the request."
                )

            fetch_result = await context.fetch(url, params=params, headers=headers)
            pages_fetched += 1
            ...

Then for list_commits, either hardcode a conservative max or expose it in config.json:

commits = await GitHubAPI.get_commits(
    context,
    inputs["owner"],
    inputs["repo"],
    sha=inputs.get("sha"),
    path=inputs.get("path"),
    since=inputs.get("since"),
    until=inputs.get("until"),
    max_pages=inputs.get("max_pages", 10),
)

That would actually prevent long unbounded pagination from reaching runtime cancellation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants