Add Bitbucket integration for auth, repo access, sandbox git flow, and PR creation#177
Add Bitbucket integration for auth, repo access, sandbox git flow, and PR creation#177Abhinandan-Khurana wants to merge 7 commits intoColeMurray:mainfrom
Conversation
Greptile SummaryComprehensive Bitbucket integration PR that extends the existing GitHub-only codebase to support multiple VCS providers with provider-neutral abstractions. Implements OAuth flow, token management, sandbox git operations, and PR creation for Bitbucket alongside existing GitHub support. Major changes:
Architecture:
Test coverage:
Last reviewed commit: f07cec5 |
| onClick={() => signIn("bitbucket")} | ||
| className="flex items-center gap-2 bg-secondary text-secondary-foreground px-6 py-3 font-medium hover:opacity-90 transition" | ||
| > | ||
| Sign in with Bitbucket |
There was a problem hiding this comment.
[Automated Review] Both sign-in buttons are rendered unconditionally, but the auth providers in auth.ts are conditionally registered based on env vars (BITBUCKET_CLIENT_ID / BITBUCKET_CLIENT_SECRET). Clicking "Sign in with Bitbucket" when no Bitbucket OAuth consumer is configured will cause a NextAuth runtime error.
Gate the Bitbucket button on provider availability (e.g., expose configured providers via the session or a server-side check) so it only appears when Bitbucket OAuth is actually configured.
| condition = var.enable_github_bot == false || (length(var.github_webhook_secret) > 0 && length(var.github_bot_username) > 0) | ||
| error_message = "When enable_github_bot is true, github_webhook_secret and github_bot_username must be non-empty." | ||
| } | ||
| } |
There was a problem hiding this comment.
[Automated Review] The validation blocks for enable_github_bot and enable_linear_bot (which enforced that required secrets are non-empty when the bot is enabled) have been removed. This weakens deploy-time validation and is unrelated to Bitbucket support.
Please revert these deletions or, if intentional, justify in the PR description and split into a separate commit.
| @@ -207,7 +208,7 @@ export interface ParticipantPresence { | |||
|
|
|||
| // Repository types for GitHub App installation | |||
| export interface InstallationRepository { | |||
There was a problem hiding this comment.
[Automated Review] Widening id from number to string | number is a breaking change to a shared interface consumed by multiple packages. Any downstream code that assumes id is a number (strict equality checks, arithmetic, database schemas expecting INTEGER) will break silently or fail at runtime for Bitbucket repos where id is a UUID string like "{123}".
The use-repos.ts hook mirrors this change, so the break propagates to the web client as well.
Recommendation: Keep id: number and add a separate optional providerRepoId?: string field for Bitbucket's UUID identifier. This avoids forcing every consumer to handle a union type for what was previously a simple numeric ID.
| import { resolveScmProviderFromEnv, type SourceControlProviderName } from "../source-control"; | ||
|
|
||
| function stableRepoIdFromFullName(fullName: string): number { | ||
| let hash = 0; |
There was a problem hiding this comment.
[Automated Review] This hash function has several problems:
-
Overflow bug:
(hash * 31 + charCode) | 0forces 32-bit signed int. Whenhashequals-2147483648(valid 32-bit int),Math.abs(-2147483648)returns2147483648in JS, then+ 1gives2147483649— but in some edge casesMath.absof the minimum 32-bit int is itself negative in engines that use 32-bit internally. This is fragile. -
Collision risk: A 31-bit hash space (~2 billion values) makes collisions likely across repos. Since
repoIdis stored in D1 and used for session-to-repo association, a collision would silently associate sessions with the wrong repository. -
Applied to GitHub path too: The fallback in
resolveInstalledRepo(typeof repo.id === "number" ? repo.id : stableRepoIdFromFullName(...)) means this hash could be used for GitHub repos ifrepo.idcomes through as a string for any reason — changingrepoIdvalues for existing sessions and orphaning them in the session index.
Recommendation: Use Bitbucket's UUID string directly as the repo identifier (see comment on InstallationRepository.id) rather than hashing it into a synthetic numeric ID.
| const sessionScmProvider = body.vcsProvider ?? resolveDeploymentScmProvider(env); | ||
| const scmToken = body.scmToken ?? body.githubToken; | ||
|
|
||
| let repoId: number | null; |
There was a problem hiding this comment.
[Automated Review] repoId was previously number (always set by resolveInstalledRepo for GitHub). Changing it to number | null means null can now flow into the Session DO's upsertSession call and be stored as NULL in the repo_id SQLite column.
This changes the data model for sessions — any queries filtering or joining on repo_id (e.g., the D1 session index, repo secrets lookups) will silently miss these sessions. Existing code downstream likely assumes repo_id is always a valid integer.
Recommendation: Ensure resolveInstalledRepo always returns a non-null repoId (even for Bitbucket), or add explicit null handling in all downstream consumers before this goes in.
| return withCorsAndTraceHeaders(response, ctx); | ||
| } | ||
|
|
||
| resolveDeploymentScmProvider(env); |
There was a problem hiding this comment.
[Automated Review] The entire 501 guard for unimplemented SCM providers has been removed. Previously, setting SCM_PROVIDER=bitbucket without a working implementation would return a clear 501 Not Implemented on all non-public routes. Now requests proceed and will fail with less obvious errors deeper in the stack if Bitbucket credentials are missing or misconfigured.
Recommendation: Replace the old guard with credential validation — e.g., if provider === "bitbucket" and none of BITBUCKET_BOT_USERNAME, BITBUCKET_BOT_APP_PASSWORD, BITBUCKET_CLIENT_ID are set, return a clear error early rather than letting requests fail in auth/repo-resolution layers with confusing messages.
| from websockets.exceptions import InvalidStatus | ||
|
|
||
| try: | ||
| from websockets import State |
There was a problem hiding this comment.
[Automated Review] These changes are unrelated to Bitbucket and appear to be fixing a websockets library version compatibility issue. The ClientConnection type annotation changed to Any which weakens type safety.
Please split this into a separate commit/PR, and consider using a version-conditional type alias instead of Any to preserve type checking.
| terraform/environments/production/terraform.tfstate* | ||
| terraform/environments/production/.terraform* | ||
| terraform/environments/production/terraform.tfvars.backup-* | ||
|
|
There was a problem hiding this comment.
[Automated Review] research.md and plan.md look like developer scratch files that should not be in the public repo's .gitignore. Consider removing these entries — they belong in a personal global gitignore (~/.gitignore_global) rather than the project-level ignore file.
| */ | ||
|
|
||
| export * from "./types"; | ||
| export * from "./types/index"; |
There was a problem hiding this comment.
[Automated Review] This change from "./types" to "./types/index" is functionally equivalent but gratuitous — Node/TypeScript module resolution already resolves "./types" to "./types/index". The explicit /index suffix could cause issues with certain bundler caching or TypeScript project reference configurations. Please revert unless there's a specific reason for the change.
| const githubClientSecret = this.env.GITHUB_CLIENT_SECRET; | ||
| const refreshed = await refreshAccessToken(refreshToken, { | ||
| clientId: githubClientId as string, | ||
| clientSecret: githubClientSecret as string, |
There was a problem hiding this comment.
[Automated Review] refreshToken() now calls updateParticipantScmTokens() instead of updateParticipantTokens().
For existing GitHub sessions, this writes to scm_* columns AND conditionally to github_* columns (via the CASE WHEN ? = 'github' SQL). Existing participants have scm_provider defaulting to 'github' from the migration, but scm_refresh_token_encrypted will be null. The refreshToken() method reads from scm_refresh_token_encrypted ?? github_refresh_token_encrypted, which is fine for fallback, but after the first refresh it writes to scm_* columns — so the github_* columns will be updated via the CASE statement, which is correct.
However, the old updateParticipantTokens() method still exists — it's not clear if any other code path still calls it. If so, those callers would write to github_* columns only, causing the scm_* and github_* columns to drift out of sync. Please either remove updateParticipantTokens() if it's dead code, or document which callers use which method and why.
Prep for Bitbucket integration (PR #177). Refactor the sandbox manager and entrypoint to use VCS-provider-neutral env vars (VCS_HOST, VCS_CLONE_USERNAME, VCS_CLONE_TOKEN) derived from the deployment-level SCM_PROVIDER env var, so the sandbox pipeline works for any provider. - Rename SandboxConfig.github_app_token → clone_token - Rename restore_from_snapshot(github_app_token=) → clone_token= - Inject VCS_HOST/VCS_CLONE_USERNAME/VCS_CLONE_TOKEN in both create_sandbox() and restore_from_snapshot() - Gate GITHUB_APP_TOKEN/GITHUB_TOKEN on scm_provider == "github" - Add _build_repo_url() helper, replacing 3 inline URL constructions - Read VCS_* env vars in entrypoint with fallback to GITHUB_APP_TOKEN
Prep for Bitbucket integration (PR #177). Refactor the sandbox manager and entrypoint to use VCS-provider-neutral env vars (VCS_HOST, VCS_CLONE_USERNAME, VCS_CLONE_TOKEN) derived from the deployment-level SCM_PROVIDER env var, so the sandbox pipeline works for any provider. - Rename SandboxConfig.github_app_token → clone_token - Rename restore_from_snapshot(github_app_token=) → clone_token= - Inject VCS_HOST/VCS_CLONE_USERNAME/VCS_CLONE_TOKEN in both create_sandbox() and restore_from_snapshot() - Gate GITHUB_APP_TOKEN/GITHUB_TOKEN on scm_provider == "github" - Add _build_repo_url() helper, replacing 3 inline URL constructions - Read VCS_* env vars in entrypoint with fallback to GITHUB_APP_TOKEN
## Summary Prep for Bitbucket integration (PR #177). Refactors the sandbox manager and entrypoint to use VCS-provider-neutral env vars derived from the deployment-level `SCM_PROVIDER` env var, so the sandbox pipeline works for any provider without hardcoding `github.com` or `x-access-token`. - **manager.py**: Rename `SandboxConfig.github_app_token` → `clone_token`, inject `VCS_HOST`/`VCS_CLONE_USERNAME`/`VCS_CLONE_TOKEN` in both `create_sandbox()` and `restore_from_snapshot()`, gate `GITHUB_APP_TOKEN`/`GITHUB_TOKEN` on `scm_provider == "github"` - **entrypoint.py**: Read VCS_* env vars (with fallback to `GITHUB_APP_TOKEN`), add `_build_repo_url()` helper replacing 3 inline URL constructions - **web_api.py**: Update 2 callers for renamed `clone_token` field **Prerequisite**: `SCM_PROVIDER` must be added to the `internal-api` Modal secret in Terraform before Bitbucket deployments work. Existing GitHub deployments are unaffected (defaults to `"github"`). ## Test plan - [x] All 154 tests pass (142 existing + 12 new) - [x] `ruff check` and `ruff format` clean - [ ] Verify GitHub deployment still works (default path, no `SCM_PROVIDER` needed) - [ ] After Terraform adds `SCM_PROVIDER=bitbucket`, verify Bitbucket clone URLs are correct
Summary
This PR adds end-to-end Bitbucket support to Background Agents by extending the existing source-control abstraction and session pipeline.
What’s included
BitbucketSourceControlProviderand factory wiring for repo lookup, push auth/spec, and PR creation.vcsProviderpersistence and provider-aware runtime resolution in Session DO.VCS_*credentials.Validation
web,control-plane, workspace)