Skip to content

Comments

Add Bitbucket integration for auth, repo access, sandbox git flow, and PR creation#177

Open
Abhinandan-Khurana wants to merge 7 commits intoColeMurray:mainfrom
Abhinandan-Khurana:bitbucket-dev
Open

Add Bitbucket integration for auth, repo access, sandbox git flow, and PR creation#177
Abhinandan-Khurana wants to merge 7 commits intoColeMurray:mainfrom
Abhinandan-Khurana:bitbucket-dev

Conversation

@Abhinandan-Khurana
Copy link

Summary
This PR adds end-to-end Bitbucket support to Background Agents by extending the existing source-control abstraction and session pipeline.

What’s included

  • Added BitbucketSourceControlProvider and factory wiring for repo lookup, push auth/spec, and PR creation.
  • Added Bitbucket OAuth flow in web auth and provider-neutral SCM token/session fields across web + control-plane.
  • Added per-session vcsProvider persistence and provider-aware runtime resolution in Session DO.
  • Added Bitbucket repo listing and repo validation using OAuth token fallback.
  • Updated sandbox creation/restore and entrypoint git sync to use provider-neutral VCS_* credentials.
  • Updated docs, env examples, and Terraform config for Bitbucket variables/secrets.
  • Added security fix to avoid putting OAuth access tokens in URL query params during userinfo.

Validation

  • Typecheck passed (web, control-plane, workspace)
  • Control-plane tests passed
  • Web tests passed
  • Modal tests passed

@greptile-apps
Copy link

greptile-apps bot commented Feb 21, 2026

Greptile Summary

Comprehensive Bitbucket integration PR that extends the existing GitHub-only codebase to support multiple VCS providers with provider-neutral abstractions. Implements OAuth flow, token management, sandbox git operations, and PR creation for Bitbucket alongside existing GitHub support.

Major changes:

  • Added BitbucketSourceControlProvider implementing the same interface as GitHub provider with repository access, PR creation, and push auth generation
  • Extended session and participant schemas with vcs_provider, scm_provider, and provider-neutral scm_* token fields while maintaining backward compatibility with legacy github_* fields
  • Updated sandbox entrypoint to use provider-neutral VCS_* environment variables for git clone/push operations
  • Added Bitbucket OAuth flow in web auth with security fix removing token from URL query params
  • Extended control-plane routing to resolve provider per-session and support both bot credentials and user OAuth tokens

Architecture:

  • Uses factory pattern for provider selection with runtime resolution from session state
  • Maintains backward compatibility by defaulting to GitHub when vcs_provider not set
  • Schema migrations (19-27) add new columns with safe defaults for existing sessions
  • Sandbox manager passes provider-specific credentials via environment variables based on vcs_provider

Test coverage:

  • Unit tests added for Bitbucket provider (repository fetch, PR creation, push spec building)
  • Existing tests updated to include provider context
  • All reported test suites passing

Last reviewed commit: f07cec5

onClick={() => signIn("bitbucket")}
className="flex items-center gap-2 bg-secondary text-secondary-foreground px-6 py-3 font-medium hover:opacity-90 transition"
>
Sign in with Bitbucket
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] Both sign-in buttons are rendered unconditionally, but the auth providers in auth.ts are conditionally registered based on env vars (BITBUCKET_CLIENT_ID / BITBUCKET_CLIENT_SECRET). Clicking "Sign in with Bitbucket" when no Bitbucket OAuth consumer is configured will cause a NextAuth runtime error.

Gate the Bitbucket button on provider availability (e.g., expose configured providers via the session or a server-side check) so it only appears when Bitbucket OAuth is actually configured.

condition = var.enable_github_bot == false || (length(var.github_webhook_secret) > 0 && length(var.github_bot_username) > 0)
error_message = "When enable_github_bot is true, github_webhook_secret and github_bot_username must be non-empty."
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] The validation blocks for enable_github_bot and enable_linear_bot (which enforced that required secrets are non-empty when the bot is enabled) have been removed. This weakens deploy-time validation and is unrelated to Bitbucket support.

Please revert these deletions or, if intentional, justify in the PR description and split into a separate commit.

@@ -207,7 +208,7 @@ export interface ParticipantPresence {

// Repository types for GitHub App installation
export interface InstallationRepository {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] Widening id from number to string | number is a breaking change to a shared interface consumed by multiple packages. Any downstream code that assumes id is a number (strict equality checks, arithmetic, database schemas expecting INTEGER) will break silently or fail at runtime for Bitbucket repos where id is a UUID string like "{123}".

The use-repos.ts hook mirrors this change, so the break propagates to the web client as well.

Recommendation: Keep id: number and add a separate optional providerRepoId?: string field for Bitbucket's UUID identifier. This avoids forcing every consumer to handle a union type for what was previously a simple numeric ID.

import { resolveScmProviderFromEnv, type SourceControlProviderName } from "../source-control";

function stableRepoIdFromFullName(fullName: string): number {
let hash = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] This hash function has several problems:

  1. Overflow bug: (hash * 31 + charCode) | 0 forces 32-bit signed int. When hash equals -2147483648 (valid 32-bit int), Math.abs(-2147483648) returns 2147483648 in JS, then + 1 gives 2147483649 — but in some edge cases Math.abs of the minimum 32-bit int is itself negative in engines that use 32-bit internally. This is fragile.

  2. Collision risk: A 31-bit hash space (~2 billion values) makes collisions likely across repos. Since repoId is stored in D1 and used for session-to-repo association, a collision would silently associate sessions with the wrong repository.

  3. Applied to GitHub path too: The fallback in resolveInstalledRepo (typeof repo.id === "number" ? repo.id : stableRepoIdFromFullName(...)) means this hash could be used for GitHub repos if repo.id comes through as a string for any reason — changing repoId values for existing sessions and orphaning them in the session index.

Recommendation: Use Bitbucket's UUID string directly as the repo identifier (see comment on InstallationRepository.id) rather than hashing it into a synthetic numeric ID.

const sessionScmProvider = body.vcsProvider ?? resolveDeploymentScmProvider(env);
const scmToken = body.scmToken ?? body.githubToken;

let repoId: number | null;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] repoId was previously number (always set by resolveInstalledRepo for GitHub). Changing it to number | null means null can now flow into the Session DO's upsertSession call and be stored as NULL in the repo_id SQLite column.

This changes the data model for sessions — any queries filtering or joining on repo_id (e.g., the D1 session index, repo secrets lookups) will silently miss these sessions. Existing code downstream likely assumes repo_id is always a valid integer.

Recommendation: Ensure resolveInstalledRepo always returns a non-null repoId (even for Bitbucket), or add explicit null handling in all downstream consumers before this goes in.

return withCorsAndTraceHeaders(response, ctx);
}

resolveDeploymentScmProvider(env);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] The entire 501 guard for unimplemented SCM providers has been removed. Previously, setting SCM_PROVIDER=bitbucket without a working implementation would return a clear 501 Not Implemented on all non-public routes. Now requests proceed and will fail with less obvious errors deeper in the stack if Bitbucket credentials are missing or misconfigured.

Recommendation: Replace the old guard with credential validation — e.g., if provider === "bitbucket" and none of BITBUCKET_BOT_USERNAME, BITBUCKET_BOT_APP_PASSWORD, BITBUCKET_CLIENT_ID are set, return a clear error early rather than letting requests fail in auth/repo-resolution layers with confusing messages.

from websockets.exceptions import InvalidStatus

try:
from websockets import State
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] These changes are unrelated to Bitbucket and appear to be fixing a websockets library version compatibility issue. The ClientConnection type annotation changed to Any which weakens type safety.

Please split this into a separate commit/PR, and consider using a version-conditional type alias instead of Any to preserve type checking.

terraform/environments/production/terraform.tfstate*
terraform/environments/production/.terraform*
terraform/environments/production/terraform.tfvars.backup-*

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] research.md and plan.md look like developer scratch files that should not be in the public repo's .gitignore. Consider removing these entries — they belong in a personal global gitignore (~/.gitignore_global) rather than the project-level ignore file.

*/

export * from "./types";
export * from "./types/index";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] This change from "./types" to "./types/index" is functionally equivalent but gratuitous — Node/TypeScript module resolution already resolves "./types" to "./types/index". The explicit /index suffix could cause issues with certain bundler caching or TypeScript project reference configurations. Please revert unless there's a specific reason for the change.

const githubClientSecret = this.env.GITHUB_CLIENT_SECRET;
const refreshed = await refreshAccessToken(refreshToken, {
clientId: githubClientId as string,
clientSecret: githubClientSecret as string,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated Review] refreshToken() now calls updateParticipantScmTokens() instead of updateParticipantTokens().

For existing GitHub sessions, this writes to scm_* columns AND conditionally to github_* columns (via the CASE WHEN ? = 'github' SQL). Existing participants have scm_provider defaulting to 'github' from the migration, but scm_refresh_token_encrypted will be null. The refreshToken() method reads from scm_refresh_token_encrypted ?? github_refresh_token_encrypted, which is fine for fallback, but after the first refresh it writes to scm_* columns — so the github_* columns will be updated via the CASE statement, which is correct.

However, the old updateParticipantTokens() method still exists — it's not clear if any other code path still calls it. If so, those callers would write to github_* columns only, causing the scm_* and github_* columns to drift out of sync. Please either remove updateParticipantTokens() if it's dead code, or document which callers use which method and why.

ColeMurray added a commit that referenced this pull request Feb 22, 2026
Prep for Bitbucket integration (PR #177). Refactor the sandbox manager
and entrypoint to use VCS-provider-neutral env vars (VCS_HOST,
VCS_CLONE_USERNAME, VCS_CLONE_TOKEN) derived from the deployment-level
SCM_PROVIDER env var, so the sandbox pipeline works for any provider.

- Rename SandboxConfig.github_app_token → clone_token
- Rename restore_from_snapshot(github_app_token=) → clone_token=
- Inject VCS_HOST/VCS_CLONE_USERNAME/VCS_CLONE_TOKEN in both
  create_sandbox() and restore_from_snapshot()
- Gate GITHUB_APP_TOKEN/GITHUB_TOKEN on scm_provider == "github"
- Add _build_repo_url() helper, replacing 3 inline URL constructions
- Read VCS_* env vars in entrypoint with fallback to GITHUB_APP_TOKEN
ColeMurray added a commit that referenced this pull request Feb 22, 2026
Prep for Bitbucket integration (PR #177). Refactor the sandbox manager
and entrypoint to use VCS-provider-neutral env vars (VCS_HOST,
VCS_CLONE_USERNAME, VCS_CLONE_TOKEN) derived from the deployment-level
SCM_PROVIDER env var, so the sandbox pipeline works for any provider.

- Rename SandboxConfig.github_app_token → clone_token
- Rename restore_from_snapshot(github_app_token=) → clone_token=
- Inject VCS_HOST/VCS_CLONE_USERNAME/VCS_CLONE_TOKEN in both
  create_sandbox() and restore_from_snapshot()
- Gate GITHUB_APP_TOKEN/GITHUB_TOKEN on scm_provider == "github"
- Add _build_repo_url() helper, replacing 3 inline URL constructions
- Read VCS_* env vars in entrypoint with fallback to GITHUB_APP_TOKEN
ColeMurray added a commit that referenced this pull request Feb 22, 2026
## Summary

Prep for Bitbucket integration (PR #177). Refactors the sandbox manager
and entrypoint to use VCS-provider-neutral env vars derived from the
deployment-level `SCM_PROVIDER` env var, so the sandbox pipeline works
for any provider without hardcoding `github.com` or `x-access-token`.

- **manager.py**: Rename `SandboxConfig.github_app_token` →
`clone_token`, inject `VCS_HOST`/`VCS_CLONE_USERNAME`/`VCS_CLONE_TOKEN`
in both `create_sandbox()` and `restore_from_snapshot()`, gate
`GITHUB_APP_TOKEN`/`GITHUB_TOKEN` on `scm_provider == "github"`
- **entrypoint.py**: Read VCS_* env vars (with fallback to
`GITHUB_APP_TOKEN`), add `_build_repo_url()` helper replacing 3 inline
URL constructions
- **web_api.py**: Update 2 callers for renamed `clone_token` field

**Prerequisite**: `SCM_PROVIDER` must be added to the `internal-api`
Modal secret in Terraform before Bitbucket deployments work. Existing
GitHub deployments are unaffected (defaults to `"github"`).

## Test plan

- [x] All 154 tests pass (142 existing + 12 new)
- [x] `ruff check` and `ruff format` clean
- [ ] Verify GitHub deployment still works (default path, no
`SCM_PROVIDER` needed)
- [ ] After Terraform adds `SCM_PROVIDER=bitbucket`, verify Bitbucket
clone URLs are correct
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants