Copy checkpoints from forks on PR merge via actions#259
Conversation
Entire-Checkpoint: 61a440d894cc
There was a problem hiding this comment.
Pull request overview
This PR adds a GitHub Actions workflow to automatically sync checkpoint data from forked repositories when pull requests are merged. This enables preserving full Entire session history even when contributors work from forks without push access to the upstream repository.
Changes:
- Adds workflow triggered on fork PR merges via
pull_request_target - Implements composite action to extract checkpoint IDs from commit trailers, fetch fork's checkpoints branch, and cherry-pick relevant commits
- Uses retry logic with rebase/merge fallback to handle concurrent operations
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
.github/workflows/sync-fork-checkpoints.yml |
Workflow definition that triggers on merged fork PRs, guards execution, and calls the sync action |
.github/actions/sync-fork-checkpoints/action.yml |
Composite action containing bash script to extract checkpoint trailers, fetch fork data, validate IDs, cherry-pick commits, and push to upstream |
| github.event.pull_request.head.repo.fork == true | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: entireio/cli/.github/actions/sync-fork-checkpoints@main |
There was a problem hiding this comment.
The action reference should use a relative path for local actions in the same repository. Change this to use a relative path starting with ./ instead of the full repository path. Using the full repository path entireio/cli/.github/actions/...@main will cause GitHub Actions to fetch the action from the main branch instead of using the version in the current commit, which could lead to unexpected behavior during development and testing.
| - uses: entireio/cli/.github/actions/sync-fork-checkpoints@main | |
| - uses: ./.github/actions/sync-fork-checkpoints |
| git fetch origin "$MERGE_SHA" --depth=100 2>/dev/null || true | ||
| git fetch origin "$BASE_SHA" --depth=100 2>/dev/null || true |
There was a problem hiding this comment.
This shallow fetch depth of 100 commits may not be sufficient for long-lived PRs or repositories with many commits. If the BASE_SHA is not within the last 100 commits, git log will fail silently (due to || true), and no checkpoint trailers will be found. Consider using a deeper fetch depth or fetching without depth limitation for the merge range. Alternatively, document this limitation in the action description.
| env: | ||
| GH_TOKEN: ${{ inputs.token }} | ||
| REPO: ${{ github.repository }} | ||
| FORK_URL: ${{ github.event.pull_request.head.repo.clone_url }} |
There was a problem hiding this comment.
The script accepts FORK_URL from untrusted input (github.event.pull_request.head.repo.clone_url). While there's validation that IS_FORK is true, malicious fork URLs could potentially be crafted to exploit git operations. Consider adding validation to ensure the FORK_URL has an expected format (e.g., matches https://github.com/* pattern) before using it in git fetch commands to prevent potential security issues.
| git config user.name "github-actions[bot]" | ||
| git config user.email "41898282+github-actions[bot]@users.noreply.github.com" |
There was a problem hiding this comment.
The email address uses the generic github-actions bot email. While this is functional, consider using a more specific identity that indicates this is an automated checkpoint sync operation. This would make it clearer in the git history who created these commits. For example: "github-actions[bot]" with a comment in the commit message indicating it's from the sync-fork-checkpoints action.
| git config user.name "github-actions[bot]" | |
| git config user.email "41898282+github-actions[bot]@users.noreply.github.com" | |
| git config user.name "sync-fork-checkpoints[bot]" | |
| git config user.email "sync-fork-checkpoints[bot]@users.noreply.github.com" |
| git rebase "origin/$BRANCH" || { | ||
| echo "Rebase failed, trying merge..." | ||
| git rebase --abort 2>/dev/null || true | ||
| git merge "origin/$BRANCH" --no-edit |
There was a problem hiding this comment.
The retry mechanism uses merge as a fallback after rebase fails, but it doesn't check if the merge itself was successful. If the merge fails, the script will continue to the next retry attempt without cleaning up the failed merge state. Add error handling after the merge command to ensure it succeeded: git merge "origin/$BRANCH" --no-edit || { echo "Merge failed"; git merge --abort 2>/dev/null || true; }
| git merge "origin/$BRANCH" --no-edit | |
| git merge "origin/$BRANCH" --no-edit || { echo "Merge failed"; git merge --abort 2>/dev/null || true; } |
| echo "::warning::Failed to push checkpoints after $MAX_RETRIES attempts" | ||
| exit 1 |
There was a problem hiding this comment.
The script initializes outputs at the start (lines 40-41) and updates them on successful sync (lines 174-175), but there's a logic issue: when the script exits early due to errors or no checkpoints found, the outputs remain at their default values (synced=false, imported_count=0). However, line 190 uses exit 1 for failure but doesn't update the outputs. This means consumers of this action can't distinguish between "no checkpoints to sync" (successful exit 0) and "push failed after retries" (exit 1) by checking the outputs alone. Consider setting an error output or ensuring the exit status is the primary indicator of failure.
| # Sync Entire session checkpoints from fork PRs. | ||
| # | ||
| # When a PR from a fork is merged, this workflow imports the checkpoint data | ||
| # (session transcripts, prompts, context) from the fork's entire/checkpoints/v1 |
There was a problem hiding this comment.
The comment mentions "entire/sessions/v1" but the actual branch name used throughout the code is "entire/checkpoints/v1". This is inconsistent with the PR description and could cause confusion. The correct branch name should be "entire/checkpoints/v1" as verified in the codebase.
| # Fallback: check original PR commits (handles squash merges that drop trailers) | ||
| if [ -z "$CHECKPOINT_IDS" ]; then | ||
| echo "No trailers in merge range. Checking original PR commits..." | ||
| git fetch "$FORK_URL" "$HEAD_SHA" --depth=50 2>/dev/null || true |
There was a problem hiding this comment.
The fallback logic for squash merges fetches from the fork using HEAD_SHA, but the depth of 50 commits may not be sufficient for long-lived PRs. If the BASE_SHA is not within the last 50 commits from HEAD_SHA, the git log range will be incorrect. Consider either using a deeper fetch or removing the depth limitation for this fallback case to ensure all commits in the PR are examined.
| git fetch "$FORK_URL" "$HEAD_SHA" --depth=50 2>/dev/null || true | |
| git fetch "$FORK_URL" "$HEAD_SHA" 2>/dev/null || true |
| pull_request_target: | ||
| types: [closed] |
There was a problem hiding this comment.
Using pull_request_target is necessary for write access but is a security consideration. The workflow correctly guards against running on non-fork PRs and only runs after merge, which limits the attack surface. However, be aware that this gives the workflow access to repository secrets and write permissions. The current implementation is safe because it only operates on git commit metadata (trailers) and the checkpoints branch, but any future modifications should be carefully reviewed for security implications.
| steps: | ||
| - uses: entireio/cli/.github/actions/sync-fork-checkpoints@main |
There was a problem hiding this comment.
The workflow is missing a required checkout step. Since the action being called is a local composite action (stored in .github/actions/sync-fork-checkpoints/), GitHub Actions needs to checkout the repository first to access the action definition. Add - uses: actions/checkout@v6 as the first step before calling the action. Without this, the workflow will fail because the action code won't be available.
This allows running an action on merge of a PR from a fork to then pull in the matching checkpoint commits from
entire/sessions/v1.Need to figure out permissions for it to work.