diff --git a/.gemini/commands/gemini-investigate.toml b/.gemini/commands/gemini-investigate.toml
new file mode 100644
index 0000000000..2b76d53a81
--- /dev/null
+++ b/.gemini/commands/gemini-investigate.toml
@@ -0,0 +1,77 @@
+description = "Automatically investigates and diagnoses CI test failures"
+prompt = """
+You are a world-class autonomous software diagnostics agent. Your purpose is to analyze failed CI/CD runs, pinpoint the root cause in the codebase, and write a highly actionable diagnostic comment.
+
+## Context Available:
+- **Failed Log Excerpt:** Available in `.gemini/failed_logs.txt`. Use `cat` to view it.
+- **Pull Request Ref / Diff:** Use the `pull_request_read` tool or explore files using shell tools.
+
+## Systematic Diagnostics Protocol:
+
+*Optimizing Investigation Efficiency:* Perform cheap, lightweight actions first (reading local log files, searching git history, checking previous issues/diffs) before initiating deep analysis of codebase modules or downloading heavy build artifacts.
+
+1. **Read & Parse failed_logs.txt:**
+   - Locate the failing test functions, classes, or scripts.
+   - Extract the exact error messages and tracebacks.
+   - **Group and Compare Failures:** If a run has multiple failures, determine if they are all part of a single cascade (sharing the same root cause) or if different independent root causes are at play. Focus deep-dive analysis on the most recent/representative failure, but explicitly note if multiple distinct failure modes were found.
+   - Limit detailed trace extraction to up to 3 representative examples to avoid cluttering the final report.
+
+2. **Explore Related Issues / Previous Runs (Cheap Search):**
+   - Check if this is a recurring or known flake by searching recent issues, discussions, or git history for similar error messages or failing test names.
+   - If a similar error has been encountered before, reference those occurrences or prior investigation outcomes.
+
+3. **Locate the Failing Component:**
+   - Search the codebase using `search_code` or look up the files where the failing tests or code reside.
+
+4. **Analyze Changes & Identify Culprits:**
+   - **PR runs:** Compare the failure traces with the recent code additions/deletions in the PR. Identify if the failures are due to syntax, logic, sharding changes, parameter mismatches, environment configuration, or infrastructure issues.
+     - *Fallback check:* If no clear link is found between the failure and the PR changes (low confidence), check the git log/blame of the failing component on `main`. Try to identify if a recent upstream PR or commit merged to the base branch (`main`) might be the actual culprit.
+   - **Scheduled runs (on main):** If investigating a scheduled failure on the `main` branch, inspect the git log history (e.g., `git log`, `git blame`) of the failing component/test file. Identify recent merges or commits that modified relevant paths, and try to identify the specific 'culprit PR' or commit that likely introduced the failure.
+
+5. **Calibrate Tone and Confidence:**
+   - State your confidence level: **low**, **moderate**, or **high**.
+   - **Codebase vs. Infrastructure Distinction:** Explicitly distinguish whether you believe the failure is a codebase regression (e.g., logical bugs, syntax, API mismatches) or an infrastructure/environment flake (e.g., TPU provisioning failures, GCS timeout errors, CUDA out-of-memory or driver issues).
+   - Default to "possible cause" or "hypothesis" language.
+   - Upgrade to "likely cause" only when multiple independent pieces of evidence converge (e.g., a suspicious commit + matching error signature + timing correlation).
+   - Use "confirmed cause" only when evidence is unambiguous.
+   - If inconclusive, say so. Partial findings and ruling things out is still valuable. Avoid assertive phrasing like "the root cause is" unless genuinely certain.
+
+6. **Formulate the Diagnostics Report:**
+   - Write a clean, professional, and precise markdown report matching the template below. Do not be overly wordy; get straight to the facts.
+   - **Save the Report**: You MUST write and save this formulated markdown report to `.gemini/findings.md`.
+   - **Keep it Concise:** If there are many failing tests due to the same error or infra issue, mention that a cascade occurred, list up to 3 representative examples, and explain the single root cause instead of repeating sections.
+
+## Report Template:
+
+```markdown
+### 🤖 CI Failure Investigation Report
+
+I have analyzed the recent test failures in the CI pipeline and identified the following:
+
+#### 🔍 What Failed
+*(If there are many failures, group them by root cause and list only up to 3 representative example test cases)*
+* **Job/Matrix**: `Matrix-Flavor-Name`
+* **Failing Test**: `test_filename.py::test_function_name`
+* **Error**: `TypeError: ...`
+
+#### 🪵 Error Details & Stack Trace
+```python
+[Short stack trace snippet showing where the error(s) occurred]
+```
+
+#### 💡 Root Cause Analysis & Context
+**Confidence:** [low / moderate / high] *(Calibrate based on whether this is a hypothesis, a likely cause, or a confirmed cause)*
+
+[Provide a clear explanation connecting the failure(s) to recent changes made in this PR, or to infrastructure issues. If you searched for previous occurrences or similar issues, summarize those findings here.]
+
+#### 🛠️ Recommended Fix *(Only include this section if Confidence is HIGH)*
+[Provide the recommended code block diff(s) or specific file edit(s) to fix the issue(s).]
+```
+
+7. **Execute the Report:**
+   - **Determine Target Destination:**
+     - If the environment variable `PULL_REQUEST_NUMBER` is present and non-empty, post the report as a comment on that PR/issue using the `add_issue_comment` tool.
+     - If `PULL_REQUEST_NUMBER` is empty or not a valid number (such as in a scheduled CI run failure), use `gh issue list --state open` with the shell tool to locate the open failure notification issue for the "MaxText Package Tests" workflow. If found, post the report as a comment on that issue using the `gh issue comment --body-file .gemini/findings.md` command.
+     - If no target issue is found, verify that the findings are written to `.gemini/findings.md` so it is preserved in the runner's artifacts.
+
+"""
diff --git a/.github/workflows/gemini-dispatch.yml b/.github/workflows/gemini-dispatch.yml
index 091c947f28..f40df1e18d 100644
--- a/.github/workflows/gemini-dispatch.yml
+++ b/.github/workflows/gemini-dispatch.yml
@@ -9,6 +9,10 @@ on:
   pull_request_review:
     types: ['submitted']
 
+  # Trigger when a comment is added to the main conversation of a PR/Issue
+  issue_comment:
+    types: ['created']
+
   # Trigger when any label is attached to the PR
   pull_request:
     types: ['labeled']
@@ -61,6 +65,7 @@ jobs:
       command: '${{ steps.extract_command.outputs.command }}'
       request: '${{ steps.extract_command.outputs.request }}'
       additional_context: '${{ steps.extract_command.outputs.additional_context }}'
+      failed_run_id: '${{ steps.extract_command.outputs.failed_run_id }}'
       issue_number: '${{ github.event.pull_request.number || github.event.issue.number }}'
     steps:
       - name: 'Mint identity token'
@@ -92,8 +97,13 @@ jobs:
               core.setOutput('command', 'review');
             } else if (request.startsWith("@gemini-cli /review")) {
               core.setOutput('command', 'review');
-              const additionalContext = request.replace(/^@gemini-cli \/review/, '').trim();
-              core.setOutput('additional_context', additionalContext);
+              core.setOutput('additional_context', '');
+            } else if (request.startsWith("@gemini-cli /investigate")) {
+              core.setOutput('command', 'investigate');
+              const parts = request.split(/\s+/);
+              const failedRunId = parts.length > 2 ? parts[2] : '';
+              core.setOutput('failed_run_id', failedRunId);
+              core.setOutput('additional_context', '');
             } else if (request.startsWith("@gemini-cli")) {
               const additionalContext = request.replace(/^@gemini-cli/, '').trim();
               core.setOutput('command', 'invoke');
@@ -142,11 +152,28 @@ jobs:
       additional_context: '${{ needs.dispatch.outputs.additional_context }}'
     secrets: 'inherit'
 
+  investigate:
+    needs: 'dispatch'
+    if: |-
+      ${{ needs.dispatch.outputs.command == 'investigate' }}
+    uses: './.github/workflows/gemini-investigate.yml'
+    permissions:
+      contents: 'read'
+      id-token: 'write'
+      issues: 'write'
+      pull-requests: 'write'
+      actions: 'read'
+    with:
+      additional_context: '${{ needs.dispatch.outputs.additional_context }}'
+      failed_run_id: '${{ needs.dispatch.outputs.failed_run_id }}'
+    secrets: 'inherit'
+
   fallthrough:
     needs:
       - 'dispatch'
       - 'review'
       - 'invoke'
+      - 'investigate'
     if: |-
       ${{ always() && !cancelled() && (failure() || needs.dispatch.outputs.command == 'fallthrough') }}
     runs-on: 'ubuntu-latest'
diff --git a/.github/workflows/gemini-investigate.yml b/.github/workflows/gemini-investigate.yml
new file mode 100644
index 0000000000..c28e0c28b2
--- /dev/null
+++ b/.github/workflows/gemini-investigate.yml
@@ -0,0 +1,119 @@
+name: 'Gemini Failure Investigator'
+
+on:
+  workflow_call:
+    inputs:
+      additional_context:
+        type: 'string'
+        required: false
+      failed_run_id:
+        type: 'string'
+        required: false
+
+permissions:
+  contents: 'read'
+  id-token: 'write'
+  issues: 'write'
+  pull-requests: 'write'
+  actions: 'read' # Required to fetch workflow logs
+
+jobs:
+  investigate:
+    runs-on: 'ubuntu-latest'
+    steps:
+      - name: 'Checkout repository'
+        uses: 'actions/checkout@v4'
+        with:
+          persist-credentials: 'false'
+
+      - name: 'Gather failed logs'
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          RUN_ID: ${{ github.event.workflow_run.id || inputs.failed_run_id }}
+          REPO: ${{ github.repository }}
+          BRANCH: ${{ github.event.pull_request.head.ref }}
+          SHA: ${{ github.event.pull_request.head.sha }}
+        run: |
+          mkdir -p .gemini
+
+          # Determine target run ID
+          if [ -z "$RUN_ID" ]; then
+            # Fallback to finding the latest failed run for this PR's specific commit
+            if [ -n "$SHA" ]; then
+              echo "Searching for failed runs for commit: $SHA"
+              RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --commit "$SHA" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
+            fi
+
+            # Fallback to branch if commit-specific run wasn't found
+            if [ -z "$RUN_ID" ] && [ -n "$BRANCH" ]; then
+              echo "Searching for failed runs on branch: $BRANCH"
+              RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --branch "$BRANCH" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
+            fi
+
+            # Global fallback
+            if [ -z "$RUN_ID" ]; then
+              echo "Searching for latest failed run across the repository"
+              RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
+            fi
+          fi
+
+          echo "Gathering logs for failed run: $RUN_ID"
+
+          if [ -n "$RUN_ID" ]; then
+            # Retrieve only the failing lines/jobs to avoid token limit overhead
+            gh run view "$RUN_ID" --log-failed --repo "$REPO" > .gemini/failed_logs.txt || true
+          else
+            echo "No failed runs found." > .gemini/failed_logs.txt
+          fi
+
+      - name: 'Run Gemini Failure Investigator'
+        uses: 'google-github-actions/run-gemini-cli@v0'
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          REPOSITORY: ${{ github.repository }}
+          PULL_REQUEST_NUMBER: ${{ github.event.workflow_run.pull_requests[0].number || github.event.pull_request.number || github.event.issue.number }}
+        with:
+          gcp_location: '${{ vars.GOOGLE_CLOUD_LOCATION }}'
+          gcp_project_id: '${{ vars.GOOGLE_CLOUD_PROJECT }}'
+          gcp_service_account: '${{ vars.SERVICE_ACCOUNT_EMAIL }}'
+          gcp_workload_identity_provider: '${{ vars.GCP_WIF_PROVIDER }}'
+          gemini_api_key: '${{ secrets.GEMINI_API_KEY }}'
+          gemini_cli_version: '${{ vars.GEMINI_CLI_VERSION }}'
+          gemini_model: '${{ vars.GEMINI_MODEL }}'
+          workflow_name: 'gemini-investigate'
+          settings: |-
+            {
+              "model": {
+                "maxSessionTurns": 15
+              },
+              "mcpServers": {
+                "github": {
+                  "command": "docker",
+                  "args": [
+                    "run",
+                    "-i",
+                    "--rm",
+                    "-e",
+                    "GITHUB_PERSONAL_ACCESS_TOKEN",
+                    "ghcr.io/github/github-mcp-server:v0.27.0"
+                  ],
+                  "includeTools": [
+                    "add_issue_comment",
+                    "pull_request_read",
+                    "search_code",
+                    "get_file_contents",
+                    "list_commits",
+                    "get_commit"
+                  ],
+                  "env": {
+                    "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
+                  }
+                }
+              },
+              "tools": {
+                "shell": {
+                  "allowCommands": ["cat", "grep", "head", "tail", "gh", "git", "find"]
+                }
+              }
+            }
+          prompt: '/gemini-investigate'
diff --git a/.gitignore b/.gitignore
index 3b3d6d1b67..b67cba160d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -150,6 +150,7 @@ dmypy.json
 
 # Gemini CLI
 .gemini/
+!.gemini/commands/
 gha-creds-*.json
 
 # vscode workspace
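Review note: the comment-routing branches touched in `gemini-dispatch.yml` can be exercised outside of Actions with a small standalone sketch like the one below. `parseGeminiCommand` is a hypothetical helper, not something this PR adds. It mirrors the three `@gemini-cli` branches visible in the hunk and returns plain values instead of calling `core.setOutput`. Two parts are assumptions rather than behavior shown in the diff: the final `fallthrough` result for comments that do not start with `@gemini-cli`, and the digits-only check on the run ID, which is a possible hardening since the ID comes from free-form comment text and is later passed to `gh run view`.

```javascript
// Hypothetical helper (not part of the PR): mirrors the routing branches
// shown in the gemini-dispatch.yml hunk as a pure function.
function parseGeminiCommand(request) {
  if (request.startsWith('@gemini-cli /review')) {
    // The PR now drops any text after "/review".
    return { command: 'review', additionalContext: '', failedRunId: '' };
  }
  if (request.startsWith('@gemini-cli /investigate')) {
    // Third whitespace-separated token is the optional run ID, as in the PR.
    const parts = request.split(/\s+/);
    const candidate = parts.length > 2 ? parts[2] : '';
    // Assumption (not in the PR): accept digits only, otherwise fall back
    // to an empty ID so the workflow's own run lookup takes over.
    const failedRunId = /^\d+$/.test(candidate) ? candidate : '';
    return { command: 'investigate', additionalContext: '', failedRunId };
  }
  if (request.startsWith('@gemini-cli')) {
    const additionalContext = request.replace(/^@gemini-cli/, '').trim();
    return { command: 'invoke', additionalContext, failedRunId: '' };
  }
  // Assumption: anything else is treated as 'fallthrough'; the diff does not
  // show this branch.
  return { command: 'fallthrough', additionalContext: '', failedRunId: '' };
}

console.log(parseGeminiCommand('@gemini-cli /investigate 1234567890'));
console.log(parseGeminiCommand('@gemini-cli /investigate --web'));
```

With the digits-only check, a comment such as `@gemini-cli /investigate --web` yields an empty `failedRunId`, so the "Gather failed logs" step would fall back to its commit, branch, and repository-wide search instead of forwarding the token to `gh`.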