Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 80 additions & 0 deletions .github/prompts/adversarial-review.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
# Bash AST adversarial review

You are an adversarial reviewer for `bash-ast`, a Rust CLI/library that uses GNU Bash's real parser through FFI to parse shell scripts into JSON AST and convert JSON AST back to bash.

Your job is to add value beyond ordinary CI. Do not simply rerun the full test suite as your main contribution; the workflow has already captured baseline build/test logs for you. Instead, inspect the repository and the supplied context, identify parser behaviors worth challenging, and run a small number of targeted probes.

## What to inspect first

- `README.md`, `Cargo.toml`, `src/`, and relevant tests under `tests/`.
- `review-artifacts/pr-context.json` if present.
- `review-artifacts/base-diff.stat` and `review-artifacts/base-diff.patch` if present.
- `review-artifacts/build.log`, `review-artifacts/baseline-tests.log`, and status files if present.
- `review-artifacts/baseline-test-inventory.md` and `review-artifacts/baseline-test-list.txt` for the automated tests that were already enumerated after the baseline test run.

Before planning probes, inspect the automated test inventory so you do not duplicate existing coverage or claim a gap that is already covered by a listed test. If this run is associated with a PR, extract 2-4 concrete, testable claims from the PR title/body/diff before running probes. If there is no PR context, pick high-risk parser/round-trip behaviors from the current checkout.

## Probe guidance

Prefer edge cases involving one or more of:

- nested quotes and escaped newlines;
- command substitution and arithmetic expansion;
- heredocs and here-strings;
- process substitution;
- pipelines, negated pipelines, and lists;
- arrays and parameter expansion;
- case/select/for/while/function syntax;
- malformed syntax and graceful error handling;
- parse-to-JSON then `--to-bash` round trips.

For each probe:

1. Create temporary scripts/data only under `/tmp` or `review-artifacts/agent-probes/`.
2. Use the repository's actual binary/library/test harness whenever practical. The built CLI is usually `target/debug/bash-ast` after `cargo build`.
3. Capture concise evidence. If output is long, write full logs to `review-artifacts/agent-probes/` and summarize the relevant lines.
4. Decide whether the observed behavior supports or refutes the hypothesis.

## Constraints

- Do not modify repository source, tests, manifests, lockfiles, generated snapshots, or submodules.
- Do not install arbitrary dependencies.
- Do not run broad/unbounded commands that dump huge files or recursive listings.
- Do not use network access except GitHub context already provided by the workflow.
- Keep shell commands and outputs in the final response compact.
- In each `unitTestRecommendation`, distinguish between existing automated coverage you saw in the test inventory and any new coverage you believe should be added.
- If setup/build failures prevent runtime probes, perform source-level inspection and report `INVESTIGATE` with the best concrete blocker evidence.

## Required final response format

Return a concise human-readable review followed by a machine-readable JSON block between exact markers:

`JSON_RESULT_START`

```json
{
"recommendation": "PASS|FAIL|INVESTIGATE",
"why": "One or two sentences explaining the recommendation and highest risk.",
"tests": [
{
"title": "Short name",
"hypothesis": "What behavior was being tested",
"impact": "Why this matters if wrong",
"command": "Short command summary, not a giant script",
"output": "Concise observed output or pointer to artifact path",
"result": "PASS|FAIL",
"unitTestRecommendation": "What automated coverage should be added or why existing coverage is enough"
}
],
"finalMessage": "Brief operator-facing summary"
}
```

`JSON_RESULT_END`

Rules for the JSON block:

- `recommendation` must be exactly `PASS`, `FAIL`, or `INVESTIGATE`.
- `tests` must contain at least one substantive probe or one clearly labeled blocker probe.
- Every test object must have non-empty string fields: `title`, `hypothesis`, `impact`, `command`, `output`, `result`, and `unitTestRecommendation`.
- Per-test `result` must be exactly `PASS` or `FAIL`.
8 changes: 8 additions & 0 deletions .github/scripts/adversarial-review/build-baseline-binary.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
set +e

mkdir -p review-artifacts
cargo build --verbose 2>&1 | tee review-artifacts/build.log
status=${PIPESTATUS[0]}
echo "$status" > review-artifacts/build-status.txt
exit "$status"
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
#!/usr/bin/env bash
set -euxo pipefail

gh pr checkout "$REQUESTED_PR"
git submodule update --init --recursive
echo "HEAD_SHA=$(git rev-parse HEAD)" >> "$GITHUB_ENV"
echo "BASE_REF=$(gh pr view "$REQUESTED_PR" --json baseRefName --jq .baseRefName)" >> "$GITHUB_ENV"
26 changes: 26 additions & 0 deletions .github/scripts/adversarial-review/collect-review-context.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#!/usr/bin/env bash
set -euxo pipefail

mkdir -p review-artifacts/agent-probes
git status --short > review-artifacts/git-status.txt
git log --oneline -n 20 > review-artifacts/recent-commits.txt
cargo metadata --no-deps --format-version 1 > review-artifacts/cargo-metadata.json || true

git fetch origin "$BASE_REF" --depth=1 || true
if git rev-parse --verify "origin/$BASE_REF" >/dev/null 2>&1; then
git diff --stat "origin/$BASE_REF...HEAD" > review-artifacts/base-diff.stat || true
git diff --find-renames "origin/$BASE_REF...HEAD" > review-artifacts/base-diff.patch || true
else
: > review-artifacts/base-diff.stat
: > review-artifacts/base-diff.patch
fi

if [ -n "$PR_NUMBER" ]; then
gh pr view "$PR_NUMBER" \
--json number,title,author,body,baseRefName,headRefName,headRefOid,url,files,comments \
> review-artifacts/pr-context.json || echo '{}' > review-artifacts/pr-context.json
gh pr diff "$PR_NUMBER" > review-artifacts/pr.diff || true
else
echo '{}' > review-artifacts/pr-context.json
: > review-artifacts/pr.diff
fi
36 changes: 36 additions & 0 deletions .github/scripts/adversarial-review/collect-test-inventory.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
#!/usr/bin/env bash
set +e

mkdir -p review-artifacts
cargo test -- --list 2>&1 | tee review-artifacts/baseline-test-list.txt
status=${PIPESTATUS[0]}
echo "$status" > review-artifacts/baseline-test-list-status.txt
python3 - <<'PY'
from pathlib import Path

text = Path('review-artifacts/baseline-test-list.txt').read_text(encoding='utf-8', errors='replace')
tests = sorted({line.strip()[:-len(': test')] for line in text.splitlines() if line.strip().endswith(': test')})
benches = sorted({line.strip()[:-len(': benchmark')] for line in text.splitlines() if line.strip().endswith(': benchmark')})
status = Path('review-artifacts/baseline-test-list-status.txt').read_text(encoding='utf-8').strip()
lines = [
'# Automated tests already enumerated',
'',
f'- Test inventory exit code: `{status}`',
f'- Enumerated test count: `{len(tests)}`',
f'- Enumerated benchmark count: `{len(benches)}`',
'',
'The baseline workflow already ran `cargo test --verbose -- --test-threads=1` before this inventory was collected.',
'Use this inventory to avoid duplicating existing automated coverage in adversarial probes and recommendations.',
'',
'## Test names',
]
max_names = 250
lines.extend(f'- `{name}`' for name in tests[:max_names])
if len(tests) > max_names:
lines.append(f'- ... truncated {len(tests) - max_names} additional tests; see `baseline-test-list.txt` for the full list.')
if benches:
lines.extend(['', '## Benchmark names'])
lines.extend(f'- `{name}`' for name in benches[:50])
Path('review-artifacts/baseline-test-inventory.md').write_text('\n'.join(lines) + '\n', encoding='utf-8')
PY
exit 0
17 changes: 17 additions & 0 deletions .github/scripts/adversarial-review/compose-retry-review-prompt.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
#!/usr/bin/env bash
set -euo pipefail

delimiter="RETRY_REVIEW_PROMPT_$(date +%s)_$$"
{
cat review-artifacts/review-prompt.md
echo
echo "## Retry instruction"
echo
echo "The previous adversarial-review attempt did not produce a valid structured result. Reason: ${RETRY_REASON:-unknown}."
echo "Retry the review now. You must end with exact line markers JSON_RESULT_START and JSON_RESULT_END, with one valid JSON object between them and no nested marker text inside JSON strings. The JSON must satisfy the required schema, including a non-empty tests array."
} > review-artifacts/review-prompt-retry.md
{
echo "prompt<<$delimiter"
cat review-artifacts/review-prompt-retry.md
echo "$delimiter"
} >> "$GITHUB_OUTPUT"
30 changes: 30 additions & 0 deletions .github/scripts/adversarial-review/compose-review-prompt.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#!/usr/bin/env bash
set -euo pipefail

delimiter="REVIEW_PROMPT_$(date +%s)_$$"
{
cat .github/prompts/adversarial-review.md
echo
echo "## Workflow-provided context"
echo
echo "- Repository: $GITHUB_REPOSITORY"
echo "- Event: $GITHUB_EVENT_NAME"
echo "- PR number: ${PR_NUMBER:-none}"
echo "- Base ref: ${BASE_REF:-unknown}"
echo "- Head SHA: ${HEAD_SHA:-unknown}"
echo "- Build exit code: $(cat review-artifacts/build-status.txt 2>/dev/null || echo unknown)"
echo "- Baseline test exit code: $(cat review-artifacts/baseline-test-status.txt 2>/dev/null || echo unknown)"
echo "- Baseline test inventory exit code: $(cat review-artifacts/baseline-test-list-status.txt 2>/dev/null || echo unknown)"
echo
echo "Artifacts are available under ./review-artifacts/. Keep any additional probe artifacts under ./review-artifacts/agent-probes/."
if [ -f review-artifacts/baseline-test-inventory.md ]; then
echo
echo "## Automated tests already run/enumerated"
cat review-artifacts/baseline-test-inventory.md
fi
} > review-artifacts/review-prompt.md
{
echo "prompt<<$delimiter"
cat review-artifacts/review-prompt.md
echo "$delimiter"
} >> "$GITHUB_OUTPUT"
36 changes: 36 additions & 0 deletions .github/scripts/adversarial-review/enforce-recommendation.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
#!/usr/bin/env bash
set -euo pipefail

python3 - <<'PY'
import importlib.util
import sys
from pathlib import Path

module_path = Path('.github/scripts/render-adversarial-review-summary.py')
spec = importlib.util.spec_from_file_location('review_summary', module_path)
if spec is None or spec.loader is None:
print(f'::error::Could not load review summary parser from {module_path}')
sys.exit(1)

review_summary = importlib.util.module_from_spec(spec)
spec.loader.exec_module(review_summary)

response_path = Path('review-artifacts/agent-response.md')
response = response_path.read_text(encoding='utf-8', errors='replace') if response_path.exists() else ''
review, warning = review_summary.extract_json_blob(response)
if review is None:
print(f'::error::Adversarial review did not produce a valid structured recommendation: {warning}')
sys.exit(1)

validation_errors = review_summary.validate_review(review)
if validation_errors:
for error in validation_errors:
print(f'::error::Invalid adversarial review result: {error}')
sys.exit(1)

recommendation = str(review.get('recommendation', '')).strip()
print(f'Adversarial review recommendation: {recommendation}')
if recommendation != 'PASS':
print(f'::error::Adversarial review recommendation is {recommendation}; failing the check so the PR is not mergeable as-is.')
sys.exit(1)
PY
5 changes: 5 additions & 0 deletions .github/scripts/adversarial-review/install-os-dependencies.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
#!/usr/bin/env bash
set -euxo pipefail

sudo apt-get update
sudo apt-get install -y libncurses-dev
26 changes: 26 additions & 0 deletions .github/scripts/adversarial-review/persist-agent-response.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#!/usr/bin/env bash
set -euo pipefail

mkdir -p review-artifacts
python3 - <<'PY'
import json
import os
from pathlib import Path

use_retry = os.environ.get('INITIAL_VALID') != 'true'
response = os.environ.get('RETRY_RESPONSE' if use_retry else 'INITIAL_RESPONSE', '')
success = os.environ.get('RETRY_SUCCESS' if use_retry else 'INITIAL_SUCCESS', '')
share_url = os.environ.get('RETRY_SHARE_URL' if use_retry else 'INITIAL_SHARE_URL', '')
Path('review-artifacts/agent-response.md').write_text(response, encoding='utf-8')
if use_retry:
Path('review-artifacts/agent-response-retry.md').write_text(os.environ.get('RETRY_RESPONSE', ''), encoding='utf-8')
Path('review-artifacts/agent-action-metadata.json').write_text(
json.dumps({
'selected_attempt': 'retry' if use_retry else 'initial',
'initial_valid': os.environ.get('INITIAL_VALID', ''),
'success': success,
'share_url': share_url,
}, indent=2) + '\n',
encoding='utf-8',
)
PY
26 changes: 26 additions & 0 deletions .github/scripts/adversarial-review/post-sticky-pr-comment.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#!/usr/bin/env bash
set -euo pipefail

post_comment_input=$(jq -r '.inputs.post_comment // "false"' "$GITHUB_EVENT_PATH")
if [ "${ADVERSARIAL_REVIEW_POST_COMMENTS:-}" != "true" ] && [ "$post_comment_input" != "true" ]; then
echo "Sticky PR comment disabled; set ADVERSARIAL_REVIEW_POST_COMMENTS=true or workflow_dispatch post_comment=true to enable."
exit 0
fi

marker='<!-- adversarial-review:bash-ast -->'
body_file=$(mktemp)
{
echo "$marker"
echo "<!-- head_sha: ${HEAD_SHA:-unknown}; run_id: $GITHUB_RUN_ID; run_attempt: $GITHUB_RUN_ATTEMPT -->"
echo
cat review-artifacts/adversarial-review-summary.md
} > "$body_file"

comment_id=$(gh api "repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments" --paginate \
--jq ".[] | select(.body | contains(\"$marker\")) | .id" | tail -n 1)

if [ -n "$comment_id" ]; then
gh api -X PATCH "repos/$GITHUB_REPOSITORY/issues/comments/$comment_id" -F "body=@$body_file" >/dev/null
else
gh pr comment "$PR_NUMBER" --body-file "$body_file"
fi
11 changes: 11 additions & 0 deletions .github/scripts/adversarial-review/render-review-summary.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
#!/usr/bin/env bash
set -euxo pipefail

python3 .github/scripts/render-adversarial-review-summary.py \
--response review-artifacts/agent-response.md \
--build-log review-artifacts/build.log \
--baseline-log review-artifacts/baseline-tests.log \
--build-status review-artifacts/build-status.txt \
--baseline-status review-artifacts/baseline-test-status.txt \
--output review-artifacts/adversarial-review-summary.md
cat review-artifacts/adversarial-review-summary.md >> "$GITHUB_STEP_SUMMARY"
8 changes: 8 additions & 0 deletions .github/scripts/adversarial-review/run-baseline-tests.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
set +e

mkdir -p review-artifacts
cargo test --verbose -- --test-threads=1 2>&1 | tee review-artifacts/baseline-tests.log
status=${PIPESTATUS[0]}
echo "$status" > review-artifacts/baseline-test-status.txt
exit "$status"
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#!/usr/bin/env bash
set -euo pipefail

mkdir -p review-artifacts
python3 - <<'PY' >> "$GITHUB_OUTPUT"
import importlib.util
import os
from pathlib import Path

Path('review-artifacts/agent-response-initial.md').write_text(os.environ.get('AGENT_RESPONSE', ''), encoding='utf-8')
spec = importlib.util.spec_from_file_location('review_summary', '.github/scripts/render-adversarial-review-summary.py')
review_summary = importlib.util.module_from_spec(spec)
spec.loader.exec_module(review_summary)
review, warning = review_summary.extract_json_blob(os.environ.get('AGENT_RESPONSE', ''))
errors = []
if review is None:
errors.append(warning or 'missing structured review result')
else:
errors.extend(review_summary.validate_review(review))
if errors:
print('valid=false')
print(f"reason={' ; '.join(errors)}")
else:
print('valid=true')
print('reason=valid structured review result')
PY
Loading
Loading