Add deterministic changelog-draft scripts (QUALITY-944) by warp-dev-github-integration[bot] · Pull Request #13282 · warpdotdev/warp

warp-dev-github-integration · 2026-07-01T18:06:50Z

Summary

Add three deterministic Python scripts (stdlib only) to the changelog-draft skill to replace free-form agent steps, making changelog generation faster and more reliable.

Changes

New scripts:

resolve_release_range.py (Step 1): determines the previous release cut given a release tag + channel
classify_pr.py (Step 6): two-pass deterministic classifier with mechanical exclusion rules (bots, CI/test/docs/tooling files, channel flag gates); enforces that mechanical excludes win over agent answers
assemble_changelog.py (Steps 7+8): assembles changelog-draft.md + changelog-draft.json; enforces accounting invariant

Updated SKILL.md files:

changelog-draft/SKILL.md: Steps 1/6/7+8 call the new scripts; Step 9 unchanged
classify-changelog-pr/SKILL.md: reframed as subjective-candidate guidance only

New tests (72 stdlib unittest tests):

test_resolve_release_range.py: temp-git fixtures + real-tag smoke test
test_classify_pr.py: all rule scenarios + enforcement test + subjective merge test
test_assemble_changelog.py: golden-file tests + accounting invariant + converter compatibility + py_compile

Review rework (cycle 1)

Addressed two issues raised in review:

1. Dead candidate dict in classify_one_pr (classify_pr.py):
classify_one_pr built a candidate dict then returned None, discarding it — while both callers (run_pass1 and run_pass2) duplicated the same dict-building logic. Fixed by returning candidate (tagged source='candidate') instead of None; callers now distinguish deterministic excludes vs. candidates via det.get('source') == 'deterministic', removing all duplicated code.

2. Community section synthesized PR URLs (assemble_changelog.py):
The Contributors sub-section synthesized URLs as https://github.com/warpdotdev/warp/pull/{pn} from the PR number, violating the spec's "no synthesized PR URLs" constraint. Fixed by storing {pr_number, url} dicts in ext_contributors and using the stored url field; when url is empty the PR reference appears as plain #N with no hyperlink.

Regression tests added for both fixes.

Verification

Unit tests: 72/72 pass — python3 -m unittest discover -s .agents/skills/changelog-draft/tests
py_compile: all scripts compile without errors ✓
./script/presubmit skipped: changes are purely Python scripts and SKILL.md files (no Rust source modified); narrower checks above cover the full scope.

Conversation: https://staging.warp.dev/conversation/ba0d7066-ac2f-44ce-a015-c4200ce07323

Add three new Python scripts (stdlib only) and tests to replace free-form agent steps in the changelog-draft skill: - resolve_release_range.py (Step 1): deterministically resolves the previous release cut given a release tag and channel. Validates tag format, channel match, and head tag existence; picks the greatest prior _00 cut with a different date-time prefix. - classify_pr.py (Step 6): two-pass deterministic classifier. Pass 1 applies mechanical exclusion rules (bot authors, CI/test/docs/tooling- only files, channel feature-flag gates) and emits candidates for agent subjective classification. Pass 2 validates and merges agent answers, enforcing that mechanical excludes always win over agent choices. - assemble_changelog.py (Steps 7+8 merged): assembles changelog-draft.md and changelog-draft.json from all intermediate artifacts. Enforces the accounting invariant (entries + skipped + needs_review == total PRs) and exits non-zero on duplicate PR numbers or missing buckets. Also: - Update changelog-draft/SKILL.md to call the three new scripts at Steps 1, 6, and 7+8; keep Step 9 convert_to_release_json.py unchanged. - Reframe classify-changelog-pr/SKILL.md as subjective-candidate guidance only; remove bot/CI/flag rules that are now script-enforced. - Add tests/: 68 stdlib unittest tests covering all spec validation criteria (temp-git fixtures, real-tag smoke test, rule coverage, enforcement test, subjective merge test, golden-file assembly tests, converter compatibility, py_compile). Co-Authored-By: Oz <oz-agent@warp.dev>

warp-dev-github-integration

Overview

This PR adds three deterministic Python scripts to replace free-form agent steps in the changelog-draft skill, along with 68 stdlib unittest tests and updated SKILL.md files. The overall design is solid, all tests pass, and CI is green.

All acceptance criteria from the spec are met:

resolve_release_range.py correctly encapsulates the Step 1 git-tag logic
classify_pr.py two-pass design enforces mechanical excludes over agent answers
assemble_changelog.py correctly merges Steps 7+8 with accounting invariant
classify-changelog-pr/SKILL.md properly reframed as subjective-candidate guidance
convert_to_release_json.py compatibility verified by tests

Two important code quality issues to fix before merge, plus a few nits.

Concerns

IMPORTANT — Dead candidate dict in classify_one_pr: The function builds a candidate dict (lines 324–338 of classify_pr.py) but always returns None. The variable is computed and immediately discarded. Both run_pass1 and run_pass2 then re-derive the same candidate data from scratch, duplicating the flag-detection and annotation logic. This is confusing because the function signature (dict | None) and docstring suggest it could return the candidate, but it never does. If a new candidate field is added to the classify_one_pr block, callers won't pick it up automatically. The fix is to either: (a) return the candidate dict when it needs agent judgment (changing callers to use the return value), or (b) drop the dead assignment and keep all candidate-building in run_pass1/run_pass2.

IMPORTANT — Community section synthesizes PR URLs from pr_number: The SKILL.md constraint says "If url is empty, omit the PR link from user-facing markdown rather than synthesizing one." The community contributor section in _build_markdown (around line 406 of assemble_changelog.py) hardcodes https://github.com/warpdotdev/warp/pull/{pn} from the PR number instead of using the url already stored on each entry. Since ext_contributors only stores PR numbers (not URLs), the URL must be reconstructed. In practice this always produces the correct URL for public warp PRs, but it violates the spec contract and would break if a contributor's PR URL were empty or non-standard. Fix: store (pr_number, url) pairs in ext_contributors so the community section can use the stored URL (and omit the link when empty).

Found: 0 critical, 2 important, 1 suggestion, 2 nits

Approve with nits — The implementation is functionally correct and the spec criteria are met. The IMPORTANT issues are code quality concerns that should be addressed before merge but don't affect the current correctness of the skill workflow.

warp-dev-github-integration · 2026-07-01T18:19:30Z

+    candidate: dict = {
+        "pr_number": pr_number,
+        "title": pr.get("title", ""),
+        "url": pr.get("url", ""),
+        "author": author,
+        "changed_files": changed_files,
+        "source_repo": pr.get("source_repo", ""),
+    }
+    if detected_flags:
+        candidate["detected_feature_flags"] = detected_flags
+    if uncertain_flag_touch:
+        candidate["uncertain_flag_touch"] = True
+    if author in unknown_bucket:
+        candidate["unknown_contributor"] = True
+
+    return None  # signal: this PR needs agent judgment


⚠️ [IMPORTANT] Dead candidate dict — computed but never returned or used.

The function builds candidate (and annotates it with flags/unknown-contributor info), then returns None. Both run_pass1 and run_pass2 re-derive the same data from scratch by re-running detect_feature_flags and re-checking the same conditions. This creates a maintenance hazard: if a new annotation field is added here, callers won't pick it up.

Two clean options:

Return the candidate: change the return type to dict | dict[str, object] and return the candidate; callers use it directly instead of rebuilding.

Drop the dead block: remove lines 324–343 from this function and keep all candidate-building solely in the callers.

Option 2 is simpler — the function's single responsibility is detecting mechanical excludes.

warp-dev-github-integration · 2026-07-01T18:19:30Z

+            for author, pr_numbers in sorted(ext_contributors.items()):
+                pr_links = ", ".join(
+                    f"[#{pn}](https://github.com/warpdotdev/warp/pull/{pn})"
+                    if pr_numbers
+                    else ""
+                    for pn in sorted(pr_numbers)
+                )
+                lines.append(f"- {github_profile_link(author)} — {pr_links}  ✨")


⚠️ [IMPORTANT] Community section synthesizes PR URLs from pr_number instead of using the stored url.

The SKILL.md contract ("If url is empty, omit the PR link … rather than synthesizing one") applies here too. The entries list already carries a url field; but ext_contributors only stores pr_number, so the URL must be reconstructed.

Suggested fix: store (pr_number, url) pairs in ext_contributors and use the stored URL when present (omitting the link if it's empty):

Suggested change

for author, pr_numbers in sorted(ext_contributors.items()):

pr_links = ", ".join(

f"[#{pn}](https://github.com/warpdotdev/warp/pull/{pn})"

if pr_numbers

else ""

for pn in sorted(pr_numbers)

)

lines.append(f"- {github_profile_link(author)} — {pr_links} ✨")

for author, pr_entries in sorted(ext_contributors.items()):

pr_links = ", ".join(

f"[#{pn}]({url})" if url else f"#{pn}"

for pn, url in sorted(pr_entries)

)

This also requires changing how ext_contributors is populated:

ext_contributors.setdefault(a, []) if pn not in (e[0] for e in ext_contributors[a]): ext_contributors[a].append((pn, entry.get("url") or ""))

warp-dev-github-integration · 2026-07-01T18:19:30Z

+            lines.append("")
+
+        if has_reporters:


🧹 [NIT] if pr_numbers is always True inside this generator — it's the outer list variable, which is guaranteed non-empty since we're iterating over sorted(pr_numbers). The else "" branch is dead code.

Suggested change

lines.append("")

if has_reporters:

f"[#{pn}](https://github.com/warpdotdev/warp/pull/{pn})"

for pn in sorted(pr_numbers)

warp-dev-github-integration · 2026-07-01T18:19:30Z

+    print(
+        f"assemble_changelog: {n_entries} entries, {n_skipped} skipped, "
+        f"{n_review} needs_review ({total} total PRs)"
+    )
+    print(f"  wrote: {md_path}")
+    print(f"  wrote: {json_path}")


💡 [SUGGESTION] Summary goes to sys.stdout but classify_pr.py routes its summary to sys.stderr when --output is used (keeping stdout clean for data). Since assemble_changelog.py always writes to files, the summary is a progress message and should go to sys.stderr for consistency.

Suggested change

print(

f"assemble_changelog: {n_entries} entries, {n_skipped} skipped, "

f"{n_review} needs_review ({total} total PRs)"

)

print(f" wrote: {md_path}")

print(f" wrote: {json_path}")

n_entries = len(draft_dict.get("entries", []))

n_skipped = len(draft_dict.get("skipped", []))

n_review = len(draft_dict.get("needs_review", []))

total = n_entries + n_skipped + n_review

print(

f"assemble_changelog: {n_entries} entries, {n_skipped} skipped, "

f"{n_review} needs_review ({total} total PRs)",

file=sys.stderr,

)

print(f" wrote: {md_path}", file=sys.stderr)

print(f" wrote: {json_path}", file=sys.stderr)

warp-dev-github-integration · 2026-07-01T18:19:30Z

+def _run(args: list[str], repo_dir: str | None = None) -> subprocess.CompletedProcess:
+    """Run the resolve_release_range.py script with given args."""
+    cmd = [sys.executable, _SCRIPT] + args
+    return subprocess.run(cmd, capture_output=True, text=True)


🧹 [NIT] The repo_dir parameter is accepted but never used — all callers embed --repo-dir inside the args list instead. Either remove the unused parameter or actually pass it through as cwd to subprocess.run.

…ITY-944) classify_pr.py: - classify_one_pr now returns the candidate dict (source='candidate') instead of None, so the value it built is actually used - run_pass1 and run_pass2 now use det.get('source') == 'deterministic' to distinguish excludes from candidates, removing duplicate candidate-building code that both callers had assemble_changelog.py: - ext_contributors now stores {pr_number, url} dicts instead of bare ints - Community > Contributors section uses the stored url field; omits the hyperlink (shows plain #N) when url is empty, per the spec's 'no synthesized PR URLs' constraint Tests: - TestClassifyOnePrReturnValue: verify classify_one_pr returns a non-None dict with source='candidate' and expected fields for agent-required PRs - TestCommunityContributorUrls: verify stored URL is used in Contributors section, and that no synthesized URL appears when url is empty Co-Authored-By: Oz <oz-agent@warp.dev>

warp-dev-github-integration

Overview

This PR adds three deterministic Python scripts (resolve_release_range.py, classify_pr.py, assemble_changelog.py) plus a comprehensive test suite (72 tests) to the changelog-draft skill, replacing the ad-hoc in-skill logic with reproducible, independently testable steps.

Prior Issues — Confirmed Resolved

Issue 1 — classify_one_pr returning None instead of the candidate dict: Fixed. classify_one_pr now builds the candidate dict and returns it with source='candidate' (lines 329–345 of classify_pr.py). The duplicated dict-building code in the callers (run_pass1, run_pass2) has been removed; both now branch on det.get("source") == "deterministic" and route accordingly. Regression test TestClassifyOnePrReturnValue.test_candidate_pr_returns_dict_not_none directly validates this fix.

Issue 2 — Community section synthesizing PR URLs from pr_number: Fixed. The Community > Contributors section now uses e['url'] from the stored entry and falls back to plain #N (no hyperlink) when the URL is empty (lines 406–409 of assemble_changelog.py). Regression tests TestCommunityContributorUrls.test_community_section_uses_stored_url and test_community_section_omits_hyperlink_when_url_empty confirm both cases.

Test Results

72/72 tests pass.

Full Rubric Review

Correctness: ✅ Both prior bugs are fixed. run_pass2 correctly re-runs deterministic checks and reads flag annotations from the returned candidate dict. The seen dict in assemble correctly enforces the one-bucket invariant.

Standards/style: ✅ Clean module-level constants, good docstrings, consistent helper factoring.

Complexity/clarity: ✅ Two-pass design is well-structured and documented.

Naming: ✅ Function and variable names are consistent and descriptive.

Comments/docs: Minor only — the classify_pr.py module-level docstring summary schema shows included and needs_review fields (pass 2 only) and omits skipped_explicit_entries (present in both passes). No functional impact.

Tests: ✅ Regression tests for both prior issues are correctly isolated and clearly assertive. _make_pr, _pass1, _pass2, _assemble helpers reduce duplication. TestPySyntax.test_py_compile_all_scripts provides a fast syntax gate on all scripts.

Security: ✅ subprocess.run calls use list arguments (no shell=True) and handle non-zero exits explicitly.

Minor Nit (non-blocking)

The classify_pr.py module-level docstring summary schema (lines 64–70) includes included and needs_review (pass 2 fields) but omits skipped_explicit_entries (present in both passes). The runtime behavior is correct; this is a documentation-only inconsistency. Consider updating the docstring to reflect the actual per-pass schema.

Concerns

Found: 0 critical, 0 important, 1 nit

Verdict

Both prior blocking issues are confirmed resolved with regression tests. All 72 tests pass. The implementation matches the approved spec.

Approve

cla-bot Bot added the cla-signed label Jul 1, 2026

warp-dev-github-integration Bot commented Jul 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add deterministic changelog-draft scripts (QUALITY-944)#13282

Add deterministic changelog-draft scripts (QUALITY-944)#13282
warp-dev-github-integration[bot] wants to merge 2 commits into
masterfrom
oz-agent/QUALITY-944-deterministic-changelog-scripts

warp-dev-github-integration Bot commented Jul 1, 2026 •

edited

Loading

Uh oh!

warp-dev-github-integration Bot left a comment

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Uh oh!

warp-dev-github-integration Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-    print(
-        f"assemble_changelog: {n_entries} entries, {n_skipped} skipped, "
-        f"{n_review} needs_review ({total} total PRs)"
-    )
-    print(f"  wrote: {md_path}")
-    print(f"  wrote: {json_path}")
+    n_entries = len(draft_dict.get("entries", []))
+    n_skipped = len(draft_dict.get("skipped", []))
+    n_review = len(draft_dict.get("needs_review", []))
+    total = n_entries + n_skipped + n_review
+    print(
+        f"assemble_changelog: {n_entries} entries, {n_skipped} skipped, "
+        f"{n_review} needs_review ({total} total PRs)",
+        file=sys.stderr,
+    )
+    print(f"  wrote: {md_path}", file=sys.stderr)
+    print(f"  wrote: {json_path}", file=sys.stderr)

Uh oh!

Conversation

warp-dev-github-integration Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Review rework (cycle 1)

Verification

Uh oh!

warp-dev-github-integration Bot left a comment

Choose a reason for hiding this comment

Overview

Concerns

Found: 0 critical, 2 important, 1 suggestion, 2 nits

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

warp-dev-github-integration Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

warp-dev-github-integration Bot left a comment

Choose a reason for hiding this comment

Overview

Prior Issues — Confirmed Resolved

Test Results

Full Rubric Review

Minor Nit (non-blocking)

Concerns

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

warp-dev-github-integration Bot commented Jul 1, 2026 •

edited

Loading