diff --git a/.kilo/command/benchmarks-affected.md b/.kilo/command/benchmarks-affected.md new file mode 100644 index 0000000..158cb9e --- /dev/null +++ b/.kilo/command/benchmarks-affected.md @@ -0,0 +1,34 @@ +--- +description: Scan current branch and report impacted benchmark targets/functions. +--- + +# Benchmarks Affected + +Identify which benchmark binaries and benchmark functions are affected by changes on the current branch. + +Use the `benchmarks-affected` skill as the single source of truth for workflow details and guardrails. +Do not duplicate or override the skill instructions in this command. + +## Inputs + +- Optional `--baseline ` (default: `main`) +- Optional `--compile-commands ` +- Optional `--no-include-working-tree` +- Optional `--format ` (default: `text`) + +## Workflow + +1. Execute the `benchmarks-affected` skill workflow. +2. Pass through command inputs to the analyzer invocation defined by the skill. +3. Report results with these sections: + - Changed files + - Affected benchmark targets + - Affected benchmark functions + - Suggested `--benchmark_filter` regex + - Any warnings/failures + +## Output rules + +1. If `affected_benchmarks` is non-empty, prioritize those names. +2. If `affected_benchmarks` is empty but benchmark targets are affected, mark result as partial and include target-level impact. +3. Do not run full benchmark suites in this command; this command is for impact discovery only. diff --git a/.kilo/command/perf-review.md b/.kilo/command/perf-review.md index 3fff142..4b1ab95 100644 --- a/.kilo/command/perf-review.md +++ b/.kilo/command/perf-review.md @@ -19,119 +19,43 @@ Non-negotiable requirements: If arguments are omitted: - Default target branch to PR base branch from `gh pr view --json baseRefName` when available. - Fall back target branch to `main`. -- Default filter must be **targeted**, not full-suite: - - Derive from changed files and changed symbols. 
- - If `include/pixie/bitvector.h` changed in select path, default to `BM_Select` and add `BM_RankNonInterleaved` as control. - - Run full selected suites only as last resort when mapping fails. - -## Step 1 - Resolve Branches and Revisions - -1. Identify contender branch and hash: - - Contender branch: current checked-out branch (or `HEAD` if detached). - - Contender hash: `git rev-parse --short HEAD`. -2. Identify baseline branch: - - Use `--target` if provided. - - Else use PR base branch from GitHub CLI when available. - - Else use `main`. -3. Resolve baseline hash with `git rev-parse --short `. -4. Print branch and hash mapping before running benchmarks. - -## Step 2 - Select Relevant Benchmark Binaries - -Inspect changed files with: - -`git diff --name-only ...HEAD` - -Map file paths to benchmark binaries: - -| Changed path pattern | Benchmark binary | Coverage | -|---|---|---| -| `include/pixie/bitvector*`, `include/*bit_vector*`, `include/interleaved*` | `benchmarks` | BitVector rank/select | -| `include/rmm*` | `bench_rmm` | RmM tree operations | -| `include/louds*` | `louds_tree_benchmarks` | LOUDS traversal | -| `include/simd*`, `include/aligned*` | `alignment_comparison` | SIMD and alignment | -| `include/misc/*` | all relevant | Differential helpers | -| `CMakeLists.txt`, benchmark infra, broad/unknown changes | all benchmarks | Conservative full run | - -Available benchmark binaries: -- `benchmarks` -- `bench_rmm` -- `bench_rmm_sdsl` -- `louds_tree_benchmarks` -- `alignment_comparison` - -If the mapping is ambiguous, run all benchmark binaries but still apply a focused filter first. -If `--filter` is provided, pass it through as `--benchmark_filter`. -Print selected binaries and why they were selected. + +Filter handling: +- If `--filter` is provided, pass it through. +- Else use the filter produced by `benchmarks-affected` through `benchmarks-compare-revisions`. 
+- If no filter can be derived, run conservative full-binary compare for impacted binaries. + +## Step 1 - Resolve branches and hashes + +1. Resolve contender from current checkout (`HEAD`) and compute short hash. +2. Resolve baseline branch using precedence: `--target` -> PR base from `gh pr view --json baseRefName` -> `main`. +3. Resolve baseline short hash. +4. Print branch/hash mapping before benchmark execution. + +## Step 2 - Run timing comparison via skill (single source of truth) + +Use `benchmarks-compare-revisions` as the single source of truth for revision builds, benchmark scope, compare.py flow, retry policy, and guardrails. + +Do not duplicate or override its internal build/run steps in this command. + +Pass-through inputs: +- Baseline ref/hash from Step 1. +- Contender ref/hash from Step 1. +- Optional `--filter` override. + +Consume outputs from `benchmarks-compare-revisions`: +- Baseline and contender benchmark JSON artifacts. +- compare.py output per binary. +- Effective filter used. +- Scope metadata from `benchmarks-affected` (`affected_benchmark_targets`, `affected_benchmarks`) when available. Execution guardrails: -- Do not use background jobs (`nohup`, `&`) for benchmark runs in CI. -- Do not interleave multiple benchmark runs into one shell command stream. -- Run one benchmark command at a time and wait for completion. - -## Step 3 - Build Both Revisions (Timing and Profiling Builds) - -Use isolated build directories per short hash. - -1. Capture original ref (`git rev-parse --abbrev-ref HEAD` or detached `HEAD`). -2. If worktree is dirty, stash safely with untracked files: - - `git stash push -u -m "perf-review-auto-stash"` -3. 
Build baseline revision: - - `git checkout ` - - Timing build (required): - - `cmake -B build/benchmarks-all_bench_ -DCMAKE_BUILD_TYPE=Release -DPIXIE_BENCHMARKS=ON` - - `cmake --build build/benchmarks-all_bench_ --config Release -j` - - Profiling build (Linux only, recommended): - - `cmake -B build/benchmarks-diagnostic_bench_ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DPIXIE_BENCHMARKS=ON -DBENCHMARK_ENABLE_LIBPFM=ON -DPIXIE_DIAGNOSTICS=ON` - - `cmake --build build/benchmarks-diagnostic_bench_ --config RelWithDebInfo -j` -4. Build contender revision: - - `git checkout ` - - Repeat timing and profiling build with contender hash suffix. -5. Restore original ref and restore stashed state if a stash was created. - -Critical guardrails: -- Never use Debug binaries for timing review. -- Timing comparisons must use `benchmarks-all` Release builds. -- Profiling counters should use `benchmarks-diagnostic` RelWithDebInfo builds. - -## Step 4 - Resolve Binary Paths - -Support both generator layouts: - -- Multi-config: `build//Release/` or `build//RelWithDebInfo/` -- Single-config: `build//` - -For each needed binary, detect the existing executable path before running. -If a required binary is missing, report failure and stop with a blocked verdict. - -## Step 5 - Run Timing Comparison (Primary Judgment) - -Use a deterministic JSON-first workflow. Do not rely on long-running `compare.py` binary-vs-binary mode. - -1. Verify Python benchmark tooling once before runs: - - `python3 -c "import numpy, scipy"` -2. For each selected benchmark binary, run baseline then contender sequentially, each with explicit JSON out: - - `--benchmark_filter=""` - - `--benchmark_format=json` - - `--benchmark_out=.json` - - `--benchmark_report_aggregates_only=true` - - `--benchmark_display_aggregates_only=true` -3. Suppress benchmark stdout/stderr noise when generating JSON artifacts so files stay valid: - - `> .log 2>&1` -4. 
Validate both JSON files before comparison: - - `python3 -m json.tool .json > /dev/null` -5. Compare using one of: - - `python3 -a benchmarks ` - - or a deterministic local Python diff script over aggregate means. -6. Keep raw JSON files and comparison output for auditability. - -Timeout and retry policy: -- Use command timeouts that match benchmark scope. -- If a run times out once, narrow filter immediately and retry once. -- Maximum retry count per benchmark group: 1. -- If still timing out, produce a blocked/partial verdict with explicit scope limitations. - -## Step 6 - Collect Hardware Counter Profiles (Linux Only) +- Run benchmarks sequentially. +- No background jobs (`nohup`, `&`). +- Use Release timing builds only. +- If timing comparison fails, return blocked verdict with exact failure points. + +## Step 3 - Collect hardware counter profiles (Linux only, optional) Run a preflight first to avoid wasted attempts: 1. Execute one tiny benchmark with perf counters (e.g. one benchmark case) and inspect output for counter availability. @@ -157,7 +81,7 @@ Compute derived metrics when denominators are non-zero: If profiling is unavailable (non-Linux, libpfm missing, or perf permissions blocked), continue with timing-only review and explicitly mark profiling as unavailable in the report. -## Step 7 - Analyze Timing and Counter Data +## Step 4 - Analyze timing and counter data Timing classification per benchmark entry: - Improvement: time delta < -5% @@ -181,7 +105,7 @@ Noise-control expectations: - Include at least one control benchmark family expected to be unaffected by the code change. - Treat isolated swings without pattern as noise unless reproduced across related sizes/fill ratios. 
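The timing classification above can be applied mechanically to the JSON artifacts kept from the timing runs. A minimal sketch, assuming Google Benchmark's aggregate JSON layout (`run_type == "aggregate"`, `aggregate_name == "mean"`) and a symmetric ±5% band; the +5% regression cutoff mirrors the stated -5% improvement cutoff and is an assumption here:

```python
import json

# Sketch: classify per-benchmark timing deltas between baseline and contender
# Google Benchmark JSON artifacts. Thresholds: < -5% improvement, > +5%
# regression (assumed symmetric), otherwise neutral.

def classify_deltas(baseline_path: str, contender_path: str) -> dict[str, str]:
    def load_means(path: str) -> dict[str, float]:
        with open(path, encoding="utf-8") as handle:
            report = json.load(handle)
        return {
            entry["name"]: entry["real_time"]
            for entry in report.get("benchmarks", [])
            if entry.get("run_type") == "aggregate"
            and entry.get("aggregate_name") == "mean"
        }

    base = load_means(baseline_path)
    cont = load_means(contender_path)
    verdicts: dict[str, str] = {}
    # Only benchmarks present in both runs are comparable.
    for name in sorted(base.keys() & cont.keys()):
        delta_pct = (cont[name] - base[name]) / base[name] * 100.0
        if delta_pct < -5.0:
            verdicts[name] = "improvement"
        elif delta_pct > 5.0:
            verdicts[name] = "regression"
        else:
            verdicts[name] = "neutral"
    return verdicts
```

Benchmarks missing from either artifact are skipped rather than guessed at, which keeps a partial run from producing spurious verdicts.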
-## Step 8 - Produce Final Markdown Report +## Step 5 - Produce final markdown report Return a structured markdown report with this shape: diff --git a/.kilo/skills/benchmarks-affected/SKILL.md b/.kilo/skills/benchmarks-affected/SKILL.md new file mode 100644 index 0000000..e886c92 --- /dev/null +++ b/.kilo/skills/benchmarks-affected/SKILL.md @@ -0,0 +1,77 @@ +--- +name: benchmarks-affected +description: Analyze current branch versus a baseline and extract affected benchmark targets and benchmark functions using compile_commands and clang AST. +--- + +# Benchmarks Affected Skill + +Use this skill to identify exactly which benchmark binaries and benchmark functions are affected by code changes on the current branch. + +It implements a two-stage workflow: + +1. `compile_commands.json` analysis to determine affected compile targets. +2. Clang AST analysis to determine affected benchmark functions. + +## Goal + +Given `HEAD` and a baseline branch (default `main`), produce: + +- Changed files. +- Affected targets (with emphasis on benchmark targets). +- Exact benchmark functions impacted by the changes. +- A ready-to-use Google Benchmark filter regex. + +## Prerequisites + +1. Build tree with benchmarks enabled and compile database exported: + +```bash +BUILD_SUFFIX=local +cmake -B build/benchmarks-all_${BUILD_SUFFIX} \ + -DCMAKE_BUILD_TYPE=Release \ + -DPIXIE_BENCHMARKS=ON \ + -DCMAKE_EXPORT_COMPILE_COMMANDS=ON +cmake --build build/benchmarks-all_${BUILD_SUFFIX} --config Release -j +``` + +2. `clang++` must be available on `PATH` (used for AST dump). + +## Run + +```bash +python3 .kilo/skills/benchmarks-affected/analyze_benchmarks_affected.py \ + --baseline main \ + --compile-commands build/benchmarks-all_local/compile_commands.json \ + --format json +``` + +If `--compile-commands` is omitted, the script auto-selects the most recently modified `build/**/compile_commands.json`. +Working tree changes are included by default. 
Use `--no-include-working-tree` to restrict analysis to `...HEAD` only. + +## Output + +The analyzer reports: + +- `affected_targets`: impacted CMake targets inferred from compile dependency analysis. +- `affected_benchmark_targets`: subset of benchmark binaries impacted. +- `affected_benchmarks`: precise benchmark function names from AST-level call analysis. +- `suggested_filter_regex`: regex to pass as `--benchmark_filter`. + +## How to Use Findings + +1. Build only impacted benchmark binaries where feasible. +2. Run benchmark binaries with the suggested filter: + +```bash +FILTER='^(BM_RankNonInterleaved|BM_SelectNonInterleaved)(/|$)' +build/benchmarks-all_local/benchmarks --benchmark_filter="${FILTER}" +``` + +3. If impact mapping is broad/uncertain, run full binary for selected benchmark target(s). + +## Guardrails + +1. Keep baseline comparison at merge-base style diff: `...HEAD`. +2. Use Release binaries for timing runs. +3. If AST parse fails for a TU, still trust compile target impact and mark benchmark-function scope as partial. +4. If benchmark infra (`CMakeLists.txt`, benchmark source layout) changed, fall back to conservative benchmark selection. 
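The analyzer's JSON report can be consumed programmatically by downstream commands. A minimal sketch, assuming only the output keys documented above (`affected_benchmark_targets`, `suggested_filter_regex`); the build-directory path is illustrative, not prescribed:

```python
import json

# Sketch: turn the analyzer's JSON report into one benchmark invocation per
# impacted benchmark target. BUILD_DIR is an assumption (single-config
# layout); adjust to the actual build tree.

BUILD_DIR = "build/benchmarks-all_local"

def benchmark_commands(report_json: str) -> list[list[str]]:
    report = json.loads(report_json)
    filter_regex = report.get("suggested_filter_regex")
    commands: list[list[str]] = []
    for target in report.get("affected_benchmark_targets", []):
        cmd = [f"{BUILD_DIR}/{target}"]
        if filter_regex:
            cmd.append(f"--benchmark_filter={filter_regex}")
        # If no filter could be derived, the command falls back to the full
        # binary for this target (the "partial" case in the output rules).
        commands.append(cmd)
    return commands
```

Running each returned command sequentially matches the execution guardrails (no background jobs, one benchmark run at a time).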
diff --git a/.kilo/skills/benchmarks-affected/analyze_benchmarks_affected.py b/.kilo/skills/benchmarks-affected/analyze_benchmarks_affected.py new file mode 100644 index 0000000..1aa3b4a --- /dev/null +++ b/.kilo/skills/benchmarks-affected/analyze_benchmarks_affected.py @@ -0,0 +1,988 @@ +#!/usr/bin/env python3 + +from __future__ import annotations + +import argparse +import json +import re +import shlex +import shutil +import subprocess +import sys +from collections import defaultdict +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any + + +KNOWN_BENCHMARK_TARGETS = { + "benchmarks", + "bench_rmm", + "bench_rmm_sdsl", + "louds_tree_benchmarks", + "alignment_comparison", +} + +BUILD_INFRA_FILES = { + "CMakeLists.txt", + "CMakePresets.json", +} + +DIFF_HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@") + +CPP_FUNCTION_START_RE = re.compile( + r"^\s*" + r"(?:template\s*<[^>]*>\s*)?" + r"(?:(?:inline|constexpr|consteval|constinit|static|friend|virtual|explicit)\s+)*" + r"[A-Za-z_~][\w:<>,\s\*&\[\]]*\s+" + r"([~A-Za-z_][A-Za-z0-9_]*)\s*" + r"\([^;{}]*\)\s*" + r"(?:const\s*)?" + r"(?:noexcept\s*)?" 
+ r"(?:->\s*[^\{]+)?\{" +) + + +@dataclass +class CompileCommandEntry: + directory: Path + source: Path + arguments: list[str] + output: Path | None + target: str | None + dependencies: set[Path] = field(default_factory=set) + dep_error: str | None = None + + +@dataclass +class AstImpactResult: + benchmark_names: set[str] = field(default_factory=set) + affected_names: set[str] = field(default_factory=set) + ast_error: str | None = None + + +def run_command( + args: list[str], + cwd: Path, + check: bool = True, +) -> subprocess.CompletedProcess[str]: + return subprocess.run( + args, + cwd=str(cwd), + text=True, + capture_output=True, + check=check, + ) + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser( + description=( + "Analyze benchmark impact between baseline and HEAD via " + "compile_commands dependency mapping and clang AST analysis." + ) + ) + parser.add_argument( + "--baseline", + default="main", + help="Baseline ref used as ...HEAD (default: main).", + ) + parser.add_argument( + "--head", + default="HEAD", + help="Contender ref (default: HEAD).", + ) + parser.add_argument( + "--compile-commands", + default=None, + help=( + "Path to compile_commands.json. If omitted, auto-discovers most " + "recent build/**/compile_commands.json." + ), + ) + parser.add_argument( + "--clangxx", + default=None, + help="clang++ executable for AST dump (auto-detected if omitted).", + ) + parser.add_argument( + "--format", + choices=["text", "json"], + default="text", + help="Output format (default: text).", + ) + parser.add_argument( + "--include-working-tree", + dest="include_working_tree", + action="store_true", + default=True, + help=( + "Include local unstaged/staged changes in changed-files set, " + "in addition to ... (default: enabled)." 
+ ), + ) + parser.add_argument( + "--no-include-working-tree", + dest="include_working_tree", + action="store_false", + help="Disable working-tree inclusion and only analyze ....", + ) + return parser.parse_args() + + +def git_repo_root() -> Path: + proc = run_command(["git", "rev-parse", "--show-toplevel"], cwd=Path.cwd()) + return Path(proc.stdout.strip()).resolve() + + +def resolve_compile_commands(repo_root: Path, explicit_path: str | None) -> Path: + if explicit_path: + compile_path = Path(explicit_path) + if not compile_path.is_absolute(): + compile_path = (repo_root / compile_path).resolve() + if not compile_path.exists(): + raise FileNotFoundError(f"compile_commands.json not found: {compile_path}") + return compile_path + + candidates = sorted( + repo_root.glob("build/**/compile_commands.json"), + key=lambda path: path.stat().st_mtime, + reverse=True, + ) + if not candidates: + raise FileNotFoundError( + "No compile_commands.json found under build/**. " + "Configure with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON first." 
+ ) + return candidates[0].resolve() + + +def load_compile_commands(compile_commands_path: Path) -> list[CompileCommandEntry]: + entries: list[CompileCommandEntry] = [] + data = json.loads(compile_commands_path.read_text(encoding="utf-8")) + for raw_entry in data: + directory = Path(raw_entry["directory"]).resolve() + + raw_source = Path(raw_entry["file"]) + if raw_source.is_absolute(): + source = raw_source.resolve() + else: + source = (directory / raw_source).resolve() + + if "arguments" in raw_entry: + arguments = [str(arg) for arg in raw_entry["arguments"]] + else: + arguments = shlex.split(raw_entry["command"]) + + output = infer_output_path(arguments, directory) + target = infer_cmake_target_from_output(output) + + entries.append( + CompileCommandEntry( + directory=directory, + source=source, + arguments=arguments, + output=output, + target=target, + ) + ) + return entries + + +def infer_output_path(arguments: list[str], directory: Path) -> Path | None: + output_token: str | None = None + for idx, arg in enumerate(arguments): + if arg == "-o" and idx + 1 < len(arguments): + output_token = arguments[idx + 1] + elif arg.startswith("-o") and len(arg) > 2: + output_token = arg[2:] + elif arg.startswith("/Fo") and len(arg) > 3: + output_token = arg[3:] + + if output_token is None: + return None + + out_path = Path(output_token) + if out_path.is_absolute(): + return out_path.resolve() + return (directory / out_path).resolve() + + +def infer_cmake_target_from_output(output: Path | None) -> str | None: + if output is None: + return None + parts = output.parts + for index, part in enumerate(parts): + if part == "CMakeFiles" and index + 1 < len(parts): + target_part = parts[index + 1] + if target_part.endswith(".dir"): + return target_part[: -len(".dir")] + return target_part + return None + + +def git_changed_files(repo_root: Path, baseline: str, head: str) -> set[Path]: + diff_range = f"{baseline}...{head}" + proc = run_command(["git", "diff", "--name-only", 
diff_range], cwd=repo_root) + changed_files: set[Path] = set() + for line in proc.stdout.splitlines(): + line = line.strip() + if not line: + continue + changed_files.add((repo_root / line).resolve()) + return changed_files + + +def git_working_tree_changed_files(repo_root: Path) -> set[Path]: + changed_files: set[Path] = set() + commands = [ + ["git", "diff", "--name-only"], + ["git", "diff", "--name-only", "--cached"], + ] + for cmd in commands: + proc = run_command(cmd, cwd=repo_root) + for line in proc.stdout.splitlines(): + line = line.strip() + if not line: + continue + changed_files.add((repo_root / line).resolve()) + return changed_files + + +def parse_changed_lines_from_diff_text( + diff_text: str, + repo_root: Path, +) -> dict[Path, set[int]]: + changed_lines: dict[Path, set[int]] = defaultdict(set) + + current_file: Path | None = None + in_hunk = False + new_line = 0 + + for raw_line in diff_text.splitlines(): + if raw_line.startswith("+++ "): + file_token = raw_line[4:].strip() + if file_token == "/dev/null": + current_file = None + in_hunk = False + continue + if file_token.startswith("b/"): + file_token = file_token[2:] + current_file = (repo_root / file_token).resolve() + in_hunk = False + continue + + hunk_match = DIFF_HUNK_RE.match(raw_line) + if hunk_match: + in_hunk = current_file is not None + new_line = int(hunk_match.group(1)) + continue + + if not in_hunk or current_file is None: + continue + + if raw_line.startswith("+") and not raw_line.startswith("+++"): + changed_lines[current_file].add(new_line) + new_line += 1 + continue + + if raw_line.startswith("-") and not raw_line.startswith("---"): + continue + + if raw_line.startswith(" "): + new_line += 1 + continue + + return changed_lines + + +def git_changed_line_map( + repo_root: Path, + baseline: str, + head: str, + include_working_tree: bool, +) -> dict[Path, set[int]]: + changed_lines: dict[Path, set[int]] = defaultdict(set) + + proc = run_command( + ["git", "diff", "--unified=0", 
f"{baseline}...{head}"], + cwd=repo_root, + ) + baseline_map = parse_changed_lines_from_diff_text(proc.stdout, repo_root) + for path, lines in baseline_map.items(): + changed_lines[path].update(lines) + + if include_working_tree: + for cmd in ( + ["git", "diff", "--unified=0"], + ["git", "diff", "--cached", "--unified=0"], + ): + wt_proc = run_command(cmd, cwd=repo_root) + wt_map = parse_changed_lines_from_diff_text(wt_proc.stdout, repo_root) + for path, lines in wt_map.items(): + changed_lines[path].update(lines) + + return changed_lines + + +def extract_changed_symbol_names_from_file( + file_path: Path, + changed_lines: set[int], +) -> set[str]: + if not changed_lines or not file_path.exists(): + return set() + + lines = file_path.read_text(encoding="utf-8", errors="replace").splitlines() + symbols: set[str] = set() + + line_index = 1 + max_line = len(lines) + while line_index <= max_line: + line = lines[line_index - 1] + match = CPP_FUNCTION_START_RE.match(line) + if not match: + line_index += 1 + continue + + symbol_name = match.group(1) + start_line = line_index + brace_depth = line.count("{") - line.count("}") + end_line = start_line + + while brace_depth > 0 and end_line < max_line: + end_line += 1 + body_line = lines[end_line - 1] + brace_depth += body_line.count("{") - body_line.count("}") + + if any(start_line <= line_no <= end_line for line_no in changed_lines): + symbols.add(symbol_name) + + line_index = end_line + 1 + + return symbols + + +def collect_changed_symbol_names( + changed_line_map: dict[Path, set[int]], +) -> set[str]: + symbol_names: set[str] = set() + for file_path, changed_lines in changed_line_map.items(): + symbol_names.update( + extract_changed_symbol_names_from_file(file_path, changed_lines) + ) + return symbol_names + + +def clean_command_for_dependency_scan(arguments: list[str]) -> list[str]: + cleaned: list[str] = [] + skip_next = False + flags_with_value = { + "-o", + "-MF", + "-MT", + "-MQ", + "-MJ", + "-Xclang", + } + 
standalone_drop = { + "-c", + "-MD", + "-MMD", + "-MP", + "-MM", + "-M", + "-E", + "-S", + } + + index = 0 + while index < len(arguments): + arg = arguments[index] + if skip_next: + skip_next = False + index += 1 + continue + + if arg in flags_with_value: + skip_next = True + index += 1 + continue + if arg in standalone_drop: + index += 1 + continue + if arg.startswith("-o") and len(arg) > 2: + index += 1 + continue + if arg.startswith("-MF") and len(arg) > 3: + index += 1 + continue + if arg.startswith("-MT") and len(arg) > 3: + index += 1 + continue + if arg.startswith("-MQ") and len(arg) > 3: + index += 1 + continue + if arg.startswith("-MJ") and len(arg) > 3: + index += 1 + continue + + cleaned.append(arg) + index += 1 + + return cleaned + + +def parse_makefile_dependencies(stdout_text: str) -> list[str]: + flattened = stdout_text.replace("\\\n", " ").replace("\n", " ") + if ":" not in flattened: + return [] + dep_payload = flattened.split(":", 1)[1].strip() + if not dep_payload: + return [] + return shlex.split(dep_payload) + + +def compute_tu_dependencies(entry: CompileCommandEntry) -> None: + dep_cmd = clean_command_for_dependency_scan(entry.arguments) + if not dep_cmd: + entry.dep_error = "Empty compile command after sanitization" + entry.dependencies = {entry.source} + return + + dep_cmd.extend(["-MM", "-MF", "-", "-MT", "__pixie_tu__"]) + source_arg = str(entry.source) + if source_arg not in dep_cmd: + dep_cmd.append(source_arg) + + try: + proc = run_command(dep_cmd, cwd=entry.directory, check=False) + except FileNotFoundError as exc: + entry.dep_error = str(exc) + entry.dependencies = {entry.source} + return + + dependencies: set[Path] = {entry.source} + if proc.returncode != 0: + stderr = proc.stderr.strip() + entry.dep_error = ( + stderr if stderr else f"Dependency scan failed ({proc.returncode})" + ) + entry.dependencies = dependencies + return + + for dep in parse_makefile_dependencies(proc.stdout): + dep_path = Path(dep) + resolved = ( + 
dep_path.resolve() + if dep_path.is_absolute() + else (entry.directory / dep_path).resolve() + ) + dependencies.add(resolved) + + entry.dependencies = dependencies + + +def is_build_infra_change(repo_root: Path, changed: set[Path]) -> bool: + for path in changed: + if path.name in BUILD_INFRA_FILES: + return True + try: + rel = path.relative_to(repo_root) + except ValueError: + continue + rel_text = rel.as_posix() + if rel_text.startswith("cmake/"): + return True + return False + + +def identify_benchmark_targets( + entries: list[CompileCommandEntry], repo_root: Path +) -> set[str]: + benchmark_targets: set[str] = set() + for entry in entries: + if entry.target is None: + continue + try: + rel = entry.source.relative_to(repo_root) + rel_text = rel.as_posix() + except ValueError: + rel_text = entry.source.as_posix() + + if rel_text.startswith("src/benchmarks/"): + benchmark_targets.add(entry.target) + + benchmark_targets.update(KNOWN_BENCHMARK_TARGETS) + return benchmark_targets + + +def discover_clangxx(explicit: str | None) -> str: + if explicit: + return explicit + + candidates = [ + "clang++", + "clang++-19", + "clang++-18", + "clang++-17", + "clang++-16", + ] + for candidate in candidates: + resolved = shutil.which(candidate) + if resolved: + return resolved + raise FileNotFoundError( + "clang++ was not found on PATH. Provide --clangxx to select a clang compiler." 
+ ) + + +def clean_command_for_ast(arguments: list[str], clangxx: str) -> list[str]: + cleaned = clean_command_for_dependency_scan(arguments) + if not cleaned: + return [] + cleaned[0] = clangxx + cleaned.extend(["-Xclang", "-ast-dump=json", "-fsyntax-only"]) + return cleaned + + +def normalize_path_candidate(path_text: str | None, working_dir: Path) -> Path | None: + if not path_text: + return None + path = Path(path_text) + if path.is_absolute(): + return path.resolve() + return (working_dir / path).resolve() + + +def file_from_loc(loc: dict[str, Any] | None, working_dir: Path) -> Path | None: + if not isinstance(loc, dict): + return None + if "file" in loc: + return normalize_path_candidate(str(loc["file"]), working_dir) + for nested_key in ("spellingLoc", "expansionLoc", "includedFrom"): + nested_loc = loc.get(nested_key) + if isinstance(nested_loc, dict): + resolved = file_from_loc(nested_loc, working_dir) + if resolved is not None: + return resolved + return None + + +def iter_ast_nodes(node: Any): + if isinstance(node, dict): + yield node + inner = node.get("inner", []) + if isinstance(inner, list): + for child in inner: + yield from iter_ast_nodes(child) + elif isinstance(node, list): + for item in node: + yield from iter_ast_nodes(item) + + +def referenced_decl_file(node: dict[str, Any], working_dir: Path) -> Path | None: + referenced = node.get("referencedDecl") + if not isinstance(referenced, dict): + return None + return file_from_loc(referenced.get("loc"), working_dir) + + +def node_references_changed_symbol( + node: dict[str, Any], + changed_symbol_names: set[str], +) -> bool: + if not changed_symbol_names: + return False + + for subnode in iter_ast_nodes(node): + if not isinstance(subnode, dict): + continue + + kind = subnode.get("kind") + if kind == "MemberExpr": + member_name = subnode.get("name") + if isinstance(member_name, str) and member_name in changed_symbol_names: + return True + + if kind == "DeclRefExpr": + ref_decl = 
subnode.get("referencedDecl") + if not isinstance(ref_decl, dict): + continue + ref_name = ref_decl.get("name") + if isinstance(ref_name, str) and ref_name in changed_symbol_names: + return True + + return False + + +def call_expr_callee_name(call_expr: dict[str, Any]) -> str | None: + for node in iter_ast_nodes(call_expr): + if not isinstance(node, dict): + continue + if node.get("kind") != "DeclRefExpr": + continue + referenced = node.get("referencedDecl") + if isinstance(referenced, dict) and isinstance(referenced.get("name"), str): + return referenced["name"] + return None + + +def string_literals_in_node(node: dict[str, Any]) -> list[str]: + values: list[str] = [] + for cur in iter_ast_nodes(node): + if not isinstance(cur, dict): + continue + if cur.get("kind") != "StringLiteral": + continue + value = cur.get("value") + if isinstance(value, str): + if len(value) >= 2 and value[0] == '"' and value[-1] == '"': + value = value[1:-1] + values.append(value) + return values + + +def benchmark_names_from_source(source: Path) -> set[str]: + names: set[str] = set() + if not source.exists(): + return names + text = source.read_text(encoding="utf-8", errors="replace") + for match in re.finditer(r"BENCHMARK\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)", text): + names.add(match.group(1)) + for match in re.finditer(r"register_op\(\s*\"([^\"]+)\"", text): + names.add(match.group(1)) + return names + + +def ast_analyze_entry( + entry: CompileCommandEntry, + changed_files: set[Path], + changed_symbol_names: set[str], + clangxx: str, +) -> AstImpactResult: + result = AstImpactResult() + + ast_cmd = clean_command_for_ast(entry.arguments, clangxx) + if not ast_cmd: + result.ast_error = "Failed to build AST command" + return result + + try: + proc = run_command(ast_cmd, cwd=entry.directory, check=False) + except FileNotFoundError as exc: + result.ast_error = str(exc) + return result + + if proc.returncode != 0: + stderr = proc.stderr.strip() + result.ast_error = ( + stderr if stderr else 
f"AST command failed ({proc.returncode})" + ) + return result + + try: + ast_root = json.loads(proc.stdout) + except json.JSONDecodeError as exc: + result.ast_error = f"Invalid AST JSON: {exc}" + return result + + for node in iter_ast_nodes(ast_root): + if not isinstance(node, dict): + continue + + kind = node.get("kind") + if kind in {"FunctionDecl", "CXXMethodDecl"}: + name = node.get("name") + if isinstance(name, str) and name.startswith("BM_"): + result.benchmark_names.add(name) + function_loc = file_from_loc(node.get("loc"), entry.directory) + if function_loc in changed_files: + result.affected_names.add(name) + continue + + if node_references_changed_symbol(node, changed_symbol_names): + result.affected_names.add(name) + continue + + for subnode in iter_ast_nodes(node): + if not isinstance(subnode, dict): + continue + ref_file = referenced_decl_file(subnode, entry.directory) + if ref_file in changed_files: + result.affected_names.add(name) + break + + if kind == "CallExpr": + callee = call_expr_callee_name(node) + if callee != "register_op": + continue + literal_values = string_literals_in_node(node) + if not literal_values: + continue + benchmark_name = literal_values[0] + result.benchmark_names.add(benchmark_name) + + call_loc = file_from_loc(node.get("loc"), entry.directory) + if call_loc in changed_files: + result.affected_names.add(benchmark_name) + continue + + if node_references_changed_symbol(node, changed_symbol_names): + result.affected_names.add(benchmark_name) + continue + + for subnode in iter_ast_nodes(node): + if not isinstance(subnode, dict): + continue + ref_file = referenced_decl_file(subnode, entry.directory) + if ref_file in changed_files: + result.affected_names.add(benchmark_name) + break + + return result + + +def regex_for_benchmarks(names: set[str]) -> str | None: + if not names: + return None + ordered = sorted(names) + body = "|".join(re.escape(name) for name in ordered) + return rf"^({body})(/|$)" + + +def relpath_or_abs(path: 
Path, root: Path) -> str: + try: + return path.relative_to(root).as_posix() + except ValueError: + return path.as_posix() + + +def main() -> int: + cli = parse_args() + + try: + repo_root = git_repo_root() + changed_files = git_changed_files(repo_root, cli.baseline, cli.head) + if cli.include_working_tree: + changed_files.update(git_working_tree_changed_files(repo_root)) + changed_line_map = git_changed_line_map( + repo_root, + cli.baseline, + cli.head, + cli.include_working_tree, + ) + changed_symbol_names = collect_changed_symbol_names(changed_line_map) + compile_commands_path = resolve_compile_commands( + repo_root, cli.compile_commands + ) + entries = load_compile_commands(compile_commands_path) + except FileNotFoundError as exc: + print(f"error: {exc}", file=sys.stderr) + return 2 + except subprocess.CalledProcessError as exc: + stderr = (exc.stderr or "").strip() + if stderr: + print(f"error: {stderr}", file=sys.stderr) + else: + print(f"error: command failed: {' '.join(exc.cmd)}", file=sys.stderr) + return 2 + + target_to_entries: dict[str, list[CompileCommandEntry]] = defaultdict(list) + for entry in entries: + if entry.target: + target_to_entries[entry.target].append(entry) + + for entry in entries: + compute_tu_dependencies(entry) + + affected_targets: set[str] = set() + for entry in entries: + has_changed_dependency = any(dep in changed_files for dep in entry.dependencies) + if has_changed_dependency: + if entry.target: + affected_targets.add(entry.target) + + benchmark_targets = identify_benchmark_targets(entries, repo_root) + all_targets = {entry.target for entry in entries if entry.target} + + infra_change = is_build_infra_change(repo_root, changed_files) + if infra_change: + affected_targets.update(all_targets) + + affected_benchmark_targets = sorted( + affected_targets.intersection(benchmark_targets) + ) + + benchmark_entries: list[CompileCommandEntry] = [] + for entry in entries: + if entry.target in benchmark_targets: + 
benchmark_entries.append(entry) + + impacted_benchmark_entries: list[CompileCommandEntry] = [] + for entry in benchmark_entries: + has_changed_dependency = any(dep in changed_files for dep in entry.dependencies) + if has_changed_dependency or entry.target in affected_benchmark_targets: + impacted_benchmark_entries.append(entry) + + ast_errors: dict[str, str] = {} + benchmark_target_to_names: dict[str, set[str]] = defaultdict(set) + benchmark_target_to_affected: dict[str, set[str]] = defaultdict(set) + warnings: list[str] = [] + + if impacted_benchmark_entries: + try: + clangxx = discover_clangxx(cli.clangxx) + except FileNotFoundError as exc: + clangxx = "" + warnings.append(str(exc)) + + for entry in impacted_benchmark_entries: + target_name = entry.target or "" + + if not clangxx: + fallback_names = benchmark_names_from_source(entry.source) + benchmark_target_to_names[target_name].update(fallback_names) + if changed_symbol_names: + benchmark_target_to_affected[target_name].update(fallback_names) + continue + + ast_result = ast_analyze_entry( + entry, + changed_files, + changed_symbol_names, + clangxx, + ) + if ast_result.ast_error: + ast_errors[relpath_or_abs(entry.source, repo_root)] = ( + ast_result.ast_error + ) + + if ast_result.benchmark_names: + benchmark_target_to_names[target_name].update( + ast_result.benchmark_names + ) + else: + benchmark_target_to_names[target_name].update( + benchmark_names_from_source(entry.source) + ) + + if ast_result.affected_names: + benchmark_target_to_affected[target_name].update( + ast_result.affected_names + ) + else: + source_is_changed = entry.source in changed_files + if source_is_changed: + benchmark_target_to_affected[target_name].update( + benchmark_target_to_names[target_name] + ) + + if infra_change and affected_benchmark_targets: + for target_name in affected_benchmark_targets: + for entry in target_to_entries.get(target_name, []): + benchmark_target_to_names[target_name].update( + 
benchmark_names_from_source(entry.source) + ) + benchmark_target_to_affected[target_name].update( + benchmark_names_from_source(entry.source) + ) + + all_affected_benchmarks: set[str] = set() + for names in benchmark_target_to_affected.values(): + all_affected_benchmarks.update(names) + + dep_scan_failures = { + relpath_or_abs(entry.source, repo_root): entry.dep_error + for entry in entries + if entry.dep_error + } + + report: dict[str, Any] = { + "baseline": cli.baseline, + "head": cli.head, + "include_working_tree": cli.include_working_tree, + "changed_symbols": sorted(changed_symbol_names), + "compile_commands": relpath_or_abs(compile_commands_path, repo_root), + "changed_files": sorted( + relpath_or_abs(path, repo_root) for path in changed_files + ), + "affected_targets": sorted(affected_targets), + "affected_benchmark_targets": affected_benchmark_targets, + "affected_benchmarks": { + target: sorted(names) + for target, names in sorted(benchmark_target_to_affected.items()) + if names + }, + "suggested_filter_regex": regex_for_benchmarks(all_affected_benchmarks), + "dependency_scan_failures": dep_scan_failures, + "ast_failures": ast_errors, + "warnings": warnings, + } + + if cli.format == "json": + json.dump(report, sys.stdout, indent=2) + sys.stdout.write("\n") + return 0 + + print(f"Baseline: {cli.baseline}") + print(f"Head: {cli.head}") + print(f"Compile commands: {report['compile_commands']}") + print("") + + print(f"Changed files ({len(report['changed_files'])}):") + for item in report["changed_files"]: + print(f"- {item}") + if not report["changed_files"]: + print("- none") + + print("") + print(f"Affected targets ({len(report['affected_targets'])}):") + for item in report["affected_targets"]: + print(f"- {item}") + if not report["affected_targets"]: + print("- none") + + print("") + print(f"Affected benchmark targets ({len(report['affected_benchmark_targets'])}):") + for item in report["affected_benchmark_targets"]: + print(f"- {item}") + if not 
report["affected_benchmark_targets"]: + print("- none") + + print("") + print("Affected benchmark functions:") + if report["affected_benchmarks"]: + for target, names in report["affected_benchmarks"].items(): + print(f"- {target}:") + for name in names: + print(f" - {name}") + else: + print("- none") + + print("") + print("Suggested --benchmark_filter regex:") + print(report["suggested_filter_regex"] or "none") + + if dep_scan_failures: + print("") + print("Dependency scan failures:") + for source, error in dep_scan_failures.items(): + print(f"- {source}: {error}") + + if ast_errors: + print("") + print("AST failures:") + for source, error in ast_errors.items(): + print(f"- {source}: {error}") + + if warnings: + print("") + print("Warnings:") + for warning in warnings: + print(f"- {warning}") + + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/.kilo/skills/benchmarks-compare-revisions/SKILL.md b/.kilo/skills/benchmarks-compare-revisions/SKILL.md index 23996a5..49b0d6e 100644 --- a/.kilo/skills/benchmarks-compare-revisions/SKILL.md +++ b/.kilo/skills/benchmarks-compare-revisions/SKILL.md @@ -5,7 +5,12 @@ description: Compare benchmark performance between two git revisions by building # Benchmarks Compare Revisions Skill -Use this skill to compare performance between two git revisions. It focuses on the compare workflow and relies on the existing benchmarks skill for build/run details (see .kilo/skills/benchmarks/SKILL.md). +Use this skill to compare performance between two git revisions. + +This workflow now depends on: + +1. `.kilo/skills/benchmarks-affected/SKILL.md` to determine affected benchmark targets/functions and produce a benchmark filter. +2. `.kilo/skills/benchmarks/SKILL.md` for build/run operational details. 
## Goal @@ -21,7 +26,27 @@ BASELINE=abc1234 CONTENDER=def5678 ``` -## Step 1 — Build both revisions (Release only) +## Step 1 — Compute affected benchmark scope first + +Run `benchmarks-affected` from the contender checkout to derive the compare scope. + +Do not duplicate `benchmarks-affected` internals here (compile database selection, AST analysis, or fallback heuristics). Follow that skill directly and consume only its outputs. + +Inputs to pass through: + +- `--baseline ${BASELINE}` +- optional compile-commands path if auto-detection is not desired +- optional output format (`json` recommended for parsing) + +Consume these outputs from `benchmarks-affected`: + +- `suggested_filter_regex` -> set `FILTER` +- `affected_benchmark_targets` -> optionally constrain which benchmark binary/binaries to run +- `affected_benchmarks` -> function-level scope for validation/reporting + +If `FILTER` is empty, fall back to full benchmark binary compare (conservative mode). + +## Step 2 — Build both revisions (Release only) Use the existing benchmarks skill build steps, but set the build suffix to include the short hash for each revision: @@ -37,7 +62,7 @@ git checkout ${CONTENDER} # Follow .kilo/skills/benchmarks/SKILL.md build instructions with this suffix ``` -## Step 2 — Compare using compare.py +## Step 3 — Compare using compare.py Use Google Benchmark compare tooling with a JSON-first flow to avoid long-running binary-vs-binary retries. @@ -80,13 +105,16 @@ Run the comparison: python3 ${COMPARE_PY} -a benchmarks ${BASE_JSON} ${CONT_JSON} ``` -### Optional: filter to reduce noise and runtime - -Pass filter when generating JSON files: +Use the affected filter from Step 1 when generating JSON files: ```bash -FILTER="BM_Rank" -build/benchmarks-all_bench_${BASELINE}/benchmarks --benchmark_filter="${FILTER}" --benchmark_report_aggregates_only=true --benchmark_display_aggregates_only=true ... 
-build/benchmarks-all_bench_${CONTENDER}/benchmarks --benchmark_filter="${FILTER}" --benchmark_report_aggregates_only=true --benchmark_display_aggregates_only=true ... +if [ -n "${FILTER}" ]; then + FILTER_ARG="--benchmark_filter=${FILTER}" +else + FILTER_ARG="" +fi + +build/benchmarks-all_bench_${BASELINE}/benchmarks ${FILTER_ARG} --benchmark_report_aggregates_only=true --benchmark_display_aggregates_only=true ... +build/benchmarks-all_bench_${CONTENDER}/benchmarks ${FILTER_ARG} --benchmark_report_aggregates_only=true --benchmark_display_aggregates_only=true ... ``` ## Retry and Timeout Policy @@ -96,7 +124,7 @@ build/benchmarks-all_bench_${CONTENDER}/benchmarks --benchmark_filter="${FILTER} 3. Maximum retries per benchmark group: 1. 4. If still failing, emit blocked/partial findings instead of repeated attempts. -## Step 3 — Record findings +## Step 4 — Record findings Capture the compare.py output (terminal transcript or redirected file) and note any statistically significant regressions or wins. @@ -105,5 +133,5 @@ Capture the compare.py output (terminal transcript or redirected file) and note 1. **Release only**: never compare Debug binaries. 2. **Short hash suffixes**: keep build dirs isolated per revision (example: `bench_`). 3. **Same host, same conditions**: do not compare across different machines or power profiles. -4. **Filter for focus**: narrow `--benchmark_filter` to the changed area. +4. **Filter from analysis**: use `benchmarks-affected` output instead of hand-crafted filters whenever possible. 5. **Pin frequency**: for stable numbers, follow benchmark skill guidance on CPU governor.