diff --git a/bootstrap.md b/bootstrap.md
index cbe3e0a..87a9f70 100644
--- a/bootstrap.md
+++ b/bootstrap.md
@@ -350,6 +350,14 @@ If no naming mechanism is available, skip session naming.
   for a code review of C code, suggest adding the `memory-safety-c` protocol.
 - **Suggest taxonomies** when the task involves classification. For example,
   if investigating stack corruption, suggest the `stack-lifetime-hazards` taxonomy.
+- **Evaluate taxonomy relevance** before including template-declared
+  taxonomies. If a template declares a default taxonomy that is clearly
+  irrelevant to the user's specific investigation (e.g., `stack-lifetime-hazards`
+  for a power trace analysis, or a CWE taxonomy for a non-security task),
+  ask the user whether to include it. Omit irrelevant taxonomies rather
+  than wasting context window on classification schemes that do not apply.
+  When omitting a template-declared taxonomy, note the omission briefly
+  so the user understands the deviation from defaults.
 - **Ask for the audit domain** when the selected template is
   `investigate-security`, `review-code`, `review-cpp-code`, or
   `exhaustive-bug-hunt`. The library includes CWE-derived per-domain
diff --git a/formats/investigation-report.md b/formats/investigation-report.md
index 0bbe07d..c542c2a 100644
--- a/formats/investigation-report.md
+++ b/formats/investigation-report.md
@@ -27,6 +27,11 @@ Before writing the report, **enumerate and classify all findings first**
 If the invoking template or workflow explicitly requires the full
 9-section structure, use the full format regardless of finding count.
+Templates that include a "use the full investigation report format"
+instruction (e.g., `investigate-bug`, `investigate-trace`) always
+require the full format because the causal chain, prevention, and open
+questions sections contain the most actionable content for root cause
+investigations.
 
 ## Abbreviated Format
diff --git a/manifest.yaml b/manifest.yaml
index 5725b6d..1564442 100644
--- a/manifest.yaml
+++ b/manifest.yaml
@@ -1333,6 +1333,18 @@ templates:
       taxonomies: [stack-lifetime-hazards]
       format: investigation-report
 
+  - name: investigate-trace
+    path: templates/investigate-trace.md
+    description: >
+      Investigate a performance, power, or behavioral issue using
+      profiling traces, ETW/ETL captures, or telemetry data. Apply
+      root cause analysis with iterative deepening, call stack
+      analysis, energy-vs-metric divergence, and cross-process
+      amplification detection.
+    persona: systems-engineer
+    protocols: [anti-hallucination, self-verification, operational-constraints, root-cause-analysis]
+    format: investigation-report
+
   - name: find-and-fix-bugs
     path: templates/find-and-fix-bugs.md
     description: >
diff --git a/protocols/guardrails/anti-hallucination.md b/protocols/guardrails/anti-hallucination.md
index 8a3c70a..5f1cd71 100644
--- a/protocols/guardrails/anti-hallucination.md
+++ b/protocols/guardrails/anti-hallucination.md
@@ -28,6 +28,15 @@ Every claim in your output MUST be categorized as one of:
 - **ASSUMED**: Not established by context. The assumption MUST be flagged
   with `[ASSUMPTION]` and a justification for why it is reasonable.
 
+**Data-driven tasks**: When the source data is authoritative machine
+telemetry or tool output (e.g., profiler results, trace queries, compiler
+diagnostics, monitoring metrics), direct observations and measurements
+reported by the tool have implicit KNOWN status and do not require explicit
+`[KNOWN]` labels. However, **causal explanations**, **inferred
+correlations**, and **interpretations** of that data retain full labeling
+requirements — these are INFERRED or ASSUMED claims even when derived
+from authoritative measurements.
+
 When the number of claims categorized as ASSUMED exceeds 30% of the
 total number of categorized claims in your output, stop and request
 additional context instead of proceeding.
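The 30% ASSUMED budget above is mechanical enough to sketch in code. A minimal illustration of the rule (the claim labels in `claims` are invented for the example, not output of any real tooling):

```python
# Sketch of the ASSUMED-claim budget described above: stop when more than
# 30% of categorized claims are ASSUMED. The labels below are invented
# example data, not real protocol output.
from collections import Counter

def assumed_ratio(labels):
    """Fraction of categorized claims labeled ASSUMED."""
    counts = Counter(labels)
    total = sum(counts[k] for k in ("KNOWN", "INFERRED", "ASSUMED"))
    return counts["ASSUMED"] / total if total else 0.0

claims = ["KNOWN", "KNOWN", "INFERRED", "ASSUMED", "INFERRED"]
if assumed_ratio(claims) > 0.30:
    print("STOP: request additional context before proceeding")
else:
    print("OK to proceed")
```

The same check could run over any claim list; the point is only that the threshold is a hard stop, not a soft preference.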
diff --git a/protocols/guardrails/operational-constraints.md b/protocols/guardrails/operational-constraints.md
index cb295f4..fda60f8 100644
--- a/protocols/guardrails/operational-constraints.md
+++ b/protocols/guardrails/operational-constraints.md
@@ -29,6 +29,12 @@ creep, non-reproducible analysis, and context window exhaustion.
   exhaustive or comprehensive review, you may exceed 50 files but only
   in batches of at most 50 files, with a summary after each batch before
   continuing.
+- **For trace, telemetry, or log analysis**: the equivalent scoping
+  constraint is data categories and time ranges, not file counts. Before
+  querying, identify which data categories (e.g., CPU sampling, disk I/O,
+  energy estimation, network activity) and which time ranges are relevant.
+  Do NOT process all available categories or the full trace duration
+  without first establishing which subset matters.
 - Before reading code or data, establish your **search strategy**:
   - What directories, files, or patterns are likely relevant?
   - What naming conventions, keywords, or symbols should guide search?
@@ -64,6 +70,11 @@ Use a funnel approach:
 - Summarize intermediate findings as you go.
 - Prefer reading specific functions over entire files.
 - Use search tools (grep, find, symbol lookup) before reading files.
+- **For structured data sources** (trace queries, database results, API
+  responses): limit query result volume to what is needed for the current
+  analysis layer. Retrieve summary/aggregated data first, then drill into
+  detail only for top contributors. Do NOT retrieve full detail for all
+  items in a single query.
 
 ### 5. Tool Usage Discipline
 
diff --git a/protocols/reasoning/root-cause-analysis.md b/protocols/reasoning/root-cause-analysis.md
index 1af1ba4..8147fb4 100644
--- a/protocols/reasoning/root-cause-analysis.md
+++ b/protocols/reasoning/root-cause-analysis.md
@@ -10,6 +10,7 @@ description: >
   and elimination. Language-agnostic.
 
 applicable_to:
   - investigate-bug
+  - investigate-trace
   - root-cause-ci-failure
 ---
 
@@ -62,6 +63,34 @@ For each hypothesis, starting with the most plausible:
 - **ELIMINATED**: Evidence directly contradicts it.
 - **INCONCLUSIVE**: Evidence is insufficient; state what is needed.
 
+## Phase 3a: Iterative Deepening
+
+Investigation MUST proceed in layers of increasing resolution. Each layer
+informs the next — do NOT skip layers or jump directly to deep analysis.
+
+1. **Broad survey**: Identify top contributors at the coarsest granularity
+   (e.g., by process, module, subsystem, or component). Rank by impact.
+2. **Attribution**: For the top 5–10 contributors, break down by the next
+   level of detail (e.g., by module within a process, by function within
+   a module, by allocation site within a function).
+3. **Deep analysis**: For the top contributors at the attribution level,
+   obtain the most detailed evidence available (e.g., call stacks, data
+   flow traces, lock contention chains, allocation histories). Call stacks
+   and execution traces reveal *why* something is happening — module-level
+   data only reveals *where*.
+4. **Cross-component tracing**: Identify causal chains that span component
+   or process boundaries (see Phase 4a).
+
+Do NOT write the final report until layer 3 is complete for the top
+contributors, up to 5, using the most detailed evidence available.
+If fewer than 5 contributors exist, analyze all of them. If available
+evidence does not support layer-3 completion for some contributors, you
+MAY proceed to the final report only if you explicitly document the
+limitation, identify which contributors remain inconclusive, and state
+what additional evidence would be needed. Premature reporting without
+this disclosure produces surface-level findings that miss the actual
+root cause.
+
 ## Phase 4: Root Cause Identification
 
 1. Distinguish between the **root cause** (fundamental defect) and the
@@ -73,6 +102,29 @@ For each hypothesis, starting with the most plausible:
 3. Ask: "If we fix only the proximate cause, will the root cause produce
    other failures?" If yes, the fix is incomplete.
 
+## Phase 4a: Cross-Component Causal Chains
+
+When the investigation involves multiple components, processes, or
+subsystems, trace causal chains across boundaries:
+
+1. **Identify trigger-response pairs**: Does activity in component A
+   cause work in component B? For example, a file write by one process
+   may trigger scanning by an antivirus service, which triggers hashing
+   by an EDR agent, which triggers network inspection by another service.
+2. **Map the amplification cascade**: A single action may fan out into
+   disproportionate downstream work. Document the full chain:
+   `Trigger → Reactor₁ → Reactor₂ → ... → Observed symptom`.
+3. **Quantify amplification**: For each link in the chain, estimate the
+   cost ratio (e.g., "1 file write triggers 3 scan operations, each
+   consuming 50ms of CPU"). The amplification factor often explains why
+   a seemingly minor activity produces outsized impact.
+4. **Identify the leverage point**: The most effective fix targets the
+   link in the chain with the highest amplification factor, not
+   necessarily the initial trigger or the final symptom.
+
+Skip this phase when the investigation is confined to a single component
+with no cross-boundary interactions.
+
 ## Phase 5: Remediation
 
 1. Propose a fix for the **root cause**, not just the symptom.
diff --git a/templates/investigate-bug.md b/templates/investigate-bug.md
index 2d1689d..acafde2 100644
--- a/templates/investigate-bug.md
+++ b/templates/investigate-bug.md
@@ -85,6 +85,10 @@ and producing a structured investigation report.
   - Identify tests that would have caught this bug
   - Suggest defensive measures to prevent recurrence
 
+8. **Use the full investigation report format**.
+   Root cause investigation requires the causal chain, prevention, and
+   open questions sections — do not use the abbreviated format.
+
 ## Non-Goals
 
 Explicitly define what is OUT OF SCOPE for this investigation.
diff --git a/templates/investigate-trace.md b/templates/investigate-trace.md
new file mode 100644
index 0000000..f3492f9
--- /dev/null
+++ b/templates/investigate-trace.md
@@ -0,0 +1,218 @@
+
+
+
+---
+name: investigate-trace
+description: >
+  Systematically investigate a performance, power, or behavioral issue
+  using profiling traces, ETW/ETL captures, or telemetry data. Apply
+  root cause analysis with iterative deepening and produce an
+  investigation report.
+persona: systems-engineer
+protocols:
+  - guardrails/anti-hallucination
+  - guardrails/self-verification
+  - guardrails/operational-constraints
+  - reasoning/root-cause-analysis
+format: investigation-report
+params:
+  problem_description: "Natural language description of the issue under investigation"
+  trace_context: "Trace capture method, providers/profiles used, and analysis tool capabilities"
+  environment: "OS, hardware, workload scenario, and capture conditions"
+input_contract: null
+output_contract:
+  type: investigation-report
+  description: >
+    A structured investigation report with findings, root cause analysis,
+    evidence from trace data, and remediation plan.
+---
+
+# Task: Investigate Trace
+
+You are tasked with investigating a performance, power, or behavioral issue
+using profiling trace data and producing a structured investigation report.
+
+## Inputs
+
+**Problem Description**:
+{{problem_description}}
+
+**Trace / Telemetry Context**:
+{{trace_context}}
+
+**Environment**:
+{{environment}}
+
+## Instructions
+
+1. **Apply the root-cause-analysis protocol** systematically:
+   - Characterize the symptom precisely
+   - Generate 3–5 competing hypotheses before investigating any
+   - Evaluate evidence for each hypothesis
+   - Apply iterative deepening (Phase 3a): broad survey → attribution →
+     deep analysis → cross-component tracing
+   - Apply cross-component causal chain analysis (Phase 4a) when
+     multiple processes or components are involved
+   - Identify the root cause, not just the proximate trigger
+
+2. **Apply the anti-hallucination protocol** throughout:
+   - Base analysis ONLY on the provided trace data and context
+   - Direct observations from trace queries (metrics, measurements,
+     counters) have implicit KNOWN status
+   - Causal explanations and correlations MUST be explicitly labeled
+     as INFERRED or [ASSUMPTION]
+   - If you cannot determine the root cause from the available data,
+     say so and describe exactly what additional traces or data
+     categories are needed
+   - Do NOT fabricate process names, PIDs, metric values, or trace
+     events that are not evidenced in the provided data
+
+3. **Format the output** according to the investigation-report format
+   specification. **Use the full investigation report format** (all
+   sections). Root cause investigation requires the causal chain,
+   prevention, and open questions sections — do not use the abbreviated
+   format.
+
+4. **Call stack analysis is primary** — not optional:
+   - For each top contributor identified in the broad survey, obtain
+     call stacks grouped by process and thread
+   - Identify the dominant call chains — these reveal the actual
+     workload (e.g., file scanning vs. idle polling vs. network
+     inspection vs. background sync)
+   - Module-level attribution only tells you *where* — call stacks
+     tell you *why*. Do NOT stop at module-level attribution.
+   - When call stacks are unavailable, state this as a limitation and
+     describe what the stacks would have revealed
+
+5. **Energy-vs-metric divergence analysis**:
+   - Compare each process's CPU sample percentage against its energy
+     estimation percentage (or equivalent resource metric)
+   - Processes with disproportionately high energy relative to CPU
+     time indicate frequent wake/sleep patterns that prevent deep
+     idle states — these are often worse for battery life than
+     processes with high sustained CPU
+   - Flag any process where the energy-to-CPU ratio exceeds 3:1 as
+     a high-priority finding. When CPU% is below 1%, do not rely on
+     the ratio alone — only elevate to high priority when energy%
+     is also significant (≥ 3%); otherwise note it as a
+     low-confidence anomaly
+
+6. **Cross-process amplification analysis**:
+   - Analyze whether background processes amplify each other's impact
+   - A file write by Process A may trigger scans by Process B,
+     hashing by Process C, and network inspection by Process D
+   - Trace these causal chains across process boundaries
+   - Document the full amplification cascade:
+     `Trigger → Reactor₁ → Reactor₂ → ... → Observed symptom`
+   - This "amplification cascade" is often the true root cause of
+     death-by-a-thousand-cuts performance or power drain
+
+7. **Apply the self-verification protocol** before finalizing:
+   - Sample at least 3–5 specific findings and re-verify against
+     the trace data
+   - Ensure every causal claim is labeled INFERRED or [ASSUMPTION]
+   - Confirm coverage: state which data categories were examined and
+     which were not
+
+8. **Apply the operational-constraints protocol** when working with
+   the trace:
+   - Scope by data categories and time ranges before querying
+   - Prefer deterministic methods (structured queries, aggregations)
+   - Document your query strategy for reproducibility
+   - Retrieve summary data first, drill into detail only for top
+     contributors
+
+9. **Remediation must be specific**:
+   - Provide concrete fix recommendations (e.g., specific registry
+     keys, power settings, driver configuration, scheduled task
+     changes, service configuration, `powercfg` commands), not
+     vague advice
+   - Assess the risk of each proposed fix
+   - Identify monitoring or alerting that would have caught this
+     earlier
+   - Suggest defensive measures to prevent recurrence
+
+## Analysis Steps
+
+Process the trace systematically using iterative deepening:
+
+1. **Process the trace** with relevant data categories (e.g., CPU
+   sampling, energy estimation, disk I/O, processor frequency,
+   interrupt handling, processor idle states, device power state,
+   process metadata, services)
+2. **Broad survey**: Query top consumers by primary metric (CPU
+   samples, energy estimation, disk I/O bytes) grouped by process.
+   Rank by impact.
+3. **Call stack analysis**: For the top 5–10 consumers, obtain call
+   stacks. Identify dominant call chains to understand *what* each
+   process was actually doing.
+4. **Divergence check**: Compare CPU percentage vs. energy percentage
+   for each top consumer. Flag disproportionate energy consumers.
+5. **Cross-process tracing**: Identify amplification cascades where
+   one process's activity triggers work in others.
+6. **Supplementary analysis**: Check for:
+   - Timer resolution requests preventing deep idle states
+   - Interrupt/DPC activity and wake sources
+   - Disk I/O patterns during expected-idle periods
+   - Power state transitions and frequency scaling
+   - Background service and scheduled task activity
+   - Network-related wake events
+7. **Synthesize**: Combine all layers into a coherent root cause
+   analysis with causal chains.
+
+## Non-Goals
+
+Explicitly define what is OUT OF SCOPE for this investigation.
+State each non-goal clearly so the investigation does not expand
+beyond its intended boundaries.
+Examples:
+
+- Do NOT investigate application-level bugs in the processes found —
+  only identify them as contributors and recommend actions.
+- Do NOT attempt to modify system configuration directly — only
+  recommend changes.
+- Do NOT investigate hardware defects (e.g., battery health,
+  component failures).
+
+Adjust these non-goals based on the specific investigation context
+provided in {{problem_description}}.
+
+## Investigation Plan
+
+Before beginning analysis, produce a concrete step-by-step plan
+tailored to this specific investigation. The plan should:
+
+1. **Identify data categories**: Which trace data categories are
+   relevant to this investigation?
+2. **Define time ranges**: What time periods are relevant (idle
+   periods, workload periods, transitions)?
+3. **Enumerate metrics**: What metrics will be queried at each
+   iterative deepening layer?
+4. **Plan cross-process analysis**: Which processes are likely
+   to interact, and what causal chains should be checked?
+5. **Report**: Produce the output according to the specified format.
+
+This plan replaces ad-hoc exploration with systematic analysis.
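The divergence rule from instruction 5 above reduces to a small classification per process. A sketch under the thresholds stated there (3:1 energy-to-CPU ratio, 1% CPU floor, 3% energy floor); the process names and percentages are hypothetical example data:

```python
# Sketch of the energy-vs-CPU divergence rule: flag processes whose
# energy share is disproportionate to their CPU share. Thresholds are
# the ones stated in instruction 5; input rows are hypothetical.
def classify_divergence(cpu_pct, energy_pct):
    """Return a finding priority for one process's CPU% vs energy%."""
    if cpu_pct > 0 and energy_pct / cpu_pct <= 3.0:
        return "normal"
    # Ratio exceeds 3:1 (or CPU% is zero): guard against noise at very
    # low CPU% by also requiring a significant energy share.
    if cpu_pct < 1.0:
        return "high" if energy_pct >= 3.0 else "low-confidence anomaly"
    return "high"

# Hypothetical broad-survey rows: (process, cpu%, energy%)
survey = [("svc_a.exe", 4.0, 5.0), ("svc_b.exe", 0.5, 4.5), ("svc_c.exe", 0.2, 0.9)]
for name, cpu, energy in survey:
    print(name, "->", classify_divergence(cpu, energy))
```

In this sketch, `svc_b.exe` (0.5% CPU, 4.5% energy) is the classic frequent-waker pattern the instruction targets: a 9:1 ratio with a significant energy share, so it is elevated to high priority despite its tiny CPU footprint.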
+
+## Quality Checklist
+
+Before finalizing, verify:
+
+- [ ] Every finding cites specific evidence from the trace (process
+      name, PID, metric values, timestamps, and call stacks when
+      available; if stack data is unavailable, document what would
+      be needed to obtain it)
+- [ ] Every finding has a severity rating with justification
+- [ ] Root cause is identified, not just the proximate trigger
+- [ ] Iterative deepening completed: broad survey → module → stack →
+      cross-process for the top contributors (up to 5), limited by
+      available meaningful contributors and stack data
+- [ ] Energy-vs-CPU divergence checked for top consumers where both
+      energy and CPU data are available
+- [ ] Cross-process amplification cascades documented where present
+- [ ] Remediation recommendations are specific and actionable
+- [ ] At least 3 findings have been re-verified against the trace data
+- [ ] Coverage statement documents what data categories were and were
+      not examined, including any limitation where fewer than 5
+      contributors were analyzable or stack/energy data was unavailable
+- [ ] No fabricated process names, PIDs, or metric values — unknowns
+      marked with [UNKNOWN: ]
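As a closing illustration of the amplification-cascade quantification (Phase 4a, step 3, and instruction 6): the chain below follows the file-write example from the text, with hypothetical per-link costs, and picks the leverage point as the link contributing the most cost per trigger, which is one reasonable reading of "highest amplification factor".

```python
# Sketch: quantify a hypothetical amplification cascade link by link.
# fan_out tracks how many downstream operations one trigger produces;
# the leverage point is the link contributing the most cost per trigger.
# All names and numbers are invented for illustration.
cascade = [
    # (link, downstream ops per upstream op, CPU ms per downstream op)
    ("file write -> AV scan", 3, 50.0),
    ("AV scan -> EDR hashing", 1, 20.0),
    ("EDR hashing -> network inspection", 2, 5.0),
]

fan_out = 1
total_ms = 0.0
per_link_cost = {}
for link, ops, ms_per_op in cascade:
    fan_out *= ops                      # cumulative amplification factor
    per_link_cost[link] = fan_out * ms_per_op
    total_ms += per_link_cost[link]

leverage_point = max(per_link_cost, key=per_link_cost.get)
print(f"total cost per trigger: {total_ms:.0f} ms")
print(f"leverage point: {leverage_point}")
```

Here a single file write fans out into 6 downstream operations and roughly 240 ms of CPU work, and the first link (1 write → 3 scans at 50 ms each) dominates, so remediation should target the scan trigger rather than the final network-inspection symptom.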