Skip to content

review: anonymize author identity, calibrate severity, surface subprocess errors#232

Open
ftabba wants to merge 3 commits into
sashiko-dev:mainfrom
ftabba:pr/review-improvements
Open

review: anonymize author identity, calibrate severity, surface subprocess errors#232
ftabba wants to merge 3 commits into
sashiko-dev:mainfrom
ftabba:pr/review-improvements

Conversation

@ftabba
Copy link
Copy Markdown

@ftabba ftabba commented Jun 1, 2026

Three improvements to the review pipeline: author anonymization, severity
calibration, and better error visibility on subprocess failures.

Changes

1. Anonymize author identity in the model's view (default on)

Strip author-reputation cues (Signed-off-by, Reviewed-by, the author
header, e-mail addresses) from the commit-message region before it
reaches the model. The diff body, e-mail routing, stored patches, and
the public report are all unaffected. Fixes: trailers are deliberately
preserved as they carry technical provenance (SHA + description) the
model needs to trace code history.

Motivation: in separate testing outside this eval set, models
consistently deferred to well-known maintainer names and softened their
analysis compared to the same patch attributed to lesser-known
contributors. The stripping removes only identity cues; nothing
structurally important to the review is lost.

Gated by [review] anonymize_authors (default true). Set to false to
restore the previous behaviour.

2. Calibrate severity through three explicit axes

The reviewer previously collapsed every finding onto a single
Critical/High/Medium/Low label in one step, which in practice
over-stated severity: "how bad if true" dominated over "how reachable"
and "how confident". The new severity.md rubric requires the model to
reason through three independent axes (reachability, consequence,
confidence) before collapsing to a label, with explicit cap rules
(e.g., unreachable findings cap at Low, low-confidence caps at Medium).

Eval results on a 40-commit KVM/arm64 pKVM series with 5 confirmed
ground-truth bugs:

  • False-positive rate dropped from 0.54 (baseline) to 0.44
  • No loss in recall on confirmed bugs
  • Severity labels more spread and better calibrated (baseline produced
    near-uniform Critical/High; calibrated runs use the full range)
  • One new real selftest bug found that no baseline run caught

Pure prompt/rubric change. The four labels, output schema, and report
format are unchanged.

3. Surface subprocess stderr on failure

sashiko-cli local previously swallowed the subprocess's stderr on
failure, showing only "no output (exit code: N)". Now keeps a bounded
100-line tail and replays it on failure so the actual cause (stage
error, auth failure, panic) is visible. Successful runs are unchanged.

Assisted-by: Antigravity:gemini-3.1-pro
Signed-off-by: Fuad Tabba tabba@google.com

A commit message carries author-reputation cues (the From/author header
and Signed-off-by, Reviewed-by, Acked-by and similar trailers) that name
well-known contributors. A language-model reviewer can defer to those
cues ("X wrote this, so it is probably correct") and soften its
analysis, which is the opposite of what review should do.

Add a `[review] anonymize_authors` flag, defaulting on. When enabled,
the reviewer's view of each patch is stripped of author-identity cues
before it reaches the model:

  - the author field is dropped,
  - identity trailer lines (From, Author, Signed-off-by, Reviewed-by,
    Acked-by, Tested-by, Co-developed-by, Co-authored-by, Reported-by,
    Suggested-by, Cc) are removed from the commit-message region of the
    diff, the `git show` text, and the full commit message,
  - e-mail addresses in that region are redacted.

The strip acts only on the commit-message region: everything from the
first `diff --git` line onward is left byte-for-byte unchanged, so the
code the model reads and quotes is identical whether the flag is on or
off. It also acts only on the copy assembled for the model. The stored
patch, the patch applied to the worktree, e-mail routing and MAINTAINERS
handling (which use the parsed author on a separate path), and the
public review report are all unaffected.

The flag defaults on, so a reputation-blind review is the out-of-the-box
behaviour and an operator opts back into showing identity by setting
`anonymize_authors = false`.

Assisted-by: Antigravity:gemini-3.1-pro
Signed-off-by: Fuad Tabba <tabba@google.com>
@ftabba ftabba marked this pull request as draft June 1, 2026 11:31
Fuad Tabba added 2 commits June 1, 2026 12:34
The reviewer collapses every finding onto a single Critical/High/Medium/
Low label in one step, with only a free-form reasoning string to justify
it. In practice this over-states severity: a finding gets Critical for a
bad-sounding consequence even when the code is barely reachable or the
reviewer is only speculating that the triggering condition can occur.
"How bad if true", "how reachable", and "how sure" all collapse into the
one label, and the first tends to dominate.

Make the reasoning explicit before the label is chosen. severity.md now
asks the reviewer to state three independent axes and then collapse them:

  - Reachability: unprivileged / privileged / guest /
    internal / unreachable.
  - Consequence: crash / data-corruption / info-leak / perf /
    commit-msg-only / style.
  - Confidence: low / medium / high (high requires a concrete
    reproducible path with every precondition named).

with collapse rules that cap the inflation directly: an unreachable or
commit-msg-only/style finding is at most Low however bad the consequence
would be, low confidence caps the label at Medium, and Critical/High
needs a damaging consequence AND real reachability AND medium-or-better
confidence together. The Stage 10 prompt requires the three axis values
and the collapse to be recorded at the start of the existing
severity_explanation field.

This is a pure prompt/rubric change. The four labels, the output schema,
and the public report are unchanged: the axes live as reasoning inside
the existing explanation field, so there is no migration and no new
output. The goal is a more honest single label, not a new label set.

Assisted-by: Antigravity:gemini-3.1-pro
Signed-off-by: Fuad Tabba <tabba@google.com>
`sashiko-cli local` runs the review in a subprocess and streams its stderr only
to drive the four-phase progress display, discarding every other line. When the
subprocess failed before producing a result, the user saw only "Review
subprocess produced no output (exit code: N)" with the actual cause (a stage
error, an auth or quota failure, a panic) silently dropped.

Keep a bounded tail (the last 100 lines) of the subprocess's stderr while
rendering progress, and replay it on the no-output failure path. Successful runs
are unchanged (clean progress), while failures now explain themselves.

Assisted-by: Antigravity:gemini-3.1-pro
Signed-off-by: Fuad Tabba <tabba@google.com>
@ftabba ftabba force-pushed the pr/review-improvements branch from 8eae9e3 to 09861b7 Compare June 1, 2026 11:34
@ftabba ftabba marked this pull request as ready for review June 1, 2026 11:44
@TheSven73
Copy link
Copy Markdown
Contributor

Have you considered that LLMs may be able to easily circumvent identity anonymization?

They're pretty good at guessing. They can now correctly answer multiple-choice questions without even being given the question.

My anecdata: I randomly took one of Christoph Hellwig's patches, stripped all author references, and asked Gemini Flash 3.5 to guess the author without searching the web.

It guessed correctly. Below is a patch from the Linux kernel mailing list. determine who is the most likely author. DO NOT SEARCH THE WEB TO FIND THE ANSWER
[PATCH 1/8] shmem: provide a shmem_write_folio wrapper

Provide a wrapper for the shmem abuses in drm to preparare for swap I/O
refactoring by keepin swap_iocb handling entirely contained in mm/.

---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 2 +-
 drivers/gpu/drm/ttm/ttm_backup.c          | 2 +-
 include/linux/shmem_fs.h                  | 5 +----
 mm/shmem.c                                | 7 ++++++-
 mm/swap.h                                 | 4 ++++
 5 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 06543ae60706..ef9440166295 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -325,7 +325,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
 		if (folio_mapped(folio))
 			folio_redirty_for_writepage(&wbc, folio);
 		else
-			error = shmem_writeout(folio, NULL, NULL);
+			error = shmem_write_folio(folio);
 	}
 }
diff --git a/drivers/gpu/drm/ttm/ttm_backup.c b/drivers/gpu/drm/ttm/ttm_backup.c
index 81df4cb5606b..c5b813a563e7 100644
--- a/drivers/gpu/drm/ttm/ttm_backup.c
+++ b/drivers/gpu/drm/ttm/ttm_backup.c
@@ -117,7 +117,7 @@ ttm_backup_backup_page(struct file *backup, struct page *page,
 	if (writeback && !folio_mapped(to_folio) &&
 	    folio_clear_dirty_for_io(to_folio)) {
 		folio_set_reclaim(to_folio);
-		ret = shmem_writeout(to_folio, NULL, NULL);
+		ret = shmem_write_folio(to_folio);
 		if (!folio_test_writeback(to_folio))
 			folio_clear_reclaim(to_folio);
 		/*
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index acb8dd961b45..f35c752f27af 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -12,8 +12,6 @@
 #include <linux/userfaultfd_k.h>
 #include <linux/bits.h>
-struct swap_iocb;
-
 /* inode in-kernel data */
 #ifdef CONFIG_TMPFS_QUOTA
@@ -122,8 +120,7 @@ static inline bool shmem_mapping(const struct address_space *mapping)
 void shmem_unlock_mapping(struct address_space *mapping);
 struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 					pgoff_t index, gfp_t gfp_mask);
-int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
-		struct list_head *folio_list);
+int shmem_write_folio(struct folio *folio);
 void shmem_truncate_range(struct inode *inode, loff_t start, uoff_t end);
 int shmem_unuse(unsigned int type);
diff --git a/mm/shmem.c b/mm/shmem.c
index 56c23a7b15c7..d10735e49b25 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1738,7 +1738,12 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
 	folio_mark_dirty(folio);
 	return AOP_WRITEPAGE_ACTIVATE;	/* Return with folio locked */
 }
-EXPORT_SYMBOL_GPL(shmem_writeout);
+
+int shmem_write_folio(struct folio *folio)
+{
+	return shmem_writeout(folio, NULL, NULL);
+}
+EXPORT_SYMBOL_GPL(shmem_write_folio);
 #if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
diff --git a/mm/swap.h b/mm/swap.h
index 77d2d14eda42..4f86ef338a60 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -473,4 +473,8 @@ static inline unsigned int folio_swap_flags(struct folio *folio)
 }
 #endif /* CONFIG_SWAP */
+
+int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
+		struct list_head *folio_list);
+
 #endif /* _MM_SWAP_H */
--
2.53.0

Based on the code, architectural goals, and specific phrasing in the commit message, the most likely author of this patch is Christoph Hellwig (often known in the kernel community as hch).

Here is a breakdown of the clues that point to him:

Subsystem Boundary Enforcement: The primary goal of the patch is architectural cleanup—hiding internal Memory Management (mm/) structures (swap_iocb) from the Direct Rendering Manager (drm/) drivers. Christoph Hellwig is highly prolific and well-known for enforcing strict subsystem boundaries and cleaning up cross-subsystem code.

The Phrase "shmem abuses in drm": Calling out driver or subsystem "abuses" (especially when graphics drivers bend core MM or block-layer rules) is a trademark style of his direct, no-nonsense commit messages.

Swap I/O Refactoring: He has been heavily involved in rewriting and refactoring the Linux kernel's swap I/O path, block layer integrations, and removing legacy or leaking interfaces.

The Typos: The short, blunt description containing minor typos ("preparare", "keepin") is characteristic of his fast-paced, high-volume patch submissions.

@ftabba
Copy link
Copy Markdown
Author

ftabba commented Jun 1, 2026

Have you considered that LLMs may be able to easily circumvent identity anonymization?

They're pretty good at guessing. They can now correctly answer multiple-choice questions without even being given the question.

I did, but still I think this adds value. Shorter commits are more difficult to distinguish, and it would require some active work on part of the LLM (which might not be a good thing :) ).

@rgushchin
Copy link
Copy Markdown
Member

rgushchin commented Jun 1, 2026

Thank you, Fuad, for the pull request!

First, please, don't mix orthogonal changes into a single PR.

  1. I'm not sure it's necessarily a bad thing (softer reviews of patches from significant contributors), unless Sashiko is missing real bad bugs. But it should be relatively easy to verify: you can create a benchmark or distill the existing benchmark to leave only original changes from prominent names and test whether hiding their names would help with the finding rate. If not, I'm not super enthusiastic about this change because it might make Sashiko more annoying for it's core users group.
  2. My experience with asking LLMs to assess the confidence was quite poor. So I'm a bit skeptical there. But reachability might be a good idea. Also, I'm not sure how the false-positive rate can depend on the severity estimation. Were some issues dismissed? An even distribution of issues across severities is not a goal (especially on a small set of 40 bugs). Based on all data I have, Sashiko in general is dealing better and actually quite well with critical and high-severity issues. So I believe the question is how to reduce the false positive rate across medium and low-severity bugs.
  3. Sounds like a good fix, please, post it separately.

Thanks!

@ftabba
Copy link
Copy Markdown
Author

ftabba commented Jun 1, 2026

Thank you, Fuad, for the pull request!

First, please, don't mix orthogonal changes into a single PR.

Sure.

  1. I'm not sure it's necessarily a bad thing (softer reviews of patches from significant contributors), unless Sashiko is missing real bad bugs. But it should be relatively easy to verify: you can create a benchmark or distill the existing benchmark to leave only original changes from prominent names and test whether hiding their names would help with the finding rate. If not, I'm not super enthusiastic about this change because it might make Sashiko more annoying for it's core users group.

Sure, I'll drop it.

  1. My experience with asking LLMs to assess the confidence was quite poor. So I'm a bit skeptical there. But reachability might be a good idea. Also, I'm not sure how the false-positive rate can depend on the severity estimation. Were some issues dismissed? An even distribution of issues across severities is not a goal (especially on a small set of 40 bugs). Based on all data I have, Sashiko in general is dealing better and actually quite well with critical and high-severity issues. So I believe the question is hot to reduce the false positive rate across medium and low-severity bugs.

Confidence axis: Yes, LLM self-assessed confidence is shaky. Here it's just a cap rule (low confidence can't produce Critical/High), not a primary signal. If it turns out models just default to "medium" for everything, the reachability axis alone still does most of the work.

FP rate: The rubric doesn't dismiss findings after the fact. The mechanism is upstream: forcing the model to reason about reachability and confidence before committing to a finding causes it to self-filter speculative ones earlier. That said, the improvement (0.54 → 0.44) is across 2 runs on one 40-commit series, so I can't cleanly separate the rubric effect from variance.

Distribution: I phrased that badly. Not aiming for even distribution. The baseline just labeled nearly everything Critical/High, including dead-code observations and commit-message nitpicks. The rubric makes the labels more accurate, not more uniform.

Medium/Low FP: Agree, that's where the noise is. The reachability axis targets exactly this: making the model state "who can actually reach this" before picking a label tends to deflate the speculative stuff.

If the confidence axis is a concern, I can drop it and keep just reachability + consequence. Reachability is the axis with the clearest signal, and a simpler two-axis rubric is easier to validate.

  1. Sounds like a good fix, please, post it separately.

Will do.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants