PMI co-change, anomaly, and incomplete-change detection #7

Merged
alderpath merged 4 commits into master from feature/incomplete-pmi-anomaly on May 29, 2026

Conversation

@alderpath

Contributor

Summary

Three new reports and CLI modes built on PMI (Pointwise Mutual Information):

  • core risk --mode co-change — Fuses PMI (vocabulary coupling) with git co-change history. Tells agents what files to read before editing. Tells humans what files they might have forgotten.
  • core risk --mode anomaly — Finds identifier pairs with outlier PMI scores. Restricted to code-only identifiers, with prose tokens suppressed. 3,042 scored pairs on questo repo, mean PMI 2.32.
  • core verify --mode incomplete — Given a diff, flags files with high co-change affinity that were NOT modified. Catches forgotten files before merge.
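As a rough illustration of the incomplete-change idea, the sketch below flags unmodified files whose affinity with a changed file exceeds a threshold. The function name `missing_partners`, the shape of the `affinity` mapping, and the 0.5 threshold are all assumptions made for this example and are not taken from `incomplete_change_report`.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical input: one affinity score per unordered file pair.
Affinity = Dict[FrozenSet[str], float]


def missing_partners(
    changed: Iterable[str],
    affinity: Affinity,
    threshold: float = 0.5,  # assumed cutoff, not the tool's default
) -> List[Tuple[str, str, float]]:
    """Return (unmodified_file, changed_partner, score), strongest first."""
    changed_set = set(changed)
    flagged: Dict[str, Tuple[str, float]] = {}
    for pair, score in affinity.items():
        if score < threshold or len(pair) != 2:
            continue
        a, b = tuple(pair)
        for src, dst in ((a, b), (b, a)):
            # Flag dst only when its partner changed and dst itself did not.
            if src in changed_set and dst not in changed_set:
                best = flagged.get(dst)
                if best is None or score > best[1]:
                    flagged[dst] = (src, score)
    rows = [(dst, src, score) for dst, (src, score) in flagged.items()]
    return sorted(rows, key=lambda row: (-row[2], row[0]))


if __name__ == "__main__":
    demo = {
        frozenset({"quale/cli.py", "quale/reports/__init__.py"}): 0.8,
        frozenset({"quale/cli.py", "README.md"}): 0.2,
    }
    for row in missing_partners(["quale/cli.py"], demo):
        print(row)
```

Keeping only the strongest partner per flagged file keeps the output short; a real report might list every changed partner instead.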

Also adds CoOccurrenceMatrix.pmi(a, b) and top_pmi_for(phrase) methods to analyze.py.
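For readers unfamiliar with PMI, here is a minimal, self-contained sketch of how the two methods could behave. `TinyCoOccurrence` is a hypothetical stand-in, not the real `CoOccurrenceMatrix`; counting co-occurrence per document and returning 0.0 for pairs that never co-occur are assumptions of this example.

```python
import math
from collections import Counter
from itertools import combinations


class TinyCoOccurrence:
    """Toy document-level co-occurrence counts (illustrative only)."""

    def __init__(self, documents):
        self.n_docs = 0
        self.term_counts = Counter()
        self.pair_counts = Counter()
        for doc in documents:
            terms = set(doc)
            self.n_docs += 1
            self.term_counts.update(terms)
            self.pair_counts.update(
                frozenset(pair) for pair in combinations(sorted(terms), 2)
            )

    def pmi(self, a, b):
        """log2(P(a,b) / (P(a) * P(b))); 0.0 if the pair never co-occurs."""
        joint = self.pair_counts[frozenset((a, b))]
        if a == b or joint == 0:
            return 0.0  # assumed convention for this sketch
        p_ab = joint / self.n_docs
        p_a = self.term_counts[a] / self.n_docs
        p_b = self.term_counts[b] / self.n_docs
        return math.log2(p_ab / (p_a * p_b))

    def top_pmi_for(self, term, k=5):
        """Partners of `term` with positive PMI, highest score first."""
        scored = [
            (other, self.pmi(term, other))
            for other in self.term_counts
            if other != term
        ]
        positive = [(other, score) for other, score in scored if score > 0]
        return sorted(positive, key=lambda item: (-item[1], item[0]))[:k]
```

Because the pair key is an unordered `frozenset`, `pmi(a, b)` and `pmi(b, a)` read the same count, which is what makes the score symmetric.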

What changed

| File | Change |
| --- | --- |
| quale/analyze.py | Added pmi(), top_pmi_for() methods |
| quale/reports/__init__.py | Added co_change_report, anomaly_report, incomplete_change_report |
| quale/cli.py | Wired new modes into risk_cmd and verify_cmd |
| tests/test_reports.py | 16 new tests (PMI, co-change, anomaly, incomplete) |

Verification

  • core risk --mode co-change --format json — returns structured predictions with fused_score
  • core risk --mode anomaly --format json — returns PMI outliers with statistics
  • core verify --mode incomplete --files quale/cli.py — detects missing co-change partners
  • 16/16 new tests pass, 383/386 total pass (3 pre-existing failures)
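To make "PMI outliers with statistics" concrete, the sketch below shows one plausible way to pick outliers: a z-score against the mean and standard deviation of all scored pairs. The z-score rule, the 2.0 threshold, and the name `pmi_outliers` are assumptions for illustration; the PR does not state which outlier criterion `anomaly_report` uses.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pmi_outliers(
    scores: Dict[Pair, float],
    z_threshold: float = 2.0,  # assumed cutoff for this example
) -> Tuple[List[Tuple[Pair, float, float]], Dict[str, float]]:
    """Return ((pair, pmi, z) rows sorted by |z|, summary statistics)."""
    values = list(scores.values())
    if not values:
        return [], {"count": 0, "mean": 0.0, "stdev": 0.0}
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    stats = {"count": len(values), "mean": mu, "stdev": sigma}
    if sigma == 0:
        return [], stats  # all scores identical, so nothing stands out
    rows = [
        (pair, score, (score - mu) / sigma)
        for pair, score in scores.items()
        if abs(score - mu) / sigma >= z_threshold
    ]
    rows.sort(key=lambda row: -abs(row[2]))
    return rows, stats
```

A robust alternative such as median and MAD would be less sensitive to the outliers themselves; the z-score is used here only because it is the simplest to read.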

alderpath added 4 commits May 28, 2026 23:14
…ge detection

New reports:
- co_change_report — fuses PMI (CoOccurrenceMatrix) with git co-change
  history into a single fused_score per file pair
- anomaly_report — finds identifier pairs with PMI outlier scores,
  filtered to code-only identifiers with prose-token suppression
- incomplete_change_report — given a diff/changed-files, flags files
  with high co-change affinity that were NOT modified

New CLI modes:
- 'quale core risk --mode co-change' — co-change prediction
- 'quale core risk --mode anomaly' — PMI anomaly detection
- 'quale core verify --mode incomplete' — incomplete change detection

New CoOccurrenceMatrix methods:
- pmi(a, b) — log2(P(a,b) / (P(a) * P(b)))
- top_pmi_for(phrase) — ranked PMI partners

16 new tests: PMI symmetry, PMI zero-invariants, co-change report
schema, anomaly statistics, incomplete-change detection.
@alderpath alderpath merged commit 7a85d8f into master May 29, 2026
8 checks passed
@alderpath alderpath deleted the feature/incomplete-pmi-anomaly branch May 29, 2026 06:27