PMI co-change, anomaly, and incomplete-change detection #7
Merged
…ge detection

New reports:
- co_change_report: fuses PMI (CoOccurrenceMatrix) with git co-change history into a single fused_score per file pair
- anomaly_report: finds identifier pairs with PMI outlier scores, filtered to code-only identifiers with prose-token suppression
- incomplete_change_report: given a diff or a list of changed files, flags files with high co-change affinity that were NOT modified

New CLI modes:
- 'quale core risk --mode co-change': co-change prediction
- 'quale core risk --mode anomaly': PMI anomaly detection
- 'quale core verify --mode incomplete': incomplete change detection

New CoOccurrenceMatrix methods:
- pmi(a, b): log2(P(a,b) / (P(a) * P(b)))
- top_pmi_for(phrase): ranked PMI partners

16 new tests: PMI symmetry, PMI zero-invariants, co-change report schema, anomaly statistics, incomplete-change detection.
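For readers unfamiliar with PMI, the formula above can be sketched in a few lines of Python. This is an illustration only, not the code in quale/analyze.py: the class name `ToyCoOccurrence`, the per-document counting scheme, the `k` parameter, and the choice to return 0.0 for pairs that never co-occur are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


class ToyCoOccurrence:
    """Hypothetical stand-in for CoOccurrenceMatrix (not quale's actual class)."""

    def __init__(self, documents):
        # Each document is an iterable of tokens; a token counts once per document.
        self.n_docs = 0
        self.single = Counter()
        self.pair = Counter()
        for doc in documents:
            tokens = sorted(set(doc))
            self.n_docs += 1
            self.single.update(tokens)
            # Pairs are stored in sorted order so (a, b) and (b, a) share a key.
            self.pair.update(combinations(tokens, 2))

    def pmi(self, a, b):
        # log2(P(a,b) / (P(a) * P(b))).
        # Assumption: pairs that never co-occur score 0.0 instead of -infinity.
        joint = self.pair[tuple(sorted((a, b)))]
        if joint == 0:
            return 0.0
        p_ab = joint / self.n_docs
        p_a = self.single[a] / self.n_docs
        p_b = self.single[b] / self.n_docs
        return math.log2(p_ab / (p_a * p_b))

    def top_pmi_for(self, phrase, k=5):
        # Rank every other known token by its PMI with `phrase`, highest first.
        scored = [(other, self.pmi(phrase, other))
                  for other in self.single if other != phrase]
        return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]
```

Because pair keys are stored in sorted order, `pmi(a, b)` and `pmi(b, a)` read the same counts, which is the symmetry property the tests list mentions.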
Summary
Three new reports and CLI modes built on PMI (Pointwise Mutual Information):
- `core risk --mode co-change`: Fuses PMI (vocabulary coupling) with git co-change history. Tells agents which files to read before editing, and tells humans which files they might have forgotten.
- `core risk --mode anomaly`: Finds identifier pairs with outlier PMI scores. Restricted to code-only identifiers, with prose tokens suppressed. 3,042 scored pairs on questo repo, mean PMI 2.32.
- `core verify --mode incomplete`: Given a diff, flags files with high co-change affinity that were NOT modified. Catches forgotten files before merge.

Also adds `CoOccurrenceMatrix.pmi(a, b)` and `top_pmi_for(phrase)` methods to `analyze.py`.

What changed
- `quale/analyze.py`: `pmi()`, `top_pmi_for()` methods
- `quale/reports/__init__.py`: `co_change_report`, `anomaly_report`, `incomplete_change_report`
- `quale/cli.py`: `risk_cmd` and `verify_cmd`
- `tests/test_reports.py`

Verification
- `core risk --mode co-change --format json`: returns structured predictions with `fused_score`
- `core risk --mode anomaly --format json`: returns PMI outliers with `statistics`
- `core verify --mode incomplete --files quale/cli.py`: detects missing co-change partners
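The idea behind the incomplete-change check can be illustrated with a small sketch. It is not the implementation in `quale/reports/__init__.py`: the function names, the Jaccard-style affinity computed from commit history alone, and the 0.5 threshold are assumptions chosen for the example, and the PR does not state how its own affinity or `fused_score` is computed.

```python
from collections import Counter
from itertools import combinations


def co_change_affinity(commits):
    """Jaccard-style affinity per file pair (an assumed metric, not quale's).

    `commits` is a list of file-path lists, one list per commit.
    """
    touched = Counter()
    together = Counter()
    for files in commits:
        unique = sorted(set(files))
        touched.update(unique)
        together.update(combinations(unique, 2))
    # Commits touching both files divided by commits touching either file.
    return {
        pair: n / (touched[pair[0]] + touched[pair[1]] - n)
        for pair, n in together.items()
    }


def flag_missing_partners(affinity, changed_files, threshold=0.5):
    """Return (file, score) for unmodified files strongly tied to a changed file."""
    changed = set(changed_files)
    best = {}
    for (a, b), score in affinity.items():
        if score < threshold:
            continue
        # Only pairs with exactly one changed side point at a missing partner.
        if (a in changed) != (b in changed):
            missing = b if a in changed else a
            best[missing] = max(best.get(missing, 0.0), score)
    # Highest affinity first, ties broken by path for stable output.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

Calling `flag_missing_partners(co_change_affinity(history), changed)` with a commit history and the files in a diff yields the unmodified files worth a second look, which is the kind of result the `--mode incomplete` check reports.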