
KL Divergence: use epsilon smoothing for zero-frequency events #141

Open
taserz wants to merge 1 commit into evllabs:master from taserz:fix/kl-divergence-epsilon-smoothing


taserz commented May 12, 2026

Fixes #139

The divergence function was only iterating over events in the unknown histogram and skipping any event where the known histogram had a frequency of zero. So if a word appears in the unknown document but not the known one, that term gets dropped entirely. That understates the divergence and throws away a real signal.
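For reference, a sketch of why the skipped terms matter, assuming the function computes the standard Kullback-Leibler divergence with $P$ as the unknown histogram and $Q$ as the known histogram (the direction and notation are assumptions; the PR text does not state them):

$$
D_{\mathrm{KL}}(P \parallel Q) = \sum_{x} P(x) \, \log \frac{P(x)}{Q(x)}
$$

A term with $P(x) > 0$ and $Q(x) = 0$ is undefined (it diverges to $+\infty$), so it cannot be evaluated directly. Skipping it treats its contribution as $0$, which is the smallest value it could take rather than a large one, and that is how the total ends up understated.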

Switched to iterating over the union of both histograms and substituting epsilon (1e-10) for any zero known-frequency term instead of skipping it. For events where both frequencies are already positive nothing changes, so existing test values are unaffected.
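A minimal Python sketch of the before and after behavior described above. It is not the code in this PR: the function names, the dict-based histograms, the natural logarithm, and the choice to let events with zero unknown frequency contribute nothing are all assumptions made for illustration. Only the epsilon value (1e-10) and the union iteration come from the description.

```python
import math
from collections import Counter

EPSILON = 1e-10  # value quoted in the PR description


def relative_frequencies(events):
    """Build a histogram of relative frequencies from a list of events."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}


def kl_divergence_skipping(unknown, known):
    """Old behavior as described: iterate over the unknown histogram only
    and skip any event whose known frequency is zero."""
    total = 0.0
    for event, p in unknown.items():
        q = known.get(event, 0.0)
        if q == 0.0:
            continue  # the term is dropped entirely
        total += p * math.log(p / q)
    return total


def kl_divergence_smoothed(unknown, known, epsilon=EPSILON):
    """New behavior as described: iterate over the union of both histograms
    and substitute epsilon for a zero known frequency."""
    total = 0.0
    for event in unknown.keys() | known.keys():
        p = unknown.get(event, 0.0)
        if p == 0.0:
            continue  # assumption: 0 * log(0 / q) is taken to be 0
        q = known.get(event, 0.0)
        if q == 0.0:
            q = epsilon
        total += p * math.log(p / q)
    return total


# Hypothetical documents: "cat" and "mat" appear only in the unknown one.
unknown = relative_frequencies("the cat sat on the mat".split())
known = relative_frequencies("the dog sat on the log".split())

old_value = kl_divergence_skipping(unknown, known)
new_value = kl_divergence_smoothed(unknown, known)
```

In this example the shared words have identical frequencies in both documents, so the skipping version reports no divergence at all, while the smoothed version reports a positive value driven by the two words the known document never uses. When every event has a positive frequency in both histograms, the two functions compute the same sum.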

Fixes evllabs#139. The divergence function was iterating only over events in
the unknown histogram and silently skipping any event where the known
histogram frequency was zero. Events present in the unknown document
but absent in the known one were dropped entirely, understating the
divergence and discarding a real authorship signal.

Now iterates over the union of both histograms and substitutes epsilon
(1e-10) for any zero known-frequency term instead of skipping it.
Behavior is unchanged for events where both frequencies are positive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
taserz force-pushed the fix/kl-divergence-epsilon-smoothing branch from 342bee9 to 0a6d906 on May 12, 2026 23:48

Development

Successfully merging this pull request may close these issues.

KL Divergence Silently Skips Events
