Angular Separation and Chord Distance: fix cosine similarity calculation#142
Open
taserz wants to merge 1 commit into
Open
Angular Separation and Chord Distance: fix cosine similarity calculation#142taserz wants to merge 1 commit into
taserz wants to merge 1 commit into
Conversation
Fixes evllabs#140. Both functions had the wrong denominator for cosine similarity. They were accumulating the sum of raw frequencies instead of the sum of squared frequencies. For a normalized histogram sum(xi) = 1, so the denominator collapsed to 1 * 1 = 1 and both functions were effectively returning the dot product with no normalization. Fixed by accumulating sum(xi^2) and sum(yi^2) in the loop and using sqrt(sumXX * sumYY) as the denominator, which is the standard L2 norm. Angular Separation now returns arccos(cosine) / pi instead of 1 - cosine. The old formula is a cosine distance approximation, not an actual angular measure. The arccos version gives a properly normalized angle in [0, 1]. Chord Distance's sqrt(2 - 2 * cosine) formula was already mathematically correct and just needed an accurate cosine value to work properly. Test expected values updated: identical vectors now correctly return 0.0 for both functions instead of 0.9 and sqrt(1.8). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
c1cec4d to
b01052a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #140
Both functions had the wrong denominator for cosine similarity. They were summing raw frequencies instead of squared frequencies, which for a normalized histogram means the denominator collapses to sum(xi) * sum(yi) = 1 * 1 = 1. Effectively both were computing the dot product with no normalization at all.
Fixed by accumulating sum(xi^2) and sum(yi^2) in the loop and using sqrt(sumXX * sumYY) as the denominator, which is the standard L2 norm approach.
Angular Separation also now returns arccos(cosine) / pi instead of 1 - cosine. The old formula is a cosine distance approximation, not an actual angular measure. The arccos version gives a properly normalized angle in the range [0, 1].
Chord Distance's sqrt(2 - 2 * cosine) formula was already mathematically correct and just needed an accurate cosine value to work. Test expected values updated to match corrected output — identical vectors now return 0.0 for both functions instead of 0.9 and sqrt(1.8).