Conversation
…st_distance get_highest_distance had an incorrect specification for its return value; the simpler fix was to change the specification but a better fix for later would be to switch the tuple order to be what we originally wanted
MBJean
left a comment
There was a problem hiding this comment.
The new tests LGTM, thanks for taking care of that! The one thing I'll want to consider is: we're duplicating the common testing constants and texts that are currently in corpus_analysis/testing. Could we just use what already exists and so reduce the amount of additional material we're adding in this PR?
I spiked out a rearchitecture in PR #163 that might make using these existing files make more sense, as they're brought into a 'testing' module separate from 'corpus_analysis' and what is currently in 'gender_analysis'.
|
|
||
| # Get all of the medians for the documents | ||
| for document in results: | ||
| for gender in results['document']: |
There was a problem hiding this comment.
I think that makes sense in terms of not reduplicating. Might be best to just hold off merging this until that PR is merged and then any changes that need to be made to this PR can be fixed and merged in.
Codecov Report
@@ Coverage Diff @@
## master #162 +/- ##
==========================================
+ Coverage 60.64% 64.16% +3.51%
==========================================
Files 13 13
Lines 1649 1649
Branches 436 436
==========================================
+ Hits 1000 1058 +58
+ Misses 575 515 -60
- Partials 74 76 +2
Continue to review full report at Codecov.
|
Specific testing for instance_distance.py to increase coverage.
Uses the small test files so that all of the statistics and numbers are correct/can be done by hand. Was going to extend coverage on other files, though, I wasn't sure how to calculate those values by hand.