char_list and char_disamb functions in document.py #164
fyang3 commented on May 18, 2021
- added Honorifics
- char_list and char_disamb
- temporarily resolve the proximity/init dependency issue
MBJean left a comment
Overall, great work! I've left some comments below, including a few things that I think should be changed. Let me know what you think.
from gender_analysis.analysis.dependency_parsing import *
from gender_analysis.analysis.dunning import *
from gender_analysis.analysis.gender_frequency import *
from gender_analysis.analysis.instance_distance import *
A note for posterity. This is a temporary measure to prevent circular imports caused by proximity.py importing Corpus for type hinting. PR #163 attempts to address the issue more fundamentally.
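For context, one common way to avoid a circular import that exists only for type hinting is to guard the import with `typing.TYPE_CHECKING`. The sketch below is illustrative only: the import path `gender_analysis.corpus` and the helper `count_documents` are assumptions, not names taken from this PR, and nothing here claims to be what PR #163 does.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime,
    # so it cannot take part in an import cycle.
    # Hypothetical import path, used for illustration only.
    from gender_analysis.corpus import Corpus


def count_documents(corpus: "Corpus") -> int:
    """Hypothetical helper: the quoted annotation is never evaluated at runtime."""
    return len(corpus.documents)
```

Because the annotation is a string, the module can be imported without `Corpus` ever being loaded, while type checkers still resolve the name.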
set(self.filter_honr(char_list[j][0]))):
char_cluster.append(char_list[j])
to_return.append(char_cluster)
return to_return
As we talked about in Slack briefly, this method probably requires some thinking through. If I'm reading the test output in 710 correctly, it looks like the disambiguation is overly generous, and I suspect we can figure out a more optimized way to traverse those character lists. Let's chat through some issues in office hours.
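To make the "overly generous" concern concrete, here is a minimal sketch that compares two matching rules on a toy character list. Everything in it is an assumption for illustration: the honorific list, the `(name, count)` tuple shape, and the functions `filter_honorifics`, `any_overlap`, `subset_match`, and `cluster_characters` are hypothetical and are not the code under review.

```python
# Hypothetical honorific list, for illustration only.
HONORIFICS = {"mr", "mrs", "miss", "ms", "dr", "sir", "lady"}


def filter_honorifics(name):
    """Return lowercase name tokens with honorifics removed."""
    tokens = [t.strip(".").lower() for t in name.split()]
    return [t for t in tokens if t not in HONORIFICS]


def any_overlap(a, b):
    """Generous rule: two names match if they share any token."""
    return bool(a & b)


def subset_match(a, b):
    """Stricter rule: one non-empty token set must contain the other."""
    return bool(a) and bool(b) and (a <= b or b <= a)


def cluster_characters(char_list, match):
    """Greedily group (name, count) pairs; each entry joins at most one cluster."""
    clusters = []
    used = set()
    for i, entry in enumerate(char_list):
        if i in used:
            continue
        used.add(i)
        cluster = [entry]
        base = set(filter_honorifics(entry[0]))
        for j in range(i + 1, len(char_list)):
            if j in used:
                continue
            if match(base, set(filter_honorifics(char_list[j][0]))):
                cluster.append(char_list[j])
                used.add(j)
        clusters.append(cluster)
    return clusters


# Toy data; the counts are made up.
chars = [
    ("Elizabeth Bennet", 120),
    ("Mr. Bennet", 40),
    ("Jane Bennet", 60),
    ("Mr. Darcy", 90),
]
generous = cluster_characters(chars, any_overlap)
stricter = cluster_characters(chars, subset_match)
```

With `any_overlap`, all three Bennets land in one cluster because they share a surname. With `subset_match`, "Jane Bennet" is kept apart from "Elizabeth Bennet", although a surname-only name such as "Mr. Bennet" still attaches to the first compatible cluster, so this rule alone does not resolve the ambiguity.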
added rough draft for coref_resolution; draft for simple HCI-console-based approach for disambiguation