Addition of coauthorship dataset#180
Conversation
toponetx/datasets/graph.py
Outdated
| cochains = np.load(DIR / "150250_cochains.npy", allow_pickle=True) | ||
|
|
||
| simplices = [] | ||
| for dim in list(range(len(cochains)))[::-1]: |
There was a problem hiding this comment.
No need to explicitly construct a list, use the step parameter of range:
| for dim in list(range(len(cochains)))[::-1]: | |
| for dim in range(len(cochains) - 1, -1, -1): |
toponetx/datasets/graph.py
Outdated
| li = list(cochains[dim].keys()) | ||
| simplices += [list(l) for l in li] |
There was a problem hiding this comment.
No need to explicitly construct the list here, you are only iterating over its values, which you can do with the iterator you have in the first place:
| li = list(cochains[dim].keys()) | |
| simplices += [list(l) for l in li] | |
| simplices += [list(l) for l in cochains[dim].keys()] |
toponetx/datasets/graph.py
Outdated
| sc = SimplicialComplex(simplices) | ||
|
|
||
| for i in range(len(cochains)): | ||
| dic = {tuple(sorted(list(k))): v for k, v in cochains[i].items()} |
There was a problem hiding this comment.
Again, explicit list construction is not necessary, sorted happily works on an iterable:
| dic = {tuple(sorted(list(k))): v for k, v in cochains[i].items()} | |
| dic = {tuple(sorted(k)): v for k, v in cochains[i].items()} |
Codecov Report
@@ Coverage Diff @@
## main #180 +/- ##
==========================================
+ Coverage 77.18% 77.29% +0.11%
==========================================
Files 22 22
Lines 2402 2414 +12
==========================================
+ Hits 1854 1866 +12
Misses 548 548
|
Hellsegga
left a comment
There was a problem hiding this comment.
Fixed in the last commit + see comment about storage of dataset.
USFCA-MSDS
left a comment
There was a problem hiding this comment.
I think this has been addressed by now.
|
|
||
| References | ||
| ---------- | ||
| [SNN20] Stefania Ebli, Michael Defferrard and Gard Spreemann. |
There was a problem hiding this comment.
I do no think the description of the dataset is clear enough.
- What is the dimension of the SC?
- What is the problem exactly ? Can you describe the training pairs in more details
- Can you provide a brief statistics about the dataset?
toponetx/datasets/graph.py
Outdated
| - the number of citations attributed to the given collaborations of k authors. | ||
|
|
||
| """ | ||
| cochains = np.load(DIR / "150250_cochains.npy", allow_pickle=True) |
There was a problem hiding this comment.
Can you change the name of the file?
There was a problem hiding this comment.
Agreed, let's choose one name for this dataset and stick to it:
"coauthorship", then "coauthorship.npy" and the variable itself can be called coauthorship
| from toponetx.transform.graph_to_simplicial_complex import graph_2_clique_complex | ||
|
|
||
| __all__ = ["karate_club"] | ||
| __all__ = ["karate_club", "coauthorship"] |
There was a problem hiding this comment.
I am not sure the addition of this dataset to graph is justified, maybe it needs a different file all together. Can you justify your choice for graph? other people might not find it intuitive anyway,
There was a problem hiding this comment.
Yes, can you suggest a name for the file?
I'll be away for a week, I can fix the code based on the comments when I'm back.
|
I have fixed the changes suggested above. However please advise about where to place it. The dataset originates from a citation graph which is why I placed it in graph.py first, but it is already pre-processed as a simplicial complex and therefore I agree graph.py may not be ideal place. But I'm not sure where it should be placed. |
|
@USFCA-MSDS we need to merge this to allow the challenge's participant @Hellsegga to move forward with their submission. |
This is the dataset used in the Simplicial Neural Networks paper (https://arxiv.org/abs/2010.03633). I'd like to include it here to be able to use it in my implementation in TopoModelX (pyt-team/TopoModelX#98). It can hopefully be useful to test other methods as well.