Skip to content

Datasets are not sorted by time, model uses information from the future #3

@artursil

Description

@artursil

I went through the code and one thing is bothering me. I think there is a major bug in the implementation. It is possible that I don't understand something, so please correct me if I'm wrong, but as of my current understanding this code trains and validates using the information "from the future" .

If you examine values in the code below you will see that there negative values for the delta.

TGSRec/model.py

Line 557 in 0c7ba17

src_ngh_t_batch_delta = cut_time_l[:, np.newaxis] - src_ngh_t_batch

I can see that mask is created only for the 0 values so negatives values are still used.

TGSRec/model.py

Line 575 in 0c7ba17

mask = src_ngh_node_batch_th == 0

Data here is sorted by edge_ids not timestamps, so the possible fix for that would by sorting by x[2], instead of x[1]

TGSRec/graph.py

Lines 34 to 39 in 0c7ba17

for i in range(len(adj_list)):
curr = adj_list[i]
curr = sorted(curr, key=lambda x: x[1])
n_idx_l.extend([x[0] for x in curr])
e_idx_l.extend([x[1] for x in curr])
n_ts_l.extend([x[2] for x in curr])

If you look at the:
TGSRec/datasets/ml-100k/u.data

data is not sorted, by the timestamp and there is no point in your codebase, where this sorting happens (I guess).

I tried to run experiments for ml-100 for both scenarios: your original implementation and with the sorted input data and the results I got are significantly worse, at least for the early stages of training. I haven't run it for 200 epochs, so maybe the final results are closer to each other, but firstly I would like to see if my assumption is correct.

Results afters 20 epochs:
Without sorting:

valid acc: 0.7069337926425662
valid auc: 0.8038448618385599
valid f1: 0.7070636805233961
valid ap: 0.8172432697828477

With sorting:

valid acc: 0.5271334211112526
valid auc: 0.7374971517076878
valid f1: 0.6789490018391757
valid ap: 0.7001478155001294

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions