About Visual knowledge sub-graph #12

In baseclip_graph_v1.py, lines 215-220:
    inputs_text = self.base_text_features.unsqueeze(dim=1)  # [100, 1, 1024]
    inputs_img = img_feature.unsqueeze(dim=1)
    node_cluster_tt = node_cluster_t[:, :, index, :].repeat(inputs_text.size()[0], 1, 1)  # [100, 100, 1024], text -> text
    node_cluster_it = node_cluster_i[:, :, index, :].repeat(inputs_text.size()[0], 1, 1)  # image -> text
    feat_tt = torch.cat([inputs_text, node_cluster_tt], dim=1)
    feat_it = torch.cat([inputs_text, node_cluster_it], dim=1)
                
Is inputs_img unused? It is assigned here, but neither feat_tt nor feat_it references it.

The paper says: "As shown in Fig. 3, to construct the visual knowledge sub-graph Gv = {Cv, Ev}, we pass the augmented image group from the same class into visual encoder to obtain their visual features, and then compute the mean features of them as the nodes."

Given that description, should the last line instead be feat_it = torch.cat([inputs_img, node_cluster_it], dim=1)?
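To make the question concrete, here is a shape-only sketch of the two variants. It does not use the repository code: `cat_shape` is a hypothetical helper that mirrors the torch.cat rule that every dimension except the concatenation dimension must match (it ignores the empty-tensor special case and negative dims). The sizes 100 and 1024 come from the shape comments in the snippet. The batch size of 32, the assumption that img_feature is [batch, 1024], and the assumption that node_cluster_it has the same shape as node_cluster_tt are mine.

```python
def cat_shape(shapes, dim):
    """Return the shape torch.cat would produce for these input shapes.

    Hypothetical helper: raises ValueError when any dimension other
    than `dim` differs between the inputs.
    """
    first = shapes[0]
    out = list(first)
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("inputs must have the same number of dimensions")
        for axis, (a, b) in enumerate(zip(first, shape)):
            if axis != dim and a != b:
                raise ValueError(f"size mismatch on dim {axis}: {a} vs {b}")
        out[dim] += shape[dim]
    return tuple(out)


NUM_CLASSES, FEAT_DIM = 100, 1024  # from the shape comments in the snippet
BATCH = 32                         # assumed batch size, not given in the issue

inputs_text = (NUM_CLASSES, 1, FEAT_DIM)
node_cluster_it = (NUM_CLASSES, NUM_CLASSES, FEAT_DIM)  # after .repeat(100, 1, 1)
inputs_img = (BATCH, 1, FEAT_DIM)  # assumes img_feature is [BATCH, FEAT_DIM]

# Variant in the repository: text features concatenated with the image nodes.
text_first = cat_shape([inputs_text, node_cluster_it], dim=1)

# Variant asked about: image features concatenated with the image nodes.
try:
    image_first = cat_shape([inputs_img, node_cluster_it], dim=1)
    image_error = None
except ValueError as err:
    image_first = None
    image_error = str(err)
```

Under these assumptions the repository variant yields (100, 101, 1024), while the inputs_img variant fails on dim 0 unless the batch size happens to equal 100, because node_cluster_it is repeated inputs_text.size()[0] times. So if inputs_img is the intended input, the repeat count would presumably need to change as well.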

I am confused about this problem and would appreciate your response. Thank you very much.
            
