About Visual knowledge sub-graph #12

@thinker9527

In baseclip_graph_v1.py, lines 215-220:
inputs_text = self.base_text_features.unsqueeze(dim=1) #[100, 1, 1024]
inputs_img = img_feature.unsqueeze(dim=1)
node_cluster_tt = node_cluster_t[:, :, index, :].repeat(inputs_text.size()[0], 1, 1) #[100, 100, 1024] t->t
node_cluster_it = node_cluster_i[:, :, index, :].repeat(inputs_text.size()[0], 1, 1) # i -> t
feat_tt = torch.cat([inputs_text, node_cluster_tt], dim=1)
feat_it = torch.cat([inputs_text, node_cluster_it], dim=1)

Is inputs_img unused? It is created in these lines, but neither torch.cat call uses it.

The paper says: "As shown in Fig. 3, to construct the visual knowledge sub-graph Gv = {Cv, Ev}, we pass the augmented image group from the same class into visual encoder to obtain their visual features, and then compute the mean features of them as the nodes."

Should the second concatenation instead be feat_it = torch.cat([inputs_img, node_cluster_it], dim=1)?
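To make the question concrete, here is a minimal shape check of both concatenations. It is only a sketch under assumptions: the small sizes, the [batch, dim] shape of img_feature, the 4-D layout of node_cluster_i, and the integer index are all hypothetical stand-ins, not values taken from the repository. Under these assumptions, swapping in inputs_img only works if the node features are repeated by the image batch size rather than by the number of text features.

```python
import torch

# Hypothetical small sizes (the snippet's comments suggest 100 classes, 1024-d features).
num_classes, num_nodes, num_clusters, dim, batch = 4, 4, 3, 8, 2

base_text_features = torch.randn(num_classes, dim)
img_feature = torch.randn(batch, dim)  # assumed shape: [batch, dim]
# Assumed layout: [1, num_nodes, num_clusters, dim]
node_cluster_i = torch.randn(1, num_nodes, num_clusters, dim)
index = 0  # assumed to be a plain integer index

inputs_text = base_text_features.unsqueeze(dim=1)  # [num_classes, 1, dim]
inputs_img = img_feature.unsqueeze(dim=1)          # [batch, 1, dim]

# As in the quoted lines: repeat the node features once per text feature.
node_cluster_it = node_cluster_i[:, :, index, :].repeat(inputs_text.size(0), 1, 1)
feat_it = torch.cat([inputs_text, node_cluster_it], dim=1)  # [num_classes, num_nodes + 1, dim]

# The variant asked about: fails whenever batch != num_classes,
# because torch.cat needs matching sizes on every dim except dim=1.
try:
    torch.cat([inputs_img, node_cluster_it], dim=1)
    variant_ok = True
except RuntimeError:
    variant_ok = False

# A shape-consistent variant: repeat by the image batch size instead.
node_cluster_ib = node_cluster_i[:, :, index, :].repeat(inputs_img.size(0), 1, 1)
feat_it_img = torch.cat([inputs_img, node_cluster_ib], dim=1)  # [batch, num_nodes + 1, dim]

print(tuple(feat_it.shape), variant_ok, tuple(feat_it_img.shape))
```

This only checks tensor shapes; which of the two inputs matches the method described in the paper is still the open question.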

I am confused about this problem and would appreciate your response. Thank you very much.
