GPT4Point/lavis/models/gpt4point_models/gpt4point_qformer.py
Lines 156 to 183 in 3ed52d9
```python
###============== Point-text Matching ===================###
text_input_ids_world = concat_all_gather(text_tokens.input_ids)  # [bs, 32]
text_attention_mask_world = concat_all_gather(text_tokens.attention_mask)  # [bs, 32]
point_embeds_world = all_gather_with_grad(point_embeds)  # [bs, 257, 1408]
with torch.no_grad():
    sim_t2p[:, rank * bs : rank * bs + bs].fill_diagonal_(-10000)
    sim_p2t[:, rank * bs : rank * bs + bs].fill_diagonal_(-10000)

    weights_t2p = F.softmax(sim_t2p, dim=1)
    weights_p2t = F.softmax(sim_p2t, dim=1)

# select a negative point for each text
point_embeds_neg = []
for b in range(bs):
    neg_idx = torch.multinomial(weights_t2p[b], 1).item()
    point_embeds_neg.append(point_embeds_world[neg_idx])
point_embeds_neg = torch.stack(point_embeds_neg, dim=0)

# select a negative text for each point
text_ids_neg = []
text_atts_neg = []
for b in range(bs):
    neg_idx = torch.multinomial(weights_p2t[b], 1).item()
    text_ids_neg.append(text_input_ids_world[neg_idx])
    text_atts_neg.append(text_attention_mask_world[neg_idx])
text_ids_neg = torch.stack(text_ids_neg, dim=0)
text_atts_neg = torch.stack(text_atts_neg, dim=0)
```
The `neg_idx` sampling seems to favor the most similar point sample for each text sample, and the most similar text sample for each point sample (since the softmax weights are proportional to similarity, and only the diagonal positive pair is masked out with `-10000`).
Why sample the *most* similar instead of the *least* similar?
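For reference, the sampling behavior in question can be reproduced in isolation. The sketch below (toy batch size and similarity values are my own, not from the repo) shows that `torch.multinomial` over `F.softmax(sim)` draws higher-similarity indices with higher probability, while the `-10000` diagonal fill only excludes the positive pair itself:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

bs = 4
# toy text-to-point similarity matrix: row b = similarities of text b to all points
sim_t2p = torch.randn(bs, bs)

# mask out the positive (diagonal) pair so it can never be drawn as a negative
sim_t2p.fill_diagonal_(-10000)

# softmax turns similarities into sampling weights:
# higher similarity -> higher probability of being picked as the negative
weights_t2p = F.softmax(sim_t2p, dim=1)

# sample one negative index per text, as in the snippet above
neg_idx = [torch.multinomial(weights_t2p[b], 1).item() for b in range(bs)]
```

So the sampling is stochastic rather than a hard argmax, but it is biased toward similar (i.e. hard) negatives; only the matched positive is guaranteed to be excluded.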