CoCa multimodal transformer layer implementation #571

ebsmothers · 2023-07-19T01:31:55Z

Hi, thanks for your CoCa implementation! I have a question on the multimodal transformer: typically in a decoder layer I would expect to see self-attention, then cross-attention, then an MLP. But it seems like here a single layer is actually doing self-attention, MLP, cross-attention, then another MLP (since both resblock and cross_attn have an MLP). Is there a specific reason for doing it this way? Thanks in advance.

The text was updated successfully, but these errors were encountered:

gpucce · 2023-08-07T18:22:03Z

Hi, @ebsmothers the main reason is that this was mostly inspired by https://github.com/lucidrains/CoCa-pytorch/blob/main/coca_pytorch/coca_pytorch.py which uses parallel feedforward instead of the classic one both in self and cross attention.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CoCa multimodal transformer layer implementation #571

CoCa multimodal transformer layer implementation #571

ebsmothers commented Jul 19, 2023

gpucce commented Aug 7, 2023

CoCa multimodal transformer layer implementation #571

CoCa multimodal transformer layer implementation #571

Comments

ebsmothers commented Jul 19, 2023

gpucce commented Aug 7, 2023