Suggestion for aligning temporal embeddings across modalities: when you build the embedding sum for one modality, e.g.,
Frame 0 (RGB): x + pos_emb + temp_emb + mod_emb, and for another one, e.g., <frame_0_…>: x + pos_emb + mod_emb + temp_emb,
make sure temp_emb is identical for the two modalities whenever the temporal position (frame index) is the same.
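A minimal sketch of the idea, assuming a single temporal table that every image-based modality looks up by frame index. Plain Python lists stand in for tensors, and `SharedTemporalEmbedding`, `sincos_temporal_emb`, `embed_token`, and the `"other"` modality are hypothetical names for illustration, not part of ml-4m. The sine-cosine form is also just an assumed choice; a learned table shared the same way would serve equally well.

```python
import math

def sincos_temporal_emb(frame_idx, dim):
    """Fixed sine-cosine vector for a frame index (an assumed choice)."""
    emb = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        emb.extend([math.sin(frame_idx * freq), math.cos(frame_idx * freq)])
    return emb[:dim]

class SharedTemporalEmbedding:
    """Hypothetical helper: one table reused by every image-based modality."""
    def __init__(self, max_frames, dim):
        self.table = [sincos_temporal_emb(t, dim) for t in range(max_frames)]

    def __call__(self, frame_idx):
        return self.table[frame_idx]

def embed_token(x, pos_emb, mod_emb, temp_emb):
    """Element-wise sum x + pos_emb + mod_emb + temp_emb."""
    return [a + b + c + d for a, b, c, d in zip(x, pos_emb, mod_emb, temp_emb)]

DIM = 4
shared_temp = SharedTemporalEmbedding(max_frames=8, dim=DIM)

# Toy values; in practice these come from the patch projection and learned tables.
x = [0.5, -0.5, 1.0, 0.0]
pos_emb = [0.1, 0.2, 0.3, 0.4]
mod_emb = {"rgb": [1.0, 0.0, 0.0, 0.0], "other": [0.0, 1.0, 0.0, 0.0]}

# Both modalities use the same lookup, so frame 0 gets the same temp_emb.
rgb_frame0 = embed_token(x, pos_emb, mod_emb["rgb"], shared_temp(0))
other_frame0 = embed_token(x, pos_emb, mod_emb["other"], shared_temp(0))
other_frame1 = embed_token(x, pos_emb, mod_emb["other"], shared_temp(1))
```

Because the lookup is shared, the two frame-0 tokens differ only by their modality embedding, while tokens of the same modality at different frames differ only by the temporal term.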
According to https://docs.google.com/presentation/d/1AY3QV1N_hoi9aXI1r8QTqrNmDK9LyorgJDQMPWb8hBo/edit#slide=id.g2e696416940_0_144, we have to add the temporal/frame encoding to image-based modality embeddings (but not to sequence-based ones).
A good starting point: ml-4m/fourm/models/encoder_embeddings.py, line 206 at commit 4c2c9a5.
Things to consider: can the temporal frame embedding interfere with the positional patch embedding? We should make sure it doesn't.
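One possible way to probe for this kind of interference, sketched under assumptions: since the embeddings are added, two different (patch, frame) pairs can become indistinguishable if their `pos_emb + temp_emb` sums coincide. The helper below (hypothetical, plain Python, with sine-cosine tables assumed for both embeddings) reports the smallest distance between those sums over all distinct pairs.

```python
import math
from itertools import product

def sincos(index, dim, base=10000.0):
    """Sine-cosine vector for an integer index; `base` sets the frequencies."""
    emb = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        emb.extend([math.sin(index * freq), math.cos(index * freq)])
    return emb[:dim]

def min_pairwise_distance(pos_table, temp_table):
    """Smallest L2 distance between pos+temp sums of distinct (patch, frame) pairs."""
    sums = {
        (p, t): [a + b for a, b in zip(pos_table[p], temp_table[t])]
        for p, t in product(range(len(pos_table)), range(len(temp_table)))
    }
    keys = list(sums)
    best = float("inf")
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            best = min(best, math.dist(sums[keys[i]], sums[keys[j]]))
    return best

DIM, NUM_PATCHES, NUM_FRAMES = 8, 4, 4
pos_table = [sincos(p, DIM) for p in range(NUM_PATCHES)]

# Same function for both tables: (patch 1, frame 0) and (patch 0, frame 1)
# produce identical sums, so the minimum distance collapses to zero.
same_base = [sincos(t, DIM) for t in range(NUM_FRAMES)]
collision = min_pairwise_distance(pos_table, same_base)

# A different frequency base for the temporal table keeps the sums apart.
other_base = [sincos(t, DIM, base=100.0) for t in range(NUM_FRAMES)]
separated = min_pairwise_distance(pos_table, other_base)
```

The same check could be run on learned tables after training; the `base=100.0` value is only an illustrative choice, not a recommendation.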
Definition of Done: all image-based encoder embeddings are augmented with a temporal embedding.
@vesteinn @garjania