Multi-modal learning papers in CVPR 2021

The navigation of CVPR 2021 papers

Text-to-Image Generation

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | XMC-GAN | Cross-Modal Contrastive Learning for Text-to-Image Generation | paper | CVPR 2021 | Google Research |

Autonomous Driving

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | MVDNet | Robust Multimodal Vehicle Detection in Foggy Weather Using Complementary Lidar and Radar Signals | paper, code | CVPR 2021 | University of California San Diego |
| 1 | - | Multi-Modal Fusion Transformer for End-to-End Autonomous Driving | paper | CVPR 2021 | Max Planck Institute for Intelligent Systems |

Navigation

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | VLN | Robust Multimodal Vehicle Detection in Foggy Weather Using Complementary Lidar and Radar Signals | paper, code | CVPR 2021 | University of California San Diego |
| 1 | SSM | Structured Scene Memory for Vision-Language Navigation | paper | CVPR 2021 | Beijing Institute of Technology |

OCR

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | - | Semantic-Aware Video Text Detection | paper | CVPR 2021 | National Laboratory of Pattern Recognition |
| 1 | TRBA | What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels | paper, code | CVPR 2021 | The University of Tokyo |
| 2 | Multiplexed TextSpotter | A Multiplexed Network for End-to-End, Multilingual OCR | paper | CVPR 2021 | Facebook AI |
| 3 | STKM | Self-attention based Text Knowledge Mining for Text Detection | paper | CVPR 2021 | Shenzhen University |
| 4 | TextOCR | TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text |  | CVPR 2021 | Facebook AI Research |

Video Moment Retrieval

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | - | Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval | paper | CVPR 2021 | Hunan University |

Video-Audio-Text

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | - | How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language | paper, dataset | CVPR 2021 | Universitat Politècnica de Catalunya |

Image&Language

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | - | Image Change Captioning by Learning from an Auxiliary Task | paper | CVPR 2021 | University of Manitoba |
| 1 | UC² | UC²: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training | paper | CVPR 2021 | University of California, Davis |
| 2 | - | How Transferable are Reasoning Patterns in VQA? | paper, code | CVPR 2021 | INSA Lyon |
| 3 | M3P | M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training | paper | CVPR 2021 | HiT |
| 4 | CC12M | Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | paper | CVPR 2021 | Google Research |
| 5 | - | Separating Skills and Concepts for Novel Visual Question Answering | paper | CVPR 2021 | UIUC |
| 6 | VinVL | VinVL: Revisiting Visual Representations in Vision-Language Models | paper, code | CVPR 2021 | Microsoft |
| 7 | - | Domain-robust VQA with diverse datasets and methods but no target labels | paper | CVPR 2021 | University of Pittsburgh |
| 8 | PCME | Probabilistic Embeddings for Cross-Modal Retrieval | paper, code | CVPR 2021 | NAVER AI Lab |
| 9 | - | Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers | paper | CVPR 2021 | DeepMind |
| 10 | TAP | TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | paper | CVPR 2021 | University of Rochester |
| 11 | Causal Attention | Causal Attention for Vision-Language Tasks | paper, code | CVPR 2021 | Nanyang Technological University, Singapore |
| 12 | VirTex | VirTex: Learning Visual Representations from Textual Annotations | paper | CVPR 2021 | University of Michigan |
| 13 | - | Predicting Human Scanpaths in Visual Question Answering | paper | CVPR 2021 | University of Minnesota |
| 14 | Kaleido-BERT | Kaleido-BERT: Vision-Language Pre-training on Fashion Domain | paper, code | CVPR 2021 | Alibaba Group |
| 15 | - | Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | paper | CVPR 2021 | University of Science and Technology Beijing |
| 16 | - | Learning by Planning: Language-Guided Global Image Editing | paper, code | CVPR 2021 | University of Rochester |
| 17 | KRISP | KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA | paper, code | CVPR 2021 |  |
| 18 | - | Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval | paper | CVPR 2021 | Peking University |

Video&Text

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | ClipBERT | Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling | paper, code | CVPR 2021 | UNC |
| 1 | - | SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events | paper, code | CVPR 2021 | Singapore University of Technology and Design |
| 2 | - | Open-book Video Captioning with Retrieve-Copy-Generate Network | paper | CVPR 2021 | Institute of Automation, Chinese Academy of Sciences |
| 3 | NExT-QA | NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions | paper, code | CVPR 2021 | National University of Singapore |
| 4 | AGQA | AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning | paper | CVPR 2021 | Stanford University |
| 5 | - | Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering | paper | CVPR 2021 | Yonsei University, South Korea |
| 6 | - | Look Before you Speak: Visually Contextualized Utterances | paper | CVPR 2021 | Google Research |

3D Cross-Modal Retrieval

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | - | Cross-Modal Center Loss for 3D Cross-Modal Retrieval | paper | CVPR 2021 | The City University of New York |

Video-to-Text Generation

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | Vx2Text | VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs | paper | CVPR 2021 | Columbia University |

Image-to-Video Synthesis

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | cINNs | Stochastic Image-to-Video Synthesis using cINNs | paper | CVPR 2021 | Heidelberg University |
| 1 | - | Understanding Object Dynamics for Interactive Image-to-Video Synthesis | paper, code | CVPR 2021 | Heidelberg University |

Audio&Visual

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | - | Can audio-visual integration strengthen robustness under multimodal attacks? | paper | CVPR 2021 | University of Rochester |
| 1 | - | Audio-Visual Instance Discrimination with Cross-Modal Agreement | paper | CVPR 2021 | UC San Diego |
| 2 | - | VISUALVOICE: Audio-Visual Speech Separation with Cross-Modal Consistency | paper, code | CVPR 2021 | The University of Texas at Austin |

Language-guided video actor segmentation

| No. | Model Name | Title | Links | Pub. | Organization |
|---|---|---|---|---|---|
| 0 | - | Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation | paper | CVPR 2021 | Chinese Academy of Sciences |