Navigation of CVPR 2021 Papers

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | XMC-GAN | Cross-Modal Contrastive Learning for Text-to-Image Generation | paper | CVPR 2021 | Google Research |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | MVDNet | Robust Multimodal Vehicle Detection in Foggy Weather Using Complementary Lidar and Radar Signals | paper code | CVPR 2021 | University of California, San Diego |
1 | - | Multi-Modal Fusion Transformer for End-to-End Autonomous Driving | paper | CVPR 2021 | Max Planck Institute for Intelligent Systems |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | SSM | Structured Scene Memory for Vision-Language Navigation | paper | CVPR 2021 | Beijing Institute of Technology |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | - | Semantic-Aware Video Text Detection | paper | CVPR 2021 | National Laboratory of Pattern Recognition |
1 | TRBA | What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels | paper code | CVPR 2021 | The University of Tokyo |
2 | Multiplexed TextSpotter | A Multiplexed Network for End-to-End, Multilingual OCR | paper | CVPR 2021 | Facebook AI |
3 | STKM | Self-attention based Text Knowledge Mining for Text Detection | paper | CVPR 2021 | Shenzhen University |
4 | TextOCR | TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text | paper | CVPR 2021 | Facebook AI Research |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | - | Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval | paper | CVPR 2021 | Hunan University |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | How2Sign | How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language | paper dataset | CVPR 2021 | Universitat Politècnica de Catalunya |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | - | Image Change Captioning by Learning from an Auxiliary Task | paper | CVPR 2021 | University of Manitoba |
1 | UC^2 | UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training | paper | CVPR 2021 | University of California, Davis |
2 | - | How Transferable are Reasoning Patterns in VQA? | paper code | CVPR 2021 | INSA Lyon |
3 | M3P | M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training | paper | CVPR 2021 | HIT |
4 | CC12M | Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | paper | CVPR 2021 | Google Research |
5 | - | Separating Skills and Concepts for Novel Visual Question Answering | paper | CVPR 2021 | UIUC |
6 | VinVL | VinVL: Revisiting Visual Representations in Vision-Language Models | paper code | CVPR 2021 | Microsoft |
7 | - | Domain-robust VQA with diverse datasets and methods but no target labels | paper | CVPR 2021 | University of Pittsburgh |
8 | PCME | Probabilistic Embeddings for Cross-Modal Retrieval | paper code | CVPR 2021 | NAVER AI Lab |
9 | - | Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers | paper | CVPR 2021 | DeepMind |
10 | TAP | TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | paper | CVPR 2021 | University of Rochester |
11 | Causal Attention | Causal Attention for Vision-Language Tasks | paper code | CVPR 2021 | Nanyang Technological University, Singapore |
12 | VirTex | VirTex: Learning Visual Representations from Textual Annotations | paper | CVPR 2021 | University of Michigan |
13 | - | Predicting Human Scanpaths in Visual Question Answering | paper | CVPR 2021 | University of Minnesota |
14 | Kaleido-BERT | Kaleido-BERT: Vision-Language Pre-training on Fashion Domain | paper code | CVPR 2021 | Alibaba Group |
15 | - | Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | paper | CVPR 2021 | University of Science and Technology Beijing |
16 | - | Learning by Planning: Language-Guided Global Image Editing | paper code | CVPR 2021 | University of Rochester |
17 | KRISP | KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA | paper code | CVPR 2021 | |
18 | - | Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval | paper | CVPR 2021 | Peking University |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | ClipBERT | Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling | paper code | CVPR 2021 | UNC |
1 | - | SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events | paper code | CVPR 2021 | Singapore University of Technology and Design |
2 | - | Open-book Video Captioning with Retrieve-Copy-Generate Network | paper | CVPR 2021 | Institute of Automation, Chinese Academy of Sciences |
3 | NExT-QA | NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions | paper code | CVPR 2021 | National University of Singapore |
4 | AGQA | AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning | paper | CVPR 2021 | Stanford University |
5 | - | Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering | paper | CVPR 2021 | Yonsei University, South Korea |
6 | - | Look Before you Speak: Visually Contextualized Utterances | paper | CVPR 2021 | Google Research |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | - | Cross-Modal Center Loss for 3D Cross-Modal Retrieval | paper | CVPR 2021 | The City University of New York |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | Vx2Text | VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs | paper | CVPR 2021 | Columbia University |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | cINNs | Stochastic Image-to-Video Synthesis using cINNs | paper | CVPR 2021 | Heidelberg University |
1 | - | Understanding Object Dynamics for Interactive Image-to-Video Synthesis | paper code | CVPR 2021 | Heidelberg University |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | - | Can audio-visual integration strengthen robustness under multimodal attacks? | paper | CVPR 2021 | University of Rochester |
1 | - | Audio-Visual Instance Discrimination with Cross-Modal Agreement | paper | CVPR 2021 | UC San Diego |
2 | - | VISUALVOICE: Audio-Visual Speech Separation with Cross-Modal Consistency | paper code | CVPR 2021 | The University of Texas at Austin |

No. | Model Name | Title | Links | Pub. | Organization |
---|---|---|---|---|---|
0 | - | Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation | paper | CVPR 2021 | Chinese Academy of Sciences |