AI City 2023: Comprehensive Visual Features and Pseudo Labeling for Robust Natural Language-based Vehicle Retrieval
The 3rd Place Solution to The 7th NVIDIA AI City Challenge (2023) Track 2: Tracked-Vehicle Retrieval by Natural Language Descriptions.
The 3rd Place Submission to AICity Challenge 2023 Tracked-Vehicle Retrieval by Natural Language Descriptions
Rank | Team ID | Team name | MRR Score |
---|---|---|---|
1 | 9 | HCMIU-CVIP | 0.8263 |
2 | 28 | IOV | 0.8179 |
3 | 85 | AIO-NLRetrieve (Ours) | 0.4795 |
4 | 151 | AIO2022 | 0.4659 |
5 | 76 | DUT_ReID | 0.4392 |
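The MRR Score column is the Mean Reciprocal Rank of the correct track over all text queries. As a rough illustration of how such a score is computed (this is not the official evaluation code; the function name, the dictionary layout, and the ids below are assumptions made for the example), a minimal sketch could look like this:

```python
def mean_reciprocal_rank(ranked_lists, ground_truth):
    """Return the MRR over all queries.

    ranked_lists: dict mapping query id -> list of track ids, best match first.
    ground_truth: dict mapping query id -> the single correct track id.
    Both layouts are assumptions for this sketch.
    """
    total = 0.0
    for query_id, target in ground_truth.items():
        ranking = ranked_lists.get(query_id, [])
        if target in ranking:
            # Ranks are 1-based, so a top-1 hit contributes 1.0.
            total += 1.0 / (ranking.index(target) + 1)
        # A target missing from the ranking contributes 0.
    return total / len(ground_truth)


# Toy example with made-up ids (not taken from the challenge data).
ranked = {"q1": ["t3", "t1", "t2"], "q2": ["t2", "t1", "t3"]}
truth = {"q1": "t1", "q2": "t2"}
mrr = mean_reciprocal_rank(ranked, truth)
print(mrr)
```

In the toy example the correct track is ranked second for q1 and first for q2, so the two reciprocal ranks are 1/2 and 1.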
- Preprocess the dataset to prepare frames, motion maps, and video clips:
scripts/extract_vdo_frms.py
extracts the frames.
scripts/get_motion_maps.py
generates the motion maps.
scripts/get_clip_maps.py
generates the video clips.
scripts/extract_clip_feature_tracks.ipynb
is a Jupyter notebook that extracts the clip features.
- You can download the necessary files here
The directory structure under data is as follows:
data/
└── AIC23_Track2_NL_Retrieval/
    └── data/
        ├── clip_feats/
        ├── data/
        │   ├── bk_map/
        │   ├── motion_heatmap/
        │   └── motion_map_iou/
        ├── train/
        │   ├── S01/
        │   ├── S03/
        │   ├── S04/
        ├── validation/
        │   ├── S02/
        │   ├── S05/
        ├── train-tracks_nlpaug.json
        ├── train-tracks_nlpaug_2.json
        ├── train-tracks_nlpaug_3.json
        └── ...
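Before training, it can help to confirm that the layout above is in place. The following is a small sketch, not part of the repository: the helper name `missing_entries`, the `EXPECTED` list, and the assumed root path are illustrative and only cover the entries visible in the tree.

```python
from pathlib import Path

# Sub-paths copied from the tree above, relative to the inner data root.
# Entries hidden behind "..." in the tree are not checked.
EXPECTED = [
    "clip_feats",
    "data/bk_map",
    "data/motion_heatmap",
    "data/motion_map_iou",
    "train",
    "validation",
    "train-tracks_nlpaug.json",
]


def missing_entries(root):
    """Return the expected sub-paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumed location, following the tree above; adjust to your setup.
    root = Path("data/AIC23_Track2_NL_Retrieval/data")
    for rel in missing_entries(root):
        print(f"missing: {root / rel}")
```

An empty result means every listed directory and file was found under the chosen root.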
The configuration files are in configs.
After setting the correct data paths, train the different models with:
bash run/single_baseline_aug1.sh
bash run/single_baseline_aug1_plus.sh
bash run/single_baseline_aug2.sh
bash run/circle_loss.sh
bash run/view_triplet_hard.sh
bash run/dual_baseline_aug3.sh
To evaluate without training, download the checkpoints first, then set RESTORE_FROM in your configuration file to the checkpoint path so that the checkpoint is loaded for evaluation.
Take dual_baseline_aug1 as an example:
bash run/eval_only.sh
Set RESTORE_FROM in your configuration file to the checkpoint you want to use, then extract the embeddings.
In addition, acquire the car and text features used in short-distance modeling by running the following code:
- Run python3 scripts/get_location_info.py to generate location information for each camera, which will be used in our post-processing stage.
- Run python3 scripts/get_relation_info.py to generate relationship features for test tracks, which will be used in our post-processing stage.
base run/submit.py
Copy the sim_mat.npy file generated in the previous step into the post_process_module/post_process/post-process-part1/sim_mat folder.
Run the post_process.py file in each of the 2 post-processing modules. See the post_processing module for more details.
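To make the role of the similarity matrix concrete, here is a minimal sketch of turning such a matrix into a per-query ranking. It is not the repository's post-processing. It assumes that rows correspond to text queries, columns to candidate tracks, and that a higher score means a better match; the function name and the ids are hypothetical. In practice the matrix would first be loaded from sim_mat.npy (for example with numpy.load) and converted to nested lists.

```python
def rank_tracks(sim_mat, track_ids, top_k=None):
    """Return, for each query row, the track ids sorted by descending similarity.

    sim_mat: nested list, one row per query and one column per track
             (assumed layout, not confirmed by the repository).
    track_ids: list of ids, one per column.
    top_k: keep only the first top_k ids per query; None keeps all.
    """
    rankings = []
    for row in sim_mat:
        if len(row) != len(track_ids):
            raise ValueError("each row needs one score per track id")
        # Column indices ordered from highest to lowest score.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        rankings.append([track_ids[j] for j in order[:top_k]])
    return rankings


# Toy 2x3 matrix with hypothetical track ids.
sim = [[0.1, 0.9, 0.3],
       [0.7, 0.2, 0.5]]
ids = ["t1", "t2", "t3"]
rankings = rank_tracks(sim, ids)
print(rankings)
```

A ranking of this form, one ordered list of track ids per query, is what an MRR-style evaluation consumes.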
If you use this method or this code in your research, please cite as:
@InProceedings{Ngo_2023_CVPR,
author = {Ngo, Bach Hoang and Nguyen, Dat Thanh and Do-Tran, Nhat-Tuong and Thien, Phuc Pham Huy and An, Minh-Hung and Nguyen, Tuan-Ngoc and Hoang, Loi Nguyen and Nguyen, Vinh Dinh and Dinh, Vinh},
title = {Comprehensive Visual Features and Pseudo Labeling for Robust Natural Language-Based Vehicle Retrieval},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2023},
pages = {5409-5418}
}