An Efficient Framework for Text-to-Image Retrieval Using Complete Feature Aggregation and Cross-Knowledge Conversion
Official Implementation of the paper An Efficient Framework for Text-to-Image Retrieval Using Complete Feature Aggregation and Cross-Knowledge Conversion.
- (14/11/2023) Code released!
- Download the CUHK-PEDES dataset from here, the ICFG-PEDES dataset from here, and the RSTPReid dataset from here.
- Organize them in your data directory as follows:
|-- your dataset root dir/
|   |-- <CUHK-PEDES>/
|   |   |-- imgs/
|   |   |   |-- cam_a/
|   |   |   |-- cam_b/
|   |   |   |-- ...
|   |   |-- reid_raw.json
|   |
|   |-- <ICFG-PEDES>/
|   |   |-- imgs/
|   |   |   |-- test/
|   |   |   |-- train/
|   |   |-- ICFG-PEDES.json
|   |
|   |-- <RSTPReid>/
|   |   |-- imgs/
|   |   |-- data_captions.json
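Before preprocessing, you can sanity-check that your data root matches the tree above. The script below is a minimal sketch, not part of this repository; the expected paths follow the layout shown, and `check_layout` is a hypothetical helper name.

```python
from pathlib import Path

# Expected files/folders under the data root, per the tree above.
EXPECTED = [
    "CUHK-PEDES/imgs",
    "CUHK-PEDES/reid_raw.json",
    "ICFG-PEDES/imgs/train",
    "ICFG-PEDES/imgs/test",
    "ICFG-PEDES/ICFG-PEDES.json",
    "RSTPReid/imgs",
    "RSTPReid/data_captions.json",
]

def check_layout(root):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]
```

If `check_layout("your_dataset_root_dir")` returns an empty list, the layout is complete; otherwise it lists what is missing.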
- Run the following scripts:
python data_process.py --dataset_name "CUHK-PEDES" --dataset_root_dir [CUHK-PEDES DIR]
python data_process.py --dataset_name "ICFG-PEDES" --dataset_root_dir [ICFG-PEDES DIR]
python data_process.py --dataset_name "RSTPReid" --dataset_root_dir [RSTPReid DIR]
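If all three datasets live under one root, the three invocations above can be scripted. This is a minimal sketch; the `data/...` subdirectory paths are assumptions, so adjust them to your own layout.

```python
import subprocess

# Dataset name -> its directory (assumed layout; adjust to your paths).
DATASETS = {
    "CUHK-PEDES": "data/CUHK-PEDES",
    "ICFG-PEDES": "data/ICFG-PEDES",
    "RSTPReid": "data/RSTPReid",
}

def build_commands(datasets):
    """Build one data_process.py command line per dataset."""
    return [
        ["python", "data_process.py",
         "--dataset_name", name,
         "--dataset_root_dir", root]
        for name, root in datasets.items()
    ]

# To actually run them:
# for cmd in build_commands(DATASETS):
#     subprocess.run(cmd, check=True)
```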
Training
python3 Retrieval.py --config "your/config/file" --checkpoint "your/checkpoint/file" --output_dir "your/output/dir" --pick_best_r1
Evaluation
python3 Retrieval.py --config "your/config/file" --checkpoint "your/checkpoint/file" --output_dir "your/output/dir" --evaluate
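For reference, text-to-image retrieval is commonly scored with Rank-k accuracy: a query caption counts as a hit if an image with the correct identity appears among its top-k most similar gallery images. The sketch below illustrates that metric in plain Python; the similarity matrix and identity labels are toy values, not outputs of this repository, and this is not the exact evaluation code used by `Retrieval.py`.

```python
def rank_k_accuracy(sim, query_ids, gallery_ids, k=1):
    """Fraction of queries whose top-k gallery entries (by similarity)
    contain at least one image sharing the query's identity."""
    hits = 0
    for row, qid in zip(sim, query_ids):
        # Sort gallery indices by descending similarity to this query.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if any(gallery_ids[j] == qid for j in order[:k]):
            hits += 1
    return hits / len(sim)

# Toy example: 2 text queries, 3 gallery images.
sim = [
    [0.9, 0.1, 0.3],   # query 0 is most similar to image 0
    [0.2, 0.4, 0.8],   # query 1 is most similar to image 2
]
r1 = rank_k_accuracy(sim, query_ids=[0, 1], gallery_ids=[0, 1, 1], k=1)
# Both queries rank a correct image first, so r1 == 1.0.
```

The `--pick_best_r1` flag in the training command above selects checkpoints by this Rank-1 score.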
The checkpoints can be found in this drive.
This work is sponsored by AI VIETNAM. Our implementation builds on resources from X2-VLM and timm. We sincerely thank the original authors for open-sourcing their work.
If you find this code useful for your research, please cite our paper.