This repository is the official implementation of the paper FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. (WACV 2025)
Zhuo Cao, Bingqing Zhang, Heming Du, Xin Yu, Xue Li, Sen Wang
The University of Queensland, Australia
Preparation | Training | Inference and Evaluation | Model Zoo
- Set up the environment for running the experiments.
- Clone this repository.

  ```shell
  git clone https://github.com/Zhuo-Cao/FlashVTG.git
  ```

- Install the packages we used for training. Python 3.12.2 is required to reproduce our results.

  ```shell
  pip install -r requirements.txt
  ```
- Download the datasets. For QVHighlights and the other datasets, please follow the instructions of CGDETR. Features extracted by InternVideo2 can be downloaded from Hugging Face.
We provide training scripts for all datasets in the `FlashVTG/scripts/` directory.
For QVHighlights with InternVideo2 features:

```shell
bash FlashVTG/scripts/qv_internvideo2/train.sh
```

For QVHighlights with SlowFast+CLIP features:

```shell
bash FlashVTG/scripts/train_qv_slowclip.sh
```

For Charades-STA with InternVideo2 features:

```shell
bash FlashVTG/scripts/charades_sta_internvideo2/train.sh
```

For Charades-STA with VGG features:

```shell
bash FlashVTG/scripts/charades_sta/train_vgg.sh
```

For TACoS:

```shell
bash FlashVTG/scripts/tacos/train.sh
```

For TVSum:

```shell
bash FlashVTG/scripts/tvsum/train.sh
```

For YouTube Highlights:

```shell
bash FlashVTG/scripts/youtube_uni/train.sh
```
Use `inference.sh` to run inference. Hint: use `data/MR.py` for the Moment Retrieval task and `data/HD.py` for the Highlight Detection task. Here is a sample showing how to use `inference.sh`:

```shell
bash FlashVTG/scripts/inference.sh data/MR.py results/QVHihlights_IV2/model_best.ckpt 'val'
```
For the QVHighlights test set, you can submit your predictions for evaluation on CodaLab. For more details, see `standalone_eval/README.md`.
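The Moment Retrieval metrics in `standalone_eval` are based on temporal IoU between predicted and ground-truth windows. As a quick sanity check on your own predictions, the core computation can be sketched as follows (a minimal illustration, not the repo's evaluation code; windows as `[start, end]` in seconds follow the QVHighlights convention):

```python
def temporal_iou(pred: list[float], gt: list[float]) -> float:
    """IoU of two temporal windows given as [start, end] in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_1(top_pred: list[float], gt: list[float], thresh: float = 0.5) -> bool:
    """R1@thresh: does the top-scored predicted window reach >= thresh IoU with GT?"""
    return temporal_iou(top_pred, gt) >= thresh
```

For example, a prediction `[0, 10]` against ground truth `[5, 15]` has IoU 1/3 and therefore fails R1@0.5.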
We provide multiple checkpoints and training logs here. The configuration can be found in each checkpoint's `opt.json` file.
| Dataset | Model file |
|---|---|
| QVHighlights (SlowFast + CLIP) | checkpoint and training log |
| QVHighlights (InternVideo2) | checkpoint and training log |
| Charades (InternVideo2) | checkpoint and training log |
| Charades (VGG) | checkpoint and training log |
| TACoS | checkpoint and training log |
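To see which hyperparameters a released checkpoint was trained with, its `opt.json` can be loaded like this (a minimal sketch; the exact keys are repo-specific, so inspect the file for the full list):

```python
import json
from pathlib import Path

def load_opt(ckpt_dir: str) -> dict:
    """Read the opt.json stored alongside a checkpoint."""
    return json.loads((Path(ckpt_dir) / "opt.json").read_text())

# Example (hypothetical checkpoint directory and keys):
# opt = load_opt("results/QVHihlights_IV2")
# print(opt.get("lr"), opt.get("bsz"))
```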
This work is supported by the Australian Research Council (ARC) Discovery Project DP230101753, and our code is based on CGDETR.