Installation

Environment

git clone https://github.com/showlab/UniVTG
cd UniVTG

conda create --name univtg python=3.8
conda activate univtg
pip install -r requirements.txt

Datasets

One engineering contribution of this work is that most video temporal tasks are unified on the same features, which makes pre-training and cross-training flexible.

  1. Download the features and metadata for the pretraining and downstream datasets (skip the pretraining datasets if you do not need them).

| Dataset | Task | Metadata | Video (Slowfast R50) | Video (CLIP B/32) | Text (CLIP B/32) |
|---|---|---|---|---|---|
| Point (Ego4D) | PT | 548 MB | 27.1 GB | 5.7 GB | 30.7 GB |
| Interval (VideoCC) | PT | 155 MB | 300 GB | 62.5 GB | 12.6 GB |
| Curve (VideoCC) | PT | 3.8 GB | 👆 | 👆 | 132 MB |
| QVHighlights | MR + HL | 5 MB | 4.0 GB | 940 MB | 172 MB |
| Charades-STA | MR | 4 MB | 1.3 GB | 305 MB | 178 MB |
| NLQ | MR | 3 MB | 1.8 GB | 404 MB | 184 MB |
| TACoS | MR | 2 MB | 81 MB | 18 MB | 244 MB |
| YoutubeHL | HL | 1 MB | 427 MB | 95 MB | 2 MB |
| TVSum | HL | 1 MB | 28 MB | 6 MB | 1 MB |
| QFVS | VS | 1 MB | 455 MB | 👈 | 1 MB |
| ActivityNet (optional) | MR | 10 MB | 4.5 GB | 1.0 GB | 958 MB |
| DiDeMo (optional) | MR | 6 MB | 1.1 GB | 269 MB | 443 MB |
| HACS (optional) | MR | 15 MB | 13.1 GB | 3.0 GB | 177 MB |
| COIN (optional) | MR | 8 MB | 2.3 GB | 556 MB | 30 MB |

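
Before downloading, it can help to estimate how much disk space a chosen subset needs. The following is a minimal sketch that sums the sizes listed in the table for a few datasets. The helper names `to_mb` and `total_size_mb` are illustrative and not part of UniVTG, and the conversion assumes 1 GB = 1024 MB. Extracting the archives needs additional space on top of the download size.

```python
# Sizes copied from the table: (metadata, Slowfast video, CLIP video, CLIP text).
# Only a few rows are included here; extend the dict as needed.
SIZES = {
    "qvhighlights": ("5 MB", "4.0 GB", "940 MB", "172 MB"),
    "charades": ("4 MB", "1.3 GB", "305 MB", "178 MB"),
    "tacos": ("2 MB", "81 MB", "18 MB", "244 MB"),
    "tvsum": ("1 MB", "28 MB", "6 MB", "1 MB"),
}

UNIT_TO_MB = {"MB": 1.0, "GB": 1024.0}  # assumption: binary gigabytes


def to_mb(size):
    """Convert a string such as '4.0 GB' to megabytes."""
    value, unit = size.split()
    return float(value) * UNIT_TO_MB[unit]


def total_size_mb(names):
    """Sum all listed archive sizes (in MB) for the given dataset names."""
    return sum(to_mb(s) for name in names for s in SIZES[name])


if __name__ == "__main__":
    wanted = ["qvhighlights", "charades"]
    print(f"{total_size_mb(wanted) / 1024:.1f} GB for {wanted}")
```
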
  2. Extract the downloaded tar archive:
tar -xvf {tar_name}.tar
mv data/home/qinghonglin/univtg/data/{dset_name}/* .  # Replace dset_name accordingly
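
The same extract-then-move step can be scripted when many archives are involved. The sketch below is one possible Python equivalent, not a script shipped with UniVTG: the function name `extract_and_flatten` and its parameters are hypothetical, and the nested prefix must be passed in by the caller because it depends on how the archive was created.

```python
import shutil
import tarfile
from pathlib import Path


def extract_and_flatten(tar_path, nested_prefix, dest_dir):
    """Extract tar_path into dest_dir, then move everything found under
    dest_dir/nested_prefix up into dest_dir (the role of the `mv` command)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Only extract archives you trust: members may carry arbitrary paths.
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest)

    nested = dest / nested_prefix
    moved = []
    for item in sorted(nested.iterdir()):
        target = dest / item.name
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.move(str(item), str(target))
        moved.append(target.name)

    # Remove the emptied nested directories, innermost first.
    current = nested
    while current != dest:
        try:
            current.rmdir()
        except OSError:
            break  # directory is not empty, so leave it alone
        current = current.parent
    return moved
```

A call such as `extract_and_flatten("tvsum.tar", "data/home/qinghonglin/univtg/data/tvsum", "data/tvsum")` would mirror the two shell commands for one dataset; the archive name here is only an example.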

For the VideoCC Slowfast features, first decompress the split parts and concatenate them into a single tar archive, then extract it.

gunzip vid_slowfast_*.gz
cat vid_slowfast_* > vid_slowfast.tar
tar -xvf vid_slowfast.tar
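
For reference, the decompress-and-concatenate step can also be done in Python without keeping the intermediate uncompressed parts on disk. This is a minimal sketch: the function name `join_gzip_parts` is hypothetical, and it assumes the part filenames sort into the correct order as plain strings (for example, fixed-width suffixes).

```python
import gzip
import shutil
from pathlib import Path


def join_gzip_parts(parts_dir, pattern, out_path):
    """Decompress every file matching pattern in parts_dir and append the
    result to out_path, processing parts in sorted filename order."""
    parts = sorted(Path(parts_dir).glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r} in {parts_dir}")
    with open(out_path, "wb") as out:
        for part in parts:
            with gzip.open(part, "rb") as src:
                # Stream in chunks so a large part is never fully in memory.
                shutil.copyfileobj(src, out)
    return [p.name for p in parts]
```

Calling `join_gzip_parts(".", "vid_slowfast_*.gz", "vid_slowfast.tar")` would produce the same `vid_slowfast.tar` as the `gunzip` and `cat` commands, which can then be extracted as usual.
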
  3. Organize the data and features in the following structure:

    univtg
    ├── eval
    ├── data
    │   ├── qfvs
    │   ├── tvsum
    │   ├── youtube
    │   ├── tacos
    │   ├── ego4d
    │   ├── charades
    │   │   ├── metadata
    │   │   │   ├── charades_test.jsonl
    │   │   │   └── charades_train.jsonl
    │   │   ├── txt_clip
    │   │   ├── vid_clip
    │   │   └── vid_slowfast
    │   └── qvhighlights
    │       ├── metadata
    │       │   ├── qvhighlights_test.jsonl
    │       │   ├── qvhighlights_train.jsonl
    │       │   └── qvhighlights_val.jsonl
    │       ├── txt_clip
    │       ├── vid_clip
    │       └── vid_slowfast
    ├── main
    ├── model
    ├── utils
    ├── README.md
    └── ···
  4. (Optional) We extract video features (Slowfast R50 and CLIP B/32) with HERO_Video_Feature_Extractor; you can use it to extract features for other benchmarks or videos. We extract text features (CLIP B/32) with run_on_video/text_extractor.py.
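
Once the data is organized, the directory structure shown earlier can be sanity-checked before training. The sketch below is an illustrative helper, not part of the repository: `missing_entries` is a hypothetical name, and it assumes every dataset uses the four sub-directories shown for charades and qvhighlights, which may not hold for all datasets (pass a different `subdirs` tuple where needed).

```python
from pathlib import Path

# Sub-directories shown for charades and qvhighlights in the tree.
# Assumption: other datasets follow the same layout; adjust if they do not.
EXPECTED_SUBDIRS = ("metadata", "txt_clip", "vid_clip", "vid_slowfast")


def missing_entries(data_root, datasets, subdirs=EXPECTED_SUBDIRS):
    """Return 'dataset/subdir' strings that are expected under data_root
    but are not present as directories."""
    root = Path(data_root)
    missing = []
    for name in datasets:
        for sub in subdirs:
            if not (root / name / sub).is_dir():
                missing.append(f"{name}/{sub}")
    return missing
```

Running `missing_entries("data", ["charades", "qvhighlights"])` from the repository root returns an empty list when both datasets are fully in place.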