TODO: Modify and run save_vq_tokens.py to tokenize RGB videos.
`save_vq_tokens.py` is the script you run in 4M to pretokenize images, e.g., to go from images of the modality `rgb` to examples of the modality `tok_rgb`. It takes a pretrained tokenizer and an input dataset directory (among other things) and applies the tokenizer to the images in the input dataset, writing the resulting tokens to a new output dataset directory.
We want to get tokens for the RGB videos, i.e., to go from an input directory of `root/video_rgb` to the tokenized examples in the output directory `root/video_tok_rgb`.
The steps to do this are to modify `save_vq_tokens.py` to have the following capabilities:
- It reads videos from the `video_rgb` modality directory proposed in this post ([PARENT ISSUE] Data preprocessing and pseudolabeling #3 (comment)).
- It runs the pretrained RGB tokenizer on each frame of each video (see the sketch below).
- It saves the tokens as `.npy` files in webdataset format in the directory `root/video_tok_rgb`.
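As a rough illustration of the per-frame tokenization step (second capability above), the loop could look like the sketch below. This is not the existing `save_vq_tokens.py` logic: `torchvision.io.read_video` is just one way to decode the mp4, and the `tokenizer.tokenize(...)` call is a placeholder for whatever encode entry point the pretrained 4M RGB tokenizer actually exposes.

```python
import numpy as np
import torch
from torchvision.io import read_video

@torch.no_grad()
def tokenize_video(video_path, tokenizer, device="cuda"):
    # read_video returns frames as a (num_frames, H, W, C) uint8 tensor by default.
    frames, _, _ = read_video(video_path, pts_unit="sec")
    # Convert to float NCHW in [0, 1]; any resize/crop/normalization the tokenizer
    # expects (e.g., 224x224) is omitted here.
    frames = frames.permute(0, 3, 1, 2).float() / 255.0
    per_frame_tokens = []
    for frame in frames:
        # Hypothetical call: stands in for the actual encode/tokenize entry point
        # of the pretrained 4M RGB tokenizer.
        tokens = tokenizer.tokenize(frame.unsqueeze(0).to(device))
        per_frame_tokens.append(tokens.cpu().numpy())
    # One row of tokens per frame, e.g. (num_frames, num_tokens).
    return np.concatenate(per_frame_tokens, axis=0)
```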
Proposed input directory format:
root/video_rgb/shard-00000.tar
├── 00000.mp4  # this corresponds to one video
├── 00001.mp4
└── ...
Proposed output directory format:
root/video_tok_rgb/shard-00000.tar
├── 00000.npy  # this corresponds to one video; shape: something like (num_frames, H, C, W)
├── 00001.npy
└── ...
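On the output side, a WebDataset shard is just a tar archive whose member names share a per-sample key, so one possible way to produce a `shard-00000.tar` like the above is a plain `tarfile`-based writer, sketched below (the real `save_vq_tokens.py` may instead use the `webdataset` library's own writer):

```python
import io
import tarfile

import numpy as np

def write_token_shard(shard_path, tokens_per_video):
    """Write {key: token_array} pairs as <key>.npy members of one WebDataset-style tar shard."""
    with tarfile.open(shard_path, "w") as tar:
        for key, tokens in sorted(tokens_per_video.items()):
            # Serialize the array exactly as np.save would write it to disk.
            buf = io.BytesIO()
            np.save(buf, tokens)
            data = buf.getvalue()
            member = tarfile.TarInfo(name=f"{key}.npy")
            member.size = len(data)
            tar.addfile(member, io.BytesIO(data))
```

For example, `write_token_shard("root/video_tok_rgb/shard-00000.tar", {"00000": tokens_00000, "00001": tokens_00001})` would produce the layout shown above.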
Definition of Done:
- We have a script which can, given an input directory (e.g. `video_rgb`), a pretrained tokenizer (e.g., from https://huggingface.co/collections/EPFL-VILAB/4m-tokenizers-66019388bda47e9bcff3f887), and an output directory (e.g., `video_tok_rgb`), generate the tokenized representations of those videos according to the structure above and save them to the output directory.
- This script has been run and we actually have tokenized RGB videos in `root/video_tok_rgb`.

(This is a subtask of #3.)
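For concreteness, a driver that ties these pieces together might look roughly like the sketch below; it assumes the hypothetical `tokenize_video` and `write_token_shard` helpers from the earlier sketches and the shard layout proposed above, and is not meant to mirror how the real `save_vq_tokens.py` is structured.

```python
import glob
import os
import tarfile
import tempfile

def tokenize_all_shards(input_root, output_root, tokenizer):
    os.makedirs(output_root, exist_ok=True)
    for shard_path in sorted(glob.glob(os.path.join(input_root, "shard-*.tar"))):
        tokens_per_video = {}
        with tarfile.open(shard_path, "r") as tar, tempfile.TemporaryDirectory() as tmp:
            for member in tar.getmembers():
                if not member.name.endswith(".mp4"):
                    continue
                # Extract the mp4 so the video decoder can open it by path.
                tar.extract(member, path=tmp)
                key = os.path.splitext(os.path.basename(member.name))[0]  # e.g. "00000"
                tokens_per_video[key] = tokenize_video(os.path.join(tmp, member.name), tokenizer)
        write_token_shard(os.path.join(output_root, os.path.basename(shard_path)), tokens_per_video)
```

Running it would then be something like `tokenize_all_shards("root/video_rgb", "root/video_tok_rgb", tokenizer)` with a tokenizer loaded from the linked Hugging Face collection.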
kdu4108 changed the title from "Modify and run save_vq_tokens.py to tokenize RGB videos." to "Transform from video_rgb format into video_tok_rgb format and save in video_tok_rgb/ directory." on Jul 4, 2024