Paper | Project Page | GitHub | Data
Official implementation of our paper:
Learning Camera Movement Control from Real-World Drone Videos
Yunzhong Hou, Liang Zheng, Philip Torr
"To record as is, not to create from scratch."
Abstract: This study seeks to automate camera movement control for filming existing subjects into attractive videos, contrasting with the creation of non-existent content by directly generating the pixels. We select drone videos as our test case due to their rich and challenging motion patterns, distinctive viewing angles, and precise controls. Existing AI videography methods struggle with limited appearance diversity in simulation training, high costs of recording expert operations, and difficulties in designing heuristic-based goals to cover all scenarios. To avoid these issues, we propose a scalable method that involves collecting real-world training data to improve diversity, extracting camera trajectories automatically to minimize annotation costs, and training an effective architecture that does not rely on heuristics. Specifically, we collect 99k high-quality trajectories by running 3D reconstruction on online videos, connecting camera poses from consecutive frames to form 3D camera paths, and using a Kalman filter to identify and remove low-quality data. Moreover, we introduce DVGFormer, an auto-regressive transformer that leverages the camera path and images from all past frames to predict camera movement in the next frame. We evaluate our system across 38 synthetic natural scenes and 7 real city 3D scans. We show that our system effectively learns to perform challenging camera movements such as navigating through obstacles, maintaining low altitude to increase perceived speed, and orbiting towers and buildings, which are very useful for recording high-quality videos.
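As a concrete illustration of the trajectory-filtering step mentioned above, here is a minimal sketch of scoring a camera path with a Kalman filter. The paper's actual filtering criteria are not reproduced in this README, so the constant-velocity state model, noise levels, and rejection threshold below are all assumptions made for illustration.

```python
import numpy as np

def kalman_innovation_score(positions, dt=1.0, q=1e-2, r=1e-1):
    """Score a camera trajectory by its average Kalman innovation.

    A constant-velocity Kalman filter is run over the 3D camera positions;
    large prediction residuals suggest jittery or broken reconstructions.
    (Illustrative only -- the state model and noise levels are assumptions.)
    """
    # State: [x, y, z, vx, vy, vz]
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                    # constant-velocity transition
    H = np.hstack([np.eye(3), np.zeros((3, 3))])  # observe positions only
    Q = q * np.eye(6)                             # process noise (assumed)
    R = r * np.eye(3)                             # measurement noise (assumed)

    x = np.concatenate([positions[0], np.zeros(3)])
    P = np.eye(6)
    residuals = []
    for z in positions[1:]:
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Innovation: prediction error against the observed camera position
        y = z - H @ x
        S = H @ P @ H.T + R
        residuals.append(float(y @ np.linalg.solve(S, y)))
        # Update
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(6) - K @ H) @ P
    return float(np.mean(residuals))

# Keep a trajectory only if its average innovation stays below a threshold (assumed value).
traj = np.cumsum(np.random.randn(100, 3) * 0.05, axis=0)  # toy camera path
is_clean = kalman_innovation_score(traj) < 5.0
```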
- 🔥🔥 News: 2024/12/22: We have released model checkpoints and the dataset on Hugging Face.
- 🔥 News: 2024/12/13: We have released the DroneMotion-99k dataset. Check out the README file for the steps needed after downloading the HDF5 archive.
- 🔥 News: 2024/12/13: We have released the code for DVGFormer!
- 🔥 News: 2024/12/13: Our paper is now online!
- Initialize repo
- Code release
- Blender scene files for evaluation
- DroneMotion-99k dataset
  - HDF5 archive of filtered 3D camera trajectories
  - Scripts for downloading the corresponding YouTube videos
- Release model checkpoints
Please refer to Hugging Face for the checkpoint download link.
- Create and activate a Conda environment:

```bash
conda create -n dvgformer python=3.10
conda activate dvgformer
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c conda-forge ffmpeg
pip install -r requirements.txt
```
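As an optional sanity check (not part of the original setup steps), you can verify that the expected PyTorch build and a CUDA device are visible from the new environment:

```python
# Optional environment check for dvgformer (illustrative, not from the repo).
import torch
import torchvision

print(torch.__version__)        # expected: 2.4.1
print(torchvision.__version__)  # expected: 0.19.1
if torch.cuda.is_available():
    print('CUDA device:', torch.cuda.get_device_name(0))
    print('bf16 supported:', torch.cuda.is_bf16_supported())
else:
    print('No CUDA device found')
```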
- Download evaluation data
For real city 3D scans from Google Earth, please download from this link.
For synthetic natural scenes, you can either generate your own version from the official git repo princeton-vl/infinigen or directly download from this link. Note that our version uses very basic graphics settings, and you may need to generate your own version if you require higher visual quality.
After downloading the evaluation environments, your folder should look like this:

```
dvgformer/
├── infinigen/
│   ├── arctic/
│   ...
│   └── snowy_mountain/
├── blosm/
│   ├── himeji/
│   ...
│   └── sydney/
├── src/
├── README.md
...
```
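If you want to confirm the layout before running evaluation, a small check like the following can help; it is not part of the repo, and the folder names are simply taken from the tree above:

```python
# Check that a few expected evaluation scene folders exist (illustrative only;
# folder names taken from the directory tree in this README).
from pathlib import Path

root = Path('.')  # run from the dvgformer/ folder
for scene in ['infinigen/arctic', 'infinigen/snowy_mountain', 'blosm/himeji', 'blosm/sydney']:
    path = root / scene
    print(f'{path}: {"found" if path.is_dir() else "missing"}')
```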
- Download training data
We provide the COLMAP 3D reconstruction results and the filtered camera movement sequences in our DroneMotion-99k dataset. You can download either the minimal dataset with 10 videos and 129 sequences (link) or the full dataset with 13,653 videos and 99,003 camera trajectories (link).
After downloading the training data, your folder should look like this:

```
dvgformer/
├── youtube_drone_videos/
│   ├── dataset_full.h5
│   └── dataset_mini.h5
├── src/
├── README.md
...
```
Due to YouTube policy, we cannot share the video MP4s or the extracted frames. As an alternative, we include a Python script `download_videos.py` that can help you automatically download the videos and extract the frames.

```bash
python download_videos.py --hdf5_fpath youtube_drone_videos/dataset_mini.h5
python download_videos.py --hdf5_fpath youtube_drone_videos/dataset_full.h5
```
This should update your downloaded HDF5 dataset file with the video frames.
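To verify the update, you can peek inside the HDF5 archive with h5py. The snippet below only lists what is actually present, since the internal group and key layout of the dataset is not documented in this README:

```python
# Inspect the downloaded dataset archive (illustrative; no particular
# group or key names inside the HDF5 file are assumed).
import h5py

with h5py.File('youtube_drone_videos/dataset_mini.h5', 'r') as f:
    print(f'{len(f.keys())} top-level entries')
    for name in list(f.keys())[:5]:
        item = f[name]
        kind = 'group' if isinstance(item, h5py.Group) else 'dataset'
        print(name, kind, dict(item.attrs))
```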
You can also adjust the number of workers for the download or frame extraction process in `download_videos.py` by specifying `--num_download_workers` or `--num_extract_workers`.
- Inference: You can download the model checkpoint from Hugging Face, or directly load the pretrained model with the following code:
```python
import torch
from src.models import DVGFormerModel

model = DVGFormerModel.from_pretrained(
    'yunzhong-hou/DVGFormer'
).to('cuda').to(torch.bfloat16)
```
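Assuming `DVGFormerModel` follows the standard Hugging Face `PreTrainedModel` interface (an assumption based on the `from_pretrained` call above), you can inspect the loaded checkpoint before running evaluation:

```python
# Inspect the loaded model (assumes a standard Hugging Face PreTrainedModel;
# only .config, .parameters(), and .eval() are relied upon here).
n_params = sum(p.numel() for p in model.parameters())
print(f'{n_params / 1e6:.1f}M parameters, dtype: {next(model.parameters()).dtype}')
print(model.config)
model.eval()  # switch to evaluation mode before inference
```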
For Blender evaluation, you can run the following script:

```bash
python blender_eval.py
```
- Train your own model: We use two RTX 3090 GPUs in our experiments. Please run the following script to train your own model:

```bash
bash run_gpu01.sh
```
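After training finishes, the resulting checkpoint can presumably be loaded the same way as the released one, by pointing `from_pretrained` at your local output directory. The path below is a placeholder, not the repo's actual output location:

```python
# Load a locally trained checkpoint (hypothetical path; substitute the output
# directory configured in your training run).
import torch
from src.models import DVGFormerModel

model = DVGFormerModel.from_pretrained('path/to/your/output_dir').to('cuda').to(torch.bfloat16)
```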
If you find this project useful, please consider citing:
```bibtex
@article{hou2024dvgformer,
  author  = {Hou, Yunzhong and Zheng, Liang and Torr, Philip},
  title   = {Learning Camera Movement Control from Real-World Drone Videos},
  journal = {arXiv preprint},
  year    = {2024},
}
```