Keyframes-GAN (IEEE TMM 2023)

Overview

Inference example

This is the official repo of the paper Perceptual Quality Improvement in Videoconferencing using Keyframes-based GAN.

In this work, we propose a novel GAN architecture for reducing compression artifacts in videoconferencing. In this setting, the speaker is typically in front of the camera and remains the same for the entire duration of the transmission. Under this assumption, we can maintain a set of reference keyframes of the person, taken from the higher-quality I-frames transmitted within the video stream. First, we extract multi-scale features from the compressed and reference frames. Then, these features are combined progressively with Adaptive Spatial Feature Fusion blocks based on facial landmarks and with Spatial Feature Transform blocks. This allows the model to restore the high-frequency details lost during video compression.
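To make the Spatial Feature Transform idea concrete, here is a minimal sketch of the affine modulation it is usually defined by: each feature value is multiplied by a spatially varying scale and offset by a spatially varying shift. The function name `sft_modulate` and the use of plain nested lists are illustrative assumptions; the actual model works on multi-channel tensors, and how it predicts the scale and shift maps is not described in this README.

```python
def sft_modulate(features, scale, shift):
    """Element-wise affine modulation: out = scale * features + shift.

    All three arguments are 2-D lists with the same shape (one feature map).
    Hypothetical helper for illustration only, not the repo's implementation.
    """
    if not (len(features) == len(scale) == len(shift)):
        raise ValueError("features, scale and shift must have the same number of rows")
    out = []
    for f_row, g_row, b_row in zip(features, scale, shift):
        if not (len(f_row) == len(g_row) == len(b_row)):
            raise ValueError("features, scale and shift must have the same number of columns")
        out.append([g * f + b for f, g, b in zip(f_row, g_row, b_row)])
    return out


if __name__ == "__main__":
    features = [[1.0, 2.0], [3.0, 4.0]]
    scale = [[1.0, 0.5], [2.0, 1.0]]   # in a real network, predicted from the conditioning features
    shift = [[0.0, 1.0], [-1.0, 0.0]]  # in a real network, predicted from the conditioning features
    print(sft_modulate(features, scale, shift))
```

Because the scale and shift vary per position, the modulation can strengthen or suppress features locally instead of applying one global correction to the whole frame.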

Architecture

Prerequisites and Installation

  1. Clone the repo
git clone https://github.com/LorenzoAgnolucci/Keyframes-GAN.git
  2. Create a virtual env and install all the dependencies with
pip install -r requirements.txt
  3. Although it is not required, we strongly recommend installing dlib with GPU support

  4. For metrics computation, you need to run

pip install -e pybrisque/
  5. Download the pretrained models and move them into the pretrained_models folder
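After installation, a quick sanity check can confirm that the environment looks as expected. The sketch below is not part of the repo: `dlib_cuda_status` and `list_pretrained` are hypothetical helpers, and the check assumes the dlib Python module exposes its build flag as `DLIB_USE_CUDA` (if the attribute is missing, the helper reports `False`).

```python
from pathlib import Path


def dlib_cuda_status():
    """Return True/False for dlib's CUDA build flag, or None if dlib is not installed."""
    try:
        import dlib
    except ImportError:
        return None
    # Assumption: the installed dlib build exposes DLIB_USE_CUDA; default to False otherwise.
    return bool(getattr(dlib, "DLIB_USE_CUDA", False))


def list_pretrained(folder="pretrained_models"):
    """Return the sorted file names found in the pretrained models folder (empty if missing)."""
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


if __name__ == "__main__":
    print("dlib CUDA support:", dlib_cuda_status())
    print("pretrained models:", list_pretrained())
```

Run it from the repository root so that the relative `pretrained_models` path resolves correctly.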

Usage

For testing, you need one or more high-quality (HQ) mp4 videos. These videos will be compressed with a given CRF. The face in each frame will be cropped, aligned, and then restored by our model, which exploits HQ keyframes.
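For readers unfamiliar with CRF, the sketch below builds an ffmpeg command that compresses a video at a given Constant Rate Factor. It is only an illustration of what CRF-based compression looks like: the choice of the `libx264` encoder and the helper name `ffmpeg_crf_command` are assumptions, and the repo's `preprocessing.py` may encode the videos differently.

```python
def ffmpeg_crf_command(src, dst, crf=42, codec="libx264"):
    """Build (but do not run) an ffmpeg command that re-encodes src at the given CRF.

    Assumption: 8-bit libx264, whose CRF scale runs from 0 (lossless) to 51 (worst quality).
    """
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51 for 8-bit libx264")
    return ["ffmpeg", "-y", "-i", str(src), "-c:v", codec, "-crf", str(crf), str(dst)]


if __name__ == "__main__":
    print(" ".join(ffmpeg_crf_command("original/talk.mp4", "compressed/talk.mp4")))
```

Higher CRF values mean stronger compression and more visible artifacts, so the default of 42 corresponds to a heavily compressed stream.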

Testing

  1. Move the HQ videos under a directory named {BASE_PATH}/original/

  2. Run

python preprocessing.py --base_path {BASE_PATH} --crf 42

where crf is the Constant Rate Factor used for compression (default 42)

  3. Run
python video_inference.py --base_path {BASE_PATH} --crf 42 --max_keyframes 5

where crf must match the value used in step 2 and max_keyframes is the maximum cardinality of the set of keyframes (default 5)

  4. If needed, run
python compute_metrics.py --gt_path {BASE_PATH}/original --inference_path inference/DMSASFFNet/max_keyframes_5/LFU

where gt_path is the directory that contains the HQ videos and inference_path is the directory that contains the restored frames
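Since the same crf value has to be passed to both preprocessing and inference, it can help to generate the commands from a single place. The following is a minimal sketch, not a script shipped with the repo: `pipeline_commands` is a hypothetical helper that uses only the flags listed above, and it assumes that the `max_keyframes_5` part of the inference path follows the value of `--max_keyframes`.

```python
import sys


def pipeline_commands(base_path, crf=42, max_keyframes=5, with_metrics=False):
    """Return the testing commands as argument lists, sharing one crf value."""
    base = str(base_path).rstrip("/")
    commands = [
        [sys.executable, "preprocessing.py", "--base_path", base, "--crf", str(crf)],
        [sys.executable, "video_inference.py", "--base_path", base,
         "--crf", str(crf), "--max_keyframes", str(max_keyframes)],
    ]
    if with_metrics:
        # Assumption: the output directory name tracks max_keyframes, as in the example path.
        inference_path = f"inference/DMSASFFNet/max_keyframes_{max_keyframes}/LFU"
        commands.append([sys.executable, "compute_metrics.py",
                         "--gt_path", f"{base}/original",
                         "--inference_path", inference_path])
    return commands


if __name__ == "__main__":
    for command in pipeline_commands("data/demo", with_metrics=True):
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the repository root to execute the steps in order and stop at the first failure.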

Training

  1. Modify the file BasicSR/options/train/DMSASFFNet/train_DMSASFFNet.yml to set the paths of your training and validation datasets

  2. Start training by running the following command with BasicSR as the current working directory:

python basicsr/train.py -opt options/train/DMSASFFNet/train_DMSASFFNet.yml

Please refer to BasicSR for more information on the fields of the options file.
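The training command only works when BasicSR is the current working directory, which is easy to get wrong when launching from a script. As a minimal sketch under that constraint, the hypothetical helpers below (`training_invocation` and `launch_training`, neither of which exists in the repo) set the working directory explicitly and check that the options file is present before starting.

```python
import subprocess
import sys
from pathlib import Path

# Options file path from the README, relative to the BasicSR directory.
OPT_FILE = "options/train/DMSASFFNet/train_DMSASFFNet.yml"


def training_invocation(repo_root="."):
    """Return (command, working_directory) for the training run."""
    cwd = Path(repo_root) / "BasicSR"
    command = [sys.executable, "basicsr/train.py", "-opt", OPT_FILE]
    return command, cwd


def launch_training(repo_root="."):
    """Start training with BasicSR as the working directory."""
    command, cwd = training_invocation(repo_root)
    if not (cwd / OPT_FILE).is_file():
        raise FileNotFoundError(f"options file not found: {cwd / OPT_FILE}")
    return subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    command, cwd = training_invocation()
    print("would run:", " ".join(command), "in", cwd)
```

Passing `cwd` to `subprocess.run` avoids changing the directory of the calling process, so the launcher can be invoked from anywhere inside the repository.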

Citation

@article{agnolucci2023perceptual,
  title={Perceptual quality improvement in videoconferencing using keyframes-based {GAN}},
  author={Agnolucci, Lorenzo and Galteri, Leonardo and Bertini, Marco and Del Bimbo, Alberto},
  journal={IEEE Transactions on Multimedia},
  volume={26},
  pages={339--352},
  year={2023},
  publisher={IEEE}
}

Acknowledgments

We rely on BasicSR for the implementation of our model and for metrics computation.