lipsync is a Python library that moves lips in a video (or image) to match a given audio file. It is based on Wav2Lip, but many unneeded files and libraries have been removed, and the code has been updated to work with the latest versions of Python.
- Video lip synchronization: Synchronize lips in an existing video to match a new audio file.
- Image lip animation: Provide a single image and an audio file to generate a talking video.
- Runs on CPU and CUDA: Choose whether to run on your CPU or on a CUDA-enabled GPU for faster processing.
- Caching: If you use the same video multiple times with different audio files, lipsync can cache frames and reuse them, which makes later runs much faster.
lipsync works with two different pre-trained models:
- Wav2Lip (Download wav2lip.pth)
  - More accurate lip synchronization
  - Lips in the result may appear somewhat blurred
- Wav2Lip + GAN (Download wav2lip_gan.pth)
  - Lips in the result are clearer
  - Synchronization may be slightly less accurate
pip install lipsync
Below is a simple example in Python. It assumes you have the model weights (either wav2lip.pth or wav2lip_gan.pth) in a weights/ folder.
from lipsync import LipSync

lip = LipSync(
    model='wav2lip',
    checkpoint_path='weights/wav2lip.pth',
    nosmooth=True,
    device='cuda',
    cache_dir='cache',
    img_size=96,
    save_cache=True,
)

lip.sync(
    'source/person.mp4',
    'source/audio.wav',
    'result.mp4',
)
- model: Only 'wav2lip' at the moment
- checkpoint_path: Path to the model weights (e.g., wav2lip.pth, wav2lip_gan.pth)
- nosmooth: Set True to disable smoothing
- device: 'cpu' or 'cuda'
- cache_dir: Directory for saving frames
- save_cache: Set True to save frames to cache_dir for faster re-runs
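
The caching and device options can be combined when one video is dubbed with several audio tracks. The sketch below is only an illustration built on the LipSync constructor and sync call shown above. The helper names (pick_device, plan_jobs, sync_all, main), the file paths, and the output naming scheme are assumptions made for this example and are not part of lipsync. It also assumes that PyTorch is installed alongside lipsync, and that frames cached on the first run are reused on later runs with the same video, as described under Caching.

```python
from pathlib import Path


def pick_device():
    """Return 'cuda' if PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # assumed to be available wherever lipsync is installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def plan_jobs(video, audio_files, out_dir):
    """Pair one source video with each audio file and derive an output path.

    Hypothetical naming scheme: <out_dir>/<audio file stem>.mp4
    """
    out_dir = Path(out_dir)
    return [
        (str(video), str(audio), str(out_dir / f"{Path(audio).stem}.mp4"))
        for audio in audio_files
    ]


def sync_all(lip, jobs):
    """Call lip.sync(video, audio, output) once per planned job."""
    outputs = []
    for video, audio, output in jobs:
        # Make sure the output folder exists before writing the result.
        Path(output).parent.mkdir(parents=True, exist_ok=True)
        lip.sync(video, audio, output)
        outputs.append(output)
    return outputs


def main():
    # Imported here so the helpers above can be used without lipsync installed.
    from lipsync import LipSync

    lip = LipSync(
        model="wav2lip",
        checkpoint_path="weights/wav2lip.pth",
        nosmooth=True,
        device=pick_device(),
        cache_dir="cache",
        img_size=96,
        save_cache=True,  # keep frames so later runs on the same video can reuse them
    )
    # Example paths only; replace them with your own files.
    audio_files = ["source/hello.wav", "source/goodbye.wav"]
    jobs = plan_jobs("source/person.mp4", audio_files, "results")
    return sync_all(lip, jobs)
```

Calling main() would write results/hello.mp4 and results/goodbye.mp4 under these assumptions. Reusing a single LipSync instance with save_cache=True is a design choice of this sketch: it is meant to let every run after the first benefit from the cached frames.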
Please be mindful when using lipsync. This library can generate videos that look convincing, so it could be used to spread disinformation or harm someone’s reputation. We encourage using it only for entertainment or scientific purposes, and always with respect and consent from any people involved.
This software may be used only for personal, research, or non-commercial purposes. Please cite the following paper if you use this code:
@inproceedings{10.1145/3394171.3413532,
author = {Prajwal, K R and Mukhopadhyay, Rudrabha and Namboodiri, Vinay P. and Jawahar, C.V.},
title = {A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild},
year = {2020},
isbn = {9781450379885},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3394171.3413532},
doi = {10.1145/3394171.3413532},
booktitle = {Proceedings of the 28th ACM International Conference on Multimedia},
pages = {484–492},
numpages = {9},
keywords = {lip sync, talking face generation, video generation},
location = {Seattle, WA, USA},
series = {MM '20}
}