Taking the keyword "aws" as an example, this project shows how to hide the "aws" logo and the word "aws" wherever they appear in an AWS meeting video. Either masking or blurring can be used.
This project uses masking. Specifically, the PaddleOCR library performs text recognition on every frame of the video; whenever "recognized string".lower() == "aws", the area corresponding to that string is blackened, i.e. set to (b, g, r) = (0, 0, 0).
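The masking step itself is plain array manipulation. A minimal sketch, assuming BGR frames as NumPy arrays; `mask_region` is a hypothetical helper name, not a function from this repo:

```python
import numpy as np

def mask_region(frame: np.ndarray, box) -> None:
    """Blacken the axis-aligned bounding box of a detected text region in place.

    `box` is a list of four (x, y) corner points, the quadrilateral layout
    PaddleOCR's detector returns; masked pixels become (b, g, r) = (0, 0, 0).
    """
    xs = [int(p[0]) for p in box]
    ys = [int(p[1]) for p in box]
    frame[min(ys):max(ys), min(xs):max(xs)] = 0

# Example: blacken a detected "aws" box on a white 100x100 frame
frame = np.full((100, 100, 3), 255, dtype=np.uint8)
mask_region(frame, [[10, 20], [60, 20], [60, 40], [10, 40]])
```

Blurring could be substituted by replacing the assignment with a blur of the same slice.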
The specific roadmap is as follows:
1.1 Create and activate conda environment
conda create -n video_process python=3.8 -y
conda activate video_process
1.2 Install dependencies
Install Paddle framework.
The CPU version of Paddle is installed here. You can install the version appropriate for your machine environment; refer to the link: PaddlePaddle Installation.
python -m pip install paddlepaddle==2.4.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
Clone the repo and install the required packages.
The repo can be downloaded from both GitHub and Gitee; users in mainland China may prefer cloning it from Gitee.
git clone "this repo address"
cd "this repo main directory"
pip install -r requirements.txt
Enter the main directory, create a pretrained_model folder, enter it, and download the two models for this task as described below.
2.1 Download English text detection and character recognition model
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar
tar xf en_PP-OCRv3_det_infer.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar
tar xf en_PP-OCRv3_rec_infer.tar
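After extraction, the pretrained_model folder should look roughly like the tree below (Paddle inference tarballs typically unpack to an inference.pdmodel / inference.pdiparams pair; exact file names may vary by release):

```
pretrained_model/
├── en_PP-OCRv3_det_infer/
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
└── en_PP-OCRv3_rec_infer/
    ├── inference.pdiparams
    ├── inference.pdiparams.info
    └── inference.pdmodel
```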
If you would like to download detection and recognition models for other languages, please refer to the PP-OCR series model list.
3.1 Perform end-to-end inference of image folders
The input is the image folder to be predicted, and the output is multiple predicted images. Visual recognition results are saved to the ./inference_results folder by default.
python3 tools/infer_keyword/infer_end_to_end.py \
--keyword="aws" \
--image_dir=/home/jackdance/Desktop/aws_video/some_frame \
--det_model_dir="./pretrained_model/en_PP-OCRv3_det_infer/" \
--rec_model_dir="./pretrained_model/en_PP-OCRv3_rec_infer/" \
--rec_char_dict_path="ppocr/utils/en_dict.txt" \
--use_mp=True \
--total_process_num=8
Parameter description:
- keyword: the keyword to be replaced or masked (only English keywords can be specified here; to use a Chinese keyword, you must download the Chinese text detection and recognition models and change the character-set path used for text recognition)
- image_dir: the input image folder
- video: the input video
- det_model_dir: the path to the text detection model
- rec_model_dir: the path to the text recognition model
- rec_char_dict_path: the path to the text recognition character set; ppocr/utils/en_dict.txt is for English only, character sets for other languages can be found in ppocr/utils
- use_mp: whether to enable multiprocessing
- total_process_num: number of processes when multiprocessing is enabled
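At its core, the keyword selection step is just a case-insensitive comparison of each recognized string against `--keyword`. A sketch with a hypothetical function name, assuming the per-image result layout PaddleOCR returns (a list of [box, (text, confidence)] entries):

```python
def boxes_to_mask(ocr_result, keyword):
    """Return the detection boxes whose recognized text equals the keyword.

    `ocr_result` follows PaddleOCR's per-image layout: a list of
    [box, (text, confidence)] entries, where `box` is four (x, y) corners.
    """
    return [box for box, (text, _score) in ocr_result
            if text.lower() == keyword.lower()]

# Example: only the exact keyword match is selected, not "awsome"
result = [
    [[[0, 0], [50, 0], [50, 20], [0, 20]], ("AWS", 0.98)],
    [[[0, 30], [80, 30], [80, 50], [0, 50]], ("awsome", 0.91)],
]
boxes = boxes_to_mask(result, "aws")  # contains only the "AWS" box
```

Each returned box would then be blackened on the corresponding image.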
3.2 Perform end-to-end inference of video
The input is a single video and the output is the processed video.
PS: a sample input video is available (extraction code: f93p).
python3 tools/infer_keyword/infer_end_to_end.py \
--keyword="aws" \
--video=/home/jackdance/Desktop/aws_video/aws_first_2mins.mp4 \
--det_model_dir="./pretrained_model/en_PP-OCRv3_det_infer/" \
--rec_model_dir="./pretrained_model/en_PP-OCRv3_rec_infer/" \
--rec_char_dict_path="ppocr/utils/en_dict.txt" \
--use_mp=True \
--total_process_num=8
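Conceptually, the video path is the image path applied frame by frame: decode, OCR and mask, re-encode. A library-agnostic sketch; in practice OpenCV's `cv2.VideoCapture` and `cv2.VideoWriter` would supply and consume the frames, and `process_frames` / `masker` are hypothetical names for illustration:

```python
from typing import Callable, Iterable, Iterator
import numpy as np

def process_frames(frames: Iterable[np.ndarray],
                   masker: Callable[[np.ndarray], np.ndarray]) -> Iterator[np.ndarray]:
    """Apply the OCR-and-mask step to every frame of a video stream.

    `frames` would come from cv2.VideoCapture; the yielded frames would be
    written back with cv2.VideoWriter at the source fps and resolution.
    """
    for frame in frames:
        yield masker(frame)

# Example with a stand-in masker that blackens a fixed region
def fake_masker(frame):
    frame = frame.copy()
    frame[2:4, 2:4] = 0  # pretend OCR found the keyword here
    return frame

frames = [np.full((8, 8, 3), 255, dtype=np.uint8) for _ in range(3)]
processed = list(process_frames(frames, fake_masker))
```

In the real script, OCR would run inside `masker`, and the audio track of the source video would be merged back into the output afterwards.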
Use the Dockerfile to build the image, or pull it directly.
Method 1: build with the Dockerfile
# build dockerfile
docker build -t video_process:v0.2 .
# run Image
docker run -it \
--gpus "device=0" \
-v <host path>:<container path> \
-p 5002:5002 \
--privileged=True \
--name video_process \
video_process:v0.2 \
/bin/bash
Method 2: pull the image directly
# pull from dockerhub
docker pull jackdance/video_process:v0.2
# run Image
docker run -it \
--gpus "device=0" \
-v <host path>:<container path> \
-p 5002:5002 \
--privileged=True \
--name video_process \
video_process:v0.2 \
/bin/bash
The first picture shows a frame from the original video containing the "aws" characters; the second picture shows the corresponding processed frame.
- December 8, 2022: Merge image folder inference and video inference into one script.
- December 9, 2022: Add additional language detection and recognition models.
- December 12, 2022: 💃 Add Docker deployment.
- December 13, 2022: 🕺 Merge audio back into the processed video.
If you like this project, please give it a star (^.^)✨. If you have any questions, feel free to raise an issue~