Taking the keyword "aws" as an example, this project shows how to hide the "aws" logo and the word "aws" wherever they appear in an AWS meeting video. Either masking or blurring can be used.
This project uses masking. Specifically, the PaddleOCR library performs text recognition on every frame of the video; whenever "recognized string".lower() == "aws", the area corresponding to that string is blackened, i.e. set to (b, g, r) = (0, 0, 0).
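The masking step itself is plain array manipulation. A minimal sketch, assuming BGR frames as NumPy arrays; `mask_region` is a hypothetical helper name, not a function from this repo:

```python
import numpy as np

def mask_region(frame: np.ndarray, box) -> None:
    """Blacken the axis-aligned bounding box of a detected text region in place.

    `box` is a list of four (x, y) corner points, the quadrilateral layout
    PaddleOCR's detector returns; masked pixels become (b, g, r) = (0, 0, 0).
    """
    xs = [int(p[0]) for p in box]
    ys = [int(p[1]) for p in box]
    frame[min(ys):max(ys), min(xs):max(xs)] = 0

# Example: blacken a detected "aws" box on a white 100x100 frame
frame = np.full((100, 100, 3), 255, dtype=np.uint8)
mask_region(frame, [[10, 20], [60, 20], [60, 40], [10, 40]])
```

Blurring could be substituted by replacing the assignment with a blur of the same slice.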
The specific roadmap is as follows:
1.1 Create and activate conda environment
conda create -n video_process python=3.8 -y
conda activate video_process
1.2 Install dependencies
Install Paddle framework.
The CPU version of Paddle is installed here. You can install the version appropriate for your machine environment; refer to the link: PaddlePaddle Installation.
python -m pip install paddlepaddle==2.4.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
Clone the repo and install the required packages.
The repo can be downloaded from both GitHub and Gitee; users in mainland China may prefer cloning it from Gitee.
git clone "this repo address"
cd "this repo main directory"
pip install -r requirements.txt
Enter the main directory, create a pretrained_model folder, enter it, and download the two models for this task as described below.
2.1 Download English text detection and character recognition model
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar
tar xf en_PP-OCRv3_det_infer.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar
tar xf en_PP-OCRv3_rec_infer.tar
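After extraction, the pretrained_model folder should look roughly like the tree below (Paddle inference tarballs typically unpack to an inference.pdmodel / inference.pdiparams pair; exact file names may vary by release):

```
pretrained_model/
├── en_PP-OCRv3_det_infer/
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
└── en_PP-OCRv3_rec_infer/
    ├── inference.pdiparams
    ├── inference.pdiparams.info
    └── inference.pdmodel
```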
If you would like to download detection and recognition models for other languages, please refer to the PP-OCR series model list.
3.1 Perform end-to-end inference of image folders
The input is the image folder to be predicted, and the output is multiple predicted images. Visual recognition results are saved to the ./inference_results folder by default.
python3 tools/infer_keyword/infer_end_to_end.py \
--keyword="aws" \
--image_dir=/home/jackdance/Desktop/aws_video/some_frame \
--det_model_dir="./pretrained_model/en_PP-OCRv3_det_infer/" \
--rec_model_dir="./pretrained_model/en_PP-OCRv3_rec_infer/" \
--rec_char_dict_path="ppocr/utils/en_dict.txt" \
--use_mp=True \
--total_process_num=8
Parameter description:
- keyword: the keyword to be replaced or masked (only English keywords can be specified here; to use a Chinese keyword, you must download the Chinese text detection and recognition models and change the character-set path used for text recognition)
- image_dir: the input image folder
- video: the input video
- det_model_dir: the path to the text detection model
- rec_model_dir: the path to the text recognition model
- rec_char_dict_path: the path to the text recognition character set; ppocr/utils/en_dict.txt is for English only, character sets for other languages can be found in ppocr/utils
- use_mp: whether to enable multiprocessing
- total_process_num: number of processes when multiprocessing is enabled
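At its core, the keyword selection step is just a case-insensitive comparison of each recognized string against `--keyword`. A sketch with a hypothetical function name, assuming the per-image result layout PaddleOCR returns (a list of [box, (text, confidence)] entries):

```python
def boxes_to_mask(ocr_result, keyword):
    """Return the detection boxes whose recognized text equals the keyword.

    `ocr_result` follows PaddleOCR's per-image layout: a list of
    [box, (text, confidence)] entries, where `box` is four (x, y) corners.
    """
    return [box for box, (text, _score) in ocr_result
            if text.lower() == keyword.lower()]

# Example: only the exact keyword match is selected, not "awsome"
result = [
    [[[0, 0], [50, 0], [50, 20], [0, 20]], ("AWS", 0.98)],
    [[[0, 30], [80, 30], [80, 50], [0, 50]], ("awsome", 0.91)],
]
boxes = boxes_to_mask(result, "aws")  # contains only the "AWS" box
```

Each returned box would then be blackened on the corresponding image.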
3.2 Perform end-to-end inference of video
The input is a single video and the output is the processed video.
PS: a sample input video is available (extraction code: f93p).
python3 tools/infer_keyword/infer_end_to_end.py \
--keyword="aws" \
--video=/home/jackdance/Desktop/aws_video/aws_first_2mins.mp4 \
--det_model_dir="./pretrained_model/en_PP-OCRv3_det_infer/" \
--rec_model_dir="./pretrained_model/en_PP-OCRv3_rec_infer/" \
--rec_char_dict_path="ppocr/utils/en_dict.txt" \
--use_mp=True \
--total_process_num=8
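Conceptually, the video path is the image path applied frame by frame: decode, OCR and mask, re-encode. A library-agnostic sketch; in practice OpenCV's `cv2.VideoCapture` and `cv2.VideoWriter` would supply and consume the frames, and `process_frames` / `masker` are hypothetical names for illustration:

```python
from typing import Callable, Iterable, Iterator
import numpy as np

def process_frames(frames: Iterable[np.ndarray],
                   masker: Callable[[np.ndarray], np.ndarray]) -> Iterator[np.ndarray]:
    """Apply the OCR-and-mask step to every frame of a video stream.

    `frames` would come from cv2.VideoCapture; the yielded frames would be
    written back with cv2.VideoWriter at the source fps and resolution.
    """
    for frame in frames:
        yield masker(frame)

# Example with a stand-in masker that blackens a fixed region
def fake_masker(frame):
    frame = frame.copy()
    frame[2:4, 2:4] = 0  # pretend OCR found the keyword here
    return frame

frames = [np.full((8, 8, 3), 255, dtype=np.uint8) for _ in range(3)]
processed = list(process_frames(frames, fake_masker))
```

In the real script, OCR would run inside `masker`, and the audio track of the source video would be merged back into the output afterwards.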
Use the Dockerfile to build the image, or pull it directly.
Method 1: build with the Dockerfile
# build dockerfile
docker build -t video_process:v0.2 .
# run Image
docker run -it \
--gpus "device=0" \
-v <host path>:<container path> \
-p 5002:5002 \
--privileged=True \
--name video_process \
video_process:v0.2 \
/bin/bash
Method 2: pull the image directly
# pull from dockerhub
docker pull jackdance/video_process:v0.2
# run Image
docker run -it \
--gpus "device=0" \
-v <host path>:<container path> \
-p 5002:5002 \
--privileged=True \
--name video_process \
video_process:v0.2 \
/bin/bash
The first picture shows a frame from the original video containing the "aws" characters; the second picture shows the corresponding processed frame.
- December 8, 2022: Merge image folder inference and video inference into one script.
- December 9, 2022: Add additional language detection and recognition models.
- December 12, 2022: 💃 Add Docker deployment.
- December 13, 2022: 🕺 Merge audio back into the processed video.
If you like this project, please give it a star (^.^)✨. If you have any questions, feel free to raise an issue~