PaddleMIX Inference Deployment

[中文文档]

PaddleMIX utilizes Paddle Inference and provides a Python-based deployment solution. There are two deployment methods:

  1. APPflow Deployment: by setting the static_mode = True variable in APPflow, you can enable static graph inference and optionally accelerate it with TensorRT. Note that not all models support static graph or TensorRT; please refer to the Multi Modal And Scenario section for specific model support.

  2. Single Model Deployment: export a single model to a static graph and run inference on it with a Python script (see Section 2).

1. APPflow Deployment

For APPflow usage, set the static_mode = True variable to enable static graph inference and optionally accelerate inference using TensorRT.

1.1 Examples

>>> from paddlemix.appflow import Appflow
>>> from PIL import Image

>>> task = Appflow(app="openset_det_sam",
                   models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
                   static_mode=True,
                   precision="fp32")
>>> image_pil = Image.open("beauty.png").convert("RGB")
>>> result = task(image=image_pil, prompt="women")
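
The call returns the app's outputs; the exact structure and keys depend on the app. A minimal, generic inspection sketch (nothing below is specific to PaddleMIX):

>>> # The result object bundles the app's outputs; its exact keys vary
>>> # by app, so inspect it before relying on a particular field.
>>> print(type(result))
>>> if isinstance(result, dict):
...     print(list(result.keys()))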

1.2 Parameter Explanation

| Parameter | Required? | Meaning |
| --- | --- | --- |
| --app | Yes | Application name |
| --models | Yes | Model(s) used; can be a single model or multiple models |
| --static_mode | Optional | Whether to use static graph inference; defaults to False |
| --precision | Optional | When static_mode == True, defaults to fp32; trt_fp32 or trt_fp16 can optionally be selected (see the example below) |

Instructions:

  • Some models do not support static graph or TensorRT. For specific information, please refer to Multi Modal And Scenario.

  • The generated static graph will be located in the folder corresponding to the model name, for example: GroundingDino/groundingdino-swint-ogc/.
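
For example, to request TensorRT FP16 acceleration for an app whose models support it, the Section 1.1 call can be adjusted as follows (a sketch only; TensorRT support varies by model, as noted above):

>>> # Same app as in 1.1, but requesting TensorRT FP16 inference.
>>> # Assumes both models support TensorRT; fall back to precision="fp32" otherwise.
>>> task = Appflow(app="openset_det_sam",
                   models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
                   static_mode=True,
                   precision="trt_fp16")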

2. Single Model Prediction Deployment

Python-based prediction deployment mainly involves two steps:

  • Exporting the predictive model
  • Performing prediction using Python

Currently supported models: see the model subdirectories under deploy/.

The following uses groundingdino as an example.

2.1 Exporting Predictive Model

cd deploy/groundingdino
# Export the groundingdino model
python export.py \
--dino_type GroundingDino/groundingdino-swint-ogc

The static graph will be exported to output_groundingdino/GroundingDino/groundingdino-swint-ogc/ (the --model_path used below), including model_state.pdiparams, model_state.pdiparams.info, model_state.pdmodel, and other files.
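
To confirm the export succeeded, you can check for the expected files (a minimal sketch; the directory matches the --model_path used in the predict command below):

>>> import os
>>> export_dir = "output_groundingdino/GroundingDino/groundingdino-swint-ogc"
>>> for name in ["model_state.pdiparams", "model_state.pdiparams.info", "model_state.pdmodel"]:
...     print(name, os.path.exists(os.path.join(export_dir, name)))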

2.2 Python-based Inference

 python predict.py  \
 --text_encoder_type GroundingDino/groundingdino-swint-ogc \
 --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
 --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
 --output_dir ./groundingdino_predict_output \
 --prompt "bus"

3. Benchmark

Note: test environment is Paddle 3.0, PaddleMIX release/2.0, PaddleNLP 2.7.2, on an A100 80G GPU.

3.1 Benchmark Command

In the corresponding model directory under deploy/, add --benchmark to the prediction command to obtain the model's running time. For example, the GroundingDino benchmark:

 cd deploy/groundingdino
 python predict.py  \
 --text_encoder_type GroundingDino/groundingdino-swint-ogc \
 --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
 --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
 --output_dir ./groundingdino_predict_output \
 --prompt "bus" \
 --benchmark True

| Model | Image size | Dtype | Paddle Deploy |
| --- | --- | --- | --- |
| qwen-vl-7b | 448*448 | fp16 | 669.8 ms |
| llava-1.5-7b | 336*336 | fp16 | 981.2 ms |
| llava-1.6-7b | 336*336 | fp16 | 778.7 ms |
| groundingDino/groundingdino-swint-ogc | 800*1193 | fp32 | 100 ms |
| Sam/SamVitH-1024 | 1024*1024 | fp32 | 121 ms |