PaddleMIX Inference Deployment

[中文文档]

PaddleMIX utilizes Paddle Inference and provides a Python-based deployment solution. There are two deployment methods:

  1. APPflow Deployment: by setting the static_mode = True variable in APPflow, you can enable static graph inference and optionally accelerate it with TensorRT. Note that not all models support static graph or TensorRT; please refer to the Multi Modal And Scenario section for specific model support.

  2. Single Model Deployment: export a single model to a static graph and run inference on it with a Python script (see Section 2).

1. APPflow Deployment

For APPflow usage, set the static_mode = True variable to enable static graph inference and optionally accelerate inference using TensorRT.

1.1 Examples

>>> from paddlemix.appflow import Appflow
>>> from PIL import Image

>>> task = Appflow(app="openset_det_sam",
                   models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
                   static_mode=True,
                   precision="fp32")
>>> image_pil = Image.open("beauty.png").convert("RGB")
>>> result = task(image=image_pil, prompt="women")
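
The call returns the app's outputs; the exact structure and keys depend on the app. A minimal, generic inspection sketch (nothing below is specific to PaddleMIX):

>>> # The result object bundles the app's outputs; its exact keys vary
>>> # by app, so inspect it before relying on a particular field.
>>> print(type(result))
>>> if isinstance(result, dict):
...     print(list(result.keys()))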

1.2 Parameter Explanation

| Parameter | Required? | Meaning |
| --- | --- | --- |
| --app | Yes | Application name |
| --models | Yes | Model(s) used; can be a single model or multiple models |
| --static_mode | Optional | Whether to use static graph inference; defaults to False |
| --precision | Optional | When static_mode == True, defaults to fp32; trt_fp32 or trt_fp16 can optionally be selected (see the example below) |

Instructions:

  • Some models do not support static graph or TensorRT. For specific information, please refer to Multi Modal And Scenario.

  • The generated static graph will be located in the folder corresponding to the model name, for example: GroundingDino/groundingdino-swint-ogc/.
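
For example, to request TensorRT FP16 acceleration for an app whose models support it, the Section 1.1 call can be adjusted as follows (a sketch only; TensorRT support varies by model, as noted above):

>>> # Same app as in 1.1, but requesting TensorRT FP16 inference.
>>> # Assumes both models support TensorRT; fall back to precision="fp32" otherwise.
>>> task = Appflow(app="openset_det_sam",
                   models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
                   static_mode=True,
                   precision="trt_fp16")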

2. Single Model Prediction Deployment

Python-based prediction deployment mainly involves two steps:

  • Exporting the predictive model
  • Performing prediction using Python

Currently supported models: see the model subdirectories under deploy/.

The following uses groundingdino as an example.

2.1 Exporting Predictive Model

cd deploy/groundingdino
# Export the groundingdino model
python export.py \
--dino_type GroundingDino/groundingdino-swint-ogc

The static graph will be exported to output_groundingdino/GroundingDino/groundingdino-swint-ogc/ (the --model_path used below), including model_state.pdiparams, model_state.pdiparams.info, model_state.pdmodel, and other files.
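
To confirm the export succeeded, you can check for the expected files (a minimal sketch; the directory matches the --model_path used in the predict command below):

>>> import os
>>> export_dir = "output_groundingdino/GroundingDino/groundingdino-swint-ogc"
>>> for name in ["model_state.pdiparams", "model_state.pdiparams.info", "model_state.pdmodel"]:
...     print(name, os.path.exists(os.path.join(export_dir, name)))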

2.2 Python-based Inference

 python predict.py  \
 --text_encoder_type GroundingDino/groundingdino-swint-ogc \
 --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
 --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
 --output_dir ./groundingdino_predict_output \
 --prompt "bus"

3. Benchmark

Note: test environment is Paddle 3.0, PaddleMIX release/2.0, PaddleNLP 2.7.2, on an A100 80G GPU.

3.1 Benchmark Command

In the corresponding model directory under deploy/, add --benchmark to the prediction command to obtain the model's running time. For example, the GroundingDino benchmark:

 cd deploy/groundingdino
 python predict.py  \
 --text_encoder_type GroundingDino/groundingdino-swint-ogc \
 --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
 --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
 --output_dir ./groundingdino_predict_output \
 --prompt "bus" \
 --benchmark True

| Model | Image size | Dtype | Paddle Deploy |
| --- | --- | --- | --- |
| qwen-vl-7b | 448*448 | fp16 | 669.8 ms |
| llava-1.5-7b | 336*336 | fp16 | 981.2 ms |
| llava-1.6-7b | 336*336 | fp16 | 778.7 ms |
| groundingDino/groundingdino-swint-ogc | 800*1193 | fp32 | 100 ms |
| Sam/SamVitH-1024 | 1024*1024 | fp32 | 121 ms |