[中文文档 (Chinese documentation)]
PaddleMIX utilizes Paddle Inference and provides a Python-based deployment solution. There are two deployment methods:
- **APPflow Deployment:** by setting the `static_mode = True` variable in APPflow, you can enable static graph inference. Additionally, you can accelerate inference using TensorRT. Note that not all models support static graph or TensorRT; please refer to the Multi Modal And Scenario section for specific model support.
- **Single Model Deployment:** export a single model to a static graph and run prediction from Python, as described in the walkthrough below.

For APPflow usage, you can set the `static_mode = True` variable to enable static graph inference and optionally accelerate inference using TensorRT:
```python
>>> from paddlemix.appflow import Appflow
>>> from PIL import Image

>>> task = Appflow(app="openset_det_sam",
                   models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
                   static_mode=True,
                   precision="fp32")
>>> image_pil = Image.open("beauty.png").convert("RGB")
>>> result = task(image=image_pil, prompt="women")
```
| Parameter | Required? | Meaning |
|---|---|---|
| `--app` | Yes | Application name |
| `--models` | Yes | Model(s) used; can be a single model or multiple models |
| `--static_mode` | Optional | Whether to use static graph inference; defaults to `False` |
| `--precision` | Optional | When `static_mode == True`, defaults to FP32; you can optionally select `trt_fp32` or `trt_fp16` |
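For models that support TensorRT (see the instructions below), the same pipeline can be built with one of the TensorRT precisions from the table instead of the FP32 default. A minimal sketch, reusing the app and models from the example above:

```python
# Hedged sketch: same pipeline as above, but requesting TensorRT FP16.
# `trt_fp16` is one of the documented `precision` options and only works
# for models that support TensorRT acceleration.
from paddlemix.appflow import Appflow

task_trt = Appflow(app="openset_det_sam",
                   models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
                   static_mode=True,
                   precision="trt_fp16")
```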
Instructions:

- Some models do not support static graph or TensorRT. For specific information, please refer to Multi Modal And Scenario.
- The generated static graph will be located in the folder corresponding to the model name, for example: `GroundingDino/groundingdino-swint-ogc/`.
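To confirm where the static graph landed after a `static_mode=True` run, you can simply list that folder (path taken from the example above):

```bash
# The exported static graph files live under the folder named after the model.
ls GroundingDino/groundingdino-swint-ogc/
```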
Python-based prediction deployment mainly involves two steps:
- exporting the inference model
- running prediction with Python
Currently supported models: see the benchmark table at the end of this document.
Using groundingdino as an example:

```bash
cd deploy/groundingdino

# export the groundingdino model
python export.py \
    --dino_type GroundingDino/groundingdino-swint-ogc
```
The model will be exported to the output directory (for this example, `output_groundingdino/GroundingDino/groundingdino-swint-ogc/`), including `model_state.pdiparams`, `model_state.pdiparams.info`, `model_state.pdmodel`, and other files.
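For reference, the exported static graph can also be loaded directly with the Paddle Inference Python API. A minimal sketch under the file layout listed above; the `predict.py` script below wraps this loading step together with the model's pre- and post-processing:

```python
# Minimal sketch: load the exported static graph with Paddle Inference.
# Paths follow the export example above; predict.py handles the full
# pipeline, so this only shows the raw loading step.
import paddle.inference as paddle_infer

model_dir = "output_groundingdino/GroundingDino/groundingdino-swint-ogc"
config = paddle_infer.Config(
    f"{model_dir}/model_state.pdmodel",    # static graph program
    f"{model_dir}/model_state.pdiparams",  # trained weights
)
config.enable_use_gpu(100, 0)  # 100 MB initial GPU memory pool, device 0
predictor = paddle_infer.create_predictor(config)
print(predictor.get_input_names())
```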
```bash
python predict.py \
    --text_encoder_type GroundingDino/groundingdino-swint-ogc \
    --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
    --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
    --output_dir ./groundingdino_predict_output \
    --prompt "bus"
```
Note: benchmark environment is Paddle 3.0, PaddleMIX release/2.0, PaddleNLP 2.7.2, running on an A100 80G.

Add `--benchmark` to the command, run from the corresponding model directory under `deploy`, to obtain the model's running time. Example: GroundingDino benchmark:
```bash
cd deploy/groundingdino

python predict.py \
    --text_encoder_type GroundingDino/groundingdino-swint-ogc \
    --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
    --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
    --output_dir ./groundingdino_predict_output \
    --prompt "bus" \
    --benchmark True
```
| Model | Image size | Dtype | Paddle Deploy |
|---|---|---|---|
| qwen-vl-7b | 448*448 | fp16 | 669.8 ms |
| llava-1.5-7b | 336*336 | fp16 | 981.2 ms |
| llava-1.6-7b | 336*336 | fp16 | 778.7 ms |
| GroundingDino/groundingdino-swint-ogc | 800*1193 | fp32 | 100 ms |
| Sam/SamVitH-1024 | 1024*1024 | fp32 | 121 ms |
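If you want to reproduce this kind of latency number for an APPflow pipeline by hand (rather than via `--benchmark`), a minimal sketch, assuming the `task` and `image_pil` objects from the APPflow example above:

```python
# Hand-rolled latency measurement for an Appflow pipeline.
# Assumes `task` and `image_pil` from the APPflow example above.
import time

for _ in range(3):                       # warmup: exclude one-time setup cost
    task(image=image_pil, prompt="bus")

n_runs = 10
start = time.perf_counter()
for _ in range(n_runs):
    task(image=image_pil, prompt="bus")
avg_ms = (time.perf_counter() - start) / n_runs * 1000
print(f"average latency: {avg_ms:.1f} ms")
```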