PROJECT NOT UNDER ACTIVE MANAGEMENT
This project will no longer be maintained by Intel.
Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
Intel no longer accepts patches to this project.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
Contact: webadmin@linux.intel.com
Customers across various industries expect quick and accurate responses to their queries. Artificial Intelligence (AI)-Powered Customer Care Chatbots aim to provide this, but building efficient chatbots that can understand user intent and entities in real-time queries is challenging.
This workflow demonstrates how to construct an AI-Powered Customer Care Chatbot using Intel's oneAPI AI Analytics Toolkit to predict user intent and entities in queries. By leveraging Intel's hardware and optimized software, it accelerates the performance of the chatbot. This results in faster and more accurate responses, leading to improved customer satisfaction and more efficient customer support operations.
Check out more workflow examples in the Developer Catalog.
This workflow provides a high-level technical overview of building an AI-Powered Customer Care Chatbot using Intel® oneAPI AI Analytics Toolkit. Developers can understand why this workflow is relevant, its benefits, and what they will learn by trying it:
- Relevance to Developers:
  - This workflow is essential for Natural Language Processing (NLP) and chatbot developers.
  - Developers interested in harnessing Intel's hardware acceleration, especially Intel® Extension for PyTorch*, will find it valuable.
- Chosen Workflow:
  - The workflow covers the complete chatbot lifecycle, from training to real-time prediction.
  - It emphasizes integrating Intel's technologies for optimized Machine Learning (ML).
- What Developers Will Learn:
  - Setting up an optimized environment for Intel®-accelerated ML.
  - Training NLP chatbots for intent classification and named entity recognition.
  - Leveraging Intel's hardware acceleration for efficient model training and inference.
  - Constructing chatbots that deliver fast and precise responses to customer queries.
  - Hands-on experience with Intel® oneAPI AI Analytics Toolkit and PyTorch*.
This workflow equips developers with the knowledge and tools to create high-performance AI-Powered Customer Care Chatbots, enhancing customer service across various industries.
For more details, visit the AI-Powered Customer Care Chatbots GitHub repository.
In this section, we describe the code base and how to replicate the results. The included code demonstrates a complete framework for
- Setting up a virtual environment for Intel®-accelerated ML
- Training an NLP AI-Powered Customer Care Chatbot for intent classification and named entity recognition using PyTorch*/Intel® Extension for PyTorch*
- Predicting from the trained model on new data using PyTorch*/Intel® Extension for PyTorch*
The Intel® Extension for PyTorch* extends PyTorch* with optimizations for an extra performance boost on Intel® hardware. Most of the optimizations will eventually be included in stock PyTorch* releases; the intention of the extension is to deliver up-to-date features and optimizations for PyTorch* on Intel® hardware. Examples include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).
Intel® Neural Compressor (INC) is an open-source Python* library designed to help you quickly deploy low-precision inference solutions on popular deep-learning frameworks such as TensorFlow*, PyTorch* , MXNet*, and ONNX* (Open Neural Network Exchange) runtime. The tool automatically optimizes low-precision recipes for deep-learning models to achieve optimal product objectives, such as inference performance and memory usage, with expected accuracy criteria.
There are workflow-specific hardware and software setup requirements depending on how the workflow is run.
Recommended Hardware | Precision |
---|---|
CPU: Intel® 2nd Gen Xeon® Platinum 8280 CPU @ 2.70GHz or higher | FP32, INT8 |
RAM: 187 GB | |
Recommended Free Disk Space: 20 GB or more | |
- RAM: 16 GB total memory
- CPUs: 4
- Storage: 20GB
- Operating system: Ubuntu* 22.04 LTS
Intel® oneAPI is used to accelerate results for critical low-latency applications. It provides the capability to reuse the code present in different languages so that hardware utilization is optimized to provide these results.
To reproduce the results in this repository, we describe the following tasks
- How to create an execution environment which utilizes Intel® versions of libraries
- How to run the code to benchmark model training
- How to run the code to benchmark model inference
- How to quantize trained models using INC
- How to benchmark concurrency
Start by defining an environment variable that stores the workspace path. This can be an existing directory or one that is created in later steps. The variable is used by all commands that are executed with absolute paths.
export WORKSPACE=$PWD/customer-chatbot
Set the following environment variables:
export DATA_DIR=$WORKSPACE/data
export OUTPUT_DIR=$WORKSPACE/output
export CONFIG_DIR=$WORKSPACE/config
Create a working directory for the workflow and clone the main repository into it.
mkdir -p $WORKSPACE && cd $WORKSPACE
git clone https://github.com/oneapi-src/customer-chatbot.git $WORKSPACE
Create the following directories:
mkdir -p $OUTPUT_DIR/saved_models/ $DATA_DIR/atis-2/ $OUTPUT_DIR/logs
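The directory layout above can also be expressed in Python, which may be handy inside a notebook. This is a minimal sketch, not part of the repository: the function name `make_workspace_layout` is hypothetical, and it only mirrors the `export` and `mkdir -p` commands shown above. Like the shell commands, it does not create the config directory, which comes from the cloned repository.

```python
import tempfile
from pathlib import Path


def make_workspace_layout(workspace: Path) -> dict:
    """Create the data/output folders used in this guide and return the base paths.

    Hypothetical helper: mirrors DATA_DIR, OUTPUT_DIR and CONFIG_DIR from the
    export commands and the sub-directories from the mkdir -p command.
    """
    paths = {
        "DATA_DIR": workspace / "data",
        "OUTPUT_DIR": workspace / "output",
        "CONFIG_DIR": workspace / "config",
    }
    for sub in (
        paths["OUTPUT_DIR"] / "saved_models",
        paths["DATA_DIR"] / "atis-2",
        paths["OUTPUT_DIR"] / "logs",
    ):
        # parents=True / exist_ok=True gives the same behavior as `mkdir -p`.
        sub.mkdir(parents=True, exist_ok=True)
    return paths


# Demo in a throwaway directory so nothing is written to your real workspace.
with tempfile.TemporaryDirectory() as tmp:
    for name, path in make_workspace_layout(Path(tmp)).items():
        print(f"{name}={path}")
```

To use it for real, pass the path stored in `WORKSPACE` after the repository has been cloned.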
- Download the appropriate Miniconda installer for Linux:
  wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- In your terminal window, run:
  bash Miniconda3-latest-Linux-x86_64.sh
- Delete the downloaded file:
  rm Miniconda3-latest-Linux-x86_64.sh
To learn more about Conda* installation, see the Conda* Linux installation instructions.
Before creating the environments, if you don't already have Anaconda*, install and set up Anaconda* for Linux.
Install and set the libmamba solver as default solver. Run the following commands:
# To set libmamba as conda's default solver for the base
# environment, run the following two lines; otherwise, skip
# them. Newer versions of Anaconda ship with libmamba already
# installed, and it was announced as the default solver
# starting in September 2023.
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
The $WORKSPACE/env/intel_env.yml file contains all the dependencies needed to create the Intel environment necessary for running the workflow.
Execute the next command to create the Conda* environment.
conda env create -f $WORKSPACE/env/intel_env.yml
conda activate customer_chatbot_intel
Environment setup is required only once. This step does not clean up an existing environment with the same name, so make sure that no Conda* environment named customer_chatbot_intel already exists. During this setup, the customer_chatbot_intel Conda* environment is created with the dependencies listed in the YAML configuration.
To run the concurrency benchmarks, additional dependencies must be installed.
Apache* Utils is needed:
sudo apt-get install apache2-utils git
Model Archiver will be used to produce .mar files (these files can then be redistributed and served by anyone using TorchServe*):
python -m pip install torch-model-archiver captum
You then need to clone the TorchServe* repo:
export TORCH_SERVE_DIR=$WORKSPACE/src/concurrency_benchmarking/serve
git clone https://github.com/pytorch/serve.git --branch v0.9.0 $TORCH_SERVE_DIR
Once the repo has been cloned follow the next steps or follow the steps described at Quick start with TorchServe*:
cd $TORCH_SERVE_DIR
python ./ts_scripts/install_dependencies.py
python -m pip install torch==2.1.1 torchserve==0.9.0 torch-model-archiver==0.9.0 torch-workflow-archiver==0.2.11 click-config-file==0.6.0
After installing TorchServe*, Apache* Bench is needed in order to run the benchmarks. Follow the next instructions to install pip dependencies:
cd $TORCH_SERVE_DIR/benchmarks/
python -m pip install -r requirements-ab.txt
The dataset used for this demo is the commonly used Airline Travel Information Systems (ATIS) dataset, which consists of ~5000 utterances of customer requests for flight related details. Each of these utterances is annotated with the intent of the query and the entities involved within the query. For example, the phrase
I want to fly from Baltimore to Dallas round trip.
would be classified with the intent of atis_flight, corresponding to a flight reservation, and the entities would be Baltimore (fromloc.city_name), Dallas (toloc.city_name), and round_trip (round_trip).
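To make the annotation scheme in the example above concrete, here is a small parsing sketch. It assumes each line of the data files looks like `token:TAG token:TAG ... <=> intent` with BIO-style slot tags; that layout, the helper name `parse_atis_line`, and the sample line (written by hand for illustration, not copied from the dataset) are all assumptions, so check them against the files you download.

```python
from typing import List, Tuple


def parse_atis_line(line: str) -> Tuple[List[str], List[str], str]:
    """Split one annotated utterance into tokens, slot tags, and intent.

    ASSUMED format: "token:TAG token:TAG ... <=> intent".
    """
    tagged_part, intent = line.rsplit("<=>", 1)
    tokens, tags = [], []
    for item in tagged_part.split():
        # Split on the last colon so tokens that contain ':' stay intact.
        token, tag = item.rsplit(":", 1)
        tokens.append(token)
        tags.append(tag)
    return tokens, tags, intent.strip()


# Hand-written illustration of the example sentence from the text.
example = (
    "i:O want:O to:O fly:O from:O baltimore:B-fromloc.city_name "
    "to:O dallas:B-toloc.city_name round:B-round_trip trip:I-round_trip "
    "<=> atis_flight"
)
tokens, tags, intent = parse_atis_line(example)
# Keep only the tokens that carry an entity tag ("O" means "outside any entity").
entities = [(tok, tag) for tok, tag in zip(tokens, tags) if tag != "O"]
print(intent)
print(entities)
```

The intent is a single label per utterance, while the slot tags form one label per token, which is why the model described later is trained jointly on a classification task and a token-level entity recognition task.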
Preprocessing code and data for this repository were originally sourced from https://github.com/sz128/slot_filling_and_intent_detection_of_SLU/tree/master/data/atis-2.
Please see this data set's applicable license for terms and conditions. Intel does not own the rights to this data set and does not confer any rights to it.
The benchmarking scripts expect all of the data files to be present in the data/atis-2/ directory.
Create the atis-2/ directory if it is not present in $DATA_DIR:
mkdir -p $DATA_DIR/atis-2/
To set up the data for benchmarking under these requirements, do the following:
- Download all of the files from https://github.com/sz128/slot_filling_and_intent_detection_of_SLU/tree/master/data/atis-2 and save them into the atis-2 directory.
cd $DATA_DIR/atis-2/
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/train
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/test
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/valid
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/vocab.intent
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/vocab.slot
- Combine the atis-2/train and atis-2/valid files into one called atis-2/train_all. In Linux, this can be done from the current directory using:
cat train valid > train_all
cd $WORKSPACE
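If `cat` is not available, or you prefer to do the merge from a notebook, the same concatenation can be sketched in Python. The helper name `concat_files` is hypothetical; the demo runs on throwaway files so it does not touch your dataset.

```python
import shutil
import tempfile
from pathlib import Path


def concat_files(sources, destination):
    """Write each source file to destination, in order (like `cat a b > out`)."""
    with open(destination, "wb") as out:
        for src in sources:
            with open(src, "rb") as handle:
                # Stream the bytes so large files are not loaded into memory.
                shutil.copyfileobj(handle, out)


# Self-contained demo with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "train").write_text("a\nb\n")
    (tmp / "valid").write_text("c\n")
    concat_files([tmp / "train", tmp / "valid"], tmp / "train_all")
    print((tmp / "train_all").read_text().splitlines())
```

For the real data, the call would take the `train` and `valid` files in the atis-2 directory as sources and `train_all` in the same directory as the destination.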
You can execute the reference pipelines using the following environments:
- Bare Metal
- Jupyter Notebook
Follow these instructions to set up and run this workflow on your own development system.
Our examples use the conda package and environment manager on your local computer. If you don't already have conda installed, go to Set Up Conda* or see the Conda* Linux installation instructions.
To run the benchmarks on a selected configuration, the corresponding environment needs to be set up and activated. For example, to benchmark the model training with Intel® oneAPI technologies, the environment customer_chatbot_intel should be activated using:
conda activate customer_chatbot_intel
Benchmarking for training can be done using the Python script run_training.py.
The script reads and preprocesses the data, trains a joint classification and entity recognition model, and predicts on unseen test data using the trained model, while also reporting on the execution time for these 3 steps. Optionally, the script can also save the trained model weights, which is necessary to run the inference benchmarks.
The run_training.py script takes the following arguments:
usage: run_training.py [-h] [-l LOGFILE] [-s SAVE_MODEL_DIR] -d DATASET_DIR [--save_onnx]
optional arguments:
-h, --help show this help message and exit
-l LOGFILE, --logfile LOGFILE
log file to output benchmarking results to
-s SAVE_MODEL_DIR, --save_model_dir SAVE_MODEL_DIR
directory to save model under
-d DATASET_DIR, --dataset_dir DATASET_DIR
directory to dataset
--save_onnx also export an ONNX model
Execute the run_training.py script as follows:
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_training.py --logfile $OUTPUT_DIR/logs/intel_train.log -s $OUTPUT_DIR/saved_models/intel -d $DATA_DIR/atis-2/
The saved model weights are independent of the technology used. The model is trained using a Bidirectional Encoder Representations from Transformers (BERT) pretrained model with sequence_length = 64, batch_size = 20, epochs = 3. These can be changed within the script.
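As a rough back-of-the-envelope sketch (not taken from run_training.py), the hyperparameters above translate into the step and token counts below. The dataset size of 5000 is a placeholder based on the "~5000 utterances" figure quoted earlier, and the sketch assumes one optimizer step per batch with the last partial batch kept; both are assumptions.

```python
import math


def training_steps(num_examples: int, batch_size: int = 20, epochs: int = 3) -> int:
    """Optimizer steps for a full run, assuming one step per batch."""
    return math.ceil(num_examples / batch_size) * epochs


def tokens_per_batch(batch_size: int = 20, sequence_length: int = 64) -> int:
    """Padded token positions processed in one batch."""
    return batch_size * sequence_length


# 5000 is an assumed, approximate training-set size.
print(training_steps(5000))
print(tokens_per_batch())
```

Estimates like these help when deciding whether a change to the batch size or sequence length is likely to dominate the training time reported in the log.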
Note: Intel® Extension for PyTorch* contains many environment specific configuration parameters which can be set using the included CPU launcher tool. Further details for this can be found at https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/performance_tuning/launch_script.html. While the above command sets many parameters automatically, for our specific environment (D4v5), we benchmark with the following command.
OMP_NUM_THREADS=4 KMP_BLOCKTIME=50 python -m intel_extension_for_pytorch.cpu.launch --disable_numactl $WORKSPACE/src/run_training.py --logfile $OUTPUT_DIR/logs/intel_train.log -s $OUTPUT_DIR/saved_models/intel -d $DATA_DIR/atis-2/
Benchmarking for inference for PyTorch* (.pt) models can be done using the Python script run_inference.py, which runs inference benchmarks using models optimized by Intel® Extension for PyTorch*.
The run_inference.py script takes the following arguments:
usage: run_inference.py [-h] -s SAVED_MODEL_DIR [--is_jit] [--is_inc_int8] [-b BATCH_SIZE] -d
DATASET_DIR [-l LENGTH] [--logfile LOGFILE] [-n N_RUNS]
optional arguments:
-h, --help show this help message and exit
-s SAVED_MODEL_DIR, --saved_model_dir SAVED_MODEL_DIR
directory of saved model to benchmark.
--is_jit if the model is torchscript. defaults to False.
--is_inc_int8 saved model dir is a quantized int8 model. defaults to False.
-b BATCH_SIZE, --batch_size BATCH_SIZE
batch size to use. defaults to 200.
-d DATASET_DIR, --dataset_dir DATASET_DIR
directory to dataset
-l LENGTH, --length LENGTH
sequence length to use. defaults to 512.
--logfile LOGFILE logfile to use.
-n N_RUNS, --n_runs N_RUNS
number of trials to test. defaults to 100.
Because the parameters of attention-based models do not depend on the input sequence length, we can test on different sequence lengths without introducing new parameters. The script runs n times and prints the average time taken to call predict on a batch of size b with sequence length l.
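The measurement described above (call predict n times on one batch and report the mean) can be sketched as follows. This is an illustration of the idea, not the code in run_inference.py: the function name `average_batch_time`, the warm-up call, and the stand-in `dummy_predict` are assumptions introduced for the example.

```python
import statistics
import time
from typing import Callable, Sequence


def average_batch_time(
    predict: Callable[[Sequence], object],
    batch: Sequence,
    n_runs: int,
    warmup: int = 1,
) -> float:
    """Return the mean wall-clock seconds of n_runs calls to predict(batch)."""
    # Warm-up calls are an assumption of this sketch; they keep one-time
    # setup costs (caches, lazy initialization) out of the measurement.
    for _ in range(warmup):
        predict(batch)
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict(batch)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def dummy_predict(batch):
    """Stand-in for a model call: counts the words in each query."""
    return [len(text.split()) for text in batch]


batch = ["i want to fly from baltimore to dallas"] * 200  # batch size b = 200
avg = average_batch_time(dummy_predict, batch, n_runs=5)
print(f"Avg time per batch : {avg:.6f} s")
```

Using `time.perf_counter` rather than `time.time` is a deliberate choice here, since it is a monotonic clock intended for measuring short intervals.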
To run benchmarks on the oneAPI PyTorch* execution engine, use:
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel --batch_size 200 --length 512 --n_runs 5 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
Note: Intel® Extension for PyTorch* contains many environment specific configuration parameters which can be set using the included CPU launcher tool. Further details for this can be found at https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/performance_tuning/launch_script.html. While the above command sets many parameters automatically, for our specific environment (D4v5), we benchmark with the following command.
OMP_NUM_THREADS=4 KMP_BLOCKTIME=50 python -m intel_extension_for_pytorch.cpu.launch --disable_numactl $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel --batch_size 200 --length 512 --n_runs 5 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
OMP_NUM_THREADS=4 KMP_BLOCKTIME=50 python -m intel_extension_for_pytorch.cpu.launch --disable_numactl $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel --batch_size 1 --length 512 --n_runs 1000 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
Quantization is the practice of converting the FP32 weights in deep neural networks to a lower precision, such as INT8 in order to accelerate computation time and reduce storage space of trained models. This may be useful if latency and throughput are critical. Intel® offers multiple algorithms and packages for quantizing trained models. In this repo, we include scripts to quantize the AI Chatbot model using Intel® Neural Compressor.
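To illustrate what converting FP32 weights to INT8 means, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is a generic textbook scheme chosen for clarity, not the algorithm Intel® Neural Compressor applies; the function names and the sample weights are made up for the example.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w ~= scale * q, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate FP32 values from the INT8 codes."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.003, 0.5]  # illustrative values only
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale)
print(f"max round-trip error: {max_error:.6f}")
```

Each weight is stored in one byte instead of four, and the round-trip error of any single weight is bounded by half the scale. Small weights such as 0.003 can collapse to zero, which is one reason accuracy has to be checked after quantization.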
A trained model from the run_training.py script above can be quantized using Intel® Neural Compressor through the run_quantize_inc.py script. This converts the model from FP32 to INT8 while trying to maintain a level of accuracy specified via a config.yaml file. A simple config.yaml has been provided for basic accuracy-aware quantization, though several further options exist and can be explored in the Intel® Neural Compressor documentation.
usage: run_quantize_inc.py [-h] -s SAVED_MODEL -o OUTPUT_DIR [-l LENGTH] [-q QUANT_SAMPLES] -c INC_CONFIG -d DATASET_DIR
optional arguments:
-h, --help show this help message and exit
-s SAVED_MODEL, --saved_model SAVED_MODEL
saved pytorch (.pt) model to quantize.
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
directory to save quantized model to.
-l LENGTH, --length LENGTH
sequence length to use. defaults to 512.
-q QUANT_SAMPLES, --quant_samples QUANT_SAMPLES
number of samples to use for quantization. defaults to 100.
-c INC_CONFIG, --inc_config INC_CONFIG
INC conf yaml.
-d DATASET_DIR, --dataset_dir DATASET_DIR
directory to dataset
A workflow of "training -> INC quantization -> inference" benchmarking may look like
# run training, outputs as $OUTPUT_DIR/saved_models/intel/convai.pt
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_training.py -s $OUTPUT_DIR/saved_models/intel --logfile $OUTPUT_DIR/logs/intel_train.log -d $DATA_DIR/atis-2/
# quantize the trained model, outputs into the $OUTPUT_DIR/saved_models/intel_int8/best_model.pt directory
python $WORKSPACE/src/run_quantize_inc.py -s $OUTPUT_DIR/saved_models/intel/convai.pt -o $OUTPUT_DIR/saved_models/intel_int8/ -c $CONFIG_DIR/config.yml -d $DATA_DIR/atis-2/
# benchmark the non-quantized model using intel
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel/ -b 1 -n 1000 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
# benchmark the quantized model using intel
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel_int8/ -b 1 -n 1000 --is_inc_int8 --logfile $OUTPUT_DIR/logs/intel_bench_quant.log -d $DATA_DIR/atis-2/
A critical aspect of good AI Chatbots is their ability to quickly respond to multiple independent customer queries. From a technical perspective, this is a question of how well these models can be run to handle concurrency on a single server.
In order to benchmark this, we need to do the following
- Package trained/optimized models using torch-model-archiver
- Deploy a trained model to use TorchServe*
- Run the TorchServe* benchmarks using apache bench
- Collect the reports of the TorchServe* benchmark
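Before setting up the real benchmark, it can help to see what a concurrency test measures. The sketch below fires a fixed number of requests at a handler with a fixed number of concurrent workers and reports throughput and latency. It is a toy stand-in for the idea, not a replacement for Apache* Bench or the TorchServe* tooling: `run_load_test`, `stub_handler`, and the 5 ms simulated latency are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_load_test(
    handler: Callable[[str], object],
    payload: str,
    requests: int,
    concurrency: int,
) -> Dict[str, float]:
    """Send `requests` calls to handler using `concurrency` worker threads."""

    def timed_call(_index: int) -> float:
        start = time.perf_counter()
        handler(payload)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(requests)))
    wall = time.perf_counter() - wall_start
    return {
        "requests": requests,
        "throughput_rps": requests / wall,
        "latency_mean_s": sum(latencies) / len(latencies),
        "latency_p50_s": latencies[len(latencies) // 2],
        "latency_max_s": latencies[-1],
    }


def stub_handler(query: str) -> str:
    """Pretend model server: sleeps briefly, then returns a fixed intent."""
    time.sleep(0.005)
    return "atis_flight"


report = run_load_test(
    stub_handler, "flights from baltimore to dallas", requests=100, concurrency=10
)
for key, value in report.items():
    print(f"{key}: {value}")
```

With a real server, throughput stops growing once the workers are saturated while latency keeps rising, which is the trade-off the benchmark reports are meant to expose.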
To use the trained models in TorchServe*, they first need to be converted to a TorchScript* model. To do this, use the convert_jit.py script:
usage: convert_jit.py [-h] -s SAVED_MODEL_DIR -o OUTPUT_MODEL [--is_inc_int8]
optional arguments:
-h, --help show this help message and exit
-s SAVED_MODEL_DIR, --saved_model_dir SAVED_MODEL_DIR
directory of saved model to benchmark.
-o OUTPUT_MODEL, --output_model OUTPUT_MODEL
saved torchscript (.pt) model
-d DATASET_DIR, --dataset_dir DATASET_DIR
directory to dataset
--is_inc_int8 saved model dir is a quantized int8 model. defaults to False.
If the model is not quantized using INC, and assuming the saved model is in the $OUTPUT_DIR/saved_models/intel directory, run:
python $WORKSPACE/src/convert_jit.py -s $OUTPUT_DIR/saved_models/intel -o $OUTPUT_DIR/saved_models/intel/convai_jit.pt -d $DATA_DIR/atis-2/
This converts the saved model into a TorchScript* model called convai_jit.pt.
If the model is quantized using INC, specify the --is_inc_int8 flag:
python $WORKSPACE/src/convert_jit.py -s $OUTPUT_DIR/saved_models/intel_int8 -o $OUTPUT_DIR/saved_models/intel_int8/convai_jit.pt --is_inc_int8 -d $DATA_DIR/atis-2/
After creating a TorchScript* model, the trained model needs to be packaged into a .mar file using torch-model-archiver. Assuming the serialized model was saved as convai_jit.pt by the previous step, a sample command to do this is:
torch-model-archiver --model-name convai --export-path $OUTPUT_DIR/saved_models/intel --version 1.0 --serialized-file $OUTPUT_DIR/saved_models/intel/convai_jit.pt --handler $WORKSPACE/src/concurrency_benchmarking/custom_handler.py
Or if working with the quantized model, use:
torch-model-archiver --model-name convai --export-path $OUTPUT_DIR/saved_models/intel_int8 --version 1.0 --serialized-file $OUTPUT_DIR/saved_models/intel_int8/convai_jit.pt --handler $WORKSPACE/src/concurrency_benchmarking/custom_handler.py
This will create a file called convai.mar, which can be used to deploy to TorchServe*.
To benchmark this model using the TorchServe* benchmarking tools:
- Copy the config.json file and the config.properties file into the cloned serve/benchmarks directory:
  cp $CONFIG_DIR/config.properties $TORCH_SERVE_DIR/benchmarks/config.properties
  cp $CONFIG_DIR/config.json $TORCH_SERVE_DIR/benchmarks/config.json
- Modify config.json and config.properties to point to the relevant files and the desired experimental parameters, e.g.:
  sed -i "s|file:///PATH_TO_MAR|file://${OUTPUT_DIR}/saved_models/intel/convai.mar|" $TORCH_SERVE_DIR/benchmarks/config.json
  sed -i "s|PATH_TO_INPUT_FILE|${WORKSPACE}/src/concurrency_benchmarking/input_data.json|" $TORCH_SERVE_DIR/benchmarks/config.json
  sed -i "s|PATH_TO_CONFIG_PROPERTIES|${WORKSPACE}/src/concurrency_benchmarking/serve/benchmarks/config.properties|" $TORCH_SERVE_DIR/benchmarks/config.json
  Or if using the quantized model:
  sed -i "s|file:///PATH_TO_MAR|file://${OUTPUT_DIR}/saved_models/intel_int8/convai.mar|" $TORCH_SERVE_DIR/benchmarks/config.json
  sed -i "s|PATH_TO_INPUT_FILE|${WORKSPACE}/src/concurrency_benchmarking/input_data.json|" $TORCH_SERVE_DIR/benchmarks/config.json
  sed -i "s|PATH_TO_CONFIG_PROPERTIES|${WORKSPACE}/src/concurrency_benchmarking/serve/benchmarks/config.properties|" $TORCH_SERVE_DIR/benchmarks/config.json
  We included a simple input_data.json file to provide a test input for running the benchmarks.
- Run the benchmark using:
  PATH=$CONDA_PREFIX/bin/:$PATH python $TORCH_SERVE_DIR/benchmarks/benchmark-ab.py --config $TORCH_SERVE_DIR/benchmarks/config.json
The reports should be stored in the temporary directory /tmp/benchmark. Measurements for latency and throughput can be found in the file /tmp/benchmark/ab_report.csv.
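The placeholder substitution performed by the sed commands in the steps above can also be sketched in Python. This is an illustrative equivalent under stated assumptions: the helper name `fill_placeholders`, the three-field template, and the `/work` paths are hypothetical, and the real config.json shipped with the repository contains more fields. Unlike `sed` without the `g` flag, `str.replace` substitutes every occurrence.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def fill_placeholders(config_path: Path, replacements: Dict[str, str]) -> str:
    """Replace each placeholder string in the file and write it back."""
    text = config_path.read_text()
    # Longest placeholder first, so a short one can never clobber a longer one.
    for placeholder in sorted(replacements, key=len, reverse=True):
        text = text.replace(placeholder, replacements[placeholder])
    config_path.write_text(text)
    return text


# Hypothetical locations standing in for $OUTPUT_DIR and $WORKSPACE.
output_dir = "/work/output"
workspace = "/work"
template = {
    "url": "file:///PATH_TO_MAR",
    "input": "PATH_TO_INPUT_FILE",
    "config_properties": "PATH_TO_CONFIG_PROPERTIES",
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps(template, indent=2))
    fill_placeholders(
        config_path,
        {
            "file:///PATH_TO_MAR": f"file://{output_dir}/saved_models/intel/convai.mar",
            "PATH_TO_INPUT_FILE": f"{workspace}/src/concurrency_benchmarking/input_data.json",
            "PATH_TO_CONFIG_PROPERTIES": f"{workspace}/src/concurrency_benchmarking/serve/benchmarks/config.properties",
        },
    )
    # Parsing the result confirms the file is still valid JSON.
    filled = json.loads(config_path.read_text())

print(filled["url"])
```

Parsing the file after substitution is a cheap sanity check that no placeholder was left behind and that the edit did not break the JSON syntax.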
As an example, the available fields for the config.json file are:
{'url': "file:///PATH_TO_MAR",
'gpus': '',
'exec_env': 'local',
'batch_size': 1,
'batch_delay': 200,
'workers': 1,
'concurrency': 10,
'requests': 100,
'input': 'PATH_TO_INPUT',
'content_type': 'application/json',
'image': '',
'docker_runtime': '',
'backend_profiling': False,
'config_properties': 'PATH_TO_CONFIG_PROPERTIES',
'inference_model_url': 'predictions/benchmark',
The config.properties file adjusts the parameters for the TorchServe* server.
The two most important fields enable or disable Intel® Extension for PyTorch* and its CPU launcher:
ipex_enable=true
cpu_launcher_enable=true
Follow these steps to restore your $WORKSPACE directory to its initial state. Please note that all downloaded dataset files, the Conda* environment, and the logs created by the workflow will be deleted. Back up your important files before executing the next steps.
conda deactivate
conda remove --name customer_chatbot_intel --all -y
rm -rf $OUTPUT_DIR/saved_models/ $DATA_DIR/atis-2/ $OUTPUT_DIR/logs $TORCH_SERVE_DIR
Follow the instructions described on Get Started to set required environment variables.
Execute Set Up Conda* and Set Up environment steps.
To be able to run GettingStarted.ipynb the Conda* environment must install additional packages:
conda activate customer_chatbot_intel
conda install -c intel nb_conda_kernels jupyter notebook -y
cd $WORKSPACE
jupyter notebook
Open Jupyter Notebook in a web browser, select GettingStarted.ipynb and select conda env:customer_chatbot_intel as the jupyter kernel. Now you can follow the notebook's instructions step by step.
To clean Jupyter Notebook follow the instructions described in Clean Up Bare Metal.
Training output is stored in the $OUTPUT_DIR/logs directory. It reports the training time as well as the training loss and accuracy per epoch. The final lines should look similar to the following:
INFO - =======> Test Accuracy on NER : 0.94
INFO - =======> Test Accuracy on CLS : 0.91
INFO - =======> Training Time : 309.539 secs
INFO - =======> Inference Time : 5.648 secs
INFO - =======> Total Time: 315.187 secs
Benchmark results are stored in the $OUTPUT_DIR/logs directory. They include a progress bar for the benchmark followed by the average time per batch, as shown below:
INFO - Avg time per batch : 19.659 s
Quantization results are stored in the $OUTPUT_DIR/logs directory. They include statistics on the quantized model's accuracy and latency compared to the baseline model, such as the following:
[INFO] FP32 baseline is: [Accuracy: 0.9443, Duration (seconds): 15.7158]
[INFO] |******Mixed Precision Statistics******|
[INFO] +-----------------+----------+---------+
[INFO] | Op Type | Total | INT8 |
[INFO] +-----------------+----------+---------+
[INFO] | Embedding | 3 | 3 |
[INFO] | Linear | 75 | 75 |
[INFO] +-----------------+----------+---------+
[INFO] Pass quantize model elapsed time: 1495.84 ms
[INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.9302|0.9443, Duration (seconds) (int8|fp32): 7.5332|15.7158], Best tune result is: [Accuracy: 0.9302, Duration (seconds): 7.5332]
[INFO] |**********************Tune Result Statistics**********************|
[INFO] +--------------------+----------+---------------+------------------+
[INFO] | Info Type | Baseline | Tune 1 result | Best tune result |
[INFO] +--------------------+----------+---------------+------------------+
[INFO] | Accuracy | 0.9443 | 0.9302 | 0.9302 |
[INFO] | Duration (seconds) | 15.7158 | 7.5332 | 7.5332 |
[INFO] +--------------------+----------+---------------+------------------+
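The numbers in the sample log above can be turned into two summary figures: the relative accuracy drop and the speedup of the INT8 model over the FP32 baseline. The helper names below are hypothetical, and the inputs are copied from the sample log, so your own run will produce different values.

```python
def relative_accuracy_drop(baseline: float, quantized: float) -> float:
    """Fraction of the baseline accuracy lost after quantization."""
    return (baseline - quantized) / baseline


def speedup(baseline_seconds: float, quantized_seconds: float) -> float:
    """How many times faster the quantized model ran."""
    return baseline_seconds / quantized_seconds


# Values taken from the sample "Tune Result Statistics" table.
drop = relative_accuracy_drop(0.9443, 0.9302)
gain = speedup(15.7158, 7.5332)
print(f"relative accuracy drop: {drop:.2%}")
print(f"speedup: {gain:.2f}x")
```

For the sample log this works out to roughly a 1.5% relative accuracy drop in exchange for about a 2.1x reduction in evaluation time.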
In this example, we focus on leveraging the Intel® oneAPI AI Analytics Toolkit on the task of training and deploying an accurate AI system to predict the Intent and Entities of a user query.
Using Intel® technologies can result in more efficient model experimentation and more robust deployed AI solutions, even when using state-of-the-art Deep Learning based NLP models.
For more information, or to read about other relevant workflow examples, see these guides and software resources:
- PyTorch*
- TorchServe* benchmarking tools
- Conda* Linux installation instructions
- Intel® AI Analytics Toolkit (AI Kit)
- Intel® oneAPI AI Analytics Toolkit
- Intel® Extension for PyTorch*
- Intel® Neural Compressor
- Intel® Distribution for Python*
If you have questions or issues about this use case, want help with troubleshooting, want to report a bug or submit enhancement requests, please submit a GitHub issue.
*Other names and brands may be claimed as the property of others. Trademarks.
To the extent that any public or non-Intel datasets or models are referenced by or accessed using tools or code on this site those datasets or models are provided by the third party indicated as the content source. Intel does not create the content and does not warrant its accuracy or quality. By accessing the public content, or using materials trained on or with such content, you agree to the terms associated with that content and that your use complies with the applicable license. Intel expressly disclaims the accuracy, adequacy, or completeness of any such public content, and is not liable for any errors, omissions, or defects in the content, or for any reliance on the content. Intel is not liable for any liability or damages relating to your use of public content.