ASR (Audio-Speech-Recognition) microservice helps users convert speech to text. When building a talking bot with LLM, users will need to convert their audio inputs (What they talk, or Input audio from other sources) to text, so the LLM is able to tokenize the text and generate an answer. This microservice is built for that conversion stage.
To start the ASR microservice with Python, you need to first install python packages.
pip install -r requirements.txt
- Xeon CPU
cd integrations/dependency/whisper
nohup python whisper_server.py --device=cpu &
python check_whisper_server.py
Note: please make sure that port 7066 is not occupied by other services. Otherwise, use the command npx kill-port 7066
to free the port.
If the Whisper server is running properly, you should see the following output:
{'asr_result': 'Who is pat gelsinger'}
- Gaudi2 HPU
pip install optimum[habana]
cd dependency/
nohup python whisper_server.py --device=hpu &
python check_whisper_server.py
# Or use openai protocol compatible curl command
# Please refer to https://platform.openai.com/docs/api-reference/audio/createTranscription
wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav
curl http://localhost:7066/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F file="@./sample.wav" \
-F model="openai/whisper-small"
cd ../../..
python opea_asr_microservice.py
python check_asr_server.py
While the Whisper service is running, you can start the ASR service. If the ASR service is running properly, you should see the output similar to the following:
{'text': 'who is pat gelsinger'}
Alternatively, you can also start the ASR microservice with Docker.
- Xeon CPU
cd ../..
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/Dockerfile .
- Gaudi2 HPU
cd ../..
docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/Dockerfile.intel_hpu .
docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/Dockerfile .
- Xeon
docker run -p 7066:7066 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy opea/whisper:latest
- Gaudi2 HPU
docker run -p 7066:7066 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy opea/whisper-gaudi:latest
ip_address=$(hostname -I | awk '{print $1}')
docker run -d -p 9099:9099 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy -e ASR_ENDPOINT=http://$ip_address:7066 opea/asr:latest
# Use curl or python
# curl
wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav
curl http://localhost:9099/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F file="@./sample.wav" \
-F model="openai/whisper-small"
# python
python check_asr_server.py