
Speech Recognition Demo

This demo shows Automatic Speech Recognition (ASR) with a pretrained Mozilla* DeepSpeech 0.8.2 model.

It also works with version 0.6.1, and should work with other models trained with Mozilla DeepSpeech 0.6.x/0.7.x/0.8.x and an ASCII alphabet.

How It Works

The application accepts:

  • a Mozilla* DeepSpeech 0.8.2 neural network in Intermediate Representation (IR) format,
  • an n-gram language model file in kenlm quantized binary format, and
  • an audio file in PCM WAV 16 kHz mono format.

After computing audio features, running a neural network to get per-frame character probabilities, and CTC decoding, the demo prints the decoded text together with the timings of the processing stages.
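
To illustrate the decoding step, below is a minimal sketch of greedy CTC decoding over such a per-frame probability matrix. This is an illustration only: the demo itself uses beam-search CTC decoding with an n-gram language model (via ctcdecode_numpy), and the alphabet and the probs array here are assumptions.

import numpy as np

# Minimal greedy CTC decoding sketch (illustration only; the demo uses
# beam search with a language model). Assumes `probs` is a hypothetical
# (num_frames, num_symbols) array of per-frame character probabilities,
# with the last symbol index reserved for the CTC blank.
ALPHABET = list("abcdefghijklmnopqrstuvwxyz' ")  # hypothetical ASCII alphabet
BLANK = len(ALPHABET)                            # index of the CTC blank

def ctc_greedy_decode(probs):
    best = probs.argmax(axis=1)           # most likely symbol per frame
    chars = []
    prev = BLANK
    for sym in best:
        if sym != prev and sym != BLANK:  # collapse repeats, drop blanks
            chars.append(ALPHABET[sym])
        prev = sym
    return "".join(chars)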

Prerequisites

The demo depends on the ctcdecode_numpy Python extension module, which implements CTC decoding in C++ for faster decoding. Please refer to Open Model Zoo demos for instructions on how to build the extension module and prepare the environment for running the demo. Alternatively, instead of using cmake, you can run python -m pip install . inside the ctcdecode-numpy directory to build and install the module.

Model Preparation

You can download and convert a pretrained Mozilla* DeepSpeech 0.8.2 or 0.6.1 model with the OpenVINO Model Downloader and the provided conversion scripts, using the following commands:

source <openvino_dir>/bin/setupvars.sh
<openvino_dir>/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name mozilla-deepspeech-0.8.2
<openvino_dir>/deployment_tools/open_model_zoo/tools/downloader/converter.py --name mozilla-deepspeech-0.8.2

or

source <openvino_dir>/bin/setupvars.sh
<openvino_dir>/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name mozilla-deepspeech-0.6.1
<openvino_dir>/deployment_tools/open_model_zoo/tools/downloader/converter.py --name mozilla-deepspeech-0.6.1

Please note the model license: Mozilla Public License 2.0.

Running the Demo

Run the application with the -h option to see the help message. The available command-line options are:

usage: speech_recognition_demo.py [-h] -i FILENAME [-d DEVICE] -m FILENAME
                                  [-L FILENAME] -p NAME [-b N] [-c N]
                                  [-l FILENAME]

Speech recognition demo

optional arguments:
  -h, --help            show this help message and exit
  -i FILENAME, --input FILENAME
                        Path to an audio file in WAV PCM 16 kHz mono format
  -d DEVICE, --device DEVICE
                        Optional. Specify the target device to infer on, for
                        example: CPU, GPU, FPGA, HDDL, MYRIAD or HETERO. The
                        sample will look for a suitable IE plugin for this
                        device. (default is CPU)
  -m FILENAME, --model FILENAME
                        Path to an .xml file with a trained model (required)
  -L FILENAME, --lm FILENAME
                        path to language model file (optional)
  -p NAME, --profile NAME
                        Choose pre/post-processing profile: mds06x_en for
                        Mozilla DeepSpeech v0.6.x, mds07x_en or mds08x_en for
                        Mozilla DeepSpeech v0.7.x/v0.8.x, other: filename of a
                        YAML file (required)
  -b N, --beam-width N  Beam width for beam search in CTC decoder (default
                        500)
  -c N, --max-candidates N
                        Show top N (or less) candidates (default 1)
  -l FILENAME, --cpu_extension FILENAME
                        Optional. Required for CPU custom layers: absolute
                        path to a shared library with the kernel
                        implementations for MKLDNN (CPU)-targeted custom
                        layers.

The typical command line is:

pip install -r requirements.txt
source <openvino_dir>/bin/setupvars.sh

python speech_recognition_demo.py \
    -p mds08x_en \
    -m <path_to_model>/mozilla_deepspeech_0.8.2.xml \
    -L <path_to_file>/deepspeech-0.8.2-models.kenlm \
    -i <path_to_audio>/audio.wav

For version 0.6.1 it is:

python speech_recognition_demo.py \
    -p mds06x_en \
    -m <path_to_model>/mozilla_deepspeech_0.6.1.xml \
    -L <path_to_file>/lm.binary \
    -i <path_to_audio>/audio.wav

Only 16-bit, 16 kHz, mono-channel WAVE audio files are supported.

An example audio file can be taken from <openvino_dir>/deployment_tools/demo/how_are_you_doing.wav.
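
If in doubt whether a file meets these requirements, a quick check with Python's standard wave module might look like this (a sketch; the file name is a placeholder):

import wave

# Sanity-check that an input file is 16-bit, 16 kHz, mono WAV, as the
# demo requires. "audio.wav" is a placeholder path.
with wave.open("audio.wav", "rb") as wf:
    assert wf.getnchannels() == 1, "expected mono audio"
    assert wf.getsampwidth() == 2, "expected 16-bit (2-byte) samples"
    assert wf.getframerate() == 16000, "expected a 16 kHz sample rate"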

Demo Output

The application prints the time taken by the initialization and processing stages, followed by the decoded text for the audio file.