This is a demo implementation of the Unified Spectral and Prosodic Speech Enhancement (SE) framework proposed in the paper, demonstrated on a small set of test samples (the 'test_samples' folder).
- Python 3.6+
- Ubuntu 18.04
- Praat (available for download from the official website)
- keras version 2.2.4
- tensorflow version 1.14.0
- librosa version 0.7.0
- pystoi version 0.2.2
Under the 'prosodic_feature_extraction' folder, the Praat script feat.praat extracts pitch and intensity prosodic features, and the Python script feat_comb_praat.py concatenates these features into a matrix and saves it as a .mat file.
- create a 'feat_praat' folder
- create a 'feat_mat' folder
- set the directory$ and outdir$ paths in feat.praat, then run it in the terminal with
praat feat.praat
- set the path in feat_comb_praat.py, then run it in the terminal with
python feat_comb_praat.py
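The combination step can be pictured with the minimal sketch below. It is not the repository's feat_comb_praat.py: it assumes the per-frame pitch and intensity values from feat.praat have already been read as text tokens, that Praat's `--undefined--` marker (printed for unvoiced frames) should become 0.0, and that the two tracks are aligned by truncating to the shorter one. The helper names and the `prosody` variable name inside the .mat file are hypothetical.

```python
import numpy as np
from scipy.io import savemat, loadmat

UNDEFINED = "--undefined--"  # Praat's printed marker for undefined (e.g. unvoiced) values

def parse_praat_column(tokens):
    """Convert Praat value tokens to floats; undefined frames become 0.0 (an assumption)."""
    return np.array([0.0 if t.strip() == UNDEFINED else float(t) for t in tokens])

def combine_prosody(pitch, intensity):
    """Stack pitch and intensity into a (frames, 2) matrix, truncated to the shorter track."""
    n = min(len(pitch), len(intensity))
    return np.stack([np.asarray(pitch[:n], float), np.asarray(intensity[:n], float)], axis=1)

def save_prosody_mat(path, feat, key="prosody"):
    """Write the matrix to a .mat file under a hypothetical variable name."""
    savemat(path, {key: feat})

# Usage with made-up values: three pitch frames, four intensity frames.
pitch = parse_praat_column(["--undefined--", "120.5", "118.2"])
intensity = parse_praat_column(["55.1", "60.3", "61.0", "58.7"])
feat = combine_prosody(pitch, intensity)
print(feat.shape)  # (3, 2)
```

Truncating is only one way to align the tracks; if feat.praat samples pitch and intensity at different time steps, interpolating one track onto the other's time grid would be the safer choice.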
After extracting the prosodic features of the desired corpus, use training.py to train the SE models and testing.py to evaluate their PESQ and STOI performance.
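Objective scores such as PESQ and STOI are computed per utterance against the clean reference and then averaged. The sketch below shows that loop under stated assumptions; it is not testing.py. It assumes clean and enhanced files share the same file names in two folders and are 16-bit mono PCM WAV, and it takes the metric as a function `metric(clean, enhanced, fs)`. The names `read_wav` and `evaluate_dir` are hypothetical.

```python
import os
import wave
import numpy as np

def read_wav(path):
    """Read a 16-bit mono PCM WAV into floats in [-1, 1) and return (samples, rate)."""
    with wave.open(path, "rb") as f:
        if f.getsampwidth() != 2 or f.getnchannels() != 1:
            raise ValueError("sketch only handles 16-bit mono PCM: " + path)
        rate = f.getframerate()
        data = np.frombuffer(f.readframes(f.getnframes()), dtype="<i2")
    return data.astype(np.float64) / 32768.0, rate

def evaluate_dir(clean_dir, enhanced_dir, metric):
    """Average metric(clean, enhanced, fs) over WAV files with matching names in both folders."""
    scores = {}
    for name in sorted(os.listdir(enhanced_dir)):
        if not name.lower().endswith(".wav"):
            continue
        clean_path = os.path.join(clean_dir, name)
        if not os.path.exists(clean_path):
            continue  # skip enhanced files that have no clean reference
        clean, fs = read_wav(clean_path)
        enhanced, fs_enh = read_wav(os.path.join(enhanced_dir, name))
        if fs != fs_enh:
            raise ValueError("sample rate mismatch for " + name)
        n = min(len(clean), len(enhanced))  # align lengths before scoring
        scores[name] = metric(clean[:n], enhanced[:n], fs)
    mean = float(np.mean(list(scores.values()))) if scores else float("nan")
    return mean, scores
```

With pystoi installed, the function imported via `from pystoi.stoi import stoi` already has a compatible signature, `stoi(clean, denoised, fs_sig, extended=False)`, so it can be passed directly as `metric`; a PESQ implementation could be wrapped to the same three-argument form.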
- Parameters of training.py:
- -epoch: number of training epochs
- -model_type: 'JointConcat', 'FixedConcat' and 'MultiTask' are supported
- -path_dataset: directory of the dataset
- Note: to use the 'FixedConcat' model, you first need to pretrain an FE model with pretrain_fe.py
- Parameters of testing.py:
- -model_type: 'JointConcat', 'FixedConcat' and 'MultiTask' are supported
- -path_dataset: directory of the dataset
- -path_output: output directory of the enhanced WAV files
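The flags listed above map naturally onto single-dash `argparse` options. The sketch below is a hypothetical parser, not the code in training.py or testing.py: the `stage` switch, the default of 50 epochs, and the reminder message are assumptions made for illustration. `argparse` accepts multi-character single-dash options such as `-epoch` and derives the attribute name by stripping the leading dash.

```python
import argparse

MODEL_TYPES = ("JointConcat", "FixedConcat", "MultiTask")

def build_parser(stage):
    """Hypothetical parser mirroring the documented flags of training.py ('train') or testing.py ('test')."""
    if stage not in ("train", "test"):
        raise ValueError("stage must be 'train' or 'test'")
    parser = argparse.ArgumentParser(description="Unified spectral/prosodic SE (%s sketch)" % stage)
    parser.add_argument("-model_type", required=True, choices=MODEL_TYPES,
                        help="SE model architecture")
    parser.add_argument("-path_dataset", required=True, help="directory of the dataset")
    if stage == "train":
        parser.add_argument("-epoch", type=int, default=50,
                            help="number of training epochs (default is hypothetical)")
    else:
        parser.add_argument("-path_output", required=True,
                            help="output directory of the enhanced WAV files")
    return parser

# Usage with an explicit argument list instead of sys.argv.
args = build_parser("train").parse_args(
    ["-epoch", "30", "-model_type", "JointConcat", "-path_dataset", "./dataset"])
print(args.epoch, args.model_type)  # 30 JointConcat
if args.model_type == "FixedConcat":
    print("FixedConcat expects an FE model pretrained with pretrain_fe.py")
```

Restricting `-model_type` with `choices` makes an unsupported model name fail at parse time rather than deep inside model construction.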
Under the 'trained_models_TIMIT' folder, we also provide SE models trained on the TIMIT dataset, which achieve the performance reported in the paper. To enhance noisy WAV files with these models, change the directory parameters in pred_process.py and run the following command in the terminal:
python pred_process.py
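For orientation, a spectral-mapping SE model is typically applied to a waveform by computing its STFT, enhancing the magnitude features, and resynthesizing with the noisy phase. The sketch below illustrates only that spectral path under assumptions that may not match pred_process.py: a 512-point periodic Hann frame, a 256-sample hop, log1p magnitude features, and a hypothetical `model_fn` standing in for a trained model's prediction. It ignores the prosodic inputs the unified models may also use.

```python
import numpy as np

N_FFT = 512  # assumed frame length
HOP = 256    # assumed hop size (50% overlap)
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)  # periodic Hann

def stft(x):
    """Zero-pad, frame, window, and return the complex spectrogram (frames, N_FFT // 2 + 1)."""
    extra = (-len(x)) % HOP  # make (padded length - N_FFT) a multiple of HOP
    xp = np.concatenate([np.zeros(HOP), x, np.zeros(HOP + extra)])
    n_frames = 1 + (len(xp) - N_FFT) // HOP
    frames = np.stack([xp[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length):
    """Weighted overlap-add inverse of stft(), trimmed back to the original length."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    total = (len(frames) - 1) * HOP + N_FFT
    out = np.zeros(total)
    norm = np.zeros(total)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    mask = norm > 1e-8
    out[mask] /= norm[mask]  # undo the analysis and synthesis window weighting
    return out[HOP:HOP + length]

def enhance(noisy, model_fn):
    """Map the noisy log1p magnitude with model_fn and resynthesize using the noisy phase."""
    spec = stft(noisy)
    log_mag = np.log1p(np.abs(spec))  # assumed input feature
    enhanced_mag = np.expm1(model_fn(log_mag))
    return istft(enhanced_mag * np.exp(1j * np.angle(spec)), len(noisy))
```

With an identity `model_fn` the output reproduces the input, which is a quick sanity check for the framing; with a loaded Keras model, `model_fn` could wrap `model.predict` and reshape the features to whatever input shape the model expects.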
If you use this code, please cite the following paper:
Wei-Cheng Lin, Yu Tsao, Fei Chen and Hsin-Min Wang, "Investigation of Neural Network Approaches for Unified Spectral and Prosodic Feature Enhancement," in 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2019, pp. 1179–1184.
@inproceedings{lin2019investigation,
title={Investigation of Neural Network Approaches for Unified Spectral and Prosodic Feature Enhancement},
author={Lin, Wei-Cheng and Tsao, Yu and Chen, Fei and Wang, Hsin-Min},
booktitle={2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
pages={1179--1184},
year={2019},
organization={IEEE}
}