A demo code for the proposed spectral and prosodic acoustic feature enhancement under noisy environments.

Unified Spectral and Prosodic SE

This is a demo implementation of the Unified Spectral and Prosodic Speech Enhancement (SE) framework from the paper, based on a given small test sample set (i.e., the 'test_samples' folder).

Suggested Environment and Requirements

  1. Python 3.6+
  2. Ubuntu 18.04
  3. Praat (available for download from the official website)
  4. keras version 2.2.4
  5. tensorflow version 1.14.0
  6. librosa version 0.7.0
  7. pystoi version 0.2.2
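
For convenience, the pinned Python packages above can be captured in a requirements file (a sketch reproducing only the versions listed; Praat is installed separately):

```
keras==2.2.4
tensorflow==1.14.0
librosa==0.7.0
pystoi==0.2.2
```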

Prosodic Feature Extraction (Praat)

Under the 'prosodic_feature_extraction' folder, the Praat script feat.praat is used to extract pitch and intensity prosodic features via Praat. The Python script feat_comb_praat.py then concatenates these features into a matrix and outputs it as a .mat file.

  1. Create a 'feat_praat' folder
  2. Create a 'feat_mat' folder
  3. Change the directory$ & outdir$ paths in feat.praat and run it in the terminal with
praat feat.praat
  4. Change the path in feat_comb_praat.py and run it in the terminal with
python feat_comb_praat.py
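
The concatenation step above can be sketched in Python (a minimal sketch, assuming Praat produced one value per frame for each feature; the actual feat_comb_praat.py may read a different file format, and the `prosodic_feat` key name is an assumption):

```python
import os
import tempfile

import numpy as np
from scipy.io import savemat, loadmat

def combine_prosodic(pitch, intensity):
    """Stack per-frame pitch and intensity tracks into an (n_frames, 2) matrix.

    Praat's pitch and intensity analyses can differ by a frame or two,
    so truncate both tracks to the shorter length before stacking.
    """
    n = min(len(pitch), len(intensity))
    return np.column_stack([pitch[:n], intensity[:n]])

if __name__ == "__main__":
    # Dummy frame-level tracks standing in for Praat's extracted features.
    pitch = np.array([120.0, 118.5, 0.0, 119.2])        # Hz (0 = unvoiced frame)
    intensity = np.array([62.1, 63.0, 55.4, 61.8, 60.2])  # dB
    feat = combine_prosodic(pitch, intensity)
    # Write one .mat file per utterance, as the 'feat_mat' folder suggests.
    out_path = os.path.join(tempfile.mkdtemp(), "sample.mat")
    savemat(out_path, {"prosodic_feat": feat})
    print(loadmat(out_path)["prosodic_feat"].shape)
```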

How to run

After extracting the prosodic features of the desired corpus, use training.py to train the SE models and testing.py to evaluate the PESQ & STOI performance results.

  1. Parameters in training.py are:
    • -epoch: number of training epochs
    • -model_type: 'JointConcat', 'FixedConcat' and 'MultiTask' are supported
    • -path_dataset: directory of the dataset
    • Note: if you use the 'FixedConcat' model, you first need to pretrain an FE model with pretrain_fe.py
  2. Parameters in testing.py are:
    • -model_type: 'JointConcat', 'FixedConcat' and 'MultiTask' are supported
    • -path_dataset: directory of the dataset
    • -path_output: output directory of the enhanced WAV files
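
The flags above suggest an argparse setup along these lines (a hedged sketch: option names and choices mirror the list above, but the default values are assumptions, not taken from the repository):

```python
import argparse

def build_train_parser():
    """Parser mirroring the training.py flags described above."""
    p = argparse.ArgumentParser(
        description="Train a unified spectral/prosodic SE model")
    p.add_argument("-epoch", type=int, default=50,
                   help="number of training epochs (default is an assumption)")
    p.add_argument("-model_type", default="JointConcat",
                   choices=["JointConcat", "FixedConcat", "MultiTask"],
                   help="SE model variant")
    p.add_argument("-path_dataset", type=str, required=True,
                   help="directory of the dataset")
    return p

if __name__ == "__main__":
    # Example invocation with explicit values for every flag.
    args = build_train_parser().parse_args(
        ["-epoch", "30", "-model_type", "MultiTask", "-path_dataset", "./data"])
    print(args.model_type, args.epoch)
```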

Trained Models

Under the 'trained_models_TIMIT' folder, we also provide trained SE models (based on the TIMIT dataset) whose performance matches that reported in the paper. To use these models to enhance noisy input WAV files, change the directory parameters in pred_process.py and run the following command in the terminal:

python pred_process.py

Reference

If you use this code, please cite the following paper:

Wei-Cheng Lin, Yu Tsao, Fei Chen and Hsin-Min Wang, "Investigation of Neural Network Approaches for Unified Spectral and Prosodic Feature Enhancement" in 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2019, pp. 1179–1184.

@inproceedings{lin2019investigation,
  title={Investigation of Neural Network Approaches for Unified Spectral and Prosodic Feature Enhancement},
  author={Lin, Wei-Cheng and Tsao, Yu and Chen, Fei and Wang, Hsin-Min},
  booktitle={2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  pages={1179--1184},
  year={2019},
  organization={IEEE}
}
