A demo code for the proposed spectral and prosodic acoustic feature enhancement under noisy environments.

Unified Spectral and Prosodic SE

This is a demo implementation of the Unified Spectral and Prosodic Speech Enhancement (SE) framework from the paper, based on a given small test sample set (i.e., the 'test_samples' folder).

Suggested Environment and Requirements

  1. Python 3.6+
  2. Ubuntu 18.04
  3. Praat (available for download from the official website)
  4. keras version 2.2.4
  5. tensorflow version 1.14.0
  6. librosa version 0.7.0
  7. pystoi version 0.2.2
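
For convenience, the pinned Python packages above can be captured in a requirements file (a sketch reproducing only the versions listed; Praat is installed separately):

```
keras==2.2.4
tensorflow==1.14.0
librosa==0.7.0
pystoi==0.2.2
```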

Prosodic Feature Extraction (Praat)

Under the 'prosodic_feature_extraction' folder, the Praat script feat.praat is used to extract pitch and intensity prosodic features via Praat. The Python script feat_comb_praat.py then concatenates these features into a matrix and outputs it as a .mat file.

  1. Create a 'feat_praat' folder
  2. Create a 'feat_mat' folder
  3. Change the directory$ & outdir$ paths in feat.praat and run it in the terminal with
praat feat.praat
  4. Change the path in feat_comb_praat.py and run it in the terminal with
python feat_comb_praat.py
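
The concatenation step above can be sketched in Python (a minimal sketch, assuming Praat produced one value per frame for each feature; the actual feat_comb_praat.py may read a different file format, and the `prosodic_feat` key name is an assumption):

```python
import os
import tempfile

import numpy as np
from scipy.io import savemat, loadmat

def combine_prosodic(pitch, intensity):
    """Stack per-frame pitch and intensity tracks into an (n_frames, 2) matrix.

    Praat's pitch and intensity analyses can differ by a frame or two,
    so truncate both tracks to the shorter length before stacking.
    """
    n = min(len(pitch), len(intensity))
    return np.column_stack([pitch[:n], intensity[:n]])

if __name__ == "__main__":
    # Dummy frame-level tracks standing in for Praat's extracted features.
    pitch = np.array([120.0, 118.5, 0.0, 119.2])        # Hz (0 = unvoiced frame)
    intensity = np.array([62.1, 63.0, 55.4, 61.8, 60.2])  # dB
    feat = combine_prosodic(pitch, intensity)
    # Write one .mat file per utterance, as the 'feat_mat' folder suggests.
    out_path = os.path.join(tempfile.mkdtemp(), "sample.mat")
    savemat(out_path, {"prosodic_feat": feat})
    print(loadmat(out_path)["prosodic_feat"].shape)
```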

How to run

After extracting the prosodic features of the desired corpus, use training.py to train the SE models and testing.py to evaluate the PESQ & STOI performance results.

  1. Parameters in training.py are:
    • -epoch: number of training epochs
    • -model_type: 'JointConcat', 'FixedConcat' and 'MultiTask' are supported
    • -path_dataset: directory of the dataset
    • Note: if you use the 'FixedConcat' model, you first need to pretrain an FE model with pretrain_fe.py
  2. Parameters in testing.py are:
    • -model_type: 'JointConcat', 'FixedConcat' and 'MultiTask' are supported
    • -path_dataset: directory of the dataset
    • -path_output: output directory of the enhanced WAV files
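
The flags above suggest an argparse setup along these lines (a hedged sketch: option names and choices mirror the list above, but the default values are assumptions, not taken from the repository):

```python
import argparse

def build_train_parser():
    """Parser mirroring the training.py flags described above."""
    p = argparse.ArgumentParser(
        description="Train a unified spectral/prosodic SE model")
    p.add_argument("-epoch", type=int, default=50,
                   help="number of training epochs (default is an assumption)")
    p.add_argument("-model_type", default="JointConcat",
                   choices=["JointConcat", "FixedConcat", "MultiTask"],
                   help="SE model variant")
    p.add_argument("-path_dataset", type=str, required=True,
                   help="directory of the dataset")
    return p

if __name__ == "__main__":
    # Example invocation with explicit values for every flag.
    args = build_train_parser().parse_args(
        ["-epoch", "30", "-model_type", "MultiTask", "-path_dataset", "./data"])
    print(args.model_type, args.epoch)
```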

Trained Models

Under the 'trained_models_TIMIT' folder, we also provide trained SE models (based on the TIMIT dataset) whose performance matches that reported in the paper. To use these models to enhance noisy input WAV files, change the directory parameters in pred_process.py and run the following command in the terminal:

python pred_process.py

Reference

If you use this code, please cite the following paper:

Wei-Cheng Lin, Yu Tsao, Fei Chen and Hsin-Min Wang, "Investigation of Neural Network Approaches for Unified Spectral and Prosodic Feature Enhancement" in 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2019, pp. 1179–1184.

@inproceedings{lin2019investigation,
  title={Investigation of Neural Network Approaches for Unified Spectral and Prosodic Feature Enhancement},
  author={Lin, Wei-Cheng and Tsao, Yu and Chen, Fei and Wang, Hsin-Min},
  booktitle={2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  pages={1179--1184},
  year={2019},
  organization={IEEE}
}
