Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

NLU

Training

Install "transformers<=4.19.0".
Go to the dataset directory: `cd nlu/slurp/`.
Prepare the data: `./prepare_data.py /path/to/slurp data`.
Train a model: `./finetune.sh` (tested on an A6000 GPU).
Run the evaluation: `./evaluate.py data/test.target output/test_generations.txt`.
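
The steps above can be chained in a small driver script. This is only a sketch: the helper names `nlu_commands` and `run_nlu_pipeline` are hypothetical and not part of this repository, the script is assumed to be started from the repository root, and `/path/to/slurp` is a placeholder for a local SLURP copy. By default it only prints the commands.

```python
import subprocess
import sys
from typing import List


def nlu_commands(slurp_root: str) -> List[List[str]]:
    """Return the SLURP NLU commands from the steps above, in order."""
    return [
        # Install the pinned transformers version into the current interpreter.
        [sys.executable, "-m", "pip", "install", "transformers<=4.19.0"],
        ["./prepare_data.py", slurp_root, "data"],
        ["./finetune.sh"],
        ["./evaluate.py", "data/test.target", "output/test_generations.txt"],
    ]


def run_nlu_pipeline(slurp_root: str, workdir: str = "nlu/slurp",
                     dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run it inside workdir."""
    for cmd in nlu_commands(slurp_root):
        print("+", " ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    run_nlu_pipeline("/path/to/slurp", dry_run=True)
```

Set `dry_run=False` to actually execute the steps; training then runs as part of the same call and may take a long time.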

Alternatively, you can use the pretrained models hosted on Hugging Face Hub.

Pretrained Models

Dataset	Directory	Pretrained model
SLURP	`nlu/slurp`	akreal/mbart-large-50-finetuned-slurp
SLUE	`nlu/slue`	akreal/mbart-large-50-finetuned-slue
CATSLU	`nlu/catslu`	akreal/mbart-large-50-finetuned-catslu
MEDIA	`nlu/media`	akreal/mbart-large-50-finetuned-media
PortMEDIA-Dom	`nlu/portmedia_dom`	akreal/mbart-large-50-finetuned-portmedia-dom
PortMEDIA-Lang	`nlu/portmedia_lang`	akreal/mbart-large-50-finetuned-portmedia-lang
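
A hosted checkpoint can presumably be loaded with the generic `transformers` auto classes. The sketch below assumes that each Hub repository ships both the model weights and the tokenizer files. The helper names `load_nlu_model` and `annotate` are hypothetical. The expected input preprocessing and the output annotation format are defined by the scripts in the dataset directories and are not reproduced here.

```python
# Hub model IDs copied from the table above, keyed by dataset directory.
PRETRAINED_NLU_MODELS = {
    "nlu/slurp": "akreal/mbart-large-50-finetuned-slurp",
    "nlu/slue": "akreal/mbart-large-50-finetuned-slue",
    "nlu/catslu": "akreal/mbart-large-50-finetuned-catslu",
    "nlu/media": "akreal/mbart-large-50-finetuned-media",
    "nlu/portmedia_dom": "akreal/mbart-large-50-finetuned-portmedia-dom",
    "nlu/portmedia_lang": "akreal/mbart-large-50-finetuned-portmedia-lang",
}


def load_nlu_model(directory: str):
    """Download (on first use) and return (tokenizer, model) for a dataset."""
    # Imported lazily so the table above can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_id = PRETRAINED_NLU_MODELS[directory]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def annotate(text: str, tokenizer, model, max_length: int = 128) -> str:
    """Generate the output sequence for one utterance transcript."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Example usage (downloads the checkpoint, so it is left commented out):
# tokenizer, model = load_nlu_model("nlu/slurp")
# print(annotate("wake me up at five am this week", tokenizer, model))
```

The example sentence and `max_length=128` are illustrative choices, not values taken from the repository.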

SLU

Training

Install the regular ESPnet version.
Copy the model configuration file from this repository: `cp slu/slurp/train_asr_conformer_xlsr_mbart.yaml /path/to/espnet/egs2/slurp_entity/asr1/conf/`.
Run the recipe: `./run.sh --asr_config conf/train_asr_conformer_xlsr_mbart.yaml`.
To use a pretrained Adaptor, download it from the link in the next section and pass it to the recipe: `./run.sh --asr_config conf/train_asr_conformer_xlsr_mbart.yaml --pretrained_model downloads/conformer08x08h_d1024_xlsr_ts_lr5e-5_attcela_7kh_ave.pth:::decoder --asr_tag postdec-aed_7kh`.
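
The `:::decoder` suffix in the `--pretrained_model` value is easy to misread. Assuming the recipe forwards this value to ESPnet's `--init_param` option, whose documented format is `<file_path>:<src_key>:<dst_key>:<exclude_keys>`, the two empty fields mean "load the whole checkpoint into the whole model", and `decoder` presumably excludes the checkpoint's decoder parameters from loading. The sketch below only illustrates how such a string splits into fields; `parse_init_param` is a hypothetical helper, not an ESPnet function.

```python
from typing import List, NamedTuple, Optional


class InitParam(NamedTuple):
    path: str
    src_key: Optional[str]
    dst_key: Optional[str]
    exclude_keys: List[str]


def parse_init_param(spec: str) -> InitParam:
    """Split '<file_path>:<src_key>:<dst_key>:<exclude_keys>' into fields."""
    parts = spec.split(":", 3)
    parts += [""] * (4 - len(parts))  # missing trailing fields count as empty
    path, src_key, dst_key, excludes = parts
    return InitParam(
        path=path,
        src_key=src_key or None,   # empty field: no source key restriction
        dst_key=dst_key or None,   # empty field: no destination key restriction
        exclude_keys=[key for key in excludes.split(",") if key],
    )


spec = "downloads/conformer08x08h_d1024_xlsr_ts_lr5e-5_attcela_7kh_ave.pth:::decoder"
parsed = parse_init_param(spec)
print(parsed.path)
print(parsed.exclude_keys)  # ['decoder']
```

Under this reading, the Adaptor checkpoint initializes everything except the decoder, which keeps the initialization defined by the SLU configuration file.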

Pretrained Models

The following SLU models were trained with the Adaptor pretrained on 7k hours of data with the PostDec AED loss.

Dataset	Recipe	Pretrained model
SLURP	`slurp_entity`	Link
SLUE	`slue-voxpopuli`	Link
CATSLU	`catslu_entity`	Link
MEDIA	`media`	Link
PortMEDIA-Dom	`portmedia_dom`	Link
PortMEDIA-Lang	`portmedia_lang`	Link

Cross-lingual PortMEDIA-Lang SLU model finetuned from the MEDIA SLU model: Link.

Adaptor

Training

Install the custom version of ESPnet: `git clone --branch v.202207 --depth 1 git@github.com:espnet/espnet.git /path/to/espnet-adaptor-pretrain`.
Copy the modifications: `rsync -avh adaptor/espnet/ /path/to/espnet-adaptor-pretrain/`.
Follow the ESPnet installation instructions.
Run the recipe: `cd /path/to/espnet-adaptor-pretrain/egs2/commonvoice/asr1; ./run.sh`.

Pretrained Models

Loss	Configuration	Pretrained model
PreEnc MC	`conf/train_adaptor_conformer_preenc-mc.yaml`	Link
PreEnc CTC	`conf/train_adaptor_conformer_preenc-ctc.yaml`	Link
PostEnc MC	`conf/train_adaptor_conformer_postenc-mc.yaml`	Link
PostDec MC	`conf/train_adaptor_conformer_postdec-mc.yaml`	Link
PostDec AED	`conf/train_adaptor_conformer_postdec-aed.yaml`	Link
PreEnc CTC + PostDec AED	`conf/train_adaptor_conformer_preenc-ctc_postdec-aed.yaml`	Link
PreEnc CTC + PostDec AED + PostEnc MC	`conf/train_adaptor_conformer_preenc-ctc_postenc-mc_postdec-aed.yaml`	Link
PostEnc MC (1K hours English data)		Link
PostDec AED (7K hours data)		Link

Citation

@INPROCEEDINGS{10389655,
  author={Denisov, Pavel and Vu, Ngoc Thang},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)}, 
  title={Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding}, 
  year={2023},
  volume={},
  number={},
  pages={1-8},
  keywords={Training;Error analysis;Conferences;Predictive models;Benchmark testing;Filling;Data models;spoken language understanding;self-supervised learning;end-to-end;sequence-to-sequence;multilingual},
  doi={10.1109/ASRU57964.2023.10389655}
}
