Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding
- Install `"transformers<=4.19.0"`.
- Go to the dataset directory: `cd nlu/slurp/`.
- Prepare data: `./prepare_data.py /path/to/slurp data`.
- Train a model: `./finetune.sh` (tested on an A6000 GPU).
- Run evaluation: `./evaluate.py data/test.target output/test_generations.txt`.

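The steps above can also be chained from a small driver script. The following is a minimal sketch and not part of the repository: the helper names `build_steps` and `run_steps` are hypothetical, the commands and paths are copied from the list above, and `transformers` is assumed to be installed already.

```python
import subprocess
from pathlib import Path

# Directory of the SLURP recipe inside this repository (from the steps above).
RECIPE_DIR = Path("nlu/slurp")


def build_steps(slurp_root):
    """Return the recipe commands in order, as argument lists.

    `slurp_root` is the location of the downloaded SLURP corpus
    (the `/path/to/slurp` placeholder in the steps above).
    """
    return [
        ["./prepare_data.py", str(slurp_root), "data"],
        ["./finetune.sh"],
        ["./evaluate.py", "data/test.target", "output/test_generations.txt"],
    ]


def run_steps(slurp_root, recipe_dir=RECIPE_DIR):
    """Run every step inside the recipe directory, stopping at the first failure."""
    for command in build_steps(slurp_root):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run(command, cwd=recipe_dir, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands that run_steps() would execute.
    for command in build_steps("/path/to/slurp"):
        print(" ".join(command))
```

Calling `run_steps("/path/to/slurp")` from the repository root would execute the three commands in sequence. Training can take a long time, so running the steps by hand is often more practical.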
Alternatively, you can use the pretrained models hosted on Hugging Face Hub.
Dataset | Directory | Pretrained model |
---|---|---|
SLURP | nlu/slurp | akreal/mbart-large-50-finetuned-slurp |
SLUE | nlu/slue | akreal/mbart-large-50-finetuned-slue |
CATSLU | nlu/catslu | akreal/mbart-large-50-finetuned-catslu |
MEDIA | nlu/media | akreal/mbart-large-50-finetuned-media |
PortMEDIA-Dom | nlu/portmedia_dom | akreal/mbart-large-50-finetuned-portmedia-dom |
PortMEDIA-Lang | nlu/portmedia_lang | akreal/mbart-large-50-finetuned-portmedia-lang |

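A hosted model can be loaded with the `transformers` library roughly as follows. This is a sketch under several assumptions: that the checkpoints resolve through the generic `AutoTokenizer` and `AutoModelForSeq2SeqLM` classes, that the model takes a plain-text transcript as input, and that `max_length=128` is a reasonable limit. The helper names `model_id`, `load_model` and `predict` are hypothetical, and the exact input and output formats depend on each dataset's `prepare_data.py`.

```python
# Model identifiers copied from the table above.
PRETRAINED_MODELS = {
    "SLURP": "akreal/mbart-large-50-finetuned-slurp",
    "SLUE": "akreal/mbart-large-50-finetuned-slue",
    "CATSLU": "akreal/mbart-large-50-finetuned-catslu",
    "MEDIA": "akreal/mbart-large-50-finetuned-media",
    "PortMEDIA-Dom": "akreal/mbart-large-50-finetuned-portmedia-dom",
    "PortMEDIA-Lang": "akreal/mbart-large-50-finetuned-portmedia-lang",
}


def model_id(dataset):
    """Look up the Hub identifier for a dataset name, ignoring case."""
    by_name = {name.lower(): ident for name, ident in PRETRAINED_MODELS.items()}
    try:
        return by_name[dataset.lower()]
    except KeyError:
        raise ValueError(f"Unknown dataset: {dataset!r}") from None


def load_model(dataset):
    """Download the tokenizer and model for `dataset` from the Hub."""
    # Imported here so the lookup above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = model_id(dataset)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    return tokenizer, model


def predict(text, tokenizer, model, max_length=128):
    """Generate the model output for one input text (assumed input format)."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

For example, `tokenizer, model = load_model("SLURP")` followed by `predict("wake me up at five am", tokenizer, model)` would return the generated annotation string for that utterance.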
- Install the regular ESPnet version.
- Copy the model configuration file from this repository: `cp slu/slurp/train_asr_conformer_xlsr_mbart.yaml /path/to/espnet/egs2/slurp_entity/asr1/conf/`.
- Run the recipe: `./run.sh --asr_config conf/train_asr_conformer_xlsr_mbart.yaml`.
- If you want to use the pretrained Adaptor, download it from the link in the next section and run the recipe with it: `./run.sh --asr_config conf/train_asr_conformer_xlsr_mbart.yaml --pretrained_model downloads/conformer08x08h_d1024_xlsr_ts_lr5e-5_attcela_7kh_ave.pth:::decoder --asr_tag postdec-aed_7kh`.

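The `--pretrained_model` value in the last command packs several fields into one colon-separated string. ESPnet appears to use the layout `<file_path>:<src_key>:<dst_key>:<exclude_keys>`, so `...pth:::decoder` would load the whole checkpoint except the parameters under `decoder`. Treat this reading as an assumption and check it against the ESPnet version you install. The sketch below only shows how the command line is assembled; the helper names `init_param_spec` and `build_run_command` are hypothetical.

```python
def init_param_spec(file_path, src_key="", dst_key="", exclude_keys=""):
    """Join the four fields of a pretrained-model spec with colons.

    Assumed field order: file path, source key, destination key, excluded keys.
    Empty fields stay empty, which produces the consecutive colons seen above.
    """
    return ":".join([file_path, src_key, dst_key, exclude_keys])


def build_run_command(asr_config, pretrained_model=None, asr_tag=None):
    """Build the argument list for ./run.sh with optional pretrained Adaptor."""
    command = ["./run.sh", "--asr_config", asr_config]
    if pretrained_model is not None:
        command += ["--pretrained_model", pretrained_model]
    if asr_tag is not None:
        command += ["--asr_tag", asr_tag]
    return command


if __name__ == "__main__":
    spec = init_param_spec(
        "downloads/conformer08x08h_d1024_xlsr_ts_lr5e-5_attcela_7kh_ave.pth",
        exclude_keys="decoder",
    )
    command = build_run_command(
        "conf/train_asr_conformer_xlsr_mbart.yaml",
        pretrained_model=spec,
        asr_tag="postdec-aed_7kh",
    )
    # Prints the same command line as the last step above.
    print(" ".join(command))
```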
The following SLU models were trained with the Adaptor pretrained on 7k hours of data with the PostDec-AED loss.
Dataset | Recipe | Pretrained model |
---|---|---|
SLURP | slurp_entity | Link |
SLUE | slue-voxpopuli | Link |
CATSLU | catslu_entity | Link |
MEDIA | media | Link |
PortMEDIA-Dom | portmedia_dom | Link |
PortMEDIA-Lang | portmedia_lang | Link |

Cross-lingual PortMEDIA-Lang SLU model finetuned from the MEDIA SLU model: Link.
- Install the custom version of ESPnet: `git clone --branch v.202207 --depth 1 git@github.com:espnet/espnet.git /path/to/espnet-adaptor-pretrain`.
- Copy the modifications: `rsync -avh adaptor/espnet/ /path/to/espnet-adaptor-pretrain/`.
- Follow the ESPnet installation instructions.
- Run the recipe: `cd /path/to/espnet-adaptor-pretrain/egs2/commonvoice/asr1; ./run.sh`.

Loss | Configuration | Pretrained model |
---|---|---|
PreEnc MC | conf/train_adaptor_conformer_preenc-mc.yaml | Link |
PreEnc CTC | conf/train_adaptor_conformer_preenc-ctc.yaml | Link |
PostEnc MC | conf/train_adaptor_conformer_postenc-mc.yaml | Link |
PostDec MC | conf/train_adaptor_conformer_postdec-mc.yaml | Link |
PostDec AED | conf/train_adaptor_conformer_postdec-aed.yaml | Link |
PreEnc CTC + PostDec AED | conf/train_adaptor_conformer_preenc-ctc_postdec-aed.yaml | Link |
PreEnc CTC + PostDec AED + PostEnc MC | conf/train_adaptor_conformer_preenc-ctc_postenc-mc_postdec-aed.yaml | Link |
PostEnc MC (1K hours English data) | | Link |
PostDec AED (7K hours data) | | Link |

@INPROCEEDINGS{10389655,
author={Denisov, Pavel and Vu, Ngoc Thang},
booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
title={Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding},
year={2023},
volume={},
number={},
pages={1-8},
keywords={Training;Error analysis;Conferences;Predictive models;Benchmark testing;Filling;Data models;spoken language understanding;self-supervised learning;end-to-end;sequence-to-sequence;multilingual},
doi={10.1109/ASRU57964.2023.10389655}
}