More details about wav2mask? #124
Hi, @jianfch.
`audio2loudness` converts the waveform into a list of amplitude values that correspond to the 1500 timestamp tokens in the prediction (i.e. each covers 0.02s and the total is 30s). Essentially, it tells you how loud each 0.02s chunk of the audio is.

`wav2mask` does loudness equalization on that list of amplitude values and quantizes those values (i.e. it zeros the low values). It then converts this list into a mask over the 1500 timestamp tokens, as a way to tell the decoder which timestamp values to ignore because their relative loudness tells us that those timestamps are silent. On the other hand, `vad=True` generates this mask using another neural net.
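
To make the idea concrete, here is a rough sketch of the two steps in NumPy. It is not the library's actual code: the function names `loudness_per_chunk` and `silence_mask`, the `levels` parameter, the 16 kHz sample rate, and the use of peak normalization as a stand-in for loudness equalization are all assumptions made for illustration.

```python
import numpy as np

SAMPLE_RATE = 16_000                   # assumed input sample rate
N_TOKENS = 1500                        # one value per 0.02s timestamp token
CHUNK = SAMPLE_RATE * 30 // N_TOKENS   # 320 samples per 0.02s chunk


def loudness_per_chunk(wav):
    """Hypothetical stand-in: mean absolute amplitude of each 0.02s chunk."""
    wav = np.asarray(wav, dtype=np.float32)[: CHUNK * N_TOKENS]
    # Pad short audio with silence so there are always 30s worth of chunks.
    wav = np.pad(wav, (0, CHUNK * N_TOKENS - wav.shape[0]))
    return np.abs(wav).reshape(N_TOKENS, CHUNK).mean(axis=1)


def silence_mask(loudness, levels=20):
    """Hypothetical stand-in: True where a timestamp token should be ignored."""
    peak = loudness.max()
    if peak <= 0:
        # Entirely silent input: every timestamp is masked.
        return np.ones(loudness.shape, dtype=bool)
    scaled = loudness / peak                         # crude equalization (assumption)
    quantized = np.floor(scaled * levels) / levels   # low values fall to 0
    return quantized == 0


# 1s of silence followed by 1s of a 440 Hz tone; the rest is padded silence.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
wav = np.concatenate([np.zeros(SAMPLE_RATE), 0.5 * np.sin(2 * np.pi * 440 * t)])
mask = silence_mask(loudness_per_chunk(wav))
```

Here `mask` has one boolean per timestamp token, and the entries set to `True` mark the chunks treated as silent. The real quantization and equalization steps may differ in detail.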