More details about wav2mask? #124

Answered by jianfch
My-captain asked this question in Q&A

audio2loudness converts the waveform into a list of amplitude values that correspond to the 1500 timestamp tokens in the prediction (i.e. each covers 0.02s, for a total of 30s). Essentially, it tells you how loud each 0.02s chunk of the audio is.

wav2mask does loudness equalization on that list of amplitude values and quantizes them (i.e. it zeros the low values). It then converts the list into a mask over the 1500 timestamp tokens, as a way to tell the decoder which timestamp values to ignore because their relative loudness tells us those timestamps are silent.

On the other hand, vad=True generates this mask using another neural net.
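
To make the idea concrete, here is a minimal sketch of the two steps described above. It is not the library's actual code: loudness_per_token and silence_mask are hypothetical stand-ins for audio2loudness and wav2mask, simple peak normalization is swapped in for the loudness equalization step, and the levels parameter is an assumption. It assumes 16 kHz mono audio, so each of the 1500 positions covers 320 samples.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed input sample rate (16 kHz mono)
N_TOKENS = 1500        # timestamp positions per 30 s window, as described above
SAMPLES_PER_TOKEN = SAMPLE_RATE * 30 // N_TOKENS  # 320 samples = 0.02 s


def loudness_per_token(waveform: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for audio2loudness: mean |amplitude| per 0.02 s chunk."""
    # Pad or trim to exactly 30 s so the audio splits evenly into N_TOKENS chunks.
    target = N_TOKENS * SAMPLES_PER_TOKEN
    wav = np.zeros(target, dtype=np.float32)
    n = min(len(waveform), target)
    wav[:n] = waveform[:n]
    return np.abs(wav).reshape(N_TOKENS, SAMPLES_PER_TOKEN).mean(axis=1)


def silence_mask(loudness: np.ndarray, levels: int = 20) -> np.ndarray:
    """Hypothetical stand-in for wav2mask: True where a timestamp looks silent."""
    peak = loudness.max()
    if peak == 0:
        # No signal at all: every timestamp is silent.
        return np.ones(loudness.shape, dtype=bool)
    # Simplification: peak normalization instead of real loudness equalization.
    relative = loudness / peak
    # Quantize to `levels` steps; anything below 1/levels collapses to zero.
    quantized = np.floor(relative * levels) / levels
    return quantized == 0


# Example: 1 s of silence, then 1 s of a 440 Hz tone, then (padded) silence.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)
audio = np.concatenate([np.zeros(SAMPLE_RATE, dtype=np.float32), tone])

mask = silence_mask(loudness_per_token(audio))
print(mask.shape, int(mask.sum()))
```

In this sketch the mask is True for timestamps that should be ignored. A decoder could use it by suppressing the logits of the masked timestamp tokens before picking the next token, which is the role the answer above describes for the real mask.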

Answer selected by My-captain