Feature Request: More Spotify like volume normalization #608
22 comments · 39 replies
-
I'm also curious whether the audio processing is currently done in 16 bit? I ask because I notice that the output of librespot is 16 bit. Lowering the gain of 16-bit audio by several dB with ReplayGain throws away bits. Audio-quality-wise it would be advantageous to do the processing in 24- or 32-bit mode and, if the sound card accepts that, hand it over as-is, or otherwise truncate it to 16 bit.
-
Not accounting for noise shaping and other tricks, 16 bits gets you a theoretical dynamic range of 96.33 dB and 24 bits gets you 144.49 dB, so if you did the processing in 24-bit mode you could lower the gain by up to 48.16 dB before you had to start throwing away bits.
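For reference, those figures are just the usual 20*log10(2^n) rule of thumb. A quick standalone check (plain Rust, nothing librespot-specific):

```rust
fn main() {
    // Theoretical dynamic range of n-bit PCM: 20 * log10(2^n) ≈ 6.02 * n dB.
    for bits in [16i32, 24, 32] {
        let range_db = 20.0 * 2f64.powi(bits).log10();
        println!("{bits} bit: {range_db:.2} dB");
    }
}
```

This prints 96.33 dB for 16 bit and 144.49 dB for 24 bit, a difference of 48.16 dB (8 extra bits at roughly 6.02 dB each).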
-
Another option, of course, is to do the gain reduction in hardware for sound cards that have a hardware volume control, because that's basically all your implementation of volume normalization does: turn the volume up and down. The ReplayGain spec mentions that as an option. Doing it in hardware at least doesn't throw bits away.
-
@JasonLG1979 IIRC the audio is processed in 16 bits, since the Spotify files are 44,100 Hz, 16 bit. If you want to examine the processing logic and potentially change it to 24/32 bit, that could be worth having. My concern is that it would probably want a usage flag, as I imagine 32-bit processing will put a strain on some of the more memory-constrained devices that librespot supports. Hardware-based normalisation would be good to have; ideally we would just offload to the hardware where possible and otherwise fall back to a software implementation.
-
That's not how lossy audio like Vorbis works. The source file may have been 16 bit, but in the process of encoding it was transformed into the frequency domain, sort of like converting PCM to PWM. Lossy audio does not have a bit depth; the bit depth of the resulting PCM is decided by the decoder. I would think any decent decoder does its work internally in at least 32-bit float anyway.
It would use more memory but no more CPU, really. All you're doing is bit shifting. If the decoder won't output anything but 16 bit, you basically pad the bottom 8 or 16 bits with zeros and then do your gain adjustment just like before, except now you're not throwing away bits.
That would also imply fixed or softvol volume, as in librespot being the only thing that turns the hardware volume up or down.
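To make the padding idea concrete, a minimal sketch (not librespot's actual code; the function name and the f64 gain factor are just for illustration):

```rust
/// Widen a 16-bit sample into the top 16 bits of an i32, then apply a
/// linear gain factor (e.g. a ReplayGain attenuation). The attenuation
/// now eats into the zero-padded low bits first instead of discarding
/// bits of the original 16-bit signal. Only intended for gain <= 1.0.
fn widen_and_scale(sample: i16, gain: f64) -> i32 {
    let wide = (sample as i32) << 16;    // pad the bottom 16 bits with zeros
    (wide as f64 * gain).round() as i32  // gain applied with 32 bits of headroom
}

fn main() {
    let gain = 10f64.powf(-6.0 / 20.0); // -6 dB, roughly halving the amplitude
    for s in [i16::MAX, 12345, -1, i16::MIN] {
        println!("{s:6} -> {:12}", widen_and_scale(s, gain));
    }
}
```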
-
Only outputting 16 bit also affects the quality of librespot's software volume implementation. The same thing happens when you turn the volume down in 16-bit mode: you're throwing away bits. It would be nice to have "lossless" software volume control as well. Turning S16_LE into S32_LE should be trivial, since i32 is a native Rust data type, and it gives you more than enough room to lower the volume below the physical noise floor of a device before you have to start throwing away bits, even with gain adjustment. S24_LE and S24_3LE might be a little tricky, though.
-
Interested in this point, I dug around the source code. It seems that gain normalization is applied in 32 bit, then converted to 16-bit output: librespot/playback/src/player.rs line 1098 in 7f705ed. Same for the software volume control: librespot/playback/src/mixer/softmixer.rs line 42 in 7f705ed. For the ALSA sink this is even done in 64 bit: librespot/playback/src/mixer/alsamixer.rs line 168 in 7f705ed. So while this does not answer your feature request, at least the volume controls seem to be in HQ order!
-
No. The audio is spit out as 16-bit 44.1 kHz by the decoder and then processed. Converting a 16-bit int into a 32-bit float, doing some math on it and converting it back to a 16-bit int is in no way HQ, and you gain nothing. You're still throwing away bits. Best case you're wasting time converting back and forth; worst case you're introducing rounding errors/distortion converting an int to a float and back to an int. The solution is to do the gain normalization in 24 or 32 bit (or 64 bit, or whatever the decoder natively works in) during the decoding process and just leave it at 24 or 32 bit. That way you can still fit the whole 16 bits inside the 24/32 bits, with room for gain normalization, without throwing away bits.
-
You are absolutely right. Blame me for missing that glaring point at such a late hour. The output should remain at high bit depth after processing, not be cast back to 16 bit.
-
Feel free to create a PR if you want to/have time to. I'd be curious to see whether the difference is noticeable or whether this ends up being more a case of 'doing it properly'.
-
It might be a while. I'd need to learn Rust.
The difference would certainly be measurable, I would think, but as with all things audio, depending on the person and/or the audio gear it may or may not be perceivable. IMHO it never hurts to do things right, though.
-
Why 24-bit resolution matters for volume control and normalization is described here: http://archimago.blogspot.com/2019/02/musings-why-bother-with-24-bit-dacs.html Dialing the volume down to -25 dB in 16 bit decreases the dynamic range from 98.9 dBA (CD quality) to 73.7 dBA (3.7 dB higher than vinyl). In comparison, doing the same in 24 bit pretty much maintains CD quality at 96.6 dBA. This is within the 120 dB dynamic range of human hearing, so the loss is practically observable. I am enthusiastic about investing my time in this for the ALSA and Rodio backends. It would mean:
For 24-bit output, the following looks promising: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3d233fedc8ed595a1e88e815d23cd009 Is this something of interest?
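For what it's worth, packing into the 3-byte S24_3LE layout could look roughly like this (a sketch only, not necessarily what the playground snippet does; it assumes the 24 significant bits sit in the low bits of an i32):

```rust
/// Pack a sample whose 24 significant bits sit in the low bits of an i32
/// (sign-extended) into the 3-byte little-endian layout ALSA calls S24_3LE.
/// Purely illustrative.
fn to_s24_3le(sample: i32) -> [u8; 3] {
    let b = sample.to_le_bytes();
    [b[0], b[1], b[2]] // drop the high byte (sign extension)
}

fn main() {
    // A 16-bit full-scale sample padded up by 8 bits into 24-bit range.
    let sample_24 = (i16::MAX as i32) << 8;
    println!("{:?}", to_s24_3le(sample_24)); // [0, 255, 127]
}
```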
-
I was under the impression that the Ogg stream from Spotify was encoded in 16-bit 44.1 kHz to begin with? Or do I misunderstand? librespot/audio/src/lewton_decoder.rs lines 32 to 34 in ed20f35
-
That's true, it's encoded at 16 bit 44.1 kHz, which gives a dynamic range of 96.3 dB at 0 dBFS. Now if you go under 0 dBFS (such as when attenuating the volume or applying negative replay gain) you are adjusting the magnitude of the encoded wave, and for every 6 dB of attenuation you lose 1 bit.
Intuitively: say at one point the signal is encoded as 65535 (maximum amplitude), i.e. 1111 1111 1111 1111. Now you halve the volume. The signal should then be 32767 (half amplitude), which is encoded as 0111 1111 1111 1111. You have just lost one bit of information needed to reconstruct the same signal.
This can be circumvented by taking the 16-bit Ogg Vorbis stream, padding it with 8 or 16 zeros to 24 or 32 bit, then doing volume control and normalization on it and keeping it at that bit depth. You now have 48 or 96 dB of headroom, respectively, to do volume control in without losing dynamic range. Staying with the example, 1111 1111 1111 1111 padded to 32 bit is 1111 1111 1111 1111 0000 0000 0000 0000; halving the volume makes it 0111 1111 1111 1111 1000 0000 0000 0000. No more information lost.
(This does not really concern the title of this issue; should we open a new one?)
-
I created a fork with 32-bit internal sample storage and 64-bit arithmetic for volume control and normalisation. So far the Rodio backend seems to be working on my Mac. Initial commit here, feedback welcome! Working on ALSA next, as well as a command-line option to specify the output depth.
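Not the fork's actual code, but the general idea of 32-bit samples with 64-bit volume arithmetic could be sketched like this (the function name and the 32.32 fixed-point format are my own assumptions):

```rust
/// Apply a volume factor to a 32-bit sample using 64-bit fixed-point
/// arithmetic, so the multiplication has plenty of headroom before the
/// final shift back down. Illustrative only.
fn apply_volume(sample: i32, volume: f64) -> i32 {
    // Represent the volume (0.0..=1.0) as a 32.32 fixed-point factor.
    let vol_fixed = (volume * (1u64 << 32) as f64) as i64;
    ((sample as i64 * vol_fixed) >> 32) as i32
}

fn main() {
    println!("{}", apply_volume(i32::MAX, 0.5)); // ≈ i32::MAX / 2
}
```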
-
Made some more progress tonight, successfully getting it to compile with libvorbis, GStreamer and JACK. Notes:
As said on Gitter, I'll open a PR in a short while, once I've gotten the final PulseAudio and PortAudio backends to compile, so I can squash after a few more nights of incremental development. Meanwhile I'll report here.
-
@JasonLG1979 I think you might like roderickvd/librespot@1037108. I just changed volume normalisation so that it clips specific samples instead of reducing the overall gain.
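Roughly speaking, the idea is to apply the gain and hard-clip only the samples that would overshoot (a sketch of the concept, not the code in that commit):

```rust
/// Apply a positive normalisation gain and hard-clip only the samples
/// that would overshoot, instead of lowering the gain of the whole track.
fn normalise_with_clipping(samples: &mut [f32], gain: f32) {
    for s in samples.iter_mut() {
        *s = (*s * gain).clamp(-1.0, 1.0);
    }
}

fn main() {
    let mut samples = [0.1f32, 0.6, -0.9, 0.95];
    normalise_with_clipping(&mut samples, 1.2);
    println!("{:?}", samples); // only the two loudest samples end up clipped
}
```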
-
Would you like to create a work-in-progress PR, so it's easier to follow? You're saying that you don't use
-
Maybe it's also worth clarifying with the maintainers whether this code would be merged. It's starting to sound like something you'd expect to find in music player software. Does this still belong in a library for using Spotify, or in some other generic library for applying ReplayGain?
-
I submitted PR #660, let's continue there.
-
OK, let me state for starters that I am not a programmer but an audio systems designer. I realize that this topic is mostly about softvol control, but I'd just like to toss in my request for a true hardware path for those of us who have the gear and would like to use Spotify to its best capacity. My system is designed end to end for best performance: there are precisely zero capacitors in my signal chain and it is meticulously designed for zero ground loops. The weakest link will be elsewhere.
-
No arguments here....
When lossless is just as available, I'll be going that way.
For now I am satisfied with Spotify.
Whatever works best is what I'll be using, now and then.
…On Fri, Mar 12, 2021, 11:14 AM Jason Gray ***@***.***> wrote:
You realize that you will get quantization noise anyway because of Spotify's (or actually: Vorbis') compressed nature?
If you *really* care about audio fidelity Spotify is not the service for you. Audio fidelity is not Spotify's selling point. It's meant to be "good enough". In my mind this is about matching the behavior of the official clients and making the best of what is available.
It would be interesting to have ReplayGain also use the Alsa volume control, if available, instead of doing it in librespot software.
It would be trivial but it would make it so that you couldn't adjust the volume without messing up the gain adjustment, which would only be suitable for systems that control the volume later in the chain.
This not only has a noise floor of -318 dB (which simply cannot be audible) but is also better than 99% of DACs doing volume control in "only" 32-bit hardware.
Yep, the max dynamic range of any sound on Earth at sea level is 194 dB. Pushing all quantization noise 124 dB below that makes 64-bit digital volume control superior to any physical volume control in every measurable way.
-
Librespot already has volume normalization, which I would assume (hopefully) follows the ReplayGain spec since that's what Spotify uses. But unlike Spotify it seems to use gain reduction as its clipping-prevention method, whereas Spotify uses limiting. There also seems to be nothing in the librespot docs about how to approximate Spotify's three different volume normalisation options.
From what I can tell, to approximate the three Spotify volume options the args are:
Loud
--enable-volume-normalisation --normalisation-pregain 6
Normal (Default)
--enable-volume-normalisation --normalisation-pregain 3
Quiet
--enable-volume-normalisation --normalisation-pregain -5
The problem is that with gain reduction as the clipping-prevention method, setting a positive pregain value basically breaks volume normalization. The gain drop applied to a track that would clip can make for a huge drop in perceived volume compared to other tracks.
What I would like to see is a choice of clipping-prevention methods: one being a limiter like what Spotify uses (threshold -1 dB, attack 5 ms, release 100 ms; for bonus points you could make it a look-ahead limiter so the attack would be 0), and the other being the current gain-reduction method.
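For illustration, a bare-bones, non-look-ahead limiter with those parameters could be sketched like this (the one-pole envelope smoothing and all the names here are assumptions of mine, not Spotify's or librespot's implementation):

```rust
/// A tiny peak-limiter sketch: threshold -1 dBFS, 5 ms attack, 100 ms
/// release, no look-ahead. Illustrative only.
struct Limiter {
    threshold: f32, // linear threshold, 10^(-1/20) for -1 dBFS
    attack: f32,    // per-sample smoothing coefficient for gain reduction
    release: f32,   // per-sample smoothing coefficient for recovery
    envelope: f32,  // current gain applied to the signal
}

impl Limiter {
    fn new(sample_rate: f32) -> Self {
        Limiter {
            threshold: 10f32.powf(-1.0 / 20.0),
            attack: (-1.0 / (0.005 * sample_rate)).exp(),
            release: (-1.0 / (0.100 * sample_rate)).exp(),
            envelope: 1.0,
        }
    }

    fn process(&mut self, sample: f32) -> f32 {
        // Gain needed so this sample does not exceed the threshold.
        let target = if sample.abs() > self.threshold {
            self.threshold / sample.abs()
        } else {
            1.0
        };
        // Move quickly when reducing gain (attack), slowly when recovering (release).
        let coeff = if target < self.envelope { self.attack } else { self.release };
        self.envelope = coeff * self.envelope + (1.0 - coeff) * target;
        sample * self.envelope
    }
}

fn main() {
    let mut limiter = Limiter::new(44_100.0);
    for s in [0.5f32, 1.2, 1.2, 0.5, 0.5] {
        println!("{:.3}", limiter.process(s));
    }
}
```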
It would also be nice to have a set of args that map directly to the three Spotify presets, applying the appropriate pregain and using the limiter.
For reference:
Here is the ReplayGain spec:
http://wiki.hydrogenaud.io/index.php?title=ReplayGain_specification
This explains Spotify's definition of volume normalization and the specs of their limiter:
https://artists.spotify.com/faq/mastering-and-loudness#what-is-loudness-normalization-and-why-is-it-used
This explains the volume normalization options in the official clients:
https://artists.spotify.com/faq/mastering-and-loudness#can-users-adjust-the-levels-of-my-music