Timestep distribution = Logit Normal and Loss weight function = Min_SNR_Gamma: impact of the new noise distribution on the loss weight function? #541

Zokreb · 2024-11-04T18:09:43Z


Hello community,

Maybe my question makes no sense. If so, please say so, but I'd be happy if you could explain why.

Since we have had access to Stable Diffusion fine-tuning, LoRAs, etc., we have seen improvements in the loss weight functions used during training and in how noise is handled, first with Min_SNR_Gamma and later with Debiased Estimation.
This "math" was developed when SD1.5 and later SDXL were king, and those models used a uniform timestep distribution during training.
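For reference, here is a minimal sketch of how these two weightings are usually written for an epsilon-prediction DDPM model such as SD1.5: Min-SNR-γ uses weight = min(SNR, γ) / SNR, and debiased estimation uses weight = 1 / √SNR. The scaled-linear beta schedule, γ = 5 and the SNR cap of 1000 are assumptions for illustration; this is not any trainer's actual code.

```python
import numpy as np


def sd15_alphas_cumprod(num_steps: int = 1000) -> np.ndarray:
    # Assumed "scaled_linear" beta schedule (0.00085 -> 0.012), as commonly used for SD1.5/SDXL.
    betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, num_steps) ** 2
    return np.cumprod(1.0 - betas)


def snr(alphas_cumprod: np.ndarray) -> np.ndarray:
    # Signal-to-noise ratio of x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps.
    return alphas_cumprod / (1.0 - alphas_cumprod)


def min_snr_gamma_weight(snr_t: np.ndarray, gamma: float = 5.0) -> np.ndarray:
    # Min-SNR-gamma for epsilon-prediction: weight 1 for noisy steps (SNR <= gamma),
    # gamma / SNR for nearly clean steps, which get down-weighted.
    return np.minimum(snr_t, gamma) / snr_t


def debiased_estimation_weight(snr_t: np.ndarray, max_snr: float = 1000.0) -> np.ndarray:
    # 1 / sqrt(SNR); capping SNR (assumed value) avoids vanishing weights near t = 0.
    return 1.0 / np.sqrt(np.minimum(snr_t, max_snr))


def weighted_loss(per_sample_mse, timesteps, weight_fn, alphas_cumprod) -> float:
    # Scale each sample's unweighted MSE by the weight of its timestep, then average.
    weights = weight_fn(snr(alphas_cumprod[timesteps]))
    return float(np.mean(weights * per_sample_mse))


acp = sd15_alphas_cumprod()
ts = np.array([0, 250, 500, 750, 999])
print("min-SNR-gamma:", min_snr_gamma_weight(snr(acp[ts])))
print("debiased:     ", debiased_estimation_weight(snr(acp[ts])))
print("weighted loss:", weighted_loss(np.array([0.1, 0.2, 0.3, 0.4, 0.5]), ts, min_snr_gamma_weight, acp))
```

Under these assumptions, min-SNR keeps high-noise steps at weight 1 and only shrinks low-noise ones, while debiased estimation pushes high-noise steps above 1, shifting relatively more emphasis toward the noisiest timesteps.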

With the arrival of SD3 and Flux, we now have models that were trained with a Logit_Normal timestep distribution.
Does this have an impact on the loss weight function we should use? Are Min_SNR_Gamma or Debiased_Estimation still the "go to"* loss weight functions?
*When I say "go to", it's based on my personal experience with SD1.5 and XL, where I've always had my best results using either Min SNR or Debiased Estimation, though I'm sure there can be cases where the usual "math" (I don't know the appropriate word; is it MSE?) works better.
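One way to see why the timestep distribution matters: the training objective is an expectation over sampled timesteps, E over t ~ p of w(t)·ℓ(t), which equals the integral of p(t)·w(t)·ℓ(t) dt. The sampling density p(t) and the loss weight w(t) therefore multiply. Below is a minimal sketch of logit-normal sampling (t = sigmoid(u), u ~ N(m, s)) and of the resulting effective emphasis per timestep; continuous t in (0, 1), the defaults m = 0, s = 1 and the function names are illustrative assumptions.

```python
import numpy as np


def sample_logit_normal(n: int, mean: float = 0.0, std: float = 1.0, rng=None) -> np.ndarray:
    # t = sigmoid(u) with u ~ N(mean, std): samples concentrate around the middle of (0, 1).
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(mean, std, size=n)
    return 1.0 / (1.0 + np.exp(-u))


def logit_normal_pdf(t: np.ndarray, mean: float = 0.0, std: float = 1.0) -> np.ndarray:
    # Density of the logit-normal distribution on (0, 1).
    logit = np.log(t / (1.0 - t))
    norm = std * np.sqrt(2.0 * np.pi) * t * (1.0 - t)
    return np.exp(-0.5 * ((logit - mean) / std) ** 2) / norm


def effective_emphasis(t: np.ndarray, weight: np.ndarray, pdf: np.ndarray) -> np.ndarray:
    # Relative contribution of each timestep to the expected loss: p(t) * w(t), normalized.
    emphasis = pdf * weight
    return emphasis / emphasis.sum()


rng = np.random.default_rng(0)
t_uniform = rng.uniform(0.0, 1.0, 100_000)
t_logit = sample_logit_normal(100_000, rng=rng)
print("share in (0.25, 0.75), uniform:     ", np.mean((t_uniform > 0.25) & (t_uniform < 0.75)))
print("share in (0.25, 0.75), logit-normal:", np.mean((t_logit > 0.25) & (t_logit < 0.75)))

grid = np.linspace(0.01, 0.99, 99)
constant_w = np.ones_like(grid)
emphasis = effective_emphasis(grid, constant_w, logit_normal_pdf(grid))
print("peak emphasis at t =", grid[emphasis.argmax()])
```

With a constant weight, all the emphasis comes from the sampler; plugging in a non-constant w(t) (for example one of the weightings sketched earlier) stacks a second preference on top of the logit-normal one, which is why a weighting tuned for uniform sampling may not transfer unchanged.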

Thank you very much for your insights!

miasik · 2024-12-11T22:17:56Z


@Zokreb could you please share your experience with MSE, MAE and Log-Cosh? Should we try values other than the default ones?
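For anyone comparing these options, here is a minimal sketch of the three per-element losses and a weighted mix of them. The `*_strength` parameters are hypothetical names, and whether a given trainer combines the losses exactly like this is an assumption.

```python
import numpy as np


def log_cosh(x: np.ndarray) -> np.ndarray:
    # Numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2).
    # Behaves like x^2 / 2 for small residuals and like |x| - log(2) for large ones.
    ax = np.abs(x)
    return ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)


def combined_loss(pred, target, mse_strength=1.0, mae_strength=0.0, log_cosh_strength=0.0) -> float:
    # Weighted sum of mean squared error, mean absolute error and mean log-cosh error.
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    mae = np.mean(np.abs(diff))
    lc = np.mean(log_cosh(diff))
    return float(mse_strength * mse + mae_strength * mae + log_cosh_strength * lc)


pred, target = [1.0, 2.0], [0.0, 0.0]
print(combined_loss(pred, target))                                     # MSE only
print(combined_loss(pred, target, mse_strength=0.0, mae_strength=1.0))  # MAE only
```

MSE penalizes large residuals quadratically, MAE linearly, and log-cosh sits in between (quadratic near zero, linear in the tails), so it is less sensitive to outlier residuals than MSE.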


Zokreb Dec 13, 2024
Author

Well, to be fair, I have none that's legitimate.
My question was to ask for feedback from people who actually know what they are doing :)
The only thing I can say is that I've trained some SD1.5 and SDXL to a point that satisfied what I wanted to achieve.
But for FLUX, and worse, for SD3.5, I haven't been able to get good results. I'm quite confident the dataset is OK, so I suppose some settings might be more appropriate than others. Considering these models are supposed to have been trained with a LogitNormal timestep distribution, I suppose there could be some exploration there... But when I look at the samples during training, I get the impression that with the LogitNormal distribution the results are even worse than with the uniform distribution.

miasik Dec 14, 2024

Maybe I'm wrong, but I gathered that you're familiar with using "Min_SNR_Gamma" and "Debiased Estimation". If so, please share your experience. What are good settings for better fine-tuning results?

Zokreb Dec 15, 2024
Author

Hi Miasik,
The following is not based on empirical measurements or analysis of training metrics; it's just personal preference:
I always got better results with min_snr compared to debiased estimation.

I don't know why that is, since debiased estimation is the more recent one (so you would guess it's the best, right?).
But regardless of the settings, all else being equal, I prefer min_snr. I find that debiased estimation "generalizes" more quickly than min_snr, but it misses some tiny details, like cloth texture, skin texture, jewelry, etc. From my subjective point of view, I prefer the min_snr results, and I find this consistent across optimizers, learning rates, initial checkpoints, or any other hyperparameter for that matter.

I really insist: this is not based on careful analysis of training metrics, just on personal preference for the outputs I get after training.

tl;dr: min_snr all the way :)

miasik Dec 15, 2024

Thank you so much!
Based on that, I'm going to try running the first epoch with a big effective batch size and "debiased estimation", because I want to generalize the source model first, and then I'll run the next epochs with a smaller EBS and "min_snr".
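A minimal sketch of that two-phase plan, assuming effective batch size = per-device batch size × gradient accumulation steps × number of GPUs. The field names, the loss-weight labels and all numbers are hypothetical placeholders, not recommended settings.

```python
import math
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    epochs: int
    batch_size: int
    grad_accum_steps: int
    loss_weight_fn: str  # hypothetical labels, e.g. "DEBIASED_ESTIMATION" or "MIN_SNR_GAMMA"
    num_gpus: int = 1

    @property
    def effective_batch_size(self) -> int:
        # Number of samples contributing to each optimizer step.
        return self.batch_size * self.grad_accum_steps * self.num_gpus

    def optimizer_steps_per_epoch(self, num_images: int) -> int:
        # Assumes a partial final batch still produces one optimizer step.
        return math.ceil(num_images / self.effective_batch_size)


plan = [
    Phase("generalize", epochs=1, batch_size=4, grad_accum_steps=8, loss_weight_fn="DEBIASED_ESTIMATION"),
    Phase("details", epochs=9, batch_size=2, grad_accum_steps=1, loss_weight_fn="MIN_SNR_GAMMA"),
]

for phase in plan:
    print(phase.name, phase.loss_weight_fn,
          "EBS:", phase.effective_batch_size,
          "steps/epoch for 100 images:", phase.optimizer_steps_per_epoch(100))
```

With 100 images (an arbitrary example), the first phase makes 4 optimizer steps per epoch at an effective batch size of 32, while the second makes 50 at 2, so the large-batch phase produces far fewer, more averaged updates.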
