Training doesn't converge #50

fengziyue · 2021-01-26T23:21:47Z

Hi Fangchang:

Thank you so much for sharing this great project!

I have tested your pre-trained self-supervised model; its RMSE is around 1300, which matches your paper.
But when I try to train the model with this command:
python main.py --train-mode sparse+photo
on 2 Tesla V100 GPUs for around 15 epochs, it only converges to an RMSE of ~8k-9k and never improves further. I didn't change any hyperparameters in your code; the only difference is that the batch size is smaller than the one you mentioned (8).

Are there any parameters or options I need to change from this GitHub repo? Or do you have any suggestions on training?

Thank you so much!

Sincerely,
Ziyue Feng


Zoengkyun · 2021-04-07T01:53:43Z

@fengziyue
I have the same question.
I tried batchSize = 8 with 2 TITAN RTX GPUs, but it doesn't converge either. Same situation as yours: RMSE ~8k-9k.
Then I increased the weight of photometric_loss to 1; the first epoch converged to an RMSE of 1400+, and later epochs diverged.
I see that his trained model file (sparse+photo) uses bs = 16 (4 TITAN RTX?). I don't have that many GPUs to experiment with.
Have you tried batchSize = 16?
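
For anyone reading along, here is a minimal sketch of what changing the photometric weight does to the objective. It assumes the total loss is a depth term plus a weighted photometric term (other terms are left out for brevity). The function name `combined_loss`, the parameter `w_photo`, and the loss values are hypothetical illustrations, not taken from the repository.

```python
def combined_loss(depth_loss, photometric_loss, w_photo):
    """Weighted sum of a depth term and a photometric term.

    Hypothetical helper for illustration only; names and structure are
    assumptions, not the repository's actual code.
    """
    return depth_loss + w_photo * photometric_loss


def photometric_share(depth_loss, photometric_loss, w_photo):
    """Fraction of the total objective contributed by the photometric term."""
    total = combined_loss(depth_loss, photometric_loss, w_photo)
    return (w_photo * photometric_loss) / total


# Made-up loss values, chosen only to show the effect of the weight.
depth, photo = 2.0, 5.0
for w in (0.1, 1.0):
    share = photometric_share(depth, photo, w)
    print(f"w_photo={w}: photometric term is {share:.0%} of the total loss")
```

With a larger weight the photometric term dominates the gradient, which may be related to the behaviour I saw (good first epoch, divergence afterwards), but I have not verified that.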

fengziyue · 2021-04-07T02:01:22Z

No, I didn't manage to make it converge.


Zoengkyun · 2021-04-07T02:09:58Z

@fangchangma
Thanks for sharing!
We can't make it converge in self-supervised mode.
Do you have any suggestions on training?
This is very important to us, thank you!

Thermaloo · 2023-06-29T01:38:06Z

@fangchangma
I also encountered the same problem. Could you give us a response?
