Training doesn't converge #50
Comments
@fengziyue
No, I didn't get it to converge.
…On Tue, Apr 6, 2021 at 9:54 PM Zoengkyun ***@***.***> wrote:
@fengziyue <https://github.com/fengziyue>
I have the same question.
I tried batchSize = 8 with 2 TITAN RTX GPUs, but it doesn't converge either.
Same situation as yours: RMSE ~8k-9k.
Then I increased the weight of photometric_loss to 1. The first epoch
converged to an RMSE of 1400+, but later epochs diverged.
I see that his trained model file (sparse+photo) used bs = 16 (4 TITAN RTX?).
I don't have that many GPUs to experiment with.
Have you tried batchSize = 16?
@fangchangma
Hi Fangchang:
Thank you so much for sharing this great project!
I have tested your pre-trained self-supervised model; its RMSE is around 1300, which matches your paper.
But when I try to train the model with this command:
python main.py --train-mode sparse+photo
on 2 Tesla V100 GPUs for around 15 epochs, it only converges to an RMSE of ~8k-9k and never improves further. I didn't change any hyperparameters from your code; only the batch size is smaller than the one you mentioned (8).
Are there any parameters or options I need to change from this GitHub repo? Or do you have any suggestions on training?
Thank you so much!
Sincerely,
Ziyue Feng