Has anyone tried AMP BF16 pretraining from scratch? #271
Unanswered
kyleliang919 asked this question in Q&A

With AMP bf16 on YFCC, I am getting consistently worse zero-shot results than with FP32/FP16 pretraining on the same set of hyperparameters, even though the training loss roughly matches. Just wondering if bf16 is particularly bad for models like CLIP.
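For context, the metric being compared is zero-shot classification accuracy, along the lines of the sketch below. It assumes the open_clip API; the model name, checkpoint path, image file, and class prompts are placeholders, not the exact YFCC evaluation.

```python
# Minimal zero-shot eval sketch with open_clip; paths and prompts are
# illustrative placeholders.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "RN50", pretrained="/path/to/your/checkpoint.pt")  # hypothetical checkpoint
tokenizer = open_clip.get_tokenizer("RN50")
model.eval()

prompts = ["a photo of a dog", "a photo of a cat"]  # placeholder classes
with torch.no_grad():
    # Build the zero-shot classifier from text prompts.
    text_feats = model.encode_text(tokenizer(prompts))
    text_feats /= text_feats.norm(dim=-1, keepdim=True)

    image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image
    img_feats = model.encode_image(image)
    img_feats /= img_feats.norm(dim=-1, keepdim=True)

    # Cosine similarity -> class probabilities.
    probs = (100.0 * img_feats @ text_feats.T).softmax(dim=-1)
print(probs)
```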
Replies: 2 comments 2 replies
-
We trained many models with amp bf16 and didn't see a difference from amp fp16. What model are you training, specifically? We did fairly large scale (10k to 100k GPU hours), so that might be a difference from your setup.
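For anyone landing here, the difference between the two AMP modes boils down to the following. This is a plain-PyTorch sketch, not the open_clip training loop; the model and loss are stand-ins.

```python
import torch

# Stand-in model/optimizer; the real setup is the open_clip training loop.
model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x = torch.randn(8, 512, device="cuda")

# amp fp16: needs a GradScaler, because fp16's narrow exponent range can
# underflow or overflow gradients.
scaler = torch.cuda.amp.GradScaler()
with torch.autocast("cuda", dtype=torch.float16):
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
opt.zero_grad()

# amp bf16: same exponent range as fp32, so no GradScaler is needed; the
# trade-off is a coarser mantissa (less precision per value).
with torch.autocast("cuda", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```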
2 replies
- kyleliang919: I am trying to replicate the YFCC experiment in the paper, following the exact hyperparameters.
- OK, that's pretty small scale. Maybe bf16 causes issues only at this scale. Anyway, fp16 stability issues appear only at larger scale, so at yours you can use fp16 for sure.
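For intuition on that trade-off: bf16 keeps fp32's exponent range but rounds roughly 8x more coarsely than fp16, while fp16 saturates past ~65504, which is the usual source of large-scale instability. A quick self-contained check in plain PyTorch:

```python
import torch

x = torch.randn(100_000)

# In-range rounding error: bf16 (8 mantissa bits incl. implicit) is ~8x
# coarser than fp16 (11 mantissa bits).
for dt in (torch.bfloat16, torch.float16):
    err = (x - x.to(dt).float()).abs().mean().item()
    print(dt, "mean abs rounding error:", err)

# Dynamic range: fp16 overflows to inf past ~65504; bf16 matches fp32.
big = torch.tensor(1e6)
print("fp16:", big.to(torch.float16).item())   # inf
print("bf16:", big.to(torch.bfloat16).item())  # ~1e6, coarsely rounded
```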