I have found that when using 4 GPUs, the results are inferior to those with 2 GPUs.
After checking the code, I found that it all-reduces the losses and then divides by the world_size. That seems odd to me; I think loss.backward() already does the averaging implicitly, so I removed this:
```python
# def all_reduce_average(tensor):
#     val = all_reduce_sum(tensor)
#     return val / get_world_size()
def all_reduce_average(tensor):
    return tensor
```
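To make the assumption explicit, here is a minimal sketch of the behavior I am relying on (the `train_step` function, batch keys, and criterion are hypothetical; it assumes the training loop wraps the model in `torch.nn.parallel.DistributedDataParallel`): DDP already averages gradients across ranks during `backward()`, so an extra division of the loss by world_size should only matter for the reported loss value, not for the gradients.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train_step(model: DDP, batch, criterion, optimizer):
    """Hypothetical per-rank training step illustrating DDP's implicit averaging."""
    optimizer.zero_grad()
    outputs = model(batch["inputs"])
    loss = criterion(outputs, batch["targets"])  # per-rank loss, no manual scaling
    loss.backward()  # DDP all-reduces and averages gradients across ranks here
    optimizer.step()

    # If a cross-rank average of the loss is wanted, it is only for logging:
    with torch.no_grad():
        logged = loss.detach().clone()
        dist.all_reduce(logged, op=dist.ReduceOp.SUM)
        logged /= dist.get_world_size()
    return logged.item()
```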
I'm still running the experiments; once I have the results, I'll record them here.