Gradient checkpointing with DDP in a loop #10479
Replies: 5 comments 4 replies
-
Dear @shivammehta007, I also got this error. Has it been solved?
-
This does not appear to be a Lightning issue, but rather DistributedDataParallel from torch.distributed not supporting gradient checkpointing.
-
I have now solved this problem. The cause is that the model has parameters that were not used in producing the loss. Apply the following two settings, and you will find the unused parameter names.
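The two settings themselves aren't reproduced above; a minimal sketch of what they typically look like in Lightning (an assumption, not a quote of the original answer) is:

```python
import os

# Assumed setting 1: make torch.distributed report the names of parameters
# that did not receive gradients, instead of a generic crash.
# Must be set before the distributed processes are launched.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"

# Assumed setting 2: tell Lightning's DDP strategy to search for unused
# parameters on every backward pass.
import lightning.pytorch as pl
from lightning.pytorch.strategies import DDPStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy=DDPStrategy(find_unused_parameters=True),
)
```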
-
Hi @kuixu, we can find the parameter names, but how do we go about the solution? Do we need to remove them, or what? How will the issue be solved?
-
Hi, still having this issue. I am working with Fabric and getting the error "Expected to mark a variable ready only once". What is the workaround for this?
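A commonly suggested workaround for this error when combining gradient checkpointing with DDP is to use non-reentrant checkpointing and mark the DDP graph as static; whether it fixes this particular case isn't confirmed in the thread, so treat the following Fabric sketch as an assumption (the model is a made-up stand-in):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint
from lightning.fabric import Fabric
from lightning.fabric.strategies import DDPStrategy


class TinyModel(nn.Module):
    # Hypothetical stand-in for the real model.
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 32)

    def forward(self, x):
        # Non-reentrant checkpointing tends to interact better with DDP
        # than the default reentrant implementation.
        return checkpoint(self.layer, x, use_reentrant=False)


fabric = Fabric(
    accelerator="cuda",
    devices=2,
    # Assumption: static_graph tells DDP that the set of used parameters does
    # not change between iterations, avoiding the double "ready" marking.
    strategy=DDPStrategy(static_graph=True),
)
fabric.launch()

model = TinyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

x = torch.randn(8, 32, device=fabric.device)
loss = model(x).sum()
fabric.backward(loss)
optimizer.step()
```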
-
Since my method is an autoregressive algorithm, it builds a huge gradient tape, so I am trying to do something like this.
It works fine on a single GPU, but with DDP it throws this error.
I am running it with:
Any workaround for this?
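The original code, error message, and launch command aren't reproduced above; purely as an illustrative sketch (names, shapes, and the launch command are assumptions, not the poster's actual setup), a checkpointed autoregressive loop might look like this:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ARStep(nn.Module):
    # Hypothetical single autoregressive step.
    def __init__(self, dim=64):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)

    def forward(self, x, h):
        return self.cell(x, h)


step = ARStep()
x = torch.randn(4, 64)
h = torch.zeros(4, 64)

outputs = []
for _ in range(100):
    # Checkpoint each step so only the step inputs are stored and the
    # activations are recomputed during backward, keeping the tape small.
    h = checkpoint(step, x, h, use_reentrant=False)
    outputs.append(h)

loss = torch.stack(outputs).sum()
loss.backward()

# Example multi-GPU launch (assumed, not the command from the thread):
#   torchrun --nproc_per_node=2 train.py
```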