Free some resources after each step to avoid OOM #75
base: main
Conversation
hmm, i think the GC should handle this?
For some reason it seems like it doesn't until the variable goes out of scope - see https://pytorch.org/docs/stable/notes/faq.html ("If you assign a Tensor or Variable to a local, Python will not deallocate until the local goes out of scope."). I don't know whether casting and deleting the loss is necessary, as it made no difference in my run (I was just following the PyTorch docs above). But deleting the sample data after saving definitely caused my test runs to succeed where previously they were running out of GPU memory.
(I should clarify - my understanding is that the resources will be freed by the GC when the variable is reassigned on the next step. We're not leaking significant memory on an ongoing basis; we're just consuming a constant amount of memory we don't need to, by keeping old local variables with large storage on the GPU around after they're done being used but before they're reassigned on the next loop. For me that was enough to push consumption over the edge and make the program quit because it had no more GPU memory.)
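The effect described above can be reproduced without PyTorch at all. The sketch below uses hypothetical stand-ins (`BigBuffer` and `run` are illustrative names, not part of this repo): while the next step's object is being constructed, the previous one is still bound to the local, so two large buffers coexist unless the local is `del`ed first. That transient doubling is exactly what pushes peak GPU memory over the edge.

```python
class BigBuffer:
    """Hypothetical stand-in for a tensor with large GPU storage."""
    alive = 0   # instances alive right now
    peak = 0    # most instances ever alive at once

    def __init__(self):
        BigBuffer.alive += 1
        BigBuffer.peak = max(BigBuffer.peak, BigBuffer.alive)

    def __del__(self):
        BigBuffer.alive -= 1


def run(steps, free_each_step):
    """Simulate a training loop; return the peak number of live buffers."""
    BigBuffer.alive = BigBuffer.peak = 0
    buf = None
    for _ in range(steps):
        # While the new buffer is constructed, the old one is still
        # referenced by `buf`, so both exist at once -- unless we
        # explicitly drop the reference first.
        if free_each_step:
            del buf
        buf = BigBuffer()
    return BigBuffer.peak
```

Without the `del`, the peak is two buffers per step; with it, only one, which matches the behavior reported in the comment above.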
May I ask what GC stands for?
I've seen this issue in the past, and I think what @andrewmoise proposes makes sense. The garbage collector (GC) doesn't handle this situation by itself. For example, in the following for loop, the computational graph gets constructed on each iteration and is kept alive by `loss` until the next assignment:

```python
for batch in loader:
    loss = compute_loss(batch)
```
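The related FAQ point about casting the loss before accumulating it can also be illustrated without torch. In this sketch, `GraphNode` is a hypothetical stand-in for a tensor carrying an autograd graph (none of these names come from this codebase): accumulating the raw loss object chains every step's graph together and keeps it all alive, while accumulating a plain float (the analogue of `loss.item()`) lets each step's graph be collected immediately.

```python
import weakref

class GraphNode:
    """Hypothetical stand-in for a tensor carrying an autograd graph."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = tuple(parents)   # keeps parent graphs alive

def train(steps, accumulate_as_float):
    """Return how many per-step 'graphs' survive after the loop."""
    graph_refs = []                     # weak refs so we can count survivors
    total = 0.0 if accumulate_as_float else GraphNode(0.0)
    for step in range(steps):
        loss = GraphNode(float(step))   # a fresh "graph" each step
        graph_refs.append(weakref.ref(loss))
        if accumulate_as_float:
            total = total + loss.value  # like loss.item(): no graph retained
        else:
            # like `total = total + loss` on tensors: retains every graph
            total = GraphNode(total.value + loss.value, parents=(total, loss))
        del loss                        # drop the local either way
    return sum(1 for r in graph_refs if r() is not None)
```

With float accumulation no graphs survive; with object accumulation every step's graph is still reachable through the running total.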
@lucidrains Is this resolved now?
Is this of interest? I'm cleaning out my GitHub, and I'd like to either get it in, or close it as WONTFIX and delete my branch. I'm happy to recheck the state of the code now, since it looks like things have changed and the original PR won't apply anymore. If it's not of interest, then no worries, just let me know and I can close the issue/PR.
Fixes #74