None of the inputs have requires_grad=True. Gradients will be None #573

Open · maxjaritz opened this issue Jul 20, 2023 · 2 comments

maxjaritz commented Jul 20, 2023

I am fine-tuning a model on a custom dataset. At the start of training, I get the warning "None of the inputs have requires_grad=True. Gradients will be None". I made this warning disappear by passing use_reentrant=False to the three checkpoint() calls in transformer.py.
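For illustration, here is a minimal sketch of the change, using a stand-in nn.Linear rather than open_clip's actual residual blocks (the real edit is just adding use_reentrant=False to the existing checkpoint() calls):

import torch
from torch.utils.checkpoint import checkpoint

# Stand-in for one residual block that gets checkpointed (illustrative only).
block = torch.nn.Linear(16, 16)

# Input with requires_grad=False, as when the upstream layers are locked.
x = torch.randn(2, 16)

# Non-reentrant checkpointing: gradients still reach block's parameters
# even though the input tensor itself does not require grad.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
assert block.weight.grad is not None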

Interestingly, simply setting use_reentrant=False also improved performance on the train/val loss and cross-modal retrieval!
[screenshot: train/val loss and cross-modal retrieval curves comparing use_reentrant=True vs. use_reentrant=False]

My training command is:

torchrun --nproc_per_node 8 -m training.main \
--train-data 'mydata/{00000..04089}.tar' \
--val-data 'mydata/{04090..04095}.tar' \
--train-num-samples 16115354 \
--val-num-samples 70965 \
--dataset-type webdataset \
--epochs 10 \
--batch-size 1650 \
--precision amp \
--local-loss \
--gather-with-grad \
--grad-checkpointing \
--ddp-static-graph \
--workers 8 \
--seed 0 \
--lr 0.3e-3 \
--warmup 1220 \
--report-to tensorboard \
--resume "latest" \
--zeroshot-frequency 1 \
--model ViT-B-32 \
--name ... \
--pretrained laion2B-s34B-b79K \
--lock-image \
--lock-image-unlocked-groups 9

The problem does not occur when the following arguments are removed from the training command:

--lock-image \
--lock-image-unlocked-groups 9
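For context, those two flags freeze most of the image tower, so the activations feeding the later (checkpointed) blocks no longer require grad. A rough sketch of the effect, using an illustrative stand-in model rather than open_clip's actual locking code:

import torch

# Illustrative stand-in for an image tower: 12 "groups" of layers.
visual = torch.nn.Sequential(*[torch.nn.Linear(8, 8) for _ in range(12)])

# --lock-image: freeze the whole image tower...
for p in visual.parameters():
    p.requires_grad = False

# ...--lock-image-unlocked-groups 9: then unfreeze the last 9 groups.
for group in list(visual)[-9:]:
    for p in group.parameters():
        p.requires_grad = True

# The frozen early groups emit activations with requires_grad=False, so the
# tensor inputs to the later, checkpointed groups do not require grad.
x = torch.randn(2, 8)
print(visual[:3](x).requires_grad)  # False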

It might be related to the following warning from the PyTorch docs (https://pytorch.org/docs/stable/checkpoint.html):

If use_reentrant=True is specified, at least one of the inputs needs to have requires_grad=True if grads are needed for model inputs, otherwise the checkpointed part of the model won’t have gradients. At least one of the outputs needs to have requires_grad=True as well. Note that this does not apply if use_reentrant=False is specified.
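That condition is easy to reproduce in isolation (assuming PyTorch >= 1.11, where the use_reentrant argument exists):

import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Linear(4, 4)  # trainable parameters inside the checkpointed segment
x = torch.randn(2, 4)          # no tensor input requires grad (frozen upstream layers)

y = checkpoint(block, x, use_reentrant=True)  # emits the warning from this issue
print(y.requires_grad)  # False: the segment is cut out of the autograd graph,
                        # so block's parameters would receive None gradients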

Do you know what the underlying issue is?

rwightman (Collaborator) commented

Hmm, I would have thought this works as long as you don't lock the full image or text tower... but perhaps not; it may not be a good idea to checkpoint parts of the model that have gradients disabled.

We should probably set use_reentrant=False, but it's never been clear to me what the downside of that is. The PyTorch docs mention many pluses of =False, so why was =True the default? hohumm

maxjaritz (Author) commented

In the PyTorch docs, I also saw:

Note that future versions of PyTorch will default to use_reentrant=False. Default: True

maxjaritz reopened this Sep 16, 2023