Log epoch real time in LNNP #231
base: main
@AntonioMirarchi please review
It works fine. This is the time column from the metrics.csv:
The NaN in the last row is not clear to me.
I think the NaN appears because the trainer reached the maximum number of epochs: it does not start a new epoch, but a row is still written to the metrics file.
I am only logging the time in
Antonio, can you give me a YAML file that reproduces the NaN you see? I cannot trigger it.
I think the NaN Antonio reports comes from the test pass at the end of training. It is treated as a new "training" run, and for some reason its first time entry is NaN.
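To illustrate how a NaN can end up in the time column, here is a minimal sketch using only the standard library. It assumes the metrics file holds one column per metric name, and that a row written by a pass that did not log `time` leaves that cell empty, which readers such as pandas then report as NaN. The rows and the `test_loss` column are hypothetical, not taken from the actual metrics.csv.

```python
import csv
import io

# Hypothetical rows: the last one comes from a pass that did not log "time".
rows = [
    {"epoch": 0, "time": 12.3},
    {"epoch": 1, "time": 24.9},
    {"epoch": 1, "test_loss": 0.41},
]

fieldnames = ["epoch", "time", "test_loss"]
buffer = io.StringIO()
# restval is what gets written for keys missing from a row.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)


def to_float(cell):
    # Treat an empty cell as NaN, mirroring how CSV readers usually load it.
    return float(cell) if cell else float("nan")


buffer.seek(0)
times = [to_float(record["time"]) for record in csv.DictReader(buffer)]
print(times)  # [12.3, 24.9, nan]
```

If this is what happens here, the NaN would be a property of the row layout rather than of the timing code itself.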
I tried changing the logged epoch to an integer, but this produces a warning: You called `self.log('epoch', ...)` in your `on_validation_epoch_end` but the value needs to be floating to be reduced. Converting it to torch.float32. You can silence this warning by converting the value to floating point yourself. If you don't intend to reduce the value (for instance when logging the global step or epoch) then you can use `self.logger.log_metrics({'epoch': ...})` instead. There is some (really unconvincing, imo) discussion of why one cannot log an integer in Lightning-AI/pytorch-lightning#18739.
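The alternative suggested by the warning can be sketched as follows. `RecordingLogger` is a hypothetical stand-in that only mimics the `log_metrics(metrics, step)` call shape, so the snippet runs without Lightning installed; `log_epoch` is likewise an illustrative helper, not code from this PR.

```python
class RecordingLogger:
    """Hypothetical stand-in with a log_metrics(metrics, step) method."""

    def __init__(self):
        self.records = []

    def log_metrics(self, metrics, step=None):
        # Store the metrics as given; nothing is reduced or converted.
        self.records.append((step, dict(metrics)))


def log_epoch(logger, epoch, step):
    # Calling the logger directly bypasses self.log, so the epoch
    # keeps its integer type.
    logger.log_metrics({"epoch": int(epoch)}, step=step)


logger = RecordingLogger()
log_epoch(logger, epoch=3, step=120)
print(logger.records)  # [(120, {'epoch': 3})]
```

Inside a module the equivalent call would be the one quoted in the warning, `self.logger.log_metrics({'epoch': ...})`. How a given logger backend then writes the integer to disk is not checked here.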
This reverts commit 6cb018c.
This PR changes the LNNP module so that the real time since the start is logged each epoch.
This makes it possible to track training time with the CSVLogger.
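A minimal, framework-free sketch of the idea follows. It is not the LNNP implementation: `EpochTimer` and its methods are hypothetical names, `on_fit_start` only borrows the name of a Lightning hook, and the choice of a monotonic clock is an assumption.

```python
import time


class EpochTimer:
    """Hypothetical helper: seconds elapsed since training started."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the timer can be tested
        self._start = None

    def on_fit_start(self):
        # Record the reference point once, when training begins.
        self._start = self._clock()

    def elapsed(self):
        # Seconds since on_fit_start(); NaN if training has not started.
        if self._start is None:
            return float("nan")
        return self._clock() - self._start


timer = EpochTimer()
timer.on_fit_start()
for epoch in range(3):
    # ... train and validate one epoch here ...
    metrics = {"epoch": float(epoch), "time": timer.elapsed()}
    print(metrics)
```

In a real module the `metrics` dictionary would go to the logger at the end of each epoch, so that the CSV file gains a `time` column next to the losses.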