Save model checkpoints to Wandb #13

Merged
3 commits merged into danbraunai:main from wandb-checkpoint-upload
Nov 2, 2024

Conversation

ryanhoangt
Contributor

Description

This PR:

  • improves local model checkpoint logging
  • logs checkpoints to wandb

Related Issue

Fixes #7

How Has This Been Tested?

[Screenshot, 2024-10-17 at 19:17:46]

Collaborator

@danbraunai-apollo danbraunai-apollo left a comment

Looks good, though there is one comment that should be addressed before merging.

Comment on lines 40 to 41
with open(save_dir / config_filename, "w") as f:
    yaml.dump(config_dict, f)
Collaborator

This config file might already exist. I'd either check whether the file exists and save it only if it doesn't, or save the config earlier in the script and not save it here. The downside of the latter is that if you do lots of trial/debugging runs, you end up with lots of output directories that contain a config file and nothing else.

Also note that the docstring assumes you're doing the former.
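
For reference, the first option (save only if the file is missing) could look roughly like the sketch below. The function name `save_config_if_missing` and its `dump` parameter are hypothetical and not taken from this PR. The default serializer is `json.dump` only to keep the sketch free of third-party imports; passing `yaml.dump` would match the snippet under review, since it also takes the data first and the file object second.

```python
import json
from pathlib import Path
from typing import Any, Callable


def save_config_if_missing(
    save_dir: Path,
    config_filename: str,
    config_dict: dict[str, Any],
    dump: Callable[[Any, Any], Any] = json.dump,  # assumption: pass yaml.dump to match the PR
) -> bool:
    """Write the config only if it is not already present in save_dir.

    Returns True if the file was written, False if it already existed.
    """
    # Create the output directory on demand so the first save succeeds.
    save_dir.mkdir(parents=True, exist_ok=True)
    config_path = save_dir / config_filename
    if config_path.exists():
        # Leave the existing config untouched.
        return False
    with open(config_path, "w") as f:
        dump(config_dict, f)
    return True
```

Because the config is written at the first checkpoint save rather than at script start, trial runs that never reach a checkpoint would not leave behind directories containing only a config file.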

Contributor Author

@ryanhoangt ryanhoangt Oct 18, 2024

That makes sense, addressed! Though I'm not sure why, when I tested it, the content still seemed to be written correctly (i.e. the config was not appended multiple times).
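
A likely explanation, assuming the file was opened with mode "w" as in the snippet under review: Python's "w" mode truncates an existing file before writing, so repeated saves overwrite the config rather than append to it. The sketch below illustrates this with a hypothetical helper, `write_twice`, that is not part of the PR.

```python
import tempfile
from pathlib import Path


def write_twice(path: Path, text: str) -> str:
    """Write the same text to path twice in "w" mode and return the final contents."""
    for _ in range(2):
        # "w" truncates any existing file first, so the second write replaces the first.
        with open(path, "w") as f:
            f.write(text)
    return path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    contents = write_twice(Path(tmp) / "config.yaml", "lr: 0.1\n")
    # The text appears once, not twice.
    print(contents == "lr: 0.1\n")
```

So re-saving is harmless for the file contents; the existence check mainly avoids redundant writes and keeps the behaviour consistent with the docstring.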

@lennart-finke
Collaborator

Just tested this, looking good and merging. Thanks @ryanhoangt!

@lennart-finke lennart-finke merged commit 53db0b3 into danbraunai:main Nov 2, 2024
1 check passed
@ryanhoangt ryanhoangt deleted the wandb-checkpoint-upload branch November 2, 2024 14:57
Development

Successfully merging this pull request may close these issues.

Log runs with wandb
3 participants