-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add FlareCleanupTask #947
base: main
Are you sure you want to change the base?
add FlareCleanupTask #947
Conversation
This PR includes changes to |
❌ 1 Tests Failed:
View the top 1 failed tests by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
❌ 1 Tests Failed:
View the top 1 failed tests by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
✅ All tests successful. No failed tests were found. 📣 Thoughts on this report? Let Codecov know! | Powered by Codecov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of suggestions:
- We should definitely limit / batch the query. This could yield millions of entries on the first run for all we know 🤷🏻♂️
- Using the
mock_storage
fixture, you could assert the actual side effect of files being removed from the storage backend, instead of manually mocking very specificArchiveService
methods
Otherwise this looks good, and the following are just "personal preference" suggestions:
- IMO, we should move core business logic out of the celery framework code. In this case this would mean having some free functions instead of all the logic being in
FlareCleanupTask.run_cron_task
. This means you don’t even need thedb_session
argument, as the code does not use sqlalchemy models, and it should make testing less awkward. - I’m personally not a fan of asserting log calls, but rather just the side effects we are aiming for: files being removed from storage, fields being cleared from the DB.
tasks/flare_cleanup.py
Outdated
# single query, loads all pulls and repos in qset into memory, deletes file in Archive 1 by 1 | ||
for pull in non_open_pulls_with_flare_in_archive: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, the "load all pulls … into memory" part can be potentially dangerous as it might overwhelm the task.
This should use a limit query and work in batches.
tasks/flare_cleanup.py
Outdated
archive_service = ArchiveService(repository=pull.repository) | ||
archive_service.delete_file(pull._flare_storage_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it might be worth grouping these by repository(.id)
on the python side, to deduplicate creation of per-repo ArchiveService
. Plus, I believe there is a delete_files
method to do batch deletions as well. Though I believe that depending on the storage backend, that will still delete each file individually.
tasks/flare_cleanup.py
Outdated
_flare_storage_path__isnull=False | ||
).select_related("repository") | ||
log.info( | ||
f"FlareCleanupTask will clear {non_open_pulls_with_flare_in_archive.count()} Archive flares" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this .count()
is a query on its own, which seems a bit unreasonable just for a log line?
mock_archive_service = mocker.patch( | ||
"shared.django_apps.utils.model_utils.ArchiveService" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of mocking the ArchiveService
, you could use the mock_storage
fixture (I believe that should be hooked up to the shared archive service).
That way, you can actually store files (in memory), and assert that those files are truely being deleted from the storage.
8b57acf
to
5ac629e
Compare
This PR includes changes to |
5ac629e
to
044d770
Compare
❌ 1 Tests Failed:
View the top 1 failed tests by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
❌ 1 Tests Failed:
View the top 1 failed tests by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
New strategy for managing
flare
storagedepends on codecov/shared#450
When a
pull
is no longerOPEN
, leave theflare
in order to reduce locks and wait time.During low traffic hours (4am UTC?), run this task to clear it out of our database and Archive storage.
So we won't pay for maintaining the
flare
field but also won't slow down otherpull
actions by checking or clearing the field 🎉