Gcs database #12817
base: main
Conversation
bin/run-db-download.py
Outdated
```diff
@@ -21,9 +21,12 @@
 BUCKET_NAME = os.getenv("AWS_DB_S3_BUCKET", "bedrock-db-dev")
 REGION_NAME = os.getenv("AWS_DB_REGION", "us-west-2")
 S3_BASE_URL = f"https://s3-{REGION_NAME}.amazonaws.com/{BUCKET_NAME}"
+GCS_BASE_URL = f"https://storage.googleapis.com/{BUCKET_NAME}"
+DOWNLOAD_FROM_GCS = os.getenv("DOWNLOAD_FROM_GCS", 'False').lower() in ('true', '1', 't')
```
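The truthiness check on that last line can be pulled into a tiny helper and exercised directly. This is just an illustrative sketch of the pattern, not code from the PR; the helper name and the demo env var are hypothetical:

```python
import os

# Hypothetical helper mirroring the pattern above: treat "true", "1",
# or "t" (case-insensitively) as truthy, everything else as falsy.
def env_bool(name, default="False"):
    return os.getenv(name, default).lower() in ("true", "1", "t")

os.environ["GCS_FLAG_DEMO"] = "True"
print(env_bool("GCS_FLAG_DEMO"))  # True

os.environ["GCS_FLAG_DEMO"] = "no"
print(env_bool("GCS_FLAG_DEMO"))  # False
```

Note the asymmetry this pattern accepts: `"yes"` and `"on"` come out falsy, which is part of why a shared config parser (discussed below in the thread) is nicer than hand-rolling this per script.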
Our tests don't check `black` formatting on these files, but it would be nice if they passed (wrt the single vs double quoting).
Sounds good, this line might be a little over the top too. Open to feedback on how you all like to handle boolean values as env variables.
In other parts of the code base we use everett. If we used it here this could be written as `DOWNLOAD_FROM_GCS = config("DOWNLOAD_FROM_GCS", parser=bool, default="false")`. But not sure why it isn't already used here.
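For readers unfamiliar with everett, here is a rough stdlib-only approximation of what that `config(...)` call does. Both functions below are stand-ins written for this sketch, not everett's actual implementation; real everett accepts `parser=bool` and substitutes its own boolean parser behind the scenes:

```python
import os

def parse_bool(val):
    # Rough stand-in for everett's bool parser: accept common
    # truthy/falsy spellings regardless of case, raise on junk.
    v = str(val).strip().lower()
    if v in ("t", "true", "yes", "y", "1", "on"):
        return True
    if v in ("f", "false", "no", "n", "0", "off"):
        return False
    raise ValueError(f"{val!r} is not a valid bool")

def config(key, parser=str, default=None):
    # Stdlib approximation of everett's config(): look up the env var,
    # fall back to the (string) default, then run the parser on it.
    raw = os.environ.get(key, default)
    return parser(raw)

# Mirrors the line suggested in the comment above.
DOWNLOAD_FROM_GCS = config("DOWNLOAD_FROM_GCS", parser=parse_bool, default="false")
```

The win over the hand-rolled `os.getenv(...).lower() in (...)` version is that every boolean env var in the project parses the same set of spellings, and a typo like `DOWNLOAD_FROM_GCS=ture` fails loudly instead of silently meaning `False`.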
Oh cool, thanks @robhudson! Pushed up a change that pulls that in.
Welp, maybe that is why it isn't used here already; it looks to fail the tests:

```
Traceback (most recent call last):
  File "/app/./bin/run-db-download.py", line 10, in <module>
    from bedrock.base.config_manager import config
ModuleNotFoundError: No module named 'bedrock'
```
Looks like
Thanks @robhudson!
bin/run-db-download.py
Outdated
```diff
+# must import after adding bedrock to path
+from bedrock.base.config_manager import config  # noqa
+
 BUCKET_NAME = os.getenv("AWS_DB_S3_BUCKET", "bedrock-db-dev")
 REGION_NAME = os.getenv("AWS_DB_REGION", "us-west-2")
```
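The `ModuleNotFoundError` above and the "must import after adding bedrock to path" comment are two sides of the same constraint: the in-repo package is only importable once its parent directory is on `sys.path`. A self-contained demo of that ordering, using a throwaway package in a temp directory (the `bedrock_demo` name is invented for this sketch):

```python
import os
import sys
import tempfile

# Build a throwaway importable package in a temp directory.
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "bedrock_demo")  # hypothetical stand-in package
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("config = 'loaded'\n")

try:
    import bedrock_demo  # fails: tmp is not on sys.path yet
except ModuleNotFoundError:
    print("import before path tweak: ModuleNotFoundError")

sys.path.insert(0, tmp)  # must add the parent dir before importing
import bedrock_demo  # noqa: E402

print(bedrock_demo.config)  # loaded
```

That is why the `from bedrock.base.config_manager import config` line has to sit below the path setup in `bin/run-db-download.py` rather than with the other top-of-file imports, and why it carries a `# noqa` to silence the import-order lint.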
Since we have `config` now, we could optionally convert these (or anywhere else in this file where we use `os.getenv`). An example of a string value would just be:

```python
BUCKET_NAME = config("AWS_DB_S3_BUCKET", default="bedrock-db-dev")
```

Sorry to tack on so much extra.
No problem at all!
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #12817      +/-   ##
==========================================
- Coverage   76.99%   76.88%   -0.12%
==========================================
  Files         144      144
  Lines        7704     7669      -35
==========================================
- Hits         5932     5896      -36
- Misses       1772     1773      +1
```

☔ View full report in Codecov by Sentry.
```diff
@@ -63,6 +85,18 @@ def upload_db_data(db_data):
     except Boto3Error:
         return f"ERROR: Failed to upload the new database info file: {db_data}"
```
Do we want anything around the `if not s3` line above, so that in the future if we remove the AWS credentials this won't short circuit and never make it this far? Or would changes around S3 come later?
My idea was to do kind of a multi-step deploy:
- Get the code out
- Turn on GCS so it uploads to both GCS and S3, let that sit for a while
- Turn on download from GCS, let that sit
- Come back and remove the s3 references
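The dual-write phase of that rollout can be sketched with two feature flags. Everything here is a hypothetical illustration of the staged plan, not the PR's actual code: the `UPLOAD_TO_GCS` flag name and the `s3_upload`/`gcs_upload` callables are invented, while the `*_BASE_URL` values are copied from the diff earlier in the thread:

```python
import os

BUCKET_NAME = os.getenv("AWS_DB_S3_BUCKET", "bedrock-db-dev")
REGION_NAME = os.getenv("AWS_DB_REGION", "us-west-2")
S3_BASE_URL = f"https://s3-{REGION_NAME}.amazonaws.com/{BUCKET_NAME}"
GCS_BASE_URL = f"https://storage.googleapis.com/{BUCKET_NAME}"

# Hypothetical flags mirroring the staged rollout described above.
UPLOAD_TO_GCS = os.getenv("UPLOAD_TO_GCS", "false").lower() in ("true", "1", "t")
DOWNLOAD_FROM_GCS = os.getenv("DOWNLOAD_FROM_GCS", "false").lower() in ("true", "1", "t")

def upload_db_data(db_data, s3_upload, gcs_upload):
    # Step 2: keep writing to S3, and additionally write to GCS once the
    # flag is flipped. s3_upload/gcs_upload are hypothetical callables
    # wrapping the respective client libraries.
    s3_upload(db_data)
    if UPLOAD_TO_GCS:
        gcs_upload(db_data)

def choose_download_url(filename):
    # Step 3: start reading from GCS; S3 stays the default until then.
    base = GCS_BASE_URL if DOWNLOAD_FROM_GCS else S3_BASE_URL
    return f"{base}/{filename}"
```

Because each flag flips independently, step 4 (deleting the S3 paths) only happens after both flags have been on long enough to trust the GCS copies.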
Yep, sounds good to me.
@bkochendorfer @pmac Can we get this rolled out to prod? Good to do ahead of the postgres/cms tango
Thanks for all the extra changes. 👍
@bkochendorfer Do you need one of us to merge this?
Yes, I do not have merge capabilities here.
Chatted with Brett. We're going to sit on this until there's a bit more time to focus on it soon.
One-line summary
Allow uploading and downloading from GCP Cloud Storage (GCS) instead of AWS S3.
Significant changes and points to review
I felt like just throwing an error during the upload if there was a failure is fine, but let me know if you want me to try to figure out what to catch.
Issue / Bugzilla link
#12637
Testing
I deployed the change manually in our `dev` GCP environment and connected to the clock pod to manually run an `upload` and a `download`.