-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use PostgresAdmin directly for DB backups #160
Conversation
bc96d54
to
650c62b
Compare
raise message | ||
end | ||
|
||
MiqRegion.replication_type = :none |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fryguy and/or @jrafanie So most of the methods in "green" here are pulled from lib/evm_database_ops.rb
in ManageIQ/manageiq
:
and I was able to convert most of what was there previously (model or raw ActiveRecord
) to straight Pg
calls, but this one seems specific. After taking a look myself, I wasn't able to determine why this line specifically was added, as it seems that it will always be effectively a "no-op" when the method involved is called:
But something tells me I am missing something. Can you enlighten me on what I am missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replication_type
describes whether or not this database is the global side of replication or the remote side of replication. From a DB backup/restore point of view though I can't see why that would be important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I think what's happening here is that it's temporarily disabling replication. Setting replication_type to none would essentially unregister the replication provider.
EDIT: But I agree - it's almost like the code is missing the "uninstall" bit? Maybe @carbonin remembers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a bug where restore couldn't drop the database because the replication backend was holding a connection open I think. There could have very well been more to it than that, but it should all bit in git history/bugzilla somewhere.
Also 👋
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carbonin Okay, I think I see where my confusion was.
We pass into .replication_type=
the desired_type
, but the if current_type
statements are getting the values from .replication_type
, which will do some destroy/configure actions if needed, and I assume the methods called in .replication_type
query against Postgres/PgLogical/stuffIVaguelyKnowAbout
Looks like I might need to look into what those do further, and see if we need to somehow replicate that logic over in the console.
Thanks for the response (and 👋 )
(oh, and yes, I did find that commit previously, but was misinterpretting the contents of the method. I now see the error in my ways)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh hai @carbonin !! 👋
7964013
to
0cbfa5b
Compare
result[0]["count"] > 0 | ||
end | ||
|
||
private_class_method def self.disable_replication(dbname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will handle all of the necessary concerns that were brought up previously:
I didn't delete the MiqRegion
, as is currently done in PglogicalSubscription
, but since this is a restore, that data will be blown away anyway, so we only need the parts that will be used prevent an error due to the HA code preventing a restore (which I think this is all of it... but I will double check).
I condensed a lot of what was in PglogicalSubscription
to be a single method, so much of what is used here is spread out in the class between .find
, .find_all
, .delete
, .safe_delete
, and others. I also removed some of the column id conversion as I didn't think it was necessary, and just referenced it as sub_id
for the purposes of the .drop_subscription
portion of the code below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I did test this on a local appliance using:
NickLaMuro/manageiq-db_backup-tests#1
And it worked just fine, so at least from a non-HA appliance setup, these new changes don't seem to break anything. Don't really know how to set up a HA appliance to test the scenarios that were present in https://bugzilla.redhat.com/show_bug.cgi?id=1444993 so that is still a unknown, but the base case at least still works.
7d586d7
to
03dc939
Compare
Allows this to be shared with other classes.
38f7280
to
c02d242
Compare
c02d242
to
88f647d
Compare
- smb | ||
- s3 | ||
- ftp | ||
- swift |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍
with_pg_connection do |conn| | ||
pglogical = PG::LogicalReplication::Client.new(conn) | ||
|
||
if pglogical.subscriber? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it makes sense, I'm ok with pushing down any of this logic into the pg/logical_replication gem. This could also be done in a follow-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because of the delay I have had trying to get this PR, I am going to punt on doing this now and handle in a follow up.
So we don't loose track of it, I will open up an issue that can be triaged later, and we can determine if it makes sense then to push this into the gem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
88f647d
to
b4e26aa
Compare
So I have determined that there is a failure with the specs here that is based on the order that which they are run in... but have failed to track down what it is... The |
49a6508
to
c6a991c
Compare
Instead of calling a rake task in core, call `PostgresAdmin` which is now part of this repository. This takes many of the checks that exist in `lib/evm_database_ops.rb` (core) and puts them in this class, tweaking them slightly so they don't require things like `ActiveRecord` to function, or classes that only exist in core `(VmdbDatabaseConnection` for example) In addition, this also doesn't include any of the `.with_file_storage` calls as that will be phased out. `PostgresAdmin` just needs a local file to write to, and since `MiqFileStorage` used a FIFO to write data to external sources (s3, swift, FTP), there is no change required there, and this can just accept a local file and pass that to `PostgresAdmin`.
Finally... it passed... Now I just have to get the PR rebased... but that is a problem for "Monday Nick" to deal with... |
The prepare logic should live in the PostgresAdmin class as it is something that could be used for not only the console, but also CLI. Also, fixed a few things that wouldn't have worked (integration testing of `restore` hasn't been done with this code yet): - Removed the MiqRegion.replication_type= call (not used, I don't think) - Remove connection_pool disconnect (not relevant, since we aren't using ActiveRecord with this code base) - Fixed .connection_count to handle things differently per backup type and dbname (this existed previously with VmdbDatabaseConnection, but was not properly ported over) Everything else should be the same logically speaking, just updated for working within the PostgresAdmin class itself.
a9fc272
to
a6f9740
Compare
Use PG::LogicalReplication::Client to remove subscriptions or publications on a db prior to doing a database reload. Doesn't drop the `MiqRegion`, as was previously done as part of PglogicalSubscription in core: https://github.com/ManageIQ/manageiq/blob/8ca93da126aada556685db5d05aaf5e9a0666900/app/models/pglogical_subscription.rb#L70 But since this is a database reload, and the Region will be removed from the DB anyway, this should not matter, and the necessary code required to perform that delete will be pointless to include in this gem.
If an error occurs while connecting (for example, if you have the wrong user), then this will fail with just that error, instead of an error that exists in the ensure as well.
During a restore, when calling `PostgresAdmin.with_pg_connection`, use the one defined in `PostgresRunner` since it includes connection information for the secondary DB, instead of the one running default one running on the user's machine.
Adds a helper method to PostgresRunner (.hard_reset), which is used to reset the database to a fresh state. Required for local tests only since the DB backup was taken off of an appliance, so when that is restored first, it changes the default DB roles to that `.initial_setup` create and is assumed by the class.
For the `.restore` specs, ensure the env is setup properly so the specs can run properly. With the new methods that check the existing database connections in `PostgresAdmin` directly, this allows said connection to work properly. For CI, we don't want to do this, since it doesn't actually setup the ENV (bad method name, my bad...), but runs the restore in a task all by itself.
Doesn't seem to be working currently (at least locally), and supporting this isn't terribly necessary now that we won't be streaming files in Ruby. Might delete later...
Previously, the logic to pre-determine the backup type required knowledge of the file storage type that PostgresAdmin did not have. In the case where the file storage data was being streamed (S3, FTP, etc.) and was only read from the source once, you had to call out to the `PostgresAdmin.pg_dump_file?` and `PostgresAdmin.base_backup_file?` calls manually to determine what type file was going to be restored so the proper code could be executed, and `PostgresAdmin` didn't have to worry if the file data was being streamed or was just read off of a disk/mount. Since .validate_backup_file_type (not show in diff) is now part of .restore, this extra case logic is no longer necessary, as `backup_type` will always be set if not passed in. As such, the code and spec for it has been updated to reflect that.
When not in a RSpec context (example: when we are doing database operations in a separate process in CI), ensure that we have configured a `ManageIQ::ApplianceConsole.logger` so things don't break in CI.
Using `>>` requires being at the proper elevated permission at the time the command is executed, and since that isn't possible with these scripts, it is simpler to use sed for this, and append a `sudo` to the front.
a6f9740
to
9fd0741
Compare
This seems to cause issues, and postgres on Travis is setup to use `trust` instead of `peer`. Since the tests generally work with the default, continue using that instead of using what is brought over from a `.restore`.
9fd0741
to
bcb52d7
Compare
Checked commits NickLaMuro/manageiq-appliance_console@b57d6f4~...bcb52d7 with ruby 2.6.3, rubocop 1.13.0, haml-lint 0.35.0, and yamllint lib/manageiq/appliance_console/postgres_admin.rb
spec/postgres_runner_helper.rb
|
@Fryguy I think this is finally ready for your review. Tests should be passing, and I have rebased/cleaned up the commits. |
end | ||
|
||
private_class_method def self.disable_replication(dbname) | ||
require 'pg/logical_replication' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to include this gem in the gemspec now? nvm - I searched the page and didn't find it and then suddenly I saw it 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if you searched for it using the pg/...
, then yeah, it wouldn't show up. But yes, it is required, but I think the specs wouldn't have passed if I didn't have it. 😉
context "with a pg_dump file from a pipe" do | ||
let(:dbname) { "pg_dump_restore_of_simple_db_from_pipe" } | ||
let(:fifo_path) { Pathname.new(Dir::Tmpname.create("") {}) } | ||
# context "with a pg_dump file from a pipe" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not... it was failing, but there is no real reason that it should though.
But it isn't really needed, so it can be tossed. Will do that in a follow up.
CiPostgresRunner.stop | ||
CiPostgresRunner.start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super-minor, but you could add hard_reset method to CiPostgresRunner for consistency with PostgresRunner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can do that in a followup.
Merged...everything else can be follow-up, if needed. |
Tested using https://github.com/NickLaMuro/manageiq-db_backup-tests to validate the changes.
There are a bunch of "deletes" that will follow this change, but this should not only be merged, but also released as a new version before those changes are merged. Otherwise the nightly builds of the appliance console will break.
TODO
should work just like local, but currently have to rework connecting to the SMB/NFS shares prior to the tests running (as part of the test suite), and that currently doesn't existsimpler than expected, and there are just a few kinks to work out. Will post a update to themanageiq-db_backup-tests
when it is available.Probably not, but worth mentioning that this is a feature that will no longer be supportedAnswer: Nope, burn it.Links