Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start Maintenance Daemon for Main DB at the server start. #7254

Merged
merged 29 commits into from
Oct 30, 2023

Conversation

emelsimsek
Copy link
Contributor

@emelsimsek emelsimsek commented Oct 12, 2023

DESCRIPTION: This change starts a maintenance deamon at the time of server start if there is a designated main database.

This is the code flow:

  1. User designated an admin database:
    ALTER SYSTEM SET citus.main_db = "myadmindb";

  2. When postmaster starts, in _PG_Init, citus calls
    InitializeMaintenanceDaemonForMainDb

    This function registers a background worker to run CitusMaintenanceDaemonMain with databaseOid = 0

  3. CitusMaintenanceDaemonMain takes some special actions when databaseOid is 0:

    • Gets the citus.main_db value.
    • Connects to the citus.main_db
    • Now the MyDatabaseId is available, creates a hash entry for it.
    • Then follows the same control flow as for a regular db,

@emelsimsek emelsimsek changed the title Start Maintenance Daemon in Control DB at the server start. Start Maintenance Daemon for Control DB at the server start. Oct 13, 2023
@@ -146,6 +147,12 @@ InitializeMaintenanceDaemon(void)
void
InitializeMaintenanceDaemonForAdminDB(void)
{
if (strcmp(ControlDbName, "") == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are GUCs already read in _PG_init? If not, this would always return true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it appears that they are read since this change is able to start a maintenance deamon in the controldb even if there is no connection.

It is also possible to move this check to CitusMaintenanceDaemonMain, Then a background worker will start but then quit immediately.

@emelsimsek emelsimsek requested a review from JelteF October 17, 2023 07:43
@codecov
Copy link

codecov bot commented Oct 18, 2023

Codecov Report

Merging #7254 (daa4cee) into main (10198b1) will decrease coverage by 0.11%.
The diff coverage is 93.33%.

❗ Current head daa4cee differs from pull request most recent head 1e805da. Consider uploading reports for the commit 1e805da to get more accurate results

@@            Coverage Diff             @@
##             main    #7254      +/-   ##
==========================================
- Coverage   93.22%   93.11%   -0.11%     
==========================================
  Files         275      275              
  Lines       59491    61397    +1906     
  Branches        0     7421    +7421     
==========================================
+ Hits        55459    57169    +1710     
- Misses       4032     4163     +131     
- Partials        0       65      +65     

@@ -481,6 +481,7 @@ _PG_init(void)
#endif

InitializeMaintenanceDaemon();
InitializeMaintenanceDaemonForMainDb();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you treat DB as a word? Should it be ...ForMainDb() or...ForMainDB()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emelsimsek emelsimsek changed the title Start Maintenance Daemon for Control DB at the server start. Start Maintenance Daemon for Main DB at the server start. Oct 20, 2023
def test_set_maindb(coord):
with coord.cur() as cur1:
cur1.execute("DROP DATABASE IF EXISTS mymaindb;")
cur1.execute("CREATE DATABASE mymaindb;")
Copy link
Contributor

@JelteF JelteF Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's track the databases we create so we can clean them up automatically, just like we do for schemas. See create_schema, cleanup_schemas and cleanup_test_leftovers in src/test/regress/citus_tests/common.py

That should also make the DROP DATABASE IF EXISTS unnecessary.

Comment on lines 160 to 162
SafeSnprintf(worker.bgw_name, sizeof(worker.bgw_name),
"Citus Maintenance Daemon: %u/%u",
0, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of 0/0 it's more useful to users to write out what it is for.

Suggested change
SafeSnprintf(worker.bgw_name, sizeof(worker.bgw_name),
"Citus Maintenance Daemon: %u/%u",
0, 0);
strcpy_s(worker.bgw_name, sizeof(worker.bgw_name),
"Citus Maintenance Daemon for Main DB");

Comment on lines 365 to 386
/*
* When the database crashes, background workers are restarted, but
* the state in shared memory is lost. In that case, we exit and wait
* for Postmaster calling __PG_Init which in turn calls
* InitializeMaintenanceDaemonForAdminDB.
*/
Copy link
Contributor

@JelteF JelteF Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is actually not true here. Here' we're checking for insert failure, i.e. MaintenaceDaemonHash being full. We should throw a useful error here about that, because that's quite unexpected.

Copy link
Contributor

@JelteF JelteF Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think it might be good to move this insertion logic (including zero-ing) to a function that is called here and by InitializeMaintenanceDaemonBackend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my main source of finding this code confusing is the two branches (if (databaseOid == 0) and else) that seem similar, but are actually quite different. I think it would help readability to split these a bit better. Either by putting their bodies in a function, or creating a new CitusMaintenanceDaemonMain function for the databaseOid == 0 case. These could then call some functions for the shared code. But I think having these long branches but quite different branches make the code hard to follow, since now there's two paths that you need to look at carefully.

@emelsimsek emelsimsek marked this pull request as ready for review October 23, 2023 18:01
@emelsimsek emelsimsek force-pushed the StartMaintenanceDaemonInControlDB branch from b28a6c8 to 1e805da Compare October 23, 2023 18:36
== 0
)

cluster.coordinator.cleanup_databases()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup_databases should be called in the cleanup_test_leftovers function so that it is called even when the test fails.

@emelsimsek emelsimsek requested a review from JelteF October 27, 2023 07:39
Comment on lines 342 to 348
* ConnectToDatabase connects to the database for the given databaseOid.
* if databaseOid is 0, connects to MainDb and then creates a hash entry.
* If a hash entry cannot be created for MainDb it exists the process requesting a restart.
* However for regular databases, it exits without requesting a restart since another
* subsequent backend should be starting the Maintenance Deamon.
* If a hash entry found has a valid workerPid, it exits
* without requesting a restart since there is already a deamon running.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewording + typo fixes

Suggested change
* ConnectToDatabase connects to the database for the given databaseOid.
* if databaseOid is 0, connects to MainDb and then creates a hash entry.
* If a hash entry cannot be created for MainDb it exists the process requesting a restart.
* However for regular databases, it exits without requesting a restart since another
* subsequent backend should be starting the Maintenance Deamon.
* If a hash entry found has a valid workerPid, it exits
* without requesting a restart since there is already a deamon running.
* ConnectToDatabase connects to the database for the given databaseOid.
* if databaseOid is 0, connects to MainDb and then creates a hash entry.
* If a hash entry cannot be created for MainDb it exists the process requesting a restart.
* However for regular databases, it exits without requesting a restart since another
* subsequent backend should be starting the Maintenance Daemon.
* If the found hash entry has a valid workerPid, it exits
* without requesting a restart since there is already a daemon running.

Copy link
Contributor

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a small suggestions about the wording in the codecomment. But other than that this looks good now.

@emelsimsek emelsimsek force-pushed the StartMaintenanceDaemonInControlDB branch from a210359 to 2565296 Compare October 30, 2023 06:29
@emelsimsek emelsimsek merged commit ee8f4bb into main Oct 30, 2023
153 of 154 checks passed
@emelsimsek emelsimsek deleted the StartMaintenanceDaemonInControlDB branch October 30, 2023 06:44
francisjodi pushed a commit that referenced this pull request Nov 13, 2023
DESCRIPTION: This change starts a maintenance deamon at the time of
server start if there is a designated main database.

This is the code flow:

1. User designates a main database:
   `ALTER SYSTEM SET citus.main_db =  "myadmindb";`

2. When postmaster starts, in _PG_Init, citus calls 
    `InitializeMaintenanceDaemonForMainDb`
  
This function registers a background worker to run
`CitusMaintenanceDaemonMain `with `databaseOid = 0 `

3. `CitusMaintenanceDaemonMain ` takes some special actions when
databaseOid is 0:
     - Gets the citus.main_db  value.
     - Connects to the  citus.main_db
     - Now the `MyDatabaseId `is available, creates a hash entry for it.
     - Then follows the same control flow as for a regular db,
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants