Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QC-1038] fix Race condition in Service Discovery #2024

Merged
merged 4 commits into from
Nov 2, 2023

Conversation

Barthelemy
Copy link
Collaborator

I used a mutex and a condition variable with a flag to know that the port has been assigned. Perhaps there is a better way ?

@Barthelemy Barthelemy requested a review from knopers8 as a code owner November 1, 2023 15:00
@Barthelemy
Copy link
Collaborator Author

@awegrzyn could you review too ?

@awegrzyn
Copy link
Contributor

awegrzyn commented Nov 1, 2023

Seems ok, but what about simply running _register in new thread right after selecting the port? In such case no mutexes are needed and registering to Consul is done in separate thread (no impact on QC main thread).

@Barthelemy
Copy link
Collaborator Author

@awegrzyn I thought about it but we call _register from many places in the code to update the list of objects we publish. Thus we might still end up calling it before we have a port.

@awegrzyn
Copy link
Contributor

awegrzyn commented Nov 1, 2023

Ah okay, understood, but I have a impression that we're getting back to original problem => QC main thread stuck because of waiting for port selection. @knopers8 Could you confirm?

@knopers8
Copy link
Collaborator

knopers8 commented Nov 1, 2023

The original problem was different. The port selection routine was selecting a port by binding a port in range until succeeded, then releasing it and letting the new thread bind it again. In the time gap between releasing the port and binding it again, sometimes another process was binding it.

I will review the PR tomorrow

@awegrzyn
Copy link
Contributor

awegrzyn commented Nov 2, 2023

Ah ok, so if you don't mind to wait in main thread for port selection that it's fine.

@Barthelemy
Copy link
Collaborator Author

If I understand correctly what I did, we wait in the main thread for the port selection (that should not take really long) and if none is found it will release the main thread any ways. I did some tests and it seemed to work. Thank you Piotr for reviewing it. Ideally I would like to have it in today's release but if there is a logical flaw we will wait.

// Wait until port is set by the thread
{
std::unique_lock<std::mutex> lk(mHealthPortMutex);
mHealthPortCV.wait(lk, [this] { return mHealthPortAssigned == true; });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I misunderstood how condition_variable works, but won't it block the second time you call _register? I am not sure if it requires the other thread to call notify_one() for each wait.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I wrote a little reproducer and it seems it my concern is not valid, it seems that notify_one() unblocks the main thread forever, as long as the predicate is provided in wait.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you noted, as long as the predicate is true, it will let the main thread continue.

Copy link
Collaborator

@knopers8 knopers8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems fine, thanks!

@Barthelemy Barthelemy merged commit dab8261 into AliceO2Group:master Nov 2, 2023
2 checks passed
@Barthelemy Barthelemy deleted the SD-race branch November 2, 2023 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants