-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updater.StopBot should close polling context #211
Conversation
@@ -157,18 +157,20 @@ func (u *Updater) StartPolling(b *gotgbot.Bot, opts *PollingOpts) error { | |||
} | |||
} | |||
|
|||
bData, err := u.botMapping.addBot(b, "", "") | |||
ctx, closeFn := context.WithCancel(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could reuse the opts logic to pass in an updater from #210 here - might be a good way to handle signals?
Though the signals approach would only close the updater, not the dispatcher; so might be best to handle closure explitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach is much better than in #210 - the library should not be concerned about any of the OS signals — it just needs to know one thing: "something wants the updater to stop". And technically it doesn't need the Context, as it's not going to use it for anything else but only to know when it's cancelled.
Handing specific OS signals and termination is a caller's concern, for example, that's how the echoBot could handle the os.Interrupt signal and tell the bot to terminate — works great on this branch:
diff --git a/samples/echoBot/main.go b/samples/echoBot/main.go
index bdffc88..2f390b1 100644
--- a/samples/echoBot/main.go
+++ b/samples/echoBot/main.go
@@ -1,10 +1,12 @@
package main
import (
+ "context"
"fmt"
"log"
"net/http"
"os"
+ "os/signal"
"time"
"github.com/PaulSonOfLars/gotgbot/v2"
@@ -34,6 +36,8 @@ func main() {
if err != nil {
panic("failed to create new bot: " + err.Error())
}
+ ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
+ defer cancel()
// Create updater and dispatcher.
dispatcher := ext.NewDispatcher(&ext.DispatcherOpts{
@@ -45,6 +49,11 @@ func main() {
MaxRoutines: ext.DefaultMaxRoutines,
})
updater := ext.NewUpdater(dispatcher, nil)
+ go func() {
+ <-ctx.Done()
+ log.Println("shutting down...")
+ updater.Stop()
+ }()
// Add echo handler to reply to all text messages.
dispatcher.AddHandler(handlers.NewMessage(message.Text, echo))
@@ -157,18 +157,20 @@ func (u *Updater) StartPolling(b *gotgbot.Bot, opts *PollingOpts) error { | |||
} | |||
} | |||
|
|||
bData, err := u.botMapping.addBot(b, "", "") | |||
ctx, closeFn := context.WithCancel(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach is much better than in #210 - the library should not be concerned about any of the OS signals — it just needs to know one thing: "something wants the updater to stop". And technically it doesn't need the Context, as it's not going to use it for anything else but only to know when it's cancelled.
Handing specific OS signals and termination is a caller's concern, for example, that's how the echoBot could handle the os.Interrupt signal and tell the bot to terminate — works great on this branch:
diff --git a/samples/echoBot/main.go b/samples/echoBot/main.go
index bdffc88..2f390b1 100644
--- a/samples/echoBot/main.go
+++ b/samples/echoBot/main.go
@@ -1,10 +1,12 @@
package main
import (
+ "context"
"fmt"
"log"
"net/http"
"os"
+ "os/signal"
"time"
"github.com/PaulSonOfLars/gotgbot/v2"
@@ -34,6 +36,8 @@ func main() {
if err != nil {
panic("failed to create new bot: " + err.Error())
}
+ ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
+ defer cancel()
// Create updater and dispatcher.
dispatcher := ext.NewDispatcher(&ext.DispatcherOpts{
@@ -45,6 +49,11 @@ func main() {
MaxRoutines: ext.DefaultMaxRoutines,
})
updater := ext.NewUpdater(dispatcher, nil)
+ go func() {
+ <-ctx.Done()
+ log.Println("shutting down...")
+ updater.Stop()
+ }()
// Add echo handler to reply to all text messages.
dispatcher.AddHandler(handlers.NewMessage(message.Text, echo))
if err != nil { | ||
if errors.Is(err, context.Canceled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful
@@ -84,7 +84,10 @@ func Test_botData_isUpdateChannelStopped(t *testing.T) { | |||
BotClient: &gotgbot.BaseBotClient{}, | |||
} | |||
|
|||
bData, err := bm.addBot(b, "", "") | |||
ctxCancelled := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suspecting this test might fail if executed with -race
, but the different one failed instead.
Note: this possibly does not impact the bot's behaviour in real environment, just a quirk of how the test is accessing the data.
[9:15:47] 0:_thirdparty/gotgbot/ext (paul/close-polling-ctx)> go test -race ./...
==================
WARNING: DATA RACE
Write at 0x00c000214098 by goroutine 219:
runtime.racewrite()
<autogenerated>:1 +0x1e
github.com/PaulSonOfLars/gotgbot/v2/ext.(*botData).stop()
/Users/rustam/wp/_thirdparty/gotgbot/ext/botmapping.go:231 +0xbd
github.com/PaulSonOfLars/gotgbot/v2/ext.(*Updater).StopAllBots()
/Users/rustam/wp/_thirdparty/gotgbot/ext/updater.go:288 +0xa4
github.com/PaulSonOfLars/gotgbot/v2/ext.(*Updater).Stop()
/Users/rustam/wp/_thirdparty/gotgbot/ext/updater.go:264 +0x10d
github.com/PaulSonOfLars/gotgbot/v2/ext_test.TestUpdaterSupportsTwoPollingBots()
/Users/rustam/wp/_thirdparty/gotgbot/ext/updater_test.go:434 +0xa52
testing.tRunner()
/usr/local/Cellar/go/1.23.4/libexec/src/testing/testing.go:1690 +0x226
testing.(*T).Run.gowrap1()
/usr/local/Cellar/go/1.23.4/libexec/src/testing/testing.go:1743 +0x44
Previous read at 0x00c000214098 by goroutine 222:
runtime.raceread()
<autogenerated>:1 +0x1e
github.com/PaulSonOfLars/gotgbot/v2/ext.(*Updater).pollingLoop()
/Users/rustam/wp/_thirdparty/gotgbot/ext/updater.go:174 +0x8d
github.com/PaulSonOfLars/gotgbot/v2/ext.(*Updater).StartPolling.gowrap2()
/Users/rustam/wp/_thirdparty/gotgbot/ext/updater.go:168 +0x79
<... elided for brevity ...>
--- FAIL: TestUpdaterSupportsTwoPollingBots (0.00s)
testing.go:1399: race detected during execution of test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're unable to get it to fail, so running like this helps:
go test ./... -race -count=10
takes a long time tho to run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - theres definitely something funky going on there, not sure what exactly. Will take a look. Might be a different PR, depending on how long it takes! Thanks for the review and the report <3
go u.pollingLoop(bData, reqOpts, v) | ||
|
||
bData.updateWriterControl.Add(1) | ||
go func() { | ||
// defer, so it gets called even in case of panics. | ||
defer bData.updateWriterControl.Done() | ||
|
||
u.pollingLoop(ctx, bData, reqOpts, v) | ||
}() | ||
|
||
return nil | ||
} | ||
|
||
func (u *Updater) pollingLoop(bData *botData, opts *gotgbot.RequestOpts, v map[string]string) { | ||
bData.updateWriterControl.Add(1) | ||
defer bData.updateWriterControl.Done() | ||
|
||
func (u *Updater) pollingLoop(ctx context.Context, bData *botData, opts *gotgbot.RequestOpts, v map[string]string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rusq This should fix it! An edge case for sure, but I can see this being triggered in live systems which are dynamically adding/removing bots.
TL;DR: Race condition caused by the waitgroup being started in a goroutine, rather than before it.
This meant that when the stars (mis)aligned, calling updater.StartPolling(...)
followed by updater.Stop()
, could result in .Wait()
(from the .Stop()
) before calling .Add(1)
(from the goroutine in .StartPolling
); which then causes the race.
The fix makes sure that the .Add
is done sequentially, thus will always run before the .Wait
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concurrency bugs are always the hardest to debug, great break down!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What
This should speed up updater.StopBot significantly by allow closing the context used for polling.
Impact