Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updater.StopBot should close polling context #211

Merged
merged 4 commits into from
Dec 22, 2024

Conversation

PaulSonOfLars
Copy link
Owner

What

This should speed up updater.StopBot significantly by allow closing the context used for polling.

Impact

  • Are your changes backwards compatible? Y
  • Have you included documentation, or updated existing documentation? Y
  • Do errors and log messages provide enough context? Y

@@ -157,18 +157,20 @@ func (u *Updater) StartPolling(b *gotgbot.Bot, opts *PollingOpts) error {
}
}

bData, err := u.botMapping.addBot(b, "", "")
ctx, closeFn := context.WithCancel(context.Background())
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could reuse the opts logic to pass in an updater from #210 here - might be a good way to handle signals?

Though the signals approach would only close the updater, not the dispatcher; so might be best to handle closure explitly?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is much better than in #210 - the library should not be concerned about any of the OS signals — it just needs to know one thing: "something wants the updater to stop". And technically it doesn't need the Context, as it's not going to use it for anything else but only to know when it's cancelled.

Handing specific OS signals and termination is a caller's concern, for example, that's how the echoBot could handle the os.Interrupt signal and tell the bot to terminate — works great on this branch:

diff --git a/samples/echoBot/main.go b/samples/echoBot/main.go
index bdffc88..2f390b1 100644
--- a/samples/echoBot/main.go
+++ b/samples/echoBot/main.go
@@ -1,10 +1,12 @@
 package main
 
 import (
+	"context"
 	"fmt"
 	"log"
 	"net/http"
 	"os"
+	"os/signal"
 	"time"
 
 	"github.com/PaulSonOfLars/gotgbot/v2"
@@ -34,6 +36,8 @@ func main() {
 	if err != nil {
 		panic("failed to create new bot: " + err.Error())
 	}
+	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
+	defer cancel()
 
 	// Create updater and dispatcher.
 	dispatcher := ext.NewDispatcher(&ext.DispatcherOpts{
@@ -45,6 +49,11 @@ func main() {
 		MaxRoutines: ext.DefaultMaxRoutines,
 	})
 	updater := ext.NewUpdater(dispatcher, nil)
+	go func() {
+		<-ctx.Done()
+		log.Println("shutting down...")
+		updater.Stop()
+	}()
 
 	// Add echo handler to reply to all text messages.
 	dispatcher.AddHandler(handlers.NewMessage(message.Text, echo))

@@ -157,18 +157,20 @@ func (u *Updater) StartPolling(b *gotgbot.Bot, opts *PollingOpts) error {
}
}

bData, err := u.botMapping.addBot(b, "", "")
ctx, closeFn := context.WithCancel(context.Background())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is much better than in #210 - the library should not be concerned about any of the OS signals — it just needs to know one thing: "something wants the updater to stop". And technically it doesn't need the Context, as it's not going to use it for anything else but only to know when it's cancelled.

Handing specific OS signals and termination is a caller's concern, for example, that's how the echoBot could handle the os.Interrupt signal and tell the bot to terminate — works great on this branch:

diff --git a/samples/echoBot/main.go b/samples/echoBot/main.go
index bdffc88..2f390b1 100644
--- a/samples/echoBot/main.go
+++ b/samples/echoBot/main.go
@@ -1,10 +1,12 @@
 package main
 
 import (
+	"context"
 	"fmt"
 	"log"
 	"net/http"
 	"os"
+	"os/signal"
 	"time"
 
 	"github.com/PaulSonOfLars/gotgbot/v2"
@@ -34,6 +36,8 @@ func main() {
 	if err != nil {
 		panic("failed to create new bot: " + err.Error())
 	}
+	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
+	defer cancel()
 
 	// Create updater and dispatcher.
 	dispatcher := ext.NewDispatcher(&ext.DispatcherOpts{
@@ -45,6 +49,11 @@ func main() {
 		MaxRoutines: ext.DefaultMaxRoutines,
 	})
 	updater := ext.NewUpdater(dispatcher, nil)
+	go func() {
+		<-ctx.Done()
+		log.Println("shutting down...")
+		updater.Stop()
+	}()
 
 	// Add echo handler to reply to all text messages.
 	dispatcher.AddHandler(handlers.NewMessage(message.Text, echo))

if err != nil {
if errors.Is(err, context.Canceled) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful

ext/updater_test.go Outdated Show resolved Hide resolved
@@ -84,7 +84,10 @@ func Test_botData_isUpdateChannelStopped(t *testing.T) {
BotClient: &gotgbot.BaseBotClient{},
}

bData, err := bm.addBot(b, "", "")
ctxCancelled := false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suspecting this test might fail if executed with -race, but the different one failed instead.

Note: this possibly does not impact the bot's behaviour in real environment, just a quirk of how the test is accessing the data.

[9:15:47] 0:_thirdparty/gotgbot/ext (paul/close-polling-ctx)> go test -race ./...
==================
WARNING: DATA RACE
Write at 0x00c000214098 by goroutine 219:
  runtime.racewrite()
      <autogenerated>:1 +0x1e
  github.com/PaulSonOfLars/gotgbot/v2/ext.(*botData).stop()
      /Users/rustam/wp/_thirdparty/gotgbot/ext/botmapping.go:231 +0xbd
  github.com/PaulSonOfLars/gotgbot/v2/ext.(*Updater).StopAllBots()
      /Users/rustam/wp/_thirdparty/gotgbot/ext/updater.go:288 +0xa4
  github.com/PaulSonOfLars/gotgbot/v2/ext.(*Updater).Stop()
      /Users/rustam/wp/_thirdparty/gotgbot/ext/updater.go:264 +0x10d
  github.com/PaulSonOfLars/gotgbot/v2/ext_test.TestUpdaterSupportsTwoPollingBots()
      /Users/rustam/wp/_thirdparty/gotgbot/ext/updater_test.go:434 +0xa52
  testing.tRunner()
      /usr/local/Cellar/go/1.23.4/libexec/src/testing/testing.go:1690 +0x226
  testing.(*T).Run.gowrap1()
      /usr/local/Cellar/go/1.23.4/libexec/src/testing/testing.go:1743 +0x44

Previous read at 0x00c000214098 by goroutine 222:
  runtime.raceread()
      <autogenerated>:1 +0x1e
  github.com/PaulSonOfLars/gotgbot/v2/ext.(*Updater).pollingLoop()
      /Users/rustam/wp/_thirdparty/gotgbot/ext/updater.go:174 +0x8d
  github.com/PaulSonOfLars/gotgbot/v2/ext.(*Updater).StartPolling.gowrap2()
      /Users/rustam/wp/_thirdparty/gotgbot/ext/updater.go:168 +0x79
<... elided for brevity ...>
--- FAIL: TestUpdaterSupportsTwoPollingBots (0.00s)
    testing.go:1399: race detected during execution of test

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're unable to get it to fail, so running like this helps:

go test ./... -race -count=10

takes a long time tho to run.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - theres definitely something funky going on there, not sure what exactly. Will take a look. Might be a different PR, depending on how long it takes! Thanks for the review and the report <3

Comment on lines -166 to +180
go u.pollingLoop(bData, reqOpts, v)

bData.updateWriterControl.Add(1)
go func() {
// defer, so it gets called even in case of panics.
defer bData.updateWriterControl.Done()

u.pollingLoop(ctx, bData, reqOpts, v)
}()

return nil
}

func (u *Updater) pollingLoop(bData *botData, opts *gotgbot.RequestOpts, v map[string]string) {
bData.updateWriterControl.Add(1)
defer bData.updateWriterControl.Done()

func (u *Updater) pollingLoop(ctx context.Context, bData *botData, opts *gotgbot.RequestOpts, v map[string]string) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rusq This should fix it! An edge case for sure, but I can see this being triggered in live systems which are dynamically adding/removing bots.

TL;DR: Race condition caused by the waitgroup being started in a goroutine, rather than before it.

This meant that when the stars (mis)aligned, calling updater.StartPolling(...) followed by updater.Stop(), could result in .Wait() (from the .Stop()) before calling .Add(1) (from the goroutine in .StartPolling); which then causes the race.

The fix makes sure that the .Add is done sequentially, thus will always run before the .Wait.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrency bugs are always the hardest to debug, great break down!

Copy link

@rusq rusq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@PaulSonOfLars PaulSonOfLars merged commit 3383b22 into v2 Dec 22, 2024
3 checks passed
@PaulSonOfLars PaulSonOfLars deleted the paul/close-polling-ctx branch December 22, 2024 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants