Suggested approach for idempotent and consistent eventing #50

ayuksekkaya · 2024-08-18T17:58:49Z

ayuksekkaya
Aug 18, 2024

I have some questions about how we can make sure that eventing is consistent and reliable. This is what I have observed so far:

In SaaStack, domain events are handled by sending the event to a service bus (such as Azure Service Bus), at which point a function is triggered and the worker is activated. This call eventually reaches DomainEventConsumerService, which then notifies all subscribers of the event so that they can react to it.
We publish domain events after the entity has been persisted successfully in the given data store.
We don't seem to distinguish between an error on persistence and an error on publishing events. For example, in EndUserApplication we call var saved = await _endUserRepository.SaveAsync(unregisteredUser, true, cancellationToken); which in turn calls var saved = await _users.SaveAsync(user, cancellationToken); which in turn calls var published = await this.SaveAndPublishChangesAsync(aggregate, OnEventStreamChanged, (root, changedEvents, token) => _eventStore.AddEventsAsync(_entityName, root.Id.Value, changedEvents, token), cancellationToken);

So in this flow we can error out anywhere and cancel the operation. But if the error happens after we persist the aggregate and before we can publish the events (say Azure Service Bus is down), it seems we don't do anything differently. I couldn't find any recovery steps for this case: we return the error, but it is not treated differently depending on whether it came from publishing the changes or from persisting them.

My questions:

Is the implementation of an outbox-like pattern left to the consumers? If so, what would be a good way of doing this? Something like Hangfire? In that case, would Hangfire read the events and then publish them to the service bus? What is the suggested approach?
In DomainEventConsumerService we have this logic:

        //HACK: We are round-robin distributing these events,
        // but if it fails even once from any consumer, we will enter a retry loop
        // but events that were previously successful, will be replayed again next time around!
        var domainEvent = converted.Value;
        foreach (var consumer in _consumers)
        {
            var result = await consumer.NotifyAsync(domainEvent, cancellationToken);
            if (result.IsFailure)
            {
                return result.Error
                    .Wrap(ErrorCode.Unexpected,
                        Resources.DomainEventConsumerService_ConsumerFailed.Format(consumer.GetType().Name,
                            domainEvent.RootId, changeEvent.Metadata.Fqn));
            }
        }

The comment says that we enter a retry loop, but I couldn't find where the retry happens. Is it at the service bus level? And in order to address the "but events that were previously successful, will be replayed again next time around!" part of the comment, do we need to handle that individually in each subscriber and make sure that each one is idempotent?

I was a bit confused by the eventing docs. I am not entirely clear on the difference between domain events and integration events in this system, since we rely on an event bus for all events. Is it just about being within a bounded context vs. an unbounded context?

jezzsantos · 2024-08-18T19:56:14Z

jezzsantos
Aug 18, 2024
Maintainer

We have a closed issue related to this: #15

0 replies

jezzsantos · 2024-08-18T20:05:28Z

jezzsantos
Aug 18, 2024
Maintainer

Thanks @ayuksekkaya, this is a great question.

A few things to cover first:

Q. We don't seem to be distinguishing between an error on persistence vs an error on publishing events.
A. This may be true; we need to investigate it in more detail.
Q. Is the implementation of an outbox-like pattern left for the consumers?
A. No, definitely not. It is too late in the process for consumers to do anything about it once the message is on the topic. I need to find a way to do the outbox pattern (CDC or similar) when the event is persisted. What we have at the moment is a first-generation implementation. I intend to improve it, hence "Reliability of Projections and Notifications" #15.
Q. The comment says that we are doing a retry loop, but I couldn't find where the retry is happening. Is it at the service bus level?
A. Yes, Azure Service Bus/AWS SNS have retry loops, like queues do, to ensure delivery. They will retry a certain number of times to send the event to the consumer. And yes, each consumer needs to deal with idempotency.
Q. not super clear on the difference between domain events and integration events
A. Yes, it is about bounded contexts. Integration events are intended to reach other systems, different contexts, no code/type sharing, etc.
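
As a rough illustration of the idempotency point above, here is a minimal sketch of a consumer that ignores redelivered events. It is not taken from the codebase: DomainEvent, EventId and IdempotentConsumer are hypothetical names, and the set of processed IDs is held in memory only, whereas a real consumer would have to persist it alongside its own state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event shape; the real change-event types in the codebase differ.
public sealed record DomainEvent(string EventId, string RootId, string Name);

// Wraps a handler so that a redelivered event (same EventId) is applied only once.
public sealed class IdempotentConsumer
{
    // Assumption: in-memory only; a real consumer would store this durably.
    private readonly HashSet<string> _processedEventIds = new();
    private readonly Action<DomainEvent> _handler;

    public IdempotentConsumer(Action<DomainEvent> handler)
    {
        _handler = handler;
    }

    // Returns true if the event was handled now, false if it was a duplicate.
    public bool Notify(DomainEvent domainEvent)
    {
        if (_processedEventIds.Contains(domainEvent.EventId))
        {
            return false; // already handled on an earlier delivery
        }

        // If the handler throws, the ID is not recorded, so a later retry can still succeed.
        _handler(domainEvent);
        _processedEventIds.Add(domainEvent.EventId);
        return true;
    }
}
```

With something like this in each subscriber, a bus-level retry that replays an event to consumers that had already succeeded becomes harmless, because the second delivery is skipped.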

6 replies

jezzsantos Aug 19, 2024
Maintainer

Yeah, okay, perhaps we need to update the docs, to clarify this.
The situation changed dramatically when I implemented AsynchronousQueueConsumerRelay.cs in 9c324bb, in this flow.

Originally, we had a synchronous mechanism, using the InProcessSynchronousConsumerRelay, which was a good first-generation mechanism. The problem is that with this mechanism, and with all subdomains in the modular-monolith, clients of the APIs can safely rely on 100% consistency.

However, as we split the subdomains of the modular-monolith (later in the cycle), we introduce eventual consistency (as you discovered).

To set people/clients up correctly from the start (and to decouple the client-API interactions), I think it's probably best to design clients for eventual consistency from the get-go, since the patterns are necessarily different. The alternative is getting years down the track only to discover that we need to re-engineer the clients to deal with this. (BTW: We also support responses in command APIs that contain the changed resources, to aid these transitions, so clients don't need to query immediately after a command to get the changed resources.)

Of course, anyone can change this: just swap the IEventNotifier with whatever you want. The InProcessSynchronousConsumerRelay is still in the codebase for this purpose.

The EventBus is used by default, for both domain_events and integration_events on two different topics. But of course, you can replace the actual IDomainEventConsumerRelay with whatever technology adapter you want.

Now, to running the app. The LocalMachineJsonFileStore is injected by default in desktop development and in automated testing. It includes a poor-man's implementation of all the queues and the message bus, and they are all monitored in a regular cycle every few seconds, to fake the use of real queues and a real message bus, so we don't need to run those. That takes care of the queues and topics, albeit not too responsively.

If you don't run the TestingApiHost locally, you would need to call the API that empties the message bus topics, like: POST https://localhost/domain_events/drain

So, thanks very much for wanting to jump in. Good to have more collaborators.
Let me know what you would want to change, and we can plan something to do that.

The outbox pattern solution is going to be very tricky. I haven't yet thought it through, but yes, I think we would need to solve it for both snapshotting and event-sourcing, given that we have no dependency on the actual IDataStore or IEventStore used.

What do you suggest? I would love to find a solution to this.

ayuksekkaya Aug 25, 2024
Author

Sorry, I have been busy all week with work and just got a chance to look at this. So, it seems like we don't want the outbox pattern for event sourcing (I wasn't familiar with event sourcing before looking at your code, but the general consensus seems to be that there are other alternatives for event sourcing, as you mentioned in that comment in the issue). For the snapshotting pattern, I think we can create outbox and inbox tables for each aggregate root in the generic modules, and people who are using this code can follow that pattern and add these tables when they add more modules. Then we would need to modify SnapshottingDddCommandStore.cs, I believe with transactions, so that it writes to both the outbox table and the entity. But how do we make sure this works for all databases when it's implemented in SnapshottingDddCommandStore.cs? Maybe it should be left to the individual SQL implementation, so maybe we would first change AzureSqlServerStore.cs? I am not sure there.

And for reading the outbox from the database and publishing, would that be handled by a background job? Can Quartz help there for example? I also need to look at it more I think.

Were you planning on adding .sql scripts for migrations and for creating the initial databases? I guess the outbox tables will need to be created there as well.

jezzsantos Aug 26, 2024
Maintainer

Hey @ayuksekkaya
I am on vacation at the moment for a few weeks, slow to respond and no laptop to speed things up! I'll try to respond in a reasonable way soon.

jezzsantos Aug 26, 2024
Maintainer

So, a couple of things.
Firstly, all aggregates generate "change events" that can be sent to an outbox, regardless of whether they are event-sourced or snapshotted.

Secondly, a relational database is a popular choice for an IEventStore (i.e. as an adapter, e.g. SqlServerStore) instead of using a dedicated IEventStore such as EventStoreEventStore.

Now, we have both the "change events" (from the aggregate) and the data (domain_events from an event-sourced aggregate, or records from a snapshot aggregate) going to the same database.

The third piece is a transaction mechanism, which we are yet to define, where we can wrap both tasks in the storage layer (i.e. in IEventSourcedCommand<T>) in a single transaction.

Once we have that, we can design a generalized outbox pattern, but only for the cases where the programmer chooses a database as an event store, or where they use snapshotting. Other combos need a different pattern. So this is a transient pattern, based on the capabilities of the actual adapters used. We may have to ask the adapters to tell us what they support to make this work.
There are no such things as distributed patterns, so if the programmer decides to use EventStore for their event-sourced aggregates, we need to explore what EventStore supports for the message broking piece. Or, if they decide to use a NoSQL database that won't support transactions, we have to forgo the outbox pattern.

Lastly, we need a new mechanism to fetch the "change events" from the outbox and forward them to the message broker. This part would be very different than what we have now, and we have to make it resilient like a queue. Which is hard to do properly.
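
To make the two halves described above concrete (writing the "change events" to an outbox together with the data, then forwarding them to the broker), here is a minimal in-memory sketch. Every name in it (OutboxMessage, InMemoryTransactionalStore, OutboxRelay) is hypothetical and nothing here exists in the codebase. The single SaveWithOutbox method stands in for what would be one database transaction in a real adapter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical outbox row: one per change event, ordered by Sequence.
public sealed class OutboxMessage
{
    public OutboxMessage(long sequence, string rootId, string eventJson)
    {
        Sequence = sequence;
        RootId = rootId;
        EventJson = eventJson;
    }

    public long Sequence { get; }
    public string RootId { get; }
    public string EventJson { get; }
    public bool Dispatched { get; set; }
}

// Hypothetical stand-in for a store that can write data and outbox rows atomically.
public sealed class InMemoryTransactionalStore
{
    private readonly Dictionary<string, string> _aggregates = new();
    private readonly List<OutboxMessage> _outbox = new();
    private long _nextSequence = 1;

    // A real adapter would wrap both writes in a single database transaction.
    public void SaveWithOutbox(string rootId, string state, IReadOnlyList<string> changeEvents)
    {
        _aggregates[rootId] = state;
        foreach (var changeEvent in changeEvents)
        {
            _outbox.Add(new OutboxMessage(_nextSequence++, rootId, changeEvent));
        }
    }

    public IReadOnlyList<OutboxMessage> ReadPending()
    {
        return _outbox.Where(m => !m.Dispatched).OrderBy(m => m.Sequence).ToList();
    }
}

// Forwards pending outbox rows to a broker; publish returns false when the broker is unavailable.
public sealed class OutboxRelay
{
    private readonly InMemoryTransactionalStore _store;
    private readonly Func<OutboxMessage, bool> _publish;

    public OutboxRelay(InMemoryTransactionalStore store, Func<OutboxMessage, bool> publish)
    {
        _store = store;
        _publish = publish;
    }

    // Sends pending messages in order and stops at the first failure to preserve ordering.
    public int DrainOnce()
    {
        var sent = 0;
        foreach (var message in _store.ReadPending())
        {
            if (!_publish(message))
            {
                break; // leave this and later rows pending for the next cycle
            }

            message.Dispatched = true; // marked only after a successful publish
            sent++;
        }

        return sent;
    }
}
```

Because a row is marked as dispatched only after publishing, a crash between the two steps would resend the message on the next cycle. That gives at-least-once delivery, so consumers would still need to be idempotent.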

Beyond outbox, we need to look at CDC and other approaches, which I need to learn.

So, we could say:

We call the current pattern we have 'unreliable', and use it by default.
If the aggregate is event-sourced AND the configured IDataStore.Capabilities.Outbox == true && IDataStore.Capabilities.Transactional == true then we replace the 'unreliable' pattern with the 'reliable-outbox' pattern.
Another combination might be: if the aggregate is event-sourced and IEventStore.Capabilities.Outbox == true, then we use that pattern.
Another might be: if the aggregate is snapshotted and IDataStore.Capabilities.Transactional == true, then we use the outbox pattern as well.
There could be other supported combos, and patterns to think about.
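
The combinations listed above could be expressed as a small selection function. This is only a sketch of the proposal: the Capabilities flags do not exist on IDataStore or IEventStore today, and EventingPattern, PersistenceStyle, StoreCapabilities and EventingPatternSelector are hypothetical names.

```csharp
using System;

public enum EventingPattern { Unreliable, ReliableOutbox }

public enum PersistenceStyle { Snapshotting, EventSourced }

// Hypothetical flags, mirroring the proposed IDataStore.Capabilities and IEventStore.Capabilities.
public sealed record StoreCapabilities(bool Outbox, bool Transactional);

public static class EventingPatternSelector
{
    // Falls back to the current 'unreliable' pattern unless the adapters support something better.
    public static EventingPattern Select(PersistenceStyle style, StoreCapabilities dataStore,
        StoreCapabilities eventStore)
    {
        if (style == PersistenceStyle.EventSourced)
        {
            // Event-sourced aggregate on a data store that supports both an outbox and transactions.
            if (dataStore.Outbox && dataStore.Transactional)
            {
                return EventingPattern.ReliableOutbox;
            }

            // Event-sourced aggregate on an event store that supports an outbox itself.
            if (eventStore.Outbox)
            {
                return EventingPattern.ReliableOutbox;
            }
        }

        // Snapshotted aggregate on a transactional data store.
        if (style == PersistenceStyle.Snapshotting && dataStore.Transactional)
        {
            return EventingPattern.ReliableOutbox;
        }

        return EventingPattern.Unreliable;
    }
}
```

Keeping the decision in one place like this would make it easier to add further supported combinations later without touching the adapters.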

jezzsantos Sep 7, 2024
Maintainer

Hi @ayuksekkaya

I am now back from vacation and keen to take the next steps on this.

What are your thoughts about the idea above?
Shall we define it more thoroughly?
