Track a dump import task state #529
-
This is something that is very important for the cloud team. We use it as part of the "update Meilisearch version" feature. Right now we have no way to know if a dump import is running or if there has been a failure. Our workaround is a timeout that simply fails the "update" after 24 hours (24 hours is arbitrary, but we wanted it long enough to handle very large datasets and use cases). While we believe this will be acceptable in most cases, the obvious downsides of this approach are that a failure is only detected once the timeout expires, and that an unusually long import can be cut off even though it might have succeeded.
Ideally we'd have some endpoint we could hit that would tell us whether the dump import is still running, has completed successfully, or has failed. Regarding the solutions suggested above: …
-
I think we should totally do that!
We could also shut down the instance entirely as we do today. Nothing is impossible 😁
My last trick would be to add a new cli flag, something like: Just one more cli option bro. I promise bro just one more cli option and it'll fix everything bro. bro. just one more cli option. please just one more. one more cli option and we can fix this whole problem bro. bro cmon just give me one more cli option i promise bro. bro bro please i just need one more cli option thats i…
-
It's me again, doubling down on @irevoire's suggestion. I would like your opinion! We could consider introducing some sort of maintenance mode, activated by default when importing a dump.
WDYT? Thanks!
-
Description
Raised by @nicolasvienot internally.
It is difficult today for the cloud team to track the state of a dump import.
The only solution is to wait for the Meilisearch server to respond to requests, which only happens once the dump has been successfully imported, since the HTTP server starts at that moment.
We have been thinking about a solution within the core team.
The basis of the solution would be to make the import of a dump visible within the `/tasks` endpoint; this would allow several things:

- Call the `/tasks` endpoint and track the `dumpImport` task type: `GET /tasks?type=dumpImport&status=processing`.
- If you want to track that particular task, store the returned `uid` and then emit a long-poll call on that particular task at a given interval (see the sketch below).

As you may already know, at the moment, when a dump finishes importing, it shows the history of the tasks that took place in the instance that generated the dump.
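To make that flow concrete, here is a minimal client-side polling sketch in Python. It assumes the proposed `dumpImport` task type and the `type`/`status` filters described above actually land; the `results`, `uid`, and `status` fields follow the shape of the existing `/tasks` responses.

```python
import time

import requests

MEILI_URL = "http://localhost:7700"  # hypothetical instance address


def wait_for_dump_import(poll_interval: float = 5.0) -> dict:
    """Poll /tasks until the dumpImport task leaves the processing state."""
    # Find the running dump import task (assumes exactly one is in flight).
    resp = requests.get(
        f"{MEILI_URL}/tasks",
        params={"type": "dumpImport", "status": "processing"},
    )
    resp.raise_for_status()
    uid = resp.json()["results"][0]["uid"]

    # Long-poll that particular task at a fixed interval.
    while True:
        task = requests.get(f"{MEILI_URL}/tasks/{uid}").json()
        if task["status"] in ("succeeded", "failed"):
            return task
        time.sleep(poll_interval)
```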
Problem
Since the dump import task must have a `uid`, we could end up with an inconsistent history. If Meilisearch assigns it the uid `0`, we will have to shift all the `uid`s in the dump history by one increment, and the dump import will appear as the very first task. This is a problem if tasks are enqueued during the import of a dump: it will generate `uid` collisions, as the toy illustration below shows.
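A minimal illustration of that collision, with made-up uids:

```python
# Toy illustration of the uid collision (all numbers are made up).
dump_history = [0, 1, 2]        # task uids stored inside the dump
import_task_uid = 0             # naive uid for the dump import task itself

# Shifting the imported history to make room for the import task...
shifted_history = [uid + 1 for uid in dump_history]  # -> [1, 2, 3]

# ...collides with tasks enqueued while the import is running,
# which get the next "fresh" uids after the import task's uid 0.
enqueued_during_import = [1, 2]
print(set(shifted_history) & set(enqueued_during_import))  # {1, 2}
```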
Potential solution
We could parse the contents of the dump, determine the `uid` of the last task it contains, and make the `uid` of the dump import task equal to `lastDumpTaskId + 1`. While the dump is being imported, `/tasks` would already return the `dumpImport` task; after the import, it would show the imported history with the dump import task appearing right after it, as `lastDumpTaskId + 1`. This ensures that there are no collisions between tasks that might be enqueued during the import of a dump and the history of the dump after it is imported. A rough sketch of this `uid` assignment is given below.
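Here is a rough sketch of that `uid` assignment in Python, assuming the dump's task history can be read as a JSON-lines file with one task object per line, each carrying a `uid` field (the real dump layout is internal and may differ):

```python
import json


def dump_import_task_uid(tasks_file: str) -> int:
    """Assign the dump import task the uid right after the dump's last task.

    Assumes `tasks_file` is a JSON-lines file with one task object per line,
    each carrying a `uid` field; the actual dump format may differ.
    """
    last_uid = -1
    with open(tasks_file) as f:
        for line in f:
            task = json.loads(line)
            last_uid = max(last_uid, task["uid"])
    # lastDumpTaskId + 1: no collision with the imported history, and tasks
    # enqueued during the import continue numbering from there.
    return last_uid + 1
```

Tasks enqueued while the dump is being imported would then be numbered from `lastDumpTaskId + 2` onwards, so neither they nor the imported history can collide with the import task.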
What do you think about it? Is this a solution that would be viable and could meet your needs?
Questions
1. Is the current pain point something critical to be solved, @meilisearch/cloud-team? We see interesting additions that could come with the proposed solution, but these additions do not answer a directly expressed need.
2. In case a dump fails, the server will still be up and will process the next updates that have been enqueued. Is this a problem, as long as it's possible to know that the dump failed and it may be enough to start over? My biggest concern is that this drastically obscures the outcome of the dump import for people who don't track the task and might think that a dump succeeded when this is not necessarily the case.
3. Is there any other solution that we could think of?

After writing this, I can think of another solution, which would be:
Expose a subset of API endpoints
Only expose `/health` or `/stats` to indicate that indexing is in progress, and stop the server if the import fails. The drawback of this solution is that it's impossible to see in `/tasks` when a dump has been imported for an instance, even though that endpoint is dedicated to exactly this: tracking asynchronous task completion. It's also not possible to start giving work to Meilisearch in advance (Edit: nice-to-have). The advantage is that we can be sure that Meilisearch will not do anything more if the dump import fails. A sketch of a client-side check under this scheme follows.
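For completeness, a client-side check under this scheme could look like the sketch below. The `{"status": "available"}` payload matches the existing `/health` response; how an in-progress import would be reported on `/health` is left as an assumption here.

```python
import time

import requests

MEILI_URL = "http://localhost:7700"  # hypothetical instance address


def wait_until_available(poll_interval: float = 5.0) -> None:
    """Poll /health until the server reports it is available again."""
    while True:
        try:
            resp = requests.get(f"{MEILI_URL}/health", timeout=2)
            if resp.ok and resp.json().get("status") == "available":
                return  # dump import finished, regular traffic can resume
        except requests.ConnectionError:
            # No answer at all: under this alternative, the server stopped
            # because the dump import failed, so the caller can start over.
            pass
        time.sleep(poll_interval)
```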
Pinging @Kerollmops for tracking