Track a dump import task state #529
-
This is something that is very important for the cloud team. We use it as part of the "update Meilisearch version" feature. Right now we have no way to know if a dump import is running or if there has been a failure. Our workaround is a timeout that simply fails the "update" after 24 hours (24 hours is arbitrary, but we wanted it long enough to handle very large datasets and use cases). While we believe this will be acceptable in most cases, the obvious downsides of this approach are that a failure is only detected once the timeout expires, and that an unusually long import can be cut off even though it might have succeeded.
Ideally we'd have some endpoint we could hit that would tell us whether the dump import is still running, has completed successfully, or has failed. Regarding the solutions suggested above: …
-
I think we should totally do that!
We could also shut down the instance entirely as we do today. Nothing is impossible 😁
My last trick would be to add a new cli flag, something like: Just one more cli option bro. I promise bro just one more cli option and it'll fix everything bro. bro. just one more cli option. please just one more. one more cli option and we can fix this whole problem bro. bro cmon just give me one more cli option i promise bro. bro bro please i just need one more cli option thats i…
-
It's me again, doubling down on @irevoire's suggestion. I would like your opinion! We could consider introducing some sort of maintenance mode, activated by default when importing a dump.
WDYT? Thanks!
-
Description
Raised by @nicolasvienot internally.
It is difficult today for the cloud team to track the state of a dump import.
The only solution is to wait for the Meilisearch server to respond to requests, which only happens once the dump has been successfully imported, since the HTTP server starts at that moment.
We have been thinking about a solution within the core team.
The basis of the solution would be to make the import of a dump visible within the `/tasks` endpoint; this would allow several things:

- Call the `/tasks` endpoint and track the `dumpImport` task type: `GET /tasks?type=dumpImport&status=processing`.
- If you want to track that particular task, store the returned `uid` and then emit a long-poll call on that particular task at a given interval (see the sketch below).

As you may already know, at the moment, when a dump finishes importing, it shows the history of the tasks that took place in the instance that generated the dump.
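To make that flow concrete, here is a minimal client-side polling sketch in Python. It assumes the proposed `dumpImport` task type and the `type`/`status` filters described above actually land; the `results`, `uid`, and `status` fields follow the shape of the existing `/tasks` responses.

```python
import time

import requests

MEILI_URL = "http://localhost:7700"  # hypothetical instance address


def wait_for_dump_import(poll_interval: float = 5.0) -> dict:
    """Poll /tasks until the dumpImport task leaves the processing state."""
    # Find the running dump import task (assumes exactly one is in flight).
    resp = requests.get(
        f"{MEILI_URL}/tasks",
        params={"type": "dumpImport", "status": "processing"},
    )
    resp.raise_for_status()
    uid = resp.json()["results"][0]["uid"]

    # Long-poll that particular task at a fixed interval.
    while True:
        task = requests.get(f"{MEILI_URL}/tasks/{uid}").json()
        if task["status"] in ("succeeded", "failed"):
            return task
        time.sleep(poll_interval)
```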
Problem
Since the dump import task must have a `uid`, we could end up with an inconsistent history. If Meilisearch assigns it the uid `0`, we will have to shift all the `uid`s in the dump history by one increment, and the dump import will appear as the very first task. This is a problem if tasks are enqueued during the import of a dump: it will generate `uid` collisions, as the toy illustration below shows.
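A minimal illustration of that collision, with made-up uids:

```python
# Toy illustration of the uid collision (all numbers are made up).
dump_history = [0, 1, 2]        # task uids stored inside the dump
import_task_uid = 0             # naive uid for the dump import task itself

# Shifting the imported history to make room for the import task...
shifted_history = [uid + 1 for uid in dump_history]  # -> [1, 2, 3]

# ...collides with tasks enqueued while the import is running,
# which get the next "fresh" uids after the import task's uid 0.
enqueued_during_import = [1, 2]
print(set(shifted_history) & set(enqueued_during_import))  # {1, 2}
```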
Potential solution
We could parse the contents of the dump, determine the `uid` of the last task it contains, and make the `uid` of the dump import task equal to `lastDumpTaskId + 1`. While the dump is being imported, `/tasks` would already return the `dumpImport` task; after the import, it would show the imported history with the dump import task appearing right after it, as `lastDumpTaskId + 1`. This ensures that there are no collisions between tasks that might be enqueued during the import of a dump and the history of the dump after it is imported. A rough sketch of this `uid` assignment is given below.
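Here is a rough sketch of that `uid` assignment in Python, assuming the dump's task history can be read as a JSON-lines file with one task object per line, each carrying a `uid` field (the real dump layout is internal and may differ):

```python
import json


def dump_import_task_uid(tasks_file: str) -> int:
    """Assign the dump import task the uid right after the dump's last task.

    Assumes `tasks_file` is a JSON-lines file with one task object per line,
    each carrying a `uid` field; the actual dump format may differ.
    """
    last_uid = -1
    with open(tasks_file) as f:
        for line in f:
            task = json.loads(line)
            last_uid = max(last_uid, task["uid"])
    # lastDumpTaskId + 1: no collision with the imported history, and tasks
    # enqueued during the import continue numbering from there.
    return last_uid + 1
```

Tasks enqueued while the dump is being imported would then be numbered from `lastDumpTaskId + 2` onwards, so neither they nor the imported history can collide with the import task.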
What do you think about it? Is this a solution that would be viable and could meet your needs?
Questions
1. Is the current pain point something critical to be solved, @meilisearch/cloud-team? We see interesting additions that could come with the proposed solution, but these additions do not answer a directly expressed need.
2. In case a dump fails, the server will still be up and will process the next updates that have been enqueued. Is this a problem, as long as it's possible to know that the dump failed and it may be enough to start over? My biggest concern is that this drastically obscures the outcome of the dump import for people who don't track the task and might think that a dump succeeded when this is not necessarily the case.
3. Is there any other solution that we could think of?

After writing this, I can think of another solution, which would be:
Expose a subset of API endpoints
Only expose `/health` or `/stats` to indicate that indexing is in progress, and stop the server if the import fails. The drawback of this solution is that it's impossible to see in `/tasks` when a dump has been imported for an instance, even though that endpoint is dedicated to exactly this: tracking asynchronous task completion. It's also not possible to start giving work to Meilisearch in advance (Edit: nice-to-have). The advantage is that we can be sure that Meilisearch will not do anything more if the dump import fails. A sketch of a client-side check under this scheme follows.
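For completeness, a client-side check under this scheme could look like the sketch below. The `{"status": "available"}` payload matches the existing `/health` response; how an in-progress import would be reported on `/health` is left as an assumption here.

```python
import time

import requests

MEILI_URL = "http://localhost:7700"  # hypothetical instance address


def wait_until_available(poll_interval: float = 5.0) -> None:
    """Poll /health until the server reports it is available again."""
    while True:
        try:
            resp = requests.get(f"{MEILI_URL}/health", timeout=2)
            if resp.ok and resp.json().get("status") == "available":
                return  # dump import finished, regular traffic can resume
        except requests.ConnectionError:
            # No answer at all: under this alternative, the server stopped
            # because the dump import failed, so the caller can start over.
            pass
        time.sleep(poll_interval)
```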
Pinging @Kerollmops for tracking