context.df.Task.any launches tasks even if the task was already launched and sometimes returns a result from a different Task #574
Comments
Hey @castrodd, could you please take a look? Sorry for pinging you, but I see some recent activity from you on this repository. Thanks.
Or maybe @davidmrdavid, you could take a look at this one? P.S. Sorry for pinging you :) Thanks
@Ayzrian I will take a look soon and get back to you. Thank you for your patience!
Hey @castrodd, I wonder if you have any updates on it?
Hey team, it has been more than a month already; do we have any updates on this issue? Best regards
Hi @Ayzrian - apologies for the delay here, it's been busy. I'll coordinate internally with @castrodd to help debug this one. There's a lot to grok here, but I think I recognize, from the Python SDK, the first behavior you mentioned: a task inside a `Task.any` gets scheduled again even though it was already launched.

This bug is in part a technical limitation of the legacy protocol the JS SDK uses to communicate with the C# Durable Functions code - the inter-process/out-of-process protocol does not have a notion of "Task ID", meaning the C# Durable Functions code is not able to filter out repeated sub-tasks in a `Task.any` and instead believes all sub-tasks are brand-new task scheduling requests. The latest out-of-process protocol handles this edge case better, but incorporating it will take substantial effort, so that is not a short-term fix.

One way to fix this is to filter out, on the SDK side, any `Task.any` sub-tasks that have already been scheduled so they are not sent over the protocol again (as otherwise they'll be re-scheduled), but that creates other edge cases. Still, from our experience in the Python SDK, I've come to believe that performing this SDK-side filtering is the better trade-off, and that the new edge cases it introduces are less likely to be encountered by users. I'll work with @castrodd to develop that fix.

All that said, I don't quite understand your second bug report:
Can you break this down further? A small example with concrete inputs and outputs would help me understand. Unfortunately, I can't quite download the attached pictures, and zooming in with the browser makes them distorted, so I can't review the logs you shared. Thanks!
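For illustration only, a purely conceptual sketch of the SDK-side filtering idea described above might look like the following. This is not the SDK's real internals; the names `schedulingId` and `sendSchedulingActionToHost` are hypothetical stand-ins for however the SDK identifies tasks and talks to the host.

```js
// Conceptual sketch only - NOT the actual durable-functions SDK internals.
// Idea: remember which Task.any sub-tasks were already scheduled and only
// send brand-new ones over the out-of-process protocol.
const alreadyScheduled = new Set();

function scheduleTaskAnyChildren(subTasks, sendSchedulingActionToHost) {
    for (const task of subTasks) {
        if (alreadyScheduled.has(task.schedulingId)) {
            // Previously scheduled: keep awaiting its completion, do not re-schedule.
            continue;
        }
        alreadyScheduled.add(task.schedulingId);
        sendSchedulingActionToHost(task);
    }
}

module.exports = { scheduleTaskAnyChildren };
```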
Describe the bug
First, I would like to explain the idea I am trying to achieve with Durable Functions. I have two activities:
- `activity-get-next-products-batch` - resolves a batch of products from the data source and places it into a blob storage file
- `activity-process-products` - processes a batch of products produced by the previous activity; in the real world this can run really long

I want to achieve a result where processing of each batch starts as soon as possible, but at the same time we keep producing the next batch.
So when we resolve a batch, we do two actions:
- schedule `activity-get-next-products-batch` to produce the next batch
- schedule `activity-process-products` for processing of the current batch; this task is pushed into the array of processing tasks to be awaited

Then we call `Task.any` to wait for any of the jobs to finish (keep in mind that there can ideally be multiple `activity-process-products` tasks running). In this set-up `Task.any` can return two kinds of tasks:
- `activity-process-products` - then we filter the finished job from the list of tasks
- `activity-get-next-products-batch` - then we schedule both jobs again and keep waiting for the previously unfinished jobs with a new call to `Task.any`
My expectation is that if a job is already launched it won't be launched again; we will keep waiting for it to finish. But what happens in reality is that the `activity-get-next-products-batch` invocation runs longer than `activity-process-products`, and when the code just wants to wait for the batch to arrive, the runtime schedules another job rather than waiting for the one already scheduled. See the Gantt chart obtained from Durable Functions Monitor.

From the logs, we start the job as `TaskEventId 2`; it runs longer than another job started for processing of products. Then, when processing is finished, we just wait for the next batch to arrive, but the runtime schedules the task again rather than waiting for the already launched task... See that the `input` is the same, but the `TaskEventId` is `3` this time. The code will be attached below.

Another issue that I see is that when both tasks passed to `any` finish around the same time, the `results` are returned from the wrong Task, even though the code checks that the Task being returned `===` the task that was scheduled to get the batch. On the screenshot below you can see that both tasks finished but have different output. Then we see the "Get Products Task Finished" log, but what actually happens is that `Task.any` returned a result from a completely different task.

Investigative information
If deployed to Azure App Service
To Reproduce
Steps to reproduce the behavior:
The code of the orchestrator:
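The original attachment is not reproduced here. As a stand-in, below is a minimal sketch of the orchestrator pattern described above, written against the durable-functions v2 generator API; the `{ cursor, hasMore }` batch shape is an assumption for illustration, and the `finished === getBatchTask` check with the "Get Products Task Finished" log mirrors the check mentioned in the second issue.

```js
const df = require("durable-functions");

module.exports = df.orchestrator(function* (context) {
    let processingTasks = [];

    // Kick off the first batch fetch. The { cursor, hasMore } batch shape is assumed.
    let getBatchTask = context.df.callActivity("activity-get-next-products-batch", { cursor: null });

    while (getBatchTask !== null || processingTasks.length > 0) {
        const pending = getBatchTask !== null ? [getBatchTask, ...processingTasks] : processingTasks;

        // Wait for whichever job finishes first.
        const finished = yield context.df.Task.any(pending);

        if (finished === getBatchTask) {
            // The batch fetch finished: start processing it and fetch the next batch.
            context.log("Get Products Task Finished");
            const batch = finished.result;

            processingTasks.push(context.df.callActivity("activity-process-products", batch));
            getBatchTask = batch.hasMore
                ? context.df.callActivity("activity-get-next-products-batch", { cursor: batch.cursor })
                : null;
        } else {
            // A processing task finished: drop it from the list and keep waiting for the rest.
            processingTasks = processingTasks.filter((task) => task !== finished);
        }
    }
});
```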
Expected behavior
a) `Task.any` doesn't schedule a new task for an already launched task.
b) `Task.any` returns the valid result for the task rather than a result from a different task.
Actual behavior
a) `Task.any` schedules a new task for an already launched task.
b) `Task.any` returns an invalid result for the task.
Screenshots
See above in the bug description
Known workarounds
MAYBE this could be worked around if I treat each activity as a sub-orchestrator in "singleton" mode, where each of them has an `id`, and then the parent somehow does similar logic with the different sub-orchestrators...
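A rough sketch of that workaround might look like the following, assuming the durable-functions `callSubOrchestrator(name, input, instanceId)` overload; the sub-orchestrator name and the instance-ID scheme are hypothetical.

```js
const df = require("durable-functions");

module.exports = df.orchestrator(function* (context) {
    // Pin the child orchestration to a stable, explicit instance id so the
    // "get next batch" work behaves like a singleton per parent instance.
    const fetcherInstanceId = `${context.df.instanceId}:get-next-products-batch`;

    const getBatchTask = context.df.callSubOrchestrator(
        "orchestrator-get-next-products-batch", // hypothetical sub-orchestrator name
        { cursor: null },
        fetcherInstanceId
    );

    // The same Task.any loop as in the main sketch would go here, with each
    // long-running unit now being a sub-orchestration with its own id.
    const finished = yield context.df.Task.any([getBatchTask]);
    return finished.result;
});
```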