context.df.Task.all throws as soon as one task fails so one can't get the result of other tasks #477
@GP4cK: I noticed that the reproducer has what looks like an explicit storage account connection string. You may want to remove it if so. Looking at this issue shortly.
Ah, you've stumbled upon a bit of a low-level detail in the implementation. Let me try to clarify what's going on and how to achieve the result you want. When you schedule tasks with `context.df.Task.all`, the combinator throws as soon as any one of its tasks fails. That's pretty much what's happening here: your orchestrator is no longer waiting for the other results. It should be noted that this is the intended behavior. Given today's API, to obtain the results of your other tasks, you need to `yield` each of them individually after catching the `Task.all` error. To summarize: you can use a try/catch around `Task.all` and then yield each task on its own to retrieve its result or its error. Please let me know if that clarifies your question. Thanks! FYI @hossam-nasr. This is an interesting case. Perhaps something we can facilitate in future releases.
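The failure semantics described here mirror how `Promise.all` behaves in plain JavaScript. The sketch below is a rough analogy only, not the Durable Functions runtime: the combinator rejects as soon as one input rejects, yet the other promises still settle on their own, and their outcomes can be recovered by awaiting each one individually.

```javascript
// Plain-Promise analogy (NOT the Durable Functions API): Promise.all rejects
// as soon as one input rejects, but the other promises still settle, so their
// results can be recovered afterwards by awaiting each one individually.
async function demo() {
  const tasks = [
    Promise.resolve('ok-1'),
    Promise.reject(new Error('boom')),
    Promise.resolve('ok-3'),
  ];

  let allFailed = false;
  try {
    await Promise.all(tasks);
  } catch (err) {
    allFailed = true; // a single rejection fails the whole combinator
  }

  // Recover the individual outcomes afterwards.
  const results = [];
  for (const task of tasks) {
    try {
      results.push(await task);
    } catch (err) {
      results.push(`failed: ${err.message}`);
    }
  }
  return { allFailed, results };
}
```

Here `demo` is a hypothetical name for illustration; the important part is that the per-task loop after the catch still yields `ok-1` and `ok-3` even though the combined await threw.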
That's the default connection string for Azurite, so it's fine. Thanks for the explanation. So if I translate it to code, this should be it, right?

```js
try {
  yield context.df.Task.all(tasks);
} catch (err) {
  // One task failed
}
for (const task of tasks) {
  try { // Could `yield task` throw? Do I need to use try/catch?
    yield task;
    // Do something with task.result
  } catch (err) {
    // Ignore the failed task
  }
}
```

Or should I filter the tasks that haven't completed yet and call `Task.all` recursively, like:

```js
const pendingTasks = tasks.filter((task) => !task.isCompleted && !task.isFaulted);
if (pendingTasks.length === 0) return;
try {
  yield context.df.Task.all(pendingTasks);
} catch (err) {
  // check again whether there are pending tasks
}
```
I recommend option 1, as it guarantees that, at the end of the loop, all your tasks have completed. With your second snippet, you have to check again and again on different subsets of `pendingTasks`, as you noted.
I see. But performance-wise, it would be roughly the same, since the tasks were initially scheduled by the orchestrator, right?
Yes, that's correct. Only the first `yield` actually schedules the work; yielding an already-scheduled task afterwards just reads its recorded outcome from the orchestration history.
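In case it helps, here is a toy model of the replay mechanism behind this. It is purely illustrative, not the actual durable-functions implementation, and `ToyTaskStore` is a made-up name: task outcomes are recorded in a history table, so only the first execution of an activity does real work, and reading the same task again is essentially free.

```javascript
// Toy model of durable replay (illustrative only): each activity call is
// memoized in a "history" map keyed by task id, so real work runs at most
// once per task; subsequent reads just return the recorded outcome.
class ToyTaskStore {
  constructor() {
    this.history = new Map(); // taskId -> { status, value }
    this.executions = 0;      // counts real invocations
  }

  // Schedule (or replay) an activity: fn runs only the first time taskId is seen.
  run(taskId, fn) {
    if (!this.history.has(taskId)) {
      this.executions += 1;
      try {
        this.history.set(taskId, { status: 'completed', value: fn() });
      } catch (err) {
        this.history.set(taskId, { status: 'faulted', value: err });
      }
    }
    return this.history.get(taskId);
  }
}
```

Running the same task id twice only increments `executions` once, which is the toy analogue of "only the first yield incurs the scheduling cost".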
Thank you very much for your help. It would be really nice if this were documented, because there are other issues that are confusing, e.g. #66 (comment).
It appears like |
Hi @davidmrdavid, I have a similar scenario where I want to schedule multiple parallel activity functions, where some can fail or throw an error, and that's fine, as I will only use the results of the ones that succeeded. The issue I'm noticing with the proposed solution to @GP4cK's code is that some activity functions that eventually succeed are being called multiple times. I was assuming the later loop that iterates over the tasks simply continues or fetches the result, but it seems that when the replay happens on the orchestrator function, some activity functions are run again. I noticed this by putting breakpoints and logs inside the activity functions. I can provide sample code if needed. Is this expected behavior? Is there anything I can do to prevent re-calling the activity functions? I.e., only run each activity function once, regardless of its result. Thank you
Hi @danieltskv, that is definitely not expected behavior. Do you think you could open a new issue, link this ticket in it, and provide some reproducer code? That would help us get to the bottom of this. Thanks!
@davidmrdavid
+1 for the
@davidmrdavid, I have attempted to apply solution 1 (from @GP4cK) in Python. It may work for Node.js, but it doesn't seem to work for Python. Here is the code I'm running; I can move this to a new issue in https://github.com/Azure/azure-functions-durable-python/issues if requested.

orchestrator:

```python
def orchestrator_function(context: df.DurableOrchestrationContext):
    tasks = [
        context.call_activity('Hello1'),
        context.call_activity('Hello1'),
        context.call_activity('Hello1')
    ]
    try:
        allResults = yield context.task_all(tasks)
    except Exception as e:
        # One of the tasks threw an exception, so the orchestrator is no
        # longer waiting for the tasks to all complete. Need to re-attach
        # the orchestrator to the tasks to wait for them:
        # https://github.com/Azure/azure-functions-durable-js/issues/477#issuecomment-1428850454
        for task in tasks:
            try:
                result = yield task
            except Exception as e:
                pass
    return 'done'

main = df.Orchestrator.create(orchestrator_function)
```

Hello1 activity function:

```python
def main(name: str) -> str:
    raise Exception('test error')
```

After running the orchestrator function, I receive the following response
Pretty sure this relates to this comment of yours. That issue may resolve my issue as well? Looks like you had said it would take a couple of weeks a while ago and it hasn't been resolved; is there a timeline for this? Is there a workaround for this currently? I want to have the results of all successful tasks even if some of the tasks raised an exception. The only way I can think of currently is to run some non-orchestrator code that externally calls the activity function for each task and tracks each task individually. This way the exception handling will be per call, not for the whole `task_all`.
Hi @IDrumsey: Yes, it would be great if you could re-open that request in the Python SDK repo (https://github.com/Azure/azure-functions-durable-python/issues) and attach a reproducer.
Describe the bug
`context.df.Task.all` throws as soon as one task fails, so one can't get the result of other tasks. I've made a minimum reproducible repo here: https://github.com/GP4cK/azure-durable-task-all-bug.

Investigative information
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I'm expecting to be able to loop over the tasks and that, for each task, `task.isCompleted || task.isFaulted` should be true and `task.result` should not be undefined (if the Activity/SubOrchestration is returning something).

Actual behavior
When the first task fails, the second task is neither completed nor faulted, and its result is undefined.
Known workarounds
I guess the workaround is to not use the fan-out fan-in pattern and use the chaining pattern instead.
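The chaining workaround can be sketched with a plain-promise analogy. This is illustrative only, and the `chain` helper below is a made-up name, not part of the Durable SDK: run the steps one at a time and handle each failure individually, trading fan-out parallelism for per-task error handling.

```javascript
// Sequential "chaining" analogy in plain promises (illustrative only): each
// step is awaited and error-handled on its own, so one failure never hides
// the results of the others -- at the cost of running tasks one at a time.
async function chain(fns) {
  const outcomes = [];
  for (const fn of fns) {
    try {
      outcomes.push({ ok: true, value: await fn() });
    } catch (err) {
      outcomes.push({ ok: false, error: err.message });
    }
  }
  return outcomes;
}
```

With this shape, every step's outcome is captured, which is what the issue asks `Task.all` to make possible in the fan-out/fan-in pattern.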