The following conditional precludes the usage of gradient accumulation under a distributed strategy in TensorFlow:

keras/keras/src/optimizers/base_optimizer.py, lines 459 to 465 at ab53ed2

The exception is as follows:

RuntimeError: Exception encountered when calling Cond.call().

merge_call called while defining a new graph or a tf.function. This can often happen if the function fn passed to strategy.run() contains a nested @tf.function, and the nested @tf.function contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function fn uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested tf.functions or control flow statements that may potentially cross a synchronization boundary, for example, wrap the fn passed to strategy.run or the entire strategy.run inside a tf.function or move the control flow out of fn. If you are subclassing a tf.keras.Model, please avoid decorating overridden methods test_step and train_step in tf.function.
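For context, the kind of setup that runs into this is roughly the following. This is a minimal sketch rather than a reproduction taken from the issue: the model, data, and accumulation step count are made up, it assumes the TensorFlow backend, and whether the error actually surfaces may depend on the device configuration.

```python
# Hypothetical sketch: gradient accumulation combined with a distributed
# strategy on the TensorFlow backend.
import numpy as np
import tensorflow as tf
import keras

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = keras.Sequential([keras.layers.Dense(1)])
    model.compile(
        # gradient_accumulation_steps activates the conditional in question.
        optimizer=keras.optimizers.SGD(gradient_accumulation_steps=4),
        loss="mse",
    )

x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# Under the distributed strategy, the accumulation branch reaches a
# synchronization point inside control flow and raises the RuntimeError above.
model.fit(x, y, batch_size=16, epochs=1)
```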
This probably has something to do with this one:

keras/keras/src/backend/tensorflow/optimizer.py, lines 212 to 217 at ab53ed2

One could perhaps rewrite it as an implicit conditional via math manipulations: the code will be executed unconditionally but will lead to different outcomes depending on whether it is the end of an accumulation round or not.
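For illustration, such an implicit conditional could look something like the sketch below. This is not the Keras implementation; the function and variable names are made up.

```python
# Illustrative sketch of a branchless accumulation step: a 0/1 mask replaces
# the explicit `if` / cond, so the same arithmetic runs on every iteration.
import tensorflow as tf

def accumulation_step(grad, accumulator, step, accumulation_steps):
    """Return (gradient to apply now, updated accumulator) without branching."""
    # 1.0 at the end of an accumulation round, 0.0 otherwise.
    is_update_step = tf.cast((step + 1) % accumulation_steps == 0, grad.dtype)
    accumulator = accumulator + grad
    # Averaged accumulator on update steps, zero gradient otherwise.
    applied = is_update_step * accumulator / tf.cast(accumulation_steps, grad.dtype)
    # Reset the accumulator once its contents have been applied.
    accumulator = (1.0 - is_update_step) * accumulator
    return applied, accumulator
```

Whether applying a zero gradient on the intermediate steps is acceptable would still need checking, since stateful optimizers such as Adam update their moments and iteration counter on every call.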
IvanUkhov changed the title from "Remove the explicit conditional in the gradient accumulation" to "Rethink the conditional in the gradient accumulation" on Dec 3, 2024.
> One could perhaps rewrite it as an implicit conditional via math manipulations: the code will be executed unconditionally but will lead to different outcomes depending on whether it is the end of an accumulation round or not.
Indeed -- I vaguely recall I implemented it in this way at some point. I don't remember why I changed it though.
@fchollet, is it not a little strange that it does not work in a distributed setting, given that tf.distribute is used profusely in the code related to gradient accumulation? Are there tests for this? Perhaps it fails only in some specific cases, such as when using GPUs? Just trying to make sure we are not jumping on a nonexistent problem.