Hi,

Using the LossScaleOptimizer fails for MirroredStrategy with the following exception:
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
in user code:
File "/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/trainer.py", line 105, in one_step_on_data **
return self.train_step(data)
File "/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/trainer.py", line 72, in train_step
self.optimizer.apply_gradients(zip(gradients, trainable_weights))
File "/usr/local/lib/python3.10/dist-packages/keras/src/optimizers/base_optimizer.py", line 206, in apply_gradients
self.apply(grads, trainable_variables)
File "/usr/local/lib/python3.10/dist-packages/keras/src/optimizers/loss_scale_optimizer.py", line 183, in apply
ops.cond(finite, handle_finite_grads, handle_non_finite_grads)
File "/usr/local/lib/python3.10/dist-packages/keras/src/ops/core.py", line 594, in cond
return Cond()(pred, true_fn, false_fn)
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 123, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/optimizer.py", line 82, in _internal_apply_gradients
tf.__internal__.distribute.interim.maybe_merge_call(
RuntimeError: Exception encountered when calling Cond.call().
`merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`. If you are subclassing a `tf.keras.Model`, please avoid decorating overridden methods `test_step` and `train_step` in `tf.function`.
The reason for the exception is the following tf.cond() call in keras/keras/optimizers/loss_scale_optimizer.py, line 183 at commit cb65582:

ops.cond(finite, handle_finite_grads, handle_non_finite_grads)

To reproduce, change line 44 of keras/integration_tests/tf_distribute_training_test.py (at commit cb65582) so that the test wraps its optimizer in a LossScaleOptimizer. Alternatively, you can turn on the GPU and use mixed precision, which then uses the LossScaleOptimizer automatically.
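For reference, here is a minimal sketch of such a reproduction, assuming a CPU-only machine with two logical devices; the model, layer sizes, and data are placeholders rather than the actual contents of tf_distribute_training_test.py:

```python
import numpy as np
import tensorflow as tf
import keras

# Expose two logical CPU devices so MirroredStrategy has replicas to mirror across.
cpu = tf.config.list_physical_devices("CPU")[0]
tf.config.set_logical_device_configuration(
    cpu, [tf.config.LogicalDeviceConfiguration()] * 2
)

strategy = tf.distribute.MirroredStrategy(["CPU:0", "CPU:1"])
with strategy.scope():
    model = keras.Sequential(
        [keras.layers.Dense(16, activation="relu"), keras.layers.Dense(1)]
    )
    # Wrapping the inner optimizer is what triggers the merge_call RuntimeError.
    optimizer = keras.optimizers.LossScaleOptimizer(keras.optimizers.SGD())
    model.compile(optimizer=optimizer, loss="mse")

x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, batch_size=16, epochs=1)  # RuntimeError: `merge_call` called while defining a new graph ...
```

On a machine with a GPU, calling keras.mixed_precision.set_global_policy("mixed_float16") before building the model should exercise the same code path, since mixed precision wraps the optimizer in a LossScaleOptimizer automatically.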
For me, gradient accumulation does not work with Adam, with or without use_ema, and results in the same error. It feels like it applies to the base optimizer in general. Opened #20582.
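For completeness, a sketch of that gradient-accumulation variant under the same two-logical-CPU setup; the step count and model are placeholders, and the gradient_accumulation_steps argument assumes a recent Keras 3 release:

```python
import numpy as np
import tensorflow as tf
import keras

cpu = tf.config.list_physical_devices("CPU")[0]
tf.config.set_logical_device_configuration(
    cpu, [tf.config.LogicalDeviceConfiguration()] * 2
)

strategy = tf.distribute.MirroredStrategy(["CPU:0", "CPU:1"])
with strategy.scope():
    model = keras.Sequential([keras.layers.Dense(1)])
    # Plain Adam with gradient accumulation (no LossScaleOptimizer involved)
    # reportedly raises the same `merge_call` RuntimeError under MirroredStrategy.
    optimizer = keras.optimizers.Adam(gradient_accumulation_steps=2)
    model.compile(optimizer=optimizer, loss="mse")

x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, batch_size=16, epochs=1)
```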