This repository has been archived by the owner on Nov 1, 2024. It is now read-only.
Hi, I appreciate this package is experimental, but I came across the following strange behaviour on a V100 GPU. Above a certain size in the first dimension, the batched Jacobian no longer works on the GPU, while continuing to work on the CPU. The batch dimension doesn't appear to make a difference.
MWE

```julia
using BatchedRoutines, ForwardDiff, CUDA, Random

f(u) = u .^ 2

u = rand(50, 128)
BatchedRoutines.batched_jacobian(AutoForwardDiff(), f, u); # CPU, works
u = CuArray(u)
BatchedRoutines.batched_jacobian(AutoForwardDiff(), f, u); # GPU, works

u = rand(51, 128)
BatchedRoutines.batched_jacobian(AutoForwardDiff(), f, u); # CPU, works
u = CuArray(u)
BatchedRoutines.batched_jacobian(AutoForwardDiff(), f, u); # GPU, fails
```
To explain the issue: ForwardDiff's maximum chunk size is 12, but NTuple doesn't specialize up to size 12, so this leads to a dynamic dispatch and the CUDA code fails to compile. Lux forces the maximum chunk size to be 8.
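A possible workaround along those lines is to cap the chunk size explicitly; this is a sketch assuming `batched_jacobian` respects the `chunksize` keyword of ADTypes' `AutoForwardDiff` (it requires a CUDA-capable GPU to run):

```julia
using BatchedRoutines, ForwardDiff, CUDA

f(u) = u .^ 2
u = CuArray(rand(51, 128))

# Cap ForwardDiff's chunk size at 8 (as Lux does) so the dual-number
# NTuple code stays type-stable and the CUDA kernel can compile.
BatchedRoutines.batched_jacobian(AutoForwardDiff(; chunksize = 8), f, u)
```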
Stack trace