GPU compatibility #142
I think this would be interesting. Since CUDA has dropped support for macOS, that stymies my involvement. (I guess I would by default be in favour of a different API like OpenCL or Metal.) The C library uses real-to-real FFTs, which are not supported in cuFFT (nor in MKL, for that matter!) (https://forums.developer.nvidia.com/t/newbie-to-cufft-how-to-do-real-to-real-transforms/69952), but those are mainly a convenience for the programmer, and workarounds could be found. I also think that GPU computations are best performed synchronously, with the same amount of work across all threads, which is not always the case here. FYI, SHTns is supposed to work on the GPU: https://bitbucket.org/nschaeff/shtns/src/master/
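As a rough sketch of the kind of workaround I mean (assuming CUDA.jl on an NVIDIA machine; `gpu_dct2` is just an illustrative name): a DCT-II, which FFTW exposes as the real-to-real kind `REDFT10`, can be emulated with a complex FFT of the even-symmetric extension plus a phase correction:

```julia
using CUDA, FFTW   # FFTW provides the fft generic; CUDA.jl's CUFFT extends it for CuArrays

# Sketch: DCT-II (FFTW's REDFT10) on the GPU without a real-to-real transform,
# via a complex cuFFT of the even-symmetric extension and a phase twiddle.
function gpu_dct2(x::CuVector{Float64})
    N = length(x)
    y = ComplexF64.(vcat(x, reverse(x)))     # even extension, length 2N
    Y = fft(y)                               # complex transform, runs on cuFFT
    k = CuArray(collect(0:N-1))
    phase = exp.(-im * π .* k ./ (2N))
    return real.(phase .* Y[1:N])            # ≈ FFTW.r2r(Array(x), FFTW.REDFT10)
end
```

This doubles the transform length, so it is not free, but it keeps everything on the device.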
Thanks for the quick response. I think the argument for GPU computations here is not only the speed-up itself (which, judging by the SHTns benchmarks, does seem to be there; thank you for the link), but also the case where the transform is used inside high-dimensional PDE solvers or ML models that profit massively from GPUs. There, the overhead of transferring data back and forth between host and device memory would probably be quite costly. CUDA seems to be the best-supported GPU API for Julia, which is why I was assuming it. (I also normally develop on macOS, but luckily I have access to an HPC system with some NVIDIA cards.) I'll definitely keep an eye open for it.
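To make the transfer-cost point concrete, here is a rough sketch (arbitrary sizes, assuming CUDA.jl) comparing a host/device round trip against applying a pre-built cuFFT plan to data that already lives on the device:

```julia
using CUDA, FFTW

# Sketch: compare the cost of a host<->device round trip with the cost of
# applying a pre-built cuFFT plan to data already on the GPU.
# Sizes are arbitrary; a real benchmark would also warm up and repeat runs.
x_host = rand(ComplexF64, 1024, 1024)
x_dev  = CuArray(x_host)
p      = plan_fft!(x_dev)            # cuFFT plan, built once

t_copy = @elapsed CUDA.@sync begin
    d = CuArray(x_host)              # host -> device
    copyto!(x_host, d)               # device -> host
end
t_fft = @elapsed CUDA.@sync (p * x_dev)   # transform stays on the device
@show t_copy t_fft
```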
So if there were a pure Julia implementation, that could easily be put on a GPU to get big gains, e.g. for the FFT.
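For what it's worth, generic Julia code often runs on a CuArray unchanged; a toy sketch (assuming CUDA.jl, `axpy_demo!` is just an illustrative name):

```julia
using CUDA

# Sketch: the same generic Julia function works on CPU arrays and CuArrays,
# because broadcasting is compiled to a GPU kernel for the latter.
axpy_demo!(y, a, x) = (y .+= a .* x; y)

x_cpu, y_cpu = rand(10^6), rand(10^6)
axpy_demo!(y_cpu, 2.0, x_cpu)                 # runs on the CPU

x_gpu, y_gpu = CuArray(x_cpu), CuArray(y_cpu)
axpy_demo!(y_gpu, 2.0, x_gpu)                 # same code, runs on the GPU
```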
I wonder whether it would be realistic, and/or a goal, to make the library GPU-compatible. By that I mean only the crucial part: applying a pre-computed plan to a CUDA array (CuArray).
This is probably a bit tricky in the C library. While the FFTW parts could probably be bound to the appropriate CUDA implementations (there is cuFFT), the other plans would need adjustments. Personally I have no experience with CUDA in C, but I have some in Julia; I looked at the old pure-Julia version of the SH plans, and it seemed at least plausible that this would be doable there, though maybe I overlooked something.
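For the FFTW part at least, the "apply a pre-computed plan to a CuArray" pattern seems to already exist via the AbstractFFTs interface that CUDA.jl implements; a minimal sketch of what I have in mind (assuming CUDA.jl):

```julia
using CUDA, FFTW   # FFTW re-exports the AbstractFFTs plan interface

# Sketch: the plan-based workflow for the FFT piece only.
# plan_fft dispatches to cuFFT for CuArrays, so applying the plan keeps
# everything on the device; the package's non-FFT plans would need
# analogous CuArray methods.
x = CuArray(rand(ComplexF64, 256, 256))
p = plan_fft(x)        # pre-computed plan backed by cuFFT
y = p * x              # apply the plan; result is again a CuArray
x_back = p \ y         # inverse transform via the same plan
```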