Resources for the introductory GPU programming course of the HPC-AI specialized master's program (mastère spécialisé HPC-AI)
- Scientific Programming and Computer Architecture book by Divakar Viswanath
- https://github.com/divakarvi/bk-spca : source associated to the book
- Definition of latency oriented architecture
- Seven Dwarfs of HPC
- CSCI-5576/4576: High Performance Scientific Computing
- Mark Horowitz talk at ISSCC 2014: Computing's energy problem
- Introduction to High-Performance Scientific Computing, book and slides by Victor Eijkhout and The Art of HPC website
- San Diego Summer institute
- Finnish CSC summer school
- Computational Physics book by K. N. Anagnostopoulos
- Modern computer architecture slides, see e.g. slides Intro_Architecture.pdf
- Structure and Interpretation of Computer Programs
- NVIDIA's latest CUDA programming guide
- Julich training on CUDA
- Oxford training on CUDA
- Swiss CSCS summer school
- Amina Guermouche (Telecom Paris)
- EPCC, Univ Edinburgh, GPU training
- ARCHER GPU course
- Univ Luxembourg HPC
- SC19 Introduction to GPU programming with CUDA
- https://codingbyexample.com/category/cuda/
- http://turing.une.edu.au/~cosc330/lectures/display_notes.php?lecture=18
- https://www.nersc.gov/users/training/gpus-for-science/
- https://dl.acm.org/citation.cfm?id=3318192
- git@bitbucket.org:hwuligans/gputeachingkit-labs.git
- http://syllabus.gputeachingkit.com/
- udemy/cuda-programming-masterclass
- SDL2 graphical user interface : https://github.com/rogerallen/smandelbrotr
- mgbench : a multi-GPU benchmark
- performance analysis : parallelforall blog on Nsight
- misc : convert CUDA to portable C++ for AMD GPU
- List of Nvidia GPUs
- https://github.com/ashokyannam/GPU_Acceleration_Using_CUDA_C_CPP
- https://github.com/karlrupp/cpu-gpu-mic-comparison
- https://perso.centrale-marseille.fr/~gchiavassa/visible/HPC/01%20-%20GR%20%20Intro%20to%20GPU%20programming%20V2%20OpenACC%20.pdf
- https://devblogs.nvidia.com/using-nsight-compute-to-inspect-your-kernels/
- https://www.olcf.ornl.gov/wp-content/uploads/2019/08/NVIDIA-Profilers.pdf
- http://on-demand.gputechconf.com/gtc/2017/presentation/s7445-jakob-progsch-what-the-profiler-is-telling-you.pdf
- monitoring performance : https://github.com/NERSC/timemory
- roofline model
- C++ wrapper library
- template CMake project for CUDA
- Multi-GPU programming from FZJ
- Multi-GPU programming from Nvidia
- CUDA Library samples (cuFFT, cuSolver, cuSparse, ...)
- MatX, a GPU-Accelerated Numerical Computing C++ library
- (NEW 2021) legate and cuNumeric
- cuNumeric: drop-in replacement for NumPy, built on top of Legion
- stdpar + cython
- Numba, and a recommended Numba tutorial for GPU programming
- CuPy
- pycuda
- python / C++ CUDA interface (SWIG and Cython)
- python / C++ CUDA interface with pybind11
- PythonHPC
- HPC Python videos
- Hands-On GPU Programming with Python and CUDA and examples
- 2020-geilo-gpu-python
- Numba introduction
- https://towardsdatascience.com/fast-data-augmentation-in-pytorch-using-nvidia-dali-68f5432e1f5f
- https://ep2019.europython.eu/media/conference/slides/fX8dJsD-distributed-multi-gpu-computing-with-dask-cupy-and-rapids.pdf
- https://github.com/NVIDIA/DeepLearningExamples
- https://github.com/chagaz/hpc-ai-ml-2019
- tensorflow tutorial
- AI cheatsheet
- m2dsupsdlclass
- deep-learning-with-python-notebooks
- https://d2l.ai/
- Building a neural network FROM SCRATCH (no Tensorflow/Pytorch, just numpy & math)
- Lagaris et al., Artificial Neural Networks for Solving Ordinary and Partial Differential Equations, IEEE Transactions on Neural Networks, vol. 9, no. 5, September 1998
- Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations, https://arxiv.org/pdf/1711.10561.pdf
- Raissi et al., Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, https://doi.org/10.1016/j.jcp.2018.10.045
- https://github.com/openhackathons-org/gpubootcamp/tree/master/hpc_ai/PINN
- Nvidia Modulus documentation
- Nvidia Modulus source code
- Nvidia Modulus examples
- DeepXDE
- TensorDiffEq
- SciANN, SciANN examples
- neurodiffeq
- Julia's DiffEqFlux.jl, NeuralOperators.jl and OperatorLearning
- https://github.com/maziarraissi/PINNs
- Fourier Neural Operator
- a review article : Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next
- slides by Lu Lu (Univ. Penn)
- https://raytracing.github.io/
- https://github.com/RayTracing/raytracing.github.io
- https://github.com/rogerallen/raytracinginoneweekendincuda : very clean code, excellent
- https://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf
- https://github.com/OpenMP/Examples/tree/v4.5.0/sources
- https://ukopenmpusers.co.uk/wp-content/uploads/uk-openmp-users-2018-OpenMP45Tutorial_new.pdf
- https://www.nas.nasa.gov/hecc/assets/pdf/training/OpenMP4.5_3-20-19.pdf
- http://www.admin-magazine.com/HPC/Articles/OpenMP-Coding-Habits-and-GPUs?utm_source=AMEP
- How to build yourself clang with OpenMP target support for Nvidia GPUs
- https://www.openmp.org/wp-content/uploads/SC17-OpenMPBooth_jlarkin.pdf
- OpenMP 5.0 for accelerators at GTC 2019
- LLVM/Clang-based compiler for both AMD and Nvidia GPUs
- OpenMP target examples
How to build clang++ with OpenMP target (offloading) support?
- https://devmesh.intel.com/blog/724749/how-to-build-and-run-your-modern-parallel-code-in-c-17-and-openmp-4-5-library-on-nvidia-gpus
- https://hpc-wiki.info/hpc/Building_LLVM/Clang_with_OpenMP_Offloading_to_NVIDIA_GPUs
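
To show what such a toolchain is expected to compile, here is a minimal sketch of an OpenMP target region in C++. The function name `saxpy_offload` is hypothetical and is not taken from any of the linked resources, and the sketch assumes that `x` and `y` have the same length. The directives are standard OpenMP 4.5. The compiler flags depend on how the compiler was built (with clang, typically something like `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda`). When OpenMP is not enabled, the pragma is ignored and the loop runs serially on the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the linked material): computes y = a * x + y.
// Assumes x.size() == y.size().
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y)
{
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();

    // map(to:) copies x to the device; map(tofrom:) copies y in and back out.
    // Without an OpenMP offloading compiler this pragma is simply ignored.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Calling `saxpy_offload(3.0f, x, y)` with every element of `x` equal to 1 and every element of `y` equal to 2 leaves every element of `y` equal to 5, whether the loop ran on the GPU or on the host.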
- OpenACC Programming and Best Practices Guide
- PGI compiler - OpenACC getting started guide
- https://www.fz-juelich.de/SharedDocs/Downloads/IAS/JSC/EN/slides/openacc/2-openacc-introduction.pdf?__blob=publicationFile
- Introduction to GPU programming using OpenACC
- https://github.com/eth-cscs/SummerSchool2019/tree/master/topics/openacc
- https://developer.nvidia.com/openacc-overview-course
- https://perso.centrale-marseille.fr/~gchiavassa/visible/HPC/01%20-%20GR%20%20Intro%20to%20GPU%20programming%20V2%20OpenACC%20.pdf
- Jeff Larkin (Nvidia) Introduction to OpenACC
- Jeff Larkin (Nvidia) OpenACC data management
- Jeff Larkin (Nvidia) OpenACC optimizations
- OpenACC training material as notebooks
- https://www.pgroup.com/resources/docs/19.10/pdf/pgi19proftut.pdf
- https://github.com/OpenACCUserGroup/openacc_concept_strategies_book
- https://developer.nvidia.com/blog/solar-storm-modeling-gpu-openacc/
Which compilers support OpenACC?
- The Nvidia/PGI compiler is the oldest and probably the most mature OpenACC compiler.
- GNU/gcc: installing it through Spack is the easiest way to get started with OpenMP/OpenACC offload using the GNU compiler.
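
A small test case helps when checking that either compiler is set up correctly. The following is a minimal sketch, not code from the linked courses: the function name `sum_of_squares` is hypothetical, and the directives are standard OpenACC. OpenACC is typically enabled with `-acc` for the Nvidia compiler and `-fopenacc` for gcc. Without that flag the pragma is ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the linked material): returns the sum of v[i]^2.
double sum_of_squares(const std::vector<double>& v)
{
    const std::size_t n = v.size();
    const double* p = v.data();
    double sum = 0.0;

    // copyin() transfers the input to the device; reduction(+:sum) combines
    // the partial sums computed by the parallel iterations.
    #pragma acc parallel loop reduction(+:sum) copyin(p[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        sum += p[i] * p[i];
    }
    return sum;
}
```

For the input `{1.0, 2.0, 3.0}` the function returns 14.0 on both the accelerated and the serial path.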
- accelerating-standard-c-with-gpus-using-stdpar/ for Nvidia GPUs
- a real life example in CFD: LULESH
- another CFD reference: stdpar for Lattice Boltzmann simulation, and its companion code
- https://github.com/shwina/stdpar-cython/
- https://software.intel.com/content/www/us/en/develop/articles/get-started-with-parallel-stl.html
Which compiler?
- Nvidia/PGI compiler for Nvidia GPUs
- GNU g++ version >= 9.1 (+ TBB) for multicore CPUs
- clang >= 10.0.1 for multicore CPUs
- Intel oneAPI HPC Toolkit
- https://developer.nvidia.com/blog/accelerating-fortran-do-concurrent-with-gpus-and-the-nvidia-hpc-sdk/
- example code euler2d_cudaFortran : solving Euler's equations in Fortran with stdpar (do concurrent loops)
- Khronos
- syclacademy
- oneAPI-samples
- more oneAPI / SYCL samples
- a short tutorial
- Compilers / toolchain
- codeplay
- Intel oneAPI. For Nvidia GPU support, you have to rebuild llvm/clang from source (see instructions). oneAPI DPC++ is a SYCL implementation plus extensions (Unified Shared Memory, Explicit SIMD, ...)
- triSYCL for Xilinx FPGA target
- Comparison Kokkos/SYCL (early 2020)
- The CUDA Handbook: A Comprehensive Guide to GPU Programming, by Nicholas Wilt, Pearson Education.
- CUDA by example, by Sanders and Kandrot, Addison-Wesley, 2010. Also available in pdf
- Learn CUDA programming by B. Sharma and J. Han, Packt Publishing, 2019
- Python + CUDA : https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA
- https://www.oreilly.com/library/view/hands-on-gpu-programming/9781788993913/ by Brian Tuomanen
- Discovering Modern C++: An Intensive Course for Scientists, Engineers, and Programmers, and companion github website
- https://github.com/changkun/modern-cpp-tutorial
- https://github.com/eth-cscs/examples_cpp
- https://github.com/mandliya/algorithms_and_data_structures
- https://www.fz-juelich.de/SharedDocs/Downloads/IAS/JSC/EN/slides/cplusplus/cplusplus.pdf?__blob=publicationFile
- https://gitlab.maisondelasimulation.fr/tpadiole/hpcpp
- http://www.cppstdlib.com/
- http://101.lv/learn/C++/
- https://github.com/caveofprogramming/advanced-cplusplus
- https://en.cppreference.com/w/
- list of lists of C++-related resources: https://github.com/fffaraz/awesome-cpp
- list of books on C++ : https://github.com/fffaraz/awesome-cpp/blob/master/books.md
- C++ idioms
- Design patterns, and a book on design patterns for modern C++
- Julich training on C++
- CSCS computing center training on C++ videos
- CppCon and videos on YouTube
- Bo Qian YouTube channel on C++11
- https://github.com/TheAlgorithms/C-Plus-Plus
- C++ course from the University of Strasbourg
Alternative programming models for programming modern computing architectures in a performance-portable way:
- introduction to performance portability
- https://github.com/arrayfire/arrayfire
- https://docs.nvidia.com/cuda/thrust/index.html
- https://github.com/kokkos/kokkos
- https://github.com/LLNL/RAJA and https://github.com/LLNL/RAJA-tutorials
- https://github.com/triSYCL/triSYCL
- https://github.com/codeplaysoftware/computecpp-sdk
- https://github.com/kokkos/kokkos
- https://github.com/kokkos/kokkos-tutorials
- https://github.com/kokkos/kokkos-tutorials/wiki/Kokkos-Lecture-Series
- C++ Performance Portability - A Decade of Lessons Learned - Christian Trott - CppCon 2022
- cmake-cookbook and the book
- Modern CMake tutorial
- template CMake project for CUDA
- GPUs for science day
- Udacity CS344 video archive
- CUDA related : cuda_assert, https://gist.github.com/allanmac/f91b67c112bcba98649d
- FPGA, loop transformation, matrix multiplication
- Hype cycle
- https://press3.mcs.anl.gov/atpesc/files/2019/08/ATPESC_2019_Dinner_Talk_8_8-7_Foster-Coding_the_Continuum.pdf
- Learn or improve your Linux command line/Bash skills, e.g. http://swcarpentry.github.io/shell-novice/
- http://www.tldp.org/LDP/abs/html/
- http://www.epons.org/commandes-base-linux.php
- The art of command line
- https://www.nextplatform.com/
- subscribe to blogs/newsletters on HPC, e.g. Admin-magazine / HPC
- Intel Parallel Universe Magazine
- Porting a C++ code that simulates the Navier-Stokes equations with the lattice Boltzmann method.