- CUDA hello world: run a device function (kernel) on multiple threads.
- passing parameters to a device function: demonstrates why memory copies between host and device are so important in CUDA.
- dive into memory copy between host (CPU memory) and device (GPU memory): transfers from pinned (page-locked) host memory are much faster than transfers from pageable memory.
- matrix add using CUDA: demonstrates that cache-line-friendly memory access on the GPU is important for high performance.
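
As a rough illustration of how the topics above fit together, here is a minimal sketch (not code from the examples themselves). It allocates pinned host buffers with `cudaMallocHost`, copies them to the device, and adds two matrices with one thread per element. The kernel name `matrixAdd`, the `CHECK` macro, the 1024 x 1024 size, and the 32 x 8 block shape are all assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort on any runtime API error.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                       \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// One thread per element. threadIdx.x moves along a row, so neighbouring
// threads in a warp read and write neighbouring addresses.
__global__ void matrixAdd(const float* a, const float* b, float* c,
                          int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        int idx = row * cols + col;  // row-major layout
        c[idx] = a[idx] + b[idx];
    }
}

int main()
{
    const int rows = 1024, cols = 1024;  // assumed sizes
    const size_t count = (size_t)rows * cols;
    const size_t bytes = count * sizeof(float);

    // Pinned (page-locked) host buffers.
    float *hA, *hB, *hC;
    CHECK(cudaMallocHost((void**)&hA, bytes));
    CHECK(cudaMallocHost((void**)&hB, bytes));
    CHECK(cudaMallocHost((void**)&hC, bytes));
    for (size_t i = 0; i < count; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device buffers: the kernel can only dereference device pointers,
    // so the inputs must be copied over before the launch.
    float *dA, *dB, *dC;
    CHECK(cudaMalloc((void**)&dA, bytes));
    CHECK(cudaMalloc((void**)&dB, bytes));
    CHECK(cudaMalloc((void**)&dC, bytes));
    CHECK(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice));

    dim3 block(32, 8);  // 32 threads along x = one warp per row segment
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    matrixAdd<<<grid, block>>>(dA, dB, dC, rows, cols);
    CHECK(cudaGetLastError());       // catches launch configuration errors
    CHECK(cudaDeviceSynchronize());  // catches errors raised during execution

    CHECK(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost));

    // Verify on the host: every element should be 1.0f + 2.0f.
    for (size_t i = 0; i < count; ++i) {
        if (hC[i] != 3.0f) {
            fprintf(stderr, "mismatch at element %zu\n", i);
            return EXIT_FAILURE;
        }
    }
    printf("matrixAdd OK\n");

    CHECK(cudaFree(dA)); CHECK(cudaFree(dB)); CHECK(cudaFree(dC));
    CHECK(cudaFreeHost(hA)); CHECK(cudaFreeHost(hB)); CHECK(cudaFreeHost(hC));
    return 0;
}
```

Assuming the file is saved as `matrix_add.cu`, it should build with `nvcc matrix_add.cu -o matrix_add`. Swapping the roles of `threadIdx.x` and `threadIdx.y` in the index calculation makes neighbouring threads stride across rows instead, which is one way to observe the access-pattern effect; replacing `cudaMallocHost` with `malloc` (and `cudaFreeHost` with `free`) gives a pageable baseline for comparing copy times.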