Learning CUDA is both intriguing and extensive. To record my progress and insights, I will keep a journal in this repository. My goal is to run every code snippet in Google Colab, so the examples stay accessible and easy to experiment with.
- shared memory
- no bank conflict
- multiple elements in one thread
- vectorized memory access
- Each SM (streaming multiprocessor) has its own shared memory, register file, warp schedulers, and so on.
- A block runs on exactly one SM, but one SM can hold multiple blocks at the same time.
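
The topics above can be tied together in one small program. What follows is a minimal sketch, not a tuned implementation: a tiled matrix transpose that stages data in shared memory, pads the tile by one column to avoid bank conflicts, and has each thread handle four elements, plus a copy kernel that uses `float4` for vectorized memory access. The names `transpose_tiled`, `copy_vec4`, `run_checks`, `TILE`, and `BLOCK_ROWS` are my own choices for illustration, and the sketch assumes the matrix size is a multiple of the tile size and that buffers come from `cudaMalloc` (so they are 16-byte aligned). The `main` function also prints a few per-SM figures and asks the runtime how many blocks of the transpose kernel can be resident on one SM.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 32;       // tile edge, equal to the warp size
constexpr int BLOCK_ROWS = 8;  // each thread handles TILE / BLOCK_ROWS = 4 elements

// Transposes an n x n matrix. Assumption: n is a multiple of TILE.
__global__ void transpose_tiled(const float* in, float* out, int n) {
    // The extra padding column shifts each row by one bank, so reading a
    // tile column touches 32 different banks instead of one bank 32 times.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    for (int j = 0; j < TILE; j += BLOCK_ROWS)  // multiple elements per thread
        tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * n + x];

    __syncthreads();  // the whole tile must be loaded before anyone reads it

    x = blockIdx.y * TILE + threadIdx.x;  // swap block indices for the output
    y = blockIdx.x * TILE + threadIdx.y;
    for (int j = 0; j < TILE; j += BLOCK_ROWS)
        out[(y + j) * n + x] = tile[threadIdx.x][threadIdx.y + j];
}

// Copies n floats with 128-bit loads and stores.
// Assumptions: n is a multiple of 4 and both pointers are 16-byte aligned.
__global__ void copy_vec4(const float* in, float* out, int n) {
    const float4* in4 = reinterpret_cast<const float4*>(in);
    float4* out4 = reinterpret_cast<float4*>(out);
    int stride = gridDim.x * blockDim.x;
    for (int k = blockIdx.x * blockDim.x + threadIdx.x; k < n / 4; k += stride)
        out4[k] = in4[k];  // one float4 moves four floats per instruction
}

// Runs both kernels and compares the results against the host input.
bool run_checks() {
    const int n = 1024;
    const size_t bytes = sizeof(float) * n * n;
    std::vector<float> h_in(n * n), h_t(n * n), h_c(n * n);
    for (int i = 0; i < n * n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 grid(n / TILE, n / TILE), block(TILE, BLOCK_ROWS);
    transpose_tiled<<<grid, block>>>(d_in, d_out, n);
    cudaMemcpy(h_t.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    copy_vec4<<<256, 256>>>(d_in, d_out, n * n);
    cudaMemcpy(h_c.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    bool ok = (cudaDeviceSynchronize() == cudaSuccess) &&
              (cudaGetLastError() == cudaSuccess);
    for (int r = 0; r < n && ok; ++r)
        for (int c = 0; c < n; ++c)
            if (h_t[c * n + r] != h_in[r * n + c] ||
                h_c[r * n + c] != h_in[r * n + c]) { ok = false; break; }

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // device 0 is assumed to be the GPU in use
    printf("SMs: %d, shared memory per SM: %zu bytes, registers per SM: %d\n",
           prop.multiProcessorCount, prop.sharedMemPerMultiprocessor,
           prop.regsPerMultiprocessor);

    int blocks_per_sm = 0;  // how many transpose blocks one SM can hold at once
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks_per_sm, transpose_tiled, TILE * BLOCK_ROWS, 0);
    printf("Resident transpose_tiled blocks per SM: %d\n", blocks_per_sm);

    bool ok = run_checks();
    printf(ok ? "Results match\n" : "MISMATCH\n");
    return ok ? 0 : 1;
}
```

One way to try this in Colab is to save it to a file (for example a hypothetical `transpose.cu`), compile it with `nvcc`, and run the binary on a GPU runtime; on some runtimes an explicit `-arch` flag matching the GPU may be needed. Changing `TILE + 1` back to `TILE` keeps the result correct but reintroduces bank conflicts on the column reads, which makes a useful before/after comparison when profiling.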