This will also be helpful for understanding CUDA: https://www.olcf.ornl.gov/cuda-training-series

The code has a lot of reduction patterns, so the session on atomics, reductions, and warp shuffle might be especially helpful:

* slides: https://www.olcf.ornl.gov/wp-content/uploads/2019/12/05_Atomics_Reductions_Warp_Shuffle.pdf
* video recording: https://vimeo.com/419029739