Optimizing parallel reduction in CUDA.
Showing changed files:
- src/core/cuda/CMakeLists.txt (2 additions, 0 deletions)
- src/core/cuda/cuda-prefix-sum_impl.h (3 additions, 2 deletions)
- src/core/cuda/cuda-reduction_impl.h (19 additions, 181 deletions)
- src/core/cuda/reduction-operations.h (182 additions, 173 deletions)
- src/core/cuda/tnlCudaReduction.h (62 additions, 0 deletions)
- src/core/cuda/tnlCudaReduction_impl.h (296 additions, 0 deletions)
- src/core/tnlConstants.h (1 addition, 0 deletions)