Optimizing CUDA parallel reduction.
- src/core/cuda/cuda-prefix-sum_impl.h: 15 additions, 10 deletions
- src/core/cuda/cuda-reduction_impl.h: 83 additions, 146 deletions
- src/core/cuda/reduction-operations.h: 83 additions, 1264 deletions
- src/core/cuda/tnlCudaReduction.h: 3 additions, 3 deletions
- src/core/cuda/tnlCudaReduction_impl.h: 4 additions, 4 deletions
- tests/benchmarks/tnl-cuda-benchmarks.h: 34 additions, 10 deletions