Optimizing parallel reduction in CUDA.
Changed files:
- src/core/CMakeLists.txt (4 additions, 3 deletions)
- src/core/cuda/cuda-prefix-sum_impl.h (2 additions, 2 deletions)
- src/core/cuda/cuda-reduction_impl.h (261 additions, 254 deletions)
- src/core/cuda/reduction-operations.h (448 additions, 4 deletions)
- src/core/mfuncs.h (11 additions, 0 deletions)
- src/core/tnlConstants.h (55 additions, 0 deletions)
- src/core/tnlCuda.h (12 additions, 26 deletions)
- src/core/vectors/tnlVectorOperationsCuda_impl.h (2 additions, 2 deletions)
- tests/benchmarks/tnl-cuda-benchmarks.h (107 additions, 27 deletions)