Reduction and multireduction refactoring

Brief summary:

- rewrote multireduction using lambda functions
- avoided `volatile` by using `__syncwarp()`
- switched to reduction functions that `return a + b` instead of `a += b`
- used `std::plus`, `std::multiplies`, `std::logical_and`, `std::logical_or`, etc. instead of custom lambda functions
- optimized OpenMP thread counts for reduction and multireduction
- added computation of sample standard deviation to benchmarks
- implemented parallel prefix-sum with OpenMP
- implemented distributed prefix-sum

See merge request !37
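
For illustration only, here is a minimal sketch of two ideas from the summary: a reduction that takes a value-returning functor such as `std::plus` (the `return a + b` style), and a block-wise parallel prefix-sum with OpenMP. The names `parallel_reduce` and `parallel_inclusive_scan`, their signatures, and the two-phase scan scheme are assumptions made for this example; they are not taken from the merge request and may differ from the actual implementation. The reduction operation is assumed to be associative and commutative, with `identity` as its neutral element.

```cpp
#include <cstddef>
#include <functional>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not the project's API): reduces `data` with a functor
// that returns its result (e.g. std::plus<>) instead of modifying an argument.
template <typename T, typename Reduction>
T parallel_reduce(const std::vector<T>& data, Reduction reduction, T identity)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    T result = identity;
    #pragma omp parallel
    {
        T local = identity;  // per-thread partial result
        #pragma omp for nowait
        for (std::ptrdiff_t i = 0; i < n; ++i)
            local = reduction(local, data[i]);
        #pragma omp critical
        result = reduction(result, local);  // combine the partial results
    }
    return result;
}

// Hypothetical helper (not the project's API): in-place inclusive prefix-sum.
// Phase 1 scans each thread's block; phase 2 shifts each block by the
// combined totals of all preceding blocks.
template <typename T, typename Reduction>
void parallel_inclusive_scan(std::vector<T>& data, Reduction reduction, T identity)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    std::vector<T> block_totals;  // one entry per thread
    #pragma omp parallel
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
        const int nthreads = omp_get_num_threads();
#else
        const int tid = 0;       // serial fallback when built without OpenMP
        const int nthreads = 1;
#endif
        #pragma omp single
        block_totals.assign(nthreads, identity);
        // (implicit barrier at the end of `single`)

        const std::ptrdiff_t begin = n * tid / nthreads;
        const std::ptrdiff_t end = n * (tid + 1) / nthreads;

        // Phase 1: local scan of this thread's block.
        T running = identity;
        for (std::ptrdiff_t i = begin; i < end; ++i) {
            running = reduction(running, data[i]);
            data[i] = running;
        }
        block_totals[tid] = running;
        #pragma omp barrier

        // Phase 2: add the totals of the preceding blocks.
        T offset = identity;
        for (int t = 0; t < tid; ++t)
            offset = reduction(offset, block_totals[t]);
        if (tid > 0)
            for (std::ptrdiff_t i = begin; i < end; ++i)
                data[i] = reduction(offset, data[i]);
    }
}
```

With these helpers, `parallel_reduce(v, std::plus<>{}, 0)` sums a `std::vector<int>` and `parallel_reduce(v, std::multiplies<>{}, 1)` computes its product. When compiled without OpenMP support, the pragmas are ignored and both functions run serially. For floating-point data, the parallel versions may differ slightly from a sequential loop because the order of operations changes.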