Merge branch 'JK/multireduction' into 'develop'

Reduction and multireduction refactoring

Brief summary:

- rewritten multireduction using lambda functions
- avoided `volatile` using `__syncwarp()`
- using reduction functions with `return a + b` instead of `a += b`
- using `std::plus`, `std::multiplies`, `std::logical_and`, `std::logical_or`, etc. instead of custom lambda functions
- optimized OpenMP thread counts for reduction and multireduction
- added computation of sample standard deviation to benchmarks
- implemented parallel prefix-sum with OpenMP
- implemented distributed prefix-sum

See merge request !37
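
For illustration only, here is a minimal host-side sketch of three items from the summary: a reduction driven by a value-returning functor such as `std::plus`, a sample standard deviation, and a blockwise prefix-sum of the kind that could be parallelized with OpenMP. The names `foldWith`, `sampleStddev` and `blockPrefixSum` are hypothetical and are not taken from the project's code; the actual implementation in this merge request may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not the project's API): folds `data` with a binary
// functor that returns its result (`return a + b` style) rather than
// updating an accumulator in place (`a += b` style).
template <typename T, typename Reduction>
T foldWith(const std::vector<T>& data, Reduction op, T identity)
{
    T result = identity;
    for (const T& x : data)
        result = op(result, x);
    return result;
}

// Sample standard deviation (divides by n - 1), e.g. for benchmark timings.
// Returns 0 when fewer than two samples are available.
double sampleStddev(const std::vector<double>& samples)
{
    const std::size_t n = samples.size();
    if (n < 2)
        return 0.0;
    const double mean =
        foldWith(samples, std::plus<double>{}, 0.0) / static_cast<double>(n);
    double squares = 0.0;
    for (double x : samples)
        squares = squares + (x - mean) * (x - mean);
    return std::sqrt(squares / static_cast<double>(n - 1));
}

// Blockwise inclusive prefix-sum in three phases. Blocks are independent in
// phases 1 and 3, so those loops are parallelized when OpenMP is enabled.
std::vector<int> blockPrefixSum(const std::vector<int>& in, int blocks)
{
    const int n = static_cast<int>(in.size());
    std::vector<int> out(in.size());
    if (n == 0)
        return out;
    blocks = std::max(blocks, 1);
    const int chunk = (n + blocks - 1) / blocks;
    std::vector<int> offsets(blocks, 0);

    // Phase 1: scan each block on its own and record the block total.
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int b = 0; b < blocks; b++) {
        const int begin = std::min(b * chunk, n);
        const int end = std::min(begin + chunk, n);
        int acc = 0;
        for (int i = begin; i < end; i++) {
            acc = acc + in[i];
            out[i] = acc;
        }
        offsets[b] = acc;
    }

    // Phase 2: sequential exclusive scan turns block totals into offsets.
    int running = 0;
    for (int b = 0; b < blocks; b++) {
        const int total = offsets[b];
        offsets[b] = running;
        running = running + total;
    }

    // Phase 3: shift every block by the sum of all preceding blocks.
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int b = 0; b < blocks; b++) {
        const int begin = std::min(b * chunk, n);
        const int end = std::min(begin + chunk, n);
        for (int i = begin; i < end; i++)
            out[i] = out[i] + offsets[b];
    }
    return out;
}
```

Swapping the functor changes the reduction without touching the loop: `foldWith(v, std::multiplies<int>{}, 1)` gives a product, and `foldWith(v, std::logical_and<int>{}, 1)` checks that all entries are nonzero. A distributed prefix-sum could follow the same three phases with one block per process, though that is an assumption about the approach rather than something stated in the summary.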