Reduction and multireduction refactoring (!37) · Merge requests · TNL / tnl-dev

Brief summary:

rewritten multireduction using lambda functions
avoided volatile by using __syncwarp()
reduction functions now return a + b instead of performing a += b
using std::plus, std::multiplies, std::logical_and, std::logical_or, etc. instead of custom lambda functions
optimized OpenMP thread counts for reduction and multireduction
added computation of sample standard deviation to benchmarks
implemented parallel prefix-sum with OpenMP
implemented distributed prefix-sum
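
To illustrate the change to value-returning reduction functions, here is a minimal sequential sketch. It is not TNL's actual interface: the helper name `reduce`, its parameter order, and the explicit `identity` argument are assumptions made for this example only. It shows that a functor of the form `return a + b` lets standard function objects such as std::plus be passed directly, where an in-place `a += b` form would need a custom lambda.

```cpp
#include <functional>
#include <vector>

// Hypothetical helper for illustration only; not the TNL API.
// `reduction` is any binary callable that returns the combined value
// (the "return a + b" form) instead of updating its first argument in place.
// `identity` is the neutral element of the operation (0 for +, 1 for *).
template <typename T, typename Reduction>
T reduce(const std::vector<T>& data, Reduction reduction, T identity)
{
    T result = identity;
    for (const T& x : data)
        result = reduction(result, x);  // value-returning: result = a + b
    return result;
}

// Convenience wrappers showing that standard functors plug in unchanged.
inline int sum(const std::vector<int>& data)
{
    return reduce(data, std::plus<int>{}, 0);
}

inline int product(const std::vector<int>& data)
{
    return reduce(data, std::multiplies<int>{}, 1);
}
```

With this shape, calling `sum` on a vector holding 1, 2, 3, 4 combines the elements with std::plus, and std::logical_and or std::logical_or can be substituted the same way given a suitable identity value.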

Edited Aug 16, 2019 by Jakub Klinkovský
