Reduction and multireduction refactoring
Brief summary:
- rewritten multireduction using lambda functions
- avoided `volatile` using `__syncwarp()`
- using reduction functions with `return a + b` instead of `a += b`
- using `std::plus`, `std::multiplies`, `std::logical_and`, `std::logical_or`, etc. instead of custom lambda functions
- optimized OpenMP thread counts for reduction and multireduction
- added computation of sample standard deviation to benchmarks
- implemented parallel prefix-sum with OpenMP
- implemented distributed prefix-sum
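
To illustrate two of the points above (pure reduction functors such as `std::plus`, and a parallel prefix-sum with OpenMP), here is a minimal sketch. It is not the code from this merge request: the names `reduceWith` and `inclusiveScanBlocked`, the explicit `identity` argument, and the fixed `blocks` parameter are all assumptions made for the example. The scan uses a common three-phase blocked scheme, which may differ from the actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not the library's API): reduce `data` with a pure
// binary functor, i.e. one that returns op(a, b) instead of mutating `a`.
template <typename T, typename Op>
T reduceWith(const std::vector<T>& data, Op op, T identity)
{
    T result = identity;
    for (const T& x : data)
        result = op(result, x);
    return result;
}

// Hypothetical helper: blocked inclusive prefix-sum in three phases.
// The pragmas are ignored when OpenMP is disabled, giving a serial scan.
template <typename T, typename Op>
void inclusiveScanBlocked(std::vector<T>& data, Op op, T identity,
                          std::size_t blocks = 4)
{
    const std::size_t n = data.size();
    if (n == 0)
        return;
    blocks = std::max<std::size_t>(1, std::min(blocks, n));
    const std::size_t blockSize = (n + blocks - 1) / blocks;
    const std::ptrdiff_t numBlocks = static_cast<std::ptrdiff_t>(blocks);
    std::vector<T> offsets(blocks, identity);

    // Phase 1: reduce each block independently.
    #pragma omp parallel for
    for (std::ptrdiff_t b = 0; b < numBlocks; b++) {
        const std::size_t begin = static_cast<std::size_t>(b) * blockSize;
        const std::size_t end = std::min(n, begin + blockSize);
        T sum = identity;
        for (std::size_t i = begin; i < end; i++)
            sum = op(sum, data[i]);
        offsets[b] = sum;
    }

    // Phase 2: serial exclusive scan over the per-block results.
    T running = identity;
    for (std::size_t b = 0; b < blocks; b++) {
        const T blockSum = offsets[b];
        offsets[b] = running;
        running = op(running, blockSum);
    }

    // Phase 3: scan each block, starting from its offset.
    #pragma omp parallel for
    for (std::ptrdiff_t b = 0; b < numBlocks; b++) {
        const std::size_t begin = static_cast<std::size_t>(b) * blockSize;
        const std::size_t end = std::min(n, begin + blockSize);
        T acc = offsets[b];
        for (std::size_t i = begin; i < end; i++) {
            acc = op(acc, data[i]);
            data[i] = acc;
        }
    }
}
```

With these helpers, `reduceWith(v, std::plus<>{}, 0)` sums a `std::vector<int>`, `reduceWith(v, std::multiplies<>{}, 1)` computes its product, and `inclusiveScanBlocked(v, std::plus<>{}, 0)` replaces `v` with its running sums. Because the functor only returns a value, the same standard functors work in both the reduction and the scan without any custom lambda.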
Edited by Jakub Klinkovský