# Reduction and multireduction refactoring

Brief summary:

- rewritten multireduction using lambda functions
- avoided `volatile` using `__syncwarp()`
- using reduction functions with `return a + b` instead of `a += b`
- using `std::plus`, `std::multiplies`, `std::logical_and`, `std::logical_or`, etc. instead of custom lambda functions
- optimized OpenMP thread counts for reduction and multireduction
- added computation of sample standard deviation to benchmarks
- implemented parallel prefix-sum with OpenMP
- implemented distributed prefix-sum
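
To illustrate the functional style mentioned in the summary, here is a minimal sketch and not the code from this merge request. The helpers `reduce_with` and `inclusive_scan_blocked`, their signatures, and the fixed `block_size` parameter are hypothetical names introduced only for this example. It shows a reduction operation that returns the combined value (`return a + b`) rather than updating an argument in place, and a three-phase blocked prefix-sum whose per-block loops could be parallelized with OpenMP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not the library API): the functional returns the
// combined value, so standard functors such as std::plus can be passed directly.
template <typename T, typename Reduction>
T reduce_with(const std::vector<T>& data, Reduction reduction, T identity)
{
    T result = identity;
    for (const T& x : data)
        result = reduction(result, x);  // "return a + b" style, not "a += b"
    return result;
}

// Hypothetical helper: inclusive prefix-sum over fixed-size blocks.
// The pragmas are ignored when the code is compiled without OpenMP.
template <typename T, typename Reduction>
void inclusive_scan_blocked(std::vector<T>& data, Reduction reduction,
                            T identity, std::size_t block_size)
{
    assert(block_size > 0);
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    if (n == 0)
        return;
    const std::ptrdiff_t bs = static_cast<std::ptrdiff_t>(block_size);
    const std::ptrdiff_t blocks = (n + bs - 1) / bs;
    std::vector<T> offsets(static_cast<std::size_t>(blocks), identity);

    // Phase 1: scan each block independently and record its total.
    #pragma omp parallel for
    for (std::ptrdiff_t b = 0; b < blocks; b++) {
        const std::ptrdiff_t begin = b * bs;
        const std::ptrdiff_t end = std::min(begin + bs, n);
        for (std::ptrdiff_t i = begin + 1; i < end; i++)
            data[i] = reduction(data[i - 1], data[i]);
        offsets[b] = data[end - 1];
    }

    // Phase 2: sequential exclusive scan of the block totals.
    T running = identity;
    for (std::ptrdiff_t b = 0; b < blocks; b++) {
        const T total = offsets[b];
        offsets[b] = running;
        running = reduction(running, total);
    }

    // Phase 3: shift every block by the combined total of the preceding blocks.
    #pragma omp parallel for
    for (std::ptrdiff_t b = 0; b < blocks; b++) {
        const std::ptrdiff_t begin = b * bs;
        const std::ptrdiff_t end = std::min(begin + bs, n);
        for (std::ptrdiff_t i = begin; i < end; i++)
            data[i] = reduction(offsets[b], data[i]);
    }
}
```

With this shape, a call such as `reduce_with(v, std::plus<>{}, 0)` or `reduce_with(v, std::multiplies<>{}, 1)` needs no custom lambda; the caller only supplies the functor and its identity element. The sketch assumes the operation is associative, which the blocked scan relies on.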

Edited by Jakub Klinkovský