Refactored parallel OpenMP scan
The first phase now performs only a per-block reduction instead of a per-block scan. Output array elements are written only in the second phase, so overall we perform `n` writes to the output array instead of `2n`.
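For illustration, a minimal sketch of the two-phase scheme described above. It is not the code from this commit: the function name `two_phase_scan`, the `num_blocks` parameter, the inclusive-sum semantics, and the `long long` element type are all assumptions made for the example. The block partitioning is explicit, so the code also runs serially (with the same result) if compiled without OpenMP support.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the commit's actual code): inclusive prefix sum
// of `in`, computed over `num_blocks` contiguous blocks.
std::vector<long long> two_phase_scan(const std::vector<long long>& in,
                                      std::size_t num_blocks) {
    const std::size_t n = in.size();
    std::vector<long long> out(n);
    if (n == 0) return out;

    // Clamp the block count to [1, n] and make sure no block is empty.
    num_blocks = std::max<std::size_t>(1, std::min(num_blocks, n));
    const std::size_t block = (n + num_blocks - 1) / num_blocks;
    num_blocks = (n + block - 1) / block;
    const std::ptrdiff_t nb = static_cast<std::ptrdiff_t>(num_blocks);

    std::vector<long long> offset(num_blocks, 0);

    // Phase 1: per-block reduction only. `out` is not written here.
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t b = 0; b < nb; ++b) {
        const std::size_t lo = static_cast<std::size_t>(b) * block;
        const std::size_t hi = std::min(n, lo + block);
        long long sum = 0;
        for (std::size_t i = lo; i < hi; ++i) sum += in[i];
        offset[b] = sum;
    }

    // Serial exclusive scan over the block sums: offset[b] becomes the
    // total of all elements that precede block b.
    long long running = 0;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const long long sum = offset[b];
        offset[b] = running;
        running += sum;
    }

    // Phase 2: each block scans its own range starting from its offset.
    // Every element of `out` is written exactly once, n writes in total.
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t b = 0; b < nb; ++b) {
        const std::size_t lo = static_cast<std::size_t>(b) * block;
        const std::size_t hi = std::min(n, lo + block);
        long long acc = offset[b];
        for (std::size_t i = lo; i < hi; ++i) {
            acc += in[i];
            out[i] = acc;
        }
    }
    return out;
}
```

The trade-off in this sketch is that the input is read twice (once per phase), while the output is written once; a variant that scans in the first phase would write each output element in both phases.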
parent 63d567e4