Refactored parallel OpenMP scan
The first phase performs only a per-block reduction, not a scan. Output array elements are written only in the second phase, so overall we perform `n` write operations to the output array instead of `2n`.
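
For illustration, here is a minimal sketch of the two-phase idea, not the code from this commit. The function name `blocked_inclusive_scan`, the `num_blocks` parameter, and the use of `long long` addition are assumptions made for the example. Phase 1 only reads the input and reduces each block to a single sum; a short serial pass turns those sums into per-block offsets; phase 2 then writes each output element exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum in two phases over contiguous blocks.
// Hypothetical helper: name and parameters are illustrative only.
std::vector<long long> blocked_inclusive_scan(const std::vector<long long>& in,
                                              std::size_t num_blocks) {
    assert(num_blocks > 0);  // precondition: at least one block
    const std::size_t n = in.size();
    std::vector<long long> out(n);
    if (n == 0) return out;

    num_blocks = std::min(num_blocks, n);
    const std::size_t block = (n + num_blocks - 1) / num_blocks;  // ceil(n / num_blocks)
    std::vector<long long> block_sum(num_blocks, 0);

    // Phase 1: per-block reduction only; `out` is not written here.
    #pragma omp parallel for schedule(static)
    for (long long b = 0; b < static_cast<long long>(num_blocks); ++b) {
        const std::size_t lo = std::min(n, static_cast<std::size_t>(b) * block);
        const std::size_t hi = std::min(n, lo + block);
        long long s = 0;
        for (std::size_t i = lo; i < hi; ++i) s += in[i];
        block_sum[b] = s;
    }

    // Serial exclusive scan of the block sums (num_blocks is small).
    std::vector<long long> offset(num_blocks, 0);
    for (std::size_t b = 1; b < num_blocks; ++b)
        offset[b] = offset[b - 1] + block_sum[b - 1];

    // Phase 2: local scan seeded with the block offset; each out[i] is written once.
    #pragma omp parallel for schedule(static)
    for (long long b = 0; b < static_cast<long long>(num_blocks); ++b) {
        const std::size_t lo = std::min(n, static_cast<std::size_t>(b) * block);
        const std::size_t hi = std::min(n, lo + block);
        long long running = offset[b];
        for (std::size_t i = lo; i < hi; ++i) {
            running += in[i];
            out[i] = running;
        }
    }
    return out;
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result. A variant that scans in phase 1 and then adds offsets in phase 2 would write every output element twice, which is the `2n` cost the description refers to.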