Commit cbc2fff9 authored Aug 11, 2019 by Jakub Klinkovský

Found a way to avoid using volatile in CUDA reduction: __syncwarp()

The performance seems to be identical to the code using volatile.
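
For illustration, a minimal sketch of the technique (not the code from this commit): the last warp of a shared-memory reduction, where each read and each write is separated by __syncwarp() rather than relying on a volatile pointer. The names blockSum and BLOCK are hypothetical, and the sketch assumes CUDA 9 or newer and a block size of 256.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // hypothetical block size, must be a power of two >= 64

// Sums the input block-wise; out[b] receives the partial sum of block b.
__global__ void blockSum(const int* in, int* out, int n)
{
    __shared__ int sdata[BLOCK];
    const unsigned tid = threadIdx.x;
    const unsigned gid = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (gid < static_cast<unsigned>(n)) ? in[gid] : 0;
    __syncthreads();

    // Tree reduction across warps, down to 64 partial sums.
    for (unsigned s = BLOCK / 2; s > 32; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Final warp: no volatile. Every shared-memory read is separated
    // from the following write by __syncwarp(), and vice versa.
    if (tid < 32) {
        int v = sdata[tid] + sdata[tid + 32];
        __syncwarp();
        sdata[tid] = v;
        __syncwarp();
        for (unsigned s = 16; s > 0; s >>= 1) {
            v += sdata[tid + s];  // lanes >= s read stale data; only lane 0's result is used
            __syncwarp();
            sdata[tid] = v;
            __syncwarp();
        }
        if (tid == 0)
            out[blockIdx.x] = v;
    }
}

int main()
{
    const int n = 1000;
    const int blocks = (n + BLOCK - 1) / BLOCK;

    int *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, blocks * sizeof(int));
    for (int i = 0; i < n; ++i)
        in[i] = 1;

    blockSum<<<blocks, BLOCK>>>(in, out, n);
    cudaDeviceSynchronize();

    // Add up the per-block partial sums on the host.
    long total = 0;
    for (int b = 0; b < blocks; ++b)
        total += out[b];
    std::printf("sum = %ld (input has %d ones)\n", total, n);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The condition tid < 32 selects a whole warp, so the default full mask of __syncwarp() is appropriate here; a reduction that enters this branch with only part of a warp would need an explicit mask.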

parent b74a24d2
