Found a way to avoid using volatile in CUDA reduction: __syncwarp() (cbc2fff9) · Commits · TNL / tnl-dev

Commit cbc2fff9 authored Aug 11, 2019 by Jakub Klinkovský

The performance seems to be identical to the code using volatile.

parent b74a24d2
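
The diff itself is not reproduced here, so the following is only a minimal sketch of the general technique the commit title names, not the actual TNL code. It assumes CUDA 9 or newer (where `__syncwarp()` is available), a one-dimensional block whose size is a power of two and at least 64, and a non-empty input. All names (`blockSize`, `warpReduce`, `reduceSum`, `reduceOnDevice`) are hypothetical. In the last warp-level stage of a shared-memory reduction, each step is split into a read and a write separated by `__syncwarp()`, which synchronizes the warp and orders its shared-memory accesses, so the shared-memory pointer does not have to be declared `volatile`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size: a power of two, at least 64.
constexpr int blockSize = 256;

// Final warp-level stage of the tree reduction, called by threads 0..31 only.
// Note that sdata is a plain pointer, not volatile.
__device__ void warpReduce(int* sdata, int tid)
{
    int v = sdata[tid];
    for (int offset = 32; offset > 0; offset >>= 1) {
        v += sdata[tid + offset];  // read the partner's partial sum
        __syncwarp();              // all reads finish before any write
        sdata[tid] = v;            // publish the new partial sum
        __syncwarp();              // all writes finish before the next read
    }
}

// Sums n integers from `in` and adds each block's result to *out.
// Must be launched with blockDim.x == blockSize.
__global__ void reduceSum(const int* in, int* out, int n)
{
    __shared__ int sdata[blockSize];
    const int tid = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Block-level stage: several warps are active, so use __syncthreads().
    for (int s = blockDim.x / 2; s > 32; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Warp-level stage: only the first warp is active.
    if (tid < 32) {
        warpReduce(sdata, tid);
        if (tid == 0)
            atomicAdd(out, sdata[0]);
    }
}

// Host helper (hypothetical): copies the data to the device, runs the
// kernel, and returns the sum. Assumes a non-empty input; error checking
// is omitted for brevity.
int reduceOnDevice(const std::vector<int>& host)
{
    const int n = static_cast<int>(host.size());
    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(int));

    const int blocks = (n + blockSize - 1) / blockSize;
    reduceSum<<<blocks, blockSize>>>(dIn, dOut, n);

    int result = 0;
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}

int main()
{
    const std::vector<int> data(1000, 1);
    std::printf("sum = %d\n", reduceOnDevice(data));
    return 0;
}
```

The older idiom relied on a `volatile` shared-memory pointer plus implicit lock-step execution within a warp. The explicit `__syncwarp()` calls make the ordering visible in the code and do not depend on that implicit assumption.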
