Fixed configuration of reduction kernels
The problem is that the __CUDA_ARCH__ macro is defined only in device code, so it can't be used for configuring the kernel launches.
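A minimal sketch of the issue and one possible host-side alternative, not the code from this change: `ReduceLaunchConfig`, `threadsFromMacro`, `chooseReduceConfig`, and `queryReduceConfig` are hypothetical names, and the thresholds and sizes are placeholders rather than tuned values. It assumes the launch parameters can be picked from the compute capability queried at run time.

```cpp
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical launch parameters for a reduction kernel (names are illustrative).
struct ReduceLaunchConfig {
    int threadsPerBlock;
    int maxBlocks;
};

// Anti-pattern: __CUDA_ARCH__ is undefined while host code is compiled,
// so host code always gets the fallback branch, whatever GPU is installed.
inline int threadsFromMacro() {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 700)
    return 512;
#else
    return 128;
#endif
}

// Host-side alternative: choose from a compute capability known at run time.
// The thresholds and sizes are placeholders, not tuned values.
inline ReduceLaunchConfig chooseReduceConfig(int ccMajor) {
    if (ccMajor >= 7) return {512, 1024};
    if (ccMajor >= 3) return {256, 512};
    return {128, 256};
}

#ifdef __CUDACC__
// Only compiled by nvcc: ask the runtime for the current device's major version.
inline ReduceLaunchConfig queryReduceConfig() {
    int device = 0;
    int ccMajor = 0;
    if (cudaGetDevice(&device) != cudaSuccess ||
        cudaDeviceGetAttribute(&ccMajor, cudaDevAttrComputeCapabilityMajor,
                               device) != cudaSuccess) {
        return chooseReduceConfig(0);  // conservative fallback on error
    }
    return chooseReduceConfig(ccMajor);
}
#endif
```

With a sketch like this, the host would call `queryReduceConfig()` once and pass the resulting `threadsPerBlock` to the `<<<...>>>` launch, while `__CUDA_ARCH__` stays confined to device code.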