Commit 4bbf495c authored by Jakub Klinkovský

Added specialization of CudaBlockReduce using __shfl instructions

parent 05903a8f
+211 −72

File changed.

Preview size limit exceeded, changes collapsed.

+1 −1
@@ -312,7 +312,7 @@ CudaScanKernelUpsweep( const InputView input,
   __syncthreads();

   // Perform the parallel reduction.
-   value = BlockReduce::reduce( reduction, value, threadIdx.x, storage.blockReduceStorage );
+   value = BlockReduce::reduce( reduction, identity, value, threadIdx.x, storage.blockReduceStorage );

   // Store the block result in the global memory.
   if( threadIdx.x == 0 )
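
Note on the new `identity` argument: the collapsed file is not visible here, so the following is only a host-side C++ sketch of the general shuffle-down reduction pattern, not the code from this commit. It emulates how a warp-level reduction built on `__shfl_down_sync` might proceed. Lanes that hold no input element have to contribute a neutral value, which is one plausible reason why `reduce` now takes `identity`. The name `warpReduceEmulated` and the `validCount` parameter are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <functional>

// Warp width on current NVIDIA GPUs (assumption made for this sketch).
constexpr int WarpSize = 32;

// Hypothetical helper: emulates one warp reducing its values with the
// shuffle-down pattern. At each step, lane i combines its own value with
// the value held by lane i + offset, and the offset is halved.
template< typename T, typename Reduction >
T warpReduceEmulated( const Reduction& reduction, const T& identity, const T* input, int validCount )
{
   std::array< T, WarpSize > lanes;

   // Lanes without an input element start from the identity, so they
   // cannot change the result.
   for( int lane = 0; lane < WarpSize; lane++ )
      lanes[ lane ] = ( lane < validCount ) ? input[ lane ] : identity;

   for( int offset = WarpSize / 2; offset > 0; offset /= 2 ) {
      // Snapshot: on the GPU all lanes exchange values simultaneously.
      const std::array< T, WarpSize > previous = lanes;
      for( int lane = 0; lane < WarpSize; lane++ ) {
         // __shfl_down_sync hands back the caller's own value when the
         // source lane would fall outside the warp.
         const int source = lane + offset;
         const T other = ( source < WarpSize ) ? previous[ source ] : previous[ lane ];
         lanes[ lane ] = reduction( previous[ lane ], other );
      }
   }

   // Only lane 0 is guaranteed to hold the reduction of all lanes.
   return lanes[ 0 ];
}
```

For example, calling `warpReduceEmulated( std::plus< int >{}, 0, data, 5 )` on an array `data` holding `1, 2, 3, 4, 5` sums only the five valid elements, because the remaining 27 lanes are filled with `0`. For a product, the identity would be `1` instead.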