Refactored splitting of the scan operation into two phases
- sequential scan does not need to be split, so "perform" runs the whole simple scan algorithm, "performFirstPhase" only reduces the block (i.e. the whole vector), and "performSecondPhase" performs the scan operation using the block result combined with a global offset as the initial value
- parallel OpenMP scan calls the sequential scan to process the block results
- parallel CUDA scan was changed so that the block results array is an exclusive scan after the first phase, the same as in the other device specializations
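The two-phase split described above can be illustrated with a minimal host-only sketch. It is not the code from this commit: the function names `first_phase`, `second_phase` and `two_phase_inclusive_scan`, the fixed `int` element type, and addition as the scan operation are all assumptions made for illustration. The first phase reduces each block and stores the block results as an exclusive scan; the second phase scans each block, starting from its block offset combined with a global offset.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the project's API. Assumes block_size > 0.

// First phase: reduce each block, then store the block results as an
// exclusive scan. offsets[b] is the sum of all elements in blocks before b;
// the last entry holds the total sum.
std::vector<int> first_phase(const std::vector<int>& v, std::size_t block_size)
{
    const std::size_t blocks = (v.size() + block_size - 1) / block_size;
    std::vector<int> offsets(blocks + 1, 0);
    for (std::size_t b = 0; b < blocks; b++) {
        const std::size_t end = std::min(v.size(), (b + 1) * block_size);
        int sum = 0;
        for (std::size_t i = b * block_size; i < end; i++)
            sum += v[i];
        offsets[b + 1] = offsets[b] + sum;
    }
    return offsets;
}

// Second phase: inclusive scan of each block, where the initial value is the
// block offset from the first phase combined with a global offset.
void second_phase(std::vector<int>& v, std::size_t block_size,
                  const std::vector<int>& offsets, int global_offset)
{
    const std::size_t blocks = offsets.size() - 1;
    for (std::size_t b = 0; b < blocks; b++) {
        const std::size_t end = std::min(v.size(), (b + 1) * block_size);
        int acc = offsets[b] + global_offset;
        for (std::size_t i = b * block_size; i < end; i++) {
            acc += v[i];
            v[i] = acc;
        }
    }
}

// Convenience wrapper that runs both phases on a copy of the input.
std::vector<int> two_phase_inclusive_scan(std::vector<int> v,
                                          std::size_t block_size,
                                          int global_offset = 0)
{
    const std::vector<int> offsets = first_phase(v, block_size);
    second_phase(v, block_size, offsets, global_offset);
    return v;
}
```

In this sketch the blocks are processed in a plain loop; in a parallel setting each block in either phase could be handled by a separate thread, since after the first phase every block knows its own starting offset.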
parent
ee8e4e92