Optimized parallel CUDA scan algorithm to avoid unnecessary writing in the first phase
The original approach (prescan + uniform shift) is more efficient for inputs that are expensive to evaluate, such as vector expressions.
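
The trade-off can be illustrated with a sequential C++ model of the two strategies. This is only a sketch and not the code from this commit. It assumes the optimized variant reduces each block in the first phase without writing output, then re-evaluates the input in the second phase, which is one plausible reading of the message above. The names `LazyInput`, `scan_with_uniform_shift`, and `scan_reduce_then_scan` are hypothetical, and each outer loop iteration stands in for one CUDA thread block.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical lazily evaluated input (stand-in for a vector expression).
// `evals` counts how many times an element is computed.
struct LazyInput {
    std::size_t size;
    std::function<int(std::size_t)> eval;
    mutable std::size_t evals;
    LazyInput(std::size_t n, std::function<int(std::size_t)> f)
        : size(n), eval(std::move(f)), evals(0) {}
    int operator()(std::size_t i) const { ++evals; return eval(i); }
};

// Exclusive prefix sum over per-block totals, shared by both variants.
static std::vector<int> block_offsets(const std::vector<int>& sums) {
    std::vector<int> offsets(sums.size());
    int running = 0;
    for (std::size_t b = 0; b < sums.size(); ++b) {
        offsets[b] = running;
        running += sums[b];
    }
    return offsets;
}

// Original approach: per-block scan written to the output, then a uniform
// shift. Each input is evaluated once; each output is written twice.
// Assumes block > 0.
std::vector<int> scan_with_uniform_shift(const LazyInput& in, std::size_t block,
                                         std::size_t& writes) {
    std::vector<int> out(in.size), sums;
    writes = 0;
    for (std::size_t b = 0; b < in.size; b += block) {   // phase 1: prescan
        int acc = 0;
        for (std::size_t i = b; i < std::min(b + block, in.size); ++i) {
            acc += in(i);
            out[i] = acc;
            ++writes;
        }
        sums.push_back(acc);
    }
    const std::vector<int> offsets = block_offsets(sums);
    for (std::size_t b = 0; b < in.size; b += block) {   // phase 2: shift
        for (std::size_t i = b; i < std::min(b + block, in.size); ++i) {
            out[i] += offsets[b / block];
            ++writes;
        }
    }
    return out;
}

// Assumed optimized approach: phase 1 only reduces (no output writes),
// phase 2 re-evaluates the input and writes each final value once.
// Assumes block > 0.
std::vector<int> scan_reduce_then_scan(const LazyInput& in, std::size_t block,
                                       std::size_t& writes) {
    std::vector<int> out(in.size), sums;
    writes = 0;
    for (std::size_t b = 0; b < in.size; b += block) {   // phase 1: reduce
        int acc = 0;
        for (std::size_t i = b; i < std::min(b + block, in.size); ++i) {
            acc += in(i);
        }
        sums.push_back(acc);
    }
    const std::vector<int> offsets = block_offsets(sums);
    for (std::size_t b = 0; b < in.size; b += block) {   // phase 2: scan
        int acc = offsets[b / block];
        for (std::size_t i = b; i < std::min(b + block, in.size); ++i) {
            acc += in(i);
            out[i] = acc;
            ++writes;
        }
    }
    return out;
}
```

In this model, for n elements the original approach performs n input evaluations and 2n output writes, while the reduce-then-scan variant performs 2n evaluations and n writes. That matches the note above: when evaluating an element is expensive, paying for the second write can be cheaper than evaluating the input twice.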
parent 2f61104b