Commit 8accbc52 authored by Jakub Klinkovský

Optimized parallel CUDA scan algorithm to avoid unnecessary writing in the first phase

The original approach (prescan + uniform shift) is more efficient for
inputs that are expensive to evaluate, such as vector expressions.
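
The trade-off between the two variants can be illustrated with a small sequential C++ sketch. This is not the code from this commit: the blocks are simulated one after another on the CPU, and all names (`ScanStats`, `scan_prescan_shift`, `scan_reduce_then_scan`, `exclusive_offsets`) are hypothetical. It assumes an inclusive prefix sum with `+`, a non-zero block size, and an input given as a callable, so that the number of input evaluations and output writes can be counted.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical counters for comparing the two variants.
struct ScanStats {
    std::size_t input_reads = 0;    // how many times the input was evaluated
    std::size_t output_writes = 0;  // how many times an output element was written
};

using Input = std::function<long(std::size_t)>;

// Turn per-block sums into per-block offsets (exclusive scan, in place).
static void exclusive_offsets(std::vector<long>& sums) {
    long running = 0;
    for (long& s : sums) {
        const long current = s;
        s = running;
        running += current;
    }
}

// Variant A (assumed shape of "prescan + uniform shift"):
// phase 1 writes block-local scans, phase 2 shifts them by the block offset.
std::vector<long> scan_prescan_shift(const Input& input, std::size_t n,
                                     std::size_t block, ScanStats& stats) {
    std::vector<long> out(n);
    std::vector<long> sums;
    for (std::size_t begin = 0; begin < n; begin += block) {
        const std::size_t end = std::min(begin + block, n);
        long acc = 0;
        for (std::size_t i = begin; i < end; ++i) {
            acc += input(i);
            ++stats.input_reads;
            out[i] = acc;  // block-local result, still missing the offset
            ++stats.output_writes;
        }
        sums.push_back(acc);
    }
    exclusive_offsets(sums);
    // Phase 2: block 0 has a zero offset, so only later blocks are shifted.
    for (std::size_t b = 1; b < sums.size(); ++b) {
        const std::size_t begin = b * block;
        const std::size_t end = std::min(begin + block, n);
        for (std::size_t i = begin; i < end; ++i) {
            out[i] += sums[b];
            ++stats.output_writes;
        }
    }
    return out;
}

// Variant B (assumed shape of the optimized version):
// phase 1 only reduces each block, phase 2 re-reads the input and writes once.
std::vector<long> scan_reduce_then_scan(const Input& input, std::size_t n,
                                        std::size_t block, ScanStats& stats) {
    std::vector<long> out(n);
    std::vector<long> sums;
    for (std::size_t begin = 0; begin < n; begin += block) {
        const std::size_t end = std::min(begin + block, n);
        long acc = 0;
        for (std::size_t i = begin; i < end; ++i) {
            acc += input(i);  // no output write in the first phase
            ++stats.input_reads;
        }
        sums.push_back(acc);
    }
    exclusive_offsets(sums);
    for (std::size_t b = 0; b < sums.size(); ++b) {
        const std::size_t begin = b * block;
        const std::size_t end = std::min(begin + block, n);
        long acc = sums[b];  // start from the offset of all preceding blocks
        for (std::size_t i = begin; i < end; ++i) {
            acc += input(i);  // second evaluation of the same input element
            ++stats.input_reads;
            out[i] = acc;
            ++stats.output_writes;
        }
    }
    return out;
}
```

In this sketch, variant A evaluates each input element once but writes most output elements twice, while variant B writes each output element once but evaluates each input element twice. That is consistent with the note above: when evaluating the input is costly, as with a vector expression, the extra evaluations in variant B may outweigh the saved writes.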
parent 2f61104b