- Jul 31, 2021
-
-
Jakub Klinkovský authored
- input and output are passed by views rather than raw pointers (this allows scanning even vector expressions)
- consequently, indexing is different (begin and end for the global memory accesses)
- fixed calculation of currentSize in the launcher
- changed configuration of the kernel to use the blockSize and valuesPerThread template parameters rather than the elementsInBlock runtime parameter
- changed allocation of the shared memory from dynamic to static
- the second phase kernel uses shared memory to cache block results for each block
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
The latter is the standard name for it and it is hidden from the generated documentation of the public interface.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
The algorithms are supposed to be used via overloaded plain functions in the Algorithms namespace: for now, there are only inplaceInclusiveScan and inplaceExclusiveScan (and their distributed variants). The scan and segmentedScan methods were removed from data structures (Vector, VectorView, DistributedVector, DistributedVectorView). They were inflexible (only std::plus was actually used for reduction), incomplete (some overloads just threw NotImplementedError), and they violated the open-closed principle: https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle
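A minimal sketch of why free functions are more flexible than the removed methods. The names inplaceInclusiveScanSketch and inplaceExclusiveScanSketch and their signatures are hypothetical and only mirror the idea; they are not the library's actual interface. The reduction and its identity are ordinary parameters, so any operation works on any container that provides operator[].

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical free function (assumed signature): in-place inclusive scan
// of a[begin..end) with a user-supplied reduction and its identity element.
template <typename Array, typename Reduction, typename T>
void inplaceInclusiveScanSketch(Array& a, std::size_t begin, std::size_t end,
                                Reduction reduction, T identity)
{
    assert(begin <= end);
    T running = identity;
    for (std::size_t i = begin; i < end; ++i) {
        running = reduction(running, a[i]);
        a[i] = running;  // element i holds the reduction of a[begin..i]
    }
}

// Hypothetical free function (assumed signature): in-place exclusive scan,
// where element i receives the reduction of a[begin..i-1].
template <typename Array, typename Reduction, typename T>
void inplaceExclusiveScanSketch(Array& a, std::size_t begin, std::size_t end,
                                Reduction reduction, T identity)
{
    assert(begin <= end);
    T running = identity;
    for (std::size_t i = begin; i < end; ++i) {
        const T current = a[i];
        a[i] = running;
        running = reduction(running, current);
    }
}
```

With this shape, supporting a new reduction (for example std::multiplies) or a new container needs no change to the data structures themselves.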
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Also moved the test under Algorithms and made sure it is actually being compiled.
-
Jakub Klinkovský authored
The first phase performs only per-block reduction, not scan. The output array elements are written only in the second phase, so overall we perform only `n` instead of `2n` write operations.
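The idea can be sketched sequentially as follows. This is an illustration under assumptions, not the actual kernels: twoPhaseInclusiveScan is a hypothetical name, std::vector stands in for the views, and the blocks are processed in a plain loop instead of in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: inclusive scan of `input` into `output`,
// processed in blocks of `blockSize` elements.
template <typename T, typename Reduction>
void twoPhaseInclusiveScan(const std::vector<T>& input, std::vector<T>& output,
                           std::size_t blockSize, Reduction reduction, T identity)
{
    assert(blockSize > 0);
    assert(output.size() == input.size());
    const std::size_t n = input.size();
    const std::size_t blocks = (n + blockSize - 1) / blockSize;

    // First phase: reduce each block; nothing is written to `output`.
    std::vector<T> blockResults(blocks, identity);
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t end = std::min(n, (b + 1) * blockSize);
        for (std::size_t i = b * blockSize; i < end; ++i)
            blockResults[b] = reduction(blockResults[b], input[i]);
    }

    // Exclusive scan of the block results yields each block's starting offset.
    T offset = identity;
    for (std::size_t b = 0; b < blocks; ++b) {
        const T blockTotal = blockResults[b];
        blockResults[b] = offset;
        offset = reduction(offset, blockTotal);
    }

    // Second phase: scan each block starting from its offset;
    // every output element is written exactly once.
    for (std::size_t b = 0; b < blocks; ++b) {
        T running = blockResults[b];
        const std::size_t end = std::min(n, (b + 1) * blockSize);
        for (std::size_t i = b * blockSize; i < end; ++i) {
            running = reduction(running, input[i]);
            output[i] = running;
        }
    }
}
```

The input is read twice, but the only writes of size `n` happen in the second phase; the first phase writes just one value per block.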
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Also fixed the idempotent values for Max and MaxWithArg (std::numeric_limits<T>::lowest() vs std::numeric_limits<T>::min()).
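The difference matters for floating-point types, where std::numeric_limits<T>::min() is the smallest positive normalized value rather than the most negative one. A small sketch with two hypothetical helpers (maxOf and maxOfSeededWithMin are illustrative names, not library functions):

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical helper: maximum of a vector, seeded with lowest(),
// which is a correct neutral element for max.
template <typename T>
T maxOf(const std::vector<T>& values)
{
    T result = std::numeric_limits<T>::lowest();
    for (const T& v : values)
        result = std::max(result, v);
    return result;
}

// Same loop seeded with min(), shown only for comparison: for floating-point T
// the seed is a tiny positive number, so all-negative inputs give a wrong answer.
template <typename T>
T maxOfSeededWithMin(const std::vector<T>& values)
{
    T result = std::numeric_limits<T>::min();
    for (const T& v : values)
        result = std::max(result, v);
    return result;
}
```

For integer types the two seeds coincide, which is why such a bug only shows up with floating-point data that contains no positive values.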
-
Jakub Klinkovský authored
Hence, all StaticArray, Array, ArrayView and even expression templates are directly usable in reduction without the need to create a wrapping fetch functor. Also NDArray has this interface in 1D.
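A minimal sketch of the idea, assuming the reduction only requires operator[] on its argument. The names reduceIndexable and SumExpression are hypothetical stand-ins, and std::vector and std::array are used in place of the library's containers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reduces a[begin..end) for any object providing operator[],
// so no separate fetch functor is needed.
template <typename Indexable, typename Reduction, typename Result>
Result reduceIndexable(const Indexable& a, std::size_t begin, std::size_t end,
                       Reduction reduction, Result identity)
{
    assert(begin <= end);
    Result result = identity;
    for (std::size_t i = begin; i < end; ++i)
        result = reduction(result, a[i]);
    return result;
}

// A tiny stand-in for an expression template: left[i] + right[i], evaluated lazily.
template <typename L, typename R>
struct SumExpression
{
    const L& left;
    const R& right;
    auto operator[](std::size_t i) const { return left[i] + right[i]; }
};
```

Because the expression object is indexed element by element, the sum of two arrays can be reduced without materializing a temporary array.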
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
The tests should not rely on other parts of the library if possible.
-
Jakub Klinkovský authored
- sequential scan does not need to be split, so "perform" performs the whole simple scan algorithm, "performFirstPhase" only reduces the block (i.e. the whole vector), "performSecondPhase" performs the scan operation with the block result combined with a global offset as the initial value
- parallel OpenMP scan calls the sequential scan to process the block results
- parallel CUDA scan was changed such that the block results array is an exclusive scan after the first phase, same as in the other device specializations
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
- used ValueType instead of RealType - closes #87
- replaced prefix-sum with scan in the comments
- renamed variables containing "sum" to "result"
- fixed artificial blockShifts in the sequential implementation
-
Jakub Klinkovský authored
The file should be named after the main function which is implemented in it. Also changed the parameter name from "reduce" to "reduction" to differentiate it from the main "reduce" function.
-
Jakub Klinkovský authored
extension of the implementation of staticFor. See merge request !95
-
Tomáš Jakubec authored
-
- Jul 28, 2021
-
-
Jakub Klinkovský authored
This reverts commit 05859cdd.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Jul 27, 2021
-
-
Tomáš Oberhuber authored
To/sorting. See merge request !99
-
Tomáš Oberhuber authored
-
Tomáš Oberhuber authored
-
Tomáš Oberhuber authored
-
Tomáš Oberhuber authored
-
Tomáš Oberhuber authored
-
Tomáš Oberhuber authored
-
Tomáš Oberhuber authored
-
Tomáš Oberhuber authored
Fixed header includes in the Nvidia bitonic sort wrapper. Fixed namespace definitions.
-
Tomáš Oberhuber authored
-