- Sep 16, 2021
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
CUDA kernels should never work with distributed data structures; they should always get the underlying *local* data structure.
-
Jakub Klinkovský authored
- "NullGroup" should not be used even when building without MPI. Otherwise the behaviour is very bug-prone, because "NullGroup" usage is not caught and changing the build type leads to different semantics.
- "AllGroup" is not a good default value for the parameters, considering that the class attributes are initialized to "NullGroup".
-
- Sep 14, 2021
-
-
Jakub Klinkovský authored
Buffering with a small value is very slow when profiling anything in nvvp.
-
- Sep 03, 2021
-
-
Jakub Klinkovský authored
-
- Sep 02, 2021
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Sep 01, 2021
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Aug 31, 2021
-
-
Jakub Klinkovský authored
-
- Aug 27, 2021
-
-
Jakub Klinkovský authored
Amends e5fc6a96
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Aug 11, 2021
-
-
Jakub Klinkovský authored
Scan refactoring. Closes #87. See merge request !100.
-
- Aug 08, 2021
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Aug 06, 2021
-
-
Jakub Klinkovský authored
- structs from HorizontalOperations.h reimplemented as function objects in Functional.h
- repetitive function definitions generated using macros
- added new operators: % (modulus) and ^ (xor)
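A minimal sketch of how operator function objects can be generated with a macro, as the notes above describe. The macro `SKETCH_BINARY_FUNCTOR`, the namespace `sketch`, and the struct names `Plus`, `Modulus` and `BitXor` are hypothetical and chosen for illustration only; the actual contents of Functional.h may differ.

```cpp
namespace sketch {

// Hypothetical macro (not taken from Functional.h): expands to a struct
// whose call operator applies the binary operator `op` to its arguments.
#define SKETCH_BINARY_FUNCTOR( name, op )                          \
   struct name                                                     \
   {                                                               \
      template< typename T1, typename T2 >                         \
      constexpr auto operator()( const T1& a, const T2& b ) const  \
         -> decltype( a op b )                                     \
      {                                                            \
         return a op b;                                            \
      }                                                            \
   };

// One line per operator replaces a hand-written struct.
SKETCH_BINARY_FUNCTOR( Plus, + )
SKETCH_BINARY_FUNCTOR( Modulus, % )
SKETCH_BINARY_FUNCTOR( BitXor, ^ )

#undef SKETCH_BINARY_FUNCTOR

}  // namespace sketch
```

With such objects, `sketch::Modulus{}( 7, 3 )` evaluates `7 % 3` and `sketch::BitXor{}( 6, 3 )` evaluates `6 ^ 3`, so the same object can be passed to any algorithm that expects a binary callable.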
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
This way we test both the general CUDA implementation using shared memory and the specialization using __shfl instructions. Both the reduction and scan kernels needed some tweaks due to shared memory usage with non-fundamental types.
-
Jakub Klinkovský authored
This is needed because custom specializations of std::is_arithmetic cannot be used (they cause undefined behaviour).
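One common way around this restriction is a project-owned trait that defaults to std::is_arithmetic and may be specialized freely. The sketch below illustrates that general pattern only; the names `IsScalarLike` and `Fixed16` are hypothetical and are not claimed to match what this commit introduced.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical project-owned trait: falls back to std::is_arithmetic,
// but lives in our own namespace, so specializing it is well-defined.
template< typename T >
struct IsScalarLike : std::is_arithmetic< T >
{};

// Hypothetical user-defined numeric type used in tests.
struct Fixed16
{
   short raw;
};

// Opt the custom type in without touching anything in namespace std.
template<>
struct IsScalarLike< Fixed16 > : std::true_type
{};

}  // namespace sketch
```

Code that previously branched on std::is_arithmetic can branch on the custom trait instead, so fundamental types keep working and custom types can opt in.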
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Aug 03, 2021
-
-
Jakub Klinkovský authored
The algorithms are implemented as plain functions in TNL::Algorithms. containsValue was replaced with contains.
-
- Jul 31, 2021
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Removed reduction methods from Array and ArrayView; instead added overloads of reduce and reduceWithArgument for arrays/views. Plain functions are much more flexible than methods. The methods were also violating the open-closed principle: https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle
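The design choice can be illustrated with a simplified, sequential sketch. The `sketch::Array` type and both `reduce` signatures below are assumptions made for illustration; they are not the actual TNL declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Generic algorithm: reduces fetch( i ) for i in [begin, end).
template< typename Index, typename Result, typename Fetch, typename Reduction >
Result reduce( Index begin, Index end, Fetch fetch, Reduction reduction, Result identity )
{
   assert( begin <= end );
   Result result = identity;
   for( Index i = begin; i < end; i++ )
      result = reduction( result, fetch( i ) );
   return result;
}

// Hypothetical container standing in for an array type.
template< typename T >
struct Array
{
   std::vector< T > data;
};

// Overload for arrays: a plain function that forwards to the generic
// algorithm. Adding it does not require modifying the Array class.
template< typename T, typename Reduction >
T reduce( const Array< T >& array, Reduction reduction, T identity )
{
   const T* ptr = array.data.data();
   const std::size_t size = array.data.size();
   return sketch::reduce( std::size_t( 0 ), size,
                          [ ptr ]( std::size_t i ) { return ptr[ i ]; },
                          reduction, identity );
}

}  // namespace sketch
```

New overloads for other container types can be added next to the container or in user code, which is the sense in which plain functions keep the classes closed for modification but open for extension.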
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
This adds back the original approach (prescan + uniform shift) which was removed too early.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
The original approach (prescan + uniform shift) is more efficient for inputs that are expensive to evaluate, such as vector expressions.
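A sequential sketch of the prescan + uniform shift idea, assuming the usual block-wise formulation: each block is scanned locally, the block totals are scanned, and each block is then shifted by its offset. The function name `inclusiveScan`, its parameters, and the `fetch` callable are hypothetical; the real implementation runs the phases in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Inclusive prefix sum of fetch( 0 ), ..., fetch( size - 1 ).
// The input is evaluated exactly once per element (in phase 1).
template< typename T, typename Fetch >
std::vector< T > inclusiveScan( std::size_t size, std::size_t blockSize, Fetch fetch )
{
   assert( blockSize > 0 );
   std::vector< T > output( size );
   const std::size_t blocks = ( size + blockSize - 1 ) / blockSize;
   std::vector< T > blockOffsets( blocks );

   // Phase 1 (prescan): scan each block independently, store block totals.
   for( std::size_t b = 0; b < blocks; b++ ) {
      const std::size_t begin = b * blockSize;
      const std::size_t end = std::min( begin + blockSize, size );
      T sum = T();
      for( std::size_t i = begin; i < end; i++ ) {
         sum = sum + fetch( i );  // the only place the input is evaluated
         output[ i ] = sum;
      }
      blockOffsets[ b ] = sum;
   }

   // Phase 2: exclusive scan of the block totals gives each block's offset.
   T offset = T();
   for( std::size_t b = 0; b < blocks; b++ ) {
      const T total = blockOffsets[ b ];
      blockOffsets[ b ] = offset;
      offset = offset + total;
   }

   // Phase 3 (uniform shift): add the block's offset to all its elements.
   for( std::size_t b = 0; b < blocks; b++ ) {
      const std::size_t begin = b * blockSize;
      const std::size_t end = std::min( begin + blockSize, size );
      for( std::size_t i = begin; i < end; i++ )
         output[ i ] = blockOffsets[ b ] + output[ i ];
   }
   return output;
}

}  // namespace sketch
```

The shift phase only touches the output, so an expensive input (such as a lazily evaluated expression) is never evaluated a second time. A scheme that first reduces each block and then scans it with a known offset would have to read the input twice.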
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-