- Sep 16, 2021
Jakub Klinkovský authored
-
Jakub Klinkovský authored
It does not make sense to print a distributed array like this, because it may contain ghost elements. Users should examine the local array view manually.
-
Jakub Klinkovský authored
Replaced send/receive for Array(,View) and mpiSend/mpiReceive for String with a general implementation in the MPI namespace. Also added analogous functions: MPI::sendrecv, MPI::bcast.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
- removed wrapper functions: AllGroup, NullGroup, NullRequest
- added MPI_COMM_WORLD and other handles to MPI/DummyDefs.h
- renamed getCommunicationGroup to getCommunicator in all data structures
- improved naming to match the MPI terminology: communicator instead of group
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
CUDA kernels should never work with distributed data structures; they should always get the underlying *local* data structure.
-
Jakub Klinkovský authored
- The "NullGroup" should not be used even when built without MPI; otherwise the behaviour is very bug-prone, because "NullGroup" usage is not caught and changing the build type leads to different semantics.
- "AllGroup" is not a good default value for the parameters, considering that the class attributes are initialized to "NullGroup".
-
- Sep 14, 2021
Jakub Klinkovský authored
Buffering with a small value is very slow when profiling anything in nvpp.
-
- Sep 03, 2021
Jakub Klinkovský authored
-
- Sep 02, 2021
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Sep 01, 2021
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Aug 31, 2021
Jakub Klinkovský authored
-
- Aug 27, 2021
Jakub Klinkovský authored
Amends e5fc6a96
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Aug 11, 2021
Jakub Klinkovský authored
Scan refactoring

Closes #87

See merge request !100
-
- Aug 08, 2021
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Aug 06, 2021
Jakub Klinkovský authored
- structs from HorizontalOperations.h reimplemented as function objects in Functional.h
- repetitive function definitions generated using macros
- added new operators: % (modulus) and ^ (xor)
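The commit does not show the generated code. A minimal C++ sketch of how a single macro can stamp out binary function objects, including ones for the new `%` and `^` operators, might look like this. The macro and struct names (`DEFINE_BINARY_FUNCTOR`, `Plus`, `Multiplies`, `Modulus`, `BitXor`) are hypothetical and not necessarily those used in Functional.h.

```cpp
// Hypothetical macro: generates a function object `Name` whose call operator
// applies the binary operator `op` to its two arguments.
#define DEFINE_BINARY_FUNCTOR( Name, op )                          \
   struct Name                                                     \
   {                                                               \
      template< typename T1, typename T2 >                         \
      constexpr auto operator()( const T1& a, const T2& b ) const  \
         -> decltype( a op b )                                     \
      {                                                            \
         return a op b;                                            \
      }                                                            \
   };

// One line per operator instead of a hand-written struct for each.
DEFINE_BINARY_FUNCTOR( Plus, + )
DEFINE_BINARY_FUNCTOR( Multiplies, * )
DEFINE_BINARY_FUNCTOR( Modulus, % )
DEFINE_BINARY_FUNCTOR( BitXor, ^ )

#undef DEFINE_BINARY_FUNCTOR

// Usage: the objects can be called directly or passed to generic algorithms.
static_assert( Modulus{}( 17, 5 ) == 2, "17 % 5 == 2" );
static_assert( BitXor{}( 6, 3 ) == 5, "0b110 ^ 0b011 == 0b101" );
```

Function objects (rather than plain structs with static members) can be passed by value to reductions and scans, which is presumably why the commit moved to this form.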
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
This way we test both the general CUDA implementation using shared memory and the specialization using __shfl instructions. Both the reduction and scan kernels needed some tweaks due to shared memory usage with non-fundamental types.
-
Jakub Klinkovský authored
This is needed because custom specializations of std::is_arithmetic cannot be used (they cause undefined behaviour).
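The commit message does not name the replacement. A common pattern is a project-owned trait that forwards to std::is_arithmetic by default and may be specialized freely, since the standard forbids specializing its own type traits. The name `IsArithmetic` and the choice of `std::complex` as the opted-in type are assumptions made for this sketch.

```cpp
#include <complex>
#include <type_traits>

// Project-owned trait (hypothetical name). By default it forwards to
// std::is_arithmetic, which must not be specialized for user types.
template< typename T >
struct IsArithmetic : std::is_arithmetic< T >
{};

// Specializing our own trait is allowed, so a non-fundamental type such as
// std::complex can opt in (an assumption made for this example).
template< typename T >
struct IsArithmetic< std::complex< T > > : std::true_type
{};

// Library code then dispatches on IsArithmetic instead of std::is_arithmetic.
static_assert( IsArithmetic< double >::value, "fundamental types are forwarded" );
static_assert( IsArithmetic< std::complex< float > >::value, "opted-in type" );
static_assert( ! std::is_arithmetic< std::complex< float > >::value, "std trait is unchanged" );
```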
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Aug 03, 2021
Jakub Klinkovský authored
The algorithms are implemented as plain functions in TNL::Algorithms. containsValue was replaced with contains.
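As a rough sequential sketch of what a plain-function algorithm looks like, the `contains` below works with any container that provides `size()` and `operator[]`. Its signature is an assumption; the actual function in TNL::Algorithms may differ and presumably also has to deal with arrays allocated on a GPU.

```cpp
#include <array>
#include <cstddef>

// Sequential sketch of a free-function algorithm (signature assumed).
// Any type with size() and operator[] works; no member function is needed.
template< typename Array, typename Value >
constexpr bool contains( const Array& array, const Value& value )
{
   for( std::size_t i = 0; i < array.size(); i++ )
      if( array[ i ] == value )
         return true;
   return false;
}

// Usage with a standard container, evaluated at compile time.
constexpr std::array< int, 4 > primes{ { 2, 3, 5, 7 } };
static_assert( contains( primes, 5 ), "5 is present" );
static_assert( ! contains( primes, 4 ), "4 is absent" );
```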
-
- Jul 31, 2021
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Removed reduction methods from Array and ArrayView; instead, added overloads of reduce and reduceWithArgument for arrays/views. Plain functions are much more flexible than methods. The methods were also violating the open-closed principle: https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle
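A minimal sketch of the plain-function approach, assuming a container with `size()` and `operator[]`: `reduce` folds the elements with any binary operation, so a new kind of reduction needs no change to the array class. The signature is hypothetical and simpler than TNL's (no index range, no device dispatch), and `reduceWithArgument` is omitted.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// Hypothetical free-function reduction: folds all elements with `reduction`,
// starting from `identity` (e.g. 0 for a sum, 1 for a product).
template< typename Array, typename Reduction, typename Result >
constexpr Result reduce( const Array& array, Reduction&& reduction, Result identity )
{
   Result result = identity;
   for( std::size_t i = 0; i < array.size(); i++ )
      result = reduction( result, array[ i ] );
   return result;
}

// Usage: the array type knows nothing about reductions.
constexpr std::array< int, 4 > values{ { 3, 1, 4, 1 } };
static_assert( reduce( values, std::plus<>{}, 0 ) == 9, "sum" );
static_assert( reduce( values, std::multiplies<>{}, 1 ) == 12, "product" );
```

Because the operation is a parameter rather than a member function, callers can add reductions (maximum, logical and, ...) in their own code, which is the open-closed argument made in the commit.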
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored