Commits · 7a68883366b7c9b0044750a8dc108cc1070a9544 · TNL / tnl-dev

Jul 31, 2021

Refactored CUDA parallel scan kernel · 7a688833

Jakub Klinkovský authored Jul 17, 2021

- input and output are passed by views rather than raw pointers (this
  makes it possible to scan even vector expressions)
- consequently, indexing is different (begin and end for the global
  memory accesses)
- fixed calculation of currentSize in the launcher
- changed configuration of the kernel using the blockSize and
  valuesPerThread template parameters rather than the elementsInBlock
  runtime parameter
- changed allocation of the shared memory from dynamic to static
- the second phase kernel uses shared memory to cache block results for
  each block

7a688833

Implemented 'outplace' variants of scan and distributed scan functions · 76a95d0b
Jakub Klinkovský authored Jul 14, 2021

76a95d0b
Refactored and extended tests for scan and distributed scan · 4f1dc3af
Jakub Klinkovský authored Jul 13, 2021

4f1dc3af
Fixed bug in the second phase of CUDA scan implementation · 4467323a
Jakub Klinkovský authored Jul 14, 2021

4467323a

Segments: renamed namespace details to detail · c44b1140

Jakub Klinkovský authored Jul 11, 2021

The latter is the standard name for it and it is hidden from the
generated documentation of the public interface.

c44b1140

Fixed header includes · 42734a75
Jakub Klinkovský authored Jul 11, 2021

42734a75

Moved implementations of scan and distributed scan into the detail namespace · c1780697

Jakub Klinkovský authored Jul 11, 2021

The algorithms are supposed to be used via overloaded plain functions in
the Algorithms namespace: for now, there are only inplaceInclusiveScan
and inplaceExclusiveScan (and their distributed variants).

The scan and segmentedScan methods were removed from data structures
(Vector, VectorView, DistributedVector, DistributedVectorView). They
were inflexible (only std::plus was actually used for reduction),
incomplete (some overloads just threw NotImplementedError), and they
were violating the open-closed principle:
https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle
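
To illustrate the argument above, here is a minimal sketch of a scan written as a free function template. The name `inplaceInclusiveScanSketch` and its signature are assumptions made for this example and are not the actual TNL interface. It only shows why a free function is easier to extend than a method: a new container type or a new reduction needs no change to any class.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical free function (not TNL's signature): in-place inclusive scan
// over any container that provides size() and operator[].
template <typename Container, typename Reduction = std::plus<>>
void inplaceInclusiveScanSketch(Container& c, Reduction reduction = Reduction{})
{
    // Each element becomes the reduction of all elements up to and including it.
    for (std::size_t i = 1; i < c.size(); ++i)
        c[i] = reduction(c[i - 1], c[i]);
}
```

Calling it on a `std::vector<int>` holding `1, 2, 3, 4` gives `1, 3, 6, 10` with the default `std::plus<>`, and `1, 2, 6, 24` when `std::multiplies<>{}` is passed as the second argument.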

c1780697

Fixed copy-assignment operator in DistributedArrayView according to DistributedVectorView · 624e709f
Jakub Klinkovský authored Jul 10, 2021

624e709f
Moved scan tests from Containers to Algorithms · 19e9b4e5
Jakub Klinkovský authored Jul 10, 2021

19e9b4e5
Moved segmented scan into its own header file under Algorithms · e347b486
Jakub Klinkovský authored Jul 10, 2021

Also moved the test under Algorithms and made sure it is actually
being compiled.

e347b486

Refactored parallel OpenMP scan · a4e15b08

Jakub Klinkovský authored Jul 10, 2021

The first phase performs only per-block reduction, not scan. The output
array elements are written only in the second phase, so overall we
perform only `n` instead of `2n` write operations.
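
The two-phase scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the function name `twoPhaseInclusiveScan`, the `blockSize` parameter and the use of `std::vector` are all hypothetical. The first phase only reduces each block, the block results are then turned into an exclusive scan sequentially, and the second phase writes each output element exactly once.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not TNL's API): block-wise inclusive scan in two phases.
// The OpenMP pragmas are simply ignored when compiling without OpenMP.
template <typename T, typename Reduction = std::plus<T>>
std::vector<T> twoPhaseInclusiveScan(const std::vector<T>& in,
                                     std::size_t blockSize,
                                     Reduction reduction = Reduction{},
                                     T identity = T{})
{
    const std::size_t n = in.size();
    std::vector<T> out(n);
    if (n == 0)
        return out;
    if (blockSize == 0)
        blockSize = n;  // degenerate case: treat the whole input as one block
    const long long blocks = static_cast<long long>((n + blockSize - 1) / blockSize);

    // Phase 1: reduce each block independently; `out` is not written here.
    std::vector<T> blockResults(static_cast<std::size_t>(blocks), identity);
    #pragma omp parallel for
    for (long long b = 0; b < blocks; ++b) {
        const std::size_t begin = static_cast<std::size_t>(b) * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        T r = identity;
        for (std::size_t i = begin; i < end; ++i)
            r = reduction(r, in[i]);
        blockResults[static_cast<std::size_t>(b)] = r;
    }

    // Sequential exclusive scan of the block results: afterwards entry b
    // holds the reduction of all blocks before b.
    T offset = identity;
    for (std::size_t b = 0; b < blockResults.size(); ++b) {
        const T r = blockResults[b];
        blockResults[b] = offset;
        offset = reduction(offset, r);
    }

    // Phase 2: scan each block starting from its offset; every output
    // element is written exactly once, so there are n writes in total.
    #pragma omp parallel for
    for (long long b = 0; b < blocks; ++b) {
        const std::size_t begin = static_cast<std::size_t>(b) * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        T running = blockResults[static_cast<std::size_t>(b)];
        for (std::size_t i = begin; i < end; ++i) {
            running = reduction(running, in[i]);
            out[i] = running;
        }
    }
    return out;
}
```

The trade-off is that each input element is read twice (once per phase), in exchange for halving the writes to the output array compared with scanning in both phases.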

a4e15b08

Removed useless DeviceType from detail::Reduction · 63d567e4
Jakub Klinkovský authored Jul 10, 2021

63d567e4
Fixed formatting in reduce.h and removed unused includes · 49839f6a
Jakub Klinkovský authored Jul 10, 2021

49839f6a
reduce: fixed the Result type in case the fetch functor returns a reference · a3f0ad65
Jakub Klinkovský authored Jul 10, 2021

a3f0ad65
Removed unnecessary lambda functions from expression templates · 72ad8e30
Jakub Klinkovský authored Jul 10, 2021

72ad8e30

Added static_asserts to the getIdempotent methods in Functional.h · a1e3a62d

Jakub Klinkovský authored Jul 10, 2021

Also fixed the idempotent values for Max and MaxWithArg
(std::numeric_limits<T>::lowest() vs std::numeric_limits<T>::min())
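
The difference matters for floating-point types, where `std::numeric_limits<T>::min()` is the smallest positive normalized value while `lowest()` is the most negative finite value. The sketch below uses a hypothetical `maxReduce` helper (not a TNL function) to show how a max-reduction over negative numbers goes wrong when it starts from `min()`.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical helper: max-reduction starting from a caller-supplied
// initial value, mimicking the role of the value returned by getIdempotent.
template <typename T>
T maxReduce(const std::vector<T>& values, T initial)
{
    T result = initial;
    for (const T& x : values)
        result = std::max(result, x);
    return result;
}
```

For the input `-3.0, -1.5, -2.0`, starting from `std::numeric_limits<double>::lowest()` yields `-1.5`, whereas starting from `std::numeric_limits<double>::min()` yields a tiny positive number that is not in the input at all. For integer types the two functions return the same value, so the bug only shows up with floating-point data.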

a1e3a62d

Added operator() to StaticArray, Array, ArrayView and ExpressionTemplates · 090a8f29

Jakub Klinkovský authored Jul 10, 2021

Hence, all StaticArray, Array, ArrayView and even expression templates are
directly usable in reduction without the need to create a wrapping fetch
functor. Also NDArray has this interface in 1D.
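
As a rough illustration of the idea, the sketch below defines a hypothetical `reduceSketch` function and a hypothetical `SimpleArray` type. Neither is TNL code and the signatures are assumptions. The point is only that an object with `operator()(index)` can be passed where a fetch callable is expected, without wrapping it in a lambda.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a reduce function that takes a "fetch" callable
// mapping an index to a value.
template <typename Index, typename Fetch, typename Reduction, typename Result>
Result reduceSketch(Index begin, Index end, Fetch&& fetch,
                    Reduction&& reduction, Result identity)
{
    Result result = identity;
    for (Index i = begin; i < end; ++i)
        result = reduction(result, fetch(i));
    return result;
}

// Hypothetical minimal array type: operator() returns the element by index,
// which makes the object itself usable as the fetch callable.
struct SimpleArray {
    std::vector<double> data;
    double operator()(std::size_t i) const { return data[i]; }
};
```

With `SimpleArray a{{1.0, 2.0, 3.5}}`, the call `reduceSketch(std::size_t{0}, a.data.size(), a, std::plus<>{}, 0.0)` sums the elements directly. Without `operator()`, the caller would have to pass something like `[&](std::size_t i) { return a.data[i]; }` instead.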

090a8f29

Added Devices::Sequential to DistributedVectorTest and VectorTestSetup · 045241d7
Jakub Klinkovský authored Jul 10, 2021

045241d7
Added Devices::Sequential to ArrayTest and ArrayViewTest · 1c9ff705
Jakub Klinkovský authored Jul 10, 2021

1c9ff705
Removed useless SharedPointer and ParallelFor from ArrayTest · 62100711
Jakub Klinkovský authored Jul 09, 2021

The tests should not rely on other parts of the library if possible.

62100711

Refactored splitting of the scan operation in two phases · 311fcf36

Jakub Klinkovský authored Jul 10, 2021

- sequential scan does not need to be split, so "perform" performs the
  whole simple scan algorithm, "performFirstPhase" only reduces the
  block (i.e. the whole vector), "performSecondPhase" performs the scan
  operation with the block result combined with a global offset as the
  initial value
- parallel OpenMP scan calls the sequential scan to process the block
  results
- parallel CUDA scan was changed such that the block results array is an
  exclusive scan after the first phase, same as in the other device
  specializations

311fcf36

Fixed sequential scan to apply the initial value properly · ee8e4e92
Jakub Klinkovský authored Jul 10, 2021

ee8e4e92

Refactoring scan · dfe6b1e8

Jakub Klinkovský authored Jul 09, 2021

- used ValueType instead of RealType - closes #87
- replaced prefix-sum with scan in the comments
- renamed variables containing "sum" to "result"
- fixed artificial blockShifts in the sequential implementation

dfe6b1e8

Renamed Reduction.h to reduce.h · f6a5cb16

Jakub Klinkovský authored Jul 09, 2021

The file should be named after the main function which is implemented in
it. Also changed the parameter name from "reduce" to "reduction" to
differentiate it from the main "reduce" function.

f6a5cb16

Merge branch 'TJ/static-for' into 'develop' · 0d735ef4
Jakub Klinkovský authored Jul 31, 2021

extension of the implementation of staticFor

See merge request !95

0d735ef4
Extension of the staticFor implementation · 117edb17
Tomáš Jakubec authored Apr 12, 2021

117edb17

Jul 28, 2021
- Revert "Sorting: removed unused reduction functions" · ee53fa0a
  Jakub Klinkovský authored Jul 28, 2021
  
  This reverts commit 05859cdd.
  
  ee53fa0a
- Sorting: removed unused reduction functions · 05859cdd
  Jakub Klinkovský authored Jul 28, 2021
  
  05859cdd
- Sorting: fixed includes, added messages to NotImplementedError · d1777744
  Jakub Klinkovský authored Jul 28, 2021
  
  d1777744
- Fixed comparison of int and unsigned int in CedermanQuicksort.h · e9d6d0d7
  Jakub Klinkovský authored Jul 28, 2021
  
  e9d6d0d7
Jul 27, 2021
- Merge branch 'TO/sorting' into 'develop' · d37bcd8c
  Tomáš Oberhuber authored Jul 27, 2021
  
  To/sorting
  
  See merge request !99
  
  d37bcd8c
- Additional fixes of BubbleSort. · f691a84f
  Tomáš Oberhuber authored Jul 26, 2021
  
  f691a84f
- Fixing isDescending function. · 02987e4c
  Tomáš Oberhuber authored Jul 26, 2021
  
  02987e4c
- Changing parameters of sort functions to make overriding of the default sorter easier. · edc7392b
  Tomáš Oberhuber authored Jul 26, 2021
  
  edc7392b
- Fixing bubble sort. · 1e1505dd
  Tomáš Oberhuber authored Jul 26, 2021
  
  1e1505dd
- Renaming file Sort.h to sort.h. · b17d31fa
  Tomáš Oberhuber authored Jul 26, 2021
  
  b17d31fa
- Added bubble sort to have a CPU sorter with an interface for lambda functions. · de7beccf
  Tomáš Oberhuber authored Jul 26, 2021
  
  de7beccf
- Writing documentation on sorting. · 4c564a62
  Tomáš Oberhuber authored Jul 19, 2021
  
  4c564a62
- Fixing namespaces in sorting source files. · 3c2b8bcc
  Tomáš Oberhuber authored Jul 19, 2021
  
  Fixing header includes in Nvidia bitonic sort wrapper.
  
  Fixing namespace definitions.
  
  3c2b8bcc
- Removing folder with original implementation of quicksort and bitonic sort. · be107229
  Tomáš Oberhuber authored Jul 19, 2021
  
  be107229