- Oct 12, 2019
-
-
Jakub Klinkovský authored
This continues the split of Device into Execution and Allocator. The execution types in TNL/Execution are: Sequential, OpenMP and Cuda (Execution::OpenMP replaces Devices::Host).

TODO:
- smart pointers: replace Device with Allocator (the methods getData() and modifyData() should be removed; instead there should be getHostData() and getImageData(), each in const and non-const variants)
- serialization: use a placeholder string (like "any"), because data from files should be loadable with any Executor or Allocator
- revise BuildConfigTags for problem-solvers
- compatibility of Executors with Allocators
- dynamic execution policy to specify runtime parameters for a specific (parallel) algorithm
  - implementation: a hierarchy of class templates which take the static execution policy as a template parameter, with specific classes for certain algorithms (like Reduction or PrefixSum), e.g. `DefaultExecutionParameters<CUDA>` → `ReductionExecutionParameters<CUDA>` → `PrefixSumExecutionParameters<CUDA>`; most algorithms should use `DefaultExecutionParameters<DeviceType>`
- then:
  - extend tests:
    - ParallelFor: achieve full coverage with a small array size
    - finish reduction and multireduction on host/GPU
    - prefix-sum: specify suitable maxGridSize, blockSize, elementsInBlock and decrease VECTOR_TEST_SIZE
  - CUDA reduction: profiling + probably change "finish" to launch only 1 block of threads
  - try a zero-copy buffer on the host instead of CudaReductionBuffer
  - try using `ParallelFor` with a specific block size in LBM
  - custom kernel launch configuration for traversers
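The execution-parameters hierarchy proposed in the TODO above could look roughly like this (a minimal sketch; apart from the class names quoted in the commit message, all members, defaults and the `CUDA` tag type are hypothetical assumptions, not TNL's actual interface):

```cpp
#include <cstddef>

// hypothetical tag type standing in for the static execution policy
struct CUDA {};

// base class: default launch parameters shared by most algorithms
template< typename ExecutionPolicy >
struct DefaultExecutionParameters
{
   std::size_t blockSize = 256;     // hypothetical default
   std::size_t maxGridSize = 65535; // hypothetical default
};

// algorithm-specific refinements extend the default parameters
template< typename ExecutionPolicy >
struct ReductionExecutionParameters
   : public DefaultExecutionParameters< ExecutionPolicy >
{
   bool useSharedMemory = true;     // hypothetical reduction-specific knob
};

template< typename ExecutionPolicy >
struct PrefixSumExecutionParameters
   : public ReductionExecutionParameters< ExecutionPolicy >
{
   std::size_t elementsInBlock = 1024;  // parameter mentioned in the TODO
};
```

An algorithm would then take the matching parameters type as an (optional) argument, so callers can tune the launch configuration at runtime while the execution policy itself stays a compile-time choice.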
-
Jakub Klinkovský authored
They are not suitable for more than 2 devices/execution types; their design breaks the Open-Closed Principle. Instead, a type template "Self" was created, which allows changing any template parameter.
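A rebinding type template along these lines illustrates the idea (a hedged sketch; the `Array` class, its parameters and the `Host`/`Cuda` tags here are illustrative stand-ins, not TNL's actual signatures):

```cpp
#include <type_traits>

// hypothetical device tag types
struct Host {};
struct Cuda {};

template< typename Value, typename Device = Host >
struct Array
{
   // "Self" rebinds any subset of the template parameters;
   // each parameter defaults to its current value
   template< typename _Value = Value, typename _Device = Device >
   using Self = Array< _Value, _Device >;
};

// changing only the value type keeps the current device:
static_assert( std::is_same< Array< int, Cuda >::Self< double >,
                             Array< double, Cuda > >::value,
               "Self defaults unchanged parameters" );

// changing the device type explicitly:
static_assert( std::is_same< Array< int, Cuda >::Self< int, Host >,
                             Array< int, Host > >::value,
               "Self rebinds the device parameter" );
```

Because each parameter defaults to its current value, adding a third device or execution type requires no change to this mechanism, unlike pairwise conversion helpers that must enumerate every combination.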
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Serialization in TNL::File: File::save and File::load are specialized by Allocator instead of Device
-
Jakub Klinkovský authored
The usage of algorithms such as MemoryOperations or Reduction is not bound to a particular container. On the other hand, ArrayIO, ArrayAssignment, VectorAssignment and StaticArrayAssignment are just implementation details for the containers, so they were moved into TNL/Containers/detail/. Also moved ParallelFor, StaticFor, StaticVectorFor and TemplateStaticFor into TNL/Algorithms/.
-
Jakub Klinkovský authored
This will be necessary to avoid code bloat with more than 2 devices (execution types).
-
Jakub Klinkovský authored
- cudaMemcpy is slower than our ParallelFor kernel for CUDA
- use std::copy and std::equal instead of memcpy and memcmp, but only as sequential fallbacks
- use parallel algorithms for containsValue and containsOnlyValue (again with sequential fallbacks)
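The sequential fallbacks mentioned above can simply delegate to the standard algorithms (a sketch; the free-function signatures here are hypothetical, not TNL's actual MemoryOperations interface):

```cpp
#include <algorithm>
#include <cstddef>

// sequential fallback: true if `value` occurs anywhere in the range
template< typename Element >
bool containsValue( const Element* data, std::size_t size, const Element& value )
{
   return std::find( data, data + size, value ) != data + size;
}

// sequential fallback: true if every element equals `value`
template< typename Element >
bool containsOnlyValue( const Element* data, std::size_t size, const Element& value )
{
   return std::all_of( data, data + size,
                       [&]( const Element& e ) { return e == value; } );
}
```

A parallel device implementation would express the same predicates as reductions (logical-or and logical-and, respectively), with these functions as the host fallback.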
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
It has nothing to do with devices.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Also set the buffer size to 1 MiB, because a larger buffer slows down memory copies significantly (e.g. MeshTest would take about 10x longer). Addresses #26
-
- Oct 11, 2019
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Moved synchronization of smart pointers from Devices::Cuda into the TNL::Pointers namespace as free functions. synchronizeDevice() was renamed to synchronizeSmartPointersOnDevice() for clarity - there are many similarly named functions in CUDA (e.g. cudaDeviceSynchronize()).
-
Jakub Klinkovský authored
Moved (most of) the static methods from TNL::Devices::Cuda as free functions into the separate namespace TNL::Cuda. The class TNL::Devices::Cuda was too bloated, breaking the Single Responsibility Principle; it should be used only for template specializations and other things common to all devices. The functions in MemoryHelpers.h are deprecated - smart pointers should be used instead. The functions in LaunchHelpers.h are temporary; more refactoring is needed with respect to execution policies and custom launch parameters.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Fixes #46
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
The implementation for std::string (which is a base class of TNL::String) is perfectly sufficient.
-
Jakub Klinkovský authored
Fixes #11
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Sep 20, 2019
-
-
Jakub Klinkovský authored
-
- Sep 03, 2019
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
StaticArrayAssignment expects the arguments to be passed by reference.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Sep 02, 2019
-
-
Jakub Klinkovský authored
-
Tomáš Oberhuber authored
Tutorials

See merge request !41
-
Tomáš Oberhuber authored
-
Tomáš Oberhuber authored
Added the command-line parameter --tests-jobs to the build script to set the number of processes for unit tests.
-
Tomáš Oberhuber authored
-