Switching to "ExecutionType" instead of "DeviceType"
This continues the split of Device into Execution and Allocator. The execution types in `TNL/Execution` are `Sequential`, `OpenMP` and `Cuda` (`Execution::OpenMP` replaces `Devices::Host`).
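To make the intended split concrete, here is a minimal sketch of the execution tag types; only the names come from the paragraph above, the empty-struct form and the comments are assumptions.

```cpp
// Sketch of the TNL/Execution tag types named above. Empty tag structs are
// enough for static dispatch; whether the real types carry anything more
// (streams, thread counts, ...) is left open here.
namespace TNL {
namespace Execution {

struct Sequential {};  // single-threaded execution on the host
struct OpenMP {};      // multi-threaded host execution (replaces Devices::Host)
struct Cuda {};        // execution on a CUDA device

} // namespace Execution
} // namespace TNL
```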
TODO:
- smart pointers: replace Device with Allocator (the methods `getData()` and `modifyData()` should be removed and replaced by `getHostData()` and `getImageData()`, each in const and non-const variants - see the sketch after this list)
- serialization: use a placeholder string (like `"any"`), because data from files should be loadable with any Executor or Allocator (see the sketch below)
- revise `BuildConfigTags` for problem-solvers
- compatibility of Executors with Allocators
- dynamic execution policy - to specify runtime parameters for a specific (parallel) algorithm
  - implementation: a hierarchy of class templates which take the static execution policy as a template parameter
  - specific classes for certain algorithms (like Reduction or PrefixSum), e.g. `DefaultExecutionParameters<CUDA>` → `ReductionExecutionParameters<CUDA>` → `PrefixSumExecutionParameters<CUDA>` (a sketch of this hierarchy follows the list)
  - most algorithms should use `DefaultExecutionParameters<DeviceType>`
- then:
  - extend tests:
    - ParallelFor: achieve full coverage with a small array size
    - finish the reduction and multireduction tests on host/GPU
    - prefix-sum: specify suitable maxGridSize, blockSize and elementsInBlock, and decrease VECTOR_TEST_SIZE
  - CUDA reduction: profiling + probably change "finish" to launch only 1 block of threads (see the kernel sketch below)
  - try a zero-copy buffer on the host instead of CudaReductionBuffer (see the buffer sketch below)
  - try using `ParallelFor` with a specific block size in LBM
  - custom kernel launch configuration for traversers
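For the smart-pointer item, a rough sketch of the intended accessor interface; the class shape, the `Allocator` parameter and the member variables are assumptions, only `getHostData()`/`getImageData()` come from the list above.

```cpp
// Hypothetical smart pointer parametrized by an allocator instead of a
// device; only the accessor interface from the TODO item is sketched.
template< typename Object, typename Allocator >
class SmartPointer
{
public:
   // access the host copy of the object
   const Object& getHostData() const { return *hostPointer; }
   Object& getHostData() { return *hostPointer; }

   // access the image of the object in the allocator's memory space
   const Object& getImageData() const { return *imagePointer; }
   Object& getImageData() { return *imagePointer; }

private:
   Object* hostPointer = nullptr;   // allocated on the host
   Object* imagePointer = nullptr;  // allocated via Allocator
};
```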
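For the serialization item, the placeholder idea could look like this; the header format and the function are invented for illustration, not taken from the actual TNL serialization code.

```cpp
#include <cstddef>
#include <ostream>
#include <string>

// Hypothetical header writer: the executor/allocator fields hold the
// placeholder "any" instead of a concrete type name, so a file written
// under one configuration can be loaded under any other.
inline void writeArrayHeader( std::ostream& file, const std::string& valueType, std::size_t size )
{
   file << "TNL array\n";
   file << "value type: " << valueType << "\n";
   file << "executor: any\n";   // placeholder, not Sequential/OpenMP/Cuda
   file << "allocator: any\n";  // placeholder, data is loadable anywhere
   file << "size: " << size << "\n";
}
```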
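The execution-parameters hierarchy could be sketched as below; the class names come from the TODO item, the member variables are assumptions (just the kind of runtime parameters a CUDA kernel launch needs).

```cpp
// Hypothetical hierarchy DefaultExecutionParameters -> ReductionExecutionParameters
// -> PrefixSumExecutionParameters, with the static execution policy as a
// template parameter.
template< typename ExecutionPolicy >
struct DefaultExecutionParameters
{
   int blockSize = 256;      // threads per block
   int maxGridSize = 65535;  // upper bound on the grid size
};

template< typename ExecutionPolicy >
struct ReductionExecutionParameters
: public DefaultExecutionParameters< ExecutionPolicy >
{
   int maxBlocks = 1024;  // limit on blocks participating in the reduction
};

template< typename ExecutionPolicy >
struct PrefixSumExecutionParameters
: public ReductionExecutionParameters< ExecutionPolicy >
{
   int elementsInBlock = 1024;  // elements processed per block in the scan
};
```

Algorithms like Reduction or PrefixSum would then accept their specific parameter type (with sensible defaults), while most algorithms keep taking `DefaultExecutionParameters<DeviceType>`.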
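For the "launch only 1 block" idea in the CUDA reduction item, a sketch of such a finishing kernel; the name and signature are invented, the technique (one block grid-strides over the per-block partials and reduces them in shared memory) is the standard one.

```cpp
// Hypothetical single-block finishing kernel: launched as
//   reductionFinish<<< 1, blockSize, blockSize * sizeof( Value ) >>>( ... )
// with blockSize a power of two, so no further kernel launches are needed.
template< typename Value >
__global__ void reductionFinish( const Value* partials, Value* result, int n )
{
   extern __shared__ char smemBytes[];
   Value* smem = reinterpret_cast< Value* >( smemBytes );

   // grid-stride over the partial results with the single block
   Value sum = 0;
   for( int i = threadIdx.x; i < n; i += blockDim.x )
      sum += partials[ i ];
   smem[ threadIdx.x ] = sum;
   __syncthreads();

   // tree reduction in shared memory
   for( int s = blockDim.x / 2; s > 0; s /= 2 ) {
      if( threadIdx.x < s )
         smem[ threadIdx.x ] += smem[ threadIdx.x + s ];
      __syncthreads();
   }

   if( threadIdx.x == 0 )
      *result = smem[ 0 ];
}
```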
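And for the zero-copy buffer item, a minimal sketch; the wrapper class is hypothetical, the CUDA runtime calls (`cudaHostAlloc`, `cudaHostGetDevicePointer`, `cudaFreeHost`) are standard. Error checking is omitted for brevity.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical zero-copy buffer: page-locked host memory mapped into the
// device address space, so a reduction kernel can write its result directly
// to host-visible memory and the explicit device->host copy done by
// CudaReductionBuffer disappears. (On pre-UVA devices,
// cudaSetDeviceFlags( cudaDeviceMapHost ) must be called first.)
template< typename Value >
struct ZeroCopyBuffer
{
   Value* hostPtr = nullptr;
   Value* devicePtr = nullptr;

   explicit ZeroCopyBuffer( std::size_t size )
   {
      // allocate page-locked host memory mapped into the device address space
      cudaHostAlloc( (void**) &hostPtr, size * sizeof( Value ), cudaHostAllocMapped );
      // get the device-side pointer aliasing the same memory
      cudaHostGetDevicePointer( (void**) &devicePtr, hostPtr, 0 );
   }

   ~ZeroCopyBuffer()
   {
      cudaFreeHost( hostPtr );
   }
};
```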