Commit 2ae9e97e authored by Jakub Klinkovský

Switching to "ExecutionType" instead of "DeviceType"

This continues the split of Device into Execution and Allocator. The
execution types in TNL/Execution are Sequential, OpenMP and Cuda
(Execution::OpenMP replaces Devices::Host).
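
A minimal sketch of what dispatching on the new execution tags could look
like; only the tag names Execution::Sequential, Execution::OpenMP and
Execution::Cuda come from this commit, while the empty tag definitions and
the `VectorSum` algorithm are made up for illustration:

```cpp
// Sketch only: tag types select the backend of an algorithm at compile
// time; memory allocation is meant to be handled by a separate Allocator.
#include <iostream>

namespace Execution {
   struct Sequential {};   // hypothetical definitions; the real tags
   struct OpenMP {};       // live in TNL/Execution
   struct Cuda {};
}

// hypothetical algorithm specialized per execution type
template< typename ExecutionType >
struct VectorSum;

template<>
struct VectorSum< Execution::Sequential >
{
   static double run( const double* data, int size )
   {
      double sum = 0.0;
      for( int i = 0; i < size; i++ )
         sum += data[ i ];
      return sum;
   }
};

template<>
struct VectorSum< Execution::OpenMP >
{
   static double run( const double* data, int size )
   {
      double sum = 0.0;
      #pragma omp parallel for reduction(+:sum)
      for( int i = 0; i < size; i++ )
         sum += data[ i ];
      return sum;
   }
};

int main()
{
   const double data[] = { 1.0, 2.0, 3.0, 4.0 };
   std::cout << VectorSum< Execution::OpenMP >::run( data, 4 ) << "\n";
}
```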

TODO:

- smart pointers: replace Device with Allocator (the getData() and
  modifyData() methods should be removed; instead there should be
  getHostData() and getImageData(), each in both const and non-const
  variants); see the first sketch after this list
- serialization: use a placeholder string (like "any") because data
  from files should be loadable with any Executor or Allocator
- revise BuildConfigTags for problem-solvers
- compatibility of Executors with Allocators
- dynamic execution policy - to specify runtime parameters for a
  specific (parallel) algorithm
   - implementation:
      - a hierarchy of class templates which have the static execution
        policy as a template parameter
      - specific classes for certain algorithms (like Reduction or
        PrefixSum), e.g. `DefaultExecutionParameters<CUDA>` extended by
        `ReductionExecutionParameters<CUDA>` and
        `PrefixSumExecutionParameters<CUDA>`
      - most algorithms should use `DefaultExecutionParameters<ExecutionType>`
        (see the second sketch after this list)
   - then:
      - extend tests:
         - ParallelFor: achieve full coverage with a small array size
         - finishing reduction and multireduction on host/GPU
         - prefix-sum: specify suitable maxGridSize, blockSize and
           elementsInBlock, and decrease VECTOR_TEST_SIZE
      - cuda reduction: profiling + probably change "finish" to launch
        only 1 block of threads
         - try a zero-copy buffer on the host instead of CudaReductionBuffer
      - try using `ParallelFor` with a specific block size in LBM
      - custom kernel launch configuration for traversers
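
A rough sketch of the smart-pointer change from the first TODO item; the
accessor names getHostData() and getImageData() come from the item above,
while the class name, the members and the Allocator parameter are
placeholders:

```cpp
// Sketch only: a device smart pointer parametrized by Allocator instead
// of Device; getData()/modifyData() are dropped in favour of explicit
// host/image accessors, each in const and non-const variants.
template< typename Object, typename Allocator >
class DevicePointer
{
public:
   // host-side copy of the object
   const Object& getHostData() const;
   Object& getHostData();

   // image of the object in the memory space managed by Allocator
   // (e.g. CUDA global memory)
   const Object& getImageData() const;
   Object& getImageData();

private:
   Object* hostData = nullptr;
   Object* imageData = nullptr;   // allocated through Allocator
};
```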
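
And a sketch of the dynamic execution policy hierarchy from the last TODO
item; the names DefaultExecutionParameters, ReductionExecutionParameters
and PrefixSumExecutionParameters appear above, but the members and default
values are invented for illustration:

```cpp
// Sketch only: runtime parameters layered on top of a static execution
// policy; algorithm-specific classes refine the shared defaults.
namespace Execution { struct Cuda {}; }   // placeholder tag

template< typename ExecutionType >
struct DefaultExecutionParameters
{
   // generic runtime knobs used by most parallel algorithms
   int maxGridSize = 65535;
   int blockSize = 256;
};

template< typename ExecutionType >
struct ReductionExecutionParameters
   : public DefaultExecutionParameters< ExecutionType >
{
   // e.g. whether the "finish" stage launches only one block of threads
   bool singleBlockFinish = true;
};

template< typename ExecutionType >
struct PrefixSumExecutionParameters
   : public DefaultExecutionParameters< ExecutionType >
{
   int elementsInBlock = 1024;
};
```

An algorithm would then accept, say, a
`ReductionExecutionParameters<Execution::Cuda>` instance at runtime, while
most algorithms keep taking the plain
`DefaultExecutionParameters<ExecutionType>`.
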
parent a556f79e