Switching to "ExecutionType" instead of "DeviceType"
This continues the split of Device into Execution and Allocator. The execution types in `TNL/Execution` are `Sequential`, `OpenMP` and `Cuda` (`Execution::OpenMP` replaces `Devices::Host`).
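To make the intended split concrete, here is a minimal sketch of the execution tag types; only the names come from the paragraph above, the empty-struct form and the comments are assumptions.

```cpp
// Sketch of the TNL/Execution tag types named above. Empty tag structs are
// enough for static dispatch; whether the real types carry anything more
// (streams, thread counts, ...) is left open here.
namespace TNL {
namespace Execution {

struct Sequential {};  // single-threaded execution on the host
struct OpenMP {};      // multi-threaded host execution (replaces Devices::Host)
struct Cuda {};        // execution on a CUDA device

} // namespace Execution
} // namespace TNL
```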
TODO:
- smart pointers: replace Device with Allocator (the methods `getData()` and `modifyData()` should be removed and replaced by `getHostData()` and `getImageData()`, each in const and non-const variants - see the sketch after this list)
- serialization: use a placeholder string (like `"any"`), because data from files should be loadable with any Executor or Allocator (see the sketch below)
- revise `BuildConfigTags` for problem-solvers
- compatibility of Executors with Allocators
- dynamic execution policy - to specify runtime parameters for a specific (parallel) algorithm
  - implementation: a hierarchy of class templates which take the static execution policy as a template parameter
  - specific classes for certain algorithms (like Reduction or PrefixSum), e.g. `DefaultExecutionParameters<CUDA>` → `ReductionExecutionParameters<CUDA>` → `PrefixSumExecutionParameters<CUDA>` (a sketch of this hierarchy follows the list)
  - most algorithms should use `DefaultExecutionParameters<DeviceType>`
- then:
  - extend tests:
    - ParallelFor: achieve full coverage with a small array size
    - finish the reduction and multireduction tests on host/GPU
    - prefix-sum: specify suitable maxGridSize, blockSize and elementsInBlock, and decrease VECTOR_TEST_SIZE
  - CUDA reduction: profiling + probably change "finish" to launch only 1 block of threads (see the kernel sketch below)
  - try a zero-copy buffer on the host instead of CudaReductionBuffer (see the buffer sketch below)
  - try using `ParallelFor` with a specific block size in LBM
  - custom kernel launch configuration for traversers
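For the smart-pointer item, a rough sketch of the intended accessor interface; the class shape, the `Allocator` parameter and the member variables are assumptions, only `getHostData()`/`getImageData()` come from the list above.

```cpp
// Hypothetical smart pointer parametrized by an allocator instead of a
// device; only the accessor interface from the TODO item is sketched.
template< typename Object, typename Allocator >
class SmartPointer
{
public:
   // access the host copy of the object
   const Object& getHostData() const { return *hostPointer; }
   Object& getHostData() { return *hostPointer; }

   // access the image of the object in the allocator's memory space
   const Object& getImageData() const { return *imagePointer; }
   Object& getImageData() { return *imagePointer; }

private:
   Object* hostPointer = nullptr;   // allocated on the host
   Object* imagePointer = nullptr;  // allocated via Allocator
};
```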
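For the serialization item, the placeholder idea could look like this; the header format and the function are invented for illustration, not taken from the actual TNL serialization code.

```cpp
#include <cstddef>
#include <ostream>
#include <string>

// Hypothetical header writer: the executor/allocator fields hold the
// placeholder "any" instead of a concrete type name, so a file written
// under one configuration can be loaded under any other.
inline void writeArrayHeader( std::ostream& file, const std::string& valueType, std::size_t size )
{
   file << "TNL array\n";
   file << "value type: " << valueType << "\n";
   file << "executor: any\n";   // placeholder, not Sequential/OpenMP/Cuda
   file << "allocator: any\n";  // placeholder, data is loadable anywhere
   file << "size: " << size << "\n";
}
```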
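The execution-parameters hierarchy could be sketched as below; the class names come from the TODO item, the member variables are assumptions (just the kind of runtime parameters a CUDA kernel launch needs).

```cpp
// Hypothetical hierarchy DefaultExecutionParameters -> ReductionExecutionParameters
// -> PrefixSumExecutionParameters, with the static execution policy as a
// template parameter.
template< typename ExecutionPolicy >
struct DefaultExecutionParameters
{
   int blockSize = 256;      // threads per block
   int maxGridSize = 65535;  // upper bound on the grid size
};

template< typename ExecutionPolicy >
struct ReductionExecutionParameters
: public DefaultExecutionParameters< ExecutionPolicy >
{
   int maxBlocks = 1024;  // limit on blocks participating in the reduction
};

template< typename ExecutionPolicy >
struct PrefixSumExecutionParameters
: public ReductionExecutionParameters< ExecutionPolicy >
{
   int elementsInBlock = 1024;  // elements processed per block in the scan
};
```

Algorithms like Reduction or PrefixSum would then accept their specific parameter type (with sensible defaults), while most algorithms keep taking `DefaultExecutionParameters<DeviceType>`.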
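For the "launch only 1 block" idea in the CUDA reduction item, a sketch of such a finishing kernel; the name and signature are invented, the technique (one block grid-strides over the per-block partials and reduces them in shared memory) is the standard one.

```cpp
// Hypothetical single-block finishing kernel: launched as
//   reductionFinish<<< 1, blockSize, blockSize * sizeof( Value ) >>>( ... )
// with blockSize a power of two, so no further kernel launches are needed.
template< typename Value >
__global__ void reductionFinish( const Value* partials, Value* result, int n )
{
   extern __shared__ char smemBytes[];
   Value* smem = reinterpret_cast< Value* >( smemBytes );

   // grid-stride over the partial results with the single block
   Value sum = 0;
   for( int i = threadIdx.x; i < n; i += blockDim.x )
      sum += partials[ i ];
   smem[ threadIdx.x ] = sum;
   __syncthreads();

   // tree reduction in shared memory
   for( int s = blockDim.x / 2; s > 0; s /= 2 ) {
      if( threadIdx.x < s )
         smem[ threadIdx.x ] += smem[ threadIdx.x + s ];
      __syncthreads();
   }

   if( threadIdx.x == 0 )
      *result = smem[ 0 ];
}
```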
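And for the zero-copy buffer item, a minimal sketch; the wrapper class is hypothetical, the CUDA runtime calls (`cudaHostAlloc`, `cudaHostGetDevicePointer`, `cudaFreeHost`) are standard. Error checking is omitted for brevity.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical zero-copy buffer: page-locked host memory mapped into the
// device address space, so a reduction kernel can write its result directly
// to host-visible memory and the explicit device->host copy done by
// CudaReductionBuffer disappears. (On pre-UVA devices,
// cudaSetDeviceFlags( cudaDeviceMapHost ) must be called first.)
template< typename Value >
struct ZeroCopyBuffer
{
   Value* hostPtr = nullptr;
   Value* devicePtr = nullptr;

   explicit ZeroCopyBuffer( std::size_t size )
   {
      // allocate page-locked host memory mapped into the device address space
      cudaHostAlloc( (void**) &hostPtr, size * sizeof( Value ), cudaHostAllocMapped );
      // get the device-side pointer aliasing the same memory
      cudaHostGetDevicePointer( (void**) &devicePtr, hostPtr, 0 );
   }

   ~ZeroCopyBuffer()
   {
      cudaFreeHost( hostPtr );
   }
};
```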