Switching to "ExecutionType" instead of "DeviceType"
This continues the split of Device into Execution and Allocator. The execution types in TNL/Execution are: Sequential, OpenMP and Cuda (Execution::OpenMP replaces Devices::Host).

TODO:
- smart pointers: replace Device with Allocator (the methods getData() and modifyData() should be removed; instead there should be getHostData() and getImageData(), both in const and non-const variants)
- serialization: use a placeholder string (like "any") because data from files should be loadable with any Executor or Allocator
- revise BuildConfigTags for problem-solvers
- compatibility of Executors with Allocators
- dynamic execution policy
  - purpose: to specify runtime parameters for a specific (parallel) algorithm
  - implementation: some hierarchy of class templates which have the static execution policy as a template parameter
    - specific classes for certain algorithms (like Reduction or PrefixSum), e.g. `DefaultExecutionParameters<CUDA>` → `ReductionExecutionParameters<CUDA>` → `PrefixSumExecutionParameters<CUDA>`
    - most algorithms should use `DefaultExecutionParameters<DeviceType>`
  - then:
    - extend tests:
      - ParallelFor: achieve full coverage with small array size
      - finishing reduction and multireduction on host/GPU
      - prefix-sum: specify suitable maxGridSize, blockSize, elementsInBlock and decrease VECTOR_TEST_SIZE
    - cuda reduction: profiling + probably change "finish" to launch only 1 block of threads
    - try zero-copy buffer on the host instead of CudaReductionBuffer
    - try using `ParallelFor` with a specific block size in LBM
    - custom kernel launch configuration for traversers
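
The hierarchy proposed for the dynamic execution policy could look roughly like the sketch below. This is only an illustration of the idea, not code from TNL: the empty tag structs stand in for the types in TNL/Execution, the default values are made up, and `prefixSumBlocks` is a hypothetical helper that shows how an algorithm might consume the parameters (the TODO names `maxGridSize`, `blockSize` and `elementsInBlock`, but not where they live).

```cpp
#include <cstddef>

// Hypothetical stand-ins for the static execution policies in TNL/Execution.
struct Sequential {};
struct OpenMP {};
struct Cuda {};

// Base of the hierarchy: runtime parameters shared by all algorithms.
// In this sketch the generic version carries no tunable parameters.
template< typename Execution >
struct DefaultExecutionParameters {};

// Assumed specialization: a CUDA launch needs a block size and a grid limit.
template<>
struct DefaultExecutionParameters< Cuda >
{
   std::size_t blockSize = 256;      // illustrative default, not a TNL value
   std::size_t maxGridSize = 65535;  // illustrative default, not a TNL value
};

// Reduction adds nothing here; it only marks where reduction-specific
// parameters would go.
template< typename Execution >
struct ReductionExecutionParameters : DefaultExecutionParameters< Execution > {};

// Prefix-sum extends the reduction parameters with its own setting.
template< typename Execution >
struct PrefixSumExecutionParameters : ReductionExecutionParameters< Execution >
{
   std::size_t elementsInBlock = 1024;  // illustrative default, not a TNL value
};

// Hypothetical helper: number of blocks needed to cover `size` elements,
// clamped to the maximum grid size taken from the runtime parameters.
constexpr std::size_t
prefixSumBlocks( std::size_t size, const PrefixSumExecutionParameters< Cuda >& params )
{
   const std::size_t blocks = ( size + params.elementsInBlock - 1 ) / params.elementsInBlock;
   return blocks < params.maxGridSize ? blocks : params.maxGridSize;
}
```

With such a layout, a test could construct `PrefixSumExecutionParameters< Cuda >`, lower `elementsInBlock` and `maxGridSize`, and exercise the multi-block code paths with a small array, which is what the "extend tests" items above are after.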
parent
a556f79e