- Oct 12, 2019
-
-
Jakub Klinkovský authored
This continues the split of Device into Execution and Allocator. The execution types in TNL/Execution are: Sequential, OpenMP and Cuda (Execution::OpenMP replaces Devices::Host).

TODO:
- smart pointers: replace Device with Allocator (the methods getData() and modifyData() should be removed; instead there should be getHostData() and getImageData(), each in const and non-const variants)
- serialization: use a placeholder string (like "any"), because data from files should be loadable with any Executor or Allocator
- revise BuildConfigTags for problem-solvers
- compatibility of Executors with Allocators
- dynamic execution policy to specify runtime parameters for a specific (parallel) algorithm
  - implementation: a hierarchy of class templates which take the static execution policy as a template parameter, with specific classes for certain algorithms (like Reduction or PrefixSum), e.g. `DefaultExecutionParameters<CUDA>` → `ReductionExecutionParameters<CUDA>` → `PrefixSumExecutionParameters<CUDA>`; most algorithms should use `DefaultExecutionParameters<DeviceType>`
- then:
  - extend tests:
    - ParallelFor: achieve full coverage with a small array size
    - finish reduction and multireduction on host/GPU
    - prefix-sum: specify suitable maxGridSize, blockSize, elementsInBlock and decrease VECTOR_TEST_SIZE
  - CUDA reduction: profiling + probably change "finish" to launch only 1 block of threads
  - try a zero-copy buffer on the host instead of CudaReductionBuffer
  - try using `ParallelFor` with a specific block size in LBM
  - custom kernel launch configuration for traversers
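The execution-parameters hierarchy proposed in the TODO above could look roughly like this (a minimal sketch; apart from the class names quoted in the commit message, all members, defaults and the `CUDA` tag type are hypothetical assumptions, not TNL's actual interface):

```cpp
#include <cstddef>

// hypothetical tag type standing in for the static execution policy
struct CUDA {};

// base class: default launch parameters shared by most algorithms
template< typename ExecutionPolicy >
struct DefaultExecutionParameters
{
   std::size_t blockSize = 256;     // hypothetical default
   std::size_t maxGridSize = 65535; // hypothetical default
};

// algorithm-specific refinements extend the default parameters
template< typename ExecutionPolicy >
struct ReductionExecutionParameters
   : public DefaultExecutionParameters< ExecutionPolicy >
{
   bool useSharedMemory = true;     // hypothetical reduction-specific knob
};

template< typename ExecutionPolicy >
struct PrefixSumExecutionParameters
   : public ReductionExecutionParameters< ExecutionPolicy >
{
   std::size_t elementsInBlock = 1024;  // parameter mentioned in the TODO
};
```

An algorithm would then take the matching parameters type as an (optional) argument, so callers can tune the launch configuration at runtime while the execution policy itself stays a compile-time choice.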
-
Jakub Klinkovský authored
They are not suitable for more than 2 devices/execution types; their design breaks the Open-Closed Principle. Instead, a type template "Self" was created, which allows changing any template parameter.
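A rebinding type template along these lines illustrates the idea (a hedged sketch; the `Array` class, its parameters and the `Host`/`Cuda` tags here are illustrative stand-ins, not TNL's actual signatures):

```cpp
#include <type_traits>

// hypothetical device tag types
struct Host {};
struct Cuda {};

template< typename Value, typename Device = Host >
struct Array
{
   // "Self" rebinds any subset of the template parameters;
   // each parameter defaults to its current value
   template< typename _Value = Value, typename _Device = Device >
   using Self = Array< _Value, _Device >;
};

// changing only the value type keeps the current device:
static_assert( std::is_same< Array< int, Cuda >::Self< double >,
                             Array< double, Cuda > >::value,
               "Self defaults unchanged parameters" );

// changing the device type explicitly:
static_assert( std::is_same< Array< int, Cuda >::Self< int, Host >,
                             Array< int, Host > >::value,
               "Self rebinds the device parameter" );
```

Because each parameter defaults to its current value, adding a third device or execution type requires no change to this mechanism, unlike pairwise conversion helpers that must enumerate every combination.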
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Serialization in TNL::File: File::save and File::load are specialized by Allocator instead of Device
-
Jakub Klinkovský authored
The usage of algorithms such as MemoryOperations or Reduction is not bound to a particular container. On the other hand, ArrayIO, ArrayAssignment, VectorAssignment and StaticArrayAssignment are just implementation details for the containers, so they were moved into TNL/Containers/detail/. Also moved ParallelFor, StaticFor, StaticVectorFor and TemplateStaticFor into TNL/Algorithms/.
-
Jakub Klinkovský authored
This will be necessary to avoid code bloat with more than 2 devices (execution types).
-
Jakub Klinkovský authored
- cudaMemcpy is slower than our ParallelFor kernel for CUDA
- use std::copy and std::equal instead of memcpy and memcmp, but only as sequential fallbacks
- use parallel algorithms for containsValue and containsOnlyValue (again with sequential fallbacks)
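The sequential fallbacks mentioned above can simply delegate to the standard algorithms (a sketch; the free-function signatures here are hypothetical, not TNL's actual MemoryOperations interface):

```cpp
#include <algorithm>
#include <cstddef>

// sequential fallback: true if `value` occurs anywhere in the range
template< typename Element >
bool containsValue( const Element* data, std::size_t size, const Element& value )
{
   return std::find( data, data + size, value ) != data + size;
}

// sequential fallback: true if every element equals `value`
template< typename Element >
bool containsOnlyValue( const Element* data, std::size_t size, const Element& value )
{
   return std::all_of( data, data + size,
                       [&]( const Element& e ) { return e == value; } );
}
```

A parallel device implementation would express the same predicates as reductions (logical-or and logical-and, respectively), with these functions as the host fallback.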
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
It has nothing to do with devices.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Also set the buffer size to 1 MiB, because a larger buffer slows down memory copies significantly (e.g. MeshTest would take about 10x longer). Addresses #26
-
- Oct 11, 2019
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Moved synchronization of smart pointers from Devices::Cuda into the TNL::Pointers namespace as free functions. synchronizeDevice() was renamed to synchronizeSmartPointersOnDevice() for clarity - there are many similarly named functions in CUDA (e.g. cudaDeviceSynchronize()).
-
Jakub Klinkovský authored
Moved (most of) the static methods from TNL::Devices::Cuda as free functions into the separate namespace TNL::Cuda. The class TNL::Devices::Cuda was too bloated, breaking the Single Responsibility Principle; it should be used only for template specializations and other things common to all devices. The functions in MemoryHelpers.h are deprecated - smart pointers should be used instead. The functions in LaunchHelpers.h are temporary; more refactoring is needed with respect to execution policies and custom launch parameters.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Fixes #46
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
The implementation for std::string (which is a base class of TNL::String) is perfectly sufficient.
-
Jakub Klinkovský authored
Fixes #11
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Sep 20, 2019
-
-
Jakub Klinkovský authored
-
- Sep 03, 2019
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
StaticArrayAssignment expects the arguments to be passed by reference.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Sep 02, 2019
-
-
Jakub Klinkovský authored
-
Tomáš Oberhuber authored
Tutorials

See merge request !41
-
Tomáš Oberhuber authored
-
Tomáš Oberhuber authored
Added the command-line parameter --tests-jobs to the build script to set the number of processes for unit tests.
-
Tomáš Oberhuber authored
-