- Nov 26, 2019
-
-
Lukas Cejka authored
-
Lukas Cejka authored
-
Lukas Cejka authored
-
Lukas Cejka authored
-
Lukas Cejka authored
-
Lukas Cejka authored
Copied the tnl-benchmark-spmv files and spmv.h from BLAS to SpMV. Deleted min/max size and stepFactor. Not working yet; committed for backup purposes.
-
Lukas Cejka authored
-
Lukas Cejka authored
-
Lukas Cejka authored
-
- Nov 10, 2019
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Nov 08, 2019
-
-
Jakub Klinkovský authored
Refactoring for execution policies. Closes #49, #46, and #11. See merge request !42.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Closes #49
-
Jakub Klinkovský authored
They are not suitable for more than 2 devices/execution types, and their design breaks the Open-Closed Principle. Instead, a type template "Self" was created, which makes it possible to change any template parameter.
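As a minimal sketch of the "Self" idea described above: a container can expose a nested alias template whose defaults are its own parameters, so any one of them can be rebound without adding a dedicated alias per device. The class `Array` and the tags `Host` and `Cuda` below are hypothetical stand-ins, not the actual TNL declarations, and this log does not name the aliases that were replaced.

```cpp
#include <type_traits>

// Hypothetical device tags, used only for illustration.
struct Host {};
struct Cuda {};

// Hypothetical container with a "Self" alias template (an assumption,
// not the actual TNL class).
template <typename Value, typename Device = Host, typename Index = int>
class Array
{
public:
    // Rebinds any subset of the template parameters. Each default is the
    // current parameter, so omitted arguments stay unchanged.
    template <typename NewValue = Value,
              typename NewDevice = Device,
              typename NewIndex = Index>
    using Self = Array<NewValue, NewDevice, NewIndex>;
};

using HostArray = Array<double, Host>;
// Same value and index types, different device.
using CudaArray = HostArray::Self<double, Cuda>;

static_assert(std::is_same<CudaArray, Array<double, Cuda, int>>::value,
              "Self rebinds only the requested parameters");
```

Adding a third device tag needs no change to `Array` in this sketch, which is the Open-Closed property the commit message refers to.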
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
- Oct 25, 2019
-
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Serialization in TNL::File: File::save and File::load are specialized by Allocator instead of Device
-
Jakub Klinkovský authored
The usage of algorithms such as MemoryOperations or Reduction is not bound to a particular container. On the other hand, ArrayIO, ArrayAssignment, VectorAssignment and StaticArrayAssignment are just implementation details for the containers, so they were moved into TNL/Containers/detail/. Also moved ParallelFor, StaticFor, StaticVectorFor and TemplateStaticFor into TNL/Algorithms/.
-
Jakub Klinkovský authored
This will be necessary to avoid code bloat with more than 2 devices (execution types).
-
Jakub Klinkovský authored
  - cudaMemcpy is slower than our ParallelFor kernel for CUDA
  - use std::copy and std::equal instead of memcpy and memcmp, but only as sequential fallbacks
  - use parallel algorithms for containsValue and containsOnlyValue (again with sequential fallbacks)
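The sequential fallbacks mentioned in this commit could look roughly like the host-only sketch below. The function names and signatures are hypothetical, not the actual TNL ones. One visible benefit of std::copy and std::equal over memcpy and memcmp is that they work element-wise, so source and destination may have different (convertible) element types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sequential copy; unlike memcpy, it converts element by element.
template <typename Dst, typename Src>
void copySequential(Dst* destination, const Src* source, std::size_t size)
{
    assert(size == 0 || (destination != nullptr && source != nullptr));
    std::copy(source, source + size, destination);
}

// Sequential comparison; unlike memcmp, it compares values with operator==.
template <typename T1, typename T2>
bool compareSequential(const T1* a, const T2* b, std::size_t size)
{
    return std::equal(a, a + size, b);
}

// True if at least one element equals `value`.
template <typename T>
bool containsValueSequential(const T* data, std::size_t size, const T& value)
{
    return std::find(data, data + size, value) != data + size;
}

// True if every element equals `value`. Note: this sketch returns true for
// an empty range, which may differ from the library's convention.
template <typename T>
bool containsOnlyValueSequential(const T* data, std::size_t size, const T& value)
{
    return std::all_of(data, data + size,
                       [&value](const T& x) { return x == value; });
}
```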
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
It has nothing to do with devices.
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Also set the buffer size to 1 MiB, because a larger buffer slows down memory copies significantly (e.g. MeshTest would take about 10x longer). Addresses #26.
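For illustration only, a copy routed through a fixed-size staging buffer might be structured as below. This log does not say which buffer the commit refers to, so the function `copyThroughBuffer`, the constant `stagingBufferSize`, and the use of plain host memcpy for both transfer directions are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// 1 MiB, matching the size named in the commit message.
constexpr std::size_t stagingBufferSize = std::size_t(1) << 20;

// Copies `bytes` bytes from src to dst in chunks that fit into a staging
// buffer, and returns the number of chunks transferred. In a real
// host/device setting the two memcpy calls would be replaced by the
// appropriate transfer functions (assumption).
std::size_t copyThroughBuffer(unsigned char* dst,
                              const unsigned char* src,
                              std::size_t bytes,
                              std::size_t bufferSize = stagingBufferSize)
{
    assert(bufferSize > 0);
    // Never allocate more than is actually needed.
    std::vector<unsigned char> buffer(std::min(bytes, bufferSize));
    std::size_t chunks = 0;
    for (std::size_t offset = 0; offset < bytes; offset += buffer.size()) {
        const std::size_t n = std::min(buffer.size(), bytes - offset);
        std::memcpy(buffer.data(), src + offset, n);  // source -> staging
        std::memcpy(dst + offset, buffer.data(), n);  // staging -> destination
        ++chunks;
    }
    return chunks;
}
```

Keeping the buffer size as a parameter makes it easy to measure how the chunk size affects copy time on a given machine.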
-
Jakub Klinkovský authored
-
Jakub Klinkovský authored
Moved synchronization of smart pointers from Devices::Cuda into the TNL::Pointers namespace as free functions. synchronizeDevice() was renamed to synchronizeSmartPointersOnDevice() for clarity, because there are many similarly named functions in CUDA (e.g. cudaDeviceSynchronize()).
-