Allocators
Original idea: Device should be split into Allocator and Executor (or some better name), which represent the "memory space" and "execution model", respectively.
All TNL allocators must satisfy the Allocator requirements of the C++ standard library (STL).
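As a rough illustration of what the standard Allocator requirements amount to in practice, here is a minimal host-memory sketch. The name HostAllocatorSketch and the helper sum_with_sketch_allocator are hypothetical and are not taken from the TNL sources; the sketch only shows the members a standard container needs (value_type, allocate, deallocate, a converting constructor, and equality comparison).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical name, for illustration only (not the actual TNL class).
template <typename T>
struct HostAllocatorSketch {
    using value_type = T;

    HostAllocatorSketch() = default;

    // Converting constructor: lets containers rebind the allocator
    // to their internal node or proxy types.
    template <typename U>
    HostAllocatorSketch(const HostAllocatorSketch<U>&) noexcept {}

    // Handles only allocation: returns uninitialized storage for n objects.
    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_array_new_length();
        void* p = std::malloc(n * sizeof(T));
        if (p == nullptr && n != 0)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    // Handles only deallocation: no destructors are run here.
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocators of this kind are always interchangeable.
template <typename T, typename U>
bool operator==(const HostAllocatorSketch<T>&, const HostAllocatorSketch<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const HostAllocatorSketch<T>&, const HostAllocatorSketch<U>&) noexcept {
    return false;
}

// Usage: because the requirements are met, the allocator plugs into std::vector.
inline int sum_with_sketch_allocator() {
    std::vector<int, HostAllocatorSketch<int>> v = {1, 2, 3};
    v.push_back(4);  // forces a reallocation through the allocator
    int sum = 0;
    for (int x : v) sum += x;
    return sum;
}
```

An allocator for a different memory space would presumably keep the same interface and only swap the calls inside allocate and deallocate.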
Eventually, TNL will have these core concepts:
- Allocator
  - handles only allocation/deallocation
  - multiple allocators can correspond to the same "memory space"
- Executor
  - primarily used for specializations of algorithms
- Algorithms
  - basic (container-free) algorithms specialized by Executor
  - ParallelFor, Reduction, MultiReduction, ArrayOperations, ...
- Containers
  - classes for general data structures (TODO: alternatively use "Dense" and "Sparse", because a dense matrix can be an extended alias for 2D array)
  - Array, Vector (also VectorOperations), NDArray
- Views
  - A view only wraps a raw pointer to data and some metadata (such as the array size); it does not allocate or deallocate the data. Hence, the view has a fixed size which cannot be changed.
  - It has a copy-constructor which does a shallow copy.
  - It has a copy-assignment operator which does a deep copy.
  - It has all other methods present in the relevant container (data structure).
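The copy semantics described for views can be sketched as follows. This is a host-only illustration under stated assumptions: the class name ArrayViewSketch and the method names getData and getSize are hypothetical here, the element copy uses std::copy instead of a device-aware routine, and the two views are assumed not to overlap partially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical host-only sketch of a view; not the actual TNL ArrayView.
template <typename T>
class ArrayViewSketch {
public:
    // Wraps an existing buffer; the view never allocates or frees it.
    ArrayViewSketch(T* data, std::size_t size) : data_(data), size_(size) {}

    // Copy-constructor: shallow copy, both views refer to the same data.
    ArrayViewSketch(const ArrayViewSketch&) = default;

    // Copy-assignment: deep copy of the elements into the buffer this view
    // already refers to. The size is fixed, so the sizes must match.
    ArrayViewSketch& operator=(const ArrayViewSketch& other) {
        assert(size_ == other.size_);
        if (data_ != other.data_)
            std::copy(other.data_, other.data_ + size_, data_);
        return *this;
    }

    T* getData() const { return data_; }
    std::size_t getSize() const { return size_; }
    T& operator[](std::size_t i) const { return data_[i]; }

private:
    T* data_;           // not owned
    std::size_t size_;  // fixed for the lifetime of the view
};
```

With two buffers a and b of equal size, constructing one view from another leaves both pointing at a, while assigning a view of a to a view of b copies the elements of a into b and leaves the pointers unchanged.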
As the first step, allocators will be added to TNL:
- implement allocator classes
  - add tests for allocators and update ArrayOperationsTest
- use allocators in Array
  - document the Allocator template type in Array
  - add constructors for Array with custom allocator
  - cross-device copies should probably be split from ArrayOperations because they depend on the Allocator and not on the Device/Executor (discarded because cross-device copies are done even in ArrayView, so it would need the Allocator parameter too)
- add template parameter Allocator to Vector
- add template parameter Allocator to DistributedArray, DistributedVector etc. (this can wait for the next round)
- use allocators in smart pointers - replace template parameter Device with Allocator? (this can wait until it is actually needed)
- add new benchmarks for host-to-device and device-to-host transfers
- add the triad benchmark - copy to device, compute, copy to host - use different memory management strategies (Host + explicit copy, CudaHost + explicit copy, CudaHost + zero-copy, CudaManaged + Unified Memory (+ hints))
- document the core concepts described above
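To make the "Allocator template type" and "constructors with custom allocator" items more concrete, here is one possible shape, offered only as a sketch. The names ArraySketch, CountingAllocator, AllocatorType, getSize and getAllocator are assumptions made for this example, not the actual TNL interface. The sketch is host-only, handles trivial element types only, and disables copying to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical stateful allocator, used only to show that the array really
// goes through the allocator instance passed to its constructor.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    int* allocations;  // counter owned by the caller

    explicit CountingAllocator(int* counter) noexcept : allocations(counter) {}

    template <typename U>
    CountingAllocator(const CountingAllocator<U>& other) noexcept
        : allocations(other.allocations) {}

    T* allocate(std::size_t n) {
        ++*allocations;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return a.allocations == b.allocations;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return a.allocations != b.allocations;
}

// Hypothetical array with an Allocator template parameter (not TNL's Array).
template <typename Value, typename Allocator = std::allocator<Value>>
class ArraySketch {
    static_assert(std::is_trivially_default_constructible<Value>::value &&
                  std::is_trivially_destructible<Value>::value,
                  "the sketch skips element construction and destruction");
    using Traits = std::allocator_traits<Allocator>;

public:
    using AllocatorType = Allocator;

    // Constructor with a custom allocator; defaults to Allocator().
    explicit ArraySketch(std::size_t size, const Allocator& allocator = Allocator())
        : allocator_(allocator),
          size_(size),
          data_(size > 0 ? Traits::allocate(allocator_, size) : nullptr) {}

    ArraySketch(const ArraySketch&) = delete;
    ArraySketch& operator=(const ArraySketch&) = delete;

    ~ArraySketch() {
        if (data_ != nullptr) Traits::deallocate(allocator_, data_, size_);
    }

    std::size_t getSize() const { return size_; }
    Allocator getAllocator() const { return allocator_; }
    Value& operator[](std::size_t i) { return data_[i]; }
    const Value& operator[](std::size_t i) const { return data_[i]; }

private:
    Allocator allocator_;  // declared first: used to initialize data_
    std::size_t size_;
    Value* data_;
};
```

Under these assumptions, ArraySketch<double, CountingAllocator<double>> a(4, CountingAllocator<double>(&count)) performs exactly one allocation through the supplied allocator, while ArraySketch<int> b(2) falls back to std::allocator.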
Edited by Jakub Klinkovský