
Allocators

Jakub Klinkovský requested to merge allocators into develop

Original idea: Device should be split into Allocator and Executor (or some better name), which represent the "memory space" and the "execution model", respectively

All TNL allocators must satisfy the requirements imposed by the Allocator concept from STL.
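For reference, a minimal sketch of what these requirements amount to, shown for a hypothetical pinned-memory allocator built on the CUDA runtime (the class name and style are illustrative, not the final TNL API):

```cpp
// Minimal sketch of an allocator satisfying the STL Allocator requirements.
// The class name is illustrative, not the final TNL API.
#include <cuda_runtime.h>
#include <cstddef>
#include <new>  // std::bad_alloc

template< typename T >
struct CudaHostAllocator
{
   using value_type = T;

   CudaHostAllocator() = default;

   // Allocators for different value types must be interconvertible.
   template< typename U >
   CudaHostAllocator( const CudaHostAllocator< U >& ) noexcept {}

   T* allocate( std::size_t n )
   {
      void* ptr = nullptr;
      if( cudaMallocHost( &ptr, n * sizeof( T ) ) != cudaSuccess )
         throw std::bad_alloc();
      return static_cast< T* >( ptr );
   }

   void deallocate( T* ptr, std::size_t ) noexcept
   {
      cudaFreeHost( ptr );
   }
};

// Stateless allocators of the same kind always compare equal.
template< typename T, typename U >
bool operator==( const CudaHostAllocator< T >&, const CudaHostAllocator< U >& ) noexcept { return true; }
template< typename T, typename U >
bool operator!=( const CudaHostAllocator< T >&, const CudaHostAllocator< U >& ) noexcept { return false; }
```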

Eventually, TNL will have these core concepts:

  1. Allocator
    • handles only allocation/deallocation
    • multiple allocators can correspond to the same "memory space"
  2. Executor
    • primarily used for specializations of algorithms
  3. Algorithms
    • basic (container-free) algorithms specialized by Executor
    • ParallelFor, Reduction, MultiReduction, ArrayOperations, ...
  4. Containers
    • classes for general data structures (TODO: alternatively use "Dense" and "Sparse", because a dense matrix can be an extended alias for a 2D array)
    • Array, Vector (also VectorOperations), NDArray
  5. Views
    • A View only wraps a raw pointer to the data and some metadata (such as the array size); it does not allocate or deallocate the data. Hence, a view has a fixed size which cannot be changed.
    • Its copy-constructor does a shallow copy.
    • Its copy-assignment operator does a deep copy.
    • It has all other methods present in the corresponding container (data structure). See the sketch after this list.
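The container/view relationship can be sketched like this. This is a host-only illustration with simplified names, not the final TNL interfaces: copy-construction binds a second view to the same data, while copy-assignment copies the elements.

```cpp
// Illustrative sketch of a view over externally owned data;
// names are simplified and do not reflect the final TNL interfaces.
#include <cassert>
#include <cstring>  // std::memcpy (host-only deep copy for illustration)

template< typename Value, typename Index = int >
struct ArrayView
{
   Value* data = nullptr;
   Index size = 0;

   ArrayView( Value* data, Index size ) : data( data ), size( size ) {}

   // Copy-constructor: shallow copy (binds to the same data).
   ArrayView( const ArrayView& ) = default;

   // Copy-assignment: deep copy (element-wise, sizes must match).
   ArrayView& operator=( const ArrayView& other )
   {
      assert( size == other.size );
      std::memcpy( data, other.data, size * sizeof( Value ) );
      return *this;
   }

   // Other methods mirror the container, e.g.:
   Value& operator[]( Index i ) { return data[ i ]; }
};
```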

As the first step, allocators will be added to TNL:

  • implement allocator classes
  • add tests for allocators and update ArrayOperationsTest
  • use allocators in Array
    • document the Allocator template type in Array
    • add constructors for Array with custom allocator (see the Array sketch after this list)
    • cross-device copies should probably be split from ArrayOperations because they depend on the Allocator and not on the Device/Executor (discarded because cross-device copies are done even in ArrayView, so it would need the Allocator parameter too)
  • add template parameter Allocator to Vector
  • add template parameter Allocator to DistributedArray, DistributedVector etc. (this can wait for the next round)
  • use allocators in smart pointers - replace template parameter Device with Allocator? (this can wait until it is actually needed)
  • add new benchmarks for host-to-device and device-to-host transfers
  • add the triad benchmark (copy to device, compute, copy to host) using different memory management strategies: Host + explicit copy, CudaHost + explicit copy, CudaHost + zero-copy, CudaManaged + Unified Memory (+ hints); see the CUDA sketch after this list
  • document the core concepts described above
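A possible shape of the Array changes from the checklist above, reduced to the Value and Allocator parameters (the constructor signatures and the default allocator are assumptions, not the merged API):

```cpp
// Hypothetical sketch of an Array parametrized by an Allocator, with
// constructors taking a custom allocator instance. The parameters and
// defaults are assumptions, not the merged TNL API.
#include <cstddef>
#include <memory>  // std::allocator, std::allocator_traits

template< typename Value,
          typename Allocator = std::allocator< Value > >
class Array
{
public:
   explicit Array( const Allocator& allocator = Allocator() )
   : allocator( allocator )
   {}

   explicit Array( std::size_t size, const Allocator& allocator = Allocator() )
   : allocator( allocator )
   {
      resize( size );
   }

   Array( const Array& ) = delete;             // copy semantics omitted in this sketch
   Array& operator=( const Array& ) = delete;

   ~Array()
   {
      if( data )
         std::allocator_traits< Allocator >::deallocate( allocator, data, size );
   }

   // Raw allocation only; element construction is omitted in this sketch.
   void resize( std::size_t newSize )
   {
      if( data )
         std::allocator_traits< Allocator >::deallocate( allocator, data, size );
      size = newSize;
      data = std::allocator_traits< Allocator >::allocate( allocator, size );
   }

private:
   Allocator allocator;
   Value* data = nullptr;
   std::size_t size = 0;
};

// Usage: the same element type allocated in ordinary vs. pinned host memory
// (CudaHostAllocator is the illustrative allocator sketched earlier).
// Array< double > a( 1000 );
// Array< double, CudaHostAllocator< double > > b( 1000 );
```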
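For the triad benchmark item, a sketch of the explicit-copy variant using the CUDA runtime; the other strategies differ only in how the buffers are obtained (cudaMallocHost for pinned memory or zero-copy via cudaHostGetDevicePointer, cudaMallocManaged plus cudaMemPrefetchAsync hints for Unified Memory):

```cpp
// Sketch of the triad benchmark (copy to device, compute, copy to host)
// with explicit transfers. Kernel and function names are illustrative.
#include <cuda_runtime.h>
#include <vector>

__global__ void triad( double* a, const double* b, const double* c, double scalar, int n )
{
   int i = blockIdx.x * blockDim.x + threadIdx.x;
   if( i < n )
      a[ i ] = b[ i ] + scalar * c[ i ];
}

void triadExplicitCopy( int n, double scalar )
{
   std::vector< double > a( n ), b( n, 1.0 ), c( n, 2.0 );

   double *da, *db, *dc;
   cudaMalloc( &da, n * sizeof( double ) );
   cudaMalloc( &db, n * sizeof( double ) );
   cudaMalloc( &dc, n * sizeof( double ) );

   // host -> device
   cudaMemcpy( db, b.data(), n * sizeof( double ), cudaMemcpyHostToDevice );
   cudaMemcpy( dc, c.data(), n * sizeof( double ), cudaMemcpyHostToDevice );

   // compute
   triad<<< ( n + 255 ) / 256, 256 >>>( da, db, dc, scalar, n );

   // device -> host
   cudaMemcpy( a.data(), da, n * sizeof( double ), cudaMemcpyDeviceToHost );

   cudaFree( da );
   cudaFree( db );
   cudaFree( dc );
}
```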
Edited by Jakub Klinkovský
