
NDArray

Jakub Klinkovský requested to merge ndarray into develop

Features

  • write tests
  • implement SlicedNDArray, generalize the slicing calculations from multimaps
  • implement static NDArray (avoid dynamic allocations when all dimensions are static)
  • implement low-dimensional accessors into the subspaces of the NDArray (see the subarray sketch after this list)
    • template arguments specify the dimensions contained in the subarray
    • method arguments specify the multiindex of the first subarray element (it cannot be in the interior of the array)
    • dimension contraction with ranges like [begin, end) is not supported
    • access-intent attributes? (see the Kokkos presentation; useful e.g. for texture-cache loads of const data, but probably useless on Pascal and newer GPUs)
  • move into TNL (C++14 is required for better constexpr functions and generic lambdas; nvcc bug: extended lambdas cannot be generic)
  • distributed version with MPI
    • only dynamic sizes can be distributed
    • only 1D/2D/3D distribution (i.e. only up to 3 dimensions can be distributed)
    • support overlaps (ghost cells) and their synchronization (see the halo-exchange sketch after this list)
    • support periodic overlaps
    • support different overlap size in different directions
    • a distributed NDArray cannot return a view to the local data - LocalArrayView does not make sense due to overlaps
    • iterators: forAll, forInternal, forBoundary, forLocalInternal, forLocalBoundary, forOverlaps (either all or just some directions)
    • support asynchronous synchronization of overlaps (custom CUDA streams or per-thread default streams will be necessary)
  • implement generic assignment operator (see the assignment sketch after this list)
    • support any value, device and index types
    • support any permutation
    • support copies to and from non-contiguous memory (e.g. subarrays)
  • add support for different allocators (cf. the Array implementation)
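
The subarray accessors reduce to a fixed-origin view combined with the usual row-major index arithmetic. The following is a minimal standalone sketch of that idea, hard-coded to a 2D view into a 3D array; NDArray3D and Subarray2D are illustrative names, not the proposed interface.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Standalone sketch: a row-major 3D array and a 2D view into its (j, k)
    // subspace at a fixed i. In the real accessor, template arguments would
    // select the dimensions kept in the subarray and method arguments would
    // give the multiindex of the first subarray element.
    struct NDArray3D
    {
        std::size_t ni, nj, nk;
        std::vector< double > data;

        NDArray3D( std::size_t ni, std::size_t nj, std::size_t nk )
        : ni( ni ), nj( nj ), nk( nk ), data( ni * nj * nk ) {}

        double& operator()( std::size_t i, std::size_t j, std::size_t k )
        {
            return data[ ( i * nj + j ) * nk + k ];
        }
    };

    struct Subarray2D
    {
        NDArray3D& array;
        std::size_t i;  // multiindex of the first subarray element: (i, 0, 0)

        double& operator()( std::size_t j, std::size_t k )
        {
            return array( i, j, k );
        }
    };

    int main()
    {
        NDArray3D a( 2, 3, 4 );
        Subarray2D s{ a, 1 };  // the slice a(1, :, :)
        s( 2, 3 ) = 42;
        std::cout << a( 1, 2, 3 ) << std::endl;  // prints 42
    }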
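
For the overlap synchronization, the core operation is a halo exchange between neighbouring ranks. Below is a minimal standalone sketch for a 1D distribution with one ghost cell on each side; the periodic case falls out of the modular rank arithmetic. This is plain MPI for illustration, not the distributed NDArray interface.

    #include <mpi.h>
    #include <vector>

    int main( int argc, char* argv[] )
    {
        MPI_Init( &argc, &argv );
        int rank, nproc;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        MPI_Comm_size( MPI_COMM_WORLD, &nproc );

        // layout: [ left ghost | localSize owned cells | right ghost ]
        const int localSize = 4;
        std::vector< double > data( localSize + 2, double( rank ) );

        // periodic neighbours
        const int left = ( rank + nproc - 1 ) % nproc;
        const int right = ( rank + 1 ) % nproc;

        // send the first owned cell left, receive the right ghost from the right
        MPI_Sendrecv( &data[ 1 ], 1, MPI_DOUBLE, left, 0,
                      &data[ localSize + 1 ], 1, MPI_DOUBLE, right, 0,
                      MPI_COMM_WORLD, MPI_STATUS_IGNORE );
        // send the last owned cell right, receive the left ghost from the left
        MPI_Sendrecv( &data[ localSize ], 1, MPI_DOUBLE, right, 1,
                      &data[ 0 ], 1, MPI_DOUBLE, left, 1,
                      MPI_COMM_WORLD, MPI_STATUS_IGNORE );

        MPI_Finalize();
    }

The asynchronous variant would replace MPI_Sendrecv with MPI_Isend/MPI_Irecv so that computation on the interior can overlap the communication.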
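
The generic assignment operator cannot fall back on a flat memcpy: source and target may use different value types, permutations, or non-contiguous memory. A standalone sketch of the element-wise strategy, copying a row-major 2D buffer into a column-major one (host memory only, with the layout formulas written out by hand):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main()
    {
        const std::size_t ni = 2, nj = 3;
        std::vector< int > rowMajor( ni * nj ), colMajor( ni * nj );

        for( std::size_t i = 0; i < ni; i++ )
            for( std::size_t j = 0; j < nj; j++ )
                rowMajor[ i * nj + j ] = int( 10 * i + j );

        // generic assignment: iterate over multiindices and let each side
        // apply its own layout formula; this works for any permutation
        // and for non-contiguous targets such as subarrays
        for( std::size_t i = 0; i < ni; i++ )
            for( std::size_t j = 0; j < nj; j++ )
                colMajor[ j * ni + i ] = rowMajor[ i * nj + j ];

        std::cout << colMajor[ 1 * ni + 0 ] << std::endl;  // element (0, 1): prints 1
    }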

Applications

  • storage of NumDwarf coefficients in tnl-mhfem (that was the main driving force for the development)
  • storage for VectorField in TNL
  • storage of distribution functions and macroscopic quantities in LBM
  • subarrays: writing 1D and 2D slices into VTK
  • SlicedNDArray might improve data locality in CWYGMRES operations:
    • prefetching cannot cross page boundaries (see meltdown.pdf)
    • data placement on NUMA systems
  • replace StaticMatrix in tnl-mhfem with an alias to a static 2-dimensional NDArray (sketched below)
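
Assuming the static NDArray takes its sizes as compile-time template parameters, the StaticMatrix replacement could be a thin two-dimensional alias. StaticNDArray below is a hypothetical stand-in for illustration, not the actual interface:

    #include <cstddef>

    // C++14 constexpr product of a parameter pack (no fold expressions yet)
    constexpr std::size_t product() { return 1; }
    template< typename... Ts >
    constexpr std::size_t product( std::size_t head, Ts... tail )
    {
        return head * product( tail... );
    }

    // hypothetical static N-dimensional array: all sizes are compile-time
    // constants, so no dynamic allocation is needed
    template< typename Value, std::size_t... sizes >
    struct StaticNDArray
    {
        Value data[ product( sizes... ) ];
    };

    // the intended replacement in tnl-mhfem
    template< typename Value, std::size_t rows, std::size_t columns >
    using StaticMatrix = StaticNDArray< Value, rows, columns >;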

Operations

  • generic traversers for scalar functions (assign, add, subtract, multiply, ...); see the traverser sketch after this list
    • forAll method: calls a lambda function f as f(i1, i2, ..., iN), where the indices are taken from the [0, sizes) multiindex range
    • forInternal method: calls a lambda function f as f(i1, i2, ..., iN), where the indices are taken from the [begin, end) multiindex range (by default, begin == 1 and end == sizes - 1)
    • forBoundary method: calls a lambda function f as f(i1, i2, ..., iN), where the indices are taken from the [0, sizes) - [begin, end) multiindex set (by default, begin == 1 and end == sizes - 1); the parallel version works only in 1D, 2D, and 3D
  • reduce_along_axis, reduce_along_axes - generalized multireductions (see the multireduction sketch after this list); see also https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?at=default&fileviewer=file-view-default#markdown-header-reduction-operations
  • apply_along_axis - apply a function to all 1D slices along a given axis (challenge: parallelization of the outer or the inner function); see the sketch after this list
    • Note that unlike numpy.apply_along_axis, the inner function cannot change the array dimension/shape.
    • Note that the similar NumPy function, apply_over_axes, is not applicable to NDArray because slices along different axes have different types, so a single function cannot be applied to all of them. Moreover, even in NumPy it is interesting only when the dimension/shape changes.
  • reordering of ND arrays along any axis (currently hardcoded in tnl-mhfem only for one specific layout of dofs)
  • other NumPy array manipulation routines - logical reshaping and transpose-like operations (i.e. returning a view with different sizes or a permutation of axes without changing the data)
  • Eigen geometrical operations on tensors
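
To make the traverser semantics concrete, here is a minimal host-only sketch hard-coded to 2D; the real traversers would be variadic and dispatch to the configured device, so forAll and forInternal below are standalone stand-ins, not the proposed interface.

    #include <cstddef>
    #include <iostream>

    // calls f(i, j) for all multiindices in [0, sizes)
    template< typename Func >
    void forAll( std::size_t ni, std::size_t nj, Func f )
    {
        for( std::size_t i = 0; i < ni; i++ )
            for( std::size_t j = 0; j < nj; j++ )
                f( i, j );
    }

    // calls f(i, j) for all multiindices in [begin, end),
    // with the defaults begin == 1 and end == sizes - 1
    template< typename Func >
    void forInternal( std::size_t ni, std::size_t nj, Func f )
    {
        for( std::size_t i = 1; i < ni - 1; i++ )
            for( std::size_t j = 1; j < nj - 1; j++ )
                f( i, j );
    }

    int main()
    {
        int all = 0, internal = 0;
        forAll( 4, 5, [&]( std::size_t, std::size_t ) { all++; } );
        forInternal( 4, 5, [&]( std::size_t, std::size_t ) { internal++; } );
        std::cout << all << " " << internal << std::endl;  // prints "20 6"
    }

forBoundary would then visit exactly the set difference of the two ranges, which is why its parallel version is tied to the low-dimensional cases.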
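
A standalone sketch of the multireduction idea: reducing a row-major ni x nj array along axis 0 with + produces the vector of column sums. The name reduce_along_axis0 and its signature are illustrative only.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // reduce along axis 0: result[j] = sum over i of a(i, j)
    std::vector< double >
    reduce_along_axis0( const std::vector< double >& a, std::size_t ni, std::size_t nj )
    {
        std::vector< double > result( nj, 0.0 );
        for( std::size_t i = 0; i < ni; i++ )
            for( std::size_t j = 0; j < nj; j++ )
                result[ j ] += a[ i * nj + j ];
        return result;
    }

    int main()
    {
        // 2 x 3 array with rows (1, 2, 3) and (4, 5, 6); column sums are (5, 7, 9)
        std::vector< double > a = { 1, 2, 3, 4, 5, 6 };
        for( double x : reduce_along_axis0( a, 2, 3 ) )
            std::cout << x << " ";
        std::cout << std::endl;
    }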
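
And a standalone sketch of apply_along_axis over the contiguous 1D slices of axis 1; the inner function modifies each slice in place, so the shape cannot change, consistent with the note above. The name apply_along_axis1 and its signature are illustrative only.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // applies f to each 1D slice a(i, :) of a row-major ni x nj array;
    // the outer loop over slices is one natural place to parallelize
    template< typename Func >
    void apply_along_axis1( std::vector< double >& a,
                            std::size_t ni, std::size_t nj, Func f )
    {
        for( std::size_t i = 0; i < ni; i++ )
            f( a.data() + i * nj, nj );  // contiguous slice a(i, :)
    }

    int main()
    {
        std::vector< double > a = { 1, 2, 3, 4, 5, 6 };  // 2 x 3
        // normalize each row by its first element
        apply_along_axis1( a, 2, 3, []( double* slice, std::size_t n ) {
            // iterate backwards so that slice[0] is divided last
            for( std::size_t j = n; j-- > 0; )
                slice[ j ] /= slice[ 0 ];
        } );
        std::cout << a[ 3 ] << " " << a[ 4 ] << " " << a[ 5 ] << std::endl;  // 1 1.25 1.5
    }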

Benchmarks

  • compilation time depending on the number of dimensions, the number of NDArrays in the code, ...

  • overhead of the indexing calculation for high-dimensional arrays

  • operations

  • comparison with RAJA:

    identity perm, set:     1D bandwidth: 9.4718 GB/s,  6D bandwidth: 8.52481 GB/s  (9% loss compared to 1D)
    identity perm, assign:  1D bandwidth: 11.503 GB/s,  6D bandwidth: 11.0063 GB/s  (4.5% loss compared to 1D)
    reverse perm, assign:                               6D bandwidth: 9.58735 GB/s  (13% loss compared to identity 6D)
  • comparison with OpenFOAM - ScalarField, VectorField, TensorField, operations like tensor*vector (locally on mesh cells)
