Extended NDArray operations
After !18 (merged) the following things remain to be implemented:
implement generic assignment operator
- support any value, device and index types
- support any permutation
- support copies to and from non-contiguous memory (e.g. subarrays)
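The three requirements above can be illustrated with a minimal sketch of a stride-based copy. It assumes plain host memory and raw pointers. The names `make_strides` and `assign` are hypothetical helpers for illustration, not the TNL NDArray API, and a real implementation would also have to handle copies between devices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the TNL API): strides of a dense N-dimensional
// array whose axes are stored in the order given by `perm`, where perm[N-1]
// is the fastest-varying axis. The identity permutation gives row-major order.
template <std::size_t N>
std::array<std::size_t, N> make_strides(const std::array<std::size_t, N>& sizes,
                                        const std::array<std::size_t, N>& perm)
{
    std::array<std::size_t, N> strides{};
    std::size_t s = 1;
    for (std::size_t k = N; k-- > 0;) {
        assert(perm[k] < N);
        strides[perm[k]] = s;
        s *= sizes[perm[k]];
    }
    return strides;
}

// Hypothetical generic assignment: copy every element of `src` into `dst`.
// Both sides share the logical sizes, but may differ in value type and in
// strides (i.e. in permutation, or by being a subarray of a larger array).
template <typename DstValue, typename SrcValue, std::size_t N>
void assign(DstValue* dst, const std::array<std::size_t, N>& dst_strides,
            const SrcValue* src, const std::array<std::size_t, N>& src_strides,
            const std::array<std::size_t, N>& sizes)
{
    std::size_t total = 1;
    for (std::size_t n : sizes) total *= n;

    std::array<std::size_t, N> idx{};  // current multi-index, starts at zero
    for (std::size_t count = 0; count < total; ++count) {
        std::size_t d = 0, s = 0;
        for (std::size_t k = 0; k < N; ++k) {
            d += idx[k] * dst_strides[k];
            s += idx[k] * src_strides[k];
        }
        dst[d] = static_cast<DstValue>(src[s]);
        // Advance the multi-index, last axis fastest.
        for (std::size_t k = N; k-- > 0;) {
            if (++idx[k] < sizes[k]) break;
            idx[k] = 0;
        }
    }
}
```

Copying a subarray in this sketch amounts to offsetting the source pointer and passing the strides of the enclosing array together with the sizes of the subarray. The per-element index computation is deliberately naive; it is exactly the kind of overhead the benchmarks in this issue are meant to measure.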
add support for different allocators (c.f.
- subarrays: writing 1D and 2D slices into VTK
reduce_along_axes - generalized multireductions; see also https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?at=default&fileviewer=file-view-default#markdown-header-reduction-operations
apply_along_axis - apply a function to all 1D slices along a given axis (challenge: parallelization of the outer or inner function)
- Note that unlike numpy.apply_along_axis, the inner function cannot change the array dimension/shape.
- Note that the similar NumPy function, apply_over_axes, is not applicable to NDArray because the slices along different axes have different types, so a single function cannot be applied to them. Also, even in NumPy it is interesting only when the dimension/shape changes.
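A minimal sketch of the intended semantics, assuming a row-major 3D array stored in a `std::vector`. The names `Slice1D`, `apply_along_axis` and `cumulative_sum` are hypothetical here and do not describe an existing TNL interface. The inner function receives a strided 1D slice and modifies it in place, so it cannot change the shape.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view of one 1D slice (not a TNL type).
template <typename T>
struct Slice1D {
    T* data;
    std::size_t size;
    std::size_t stride;
    T& operator[](std::size_t i) const { return data[i * stride]; }
};

// Apply `f` in place to every 1D slice along `axis` of a row-major 3D array.
template <typename T, typename Function>
void apply_along_axis(std::vector<T>& a, const std::array<std::size_t, 3>& sizes,
                      std::size_t axis, Function f)
{
    assert(axis < 3);
    assert(a.size() == sizes[0] * sizes[1] * sizes[2]);
    const std::array<std::size_t, 3> strides{sizes[1] * sizes[2], sizes[2], 1};
    const std::size_t o1 = (axis + 1) % 3;  // the two remaining axes
    const std::size_t o2 = (axis + 2) % 3;
    // Outer loops: the slices are independent of each other, so this is the
    // place where the "outer" parallelization would go.
    for (std::size_t i = 0; i < sizes[o1]; ++i)
        for (std::size_t j = 0; j < sizes[o2]; ++j)
            f(Slice1D<T>{a.data() + i * strides[o1] + j * strides[o2],
                         sizes[axis], strides[axis]});
}

// Example inner function: in-place cumulative sum along the slice.
template <typename T>
void cumulative_sum(Slice1D<T> s)
{
    for (std::size_t i = 1; i < s.size; ++i)
        s[i] += s[i - 1];
}
```

For example, `apply_along_axis(a, sizes, 1, cumulative_sum<int>)` accumulates along the middle axis. Whether to parallelize over the slices or inside the inner function remains the open question noted above; the sketch only marks where the outer variant would go.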
reordering of ND arrays along any axis (currently hardcoded in tnl-mhfem only for one specific layout of dofs)
- other NumPy array manipulation routines - logical re-shaping and transpose-like operations (i.e. return a view with different sizes or permutation of axes without changing the data)
- Eigen geometrical operations on tensors
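The view-returning operations mentioned above (logical re-shaping and transpose-like operations) can be sketched as follows. This is only an illustration under simplifying assumptions: a 2D, non-owning view over host memory. `View2D` and its members are hypothetical names, not part of TNL.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 2D view (not a TNL type): a data pointer plus
// sizes and strides. No operation below copies or moves any element.
template <typename T>
struct View2D {
    T* data;
    std::array<std::size_t, 2> sizes;
    std::array<std::size_t, 2> strides;

    T& operator()(std::size_t i, std::size_t j) const
    {
        assert(i < sizes[0] && j < sizes[1]);
        return data[i * strides[0] + j * strides[1]];
    }

    // Transpose-like operation: swap sizes and strides, keep the pointer.
    View2D transposed() const
    {
        return {data, {sizes[1], sizes[0]}, {strides[1], strides[0]}};
    }

    // Logical re-shaping: same data, new sizes. In this sketch it is
    // restricted to contiguous row-major views.
    View2D reshaped(std::size_t rows, std::size_t cols) const
    {
        assert(strides[0] == sizes[1] && strides[1] == 1);
        assert(rows * cols == sizes[0] * sizes[1]);
        return {data, {rows, cols}, {cols, 1}};
    }
};
```

Because the views share the underlying buffer, a write through the transposed view is visible through the original one.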
compilation time depending on the number of dimensions, number of ndarrays in code, ...
overhead of the indexing calculation for high-dimensional array
comparison with RAJA
- identity perm, set: 1D bandwidth: 9.4718 GB/s, 6D bandwidth: 8.52481 GB/s (9% difference)
- identity perm, assign: 1D bandwidth: 11.503 GB/s, 6D bandwidth: 11.0063 GB/s (4.5% loss compared to 1D)
- reverse perm, assign: 6D bandwidth: 9.58735 GB/s (13% loss compared to identity 6D)
comparison with OpenFOAM - TensorField, operations like tensor*vector (locally on mesh cells)