tnl-dev issues (https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues)

## [#81 Fix reorderEntities for DistributedMesh](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/81) (Jakub Klinkovský, updated 2020-07-29)

The current naïve implementation cannot work: `DistributedMeshSynchronizer` assumes that the global indices of local entities are sorted, so we should update the global indices too and exchange the new global indices for the ghost entities.

## [#80 Silence of unused variable warnings](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/80) (Tomáš Jakubec, updated 2020-07-12)

It would be nice to silence the warnings caused by unused variables. There are several ways to do it:
1. do not name the unused argument (not preferable),
2. use `[[maybe_unused]]` (since C++17) in the function definition,
3. cast the unused variable to `void`: `(void) var`. This approach is used in Qt.

For example, consider the following function:
```c++
static std::string name( CudaStatusType error_code )
{
#ifdef HAVE_CUDA
return cudaGetErrorName( error_code );
#else
(void) error_code;
throw CudaSupportMissing();
#endif
}
```
The approach in [Qt](https://doc.qt.io/qt-5/qtglobal.html#Q_UNUSED) discussed [here](https://stackoverflow.com/questions/19576884/does-q-unused-have-any-side-effects) is the following:
```c++
#define UNUSED(var) (void) var
```
Then, the warning can be silenced by `UNUSED(error_code);`.

## [#79 StaticComparison bug](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/79) (Tomáš Jakubec, updated 2020-07-11)

Testing the following code:
```c++
#include <iostream>
#include <GTMesh/Debug/Debug.h>
#include <TNL/Containers/StaticVector.h>
#include <TNL/Containers/Vector.h>

using namespace std;

template< int Dim, typename Real >
struct std::numeric_limits< TNL::Containers::StaticVector< Dim, Real > >
{
    static constexpr bool is_specialized = true;
    static TNL::Containers::StaticVector< Dim, Real > max() {
        TNL::Containers::StaticVector< Dim, Real > res;
        res = std::numeric_limits< Real >::max();
        return res;
    }
    static TNL::Containers::StaticVector< Dim, Real > lowest() {
        TNL::Containers::StaticVector< Dim, Real > res;
        res = std::numeric_limits< Real >::lowest();
        return res;
    }
};

using namespace TNL;

int main()
{
    TNL::Containers::Vector< TNL::Containers::StaticVector< 3, int >, TNL::Devices::Host, size_t > a;
    a.setSize( 2 );
    a[ 0 ] = TNL::Containers::StaticVector< 3, int >{ 5, -3, 6 };
    a[ 1 ] = TNL::Containers::StaticVector< 3, int >{ 8, 1, -5 };
    TNL::Containers::StaticVector< 3, int > a1 = a[ 0 ];
    TNL::Containers::StaticVector< 3, int > a2 = a[ 1 ];
    DBGVAR( a );                            // [[ a ]] ==> [ [ 5, -3, 6 ], [ 8, 1, -5 ] ]
    DBGVAR( (a1 < a2), (a2 < a1) );         // [[ (a1 < a2) ]] ==> false
                                            // [[ (a2 < a1) ]] ==> false
    DBGVAR( min( a ) );                     // [[ min(a) ]] ==> [ 5, -3, -5 ]
    DBGVAR( min( a1, a2 ), min( a2, a1 ) ); // [[ min(a1,a2) ]] ==> [ 5, -3, 6 ]
                                            // [[ min(a2,a1) ]] ==> [ 8, 1, -5 ]
    DBGVAR( TNL::min( a1, a2 ) );           // [[ TNL::min(a1, a2) ]] ==> [ 5, -3, -5 ]
    return 0;
}
```
The first problem is the comparison of `a1` and `a2`. Since the two vectors are distinct, if `(a1 < a2)` is false then `(a2 < a1)` must be true under any strict total order. However, both comparisons yield false, which is incorrect. The comparison utilizes `StaticCompare::LT`, defined as:
```c++
__cuda_callable__
static bool LT( const T1& a, const T2& b )
{
   TNL_ASSERT_EQ( a.getSize(), b.getSize(), "Sizes of expressions to be compared do not fit." );
   for( int i = 0; i < a.getSize(); i++ )
      if( ! ( a[ i ] < b[ i ] ) )
         return false;
   return true;
}
```
This function does not implement a suitable (e.g. lexicographical) comparison: it returns `true` only if *every* component of `a` is less than the corresponding component of `b`, so two distinct vectors can both compare as "not less than" each other.
Secondly, there is a difference between the calls of `min`. Both `min(a)` and `TNL::min(a1, a2)` utilize `StaticBinaryExpressionTemplate< ET1, ET2, Min >`, which returns a vector with the minimum in each element separately (which is awesome). However, in `min(a1, a2)` the `min` from the STL is called (`std::min` is prioritized over `TNL::Containers::Expressions::min`) and it employs `StaticCompare::LT` through `operator<`. This problem is solved by removing `using namespace std` (which is partially my mistake, but worth mentioning). The incorrect implementation of `StaticCompare::LT` makes the result depend on the order of the arguments of `min`.
The macro `DBGVAR` is defined in the [GTMesh library](https://mmg-gitlab.fjfi.cvut.cz/gitlab/jakubec/GTMesh).

## [#76 Grids todo list](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/76) (Jakub Klinkovský, updated 2020-06-24)

Continuing #52...
- [ ] use `.vti` files for the storage of grids, drop `TNLReader`
- [ ] `getEntityIndex()` should be removed from grid - users should call `entity.getIndex()`
- [ ] `getEntity()` and `getEntitiesCount()` should have `int Dimension` template parameter
- [ ] `isBoundaryEntity()` should be moved from `GridEntity` to `Grid` - it is not only an entity attribute, it is always bound to the particular mesh.
Generally, entities might be shared between multiple submeshes, so the method does not make sense in the general interface.
There might also be read-only views for partitions of the mesh (see vienna-grid).
- [ ] the `getMesh()` method should be removed from `GridEntity` - for the same reason as `isBoundaryEntity()`
(it is also an optimization because the size of the entity structure will be smaller)
- [ ] `getCenter()` and `getMeasure()` should be plain functions taking a `Mesh` and `MeshEntity` as parameters.
This is because general MeshEntity stores only indices of the subvertices, the points have to be accessed via Mesh class.
See also [Effective C++ Item 23: Prefer non-member non-friend functions to member functions](https://stackoverflow.com/questions/5989734/effective-c-item-23-prefer-non-member-non-friend-functions-to-member-functions) and the [Open-closed principle](https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle).

## [#75 Fix/improve the implementation of mesh entity orientations](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/75) (Jakub Klinkovský, updated 2020-06-24)

Currently it is untested and inefficient because it is based on storing the whole subvertex permutations for each entity. We need to better understand what information is needed for FVM and improve the implementation.

## [#74 Implement MeshView and maybe DistributedMeshView](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/74) (Jakub Klinkovský, updated 2020-06-24)

There should be support for sub-configurations, so that light views could be initialized by a full mesh, e.g. a view including only cells, vertices and links from cells to their subvertices. The same can be done with `Mesh`'s copy-constructor.

## [#72 Simplify the interface of loadDistributedMesh and distributeMesh](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/72) (Jakub Klinkovský, updated 2020-06-21)

The `loadDistributedMesh` and `distributeMesh` functions were written specifically for the distributed grid, but they don't match the interface of the general distributed mesh.
- only the distributed mesh should be passed to `loadDistributedMesh`, the local mesh can be obtained with `mesh.getLocalMesh()` (at least for the distributed mesh, distributed grid has an inverse relation with the local grid)
- `distributeMesh` should not be used at all with a general distributed mesh (decomposition is done by `tnl-decompose-mesh`, and the result is then loaded with `PVTUReader`). Try to merge `distributeMesh` for grids with the overload of `loadDistributedMesh` for grids.

## [#71 Overload the functions l2Norm and lpNorm for 1D StaticVector](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/71) (Jakub Klinkovský, updated 2020-06-21)

For 1D vectors, the `l2Norm` and `lpNorm` functions are equivalent to taking the absolute value of the 0-th component, so there should be overloads avoiding the expensive calls to `TNL::sqrt` and `TNL::pow`.
Then remove `getVectorLength` from [getEntityMeasure.h](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/blob/develop/src/TNL/Meshes/Geometry/getEntityMeasure.h#L53-68).

## [#70 Implement parsing of binary data in VTKReader](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/70) (Jakub Klinkovský, updated 2020-06-21)

The `VTKReader` (for the legacy VTK format) can read only ASCII data, but `VTKWriter` can write either ASCII or binary.

## [#69 Implement parsing of ASCII arrays in the XMLVTK class](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/69) (Jakub Klinkovský, updated 2020-06-21)

This is not finished in !65.

## [#68 Remove the template parameter `isBinary_` from SparseMatrixRowView](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/68) (Jakub Klinkovský, updated 2020-06-20)

The method `SparseMatrixRowView::isBinary()` can be implemented by checking whether `RealType` is `bool`, the same way as in `SparseMatrix`.

## [#63 Loading of the matrix circuit5M](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/63) (Lukáš Čejka, updated 2020-06-20)

The matrix circuit5M.mtx from the Florida Matrix Market takes days to load for some reason. Further investigation is needed.

## [#62 Recording errors into logs](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/62) (Lukáš Čejka, updated 2020-04-01)

Currently, any input into the logs consists of hardware information or benchmark results.
However, if a matrix fails to be allocated, there needs to be some way to record this in the logs.

## [#56 computeCompressedRowLengthsFromMtxFile does not take the symmetric format into account](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/56) (Matouš Fencl, updated 2020-06-20)

The function `computeCompressedRowLengthsFromMtxFile()` in matrixReader.h does not take the symmetric format into account. The computed `rowLengths` are too big for `ellpackSymmetric`. Example:
<pre>
/ 1 1 1 1 1 \
| 1 0 0 0 0 |
| 1 0 0 0 0 |
| 1 0 0 0 0 |
\ 1 0 0 0 0 /
</pre>
non-symmetric rowLengths = 5;
symmetric rowLengths = 1;
The function should compute only the under-diagonal rowLengths.
## [#55 Implement nD ParallelFor](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/55) (Tomáš Oberhuber, updated 2019-11-18)

Implement nD `ParallelFor` using `StaticArray< Dim, Index >` as:
```c++
template< typename Device, typename Mode >
struct ParallelFor
{
   template< int Dim,
             typename Index,
             typename Function,
             typename... FunctionArgs >
   static void exec( const StaticArray< Dim, Index >& start,
                     const StaticArray< Dim, Index >& end,
                     Function f,
                     FunctionArgs... args );
};
```
## [#53 Bug in analytic functions](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/53) (Matouš Fencl, updated 2019-11-07)

Analytic functions as an input MeshFunction in the Hamilton-Jacobi solver with MPI on GPU give bad values.
The function generated by *tnl-init* is passed into *tnl-direct-eikonal-solver*. TNL divides the input MeshFunction into blocks, whose number depends on the number of processes. The divided MeshFunction that we get in the file *tnlDirectEikonalProblem_impl.h* in the function *Solve()* has invalid values.
The starting script is attached:
[tnl-run-dir-eik-solver](/uploads/f3a60b9e323f9bbecedbf3f23f3f1f9e/tnl-run-dir-eik-solver)

## [#51 Wrappers for STL compatibility](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/51) (Tomáš Oberhuber, updated 2020-04-19)

Make wrappers for easier porting of STL code to TNL: for example, a wrapper for `TNL::Array` which has the same methods as an STL `vector`, and so on.

## [#50 MpiCommunicator::selectGPU does not work reliably when multiple nodes are selected](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/50) (Jakub Klinkovský, updated 2019-10-12)

The computation of the "node rank" is problematic; maybe `MPI_Get_processor_name` does not work reliably. Sometimes multiple ranks are assigned the same GPU.

## [#45 Parallelize segmented prefix-sum with OpenMP](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/45) (Jakub Klinkovský, updated 2019-08-14)

The implementation of segmented prefix-sum for `Host` is only sequential.

## [#44 Extended NDArray operations](https://mmg-gitlab.fjfi.cvut.cz/gitlab/tnl/tnl-dev/-/issues/44) (Jakub Klinkovský, updated 2019-08-14)

After !18 the following things remain to be implemented:
## Features
- [ ] implement generic assignment operator
- [x] support any value, device and index types
- [ ] support any permutation
- [ ] support copies to and from non-contiguous memory (e.g. subarrays)
- [ ] add support for different allocators (c.f. `Array` implementation)
## Applications
- [ ] storage for `VectorField` in TNL
- [ ] subarrays: writing 1D and 2D slices into VTK
## Operations
- [ ] `reduce_along_axis`, `reduce_along_axes` - generalized multireductions - see also https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?at=default&fileviewer=file-view-default#markdown-header-reduction-operations
- [ ] `apply_along_axis` - apply a function to all 1D slices along given axis (challenge: parallelization of outer or inner function)
- Note that unlike [numpy.apply_along_axis](https://docs.scipy.org/doc/numpy/reference/generated/numpy.apply_along_axis.html), the inner function cannot change the array dimension/shape.
- Note that the similar NumPy function, [apply_over_axes](https://docs.scipy.org/doc/numpy/reference/generated/numpy.apply_over_axes.html), is not applicable to NDArray because the slices along different axes have different type so a single function cannot be applied to them. Also, even in NumPy it is interesting only with the change of dimension/shape.
- [ ] reordering of ND arrays along any axis (currently hardcoded in `tnl-mhfem` only for one specific layout of dofs)
- [ ] other [NumPy array manipulation routines](https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html) - logical re-shaping and transpose-like operations (i.e. return a view with different sizes or permutation of axes without changing the data)
- [ ] [Eigen geometrical operations on tensors](https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?at=default&fileviewer=file-view-default#markdown-header-geometrical-operations)
## Benchmarks
- [ ] compilation time depending on the number of dimensions, number of ndarrays in code, ...
- [ ] overhead of the indexing calculation for high-dimensional array
- [ ] operations
- [ ] comparison with [RAJA](https://github.com/LLNL/RAJA)
  - identity perm, set: 1D bandwidth 9.4718 GB/s, 6D bandwidth 8.52481 GB/s (9% difference)
  - identity perm, assign: 1D bandwidth 11.503 GB/s, 6D bandwidth 11.0063 GB/s (4.5% loss compared to 1D)
  - reverse perm, assign: 6D bandwidth 9.58735 GB/s (13% loss compared to identity 6D)
- [ ] comparison with OpenFOAM - `ScalarField`, `VectorField`, `TensorField`, operations like tensor*vector (locally on mesh cells)