3.[Flexible reduction in matrix rows](#flexible_reduction_in_matrix_rows)
4.[Matrix-vector product](#matrix_vector_product)
5.[Matrix I/O operations](#matrix_io_operations)
3.[Matrix view](#matrix_view)
4.[Flexible reduction in matrix rows](#flexible_reduction_in_matrix_rows)
5.[Matrix-vector product](#matrix_vector_product)
6.[Matrix I/O operations](#matrix_io_operations)
## Overview of matrix types <a name="overview_of_matrix_types"></a>
@@ -107,6 +108,38 @@ In this table:
## Allocation and setup of different matrix types <a name="allocation_and_setup_of_different_matrix_types"></a>
There are several ways to create a new matrix:
1.**Initializer lists** allow creating a matrix from C++ initializer lists. The matrix elements must therefore be encoded in the source code, so this approach is useful only for rather small matrices. Methods and constructors taking initializer lists are user friendly and simple to use, which makes them a good choice for toy problems with small matrices.
2.**STL map** can be used for the creation of sparse matrices only. The user first inserts all matrix elements together with their coordinates into a `std::map`, from which the sparse matrix is created in the next step. It is a simple and user-friendly approach suitable for the creation of large matrices. An advantage is that we do not need to know the distribution of the matrix elements in the matrix rows in advance, as we do with the other ways of matrix construction. This makes the STL map suitable for combining sparse matrices in TNL with other numerical packages. However, the sparse matrix is constructed on the host and then copied to the GPU if necessary. Therefore, this approach is not a good choice if fast and efficient matrix construction is required.
3.**Methods `setElement` and `addElement` called from the host** allow changing particular matrix elements. The methods can be called from the host even for matrices allocated on the GPU. In this case, however, the matrix elements are transferred to the GPU one by one, which is very inefficient. If the matrix is allocated on the host system (CPU), the efficiency is good. In the case of sparse matrices, one must set the row capacities (i.e. the maximal number of nonzero elements in each row) before using these methods. If a row capacity is exceeded, the matrix has to be reallocated and all matrix elements are lost.
4.**Methods `setElement` and `addElement` called from native device** allow efficient setup of matrix elements even on devices (GPUs). In this case, the methods must be called from a GPU kernel or from a lambda function combined with a parallel for (\ref TNL::Algorithms::ParallelFor). The user gets very good performance even when manipulating a matrix allocated on the GPU. On the other hand, only data structures allocated on the GPU can be used in the kernel or lambda function. The matrix can be accessed in the GPU kernel or lambda function by means of a [matrix view](#matrix_view) or a shared pointer (\ref TNL::Pointers::SharedPointer).
5.**Method `getRow` combined with `ParallelFor`** is very similar to the previous one. The difference is that we first fetch a helper object called *matrix row* which is linked to a particular matrix row. Using the methods of this object, one may change the matrix elements in the given matrix row. An advantage is that the access to the matrix row is resolved only once for all elements in the row. In some more sophisticated sparse matrix formats, this can be a nontrivial operation, so this approach may slightly improve the performance. Another advantage for sparse matrices is that we access the matrix elements by their *local index* in the row, which is something like the rank of the nonzero element in the row. This is more efficient than addressing the matrix elements by their column indexes, which requires searching in the matrix row, so it may significantly improve the performance of sparse matrix setup. For dense matrices, there should be no great difference in performance compared to the use of the methods `setElement` and `getElement`. Note that when the method is called from a GPU kernel or lambda function, only data structures allocated on the GPU can be accessed, and the matrix must be made accessible by means of a matrix view or a shared pointer, as in the previous approach.
6.**Method `forRows`** is very similar to the previous approach, but it avoids the use of `ParallelFor` and the necessity of passing the matrix to GPU kernels by matrix view or shared pointers.
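
To illustrate why the STL map approach needs no prior knowledge of how the elements are distributed among the rows, the following is a minimal host-only sketch using just the C++ standard library. The type `CsrMatrix` and the function `buildFromMap` are hypothetical names introduced for this illustration; they are not part of TNL and do not reflect how TNL implements the conversion internally.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical minimal CSR container used only for this sketch (NOT a TNL type).
struct CsrMatrix
{
   int rows = 0;
   int columns = 0;
   std::vector< int > rowOffsets;     // rows + 1 entries; row i occupies [rowOffsets[i], rowOffsets[i+1])
   std::vector< int > columnIndexes;  // column index of each stored element
   std::vector< double > values;      // value of each stored element
};

// Builds a CSR matrix from a map keyed by (row, column) coordinates.
// The map keeps its keys sorted lexicographically, i.e. row by row with
// increasing column indexes, which is exactly the CSR storage order.
CsrMatrix buildFromMap( int rows, int columns, const std::map< std::pair< int, int >, double >& elements )
{
   CsrMatrix matrix;
   matrix.rows = rows;
   matrix.columns = columns;
   matrix.rowOffsets.assign( static_cast< std::size_t >( rows ) + 1, 0 );

   // First pass: count the elements in each row. The row lengths are
   // discovered here, so the caller never has to provide them.
   for( const auto& entry : elements ) {
      const int row = entry.first.first;
      const int column = entry.first.second;
      assert( row >= 0 && row < rows );
      assert( column >= 0 && column < columns );
      ++matrix.rowOffsets[ row + 1 ];
   }

   // Prefix sum turns the per-row counts into row offsets.
   for( int i = 0; i < rows; ++i )
      matrix.rowOffsets[ i + 1 ] += matrix.rowOffsets[ i ];

   // Second pass: copy the elements in the sorted order of the map.
   matrix.columnIndexes.reserve( elements.size() );
   matrix.values.reserve( elements.size() );
   for( const auto& entry : elements ) {
      matrix.columnIndexes.push_back( entry.first.second );
      matrix.values.push_back( entry.second );
   }
   return matrix;
}
```

The elements can be inserted into the map in any order, e.g. `elements[ { 2, 1 } ] = 5.0;` before `elements[ { 0, 0 } ] = 1.0;`. The price is that the whole construction runs sequentially on the host, which matches the performance caveat mentioned for this approach.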
The following table shows the pros and cons of the particular methods:
| | | Allows accessing only data allocated on the same device/memory space. |
| | | Use of matrix local indexes can be less intuitive. |
| **forRows** | Best efficiency for sparse matrices. | Requires setting of row capacities. |
|                                       | Avoids the use of matrix view or shared pointer in kernels/lambda function. | Requires writing GPU kernel or lambda function. |
| | | Allows accessing only data allocated on the same device/memory space. |
| | | Use of matrix local indexes is less intuitive. |
Though it may seem that the later methods come with more cons than pros, they offer much higher performance, and we believe that they are still very user friendly. On the other hand, if matrix setup performance is not a priority, the simple but slower methods can still be a good choice.
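
The role of row capacities and the difference between addressing an element by its column index and by its local index can be sketched with a simplified fixed-capacity row storage. This is only an illustration under stated assumptions: `FixedCapacityMatrix` and its methods are hypothetical and are not TNL types. In particular, the sketch merely reports when a row capacity is exceeded instead of reallocating the matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage with the same capacity for every row (NOT a TNL type).
// Unused slots are marked by the column index -1.
struct FixedCapacityMatrix
{
   int rows;
   int rowCapacity;
   std::vector< int > columnIndexes;
   std::vector< double > values;

   FixedCapacityMatrix( int rows_, int rowCapacity_ )
   : rows( rows_ ),
     rowCapacity( rowCapacity_ ),
     columnIndexes( static_cast< std::size_t >( rows_ ) * rowCapacity_, -1 ),
     values( static_cast< std::size_t >( rows_ ) * rowCapacity_, 0.0 )
   {}

   // Addressing by column index: the row must be searched for the column
   // or for a free slot. Returns false if the row capacity is exceeded.
   bool setElementByColumn( int row, int column, double value )
   {
      const int begin = row * rowCapacity;
      for( int localIdx = 0; localIdx < rowCapacity; ++localIdx ) {
         int& slot = columnIndexes[ begin + localIdx ];
         if( slot == column || slot == -1 ) {
            slot = column;
            values[ begin + localIdx ] = value;
            return true;
         }
      }
      return false;
   }

   // Addressing by local index: the slot is computed directly, no search.
   void setElementByLocalIndex( int row, int localIdx, int column, double value )
   {
      assert( localIdx >= 0 && localIdx < rowCapacity );
      columnIndexes[ row * rowCapacity + localIdx ] = column;
      values[ row * rowCapacity + localIdx ] = value;
   }
};
```

With a row capacity of 2, a third distinct column inserted into the same row by `setElementByColumn` is rejected, which is why the capacities must be chosen before the setup. The local-index variant skips the search entirely, but the caller has to keep track of which slot of the row holds which column.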