The computational parameters that varied in the benchmark are:
Additionally, we used the BoomerAMG preconditioner \cite{Hypre:BoomerAMG} from the Hypre library \cite{Hypre:design1,Hypre:design2,Hypre:library} version 2.25.0.
The latter preconditioner is coupled with the BiCGstab implementation from the Hypre library.
Some other components of the solver also had to be changed to maintain compatibility with these two different BiCGstab implementations.
While the TNL implementation is not bound to any specific matrix format and the Ellpack format was used in the solver, the Hypre library requires the CSR format with specific conventions (see \cref{sec:distributed sparse matrix} for details).
Hence, the sparse matrix assembly uses slightly different procedures in the two configurations.
Solving a linear system on a distributed computing platform requires a suitable partitioning of the data among the MPI ranks.
The system is typically distributed in a row-wise manner such that the rows of the matrix and vectors are partitioned into non-overlapping ranges that are assigned to individual ranks.
A \emph{distributed data structure} is then used for combining the local data of each rank with information about the partitioning (e.g., assigning global indices).
While the implementation of a distributed vector is straightforward, the data structures for a distributed sparse matrix, which must provide the coupling between the blocks of the partitioning, may be designed in different ways.
This section describes data structures for distributed sparse matrices implemented in the TNL and Hypre libraries.
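For illustration, the row ranges of such a non-overlapping partitioning can be computed as follows. This is a minimal sketch in plain C++; the helper name and the even-distribution policy are assumptions for the example, not the actual logic of TNL or Hypre:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical helper: compute the contiguous, non-overlapping range of
// global rows [begin, end) owned by a given rank when N rows are
// distributed as evenly as possible among R ranks.
std::pair<std::int64_t, std::int64_t>
ownedRowRange(std::int64_t N, int numRanks, int rank)
{
    const std::int64_t base = N / numRanks;   // rows every rank gets
    const std::int64_t rest = N % numRanks;   // first `rest` ranks get one extra row
    const std::int64_t begin = rank * base + std::min<std::int64_t>(rank, rest);
    const std::int64_t end = begin + base + (rank < rest ? 1 : 0);
    return {begin, end};
}
```

For example, 10 rows distributed among 3 ranks yield the ranges $[0,4)$, $[4,7)$, and $[7,10)$; the union of the ranges covers all rows and no row is owned by two ranks.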
The implementation of a distributed sparse matrix in TNL~\cite{oberhuber:2021tnl} is closely bound to the distributed mesh described in \cref{sec:meshes:distributed}.
Each row of the global matrix corresponds to one \emph{degree of freedom} associated with an entity of the global mesh, and the matrix partitioning is determined by the partitioning of the mesh.
The data owned by a particular rank is stored in a single local matrix represented by a sparse matrix data structure in TNL.
Each row and column of the local matrix corresponds to an entity of the local mesh:
\begin{itemize}
\item
Rows correspond to entities owned by the rank (i.e., not ghost entities).
\item
Columns may correspond to entities owned by the rank or to ghost entities.
Columns owned by the rank belong to the diagonal block of the global matrix and columns corresponding to ghost entities belong to the off-diagonal blocks.
\end{itemize}
It is important to realize that \emph{local indexing} is used in the local matrix, given by indices used in the local mesh.
Hence, a local matrix with $N_r$ rows and $N_c > N_r$ columns is stored such that the first $N_r$ columns represent the diagonal block and the remaining $N_c - N_r$ columns represent the off-diagonal blocks in a compact form (there are no gaps between the off-diagonal columns as there would be if global indexing were used).
Global indices of an entry in the local matrix can be determined based on the global indices of the corresponding mesh entity.
As explained in \cref{sec:meshes:distributed}, local entities owned by the rank are contiguous and thus their global indices can be determined by adding an offset to the local indices, but global indices of ghost entities must be stored explicitly in an array.
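This local-to-global translation can be sketched as follows; the function and parameter names are illustrative and do not correspond to the TNL API:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: map a local index of the compact local matrix to a
// global index. The first numOwned local indices correspond to owned
// entities, whose global indices form a contiguous range starting at
// globalOffset; the remaining indices are ghost entities, whose global
// indices are stored explicitly in ghostGlobalIndices.
std::int64_t localToGlobal(std::int64_t localIdx,
                           std::int64_t numOwned,
                           std::int64_t globalOffset,
                           const std::vector<std::int64_t>& ghostGlobalIndices)
{
    if (localIdx < numOwned)
        return globalOffset + localIdx;               // owned: add the rank's offset
    return ghostGlobalIndices[localIdx - numOwned];   // ghost: explicit lookup
}
```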
The compact representation of the local matrix is advantageous for operations such as distributed sparse matrix--vector multiplication, which can be performed in two steps:
\begin{enumerate}
\item
Use the \ic{DistributedMeshSynchronizer} class described in \cref{sec:meshes:distributed} to synchronize data corresponding to ghost entities in the input vector.
Note that the distributed vector contains $N_c$ local elements where the last $N_c - N_r$ values correspond to ghost entities.
\item
Perform a sparse matrix--vector multiplication with the $N_r \times N_c$ local matrix and the local vector of $N_c$ elements to compute the first $N_r$ elements of the output vector.
The $N_c - N_r$ ghost elements in the output vector can be synchronized with \ic{DistributedMeshSynchronizer} if needed.
\end{enumerate}
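Step 2 above can be sketched as a plain sequential CSR kernel. Note that this is only an illustration of the compact $N_r \times N_c$ multiplication: the solver in TNL actually uses the Ellpack format, and the array names here are assumptions for the example:

```cpp
#include <cstddef>
#include <vector>

// Sketch of step 2, after the ghost values were synchronized into x:
// multiply the compact N_r x N_c local matrix (in CSR form) with the local
// vector x of N_c elements, producing the first N_r elements of the output.
// Column indices beyond N_r address the ghost tail of x.
std::vector<double> localSpMV(const std::vector<std::size_t>& rowPtr,
                              const std::vector<std::size_t>& colIdx,
                              const std::vector<double>& values,
                              const std::vector<double>& x)
{
    const std::size_t numRows = rowPtr.size() - 1;   // N_r owned rows
    std::vector<double> y(numRows, 0.0);
    for (std::size_t row = 0; row < numRows; ++row)
        for (std::size_t k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
            y[row] += values[k] * x[colIdx[k]];
    return y;
}
```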
The implementation of a distributed sparse matrix in the Hypre library~\cite{Hypre:library,Hypre:design1,Hypre:design2} is different from TNL in several aspects.
Firstly, the data structure is purely algebraic in the sense that it provides all necessary information without relying on a mesh data structure.
Internally, Hypre uses the CSR format with specific conventions for the representation of sparse matrices:
\begin{itemize}
\item
Each matrix block is represented by the \ic{hypre_CSRMatrix} structure.
An important convention is that the diagonal entry in each row must be stored as the first value of the row in the CSR format.
\item
The \ic{hypre_ParCSRMatrix} structure represents a distributed matrix.
Among others, it contains the attributes \ic{diag}, \ic{offd}, and \ic{col_map_offd}.
\item
The \ic{diag} and \ic{offd} attributes are instances of the \ic{hypre_CSRMatrix} structure, which represent the diagonal and off-diagonal block of the local matrix, respectively.
The former is typically a square matrix for which data is always available in the local part of a distributed vector.
\item
The \ic{col_map_offd} attribute is an array that provides global indices for the off-diagonal block.
It is indexed by columns of the \ic{offd} matrix.
\end{itemize}
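The role of the two blocks can be illustrated by the distributed matrix--vector multiplication $y = \mathtt{diag} \cdot x_{\text{local}} + \mathtt{offd} \cdot x_{\text{recv}}$, where $x_{\text{recv}}$ holds the values received for the columns listed in \ic{col_map_offd}. The following is a plain C++ sketch under assumed names, not the actual Hypre C API; it also does not depend on Hypre's diagonal-first ordering within the rows of \ic{diag}:

```cpp
#include <cstddef>
#include <vector>

// Illustrative CSR block with the usual three-array representation.
struct CSRBlock {
    std::vector<std::size_t> rowPtr, colIdx;
    std::vector<double> values;
};

// ParCSR-style multiplication: the diag block indexes the local part of the
// vector, while the offd block indexes the received ghost values (one value
// per offd column; col_map_offd translates these columns to global indices
// and is not needed in the kernel itself).
std::vector<double> parCSRMultiply(const CSRBlock& diag, const CSRBlock& offd,
                                   const std::vector<double>& xLocal,
                                   const std::vector<double>& xRecv)
{
    const std::size_t numRows = diag.rowPtr.size() - 1;
    std::vector<double> y(numRows, 0.0);
    for (std::size_t row = 0; row < numRows; ++row) {
        for (std::size_t k = diag.rowPtr[row]; k < diag.rowPtr[row + 1]; ++k)
            y[row] += diag.values[k] * xLocal[diag.colIdx[k]];
        for (std::size_t k = offd.rowPtr[row]; k < offd.rowPtr[row + 1]; ++k)
            y[row] += offd.values[k] * xRecv[offd.colIdx[k]];
    }
    return y;
}
```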
The exchange of non-local data in Hypre is implemented based on the \emph{assumed partition scheme}~\cite{Baker2006}, which allows the inter-rank communication patterns to be determined in a scalable way without storing a global description of the data distribution.
Overall, it can be understood as an algebraic generalization of the indexing with ghost entities in TNL.
@article{Baker2006,
  author  = {Baker, A. H. and Falgout, R. D. and Yang, U. M.},
  title   = {An assumed partition algorithm for determining processor inter-communication},
  journal = {Parallel Computing},
  volume  = {32},
  number  = {5},
  pages   = {394--414},
  year    = {2006},
  issn    = {0167-8191},
}