Approximation of the point injection boundary condition at $\vec x =\vec0$ on (a) triangular and (b) tetrahedral meshes.
Adapted from \cite{fucik:2019NumDwarf}.
@@ -982,7 +982,7 @@ The values of all parameters used in this work are summarized in \cref{tab:mcwhd
\subsection{Verification results}
For each mesh $\mathcal K_h$, we define the error $E_{h,S_n}= S_{n,\mathrm{ex}}(t_{\max})- S_{n,h}(t_{\max})$ in terms of the injected phase saturation $S_n$ as the difference between the semi-analytical solution $S_{n,\mathrm{ex}}$ and numerical solution $S_{n,h}$ on the mesh $\mathcal K_h$ at time $t_{\max}$.
The $L_p$ norm in $\Omega\subset\mathbb R^D$ of the error $\norm{E_{h,S_n}}_p$ is evaluated for $p \in\{1,2\}$ as
The $L_p$ norm $\norm{E_{h,S_n}}_p$ is evaluated in $\Omega\subset\mathbb R^D$for $p \in\{1,2\}$ as
\begin{equation}
\label{eq:mhfem:norm additivity}
\norm{E_{h,S_n}}_p = \left( \int\limits_\Omega\abs{E_{h,S_n}(\vec x)}^p \dif\vec x \right)^\frac1p
@@ -990,18 +990,11 @@ The $L_p$ norm in $\Omega \subset \mathbb R^D$ of the error $\norm{E_{h,S_n}}_p$
\end{equation}
The integrals over mesh cells $K \in\mathcal K_h$ are first transformed to integrals over the reference cell, and then the Fubini theorem is applied to obtain one-dimensional integrals that are computed numerically using the Gauss--Kronrod quadrature (G7--K15) with adaptive interval subdivision, which is implemented in the \CPPeleven library \ic{qdt}~\cite{munoz:2014,qdt}.
Given two approximate solutions $S_{n,h_1}$ and $S_{n,h_2}$ on meshes $\mathcal K_{h_1}$ and $\mathcal K_{h_2}$, respectively, the order of convergence of the numerical scheme with respect to $S_n$ in the $L_p$ norm can be approximated as
Given two approximate solutions $S_{n,h_1}$ and $S_{n,h_2}$ on meshes $\mathcal K_{h_1}$ and $\mathcal K_{h_2}$, respectively, the order of convergence of the numerical scheme with respect to $S_n$ in the $L_p$ norm can be approximated using an experimental order of convergence as
where the acronym $EOC$ stands for experimental order of convergence.
In order to investigate the convergence and accuracy of the numerical scheme using the experimental order of convergence, several simulations were computed on a series of triangular and tetrahedral meshes.
The meshes are the same as those used in \cite{fucik:2019NumDwarf} and their properties are listed in \cref{tab:meshes}.
All simulations were computed in \ic{double} precision.
\Cref{tab:mhfem:EOC 2D,tab:mhfem:EOC 3D} show the resulting $L_1$ and $L_2$ norms of the error, the corresponding experimental orders of convergence, as well as the mesh size parameters $h$ and constant time steps $\tau$ that were used in the simulations.
The results of the numerical analysis indicate that the numerical scheme converges with the first order of accuracy in both dimensions and both $L_1$ and $L_2$ norms.
\begin{table}[!tb]
\caption{
@@ -1021,6 +1014,15 @@ The results of the numerical analysis indicate that the numerical scheme converg
\input{./data/mcwhdd/eoc_3D.tex}
\end{table}
% HACK: enlarge the \textheight to allow one more line
\enlargethispage{\baselineskip}
In order to investigate the convergence and accuracy of the numerical scheme using the experimental order of convergence, several simulations were computed on a series of triangular and tetrahedral meshes.
The meshes are the same as those used in \cite{fucik:2019NumDwarf} and their properties are listed in \cref{tab:meshes}.
All simulations were computed in \ic{double} precision.
\Cref{tab:mhfem:EOC 2D,tab:mhfem:EOC 3D} show the resulting $L_1$ and $L_2$ norms of the error, the corresponding experimental orders of convergence, as well as the mesh size parameters $h$ and constant time steps $\tau$ that were used in the simulations.
The results of the numerical analysis indicate that the scheme converges with the first order of accuracy in both dimensions and both norms.
\section{Benchmarking methodology}
\label{sec:mhfem:benchmarking methodology}
@@ -1031,8 +1033,7 @@ The computational parameters that varied in the benchmark are:
\item
\emph{Linear system solver}.
First, we used the BiCGstab method implemented in TNL with the Jacobi preconditioner.
Additionally, we used the BoomerAMG preconditioner \cite{Hypre:BoomerAMG} from the Hypre library \cite{Hypre:design1,Hypre:design2,Hypre:library} version 2.25.0.
The latter preconditioner is coupled with the BiCGstab implementation from the Hypre library.
Additionally, we used the BoomerAMG preconditioner \cite{Hypre:BoomerAMG} from the Hypre library \cite{Hypre:design1,Hypre:design2,Hypre:library} version 2.25.0 together with the BiCGstab method implemented in the Hypre library.
Some other components of the solver also had to be changed due to compatibility with these two different BiCGstab implementations.
While the TNL implementation is not bound to any specific matrix format and the Ellpack format was used in the solver, the Hypre library requires the CSR format with specific conventions (see \cref{sec:distributed sparse matrix} for details).
Hence, the sparse matrix assembly uses slightly different procedures in the two configurations.
@@ -1045,7 +1046,7 @@ The computational parameters that varied in the benchmark are:
\emph{Programming framework for CPU parallelization}.
For the TNL implementation, we tested two approaches to CPU parallelization within a single node.
We used the shared-memory multi-thread approach via the OpenMP framework \cite{OpenMP} and also the general multi-process approach based on MPI \cite{mpi:3.1}.
The Hypre implementation was used only with the general MPI-based parallelization.
The Hypre implementation was used only with the MPI-based parallelization.
\end{itemize}
To ensure comparable results obtained with different preconditioners for the BiCGstab method, care must be taken to select suitable stopping criteria for the iterative algorithm.
@@ -1053,24 +1054,24 @@ The linear system solvers implemented in TNL use left-preconditioning, whereas t
Recall that right-preconditioned methods operate with the \emph{original} residual and right-hand-side vectors, whereas left-preconditioned methods operate with \emph{preconditioned} residual and right-hand-side vectors.
Hence, assuming the Jacobi preconditioner, the right-hand-side vector is scaled with the matrix diagonal entries during left-preconditioning and the stopping criterion based on the norm of the residual vector divided by the norm of the right-hand-side vector cannot be used with the same threshold as for the unpreconditioned or right-preconditioned system.
To remedy this problem, we assemble the linear system with scaled rows to obtain ones on the main diagonal of the matrix (i.e., the Jacobi preconditioning is applied during the matrix assembly).
This scales the elements of the right-hand-side vector to the same magnitude in both preconditioning techniques and allows the stopping criterion to be used with the same threshold (\np{e-11} in this study) in both cases.
This scales the elements of the right-hand-side vector to the same magnitude in both preconditioning techniques and allows the stopping criterion to be used with the same threshold (\np{e-11} in this study).
Besides the stopping criterion, the BiCGstab method requires no other configuration.
The following common parameters were set for the BoomerAMG preconditioner:
\begin{itemize}
\begin{itemize}[itemsep=0ex]
\item The PDE system size (i.e., $n$ in \cref{eq:NumDwarf}) was set to 2: \\\ic{HYPRE_BoomerAMGSetNumFunctions( precond, 2 );}
\item Aggressive coarsening with 1 level: \ic{HYPRE_BoomerAMGSetAggNumLevels( precond, 1 );}
\item For both 2D and 3D problems, the strength threshold was set to 0.25: \\\ic{HYPRE_BoomerAMGSetStrongThreshold( precond, 0.25 );}
\item The \ic{extended+i} interpolation operator with truncation to at most 4 elements per row \\\ic{HYPRE_BoomerAMGSetInterpType( precond, 6 );} and \ic{HYPRE_BoomerAMGSetPMaxElmts( precond, 4 );}
\end{itemize}
Additionally, the following parameters were set for CPU computations:
\begin{itemize}
\begin{itemize}[itemsep=0ex]
\item The HMIS coarsening algorithm: \ic{HYPRE_BoomerAMGSetCoarsenType( precond, 10 );}
\item The truncation factor for the interpolation was set to 0.3: \\\ic{HYPRE_BoomerAMGSetTruncFactor( precond, 0.3 );}
\end{itemize}
The following parameters were set specifically for GPU computations:
\begin{itemize}
\begin{itemize}[itemsep=0ex]
\item The PMIS coarsening algorithm: \ic{HYPRE_BoomerAMGSetCoarsenType( precond, 8 );}
\item The $\ell_1$-scaled Jacobi smoother: \ic{HYPRE_BoomerAMGSetRelaxType( precond, 18 );}
\item The 2-stage \ic{extended+e} interpolation in matrix-matrix form was used on the levels of aggressive coarsening: \ic{HYPRE_BoomerAMGSetAggInterpType( precond, 7 );}