The content of this section is based on the author's previous thesis \cite{klinkovsky:2017thesis} and the paper \cite{fucik:2019NumDwarf}, which describe \emph{NumDwarf}, a numerical scheme for the solution of a system of partial differential equations in a general coefficient form.
\emph{NumDwarf} is based on the mixed-hybrid finite element method (MHFEM) \cite{brezzi:1991mixed} and it was originally developed for simulating multicomponent transport phenomena in porous media (see \cite{solovsky2020,askar2021} for several recent applications), but it can be used for any problem whose governing equations can be written in a compatible form.
\emph{NumDwarf} is based on the mixed-hybrid finite element method (MHFEM) \cite{brezzi:1991mixed} and it was originally developed for simulating multicomponent flow and transport phenomena in porous media (see \cite{solovsky2020,askar2021} for several recent applications), but it can be used for any problem whose governing equations can be written in a compatible form.
Although the implementation of the numerical scheme has seen only minor modifications since the publication of the paper, the computational aspects of the author's solver have been greatly extended.
Most notably, the solver supports MPI computations on distributed unstructured meshes.
@@ -27,7 +27,7 @@ The numerical scheme is implemented for a PDE system written in the form
for $i =1, \ldots, n$, where $\vec Z =[Z_1, \ldots, Z_n]^T$ is the vector of unknown functions depending on spatial coordinates $\vec x \in\Omega\subset\mathbb R^D$ and time $t \in[0, t_{\max}]$, where $D \in\{1, 2, 3\}$ denotes the spatial dimension, $\Omega$ is a polyhedral domain, and $t_{\max}$ is the final simulation time.
The remaining symbols $\matrix N =[N_{i,j}]_{i,j=1}^n$, $\vec u =[\vec u_{i,j}]_{i,j=1}^n$, $\vec m =[m_i]_{i=1}^n$, $\matrix D =[\tensor D_{i,j}]_{i,j=1}^n$, $\vec w =[\vec w_i]_{i=1}^n$, $\vec a =[\vec a_{i,j}]_{i,j=1}^n$, $\vec r =[r_{i,j}]_{i,j=1}^n$, $\vec f =[f_i]_{i=1}^n$ are given problem-specific coefficients.
Note that the coefficients may, in general, depend on the vector of unknown functions $\vec Z$.
The idea behind the notation is that coefficients dependent on $\vec Z$ in non-linear terms are evaluated by the numerical scheme using the values from the current time level, whereas the symbols $Z_j$ in \cref{eq:NumDwarf} are treated implicitly.
The idea behind the notation is that coefficients dependent on $\vec Z$ in non-linear terms are evaluated by the numerical scheme using the values from the previous time level, whereas the symbols $Z_j$ in \cref{eq:NumDwarf} are treated implicitly.
See \cite{fucik:2019NumDwarf,klinkovsky:2017thesis} for details.
The system of equations~\eqref{eq:NumDwarf} must be supplemented by a suitable initial condition
@@ -55,9 +55,9 @@ In the following paragraphs, we briefly summarize the numerical approach for the
\emph{NumDwarf} is based on the combination of the mixed-hybrid finite element method (MHFEM) \cite{brezzi:1991mixed} and the discontinuous Galerkin method \cite{arnold:2002} for spatial discretization, the Euler method for temporal discretization and the semi-implicit approach of the frozen coefficients method for the linearization in time.
The scheme is stabilized with upwind and mass-lumping techniques and a modification of MHFEM described in \cite{fucik:2019NumDwarf} can be employed to solve problems with vanishing diffusion.
The scheme is locally conservative and leads to the solution of one linear system per time step.
The matrix of each linear system is generally non-symmetric due to the upwind stabilization.
Even when both tensor coefficients $\tensor N$ and $\tensor D$ in \cref{eq:NumDwarf} are symmetric, the upwind stabilization causes the matrix of each linear system to be non-symmetric.
In this work, we use conforming unstructured homogeneous meshes denoted as $\mathcal K_h$for the discretization of the domain and the lowest-order Raviart--Thomas--Nédélec function space $\mathbf{RTN}_0(\mathcal K_h)$.
In this work, we use conforming unstructured homogeneous meshes denoted as $\mathcal K_h$ and the lowest-order Raviart--Thomas--Nédélec space $\mathbf{RTN}_0(\mathcal K_h)$ for spatial discretization.
The set of faces of the mesh $\mathcal K_h$ will be denoted as $\mathcal E_h$ and the set of faces of an individual element $K \in\mathcal K_h$ will be denoted as $\mathcal E_K$.
\subsection{Two-phase flow in porous media}
@@ -79,7 +79,7 @@ The two-phase incompressible flow in porous media can be described macroscopical
\end{align}
\end{subequations}
where $\Phi$~[-] is the material porosity, $K$~[\si{\metre\squared}] is the material permeability, and $\vec g$~[\si{\metre\per\second\squared}] is the gravitational acceleration vector.
Symbols related to the phases in the system are denoted by subscripts $\alpha\in\{w,n\}$.
Subscripts $\alpha\in\{w,n\}$ are used to denote symbols related to the \emph{wetting} and \emph{non-wetting} phase in the system, respectively.
In particular, the quantities $\rho_\alpha$~[\si{\kilogram\per\metre\cubed}], $S_\alpha$~[-], $\lambda_\alpha$~[\si{\metre\second\per\kilogram}], and $p_\alpha$~[\si{\pascal}] represent the $\alpha$-phase density, saturation, mobility, and pressure, respectively.
The densities $\rho_\alpha$ are assumed to be constant, the mobilities are defined as $\lambda_\alpha= k_{r,\alpha}(S_\alpha)/\mu_\alpha$, where $k_{r,\alpha}(S_\alpha)$~[-] is the relative $\alpha$-phase permeability given by a specific empirical model and $\mu_\alpha$~[\si{\kilogram\per\metre\per\second}] is the $\alpha$-phase dynamic viscosity, and the quantities $S_\alpha$, $p_\alpha$ for $\alpha\in\{w,n\}$ are the four unknown variables in \cref{eq:mhfem:two-phase flow}.
The model is closed by algebraic constraints for the phase pressures and saturations
@@ -188,7 +188,8 @@ In this section, we present the results of the generalized McWhorter--Sunada pro
The geometrical configuration of the problem and boundary conditions are shown in \cref{fig:mcwhdd:domain}.
The computational domain $\Omega\subset\mathbb R^D$ is a unit square in the two-dimensional case and a unit cube in the three-dimensional case.
A homogeneous porous medium is assumed in the domain and the effect of gravity is neglected.
Initially, at time $t =0$, the entire domain is filled with water at the reference pressure $p_w = p_{\mathrm{ref}}$ and $p_n$ is calculated from $p_w$ and the constant initial water saturation $S_{w,\mathrm{ini}}$.
Initially, at time $t =0$, the entire domain is filled mostly with water and a small amount of the non-wetting phase.
The initial conditions are specified in terms of the reference pressure $p_w = p_{\mathrm{ref}}$ and the constant initial water saturation $S_{w,\mathrm{ini}}$, from which the remaining primary variable $p_n$ is calculated.
The non-aqueous phase liquid (NAPL) is being injected into the domain since $t =0$ until the final time $t_{\max}$ through a point source located in the origin $\vec x =\vec0$.
The volumetric rate of injection, denoted as $Q_0$~[\si{\metre\tothe{D}\per\second}], is prescribed by
\begin{equation}
@@ -362,7 +363,7 @@ The following parameters were set specifically for GPU computations:
There are many other configurable parameters listed in the Hypre documentation \cite{Hypre:documentation} that might have an effect on the performance, but they were not investigated in this benchmark.
To reduce the computational time with the Hypre implementation, the BoomerAMG preconditioner is not updated in every time step and instead its setup is reused multiple times as long as the number of iterations necessary for the BiCGstab method to converge does not increase significantly.
The specific heuristics used is that the preconditioner from the previous time step is reused, as long as the number of iterations needed in the previous time step is less then 5 plus the number of iterations needed when the preconditioner was last updated.
The specific heuristics used is that the preconditioner is reused, as long as the number of iterations needed in the previous time step is less then 5 plus the number of iterations needed when the preconditioner was last updated.
The computations were performed on the same hardware as described in \cref{sec:meshes:benchmarking methodology}, and also the software versions and compiler parameters were the same.
The MPI version is OpenMPI 4.0.5 with CUDA support and system-specific drivers.
@@ -117,7 +117,7 @@ Similarly to multigrid, recent development of domain decomposition methods tends
To conclude the section, we mention some general techniques that can be used when multiple preconditioners accelerating the solution of a linear system are available.
Rather than formulating a new preconditioner that combines the features of multiple preconditioners, they can be applied together within a \emph{multi-preconditioned} iterative method \cite{bridson:2006,spillane:2016,greif:2017}.
More generally, when solving sequences of similar linear systems, such as those arising from the discretization of transient partial differential equations, it may be possible to reuse information from previously computed problems, including preconditioners \cite{simoncini:2007,carr2021recycling,amritkar2015recycling,gaul2014recycling}.
More generally, when solving sequences of similar linear systems, such as those arising from the discretization of time-dependent partial differential equations, it may be possible to reuse information from previously computed problems, including preconditioners \cite{simoncini:2007,carr2021recycling,amritkar2015recycling,gaul2014recycling}.