Commit c6625524 authored by Jakub Klinkovský

copy-edit - addressed comments by TO and RF

parent de77037c
+4 −4
@@ -84,11 +84,11 @@ Given a velocity field $\vec v(\vec x,t)$ satisfying the divergence-free conditi

The Navier--Stokes equations \eqref{eq:lbm-mhfem:ns} are solved numerically by the lattice Boltzmann method described in \cref{chapter:LBM}.
The transport equation \eqref{eq:ADE} is incorporated into the mathematical framework of the \emph{NumDwarf} solver described in \cref{chapter:MHFEM}.
-Both \cref{eq:ADE:conservative} and \cref{eq:ADE:non-conservative} can be converted to the form of \cref{eq:NumDwarf} by taking $n=1$, $\vec Z = [\phi]$, $\matrix N = [1]$, $\vec m = [1]$, $\vec D = [D_0]$, $\vec w = \vec 0$, $\vec r = \vec 0$, $\vec f = [0]$, %[f_\phi]$,
+Both \cref{eq:ADE:conservative} and \cref{eq:ADE:non-conservative} can be converted to the form of \cref{eq:NumDwarf} on \cpageref{eq:NumDwarf} by taking $n=1$, $\vec Z = [\phi]$, $\matrix N = [1]$, $\vec m = [1]$, $\vec D = [D_0]$, $\vec w = \vec 0$, $\vec r = \vec 0$, $\vec f = [0]$, %[f_\phi]$,
and depending on the equation:
\begin{itemize}[itemsep=0ex]
-    \item $\vec u = \vec 0$, $\vec a = [\vec v]$ for \cref{eq:ADE:conservative},
-    \item $\vec u = [\vec v]$, $\vec a = \vec 0$ for \cref{eq:ADE:non-conservative}.
+    \item $\vec u = \vec 0$, $\vec a = [\vec v]$ for \cref{eq:ADE:conservative} (conservative form),
+    \item $\vec u = [\vec v]$, $\vec a = \vec 0$ for \cref{eq:ADE:non-conservative} (non-conservative form).
\end{itemize}

The resulting solver couples the computational algorithms of MHFEM (\cref{sec:mhfem:computational algorithm}) and LBM (\cref{sec:lbm:computational algorithm}).
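The coefficient choices above can be sanity-checked against the PDE forms they reproduce. Assuming \cref{eq:ADE:conservative} and \cref{eq:ADE:non-conservative} are the standard conservative and non-conservative advection--diffusion forms, and that $\vec a$ enters \cref{eq:NumDwarf} through a conservative flux while $\vec u$ enters a non-conservative advection term (assumptions, since the equations themselves lie outside this hunk), the two bullet items correspond to

```latex
% Conservative form ($\vec u = \vec 0$, $\vec a = [\vec v]$):
\frac{\partial \phi}{\partial t} + \nabla \cdot (\phi \vec v)
  = \nabla \cdot (D_0 \nabla \phi),
% Non-conservative form ($\vec u = [\vec v]$, $\vec a = \vec 0$):
\frac{\partial \phi}{\partial t} + \vec v \cdot \nabla \phi
  = \nabla \cdot (D_0 \nabla \phi).
```

The two forms coincide analytically because $\nabla \cdot (\phi \vec v) = \vec v \cdot \nabla \phi + \phi \, \nabla \cdot \vec v$ and $\vec v$ is divergence-free; discretely they differ, which is what the comparison in the next hunk examines.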
@@ -299,7 +299,7 @@ Quantitative comparison is presented in \cref{tab:ADE2:norms} in terms of $L^p$
Both qualitative and quantitative results in \cref{fig:ADE2:concentrations,tab:ADE2:norms} indicate that for the conservative formulation, changing linear interpolation to cubic, as well as changing the explicit upwind discretization to implicit upwind, leads to smoother and more accurate results.
Furthermore, all these variants converge to the analytical solution as the lattice and grid are refined.
However, even the most accurate numerical solution obtained with the conservative formulation exhibits an error that is orders of magnitude larger than that of the non-conservative formulation, even at the coarsest resolution.
-The only difference between the discretizations of the non-conservative and conservative formulations is in \cref{eq:mhfem:advection terms discrete} on \cpageref{eq:mhfem:advection terms discrete} where the former contains a term corresponding to the discrete divergence of velocity.
+The only difference between the discretizations of the non-conservative and conservative formulations is in \cref{eq:mhfem:advection terms discrete} on \cpageref{eq:mhfem:advection terms discrete}, where the former contains the term $\sum\limits_{E \in \mathcal E_K} u_{i,j,K,E}$ corresponding to the discrete divergence of velocity.
The results indicate that this extra term can be understood as a compensation for the non-zero divergence of the discrete velocity field interpolated on the mesh.
Furthermore, it can be noticed in \cref{tab:ADE2:norms} that changing the interpolation or upwind scheme does not have a significant effect on the error when the non-conservative formulation is used.
At the finest resolution RES-A3, using linear interpolation and the explicit upwind scheme is not only advantageous for the performance of the solver, but also leads to a smaller error.
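A hedged sketch of why the extra term acts as a compensation (assuming $u_{i,j,K,E}$ denotes the outward velocity flux through face $E$ of element $K$, which the phrase "discrete divergence of velocity" suggests): the sum is a discrete analogue of the element-wise divergence integral,

```latex
\sum_{E \in \mathcal{E}_K} u_{i,j,K,E}
  \;\approx\; \int_K \nabla \cdot \vec v \, \mathrm{d}\vec x ,
```

which vanishes for the analytical divergence-free field but generally not for the velocity field interpolated on the mesh, so retaining the term cancels the spurious discrete divergence instead of letting it pollute the transported quantity.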
+4 −1
@@ -701,7 +701,7 @@ According to the results from \cref{tab:lbm:strong scaling}, the computations ar
When only GPUs from a single node are used, all communication goes through NVLink, which is the fastest interconnection between GPUs that is available on the Karolina supercomputer.
When two nodes are used, two ranks (with IDs 7 and 8) must exchange data over the InfiniBand interconnection between nodes.
When more than two nodes are used, some nodes must exchange the same amount of data with multiple nodes.
-For example, with four nodes, ranks 8 and 15 from node 1 exchange data with rank 7 from node 0 and rank 16 from node 2, respectively.
+For example, with four nodes (with IDs 0, 1, 2, 3) and eight ranks per node (with IDs 0 to 31), ranks 8 and 15 from node 1 exchange data with rank 7 from node 0 and rank 16 from node 2, respectively.
Also note that due to the 1D decomposition used in this strong scaling study, the computation-to-communication ratio decreases with an increasing number of ranks, since the amount of communication per rank is constant, but the subdomains become smaller.
Hence, the communication cost becomes dominant as it eventually cannot be overlapped completely with the computation using \cref{alg:distributed LBM with overlapping} on \cpageref{alg:distributed LBM with overlapping}.
In the last computations using 128 ranks, the size of each subdomain is $4 \times 512 \times 512$ and the amount of communication corresponds to $1/2$ of the subdomain for all ranks except the first and the last where it corresponds to $1/4$.
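The $1/2$ and $1/4$ figures for the 128-rank case can be reproduced with a small back-of-the-envelope script. This is a sketch, not the solver's code: the function name and the one-ghost-layer-per-neighbour communication model are assumptions consistent with the text.

```python
# Sketch of the communication-to-computation argument for the 1D
# decomposition along x (illustrative model, not the actual solver code).
def comm_fraction(n_ranks, rank, nx=512, ny=512, nz=512):
    """Fraction of a rank's subdomain exchanged per step, assuming a 1D
    decomposition along x with one ghost layer per neighbouring rank."""
    slab = nx // n_ranks                 # subdomain thickness along x
    owned = slab * ny * nz               # lattice sites owned by the rank
    neighbours = 1 if rank in (0, n_ranks - 1) else 2
    exchanged = neighbours * ny * nz     # one ny*nz ghost layer per neighbour
    return exchanged / owned

print(comm_fraction(128, rank=1))    # interior rank: 0.5 (half the subdomain)
print(comm_fraction(128, rank=0))    # first/last rank: 0.25 (a quarter)
```

With 128 ranks the subdomain is $4 \times 512 \times 512$, so two ghost layers of $512 \times 512$ sites amount to half of it, matching the values stated above.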
@@ -712,6 +712,7 @@ We suppose that this degenerate case could be significantly improved with a mult
    \caption{
        Strong scaling in single and double precision on the Karolina supercomputer.
        The lattice size is $512 \times 512 \times 512$ in all cases.
+        Each MPI rank was mapped to its own GPU accelerator.
    }
    \label{tab:lbm:strong scaling}
    \centering
@@ -729,6 +730,7 @@ A more realistic weak scaling study is described below.
    \caption{
        Weak scaling with 1D domain expansion in single and double precision on the Karolina supercomputer.
        The lattice size is scaled as $256 \, N_{\mathrm{ranks}} \times 256 \times 256$.
+        Each MPI rank was mapped to its own GPU accelerator.
    }
    \label{tab:lbm:weak scaling 1D}
    \centering
@@ -739,6 +741,7 @@ A more realistic weak scaling study is described below.
    \caption{
        Weak scaling with 3D domain expansion in single and double precision on the Karolina supercomputer.
        The lattice size is scaled as $N_x = N_y = N_z = \round*{512 \cbrt{N_{\mathrm{ranks}}}}$ in single precision and $N_x = N_y = N_z = \round*{256 \cbrt{N_{\mathrm{ranks}}}}$ in double precision.
+        Each MPI rank was mapped to its own GPU accelerator.
    }
    \label{tab:lbm:weak scaling 3D}
    \centering
+2 −2
@@ -821,7 +821,7 @@ where $p_\ast$~[\si{\pascal}] and $\lambda_\ast$~[-] are model parameters for gi
    S_{\alpha,e} = \frac{S_\alpha - S_{\alpha,r}}{1 - S_{w,r} - S_{n,r}}
\end{equation}
for $\alpha \in \{w,n\}$, where $S_{\alpha,r}$ denotes the residual saturation of the $\alpha$-phase in the material.
-The relative permeability model consistent with the Brooks--Corey's model for capillary pressure is the Burdine model \cite{burdine:1953,burdine:1950}
+The relative permeability model consistent with the Brooks--Corey model for capillary pressure is the Burdine model \cite{burdine:1953,burdine:1950}
\begin{subequations}
    \label{eq:mhfem:Burdine}
    \begin{align}
@@ -892,7 +892,7 @@ The $\{p_w,p_n\}$ formulation is used in this work, i.e., $\vec Z = [p_w, p_n]^T
    \end{pmatrix},
\label{eq:NumDwarf:two-phase coefficients}
\end{alignat}
-where $\lambda_t = \lambda_w + \lambda_n$.
+where $\lambda_t = \lambda_w + \lambda_n$ and $\tensor I$ denotes the identity tensor.
The coefficients $\vec m$ and $\tensor D$ are chosen based on the approach used in \cite{hoteit:2008}.
Consequently, the modified balance \cref{eq:mhfem:v_iKE balance derived} can be used in the numerical scheme based on \cite{fucik:2019NumDwarf} to allow solving problems with vanishing diffusion.

+1 −0
@@ -4,6 +4,7 @@

\inline{TODO}
\inline{General motivation for polyhedral meshes in scientific computing -- advantages w.r.t. tetrahedral or hexahedral meshes: some ideas are in \url{https://www.symscape.com/polyhedral-tetrahedral-hexahedral-mesh-comparison}}
+\inline{benefit of MHFEM completely on GPU: coupling with LBM}

Mathematical modeling of fluid dynamics has many ecological, medical, and industrial applications, and it is one of the central research topics investigated at the Department of Mathematics, FNSPE CTU in Prague, in collaboration with prominent domestic as well as foreign institutions.
In order to model complex natural processes governing the behavior of fluids, it is often necessary to run high-resolution simulations on large computational clusters or supercomputers.
+2 −2
@@ -210,7 +210,7 @@ A \emph{distributed data structure} is then used for combining the local data of
While the implementation of a distributed vector is straightforward, the data structures for a distributed sparse matrix, which provides the coupling between blocks of the partitioning, may be designed in different ways.
This section describes data structures for distributed sparse matrices implemented in the TNL and Hypre libraries.

-The implementation of a distributed sparse matrix in TNL~\cite{oberhuber:2021tnl} is closely bound to the distributed mesh described in \cref{sec:meshes:distributed}.
+The implementation of a distributed sparse matrix in TNL~\cite{oberhuber:2021tnl} is closely bound to the distributed mesh and its MPI synchronizer described in \cref{sec:meshes:distributed}.
Each row of the global matrix corresponds to one \emph{degree of freedom} associated with an entity of the global mesh, and the matrix partitioning is determined by the partitioning of the mesh.
The data owned by a particular rank is stored in a single local matrix represented by a sparse matrix data structure in TNL.
Each row and column of the local matrix corresponds to an entity of the local mesh:
@@ -222,7 +222,7 @@ Each row and column of the local matrix corresponds to an entity of the local me
        Columns owned by the rank belong to the diagonal block of the global matrix and columns corresponding to ghost entities belong to the off-diagonal blocks.
\end{itemize}
It is important to realize that \emph{local indexing} is used in the local matrix, given by indices used in the local mesh.
-Hence, a local matrix with $N_r$ rows and $N_c > N_r$ columns is stored such that the first $N_r$ columns represent the diagonal block and the remaining $N_c - N_r$ columns represent the off-diagonal blocks in a compact form (there are no gaps between the off-diagonal columns as there would be if the global indexing was used).
+Hence, a local matrix with $N_r$ rows and $N_c > N_r$ columns is stored such that the first $N_r$ columns represent the diagonal block of the global matrix and the remaining $N_c - N_r$ columns represent the off-diagonal blocks in a compact form (there are no gaps between the off-diagonal columns as there would be if the global indexing were used).
Global indices of an entry in the local matrix can be determined based on the global indices of the corresponding mesh entity.
As explained in \cref{sec:meshes:distributed}, local entities owned by the rank are contiguous and thus their global indices can be determined by adding an offset to the local indices, but global indices of ghost entities must be stored explicitly in an array.
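The offset-plus-lookup rule for recovering global indices can be sketched as follows; the function and variable names are illustrative, not the actual TNL API.

```python
# Illustrative sketch of the local-to-global index mapping described above
# (not TNL code): owned entities are contiguous in the global numbering,
# ghost entities need an explicitly stored array of global indices.
def local_to_global(local_idx, owned_offset, n_owned, ghost_globals):
    """Map a local row/column index of the local matrix to its global index."""
    if local_idx < n_owned:
        return owned_offset + local_idx          # owned entity: offset shift
    return ghost_globals[local_idx - n_owned]    # ghost entity: stored lookup

# A rank owning global entities 100..104, with two ghosts (global 42 and 7):
print(local_to_global(3, 100, 5, [42, 7]))   # owned entity -> 103
print(local_to_global(6, 100, 5, [42, 7]))   # second ghost -> 7
```

This mapping is what makes the compact local storage workable: the diagonal block needs no per-column metadata at all, and only the $N_c - N_r$ ghost columns pay the cost of an explicit index array.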
The compact representation of the local matrix is advantageous for operations such as distributed sparse matrix--vector multiplication, which can be performed in two steps: