Commit 43862e59 authored by Jakub Klinkovský

refactored document structure - sections on MHFEM, LBM, and LBM-MHFEM promoted to chapters

parent 85be19cd
\inline{Supporting paper: \cite{klinkovsky2022:WT}}
\todo[inline, color=red]{DON'T READ -- THIS CHAPTER IS WORK IN PROGRESS}


This chapter describes the computational approach for solving \cref{eq:ns,eq:transport} based on a combination of the lattice Boltzmann method and the mixed-hybrid finite element method.
Both methods are introduced and their coupling in a time-adaptive manner is explained.
Finally, various implementation details are described.

\inline{define domains $\Omega_1$ and $\Omega_2$}

\section{Computational algorithm and time adaptivity}
\label{sec:algorithm}

The previous two chapters described the numerical methods used for the discretizations of \cref{eq:ns,eq:transport}, respectively.
On the other hand, if $C > C_{\max}$, the MHFEM solver needs to perform $n_M > 1$ substeps.
In practice, we found empirically that limiting the time step $\Delta t_M$ with $C_{\max} = \frac{1}{2}$ is necessary to ensure the stability of the coupled solver.
The overall performance of the solver depends on the concrete values of $\Delta t_M$ selected by the adaptive algorithm, which are influenced by the quantities needed to compute the time-step control factor $C$, i.e., local velocity magnitude, lattice time step (which is related to the lattice space step), and the local space step of the unstructured mesh (expressed by $\abs{E}$ in the algorithm).
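The time-step limiting described above can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical and the control factor is assumed here to be a CFL-like number $C = \abs{\vec v} \, \Delta t_M / \abs{E}$, whereas the actual definition in the solver also involves the lattice time step.

```python
import math

def mhfem_substeps(v_max, dt_target, h_min, c_max=0.5):
    # Hypothetical CFL-like control factor; the solver's actual definition
    # also involves the lattice time step and local mesh quantities.
    c = v_max * dt_target / h_min
    # If c exceeds c_max, split dt_target into n_m substeps of size dt_m
    # so that the limited control factor does not exceed c_max.
    n_m = max(1, math.ceil(c / c_max))
    return n_m, dt_target / n_m
```

For example, with $\abs{\vec v} = 2$, $\Delta t_M = 0.1$ and $\abs{E} = 0.1$, the factor $C = 2$ exceeds $C_{\max} = \frac{1}{2}$ and the step is split into four substeps.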

\section{Interpolation of the velocity field}
\label{sec:interpolation}

Since \cref{eq:ns,eq:transport} are coupled by the velocity field $\vec v(\vec x, t)$, the numerical approach relies on the interpolation of the approximate velocity field computed by LBM and its projection into the finite element space used by MHFEM.
The problem of compatibility between flow schemes producing a discrete velocity field can be addressed by post-processing schemes.
However, such post-processing would incur additional computational cost.
On the other hand, we observed that solving the transport equation in the non-conservative form \eqref{eq:transport:non-conservative} is less sensitive to errors in the velocity field and the post-processing schemes are not necessary in this case.
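For illustration, one simple way to evaluate a lattice velocity field at arbitrary points of the unstructured mesh is componentwise trilinear interpolation; this sketch is not the solver's actual projection into the finite element space, and all names are illustrative.

```python
import numpy as np

def interpolate_velocity(v_lattice, origin, dx, x):
    # v_lattice: (nx, ny, nz, 3) velocities at lattice nodes
    # origin, dx: position of node (0,0,0) and lattice spacing; x: query point
    s = (np.asarray(x, dtype=float) - origin) / dx  # continuous lattice coords
    i = np.floor(s).astype(int)                     # cell containing x
    t = s - i                                       # local coords in [0, 1)
    v = np.zeros(3)
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                w = ((1 - t[0], t[0])[a]
                     * (1 - t[1], t[1])[b]
                     * (1 - t[2], t[2])[c])
                v += w * v_lattice[i[0] + a, i[1] + b, i[2] + c]
    return v
```

The interpolated field is continuous but generally not divergence-free, which is precisely why the compatibility issue discussed above arises.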

\section{Implementation remarks}

\inline{This section is useless -- will be incorporated into previous sections.}

This could be improved by using interpolated boundary conditions for LBM.
On the other hand, the MHFEM part of the solver can be used on complex domain geometries with unstructured mesh discretizations.
The details related to the decomposition of a regular lattice overlapped with an unstructured mesh are described in the following section.

\section{Domain decomposition for overlapped lattice and mesh}
\label{sec:decomposition}

The combination of a lattice overlapped with an unstructured mesh requires special attention when the solver is run in a distributed fashion, e.g., utilizing multiple GPU accelerators.
\inline{add meta-intro to the chapter: goals, structure, summarize my own contributions}

The lattice Boltzmann method \cite{kruger:2017lattice} (LBM) is an alternative to traditional computational methods such as finite difference, finite volume, and finite element methods.
LBM can be formulated as a time-explicit scheme that can be easily parallelized \cite{kruger:2017lattice} and the advent of general-purpose computing on graphics processing units (GPGPU) made large-scale numerical simulations of \todo{make it more general -- this is the first mention of flow in this section/chapter}turbulent flows feasible \cite{kang2013,geier2015cumulant,kumar2018,peng2018,wittmann2013,zakirov2021}.
Based on the kinetic theory \cite{kruger:2017lattice}, the density distribution function $f = f(\vec x, \vec \xi, t)$~[\si{\kilogram\second\cubed\per\metre\tothe{6}}] satisfies the Boltzmann transport equation
\begin{equation} \label{eq:lbm:BTE}
\frac{\partial f}{\partial t} + \vec \xi \cdot \nabla_{\vec x} f + \frac{\vec F}{\rho} \cdot \nabla_{\vec \xi} f = \mathcal C,
\end{equation}
where $\rho$~[\si{\kilogram\per\cubic\metre}] is the mass density, $\vec F = [F_x, F_y, F_z]^T$~[\si{\newton\per\cubic\metre}] represents the external body force density acting on the mass and $\mathcal C$~[\si{\kilogram\second\squared\per\metre\tothe{6}}] denotes a general collision operator that depends on the distribution function and other parameters (e.g., relaxation times and equilibrium distribution).
LBM follows from the discretization of \cref{eq:lbm:BTE} in space, velocity space, and time.

\section{Components of the lattice Boltzmann method}

The detailed derivation of LBM is not within the scope of this thesis.
Instead, we review the components needed to describe the implementation of the method and refer the reader to \cite{kruger:2017lattice} for further details.

\subsection{Lattice and velocity set}

We consider the discretization of a cuboidal domain $\Omega \subset \mathbb R^D$ using an equidistant orthogonal grid represented by a finite set $\mathcal G_\Omega$ of grid nodes.
The extended grid $\mathcal G_{\overline \Omega}$ consists of the interior nodes from $\mathcal G_\Omega$ and an additional outer layer of nodes, which is added to facilitate the discretization of the domain boundary.
The aforementioned velocity sets can be derived using the Gauss--Hermite quadrature.
Another approach explained in \cite[Section~3.4.7.2]{kruger:2017lattice} is to determine the weights by solving the system of equations representing general conditions that ensure rotational isotropy of the lattice.
The Python script in \cref{appendix:LBM:velocity sets} can be used to quickly verify the consistency of discrete velocities, weights and the speed of sound for a given velocity set, or even to obtain the necessary conditions when designing a new velocity set.
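For concreteness, a check of this kind for the common D2Q9 velocity set might look as follows; this is an illustrative snippet in the spirit of the referenced script, not the script itself.

```python
import numpy as np

# Standard D2Q9 velocity set: rest direction, axis directions, and diagonals.
c = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1],
              [1, 1], [-1, -1], [1, -1], [-1, 1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)
cs2 = 1/3  # squared lattice speed of sound

assert np.isclose(w.sum(), 1.0)   # normalization of the weights
assert np.allclose(w @ c, 0.0)    # first moments vanish
# second moments reproduce the isotropic tensor cs2 * identity
assert np.allclose(np.einsum('q,qa,qb->ab', w, c.astype(float), c.astype(float)),
                   cs2 * np.eye(2))
print("D2Q9 velocity set is consistent")
```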

\subsection{Discrete lattice Boltzmann equation}

Before discretization, \cref{eq:lbm:BTE} is first transformed to a non-dimensional form by scaling the physical quantities based on a characteristic length, time and density \cite{kruger:2017lattice}.
For the purpose of lattice computations, a system of \emph{lattice units} is defined by selecting $\delta_x$, $\delta_t$ and $\rho_0$ (i.e., density of an incompressible fluid) as the conversion factors.
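As a small illustration of such unit conversions using the factors $\delta_x$, $\delta_t$ and $\rho_0$ defined above (the helper names are illustrative):

```python
def to_lattice_velocity(v_phys, delta_x, delta_t):
    # velocity scales as length / time
    return v_phys * delta_t / delta_x

def to_lattice_viscosity(nu_phys, delta_x, delta_t):
    # kinematic viscosity scales as length^2 / time
    return nu_phys * delta_t / delta_x**2

def to_lattice_density(rho_phys, rho_0):
    return rho_phys / rho_0
```

For example, water's kinematic viscosity $\nu = \SI{e-6}{\metre\squared\per\second}$ with $\delta_x = \delta_t = 10^{-3}$ converts to a lattice viscosity of $10^{-3}$.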
where $\hat{\vec F}$ denotes the external force density in lattice units exerted on the fluid.
The macroscopic velocity~$\vec v$ can be obtained by dividing the macroscopic momentum density $\rho \vec v$ by the macroscopic mass density~$\rho$.
Other macroscopic quantities, such as the stress tensor, can be obtained from higher-order, non-conserved moments of the distribution functions \cite{kruger:2017lattice}.
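The zeroth- and first-order moments just described can be sketched as follows; the half-force correction shown is the common convention for forced LBM and is an assumption of this sketch, not necessarily the exact form used in the author's code.

```python
import numpy as np

def macroscopic_moments(f, c, F=None, dt=1.0):
    # f: (Q, ...) distribution functions, c: (Q, D) discrete velocities
    rho = f.sum(axis=0)                                   # mass density
    mom = np.einsum('q...,qd->d...', f, c.astype(float))  # momentum density
    if F is not None:
        mom = mom + 0.5 * dt * np.asarray(F)  # half-force correction (assumed)
    return rho, mom / rho                     # density and macroscopic velocity
```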

\subsection{Collision operator}

The term $\mathcal C_q$ in \cref{eq:lbm:LBE} denotes the discrete collision operator; in this work, we use the cumulant operator proposed in~\cite{geier2015cumulant}.
The operator has several parameters that drive the relaxation rates of different cumulants towards their equilibria.
The first relaxation rate, $\omega_1$, is linked to the kinematic viscosity $\nu$.
The other relaxation rates are set as suggested by the authors of the original paper.
Furthermore, the approximations of the spatial velocity derivatives to reduce the artifacts due to the absence of higher-order cumulants are also incorporated in the collision operator according to \cite{geier2017a,geier2017b}.
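The standard relation between the first relaxation rate and the kinematic viscosity in lattice units, $\hat\nu = c_s^2 \left( \frac{1}{\omega_1} - \frac{1}{2} \right)$ with $c_s^2 = \frac{1}{3}$, can be evaluated as follows (function names are illustrative):

```python
def omega1_from_viscosity(nu_lattice, cs2=1.0/3.0):
    # nu_lattice: kinematic viscosity in lattice units
    return 1.0 / (nu_lattice / cs2 + 0.5)

def viscosity_from_omega1(omega1, cs2=1.0/3.0):
    # inverse relation: nu = cs2 * (1/omega1 - 1/2)
    return cs2 * (1.0 / omega1 - 0.5)
```

Note that $\omega_1 \to 2$ as $\hat\nu \to 0$, which is the well-known stability limit of the relaxation rate.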

\subsection{Equilibrium function}

In the kinetic theory of gases, the Maxwell--Boltzmann distribution $f^{\mathrm{eq}}(\vec x, \abs{\vec v}, t)$ describes the speeds of particles in an ideal gas moving with a mean velocity $\vec v$ in a thermodynamic equilibrium \cite{kruger:2017lattice}.
In the derivation of the discrete lattice Boltzmann equation, the equilibrium distribution function $f^{\mathrm{eq}}$ is discretized to obtain an approximation of the equilibrium state for the given discrete velocity set.
The discrete equilibrium distribution function $f_q^{\mathrm{eq}}$ for the cumulant collision operator can be derived in the space of cumulants following \cite{geier2015cumulant}.

\subsection{Boundary conditions}

The formulation of proper boundary conditions for LBM is a major complication in its applications, since the desired macroscopic conditions (e.g., given in terms of $\rho$ and $\vec v$) must be reproduced on the mesoscopic level using the discrete density distribution functions $f_q$ \cite{kruger:2017lattice}.
This problem does not have a unique solution and thus multiple boundary schemes for LBM exist that approximate the same macroscopic condition.
As of this writing, the LBM implementation used by the author contains simple boundary conditions based on the wet-node equilibrium approach for inflow \cite{kruger:2017lattice,latt2008,mohamad2009,haussmann2019}, an extrapolation method for outflow, and the full-way bounce-back scheme for solid walls \cite{kruger:2017lattice}.
Since the analysis of boundary conditions is not within the scope of this thesis, the details are omitted.

\subsection{Initialization}

The simplest initialization approach, which is used in this work, is to set the discrete density distribution functions at $t = 0$ to their equilibria according to the supplied macroscopic density and velocity using
\begin{equation} \label{eq:lbm:initialization}
f_q(\vec x, 0) = f_q^{\mathrm{eq}}\bigl(\rho(\vec x, 0), \vec v(\vec x, 0)\bigr)
\end{equation}
for all lattice sites $\vec x \in \mathcal L_{\overline \Omega}$ and $q \in \{1, \dots, Q\}$.
Unless noted otherwise, the values $\rho(\vec x, 0) = \hat\rho_0$ and $\vec v(\vec x, 0) = \vec 0$ are used.
Different initialization schemes can be found in \cite{kruger:2017lattice}.
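For illustration, the equilibrium can be evaluated with the common second-order polynomial expansion of the Maxwell--Boltzmann distribution; note that this is not the cumulant-space equilibrium used in the text, only a widely used approximation shown as a sketch.

```python
import numpy as np

def equilibrium(rho, v, c, w, cs2=1.0/3.0):
    # second-order polynomial expansion of the Maxwell--Boltzmann distribution
    # c: (Q, D) discrete velocities, w: (Q,) weights, v: (D,) macroscopic velocity
    cv = c.astype(float) @ np.asarray(v, dtype=float)  # projections c_q . v
    v2 = float(np.dot(v, v))
    return w * rho * (1.0 + cv / cs2 + cv**2 / (2.0 * cs2**2) - v2 / (2.0 * cs2))
```

By construction, the discrete equilibrium reproduces the prescribed density and momentum exactly, which can be verified by summing the moments.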

\section{Streaming schemes}
\label{sec:lbm:streaming schemes}

\subsection{Push and pull schemes with A-B pattern}

Based on \cref{eq:lbm:LBE}, the computational steps needed to evolve the state from the current time $t$ to the next time level $t + \Delta t$ can be decomposed into two successive steps:
\begin{subequations}
However, this is not a problem since these values are not directly visualized.
    \label{fig:lbm:AB pull}
\end{figure}
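A periodic-domain sketch of the pull scheme with the A-B pattern follows: values are read from the source array and written to a separate destination array, so the update order does not matter. Boundary handling is omitted and the helper name is illustrative.

```python
import numpy as np

def stream_pull(f_src, c):
    # Pull scheme: the value streamed to site x in direction q is read
    # from the upstream site x - c_q (periodic wrap-around via np.roll).
    f_dst = np.empty_like(f_src)
    for q, cq in enumerate(c):
        f_dst[q] = np.roll(f_src[q], shift=tuple(int(s) for s in cq), axis=(0, 1))
    return f_dst
```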

\subsection{A-A pattern}

The \emph{A-A pattern} \cite{bailey:2009accelerating,wittmann:2013comparison} is an alternative approach which allows the streaming step to be performed in place, using just one storage array $\texttt{A}$ for the density distribution functions.
This feature of the streaming scheme reduces memory requirements for a given lattice by a factor of two, which is crucial for high-resolution LBM simulations on GPU accelerators due to their limited amount of global memory.
The author is not aware of any literature dealing with issues related to correctness.
    \label{fig:lbm:AA odd}
\end{figure}

\subsection{Other patterns}

The A-A pattern is not the only technique that allows in-place streaming in a single array.
Alternatives include, for example, the Esoteric twist \cite{geier:2017esoteric}, the compact LRnLA streaming pattern \cite{zakirov2021}, Esoteric push and Esoteric pull \cite{Lehmann2022}.
However, none of the aforementioned papers discuss advanced topics such as MPI communication or the implementation of boundary conditions.
Hence, these streaming patterns are not yet implemented in our LBM code.

\section{Computational algorithm}

The time-stepping algorithm for LBM can be generalized to cover all aforementioned streaming patterns.
\Cref{alg:LBM} summarizes the steps according to our implementation for the A-A pattern and the push/pull schemes with the A-B pattern.
This approach is summarized in \cref{alg:distributed LBM with overlapping}.
The steps \ref{algitem:LBM-overlapping:boundaries start} and \ref{algitem:LBM-overlapping:interior start} in \cref{alg:distributed LBM with overlapping} involve the same local operations as the step \ref{algitem:parallel LBM:compute} in \cref{alg:parallel LBM}, but on different sets of lattice sites.
Likewise, all steps \ref{algitem:distributed LBM:communication 1} and \ref{algitem:distributed LBM:communication 2} in \cref{alg:distributed LBM}, and steps \ref{algitem:LBM-overlapping:communication 1} and \ref{algitem:LBM-overlapping:communication 2} in \cref{alg:distributed LBM with overlapping}, perform exactly the same operations that are explained further in \cref{sec:lbm:optimization}.

\section{Implementation remarks}
\label{sec:lbm:implementation}

This section covers practical aspects related to the LBM implementation in our code base \cite{LBM:mmg-gitlab}.
Users can subclass each model, implement their modifications and pass their types via the \ic{CONFIG} template parameter.
All the aforementioned classes \ic{LBM_BLOCK}, \ic{LBM}, and \ic{State} are in fact templates with one parameter, \ic{CONFIG}, as shown in \cref{fig:lbm:class diagram}.
The modifications of the components such as \ic{BC} or \ic{MACRO} may need to be designed together with the extensions of \ic{State}, especially when custom attributes are added to the \ic{DATA} component.

\section{Optimization remarks}
\label{sec:lbm:optimization}

The most important optimization for distributed LBM computations has already been outlined in \cref{alg:distributed LBM with overlapping}: processing interior lattice sites must be overlapped with data synchronization on the interfaces between neighboring subdomains.
The thread block size is selected such that $B_y$ is a multiple of 32 (i.e., the warp size).

\later{Future work: multidimensional domain decomposition}

\section{Computational benchmark results}

Since the verification and validation of the LBM implementation was performed primarily by other research team members \cite{fucik2019,eichler2021a,eichler2021b,eichler2022}, this section presents only the performance evaluation, which is the author's original work.

The values reported in this section were calculated by taking the total number of lattice site updates and dividing it by the computational time.
The GLUPS values are comparable among different lattice resolutions and hardware platforms used for the computations, but they are bound to the underlying configuration of LBM.
Most notably, the results reported in this section are specific to the D3Q27 velocity set and the cumulant collision operator.
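For reference, the GLUPS metric as described above amounts to the following computation (billions of lattice site updates per second; names illustrative):

```python
def glups(num_lattice_sites, num_iterations, wall_time_seconds):
    # total lattice site updates divided by wall time, in units of 10^9 per second
    return num_lattice_sites * num_iterations / wall_time_seconds / 1e9
```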

\subsection{Comparison of streaming patterns}

As the first result, we compare the performance of two streaming patterns in our implementation: the A-B pull scheme and the A-A pattern (see \cref{sec:lbm:streaming schemes}).
The benchmark was computed on a fixed domain with $L_x = L_y = L_z = \SI{0.25}{\metre}$, using single or double precision, and without MPI distribution (i.e., using one MPI rank).
@@ -655,7 +655,7 @@ Hence, the A-A pattern will be used in all LBM simulations presented hereafter.
    \input{./data/lbm/AB_vs_AA/table.tex}
\end{table}

\subsection{Scaling on the Karolina supercomputer}

The following results show the performance scaling of distributed LBM computations on the Karolina supercomputer~\cite{it4i:karolina}, operated by IT4Innovations in Ostrava.
The system comprises 72 accelerated compute nodes with hardware specifications listed in \cref{tab:hardware:Karolina}.
The content of this chapter is based on the author's previous thesis \cite{klinkovsky:2017thesis} and the paper \cite{fucik:2019NumDwarf}, which describe \emph{NumDwarf}, a numerical scheme for the solution of a system of partial differential equations in a general coefficient form.
\emph{NumDwarf} is based on the mixed-hybrid finite element method (MHFEM) \cite{brezzi:1991mixed} and it was originally developed for simulating multicomponent flow and transport phenomena in porous media (see \cite{solovsky2020,askar2021} for several recent applications), but it can be used for any problem whose governing equations can be written in a compatible form.
Although the implementation of the numerical scheme has seen only minor modifications since the publication of the paper, the computational aspects of the author's solver have been greatly extended.
Most notably, the solver supports MPI computations on distributed unstructured meshes.
Compared to the original computational performance study from \cite{fucik:2019NumDwarf}, which was computed on structured grids as well as unstructured meshes but using only CUDA and OpenMP parallelization, the presented results were computed only on unstructured meshes, but include MPI-based distributed computations using multiple GPUs and multiple CPU nodes.
This extends the updated performance study from \cite{klinkovsky:2022meshes} with a comparison between OpenMP and MPI for multi-core CPU parallelization on a single node, and a comparison between different solvers for sparse linear systems.

The chapter is organized as follows.
First, we introduce the general formulation of the PDE system solved by \emph{NumDwarf} in \cref{sec:mhfem:formulation} and summarize the numerical scheme in \cref{sec:mhfem:numerical scheme}.
\Cref{sec:mhfem:two-phase flow} summarizes the mathematical model of incompressible two-phase flow in porous media and \cref{sec:mhfem:mcwhdd} presents the results of a benchmark problem that was used to verify the numerical scheme and implementation by means of comparison against known semi-analytical solutions.
The same benchmark problem was used to also evaluate the performance of the solver using various computational parameters.
The benchmarking methodology is described in \cref{sec:mhfem:benchmarking methodology} and the benchmark results are presented in \cref{sec:mhfem:computational results}.

\section{General formulation}
\label{sec:mhfem:formulation}

The numerical scheme is implemented for a PDE system written in the form
There are multiple options, such as the Dirichlet-type fixed-value condition and the Neumann-type condition,
where $\Gamma_i^{\mathcal D}$ and $\Gamma_i^{\mathcal N}$ denote parts of the domain boundary where the respective conditions are active, and $\vec n_{\partial\Omega}$ is the outward unit normal vector.
Note that in case of non-linear problems where the coefficient $\vec m$ depends on the unknowns $\vec Z$, it may be necessary to specify fixed values of $Z_i$ even on parts of the domain boundary $\Gamma_i^{\mathcal N}$ where the Neumann-type condition is used to specify inflow, see \cite{fucik:2019NumDwarf,klinkovsky:2017thesis} for details.

\section{Numerical scheme}
\label{sec:mhfem:numerical scheme}

\inline{Expand with more details from \cite{klinkovsky:2017thesis} and the revised paper \cite{klinkovsky2022:WT}}

Since the formulation of the numerical scheme has seen no modifications since the works \cite{klinkovsky:2017thesis,fucik:2019NumDwarf}, the full derivation of the numerical scheme is omitted in this thesis and readers are referred to the aforementioned publications for details.
In the following paragraphs, we briefly summarize the numerical approach for the purpose of applications presented in this work.

Even when both tensor coefficients $\tensor N$ and $\tensor D$ in the general formulation
In this work, we use conforming unstructured homogeneous meshes denoted as $\mathcal K_h$ and the lowest-order Raviart--Thomas--Nédélec space $\mathbf{RTN}_0(\mathcal K_h)$ for spatial discretization.
The set of faces of the mesh $\mathcal K_h$ will be denoted as $\mathcal E_h$ and the set of faces of an individual element $K \in \mathcal K_h$ will be denoted as $\mathcal E_K$.

\section{Two-phase flow in porous media}
\label{sec:mhfem:two-phase flow}

This section summarizes the mathematical model of incompressible two-phase flow in porous media.
The $\{p_w,p_n\}$ formulation is used in this work, i.e., $\vec Z = [p_w, p_n]^T$.
where $\lambda_t = \lambda_w + \lambda_n$.
The coefficients $\vec m$ and $\tensor D$ are chosen based on the approach used in \cite{hoteit:2008}.

\section{Generalized McWhorter--Sunada problem}
\label{sec:mhfem:mcwhdd}

In this section, we present the results of the generalized McWhorter--Sunada problem, a special case of incompressible two-phase flow in porous media described by \cref{eq:mhfem:two-phase flow,eq:mhfem:algebraic constraints} with known semi-analytical solution \cite{mcwhorter1990exact,fucik2016multidimensional}.
The values of all parameters used in this work are summarized in \cref{tab:mcwhdd}.
    \end{tabular}
\end{table}

\subsection{Verification results}

For each mesh $\mathcal K_h$, we define the error $E_{h,S_n} = S_{n,\mathrm{ex}}(t_{\max}) - S_{n,h}(t_{\max})$ in terms of the injected phase saturation $S_n$ as the difference between the semi-analytical solution $S_{n,\mathrm{ex}}$ and numerical solution $S_{n,h}$ on the mesh $\mathcal K_h$ at time $t_{\max}$.
The $L_p$ norm in $\Omega \subset \mathbb R^D$ of the error $\norm{E_{h,S_n}}_p$ is evaluated for $p \in \{1,2\}$ as
The results of the numerical analysis indicate that the numerical scheme converges.
    \input{./data/mcwhdd/eoc_3D.tex}
\end{table}

\section{Benchmarking methodology}
\label{sec:mhfem:benchmarking methodology}

The generalized McWhorter--Sunada problem was computed as a benchmark for several configurations of the solver.
The secondary quantities of interest for the evaluation of parallel computations are the parallel speed-up and efficiency, defined as
\begin{equation*}
S_n = \frac{T_1}{T_n}, \qquad E_n = \frac{S_n}{n},
\end{equation*}
where $T_n$ denotes the computational time measured using $n$ processing units.
In case of GPU computations, the speed-up and efficiency are calculated relative to the number of whole GPUs rather than the number of individual GPU cores, because the GPU cores are not independent.
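A minimal sketch of the speed-up and parallel efficiency computation mentioned above, relative to the number of processing units $n$ (function and parameter names are illustrative):

```python
def strong_scaling_metrics(t_serial, t_parallel, n_units):
    # speed-up S = T_1 / T_n, efficiency E = S / n
    speedup = t_serial / t_parallel
    efficiency = speedup / n_units
    return speedup, efficiency
```

For example, reducing the run time from 100 to 25 seconds on 8 units yields a speed-up of 4 and an efficiency of 0.5.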

\section{Computational benchmark results}
\label{sec:mhfem:computational results}

The benchmark results are presented as strong scaling studies for the finest triangular mesh 2D$^\triangle_5$ and finest tetrahedral mesh 3D$^\triangle_5$.
In this chapter, we present several data structures which are flexible.
\input{content/linear_systems.tex}

\cleardoublepage
\chapter{Numerical Methods for Computational Fluid Dynamics}
\label{chapter:numerical methods}

\chapter{Mixed-Hybrid Finite Element Method}
\label{chapter:MHFEM}
\input{content/MHFEM.tex}

\cleardoublepage
\chapter{Lattice Boltzmann Method}
\label{chapter:LBM}
\input{content/LBM.tex}

\cleardoublepage
\chapter{Coupled LBM-MHFEM Computational Approach}
\label{chapter:LBM-MHFEM}
\input{content/LBM-MHFEM.tex}

\cleardoublepage
\chapter{Mathematical Modeling of Vapor Transport in Air}
\label{chapter:vapor transport}
\input{content/vapor_transport.tex}

\cleardoublepage
The library currently features definitions of the following topologies: vertex,

The efficiency of the implemented data structure is verified using several benchmark problems.
We developed two benchmark problems based on simple parallel algorithms to evaluate different configurations of the data structure, to compare it with MOAB~\cite{tautges:2004moab}, and to demonstrate the efficiency of CUDA-based GPU parallelization compared to OpenMP-based CPU parallelization.
Additionally, the data structure has been successfully used as a fundamental building block in advanced numerical solvers based on the method described in \cref{chapter:MHFEM}.

\subsection{Terminology}
\label{sec:meshes:terminology}