\inline{General motivation for polyhedral meshes in scientific computing -- advantages w.r.t. tetrahedral or hexahedral meshes: some ideas are in \url{https://www.symscape.com/polyhedral-tetrahedral-hexahedral-mesh-comparison}}
\inline{benefit of MHFEM completely on GPU: coupling with LBM}
Mathematical modeling of fluid dynamics has many ecological, medical and industrial applications and it is one of the central research topics investigated at the Department of Mathematics, FNSPE CTU in Prague in collaboration with prominent domestic as well as foreign institutions.
In order to model complex natural processes governing the behavior of fluids, it is often necessary to run high-resolution simulations on large computational clusters or supercomputers.
However, using the facilities for high performance computing efficiently is non-trivial, as it requires careful management of data in the computer memory and appropriate division of the task between all available processing units.
Especially when designing algorithms for systems with GPU accelerators, which provide significantly more processing units as well as memory levels compared to traditional computational systems, the specifics of the hardware architecture have to be considered.
In order to accurately model complex natural processes governing the behavior of fluids, it is often necessary to employ advanced numerical methods and to run high-resolution simulations on large computational clusters or supercomputers.
However, using the facilities for high performance computing efficiently is non-trivial, as it requires careful management of data in the computer memory and appropriate division of the computations between all available processing units.
Especially when designing algorithms for systems with GPU accelerators, which provide significantly more processing units as well as memory levels compared to traditional computational systems, specifics of the hardware architectures have to be considered.
The purpose of this thesis is the development of efficient data structures and parallel algorithms and using them as the main ingredients in solvers based on, e.g., the lattice Boltzmann method or the mixed-hybrid finite element method, which are used at the Department of Mathematics, FNSPE CTU in Prague.
Development of data structures for organizing structured as well as unstructured data in numerical simulations is necessary in order to match the requirements for efficient utilization of modern hardware architectures such as GPU clusters.
Similarly, high performance parallel algorithms have to be designed specifically for the hardware architectures used by modern GPUs.
Since the field of computational fluid dynamics includes many substantially different applications, the most important requirements imposed on the building blocks of numerical solvers are configurability and composability.
The former allows to adapt a piece of software, such as a data structure, for a specific application, and the latter allows to combine multiple existing pieces together to resolve a new use case.
Similarly, it is desirable to facilitate coupling even high-level tools and methods in order to provide more general solvers for multiphysics problems that arise in practice.
The purpose of this thesis is the development of efficient data structures and parallel algorithms that can be used as fundamental ingredients in solvers based on, e.g., the lattice Boltzmann method or the mixed-hybrid finite element method.
Development of data structures for organizing structured as well as unstructured data in numerical simulations is necessary in order to match the requirements for efficient utilization of modern supercomputers.
Similarly, high performance parallel algorithms have to be designed specifically for the hardware architectures used by modern GPU accelerators.
Due to the diversity of parallel computing platforms, it is desirable for scientists to use established high-level libraries that provide a portable and easy to use interface for common operations.
However, it may be difficult to combine different packages and libraries, or even to gain sufficient overview of the available options.
\section*{State of the art}
\addcontentsline{toc}{section}{State of the Art}
TODO
\inline{Data structures for GPUs -- easy to formulate for linear algebra as it provides a well-established framework for representing scientific problems, but for other fields such as mesh computations there is not one obvious way to do things (for example, one problem is that the system of notation is not settled).}
\inline{Linear algebra -- general description, highlight the solution of sparse linear systems (details in \cref{chapter:linear systems})}
\inline{CFD -- FVM, FEM, MHFEM, LBM}
\inline{coupling between different methods -- multiphysics}
\inline{established multiphysics software}
Contemporary supercomputers are based on several different hardware platforms and each has an associated parallel programming framework providing native performance.
For example, CUDA \cite{nvidia:cuda} targets Nvidia GPU accelerators, ROCm \cite{amd:rocm} targets AMD GPU accelerators, and TBB \cite{oneTBB} targets multi-core processors.
In order to simplify parallel programming and provide performance portability, high-level libraries such as Thrust \cite{thrust:website}, Kokkos \cite{Kokkos:ecosystem}, or TNL \cite{oberhuber:2021tnl} are being developed.
Additionally, established scientific libraries such as Trilinos \cite{trilinos} or PETSc \cite{petsc-web-page} are being updated in order to support GPU accelerators via suitable interfaces.
However, this often requires a complete redesign of an algorithm due to hardware platforms having different requirements on optimal data layout.
Practically all numerical solvers in computational fluid dynamics utilize methods of linear algebra, which provides a well-established framework for representing scientific problems.
Many data structures and algorithms for GPU accelerators are available in common libraries such as cuBLAS \cite{nvidia:cuda} or cuSPARSE \cite{nvidia:cuda}.
An important problem in linear algebra is the solution of sparse linear systems, which is covered later in this thesis in \cref{chapter:linear systems}.
Unlike linear algebra, other fields such as mesh computations are not as established and may not even have a fixed universal system of notation.
Hence, there is not one obvious way to do formulate problems and the development of reusable software components is inherently difficult.
Overall, there is a lack of software libraries providing a robust and multi-platform data structure for the representation of unstructured meshes.
Moreover, multiple types of meshes are used in practice, e.g., tetrahedral, hexahedral, and polyhedral.
While tetrahedral meshes are the easiest to use in numerical methods such as finite elements, hexahedral and especially polyhedral meshes are advantageous in complex cases with high amount of data per cell.
Compared to other types, polyhedral meshes need smaller number of cells to cover a given domain and numerical methods such as finite volumes can be designed carefully with the same level of accuracy that can be achieved on tetrahedral meshes.
Numerous computational tools based on numerical methods such as finite volumes or finite elements are available for solving partial differential equations originating from mathematical modeling of various biological, environmental, or industrial problems.
In particular, software projects such as deal.II~\cite{bangerth:2007deal.II}, DUNE~\cite{bastian:2006DUNE}, OpenFOAM~\cite{jasak:2007openfoam}, TOUGH2~\cite{pruess:1999TOUGH2}, MFiX~\cite{syamlal:1993}, ANSYS Fluent~\cite{ansys-fluent:2009} or COMSOL Multiphysics~\cite{COMSOLMultiphysics1998} are suitable for simulations in the field of computational fluid dynamics.
Additionally, the lattice Boltzmann method (LBM) has become popular for turbulent flow simulations \cite{kang2013,geier2015cumulant,kumar2018,peng2018,wittmann2013,zakirov2021}.
While many of the aforementioned projects are open-source and thus can be freely modified, it is difficult to incorporate novel approaches and methods into extensive software packages, especially for external users.
Hence, significant stream of innovation originates from small separate projects that gradually either evolve into larger projects, or get incorporated in existing software.
This thesis builds on the author's previous work \cite{klinkovsky:2017thesis}, an implementation of the mixed-hybrid finite element method for multiphase compositional flow in porous media, which can utilize Nvidia GPU accelerators.
Additionally, the work extends the LBM implementation that is developed at the Department of Mathematics, FNSPE CTU in Prague.
\section*{Research goals}
\addcontentsline{toc}{section}{Research Goals}
@@ -30,29 +49,84 @@ To the best of our knowledge, they represent unique ideas that push forward the
\begin{enumerate}
\item
\inline{develop TNL -- an open-source software library of high performance algorithms and efficient data structures that follows modern software design patterns and simplifies the development of CFD solvers}
\inline{explain what makes it unique}
\inline{note that it is a never ending story (the goal lies in continuous effort rather than reaching a predetermined state)}
To develop the Template Numerical Library (TNL) \cite{oberhuber:2021tnl}, an open-source software library of high performance algorithms and efficient data structures that follows modern software design patterns and simplifies the development of CFD solvers.
Why develop yet another numerical library?
The short answer is: Why not?
There are many competing libraries with similar features even in established fields such as linear algebra, where one might hope for the existence of a universal project collecting contributions from all over the world.
However, competition inspires innovation and maintaining a separate project provides its developers with an easy way to try and share ideas with others.
Note that this goal is never-ending, it lies in continuous effort rather than achieving a target.
\item
\inline{data structure for the representation of unstructured meshes with full support of modern parallel platforms, including GPUs and MPI}
To develop an efficient data structure for the representation of conforming unstructured meshes with full support of modern distributed computing platforms based on GPU accelerators.
\item
\inline{MHFEM and LBM -- extend the author's previous work}
To extend the author's previously developed solver based on the mixed-hybrid finite element method with distributed computing capabilities.
\item
\inline{LBM -- develop a scalable solver for GPU-based supercomputers}
To develop a scalable solver based on the lattice Boltzmann method for GPU-based supercomputers.
\item
\inline{develop a high-performance solver for a coupled model for the simulation of vapor transport in air and validate the model using experimentally measured data}
To develop a mathematical model of vapor transport in air, implement a high-performance solver for the model, and validate it using experimentally measured data.
\end{enumerate}
\section*{Achieved results}
\addcontentsline{toc}{section}{Achieved results}
TODO
\section*{Outlook into the future}
\addcontentsline{toc}{section}{Outlook into the future}
TODO
The thesis presents the following novel results:
\begin{enumerate}
\item
\textbf{Data structure for conforming unstructured meshes supporting distributed computing on GPU accelerators.}
The data structure is implemented in the TNL library \cite{oberhuber:2021tnl} and several benchmarks were performed to evaluate its efficiency, showing that for triangular and tetrahedral meshes it outperforms the MOAB library \cite{tautges:2004moab} by at least an order of magnitude.
We have published our results in \cite{klinkovsky:2022meshes}.
Since the publication of the paper, the data structure has been extended for polygonal and polyhedral meshes and these modifications are included in the thesis.
\item
\textbf{General solver based on the mixed-hybrid finite element method (MHFEM) with distributed computing capabilities.}
This extends the author's previous work \cite{klinkovsky:2017thesis} and serves as an important building block for other results in this thesis.
The solver uses the aforementioned distributed data structure for unstructured meshes and can utilize the Hypre library \cite{Hypre:library} for efficient solution of sparse linear systems via wrappers implemented in the TNL library.
Our results related to the development of the MHFEM solver are included in the publications \cite{fucik:2019NumDwarf,klinkovsky:2022meshes}.
\item
\textbf{Scalable implementation of the lattice Boltzmann method (LBM) for supercomputers based on GPU accelerators.}
The implementation is based on a distributed multidimensional array data structure and an MPI synchronizer for distributed data, which are implemented in the TNL library.
Strong scaling as well as weak scaling studies were performed on the Karolina supercomputer \cite{it4i:karolina}.
The lattice Boltzmann method code is developed by the research group at the Czech Technical University in Prague and the author's contributions are included in the publications \cite{fucik2019,fucik2020,fucik:lbmat,eichler2022}.
\item
\textbf{Coupled computational approach based on the combination of lattice Boltzmann method and mixed-hybrid finite element method.}
We consider a model based on the Navier--Stokes equations (solved by LBM) coupled with a linear advection--diffusion equation (discretized using MHFEM) and analyze properties of the numerical scheme and performance of its implementation.
A simple benchmark problem with analytical solution is constructed in order to investigate accuracy of the scheme for conservative and non-conservative form of the transport equation.
The solver benefits from native implementation of both LBM and MHFEM for GPU accelerators, which allows for efficient coupling between the methods.
Our results are submitted for publication in the paper \cite{klinkovsky2022:WT}.
\item
\textbf{Mathematical model for vapor transport in air and its validation using experimental data.}
The aforementioned coupled computational approach has been used to model transport of water vapor in the turbulent boundary layer above a disturbed soil surface.
While the porous medium below the surface is not simulated in this work, the interaction between soil and atmosphere such as evaporation is modeled using boundary conditions.
The model is compared both qualitatively and quantitatively to experimental data measured in three configurations resulting in different flow regimes.
Our results are submitted for publication in the paper \cite{klinkovsky2022:WT}.
\end{enumerate}
\section*{Outlook}
\addcontentsline{toc}{section}{Outlook}
During the work on this thesis, we identified several interesting problems and directions for future research.
Here, we summarize a few of them.
The coupled LBM-MHFEM solver that was validated for vapor transport in air is the first step towards the development of a flexible multiphysics solver.
In the future, it can be extended to include coupling with porous medium, thermodynamic effects, or component transport.
The primary use case for such solver is to investigate land-atmospheric interactions with soil surface disturbances (e.g., vegetation, micro-topography) and macro-porous disturbances in the subsurface.
The resulting solver would facilitate the symbiosis between modeling and experimental approaches for the investigation of environmental processes.
The numerical models used in this work have several problems that hinder their application in large-scale and complex scenarios.
Notably, our MHFEM implementation is limited to fixed meshes that cannot adapt to the simulated fields, and the LBM implementation assumes a uniform lattice.
The development of schemes for adaptive meshes and non-uniform lattice discretizations may be necessary for successful use of these methods to solve real-world problems.
The performance and efficiency of the developed solvers was tested on the largest computing systems that were available to the author at that time.
However, many supercomputers in the world are much larger and the scaling of the presented solvers on such systems should be investigated.
Furthermore, hardware as well as software platforms powering contemporary supercomputers are continuously evolving.
The TNL library supports parallelization via OpenMP and CUDA for multi-core CPUs and Nvidia GPUs, but the implementation of new backends (e.g., using HIP and SYCL) is needed to provide support for other hardware platforms.
Finally, the TNL library has proved to be an effective tool for the development of scalable and flexible numerical solvers.
Its development requires continuous effort in order to keep up with the world.
The most important general directions of future development, as viewed by the author, are interoperability with other libraries, improving the modular structure of the project, and using formal \emph{concepts} in the \CPPtwenty standard to improve type-checking, documentation, and compiler diagnostics.
\section*{System of notation}
\addcontentsline{toc}{section}{System of notation}
TODO
\inline{subscripts and superscripts; vectors and matrices/tensors; 3D: $\vec v =[v_x, v_y, v_z]^T$ implies that $v_x \equiv v_1$, $v_y \equiv v_2$, and $v_z \equiv v_3$}