Commit 7aff2dc2 authored by Jakub Klinkovský

LBM-MHFEM chapter - updated domain decomposition section

parent 38a5c354
+71 −67
@@ -171,28 +171,30 @@ The accuracy of the numerical scheme applied to the conservative form of \cref{e
\label{sec:lbm-mhfem:decomposition}

The combination of a lattice overlapped with an unstructured mesh requires special attention when the solver is run in a distributed fashion, e.g., utilizing multiple GPU accelerators.
Both the lattice and the mesh have to be decomposed into subdomains and each assigned to a GPU.
Sufficiently wide overlapping regions on the lattice subdomains have to be generated to ensure that each GPU can interpolate the velocity field from its lattice subdomains to its mesh subdomain.
Both the lattice and the mesh have to be decomposed into subdomains and each assigned to an MPI rank.
Sufficiently wide overlapping regions on the lattice subdomains have to be generated to ensure that each rank can interpolate the velocity field from its lattice subdomains to its mesh subdomains.
Furthermore, since computations on the lattice and the mesh are never executed concurrently, it is desirable to balance the sizes of the subdomains in order to achieve good computational efficiency.

\Cref{fig:domain_decomposition} illustrates the problems with decomposition on an example involving a non-uniform cuboidal mesh that is refined around the two synthetic plants in configuration EX-1.
Due to limitations of our LBM implementation, only 1D decompositions (i.e., such that all interfaces between two lattice subdomains are planes perpendicular to the $x$-axis) can be considered.
\Cref{fig:domain_decomposition}a shows a naive approach with uniformly sized lattice subdomains (highlighted with rainbow-colored rectangles), which leads to a highly non-uniform distribution of mesh cell counts in each subdomain (indicated by percentages below the figure).
In order to solve this balancing problem, we implemented a decomposition strategy which optimizes the lattice as well as mesh subdomains such that each GPU is assigned approximately the same number of lattice sites as well as mesh cells.
The essential idea is to first determine the part of the domain where the lattice and mesh overlap, perform its decomposition such that an optimal mesh decomposition is achieved, and then decompose the remaining parts of the lattice (which do not overlap with the mesh) to add up to the optimal number of lattice sites in each subdomain.
\Cref{fig:lbm-mhfem:decomposition} illustrates the problems with decomposition on an example involving a non-uniform cuboidal mesh that is refined around the two small black rectangles (they correspond to the two synthetic plants in the configuration EX-1 that will be described in \cref{chapter:vapor transport}).
Due to a limitation of our LBM implementation, only 1D decompositions (i.e., such that all interfaces between two lattice subdomains are planes perpendicular to the $x$-axis) can be considered.
\Cref{fig:lbm-mhfem:uniform decomposition} shows a naive approach with uniformly sized lattice subdomains (highlighted with rainbow-colored rectangles), which leads to a highly non-uniform distribution of mesh cell counts in each subdomain (indicated by percentages below the figure).
In order to solve this balancing problem, we implemented a decomposition strategy that optimizes the lattice as well as mesh subdomains such that each MPI rank is assigned approximately the same number of lattice sites as well as mesh cells.
The essential idea is to first determine the part of the domain where the lattice and mesh overlap, perform its decomposition such that an optimal mesh decomposition is achieved, and then decompose the remaining parts of the lattice (which do not overlap with the mesh) to add up to the optimal number of lattice sites assigned to each rank.

For a given regular lattice and an unstructured mesh covering the domain $\Omega_1$ and its subdomain $\Omega_2$, respectively, the decomposition procedure (with $N_{\mathrm{ranks}}$ denoting the number of MPI ranks used in the computation and $N_{\mathrm{cells}}$ denoting the total number of mesh cells) can be summarized as follows:
For a given regular lattice and an unstructured mesh covering the domain $\Omega_1$ and its subdomain $\Omega_2$, respectively, the decomposition procedure (with $N_{\mathrm{ranks}}$ denoting the number of MPI ranks used in the computation and $N_{\mathrm{cells}}$ denoting the total number of mesh cells) is summarized in \cref{alg:LBM-MHFEM:decomposition}.

\begin{enumerate}
\begin{algorithm}[Decomposition of lattice overlapped with unstructured mesh]
    \label{alg:LBM-MHFEM:decomposition}
    \begin{algsteps}
        \item
            For all $x$-coordinates of the lattice sites, count the number of mesh cells whose centroid is located to the left of this $x$-coordinate.
            Use linear interpolation to obtain a continuous interpolant $F(x)$ that is non-decreasing from 0 to $N_{\mathrm{cells}}$.
        \item
            Find the smallest interval $[x_0, x_{N_{\mathrm{ranks}}}]$ such that $F(x_{N_{\mathrm{ranks}}}) - F(x_0) = N_{\mathrm{cells}}$.
        This interval identifies the part of the lattice that is overlapped by the mesh, i.e., the dark transparent rectangle in \cref{fig:domain_decomposition}.
            This interval identifies the part of the lattice that is overlapped by the mesh, i.e., the dark transparent rectangle in \cref{fig:lbm-mhfem:decomposition}.
        \item
            Find a partition $\{x_0, x_1, \ldots, x_{N_{\mathrm{ranks}} - 1}, x_{N_{\mathrm{ranks}}}\}$ of the interval $[x_0, x_{N_{\mathrm{ranks}}}]$ such that each subinterval contains approximately $N_{\mathrm{cells}} / N_{\mathrm{ranks}}$ mesh cells:
  \begin{enumerate}
            \begin{algsteps}
                \item
                    Define the objective function $f(x_1, \ldots, x_{N_{\mathrm{ranks}} - 1})$ which measures the imbalance of mesh cells included in each subinterval based on the function $F$.
                    %Define the objective function $f: \mathbb R^{N_{\mathrm{ranks}}} \mapsto \mathbb R$.
@@ -208,47 +210,49 @@ For a given regular lattice and an unstructured mesh covering the domain $\Omega
                    Round the solution from $\mathbb R$ to the lattice coordinates (i.e., from \texttt{double} to \texttt{int}).
                    As the rounding does not ensure the optimal result in integer precision, we additionally minimize the objective function in integer precision.
                    We try to iteratively increment/decrement each component of the solution as long as it improves the partition.
  \end{enumerate}
            \end{algsteps}
        \item
            Decompose the remaining parts of the lattice which do not overlap with the mesh.
  Note that these parts of the lattice are decomposed separately in reversed order (i.e., from right to left) in order to allow merging the non-overlapping subdomains with the adjacent mesh-overlapping subdomains (see the red and gray subdomains in \cref{fig:domain_decomposition}b).
\end{enumerate}
            Note that these parts of the lattice are decomposed separately in reversed order (i.e., from right to left) in order to allow merging the non-overlapping subdomains with the adjacent mesh-overlapping subdomains (see the red and gray subdomains in \cref{fig:lbm-mhfem:non-uniform decomposition}).
    \end{algsteps}
\end{algorithm}

\begin{figure}[!t]
    \raggedright
    a) Uniform lattice decomposition
\begin{figure}[tb]
    \begin{subfigure}[b]{\textwidth}
        \caption{Uniform lattice decomposition}
        \label{fig:lbm-mhfem:uniform decomposition}
        \includegraphics[width=\textwidth]{figures/domain_decomposition/lattice-mesh-uniform.crop.png}
        \\ \vspace{-0.5ex}
    \begin{minipage}{\textwidth}
        \begin{minipage}{0.99\textwidth}
            \footnotesize
        \hspace{0.035\textwidth} 12\%
        \hspace{0.072\textwidth} 14\%
        \hspace{0.072\textwidth} 14\%
        \hspace{0.072\textwidth} 14\%
        \hspace{0.072\textwidth} 24\%
        \hspace{0.072\textwidth} 19\%
        \hspace{0.072\textwidth} \hphantom{0}3\%
        \hspace{0.072\textwidth} \hphantom{0}0\%
            \hspace{0.04\textwidth} 12\%
            \hspace{0.085\textwidth} 14\%
            \hspace{0.085\textwidth} 14\%
            \hspace{0.085\textwidth} 14\%
            \hspace{0.085\textwidth} 24\%
            \hspace{0.085\textwidth} 19\%
            \hspace{0.085\textwidth} \hphantom{0}3\%
            \hspace{0.085\textwidth} \hphantom{0}0\%
        \end{minipage}
    \\ \vspace{1ex}
    b) Non-uniform lattice decomposition
    %\includegraphics[width=\textwidth]{data/figures/domain_decomposition/lattice-mesh-balanced.crop.png}
    %\\ \vspace{1ex}
    %c) Non-uniform lattice decomposition with reversed order and merged gray subdomains
        \vspace{1ex}
    \end{subfigure}
    \begin{subfigure}[b]{\textwidth}
        \caption{Non-uniform lattice decomposition}
        \label{fig:lbm-mhfem:non-uniform decomposition}
        \includegraphics[width=\textwidth]{figures/domain_decomposition/lattice-mesh-merged-blocks.crop.png}

    \end{subfigure}
    \caption{
        Domain decompositions of a regular lattice (rainbow-colored subdomains) overlapped with an unstructured mesh (dark transparent rectangle) that is refined around the synthetic plants (two small black rectangles).
        The percentages below the case a) indicate the portion of the total number of mesh cells included in the corresponding lattice subdomain.
        All lattice subdomains in the case b) include 1/8 of the total number of mesh cells.
        Domain decompositions of a regular lattice (rainbow-colored subdomains) overlapped with an unstructured mesh (dark transparent rectangle) that is refined around the two small black rectangles (they correspond to the synthetic plants from \cref{chapter:vapor transport}).
        The percentages below the case (a) indicate the portion of the total number of mesh cells included in the corresponding lattice subdomain.
        All lattice subdomains in the case (b) include 1/8 of the total number of mesh cells.
    }
    \label{fig:domain_decomposition}
    \label{fig:lbm-mhfem:decomposition}
\end{figure}

The result of this decomposition procedure is illustrated in \cref{fig:domain_decomposition}b.
The result of this decomposition procedure is illustrated in \cref{fig:lbm-mhfem:non-uniform decomposition}.
Overall, the decomposition algorithm optimizes the computational cost and memory requirements of each MPI rank at the cost of increased communication due to the increased number of lattice subdomains.
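
The mesh-overlapping part of the procedure (counting cells, finding the overlapped interval, and balancing the partition) can be illustrated by the following minimal Python sketch.
It is not the implementation used in the solver: all function names are hypothetical, the function $F$ is evaluated only at the lattice coordinates (no interpolation), the continuous minimization is replaced by a direct lookup on $F$, and the sum of squared deviations from $N_{\mathrm{cells}} / N_{\mathrm{ranks}}$ is only an assumed choice of the objective function.
The decomposition of the lattice parts that do not overlap with the mesh is not covered.

```python
import bisect


def cumulative_cell_count(lattice_x, centroids_x):
    """F at each lattice x-coordinate: number of mesh cells whose
    centroid lies strictly to the left of that coordinate."""
    sorted_centroids = sorted(centroids_x)
    return [bisect.bisect_left(sorted_centroids, x) for x in lattice_x]


def overlap_interval(F):
    """Lattice indices (i0, iN) of the smallest interval that contains
    all mesh cells. Assumes the lattice covers the whole mesh, i.e.
    F[0] == 0 and F[-1] == N_cells."""
    n_cells = F[-1]
    i0 = max(i for i, value in enumerate(F) if value == 0)
    iN = min(i for i, value in enumerate(F) if value == n_cells)
    return i0, iN


def imbalance(F, cuts):
    """Assumed objective: sum of squared deviations of the per-subinterval
    cell counts from the ideal count N_cells / N_ranks."""
    n_ranks = len(cuts) - 1
    ideal = (F[cuts[-1]] - F[cuts[0]]) / n_ranks
    counts = [F[b] - F[a] for a, b in zip(cuts, cuts[1:])]
    return sum((count - ideal) ** 2 for count in counts)


def balanced_partition(F, n_ranks):
    """Return lattice indices [i_0, ..., i_N] of the subdomain interfaces."""
    i0, iN = overlap_interval(F)
    n_cells = F[-1]

    # Initial guess: first lattice index where F reaches k/n_ranks of all cells.
    # (This lookup stands in for the continuous minimization plus rounding.)
    cuts = [i0]
    for k in range(1, n_ranks):
        cuts.append(bisect.bisect_left(F, k * n_cells / n_ranks, i0, iN))
    cuts.append(iN)

    # Integer refinement: shift each interior interface by one lattice site
    # as long as the objective strictly decreases.
    best = imbalance(F, cuts)
    improved = True
    while improved:
        improved = False
        for k in range(1, n_ranks):
            for step in (-1, 1):
                trial = cuts.copy()
                trial[k] += step
                if not (trial[k - 1] <= trial[k] <= trial[k + 1]):
                    continue  # interfaces must stay ordered
                value = imbalance(F, trial)
                if value < best:
                    cuts, best, improved = trial, value, True
    return cuts


if __name__ == "__main__":
    # Toy data: lattice sites at x = 0, ..., 20; mesh on [4, 16], refined on [8, 12].
    lattice_x = list(range(21))
    centroids_x = ([4.5, 5.5, 6.5, 7.5]
                   + [8.25 + 0.5 * i for i in range(8)]
                   + [12.5, 13.5, 14.5, 15.5])
    F = cumulative_cell_count(lattice_x, centroids_x)
    print(balanced_partition(F, 4))  # prints [4, 8, 10, 12, 16]
```

In this toy example, each of the four subintervals contains 4 of the 16 mesh cells, and the two subintervals covering the refined region are only half as wide as the others.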

\later[inline]{Future work: problem of mapping MPI ranks to GPUs -- quadratic assignment problem, plus we need to get the weights (communication cost between each pair of GPUs) somehow.}
\later{Future work: problem of mapping MPI ranks to GPUs -- quadratic assignment problem, plus we need to get the weights (communication cost between each pair of GPUs) somehow.}


\section{Numerical analysis}