For each subdomain $\mathcal L_{\overline\Omega, j}$:
\begin{itemize}[labelsep=1em, itemindent=-0.1em]
\item Perform initialization for all $\vec x \in\mathcal L_{\overline\Omega, j}$ in parallel, excluding the ghost lattice sites (step \ref{algitem:initialization} in \cref{alg:LBM}).
\item Copy the distribution functions in the \ic{A} array on the interfaces between neighboring subdomains.
\end{itemize}
The data structure used for all of these arrays is \ic{TNL::Containers::DistributedNDArray}.
Recall that the global array corresponding to the lattice $\mathcal L_{\overline\Omega}$ must be decomposed in a \emph{structured conforming} manner, but each rank can be assigned multiple subdomains and the rank numbering is arbitrary.
The decompositions of all the aforementioned LBM arrays must be mutually consistent.
Hence, we collect all distributed data structures in a class called \ic{LBM_BLOCK}, which contains all data associated with one subdomain: the local subarrays, the subdomain sizes and offsets, and the indices of the neighboring blocks and the ranks that own them.
\todo{add figure: diagram of the classes/objects and their relations}
To facilitate computations on GPU accelerators, two objects are needed for each array to represent its data in different memory spaces.
One object allocates the array data in the host memory, where it can be processed by the CPU, and the other allocates the array data in the device memory, where it can be processed by the GPU accelerator.