Commit 093e6570 authored by Jakub Klinkovský

working on the linear systems chapter - review of iterative methods

parent 7d835916
+40 −2
\inline{No supporting paper since the CWY-GMRES paper was discarded...}
\inline{mention direct methods vs iterative methods in the intro}

This chapter is dedicated to methods for the solution of large and sparse systems of linear equations.
It does not contain any novel work on the methods by the author, but provides an overview of the state of the art.
High-performance solvers for sparse linear systems are an important building block for the numerical methods developed in the following chapter.

The methods for the solution of linear systems are typically classified into \emph{direct methods} \cite{li2022direct} and \emph{iterative methods} \cite{greenbaum:1997iterative,saad:2003iterative}.
Iterative methods have become preferable in many applications, especially where very large linear systems arise, because they can have significantly lower computational cost and are easier to implement efficiently on high-performance parallel computing systems \cite{saad:2003iterative}.
In the following sections, we summarize the state-of-the-art iterative methods (\cref{sec:iterative methods}), preconditioning techniques (\cref{sec:preconditioning}), and software packages (\cref{sec:linear solvers software}) that provide ready-to-use implementations of them.
In the final \cref{sec:distributed sparse matrix}, we describe the concept of a distributed sparse matrix in more implementation detail to provide a link between a distributed unstructured mesh and the assembly of the linear system in applications such as partial differential equations discretized on the mesh.

\section{Iterative methods}
\inline{make a summary of the state of the art}
\label{sec:iterative methods}

The first iterative methods used for solving linear systems were the so-called \emph{stationary methods}, such as the Jacobi, Gauss--Seidel, and SOR methods \cite{saad:2003iterative}.
Although they have been superseded by more efficient methods, their simplicity means that they are still often used as components of more complex methods (for example, as smoothers in multigrid methods).
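
For illustration, assume the system is written as $A x = b$ and the matrix is split as $A = L + D + U$ into its strictly lower triangular, diagonal, and strictly upper triangular parts (this notation is introduced only for this remark).
The stationary methods named above can then be written in the common form
\[
    x^{(k+1)} = x^{(k)} + M^{-1} \left( b - A x^{(k)} \right),
\]
where $M = D$ gives the Jacobi method, $M = D + L$ the Gauss--Seidel method, and $M = \omega^{-1} D + L$ with a relaxation parameter $\omega$ (typically $0 < \omega < 2$) the SOR method.
The matrix $M$ does not change between iterations, which is why these methods are called stationary.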

Probably the most important class of iterative methods today is the class of \emph{Krylov subspace methods}, which includes some of the most widely used iterative methods.
The conjugate gradient (CG) method remains the standard choice for systems with a symmetric positive-definite matrix, as it is well-researched \cite{greenbaum:1997iterative,saad:2003iterative} and offers good convergence properties and error estimates.
Options for symmetric indefinite systems include the MINRES and SYMMLQ methods \cite{paige:1975}.
Similar and more general methods for systems with a non-symmetric matrix are the biconjugate gradient method (BiCG) and its stabilized transpose-free variant (BiCGstab) \cite{vandervorst:1992}.
GMRES \cite{saad:1986gmres} is another well-known and well-researched method, for which many implementation variants exist, such as \cite{walker:1988,bai:1994,baker:2005,baker:2009,matinfar:2012}.
The flexible GMRES algorithm \cite{saad:1993flexible} is a notable variant that allows more general preconditioning techniques to be applied in the GMRES algorithm.
Methods based on augmentation and deflation \cite{erhel:1996,chapman:1997,giraud:2010,gaul:2013} have been developed to improve the convergence properties of restarted GMRES.
See \cite{simoncini:2007} for a thorough review of the developments in Krylov subspace methods.
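
As a brief reminder of the framework shared by these methods, and using notation assumed only for this remark (the system $A x = b$, an initial guess $x_0$, and the initial residual $r_0 = b - A x_0$), a Krylov subspace method seeks the $k$-th iterate in an affine space,
\[
    x_k \in x_0 + \mathcal{K}_k(A, r_0),
    \qquad
    \mathcal{K}_k(A, r_0) = \mathrm{span} \left\{ r_0, A r_0, \ldots, A^{k-1} r_0 \right\},
\]
and the individual methods differ in the condition that selects $x_k$ from this space.
As an example of the error estimates mentioned above, the classical bound for CG applied to a symmetric positive-definite matrix with spectral condition number $\kappa$ reads
\[
    \| x - x_k \|_A \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \| x - x_0 \|_A,
\]
where $\| v \|_A = \sqrt{v^T A v}$ denotes the energy norm.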

The aforementioned Krylov subspace methods are all suitable for modern high-performance computing platforms, either in their original formulation or with modifications improving their performance.
Communication-avoiding variants have been developed for the CG, BiCG, and GMRES methods \cite{hoemmen:2010,yamazaki:2014,yamazaki:2015,ghysels:2013}.
There are also improved BiCGstab variants for parallel distributed systems \cite{yang:2002,krasnopolsky:2010,cools:2017,zhu:2014}.
Krylov subspace methods can also benefit from various optimizations developed specifically for GPU accelerators \cite{anzt:2014,yamazaki:2015,gao:2017}.

The induced dimension reduction methods, denoted as IDR($s$), are a relatively recent family of Krylov subspace methods for solving large non-symmetric systems of linear equations \cite{sonneveld:2009,vangijzen:2011,rendel:2013}.
Although they have been reported to outperform traditional methods such as BiCGstab on several problems, they remain much less widely used.
In general, current research focuses on the development of more efficient preconditioners rather than on new iterative methods themselves.

\inline{domain decomposition and multigrid methods}

\inline{stopping criteria?}

\section{Preconditioning techniques}
\label{sec:preconditioning}

\inline{general intro, discuss left-preconditioning vs right-preconditioning}
\inline{make a summary of the state of the art}
\inline{stationary methods (Jacobi, SOR, SSOR), incomplete factorizations (IC and ILU), polynomial, sparse approximate inverse, multigrid (MG, problem-specific) or algebraic multigrid (AMG), GenEO, ...}

\section{Software packages}
\label{sec:linear solvers software}

\inline{incomplete list of notable software which implements iterative methods or preconditioners -- features and development status}
\inline{state of the art: Hypre}
\inline{TNL status}

\section{Distributed sparse matrix}
\label{sec:distributed sparse matrix}

\inline{introduce the section in the context of the chapter}

+386 −35
@@ -14,14 +14,17 @@
  year   = {2014},
}

@Article{langr:2012fake,
@InProceedings{langr:2012fake,
  author    = {Langr, Daniel and Tvrd{\'\i}k, Pavel and Dytrych, Tom{\'a}{\v{s}} and Draayer, Jerry},
  journal   = {Objects, Models, Components, Patterns},
  booktitle = {International Conference on Modelling Techniques and Tools for Computer Performance Evaluation},
  title     = {Fake run-time selection of template arguments in {C++}},
  year      = {2012},
  editor    = {Carlo A. Furia and Sebastian Nanz},
  pages     = {140--154},
  publisher = {Springer, Berlin, Heidelberg},
  series    = {Lecture Notes in Computer Science},
  volume    = {7304},
  doi       = {10.1007/978-3-642-30561-0_11},
  publisher = {Springer},
}

@Manual{nvidia:cuda,
@@ -87,23 +90,14 @@
  isbn={978-80-227-3742-5},
}

@Book{saad:2003iterative,
  author    = {Saad, Yousef},
  publisher = {SIAM},
  title     = {Iterative methods for sparse linear systems},
  year      = {2003},
  isbn      = {0-89871-534-2},
  doi       = {10.1137/1.9780898718003},
}

# explanation of the CSR format, including references for historical development
@InProceedings{bulucc:2009parallel,
  author    = {Bulu{\c{c}}, Aydin and Fineman, Jeremy T. and Frigo, Matteo and Gilbert, John R. and Leiserson, Charles E.},
  booktitle = {Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures},
  title     = {Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks},
  year      = {2009},
  organization = {ACM},
  pages     = {233--244},
  publisher = {ACM},
  doi       = {10.1145/1583991.1584053},
}

@@ -119,6 +113,363 @@
  publisher = {IEEE},
}

@Report{li2022direct,
  title    = {Direct solvers for sparse matrices},
  author   = {Li, X.},
  url      = {https://portal.nersc.gov/project/sparse/superlu/SparseDirectSurvey.pdf},
  year     = {2022},
  month    = apr,
  keywords = {manual},
}

@Book{greenbaum:1997iterative,
  author    = {Greenbaum, Anne},
  publisher = {SIAM},
  title     = {Iterative methods for solving linear systems},
  year      = {1997},
  isbn      = {978-0-89871-396-1},
  doi       = {10.1137/1.9781611970937},
}

@Book{saad:2003iterative,
  author    = {Saad, Yousef},
  publisher = {SIAM},
  title     = {Iterative methods for sparse linear systems},
  year      = {2003},
  isbn      = {0-89871-534-2},
  doi       = {10.1137/1.9780898718003},
}

@Article{simoncini:2007,
  author    = {Simoncini, Valeria and Szyld, Daniel B.},
  journal   = {Numerical Linear Algebra with Applications},
  title     = {Recent computational developments in {Krylov} subspace methods for linear systems},
  year      = {2007},
  number    = {1},
  pages     = {1--59},
  volume    = {14},
  doi       = {10.1002/nla.499},
  publisher = {Wiley Online Library},
}

@Article{paige:1975,
  author    = {Paige, Christopher C. and Saunders, Michael A.},
  journal   = {SIAM Journal on Numerical Analysis},
  title     = {Solution of sparse indefinite systems of linear equations},
  year      = {1975},
  number    = {4},
  pages     = {617--629},
  volume    = {12},
  doi       = {10.1137/0712047},
  publisher = {SIAM},
}

@Article{saad:1986gmres,
  author    = {Saad, Yousef and Schultz, Martin H.},
  journal   = {SIAM Journal on Scientific and Statistical Computing},
  title     = {{GMRES}: A generalized minimal residual algorithm for solving nonsymmetric linear systems},
  year      = {1986},
  number    = {3},
  pages     = {856--869},
  volume    = {7},
  doi       = {10.1137/0907058},
  publisher = {SIAM},
}

@Article{walker:1988,
  author    = {Walker, Homer F.},
  journal   = {SIAM Journal on Scientific and Statistical Computing},
  title     = {Implementation of the {GMRES} method using {Householder} transformations},
  year      = {1988},
  number    = {1},
  pages     = {152--163},
  volume    = {9},
  doi       = {10.1137/0909010},
  publisher = {SIAM},
}

@Article{bai:1994,
  author    = {Bai, Zhaojun and Hu, Dan and Reichel, Lothar},
  journal   = {IMA Journal of Numerical Analysis},
  title     = {A {Newton} basis {GMRES} implementation},
  year      = {1994},
  number    = {4},
  pages     = {563--581},
  volume    = {14},
  doi       = {10.1093/imanum/14.4.563},
  publisher = {Oxford University Press},
}

@Article{matinfar:2012,
  author    = {Matinfar, Mashaallah and Zareamoghaddam, H. and Eslami, M. and Saeidy, M.},
  journal   = {Computers \& Mathematics with Applications},
  title     = {{GMRES} implementations and residual smoothing techniques for solving ill-posed linear systems},
  year      = {2012},
  number    = {1},
  pages     = {1--13},
  volume    = {63},
  doi       = {10.1016/j.camwa.2011.09.022},
  publisher = {Elsevier},
}

@Article{baker:2005,
  author    = {Baker, Allison H. and Jessup, Elizabeth R. and Manteuffel, Thomas},
  journal   = {SIAM Journal on Matrix Analysis and Applications},
  title     = {A technique for accelerating the convergence of restarted {GMRES}},
  year      = {2005},
  number    = {4},
  pages     = {962--984},
  volume    = {26},
  doi       = {10.1137/S0895479803422014},
  publisher = {SIAM},
}

@Article{baker:2009,
  author    = {Baker, Allison H. and Jessup, Elizabeth R. and Kolev, Tz. V.},
  journal   = {Journal of Computational and Applied Mathematics},
  title     = {A simple strategy for varying the restart parameter in {GMRES(m)}},
  year      = {2009},
  number    = {2},
  pages     = {751--761},
  volume    = {230},
  doi       = {10.1016/j.cam.2009.01.009},
  publisher = {Elsevier},
}

@PhdThesis{hoemmen:2010,
  author = {Hoemmen, Mark Frederick},
  school = {EECS Department, University of California, Berkeley},
  title  = {Communication-avoiding {Krylov} subspace methods},
  year   = {2010},
  month  = apr,
  type   = {phdthesis},
  number = {UCB/EECS-2010-37},
  url    = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-37.html},
}

@Article{saad:1993flexible,
  author    = {Saad, Youcef},
  journal   = {SIAM Journal on Scientific Computing},
  title     = {A flexible inner-outer preconditioned {GMRES} algorithm},
  year      = {1993},
  number    = {2},
  pages     = {461--469},
  volume    = {14},
  doi       = {10.1137/0914028},
  publisher = {SIAM},
}

@Article{erhel:1996,
  author    = {Erhel, Jocelyne and Burrage, Kevin and Pohl, Bert},
  journal   = {Journal of Computational and Applied Mathematics},
  title     = {Restarted {GMRES} preconditioned by deflation},
  year      = {1996},
  number    = {2},
  pages     = {303--318},
  volume    = {69},
  doi       = {10.1016/0377-0427(95)00047-X},
  publisher = {Elsevier},
}

@Article{chapman:1997,
  author    = {Chapman, Andrew and Saad, Yousef},
  journal   = {Numerical Linear Algebra with Applications},
  title     = {Deflated and augmented {Krylov} subspace techniques},
  year      = {1997},
  number    = {1},
  pages     = {43--66},
  volume    = {4},
  doi       = {10.1002/(SICI)1099-1506(199701/02)4:1<43::AID-NLA99>3.0.CO;2-Z},
  publisher = {Wiley Online Library},
}

@Article{giraud:2010,
  author    = {Giraud, Luc and Gratton, Serge and Pinel, Xavier and Vasseur, Xavier},
  journal   = {SIAM Journal on Scientific Computing},
  title     = {Flexible {GMRES} with deflated restarting},
  year      = {2010},
  number    = {4},
  pages     = {1858--1878},
  volume    = {32},
  doi       = {10.1137/080741847},
  publisher = {SIAM},
}

@Article{gaul:2013,
  author    = {Gaul, Andr{\'e} and Gutknecht, Martin H. and Liesen, J{\"o}rg and Nabben, Reinhard},
  journal   = {SIAM Journal on Matrix Analysis and Applications},
  title     = {A framework for deflated and augmented {Krylov} subspace methods},
  year      = {2013},
  number    = {2},
  pages     = {495--518},
  volume    = {34},
  doi       = {10.1137/110820713},
  publisher = {SIAM},
}

@InProceedings{yamazaki:2014,
  author    = {Yamazaki, Ichitaro and Tomov, Stanimire and Dongarra, Jack},
  booktitle = {5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems},
  title     = {Deflation strategies to improve the convergence of communication-avoiding {GMRES}},
  year      = {2014},
  pages     = {39--46},
  publisher = {IEEE},
  doi       = {10.1109/ScalA.2014.6},
}

@InProceedings{yamazaki:2015,
  author    = {Yamazaki, Ichitaro and Tomov, Stanimire and Dong, Tingxing and Dongarra, Jack},
  booktitle = {International Conference on High Performance Computing for Computational Science},
  title     = {Mixed-precision orthogonalization scheme and adaptive step size for improving the stability and performance of {CA-GMRES} on {GPU}s},
  year      = {2015},
  pages     = {17--30},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = {8969},
  doi       = {10.1007/978-3-319-17353-5_2},
}

@Article{ghysels:2013,
  author    = {Ghysels, Pieter and Ashby, Thomas J. and Meerbergen, Karl and Vanroose, Wim},
  journal   = {SIAM Journal on Scientific Computing},
  title     = {Hiding global communication latency in the {GMRES} algorithm on massively parallel machines},
  year      = {2013},
  number    = {1},
  pages     = {C48--C71},
  volume    = {35},
  doi       = {10.1137/12086563X},
  publisher = {SIAM},
}

@Article{vandervorst:1992,
  author    = {Van der Vorst, Henk A.},
  journal   = {SIAM Journal on Scientific and Statistical Computing},
  title     = {{Bi-CGSTAB}: A fast and smoothly converging variant of {Bi-CG} for the solution of nonsymmetric linear systems},
  year      = {1992},
  number    = {2},
  pages     = {631--644},
  volume    = {13},
  doi       = {10.1137/0913035},
  publisher = {SIAM},
}

# parallel BiCGstab variants
@InProceedings{yang:2002,
  author    = {Yang, Laurence Tianruo and Brent, Richard P.},
  booktitle = {Fifth International Conference on Algorithms and Architectures for Parallel Processing},
  title     = {The improved {BiCGStab} method for large and sparse unsymmetric linear systems on parallel distributed memory architectures},
  year      = {2002},
  pages     = {324--328},
  publisher = {IEEE},
  doi       = {10.1109/ICAPP.2002.1173595},
}

@Article{krasnopolsky:2010,
  author    = {Krasnopolsky, Boris},
  journal   = {Procedia Computer Science},
  title     = {The reordered {BiCGStab} method for distributed memory computer systems},
  year      = {2010},
  number    = {1},
  pages     = {213--218},
  volume    = {1},
  doi       = {10.1016/j.procs.2010.04.024},
  publisher = {Elsevier},
}

@Article{cools:2017,
  author    = {Cools, Siegfried and Vanroose, Wim},
  journal   = {Parallel Computing},
  title     = {The communication-hiding pipelined {BiCGStab} method for the parallel solution of large unsymmetric linear systems},
  year      = {2017},
  pages     = {1--20},
  volume    = {65},
  doi       = {10.1016/j.parco.2017.04.005},
  publisher = {Elsevier},
}

@Article{zhu:2014,
  author   = {Sheng-Xin Zhu and Tong-Xiang Gu and Xing-Ping Liu},
  journal  = {Computers \& Mathematics with Applications},
  title    = {Minimizing synchronizations in sparse iterative solvers for distributed supercomputers},
  year     = {2014},
  issn     = {0898-1221},
  number   = {1},
  pages    = {199--209},
  volume   = {67},
  abstract = {Eliminating synchronizations is one of the important techniques related to minimizing communications for modern high performance computing. This paper discusses principles of reducing communications due to global synchronizations in sparse iterative solvers on distributed supercomputers. We demonstrate how to minimize global synchronizations by rescheduling a typical Krylov subspace method. The benefit of minimizing synchronizations is shown in theoretical analysis and verified by numerical experiments. The experiments also show the local communications for some structured sparse matrix–vector multiplications and global communications in the underlying supercomputers increase in the order $P^{1/2.5}$ and $P^{4/5}$ respectively, where $P$ is the number of processors.},
  doi      = {10.1016/j.camwa.2013.11.008},
}

@InProceedings{anzt:2014,
  author    = {Anzt, Hartwig and Sawyer, William and Tomov, Stanimire and Luszczek, Piotr and Yamazaki, Ichitaro and Dongarra, Jack},
  booktitle = {2014 IEEE International Parallel \& Distributed Processing Symposium Workshops},
  title     = {Optimizing {Krylov} subspace solvers on graphics processing units},
  year      = {2014},
  publisher = {IEEE},
  pages     = {941--949},
  doi       = {10.1109/IPDPSW.2014.107},
}

@Article{gao:2017,
  author    = {Gao, Jiaquan and Zhou, Yuanshen and He, Guixia and Xia, Yifei},
  journal   = {Parallel Computing},
  title     = {A multi-{GPU} parallel optimization model for the preconditioned conjugate gradient algorithm},
  year      = {2017},
  pages     = {1--16},
  volume    = {63},
  doi       = {10.1016/j.parco.2017.04.003},
  publisher = {Elsevier},
}

@Article{sonneveld:2009,
  author    = {Sonneveld, Peter and van Gijzen, Martin B.},
  journal   = {SIAM Journal on Scientific Computing},
  title     = {{IDR(\textit{s})}: A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations},
  year      = {2009},
  number    = {2},
  pages     = {1035--1062},
  volume    = {31},
  doi       = {10.1137/070685804},
  publisher = {SIAM},
}

@Article{vangijzen:2011,
  author    = {van Gijzen, Martin B. and Sonneveld, Peter},
  journal   = {ACM Transactions on Mathematical Software},
  title     = {Algorithm 913: An elegant {IDR(\textit{s})} variant that efficiently exploits biorthogonality properties},
  year      = {2011},
  issn      = {0098-3500},
  month     = dec,
  number    = {1},
  volume    = {38},
  address   = {New York, NY, USA},
  doi       = {10.1145/2049662.2049667},
  publisher = {Association for Computing Machinery},
}

@Article{rendel:2013,
  author    = {Rendel, Olaf and Rizvanolli, Anisa and Zemke, Jens-Peter M.},
  journal   = {Linear Algebra and its Applications},
  title     = {{IDR}: A new generation of {Krylov} subspace methods?},
  year      = {2013},
  number    = {4},
  pages     = {1040--1061},
  volume    = {439},
  doi       = {10.1016/j.laa.2012.11.021},
  publisher = {Elsevier},
}

@Article{bauer:2016,
  author    = {Bauer, Petr and Klement, Vladimír and Oberhuber, Tomáš and Žabka, Vítězslav},
  journal   = {Computer Physics Communications},
  title     = {Implementation of the {Vanka}-type multigrid solver for the finite element approximation of the {Navier--Stokes} equations on {GPU}},
  year      = {2016},
  pages     = {50--56},
  volume    = {200},
  doi       = {10.1016/j.cpc.2015.10.021},
  publisher = {Elsevier},
}

# Metis library
@Article{karypis:1998fast,
  author    = {Karypis, George and Kumar, Vipin},
@@ -145,8 +496,8 @@
  booktitle = {Proceedings of the 1969 24th national conference},
  title     = {Reducing the bandwidth of sparse symmetric matrices},
  year      = {1969},
  organization = {ACM},
  pages     = {157--172},
  publisher = {ACM},
  doi       = {10.1145/800195.805928},
}

@@ -465,7 +816,7 @@
  title     = {A {GPU}-adapted structure for unstructured grids},
  year      = {2017},
  number    = {2},
  organization = {Wiley Online Library},
  publisher = {Wiley Online Library},
  pages     = {495--507},
  volume    = {36},
  doi       = {10.1111/cgf.13144},