\section{Software packages}
\label{sec:linear solvers software}
The software listed here can be divided into two main groups: the first containing software dedicated specifically to providing iterative methods and/or preconditioners for linear systems, and the other group consisting of software with larger ambitions, such as scientific computing frameworks.
Packages from the latter group can often be interfaced with smaller libraries from the former group to supplement their native algorithms.
In both groups, we focus on open-source software with active development status.
Each package is listed with a brief description and an overview of its main features.
In the final subsection, we summarize the current status in the Template Numerical Library.
\subsection{Dedicated projects}
Hypre \cite{Hypre:library,Hypre:design1,Hypre:design2} is an open-source library of high-performance preconditioners and iterative solvers for large sparse linear systems.
Additionally, it provides several interfaces for creating linear systems: based on structured or semi-structured grids, on finite element discretizations, or through a general linear-algebraic interface.
The implemented solvers include common Krylov subspace methods such as CG, GMRES, and BiCGstab.
Available preconditioners include algebraic multigrid (BoomerAMG \cite{Hypre:BoomerAMG}), several variants of incomplete LU factorization \cite{karypis:1997,hysom:1999,hysom:2001}, sparse approximate inverse \cite{chow:2000,kolotilina:1993,janna:2015}, and others.
The library is implemented in the C language, uses MPI \cite{mpi:3.1} for distributed computing and optionally uses CUDA \cite{nvidia:cuda}, HIP \cite{amd:hip}, or SYCL \cite{khronos:sycl} for GPU acceleration.
PARALUTION \cite{Paralution,Paralution:manual} is an open-source library implemented in the \C++ language providing parallel iterative solvers and preconditioners.
Additionally, it provides plugins for several high-level libraries, including deal.II \cite{bangerth:2007deal.II} and OpenFOAM \cite{openfoam:8.0}.
Among the implemented iterative solvers are stationary methods (Jacobi, Gauss--Seidel, symmetric Gauss--Seidel, SOR, SSOR), Krylov subspace solvers (CG, BiCGstab, GMRES, IDR), geometric and algebraic multigrid, and others.
Available preconditioners include stationary methods, several variants of incomplete LU factorization, the additive Schwarz preconditioner, and others.
The library also provides multiple sparse matrix formats, including CSR and Ellpack.
PARALUTION uses MPI \cite{mpi:3.1} for distributed computing and provides several back-ends for parallel execution: OpenMP \cite{OpenMP}, CUDA \cite{nvidia:cuda}, OpenCL \cite{khronos:opencl}.
It was also ported to the ROCm \cite{amd:rocm} platform as rocALUTION \cite{rocALUTION}.
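To make the role of the stationary methods named above concrete, the following self-contained Python sketch (purely illustrative, not PARALUTION code; all names are ours) performs SOR sweeps on a small diagonally dominant system. Setting the relaxation parameter \texttt{omega} to 1 recovers the Gauss--Seidel method.

```python
def sor_solve(A, b, omega=1.2, tol=1e-10, max_iter=1000):
    # One SOR sweep updates x[i] in place using the latest values,
    # blending the Gauss-Seidel update with the previous iterate.
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        change = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
            change = max(change, abs(x_new - x[i]))
            x[i] = x_new
        if change < tol:
            break
    return x

# Diagonally dominant toy system with exact solution (1, 1, 1)
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x = sor_solve(A, b)
```

Libraries such as PARALUTION implement the same sweep on sparse storage formats and in parallel, where the sequential dependence of the Gauss--Seidel update becomes the main difficulty.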
AmgX \cite{nvidia:amgx,naumov:2015AmgX} is an open-source library of GPU accelerated algebraic multigrid and preconditioned iterative methods developed by Nvidia.
Besides algebraic multigrid, it also provides standard Krylov subspace methods (CG, BiCGstab, GMRES) and several preconditioners/smoothers (block Jacobi, Gauss--Seidel, ILU, polynomial).
AmgX uses CUDA \cite{nvidia:cuda} for GPU acceleration and MPI \cite{mpi:3.1} for distributed computing.
The algebraic multigrid implementation is partially based on the Hypre library.
AMGCL \cite{demidov:2019,demidov:2020} is an open-source header-only \C++ library for solving large sparse linear systems with algebraic multigrid.
It also provides several Krylov subspace methods (CG, BiCGstab, GMRES, IDR) and preconditioners/smoothers (Jacobi, Gauss--Seidel, Chebyshev, ILU, SPAI, and some problem-specific preconditioners for Navier--Stokes and reservoir simulations).
Furthermore, it provides matrix adapters for wrapping data structures of various other libraries, including Eigen \cite{eigen}, Trilinos Epetra \cite{trilinos:epetra}, and uBLAS \cite{ublas}.
AMGCL builds the algebraic multigrid hierarchy on a CPU and then transfers it to one of the provided back-ends.
The currently supported acceleration frameworks are OpenMP \cite{OpenMP}, CUDA \cite{nvidia:cuda}, and OpenCL \cite{khronos:opencl}.
AMGCL also supports MPI \cite{mpi:3.1} for distributed computing.
Ginkgo \cite{ginkgo-toms-2022} is an open-source linear algebra library implemented in modern \C++.
It provides multiple sparse matrix formats, Krylov solvers (CG, BiCGstab, GMRES, IDR), and preconditioners (Jacobi, IC, ILU \cite{anzt:2018parilut}, incomplete sparse approximate inverse \cite{anzt:2018}).
There is also a preliminary implementation of an algebraic multigrid solver and preconditioner.
Ginkgo supports native parallel execution via OpenMP \cite{OpenMP}, CUDA \cite{nvidia:cuda}, HIP \cite{amd:hip}, or SYCL \cite{khronos:sycl}.
Support for distributed computing via MPI \cite{mpi:3.1} in Ginkgo is a work in progress.
There are also many smaller open-source software packages that serve as a basis for the development of new or improved numerical methods.
The following projects are among the most successful ones under continuing development.
The PyAMG \cite{PyAMG} project implements the algebraic multigrid method and supporting tools in the Python language, with the help of \C++ for performance-critical operations.
RAPtor \cite{RAPtor} is a \C++ implementation of parallel algebraic multigrid based on MPI \cite{mpi:3.1}.
Monolis \cite{monolis} is a monolithic domain-decomposition-based linear system solver implemented in Fortran.
Finally, BDDCML \cite{BDDCML,sousedik:2013} is a massively parallel linear system solver based on the adaptive multilevel BDDC method.
\subsection{Large frameworks}
Trilinos \cite{trilinos,heroux:2005,heroux:2006,heroux:2012} is a large collection of open-source packages with various objectives in scientific computing.
The individual packages are highly interoperable, but not strictly interdependent, i.e., each package depends on interfaces that can be satisfied by multiple other packages rather than on a specific implementation.
Trilinos packages are developed in the \C++ language and include many direct and iterative linear system solvers, ILU-type preconditioners, smoothers, multigrid and domain decomposition methods.
Trilinos supports distributed computing using MPI \cite{mpi:3.1}, multi-core parallel execution using a variety of approaches, as well as GPU acceleration.
PETSc \cite{petsc-web-page,petsc-user-ref,petsc-efficient} is an open-source library of data structures and algorithms for parallel solution of scientific problems modeled by partial differential equations.
Unlike Trilinos, PETSc is a monolithic library developed in the C language.
Its \ic{KSP} module provides many parallel and sequential, direct and iterative solvers for linear systems and the \ic{PC} module provides preconditioners such as stationary methods, ILU factorizations, algebraic multigrid, or BDDC.
OpenFOAM \cite{openfoam:8.0,jasak:2007openfoam} is a large open-source multi-physics package based on the finite volume method.
It also provides its own implementation of various iterative methods and preconditioners for linear systems, such as stationary methods, incomplete factorizations, CG and BiCGstab methods, and geometric agglomerated algebraic multigrid.
OpenFOAM itself does not support GPU acceleration, though the possibilities are being explored via external extensions \cite{bna2020petsc4foam,posey:2019,martineau:2020}.
DUNE (Distributed and Unified Numerics Environment) \cite{bastian:2006DUNE} is a modular \C++ library for solving partial differential equations with finite element, finite volume, or finite difference methods.
Its ISTL (Iterative Solver Template Library) module provides implementations of several Krylov subspace solvers and preconditioners, including algebraic multigrid.
\subsection{Template Numerical Library}
In this subsection, we summarize the current status related to the solution of linear systems in the TNL \cite{oberhuber:2021tnl,TNL:documentation} library.
Unless stated otherwise, all implementations support any matrix format implemented in TNL, distributed computing with MPI \cite{mpi:3.1}, and GPU acceleration using CUDA \cite{nvidia:cuda}.
TNL does not implement any direct methods for the solution of large dense or sparse linear systems.
The currently implemented stationary iterative methods are Jacobi and SOR (the latter works only in a shared-memory environment).
The currently implemented Krylov subspace methods are CG, BiCGstab, GMRES, TFQMR, and IDR.
The only preconditioner fully implemented in TNL is the Jacobi (diagonal) preconditioner.
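As a minimal sketch of how such a diagonal preconditioner enters a Krylov method, the following illustrative Python code (not TNL's \C++ implementation; all names are ours) runs CG with the Jacobi preconditioner $M = \operatorname{diag}(A)$ on a small dense system.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=100):
    # Preconditioned CG with M = diag(A): the only extra work per
    # iteration is the trivial diagonal solve z = M^{-1} r.
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # r = b - A*x with x = 0
    z = [r[i] / A[i][i] for i in range(n)]     # apply the Jacobi preconditioner
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        for i in range(n):
            x[i] += alpha * p[i]
            r[i] -= alpha * Ap[i]
        if max(abs(ri) for ri in r) < tol:
            break
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        rz = rz_new
        p = [z[i] + beta * p[i] for i in range(n)]
    return x

# Symmetric positive definite toy system with exact solution (1, 1, 1)
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x = jacobi_pcg(A, b)
```

The same structure carries over to any preconditioner: only the computation of $z = M^{-1} r$ changes, which is why libraries expose preconditioners behind a common interface.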
Additionally, TNL provides sequential implementations of the ILU(0) and ILUT factorizations that can also be applied as a block-Jacobi preconditioner for distributed matrices.
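The principle behind ILU(0) can be sketched as follows: perform Gaussian elimination, but discard any fill-in outside the original sparsity pattern. The illustrative Python code below (not TNL's implementation; it stores the matrix densely for simplicity) factorizes a tridiagonal matrix, for which ILU(0) coincides with the exact LU factorization since no fill-in occurs.

```python
def ilu0(A):
    # In-place ILU(0): after the loops, the strict lower triangle of A
    # holds L (with an implied unit diagonal) and the upper triangle
    # holds U. Updates are restricted to the original nonzero pattern,
    # so no fill-in is created.
    n = len(A)
    pattern = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue
            A[i][k] /= A[k][k]                  # multiplier, stored in L
            for j in range(k + 1, n):
                if pattern[i][j]:               # drop fill outside the pattern
                    A[i][j] -= A[i][k] * A[k][j]
    return A

# Tridiagonal example: LU produces no fill here, so ILU(0) is exact
F = ilu0([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
```

Applying the resulting preconditioner then amounts to one forward substitution with L and one backward substitution with U per iteration.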
To easily utilize state-of-the-art solvers and preconditioners in TNL, we started implementing \emph{wrappers} for external libraries with more advanced features.
Wrappers for the iterative methods and preconditioners from the Hypre \cite{Hypre:library,Hypre:design1,Hypre:design2} library are already finished (they require the CSR matrix format with specific conventions that are described in the following section).
Additionally, TNL provides a wrapper for UMFPACK \cite{davis:2004} which applies a direct method to solve the linear system (it requires a non-distributed matrix in the CSR format).
@Online{eigen,
author={Ga\"{e}l Guennebaud and Beno\^{i}t Jacob and others},
title={Eigen v3},
url={https://eigen.tuxfamily.org},
}
@Online{ublas,
author={Walter, Joerg and Koch, Mathias and others},
title={{uBLAS}},
url={https://github.com/boostorg/ublas},
}
@Online{trilinos,
author={{Trilinos project team}},
title={The {T}rilinos {P}roject {W}ebsite},
url={https://trilinos.github.io},
}
@Online{trilinos:epetra,
author={{Epetra project team}},
title={Trilinos {Epetra} package},
url={https://trilinos.github.io/epetra.html},
}
@InProceedings{heroux:2006,
author={Heroux, Michael A. and Sala, Marzio},
booktitle={Applied Parallel Computing. State of the Art in Scientific Computing},
title={The design of {Trilinos}},
year={2006},
editor={Dongarra, Jack and Madsen, Kaj and Waśniewski, Jerzy},
pages={620--628},
publisher={Springer, Berlin, Heidelberg},
series={Lecture Notes in Computer Science},
abstract={The Trilinos Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries within an object-oriented framework for the solution of large-scale, complex multi-physics engineering and scientific problems.},
doi={10.1007/11558958_74},
isbn={978-3-540-33498-9},
}
@Article{heroux:2012,
author={Heroux, Michael A. and Willenbring, James M.},
journal={Scientific Programming},
title={A new overview of the {Trilinos} project},
year={2012},
month={apr},
number={2},
pages={83--88},
volume={20},
abstract={Since An Overview of the Trilinos Project [ACM Trans. Math. Softw. 31(3) 2005, 397--423] was published in 2005, Trilinos has grown significantly. It now supports the development of a broad collection of libraries for scalable computational science and engineering applications, and a full-featured software infrastructure for rigorous lean/agile software engineering. This growth has created significant opportunities and challenges. This paper focuses on some of the most notable changes to the Trilinos project in the last few years. At the time of the writing of this article, the current release version of Trilinos was 10.12.2.},
doi={10.1155/2012/408130},
publisher={IOS Press},
}
@Article{heroux:2005,
author={Heroux, Michael A. and Bartlett, Roscoe A. and Howle, Vicki E. and Hoekstra, Robert J. and Hu, Jonathan J. and Kolda, Tamara G. and Lehoucq, Richard B. and Long, Kevin R. and Pawlowski, Roger P. and Phipps, Eric T. and Salinger, Andrew G. and Thornquist, Heidi K. and Tuminaro, Ray S. and Willenbring, James M. and Williams, Alan and Stanley, Kendall S.},
journal={ACM Transactions on Mathematical Software},
title={An overview of the {Trilinos} project},
year={2005},
issn={0098-3500},
month={sep},
number={3},
pages={397--423},
volume={31},
abstract={The Trilinos Project is an effort to facilitate the design, development, integration, and ongoing support of mathematical software libraries within an object-oriented framework for the solution of large-scale, complex multiphysics engineering and scientific problems. Trilinos addresses two fundamental issues of developing software for these problems: (i) providing a streamlined process and set of tools for development of new algorithmic implementations and (ii) promoting interoperability of independently developed software. Trilinos uses a two-level software structure designed around collections of packages. A Trilinos package is an integral unit usually developed by a small team of experts in a particular algorithms area such as algebraic preconditioners, nonlinear solvers, etc. Packages exist underneath the Trilinos top level, which provides a common look-and-feel, including configuration, documentation, licensing, and bug-tracking. Here we present the overall Trilinos design, describing our use of abstract interfaces and default concrete implementations. We discuss the services that Trilinos provides to a prospective package and how these services are used by various packages. We also illustrate how packages can be combined to rapidly develop new algorithms. Finally, we discuss how Trilinos facilitates high-quality software engineering practices that are increasingly required from simulation software.},
doi={10.1145/1089014.1089021},
publisher={Association for Computing Machinery},
}
@Online{petsc-web-page,
author={Balay, Satish and Abhyankar, Shrirang and Adams, Mark F. and Benson, Steven and Brown, Jed and Brune, Peter and Buschelman, Kris and Constantinescu, Emil M. and Dalcin, Lisandro and Dener, Alp and Eijkhout, Victor and Gropp, William D. and Hapla, V\'{a}clav and Isaac, Tobin and Jolivet, Pierre and Karpeev, Dmitry and Kaushik, Dinesh and Knepley, Matthew G. and Kong, Fande and Kruger, Scott and May, Dave A. and McInnes, Lois Curfman and Mills, Richard Tran and Mitchell, Lawrence and Munson, Todd and Roman, Jose E. and Rupp, Karl and Sanan, Patrick and Sarich, Jason and Smith, Barry F. and Zampini, Stefano and Zhang, Hong and Zhang, Hong and Zhang, Junchao},
title={{PETS}c web page},
url={https://petsc.org/},
year={2022},
}
@TechReport{petsc-user-ref,
author={Balay, Satish and Abhyankar, Shrirang and Adams, Mark F. and Benson, Steven and Brown, Jed and Brune, Peter and Buschelman, Kris and Constantinescu, Emil and Dalcin, Lisandro and Dener, Alp and Eijkhout, Victor and Gropp, William D. and Hapla, V\'{a}clav and Isaac, Tobin and Jolivet, Pierre and Karpeev, Dmitry and Kaushik, Dinesh and Knepley, Matthew G. and Kong, Fande and Kruger, Scott and May, Dave A. and McInnes, Lois Curfman and Mills, Richard Tran and Mitchell, Lawrence and Munson, Todd and Roman, Jose E. and Rupp, Karl and Sanan, Patrick and Sarich, Jason and Smith, Barry F. and Zampini, Stefano and Zhang, Hong and Zhang, Hong and Zhang, Junchao},