Parallel-in-time Block-Krylov methods to improve node-level performance
Combining parallel-in-time and Block-Krylov methods, we present a promising novel approach to increase the node-level performance of PDE solvers for a wide range of instationary problems on modern hardware. In many applications, low-order methods are still the most common approach to solving PDEs. While they are easy to implement, they are inherently memory-bound because of their low arithmetic intensity, and thus they do not benefit from the high level of concurrency of modern hardware architectures. To increase the arithmetic intensity, it is necessary to increase the work per matrix entry. For applications that require solving several linear systems with the same operator, Block-Krylov methods offer a mathematical tool to increase the arithmetic intensity, and in previous work we added corresponding support to our DUNE linear algebra library. Conceptually, the linear systems arising from multiple time steps of a time-stepping method, or from multiple Runge-Kutta stages, can be reformulated as a single matrix equation instead of many separate linear systems, and this matrix equation may then be solved using Block-Krylov methods. We introduce the ideas of parallel-in-time methods, show how they can be adapted to increase the local work per node, and explain how they link to Block-Krylov methods. We present numerical results that show the potential, but also the challenges, and discuss implementation aspects.
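The argument that more work per matrix entry raises arithmetic intensity can be made concrete with a small sketch. The code below is a minimal illustration, not the DUNE implementation: the function names `csr_spmm` and `spmm_intensity` are hypothetical, and the traffic model is an assumption (8-byte values, 4-byte indices, the matrix streamed once, the dense block X read once and Y written once). It shows a sparse matrix times a block of k vectors, where each nonzero is loaded once and reused for all k columns, which is the kernel a Block-Krylov method applies in place of k separate matrix-vector products.

```python
from typing import List, Sequence


def csr_spmm(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], X: List[List[float]]) -> List[List[float]]:
    """Compute Y = A X for a CSR matrix A and a dense block X (n_cols x k)."""
    n_rows = len(row_ptr) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        y_row = Y[i]
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[p]                # the matrix entry is loaded once ...
            x_row = X[col_idx[p]]
            for c in range(k):         # ... and reused for all k columns
                y_row[c] += a * x_row[c]
    return Y


def spmm_intensity(n_rows: int, nnz: int, k: int,
                   value_bytes: int = 8, index_bytes: int = 4) -> float:
    """Flops per byte of Y = A X under a simplified memory-traffic model (assumed)."""
    flops = 2 * nnz * k  # one multiply and one add per nonzero and column
    matrix_bytes = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    block_bytes = 2 * n_rows * k * value_bytes  # read X once, write Y once
    return flops / (matrix_bytes + block_bytes)


if __name__ == "__main__":
    # 1D Laplacian tridiag(-1, 2, -1) applied to two right-hand sides at once.
    row_ptr = [0, 2, 5, 7]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    print(csr_spmm(row_ptr, col_idx, vals, X))  # [[1.0, -1.0], [0.0, 0.0], [1.0, 3.0]]

    # Roughly a 5-point stencil: about 5 nonzeros per row.
    for k in (1, 2, 4, 8, 16):
        print(k, round(spmm_intensity(1000, 5000, k), 3))
```

In this simple model the intensity rises from about 0.125 flop/byte for k = 1 to over 0.4 flop/byte for k = 8, but it saturates once the traffic for the dense blocks X and Y dominates the matrix traffic. This is one way to see why blocking helps but does not remove the memory bottleneck on its own.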