Tridiagonal System Solvers Internal Report

Title: Tridiagonal System Solvers Internal Report
Authors: J. Lamas-Rodríguez, D. B. Heras, M. Bóo, F. Argüello
Type: Technical report
Source: 2011.
Abstract: GPUs are now high-performance, many-core processors that can accelerate a wide range of applications. However, directly mapping algorithms that were not specifically designed for this architecture yields poor performance, and the solution of tridiagonal systems of equations is no exception. In this paper we analyze the mapping of four well-known parallel tridiagonal system solvers onto the GPU: cyclic reduction, recursive doubling, Bondeli's divide and conquer algorithm, and Wang's partition method. We propose several optimized GPU implementations that use the CUDA framework and efficiently exploit the thousands of threads available on the GPU, the block model, the shared memory space and coalesced accesses to global memory. Three of these algorithms achieve shorter execution times than the Thomas algorithm running on the CPU. In particular, one of our cyclic reduction implementations gives the best results, with a speed-up of 23.4x over the Thomas algorithm.
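The abstract names cyclic reduction as the best-performing method and the Thomas algorithm as the CPU baseline, but gives no implementation details. The following is a minimal sequential Python sketch of textbook odd-even cyclic reduction, not the authors' CUDA code; the function names `thomas` and `cyclic_reduction` are hypothetical. It assumes the system `a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]` with `a[0] = c[n-1] = 0`, and that no pivoting is needed (for example, a diagonally dominant matrix). Within each reduction level the loop iterations are independent, which is the parallelism a GPU implementation would assign to threads; the recursion depth is roughly log2(n), whereas the Thomas algorithm performs n dependent steps in each sweep.

```python
from typing import List


def thomas(a: List[float], b: List[float], c: List[float], d: List[float]) -> List[float]:
    """Sequential Thomas algorithm (tridiagonal Gaussian elimination, no pivoting)."""
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward sweep: each step depends on the previous one
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # backward substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_reduction(a: List[float], b: List[float], c: List[float], d: List[float]) -> List[float]:
    """Odd-even cyclic reduction, written recursively and sequentially (sketch)."""
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    odd = range(1, n, 2)
    ra: List[float] = []
    rb: List[float] = []
    rc: List[float] = []
    rd: List[float] = []
    # Reduction: combine each odd row i with rows i-1 and i+1 to eliminate
    # x[i-1] and x[i+1]. Iterations are independent (one GPU thread each).
    for i in odd:
        alpha = -a[i] / b[i - 1]
        has_next = i + 1 < n
        gamma = -c[i] / b[i + 1] if has_next else 0.0
        a_next = a[i + 1] if has_next else 0.0
        c_next = c[i + 1] if has_next else 0.0
        d_next = d[i + 1] if has_next else 0.0
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + gamma * a_next)
        rc.append(gamma * c_next)
        rd.append(d[i] + alpha * d[i - 1] + gamma * d_next)
    # Solve the half-size system for x[1], x[3], x[5], ...
    y = cyclic_reduction(ra, rb, rc, rd)
    x = [0.0] * n
    for k, i in enumerate(odd):
        x[i] = y[k]
    # Back-substitution: each even row now has both neighbours known,
    # so these iterations are also independent of each other.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

For example, with `a = [0, 1, 1, 1]`, `b = [4, 4, 4, 4]`, `c = [1, 1, 1, 0]` and `d = [5, 6, 6, 5]`, both functions should return values close to `[1, 1, 1, 1]`, since x = 1 satisfies every row. The recursive form is chosen here for readability; a GPU version would typically run the levels iteratively in place, using strides instead of copying the reduced system.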
Keywords: Parallel tridiagonal linear system solvers, GPU, CUDA, cyclic reduction, recursive doubling, Bondeli's divide and conquer algorithm, Wang's partition method.
