Improving performance of iterative solvers with the AXC format using the Intel Xeon Phi

This work is focused on the application of the new AXC format in iterative algorithms on the Intel Xeon Phi coprocessor to solve linear systems by accelerating the sparse matrix-vector (SpMV) product. These algorithms are the Conjugate Gradient (CG) and the Biconjugate Gradient Stabilised (BiCGS) methods used to solve symmetric and non-symmetric linear systems. Two highly efficient formats were selected to compare the AXC format: the Compressed Sparse Row (CSR) format and the Sliced ELLPACK-C- α (SELL-C- α) format. The evaluation process consists of two phases. The first phase is focused on the performance comparison of the SpMV kernels alone. The second phase compares the performance of the solvers with the different kernels integrated within them. As this work is oriented to explore the full capabilities of the Intel Xeon Phi architecture, the Intel’s intrinsic instructions set were used to code the kernels for the AXC and SELL-C- α formats and compared with the Intel’s MKL optimal implementation of the CSR format. Numerical results demonstrate that the AXC format achieves the higher average performance 15.9 GFLOPS against the 5.5 GFLOPS for the SELL-C- α format and 8.4 GFLOPs for the MKL CSR format, and it also have the lowest variability when performing the SpMV product alone. Results also show that the AXC format achieves up to 6.5 ×, 1.5 × and 1.9 × greater performance over the MKL CSR format, its closest competitor, for the SpMV product, the CG and the BiCGS algorithms, respectively.

keywords: Sparse Matrix Storage, Sparse Matrix Vector Product, Conjugate Gradient, Biconjugate Gradient Stabilised, Intel Xeon Phi, MIC Intrinsics