High Performance Polar Decomposition on Distributed Memory Systems

Dalal Sukkari; Hatem Ltaief; David Keyes

doi:10.1007/978-3-319-43659-3_44

Conference proceeding

Peer reviewed

EURO-PAR 2016: PARALLEL PROCESSING, Vol.9833, pp.605-616

Lecture Notes in Computer Science

01/01/2016

DOI: https://doi.org/10.1007/978-3-319-43659-3_44

Subjects: Computer Science; Computer Science, Theory & Methods; Science & Technology; Technology

Abstract

The polar decomposition of a dense matrix is an important operation in linear algebra. It can be calculated directly through the singular value decomposition (SVD) or iteratively using the QR-based dynamically weighted Halley (QDWH) algorithm. The former is difficult to parallelize because of the large number of memory-bound operations during the bidiagonal reduction. We investigate the latter, which performs more floating-point operations but exposes more parallelism and therefore runs closer to the theoretical peak performance of the system, thanks to more compute-bound matrix operations. Profiling results show the performance scalability of QDWH for calculating the polar decomposition using around 9200 MPI processes on well- and ill-conditioned matrices of size 100K x 100K. We then study the performance impact of the QDWH-based polar decomposition as a pre-processing step toward calculating the SVD itself. The new distributed-memory implementation of the QDWH-SVD solver achieves up to a five-fold speedup over current state-of-the-art vendor SVD implementations.
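The two routes contrasted in the abstract can be illustrated on a single node with a small NumPy sketch. This is not the authors' distributed-memory implementation. The function names (polar_via_svd, polar_via_qdwh, svd_via_polar) are hypothetical, the input is assumed to be a square, nonsingular, real matrix, and the scaling bounds (Frobenius norm for the upper bound, inverse Frobenius norm of the inverse for the lower bound) are simplifications chosen for brevity rather than the estimators a production code would use. The iteration coefficients follow the commonly published QR-based QDWH formulation.

```python
import numpy as np

def polar_via_svd(A):
    """Polar decomposition A = U_p @ H read off directly from the SVD."""
    U, s, Vh = np.linalg.svd(A)
    U_p = U @ Vh                 # orthogonal polar factor
    H = (Vh.T * s) @ Vh          # symmetric PSD factor V diag(s) V^T
    return U_p, H

def polar_via_qdwh(A, tol=1e-10, max_iter=20):
    """Polar decomposition via the QR-based dynamically weighted Halley iteration.

    Assumes A is square, real and nonsingular (illustrative sketch only).
    """
    n = A.shape[0]
    # Assumption: the Frobenius norm is a cheap upper bound on the largest singular value.
    alpha = np.linalg.norm(A, "fro")
    X = A / alpha
    # Assumption: 1 / ||X^{-1}||_F is a valid lower bound on the smallest singular value.
    l = min(1.0, 1.0 / np.linalg.norm(np.linalg.inv(X), "fro"))
    for _ in range(max_iter):
        l2 = l * l
        # Dynamically chosen weights a, b, c for this step.
        d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * np.sqrt(1.0 + d)))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # QR of the stacked matrix [sqrt(c) X; I] replaces an explicit inverse.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[:n], Q[n:]
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        # Updated lower bound on the smallest singular value of the new iterate.
        l = min(1.0, l * (a + b * l2) / (1.0 + c * l2))
        converged = (np.linalg.norm(X_new - X, "fro")
                     <= tol * np.linalg.norm(X_new, "fro"))
        X = X_new
        if converged:
            break
    H = X.T @ A
    return X, 0.5 * (H + H.T)    # symmetrize H to remove rounding asymmetry

def svd_via_polar(A):
    """SVD built from the polar factors: A = U_p H and H = V diag(w) V^T."""
    U_p, H = polar_via_qdwh(A)
    w, V = np.linalg.eigh(H)     # eigenvalues in ascending order
    w, V = w[::-1], V[:, ::-1]   # reorder to the descending SVD convention
    return U_p @ V, w, V.T
```

Calling polar_via_qdwh(A) returns the orthogonal factor and the symmetric factor, and svd_via_polar(A) returns (U, s, Vt) in the same layout as np.linalg.svd. The iteration relies only on QR factorizations and matrix products, which is the compute-bound character the abstract refers to; the extra symmetric eigensolve in svd_via_polar corresponds to the pre-processing use of the polar decomposition described there.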

Details

Title: High Performance Polar Decomposition on Distributed Memory Systems
Creators: Dalal Sukkari; Hatem Ltaief; David Keyes (all King Abdullah University of Science and Technology)
Contributors: P F Dutot; D Trystram
Publication Details: EURO-PAR 2016: PARALLEL PROCESSING, Vol.9833, pp.605-616
Series: Lecture Notes in Computer Science
Publisher: Springer Nature
Number of pages: 12
Identifiers: 9941230208331
Academic Unit: King Abdullah University of Science & Technology
Language: English
Resource Type: Conference proceeding