High Performance by Design
How Fast a Variable Band Solver Could Speed Up in a Parallel Environment
[Posted by Jenn-Ching Luo on Apr. 18, 2011 ]
This post cites the research paper "A Parallel-Vector Algorithm for Rapid Structural Analysis on High-Performance Computers" by Dr. Olaf O. Storaasli to show how much a variable band solver can speed up in a parallel environment. Dr. Storaasli's paper can be read here,
The Storaasli paper dates back to 1990 and investigated parallel structural analysis. In Dr. Storaasli's paper, the stiffness matrices had a variable bandwidth, and variable band solvers were applied.
In the previous post "Grandpa on Multicores (2)", this writer showed some performance results for the parallel skyline solvers of LAIPE. A skyline solver is a member of the family of variable band solvers. Since Dr. Storaasli's research also reported the performance of variable band solvers, this post puts together the parallel performance results from the Storaasli paper and from the previous post "Grandpa on Multicores (2)". We can then see how much a variable band solver can speed up on multicores and in a parallel environment.
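For readers unfamiliar with the term, the following is a minimal sketch of skyline (variable band) storage for a symmetric matrix. It is a generic illustration, not the storage layout used by LAIPE or by Dr. Storaasli's code; the names `to_skyline`, `col_start`, and `column_heights` are made up for this example. Each column keeps only the entries from its first nonzero row down to the diagonal, so the stored band varies from column to column.

```python
from typing import List, Tuple


def to_skyline(a: List[List[float]]) -> Tuple[List[float], List[int]]:
    """Pack the upper triangle of a symmetric matrix column by column.

    Column j stores the entries from its first nonzero row down to the
    diagonal. Column j occupies values[col_start[j]:col_start[j + 1]].
    """
    n = len(a)
    values: List[float] = []
    col_start = [0]
    for j in range(n):
        # Top of the "skyline": first row with a nonzero entry in column j.
        top = next((i for i in range(j + 1) if a[i][j] != 0.0), j)
        values.extend(a[i][j] for i in range(top, j + 1))
        col_start.append(len(values))
    return values, col_start


def column_heights(col_start: List[int]) -> List[int]:
    """Number of stored entries in each column (the variable band)."""
    return [col_start[j + 1] - col_start[j] for j in range(len(col_start) - 1)]


# Small symmetric example matrix (hypothetical data).
A = [
    [4.0, 1.0, 0.0, 0.0],
    [1.0, 5.0, 2.0, 0.0],
    [0.0, 2.0, 6.0, 0.0],
    [0.0, 0.0, 0.0, 7.0],
]
values, col_start = to_skyline(A)
heights = column_heights(col_start)
print(values, col_start, heights)
```

Only 6 of the 16 entries are stored here. A variable band solver works on this packed profile instead of the full matrix.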
Timing Results From Dr. Storaasli's Paper
Dr. Storaasli's research was implemented on an 8-processor Cray Y-MP supercomputer. Dr. Storaasli used two examples in his research: one was an aircraft and the other was a space shuttle solid rocket booster.
In the aircraft example, Dr. Storaasli formulated the problem to have a sparse, symmetric, positive definite stiffness matrix of order 16146x16146, with a maximal bandwidth of 600 and an average bandwidth of 321. Table 1 of Dr. Storaasli's paper gives the time spent decomposing the stiffness matrix into triangular matrices. We cite the timing result in the following:
We can see, from Dr. Storaasli's paper, that 2 processors yielded a speedup of 1.93, an efficiency of 96%; 8 processors yielded a speedup of 5.64, which is equivalent to 70% efficiency.
The second example of Dr. Storaasli was a space shuttle solid rocket booster. Dr. Storaasli formulated the booster into a stiffness matrix of order 54870x54870, with a maximal bandwidth of 900 and an average bandwidth of 383. The stiffness matrix was decomposed into triangular matrices, and the timing result from Table 2 of Dr. Storaasli's paper is as follows:
The second example of Dr. Storaasli showed that 8 processors could yield a speedup of 6.67, which is better than the first example. Parallel performance depends on sparsity and problem size, which might explain why the second example yielded a better speedup.
Dr. Storaasli's research gives us an idea of how much a variable band solver can speed up in a parallel environment. From Dr. Storaasli's paper, the speedup on 8 processors can reach 6.67, which is equivalent to 83% efficiency.
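The remark about problem size can be illustrated with a rough work estimate. The sketch below assumes that the cost of a banded factorization grows roughly as n times b squared, where n is the matrix order and b is the average bandwidth quoted above. This is a textbook-style approximation, not a figure from Dr. Storaasli's paper, and the function name `band_work` is hypothetical.

```python
def band_work(order: int, avg_bandwidth: int) -> int:
    """Rough factorization work, assuming cost ~ n * b**2 (an approximation)."""
    return order * avg_bandwidth ** 2


# Matrix orders and average bandwidths as quoted in the post.
aircraft = band_work(16146, 321)
booster = band_work(54870, 383)

# Ratio of estimated work between the two examples.
ratio = booster / aircraft
print(f"booster / aircraft work ratio: {ratio:.1f}")
```

Under this assumption the booster problem carries several times more work than the aircraft problem, so each processor has more computation over which to spread the parallel overhead.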
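The efficiency figures follow from dividing the speedup by the number of processors. The snippet below is a small sketch of that arithmetic using the speedups quoted from Dr. Storaasli's paper; the function name `efficiency` is chosen here for illustration.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency: speedup divided by the number of processors."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# (processors, speedup) pairs quoted in this post.
reported = [(2, 1.93), (8, 5.64), (8, 6.67)]
for p, s in reported:
    print(f"{p} processors, speedup {s}: efficiency {efficiency(s, p):.1%}")
```

The computed values are about 96.5%, 70.5%, and 83.4%; the percentages in this post are the same numbers with the fraction dropped.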
Timing Results From "Grandpa on Multicores (2)"
Now, we look back at the parallel performance for 16-byte REAL variables in the previous post "Grandpa on Multicores (2)". We cite the timing result in the following:
Grandpa can yield an almost perfect speedup. 8 cores reach a speedup of 7.85X, an efficiency of 98% on 8 cores.
Compared with the timing results of Dr. Storaasli's paper, Grandpa LAIPE yields a better speedup. It is unclear when we will see a parallel skyline solver that yields a better speedup than Grandpa LAIPE.