Equation Solution
High Performance by Design

Grandpa on Multicores (4)

[Posted by Jenn-Ching Luo on May 02, 2011 ]

This post shows how grandpa LAIPE decomposes a general dense matrix on multicores. A dense matrix is the simplest type; for example, it has the form:
```
      /                       \
      |  x  x  x  x  x  x  x  |
      |  x  x  x  x  x  x  x  |
      |  x  x  x  x  x  x  x  |
[A] = |  x  x  x  x  x  x  x  |
      |  x  x  x  x  x  x  x  |
      |  x  x  x  x  x  x  x  |
      |  x  x  x  x  x  x  x  |
      \                       /
```
We are going to see how efficiently grandpa LAIPE decomposes the matrix [A] into [L][U] in parallel. Performance results for multicores are not often seen on the internet. This blog conducts a series of performance tests, which gives us an opportunity to see speedup and efficiency. In this post, we look at how efficiently multicores speed up the decomposition of a dense matrix into triangular matrices.
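To make the factorization [A] = [L][U] concrete, here is a minimal serial sketch in Python. It uses the textbook Doolittle scheme without pivoting and is only an illustration; it is not LAIPE's parallel algorithm, and the names `lu_decompose` and `matmul` are hypothetical helpers introduced for this example.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting: returns (L, U) with A = L U."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = 1.0  # L has a unit diagonal
        # Fill row i of U.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        # Fill column i of L below the diagonal.
        for j in range(i + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


def matmul(x, y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# A small diagonally dominant matrix, chosen so that no pivoting is needed.
A = [[4.0, 1.0, 2.0],
     [1.0, 5.0, 1.0],
     [2.0, 1.0, 6.0]]
L, U = lu_decompose(A)
```

Multiplying `L` by `U` with `matmul` reproduces `A` up to rounding, with `L` lower triangular and `U` upper triangular.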

The test matrix is of order 3000 (3000x3000) and is stored in 16-byte REAL variables. A pseudo-random process generated the matrix. The test example called the LAIPE subroutine decompose_DAG_16 to decompose the matrix. Grandpa LAIPE is ancient software. As mentioned previously, grandpa LAIPE was programmed with an ancient parallel concept, and its base language is Fortran 77. The testing platform remained the same as in the first post, "Grandpa on Multicores (1)". We obtained the following set of timing results.

core: 1
Elapsed Time (Seconds): 2739.05
CPU Time in User Mode (Seconds): 2738.96
CPU Time in Kernel Mode (Seconds): 0.09
Total CPU Time (Seconds): 2739.05

cores: 2
Elapsed Time (Seconds): 1373.54
CPU Time in User Mode (Seconds): 2735.79
CPU Time in Kernel Mode (Seconds): 0.19
Total CPU Time (Seconds): 2735.98

cores: 3
Elapsed Time (Seconds): 929.64
CPU Time in User Mode (Seconds): 2766.10
CPU Time in Kernel Mode (Seconds): 0.19
Total CPU Time (Seconds): 2766.29

cores: 4
Elapsed Time (Seconds): 704.70
CPU Time in User Mode (Seconds): 2784.48
CPU Time in Kernel Mode (Seconds): 0.23
Total CPU Time (Seconds): 2784.71

cores: 5
Elapsed Time (Seconds): 568.64
CPU Time in User Mode (Seconds): 2797.36
CPU Time in Kernel Mode (Seconds): 0.19
Total CPU Time (Seconds): 2797.55

cores: 6
Elapsed Time (Seconds): 476.54
CPU Time in User Mode (Seconds): 2801.04
CPU Time in Kernel Mode (Seconds): 0.12
Total CPU Time (Seconds): 2801.17

cores: 7
Elapsed Time (Seconds): 410.95
CPU Time in User Mode (Seconds): 2807.19
CPU Time in Kernel Mode (Seconds): 0.19
Total CPU Time (Seconds): 2807.38

cores: 8
Elapsed Time (Seconds): 361.45
CPU Time in User Mode (Seconds): 2809.34
CPU Time in Kernel Mode (Seconds): 0.27
Total CPU Time (Seconds): 2809.61
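
One way to read these raw timings, offered as a rough sketch rather than part of the original test: total CPU time divided by elapsed time gives the average number of cores kept busy during a run. The dictionary `timings` and the function `busy_cores` are hypothetical names; the numbers are copied from the listing above.

```python
# (elapsed seconds, total CPU seconds) for each core count, from the listing above.
timings = {
    1: (2739.05, 2739.05),
    2: (1373.54, 2735.98),
    3: (929.64, 2766.29),
    4: (704.70, 2784.71),
    5: (568.64, 2797.55),
    6: (476.54, 2801.17),
    7: (410.95, 2807.38),
    8: (361.45, 2809.61),
}


def busy_cores(cores):
    """Average number of cores doing work: total CPU time / elapsed time."""
    elapsed, cpu = timings[cores]
    return cpu / elapsed


for p in sorted(timings):
    print(f"{p} cores requested, about {busy_cores(p):.2f} busy on average")
```

A value close to the requested core count means the cores spent little time idle.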

A quick look at the timing results shows that elapsed time fell almost linearly as cores were added. We summarize the timing data as speedup and efficiency:

| Number of cores | Elapsed time (sec.) | Speedup | Efficiency (%) |
|---|---|---|---|
| 1 | 2739.05 | 1.00 | 100.00 |
| 2 | 1373.54 | 1.99 | 99.71 |
| 3 | 929.64 | 2.95 | 98.21 |
| 4 | 704.70 | 3.89 | 97.17 |
| 5 | 568.64 | 4.82 | 96.34 |
| 6 | 476.54 | 5.75 | 95.80 |
| 7 | 410.95 | 6.67 | 95.22 |
| 8 | 361.45 | 7.58 | 94.72 |

On 2 cores, speedup reached 1.99x, which corresponds to an efficiency of 99.71%; on 8 cores, grandpa sped up to 7.58x, an efficiency of 94.72%.
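
The speedup and efficiency figures can be reproduced from the elapsed times alone. The sketch below assumes the usual definitions, which match the reported numbers: speedup is the 1-core elapsed time divided by the p-core elapsed time, and efficiency is speedup divided by p, as a percentage. The names `elapsed`, `speedup`, and `efficiency` are hypothetical and chosen for this example.

```python
# Elapsed times in seconds, copied from the timing results in this post.
elapsed = {1: 2739.05, 2: 1373.54, 3: 929.64, 4: 704.70,
           5: 568.64, 6: 476.54, 7: 410.95, 8: 361.45}


def speedup(cores):
    """Ratio of the 1-core elapsed time to the elapsed time on `cores` cores."""
    return elapsed[1] / elapsed[cores]


def efficiency(cores):
    """Speedup per core, expressed as a percentage."""
    return 100.0 * speedup(cores) / cores


for p in sorted(elapsed):
    print(f"{p} cores: speedup {speedup(p):.2f}x, efficiency {efficiency(p):.2f}%")
```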

The ancient software, grandpa LAIPE, showed us an almost perfect speedup again.