Equation Solution High Performance by Design
Grandpa on Multicores (4)
[Posted by Jenn-Ching Luo on May 02, 2011]
This post shows how grandpa LAIPE decomposes a general dense matrix on multicores. A dense matrix is the simplest type of matrix; for example, it has the form:
      /               \
      | x x x x x x x |
      | x x x x x x x |
      | x x x x x x x |
[A] = | x x x x x x x |
      | x x x x x x x |
      | x x x x x x x |
      | x x x x x x x |
      \               /

The test matrix is of order 3000x3000 and is stored in 16-byte REAL variables. The matrix was generated by a pseudo-random process. The test example called the LAIPE subroutine decompose_DAG_16 to decompose the matrix. Grandpa LAIPE is ancient software. As mentioned previously, grandpa LAIPE was programmed with an ancient parallel concept, and its base language is Fortran 77. The testing platform remained the same as in the first post, "Grandpa on Multicores (1)". We obtained the following set of timing results.
core: 1
    Elapsed Time (Seconds): 2739.05
    CPU Time in User Mode (Seconds): 2738.96
    CPU Time in Kernel Mode (Seconds): 0.09
    Total CPU Time (Seconds): 2739.05

cores: 2
    Elapsed Time (Seconds): 1373.54
    CPU Time in User Mode (Seconds): 2735.79
    CPU Time in Kernel Mode (Seconds): 0.19
    Total CPU Time (Seconds): 2735.98

cores: 3
    Elapsed Time (Seconds): 929.64
    CPU Time in User Mode (Seconds): 2766.10
    CPU Time in Kernel Mode (Seconds): 0.19
    Total CPU Time (Seconds): 2766.29

cores: 4
    Elapsed Time (Seconds): 704.70
    CPU Time in User Mode (Seconds): 2784.48
    CPU Time in Kernel Mode (Seconds): 0.23
    Total CPU Time (Seconds): 2784.71

cores: 5
    Elapsed Time (Seconds): 568.64
    CPU Time in User Mode (Seconds): 2797.36
    CPU Time in Kernel Mode (Seconds): 0.19
    Total CPU Time (Seconds): 2797.55

cores: 6
    Elapsed Time (Seconds): 476.54
    CPU Time in User Mode (Seconds): 2801.04
    CPU Time in Kernel Mode (Seconds): 0.12
    Total CPU Time (Seconds): 2801.17

cores: 7
    Elapsed Time (Seconds): 410.95
    CPU Time in User Mode (Seconds): 2807.19
    CPU Time in Kernel Mode (Seconds): 0.19
    Total CPU Time (Seconds): 2807.38

cores: 8
    Elapsed Time (Seconds): 361.45
    CPU Time in User Mode (Seconds): 2809.34
    CPU Time in Kernel Mode (Seconds): 0.27
    Total CPU Time (Seconds): 2809.61

A quick look at the timing results shows that the elapsed time fell almost linearly as cores were added, while the total CPU time rose only slightly, from 2739.05 seconds on 1 core to 2809.61 seconds on 8 cores. The timing data can be summarized as speedup and efficiency:
On 2 cores, the speedup reached 1.99x, which is equivalent to an efficiency of 99.71%; on 8 cores, grandpa sped up by 7.58x, an efficiency of 94.72%. The ancient software, grandpa LAIPE, once again showed us an almost perfect speedup.
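The speedup and efficiency figures quoted above can be reproduced from the elapsed times alone. The following is a minimal sketch, not the author's own calculation: it assumes speedup is the 1-core elapsed time divided by the n-core elapsed time, and efficiency is speedup divided by the number of cores. The function name speedup_and_efficiency is made up for illustration.

```python
# Elapsed times in seconds, copied from the timing results in this post,
# keyed by the number of cores used.
elapsed = {
    1: 2739.05,
    2: 1373.54,
    3: 929.64,
    4: 704.70,
    5: 568.64,
    6: 476.54,
    7: 410.95,
    8: 361.45,
}


def speedup_and_efficiency(elapsed_by_cores):
    """Return {cores: (speedup, efficiency_percent)} relative to the 1-core run.

    Assumed definitions (not stated explicitly in the post):
      speedup    = T(1) / T(n)
      efficiency = speedup / n, expressed as a percentage
    """
    baseline = elapsed_by_cores[1]
    summary = {}
    for cores, seconds in sorted(elapsed_by_cores.items()):
        speedup = baseline / seconds
        efficiency = 100.0 * speedup / cores
        summary[cores] = (speedup, efficiency)
    return summary


summary = speedup_and_efficiency(elapsed)
for cores, (speedup, efficiency) in summary.items():
    print(f"cores: {cores}  speedup: {speedup:.2f}x  efficiency: {efficiency:.2f}%")
```

Under these assumed definitions, the 2-core and 8-core rows agree with the 1.99x / 99.71% and 7.58x / 94.72% figures quoted in the post.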