

Implementation of Constant-Bandwidth Solver
of LAIPE2 on 48 Cores


[Posted by Jenn-Ching Luo on Nov. 20, 2014 ]

      This post shows the parallel performance of a LAIPE2 solver on a 48-core computer. Multiple cores can speed up a computation, and some readers may want to see how much faster a multicore computer can be. This post presents a set of timing results to show that performance.

      The test was run on a Dell PowerEdge R815 with four 1.9 GHz Opteron processors, each of which has 12 cores. Only the basic core of Microsoft Server 2008 was installed on the computer, without a graphical user interface, i.e., a command-driven environment.

      The test problem was a sparse, symmetric, positive definite system of equations, [A]{X}={B}, where [A] is of order (50,000x50,000) and has a constant bandwidth of 7,000.

      The solution invoked laipe2$decompose_CSP_10 and laipe2$substitute_CSP_10. The subroutine laipe2$decompose_CSP_10 decomposes matrix [A] into a product of triangular matrices; the subroutine laipe2$substitute_CSP_10 performs the substitutions. Substitution is not a time-consuming procedure and is of less interest. This post shows only the time spent in the subroutine laipe2$decompose_CSP_10.
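
      To make the two steps concrete, here is a minimal sketch in Python. It is not the LAIPE2 code: it assumes a Cholesky factorization [A] = [L][L]^T as one possible "product of triangular matrices", stores the matrix densely for readability, and the names band_cholesky, band_solve, and half_bandwidth are made up for this illustration. It also assumes the bandwidth is counted on one side of the diagonal, which the post does not specify.

```python
import math

def band_cholesky(a, half_bandwidth):
    """Factor a symmetric positive definite matrix a (list of lists) as L * L^T.

    Assumes a[i][j] == 0 whenever |i - j| > half_bandwidth, so the inner
    loops only touch entries inside the band.
    """
    n = len(a)
    m = half_bandwidth
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        first = max(0, j - m)
        # Diagonal entry of column j.
        diag = a[j][j] - sum(low[j][k] ** 2 for k in range(first, j))
        low[j][j] = math.sqrt(diag)
        # Entries below the diagonal that lie inside the band.
        for i in range(j + 1, min(n, j + m + 1)):
            start = max(0, i - m)
            dot = sum(low[i][k] * low[j][k] for k in range(start, j))
            low[i][j] = (a[i][j] - dot) / low[j][j]
    return low

def band_solve(low, b, half_bandwidth):
    """Solve L * L^T * x = b by forward and backward substitution."""
    n = len(low)
    m = half_bandwidth
    y = [0.0] * n
    for i in range(n):  # forward substitution: L * y = b
        s = sum(low[i][k] * y[k] for k in range(max(0, i - m), i))
        y[i] = (b[i] - s) / low[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # backward substitution: L^T * x = y
        s = sum(low[k][i] * x[k] for k in range(i + 1, min(n, i + m + 1)))
        x[i] = (y[i] - s) / low[i][i]
    return x

# Small tridiagonal example (half bandwidth 1) with a known solution.
n = 5
a = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [sum(a[i][j] * x_true[j] for j in range(n)) for i in range(n)]
x = band_solve(band_cholesky(a, 1), b, 1)
print(x)  # should reproduce x_true up to rounding error
```

      In this sketch the decomposition has three nested loops while each substitution has two, which matches the observation that the decomposition dominates the running time.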

QUICK LOOK AT THE IMPRESSIVE PERFORMANCE

      From the timing results below, it can be seen that one core took 13878.71 seconds to solve the test problem, which is about 3 hours and 51 minutes. That is a long computation. Complex engineering or scientific problems can lead to systems with more than 100,000 unknowns, which can require even more time than this test problem. Efficient parallel solvers that speed up such computations are always desirable.

      An interesting question is how fast 48 cores can solve the test problem. Let us see.

      From the timing results below, we can see that 48 cores took only 332.84 seconds, about 5 minutes and 33 seconds. Wow! A computation of nearly 4 hours on one core could be done in about 5.5 minutes on 48 cores. Clearly, there is no reason to reject parallel computing. Undoubtedly, we can benefit from multicore hardware and parallel programming.
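
      The two elapsed times can be converted and compared with a few lines of Python. This is only an arithmetic check on the figures quoted above; the helper name to_hms is made up for this illustration.

```python
def to_hms(seconds):
    """Split a duration in seconds into (hours, minutes, seconds)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return int(hours), int(minutes), round(secs, 2)

one_core = 13878.71   # elapsed seconds on 1 core, from the timing results
all_cores = 332.84    # elapsed seconds on 48 cores, from the timing results

print(to_hms(one_core))                 # hours, minutes, seconds on 1 core
print(to_hms(all_cores))                # hours, minutes, seconds on 48 cores
print(round(one_core / all_cores, 2))   # how many times faster 48 cores were
```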

      One caveat should be mentioned here. Parallel computing is not everything. Sometimes it is an extra burden and brings no benefit. This can happen when the problem is small and does not have enough computation to distribute among the cooperating cores. In that situation, parallel computing gains nothing and, on the contrary, may slow the computation down.
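
      A toy cost model illustrates the point. The model and its numbers are purely hypothetical and are not measurements from LAIPE2: it assumes the work divides evenly among the cores and that every additional core adds a fixed coordination cost.

```python
def modeled_time(work, cores, overhead_per_core):
    """Hypothetical elapsed time: evenly divided work plus a fixed
    coordination cost for every core beyond the first."""
    return work / cores + overhead_per_core * (cores - 1)

# Assumed values, chosen only for illustration (arbitrary time units).
overhead = 0.5
for work in (1000.0, 1.0):
    serial = modeled_time(work, 1, overhead)
    parallel = modeled_time(work, 8, overhead)
    print(work, serial, parallel, parallel < serial)
```

      With the large amount of work, 8 cores are faster in this model; with the small amount, the coordination cost outweighs the gain and 8 cores are slower than 1.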

TIMING RESULTS

      Detailed timing results are listed below. The results include the elapsed time, the CPU time in user mode and in kernel mode, and the total CPU time. While solving the test problem, the computer had no other running jobs except the operating system. Under that circumstance the computer can be regarded as standalone, so the elapsed time can be used to measure speedup and efficiency.

      In general, the total CPU time should increase when more cores are used, because more cores incur relatively higher overhead and more synchronization. However, "good luck" may reduce the total CPU time. Before an instruction is executed, its data must be loaded into cache memory. When one core demands a specific piece of data and loads it into cache, other lucky cores sharing that cache that also need the data may access it without loading it again. The total CPU time may be reduced if such "good luck" occurs most of the time. When it does, we can see "super linear performance".

      Besides "good luck", we may also experience "bad luck" from cache coherency, which can degrade parallel performance.

      With the brief description above, the following timing results are understandable. The total CPU time does not increase steadily with the number of cores. The following results use the word "processors", which effectively means "cores". The test program was developed before multicore processors were invented. In those days, we used "processors" as the units of parallel processing.

      Processor:  1
      Elapsed Time (Seconds): 13878.71
      CPU Time in User Mode (Seconds): 13877.04
      CPU Time in Kernel Mode (Seconds):     1.67
      Total CPU Time (Seconds): 13878.71
  
      Processors:  2
      Elapsed Time (Seconds):  6317.14
      CPU Time in User Mode (Seconds): 12629.31
      CPU Time in Kernel Mode (Seconds):     1.98
      Total CPU Time (Seconds): 12631.29
  
      Processors:  3
      Elapsed Time (Seconds):  4234.10
      CPU Time in User Mode (Seconds): 12693.50
      CPU Time in Kernel Mode (Seconds):     2.11
      Total CPU Time (Seconds): 12695.61
  
      Processors:  4
      Elapsed Time (Seconds):  3193.19
      CPU Time in User Mode (Seconds): 12759.43
      CPU Time in Kernel Mode (Seconds):     2.31
      Total CPU Time (Seconds): 12761.74
  
      Processors:  5
      Elapsed Time (Seconds):  2567.09
      CPU Time in User Mode (Seconds): 12818.01
      CPU Time in Kernel Mode (Seconds):     2.57
      Total CPU Time (Seconds): 12820.58
  
      Processors:  6
      Elapsed Time (Seconds):  2148.12
      CPU Time in User Mode (Seconds): 12867.32
      CPU Time in Kernel Mode (Seconds):     2.96
      Total CPU Time (Seconds): 12870.29
  
      Processors:  7
      Elapsed Time (Seconds):  1852.99
      CPU Time in User Mode (Seconds): 12942.69
      CPU Time in Kernel Mode (Seconds):     2.84
      Total CPU Time (Seconds): 12945.52
  
      Processors:  8
      Elapsed Time (Seconds):  1633.27
      CPU Time in User Mode (Seconds): 13030.03
      CPU Time in Kernel Mode (Seconds):     3.01
      Total CPU Time (Seconds): 13033.04
  
      Processors:  9
      Elapsed Time (Seconds):  1462.84
      CPU Time in User Mode (Seconds): 13124.40
      CPU Time in Kernel Mode (Seconds):     3.09
      Total CPU Time (Seconds): 13127.48
  
      Processors: 10
      Elapsed Time (Seconds):  1326.31
      CPU Time in User Mode (Seconds): 13215.48
      CPU Time in Kernel Mode (Seconds):     3.04
      Total CPU Time (Seconds): 13218.53
  
      Processors: 11
      Elapsed Time (Seconds):  1216.07
      CPU Time in User Mode (Seconds): 13313.34
      CPU Time in Kernel Mode (Seconds):     3.29
      Total CPU Time (Seconds): 13316.64
  
      Processors: 12
      Elapsed Time (Seconds):  1122.52
      CPU Time in User Mode (Seconds): 13410.52
      CPU Time in Kernel Mode (Seconds):     3.56
      Total CPU Time (Seconds): 13414.07
  
      Processors: 13
      Elapsed Time (Seconds):  1045.47
      CPU Time in User Mode (Seconds): 13515.02
      CPU Time in Kernel Mode (Seconds):     3.95
      Total CPU Time (Seconds): 13518.97
  
      Processors: 14
      Elapsed Time (Seconds):   979.37
      CPU Time in User Mode (Seconds): 13628.78
      CPU Time in Kernel Mode (Seconds):     3.62
      Total CPU Time (Seconds): 13632.40
  
      Processors: 15
      Elapsed Time (Seconds):   923.06
      CPU Time in User Mode (Seconds): 13752.11
      CPU Time in Kernel Mode (Seconds):     3.70
      Total CPU Time (Seconds): 13755.81
  
      Processors: 16
      Elapsed Time (Seconds):   873.25
      CPU Time in User Mode (Seconds): 13873.36
      CPU Time in Kernel Mode (Seconds):     3.96
      Total CPU Time (Seconds): 13877.32
  
      Processors: 17
      Elapsed Time (Seconds):   833.70
      CPU Time in User Mode (Seconds): 14036.14
      CPU Time in Kernel Mode (Seconds):     3.90
      Total CPU Time (Seconds): 14040.04
  
      Processors: 18
      Elapsed Time (Seconds):   797.43
      CPU Time in User Mode (Seconds): 14208.98
      CPU Time in Kernel Mode (Seconds):     4.41
      Total CPU Time (Seconds): 14213.39
  
      Processors: 19
      Elapsed Time (Seconds):   748.90
      CPU Time in User Mode (Seconds): 14097.19
      CPU Time in Kernel Mode (Seconds):     4.32
      Total CPU Time (Seconds): 14101.51
  
      Processors: 20
      Elapsed Time (Seconds):   714.03
      CPU Time in User Mode (Seconds): 14133.80
      CPU Time in Kernel Mode (Seconds):     4.23
      Total CPU Time (Seconds): 14138.03
  
      Processors: 21
      Elapsed Time (Seconds):   682.24
      CPU Time in User Mode (Seconds): 14151.40
      CPU Time in Kernel Mode (Seconds):     4.66
      Total CPU Time (Seconds): 14156.06
  
      Processors: 22
      Elapsed Time (Seconds):   651.62
      CPU Time in User Mode (Seconds): 14145.34
      CPU Time in Kernel Mode (Seconds):     4.35
      Total CPU Time (Seconds): 14149.70
  
      Processors: 23
      Elapsed Time (Seconds):   622.76
      CPU Time in User Mode (Seconds): 14140.23
      CPU Time in Kernel Mode (Seconds):     4.63
      Total CPU Time (Seconds): 14144.86
  
      Processors: 24
      Elapsed Time (Seconds):   596.58
      CPU Time in User Mode (Seconds): 14129.90
      CPU Time in Kernel Mode (Seconds):     4.65
      Total CPU Time (Seconds): 14134.55
  
      Processors: 25
      Elapsed Time (Seconds):   572.91
      CPU Time in User Mode (Seconds): 14132.85
      CPU Time in Kernel Mode (Seconds):     4.87
      Total CPU Time (Seconds): 14137.72
  
      Processors: 26
      Elapsed Time (Seconds):   555.04
      CPU Time in User Mode (Seconds): 14184.20
      CPU Time in Kernel Mode (Seconds):     5.37
      Total CPU Time (Seconds): 14189.57
  
      Processors: 27
      Elapsed Time (Seconds):   538.87
      CPU Time in User Mode (Seconds): 14235.92
      CPU Time in Kernel Mode (Seconds):     4.91
      Total CPU Time (Seconds): 14240.83
  
      Processors: 28
      Elapsed Time (Seconds):   522.49
      CPU Time in User Mode (Seconds): 14277.27
      CPU Time in Kernel Mode (Seconds):     5.12
      Total CPU Time (Seconds): 14282.39
  
      Processors: 29
      Elapsed Time (Seconds):   505.71
      CPU Time in User Mode (Seconds): 14300.58
      CPU Time in Kernel Mode (Seconds):     5.43
      Total CPU Time (Seconds): 14306.01
  
      Processors: 30
      Elapsed Time (Seconds):   488.50
      CPU Time in User Mode (Seconds): 14321.48
      CPU Time in Kernel Mode (Seconds):     4.84
      Total CPU Time (Seconds): 14326.32
  
      Processors: 31
      Elapsed Time (Seconds):   470.47
      CPU Time in User Mode (Seconds): 14286.46
      CPU Time in Kernel Mode (Seconds):     5.21
      Total CPU Time (Seconds): 14291.67
  
      Processors: 32
      Elapsed Time (Seconds):   456.58
      CPU Time in User Mode (Seconds): 14280.63
      CPU Time in Kernel Mode (Seconds):     5.71
      Total CPU Time (Seconds): 14286.34
  
      Processors: 33
      Elapsed Time (Seconds):   443.61
      CPU Time in User Mode (Seconds): 14274.22
      CPU Time in Kernel Mode (Seconds):     5.26
      Total CPU Time (Seconds): 14279.47
  
      Processors: 34
      Elapsed Time (Seconds):   430.88
      CPU Time in User Mode (Seconds): 14273.89
      CPU Time in Kernel Mode (Seconds):     6.01
      Total CPU Time (Seconds): 14279.89
  
      Processors: 35
      Elapsed Time (Seconds):   418.61
      CPU Time in User Mode (Seconds): 14289.91
      CPU Time in Kernel Mode (Seconds):     5.58
      Total CPU Time (Seconds): 14295.50
  
      Processors: 36
      Elapsed Time (Seconds):   407.52
      CPU Time in User Mode (Seconds): 14307.51
      CPU Time in Kernel Mode (Seconds):     5.49
      Total CPU Time (Seconds): 14313.00
  
      Processors: 37
      Elapsed Time (Seconds):   400.33
      CPU Time in User Mode (Seconds): 14361.81
      CPU Time in Kernel Mode (Seconds):     6.02
      Total CPU Time (Seconds): 14367.83
  
      Processors: 38
      Elapsed Time (Seconds):   392.30
      CPU Time in User Mode (Seconds): 14492.29
      CPU Time in Kernel Mode (Seconds):     5.88
      Total CPU Time (Seconds): 14498.17
  
      Processors: 39
      Elapsed Time (Seconds):   384.26
      CPU Time in User Mode (Seconds): 14460.33
      CPU Time in Kernel Mode (Seconds):     6.26
      Total CPU Time (Seconds): 14466.58
  
      Processors: 40
      Elapsed Time (Seconds):   378.27
      CPU Time in User Mode (Seconds): 14539.96
      CPU Time in Kernel Mode (Seconds):     6.12
      Total CPU Time (Seconds): 14546.08
  
      Processors: 41
      Elapsed Time (Seconds):   372.47
      CPU Time in User Mode (Seconds): 14620.93
      CPU Time in Kernel Mode (Seconds):     6.46
      Total CPU Time (Seconds): 14627.39
  
      Processors: 42
      Elapsed Time (Seconds):   366.40
      CPU Time in User Mode (Seconds): 14678.45
      CPU Time in Kernel Mode (Seconds):     6.35
      Total CPU Time (Seconds): 14684.80
  
      Processors: 43
      Elapsed Time (Seconds):   360.80
      CPU Time in User Mode (Seconds): 14743.30
      CPU Time in Kernel Mode (Seconds):     7.10
      Total CPU Time (Seconds): 14750.39
  
      Processors: 44
      Elapsed Time (Seconds):   355.57
      CPU Time in User Mode (Seconds): 14836.85
      CPU Time in Kernel Mode (Seconds):     6.21
      Total CPU Time (Seconds): 14843.06
  
      Processors: 45
      Elapsed Time (Seconds):   349.60
      CPU Time in User Mode (Seconds): 14898.11
      CPU Time in Kernel Mode (Seconds):     6.69
      Total CPU Time (Seconds): 14904.80
  
      Processors: 46
      Elapsed Time (Seconds):   343.76
      CPU Time in User Mode (Seconds): 14971.68
      CPU Time in Kernel Mode (Seconds):     6.33
      Total CPU Time (Seconds): 14978.02
  
      Processors: 47
      Elapsed Time (Seconds):   337.95
      CPU Time in User Mode (Seconds): 15031.01
      CPU Time in Kernel Mode (Seconds):     6.97
      Total CPU Time (Seconds): 15037.98
  
      Processors: 48
      Elapsed Time (Seconds):   332.84
      CPU Time in User Mode (Seconds): 15135.08
      CPU Time in Kernel Mode (Seconds):     6.97
      Total CPU Time (Seconds): 15142.05
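
      A listing in this format is easy to process. The following is a minimal sketch that parses three of the records above (1, 16, and 48 processors, copied verbatim), checks that user time plus kernel time matches the total, and reports whether the total CPU time fell below the one-core total. The function name parse_timings is made up for this illustration.

```python
import re

LISTING = """\
      Processor:  1
      Elapsed Time (Seconds): 13878.71
      CPU Time in User Mode (Seconds): 13877.04
      CPU Time in Kernel Mode (Seconds):     1.67
      Total CPU Time (Seconds): 13878.71

      Processors: 16
      Elapsed Time (Seconds):   873.25
      CPU Time in User Mode (Seconds): 13873.36
      CPU Time in Kernel Mode (Seconds):     3.96
      Total CPU Time (Seconds): 13877.32

      Processors: 48
      Elapsed Time (Seconds):   332.84
      CPU Time in User Mode (Seconds): 15135.08
      CPU Time in Kernel Mode (Seconds):     6.97
      Total CPU Time (Seconds): 15142.05
"""

def parse_timings(text):
    """Return one dict per 'Processor(s): N' record in the listing."""
    records = []
    for line in text.splitlines():
        head = re.match(r"\s*Processors?:\s*(\d+)", line)
        if head:
            records.append({"cores": int(head.group(1))})
            continue
        field = re.match(r"\s*(.+?) \(Seconds\):\s*([\d.]+)", line)
        if field and records:
            records[-1][field.group(1)] = float(field.group(2))
    return records

records = parse_timings(LISTING)
one_core_total = records[0]["Total CPU Time"]
for r in records:
    parts = r["CPU Time in User Mode"] + r["CPU Time in Kernel Mode"]
    # Allow for rounding in the printed values.
    consistent = abs(parts - r["Total CPU Time"]) < 0.02
    below_one_core = r["Total CPU Time"] < one_core_total
    print(r["cores"], consistent, below_one_core)
```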

SPEEDUP AND EFFICIENCY

      The above timing results are summarized in the following table.

 Number of Cores   Elapsed Time (Sec.)   Speedup   Efficiency (%)
 ----------------------------------------------------------------
        1               13878.71          1.0000      100.00
        2                6317.14          2.1970      109.85
        3                4234.10          3.2778      109.26
        4                3193.19          4.3463      108.66
        5                2567.09          5.4064      108.13
        6                2148.12          6.4609      107.68
        7                1852.99          7.4899      107.00
        8                1633.27          8.4975      106.22
        9                1462.84          9.4875      105.42
       10                1326.31         10.4642      104.64
       11                1216.07         11.4128      103.75
       12                1122.52         12.3639      103.03
       13                1045.47         13.2751      102.12
       14                 979.37         14.1711      101.22
       15                 923.06         15.0355      100.24
       16                 873.25         15.8932       99.33
       17                 833.70         16.6471       97.92
       18                 797.43         17.4043       96.69
       19                 748.90         18.5321       97.54
       20                 714.03         19.4371       97.19
       21                 682.24         20.3429       96.87
       22                 651.62         21.2988       96.81
       23                 622.76         22.2858       96.89
       24                 596.58         23.2638       96.93
       25                 572.91         24.2249       96.90
       26                 555.04         25.0049       96.17
       27                 538.87         25.7552       95.39
       28                 522.49         26.5626       94.87
       29                 505.71         27.4440       94.63
       30                 488.50         28.4109       94.70
       31                 470.47         29.4997       95.16
       32                 456.58         30.3971       94.99
       33                 443.61         31.2858       94.81
       34                 430.88         32.2102       94.74
       35                 418.61         33.1543       94.73
       36                 407.52         34.0565       94.60
       37                 400.33         34.6682       93.70
       38                 392.30         35.3778       93.10
       39                 384.26         36.1180       92.61
       40                 378.27         36.6900       91.73
       41                 372.47         37.2613       90.88
       42                 366.40         37.8786       90.19
       43                 360.80         38.4665       89.46
       44                 355.57         39.0323       88.71
       45                 349.60         39.6988       88.22
       46                 343.76         40.3733       87.77
       47                 337.95         41.0673       87.38
       48                 332.84         41.6978       86.87

      The first column is the number of cores that solved the test problem. The second column is the elapsed time in seconds. From the table, we can see that the elapsed time decreases steadily: the computation was sped up with every additional core. The third column is the speedup, i.e., the one-core elapsed time divided by the elapsed time on the given number of cores. The speedup shows super linear performance in the range of 2 to 15 cores. In super linear performance, the speedup is greater than the number of physical cores. For example, 2 cores give a speedup of 2.1970x. That looks illogical, but it happens and benefits from caching. The fourth column is the efficiency, i.e., the speedup divided by the number of cores, expressed as a percentage. The efficiency also shows super linear performance in the range of 2 to 15 cores, where it is higher than 100%. In this test problem, 48 cores yield an efficiency of 86.87%, which is highly efficient.
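
      The speedup and efficiency columns can be recomputed from the elapsed times. The sketch below assumes the usual definitions (speedup = T1 / Tp, efficiency = speedup / p x 100) and uses a few elapsed times copied from the table; the function names are made up for this illustration.

```python
def speedup(t_one, t_p):
    """Speedup: one-core elapsed time divided by p-core elapsed time."""
    return t_one / t_p

def efficiency(t_one, t_p, cores):
    """Efficiency in percent: speedup divided by the number of cores."""
    return 100.0 * speedup(t_one, t_p) / cores

# Elapsed times in seconds for selected core counts, from the table.
elapsed = {1: 13878.71, 2: 6317.14, 15: 923.06, 16: 873.25, 48: 332.84}

for cores, t_p in elapsed.items():
    s = speedup(elapsed[1], t_p)
    e = efficiency(elapsed[1], t_p, cores)
    # Efficiency above 100% marks super linear performance.
    print(cores, round(s, 4), round(e, 2), e > 100.0)
```

      For these rows the recomputed values agree with the table, and the efficiency crosses below 100% between 15 and 16 cores.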