Equation Solution  
    High Performance by Design

Parallel Performance of laipe$Decompose_DAG_4 on 48 Cores


[Posted by Jenn-Ching Luo on Dec. 24, 2015 ]

      This post shares a set of parallel performance results for the LAIPE2 subroutine laipe$Decompose_DAG_4 on 48 cores. LAIPE2 is programmed in neuLoop, which distributes computing over soft cores. neuLoop provides two types of soft cores: homogeneous cores and heterogeneous cores. The test program was linked against the homogeneous-core library.

      The computing environment is a Dell PowerEdge R815 with four 1.9 GHz 12-core Opteron processors, running Windows Server 2008. The compiler is gfortran. The test problem is a dense 20,000-by-20,000 matrix of 4-byte elements. The timing results are listed as follows.

Timing Results

  Core: 1
      Elapsed Time (Seconds): 4501.97
      CPU Time in User Mode (Seconds): 4501.81
      CPU Time in Kernel Mode (Seconds): 0.14
      Total CPU Time (Seconds): 4501.96

  Cores: 2
      Elapsed Time (Seconds): 2136.72
      CPU Time in User Mode (Seconds): 4261.45
      CPU Time in Kernel Mode (Seconds): 0.30
      Total CPU Time (Seconds): 4261.74

  Cores: 3
      Elapsed Time (Seconds): 1433.56
      CPU Time in User Mode (Seconds): 4263.77
      CPU Time in Kernel Mode (Seconds): 0.16
      Total CPU Time (Seconds): 4263.93

  Cores: 4
      Elapsed Time (Seconds): 1079.81
      CPU Time in User Mode (Seconds): 4268.25
      CPU Time in Kernel Mode (Seconds): 0.23
      Total CPU Time (Seconds): 4268.48

  Cores: 5
      Elapsed Time (Seconds): 870.33
      CPU Time in User Mode (Seconds): 4271.93
      CPU Time in Kernel Mode (Seconds): 0.45
      Total CPU Time (Seconds): 4272.38

  Cores: 6
      Elapsed Time (Seconds): 729.94
      CPU Time in User Mode (Seconds): 4278.50
      CPU Time in Kernel Mode (Seconds): 0.56
      Total CPU Time (Seconds): 4279.06

  Cores: 7
      Elapsed Time (Seconds): 638.90
      CPU Time in User Mode (Seconds): 4344.53
      CPU Time in Kernel Mode (Seconds): 0.62
      Total CPU Time (Seconds): 4345.16

  Cores: 8
      Elapsed Time (Seconds): 567.95
      CPU Time in User Mode (Seconds): 4394.81
      CPU Time in Kernel Mode (Seconds): 0.97
      Total CPU Time (Seconds): 4395.78

  Cores: 9
      Elapsed Time (Seconds): 513.43
      CPU Time in User Mode (Seconds): 4442.71
      CPU Time in Kernel Mode (Seconds): 1.67
      Total CPU Time (Seconds): 4444.38

  Cores: 10
      Elapsed Time (Seconds): 469.56
      CPU Time in User Mode (Seconds): 4490.40
      CPU Time in Kernel Mode (Seconds): 1.62
      Total CPU Time (Seconds): 4492.02

  Cores: 11
      Elapsed Time (Seconds): 434.03
      CPU Time in User Mode (Seconds): 4541.67
      CPU Time in Kernel Mode (Seconds): 1.44
      Total CPU Time (Seconds): 4543.11

  Cores: 12
      Elapsed Time (Seconds): 402.70
      CPU Time in User Mode (Seconds): 4573.37
      CPU Time in Kernel Mode (Seconds): 2.11
      Total CPU Time (Seconds): 4575.48

  Cores: 13
      Elapsed Time (Seconds): 375.71
      CPU Time in User Mode (Seconds): 4591.56
      CPU Time in Kernel Mode (Seconds): 1.65
      Total CPU Time (Seconds): 4593.22

  Cores: 14
      Elapsed Time (Seconds): 352.75
      CPU Time in User Mode (Seconds): 4613.50
      CPU Time in Kernel Mode (Seconds): 1.45
      Total CPU Time (Seconds): 4614.95

  Cores: 15
      Elapsed Time (Seconds): 333.64
      CPU Time in User Mode (Seconds): 4649.58
      CPU Time in Kernel Mode (Seconds): 2.45
      Total CPU Time (Seconds): 4652.03

  Cores: 16
      Elapsed Time (Seconds): 317.20
      CPU Time in User Mode (Seconds): 4682.28
      CPU Time in Kernel Mode (Seconds): 2.29
      Total CPU Time (Seconds): 4684.57

  Cores: 17
      Elapsed Time (Seconds): 302.72
      CPU Time in User Mode (Seconds): 4718.94
      CPU Time in Kernel Mode (Seconds): 3.20
      Total CPU Time (Seconds): 4722.13

  Cores: 18
      Elapsed Time (Seconds): 288.65
      CPU Time in User Mode (Seconds): 4738.80
      CPU Time in Kernel Mode (Seconds): 2.46
      Total CPU Time (Seconds): 4741.26

  Cores: 19
      Elapsed Time (Seconds): 277.84
      CPU Time in User Mode (Seconds): 4784.32
      CPU Time in Kernel Mode (Seconds): 2.48
      Total CPU Time (Seconds): 4786.80

  Cores: 20
      Elapsed Time (Seconds): 266.92
      CPU Time in User Mode (Seconds): 4798.61
      CPU Time in Kernel Mode (Seconds): 2.40
      Total CPU Time (Seconds): 4801.01

  Cores: 21
      Elapsed Time (Seconds): 258.49
      CPU Time in User Mode (Seconds): 4847.26
      CPU Time in Kernel Mode (Seconds): 3.48
      Total CPU Time (Seconds): 4850.74

  Cores: 22
      Elapsed Time (Seconds): 248.96
      CPU Time in User Mode (Seconds): 4864.19
      CPU Time in Kernel Mode (Seconds): 3.01
      Total CPU Time (Seconds): 4867.20

  Cores: 23
      Elapsed Time (Seconds): 242.85
      CPU Time in User Mode (Seconds): 4922.30
      CPU Time in Kernel Mode (Seconds): 3.23
      Total CPU Time (Seconds): 4925.53

  Cores: 24
      Elapsed Time (Seconds): 237.93
      CPU Time in User Mode (Seconds): 4989.52
      CPU Time in Kernel Mode (Seconds): 3.34
      Total CPU Time (Seconds): 4992.86

  Cores: 25
      Elapsed Time (Seconds): 230.57
      CPU Time in User Mode (Seconds): 5007.68
      CPU Time in Kernel Mode (Seconds): 3.00
      Total CPU Time (Seconds): 5010.67

  Cores: 26
      Elapsed Time (Seconds): 226.73
      CPU Time in User Mode (Seconds): 5078.91
      CPU Time in Kernel Mode (Seconds): 4.12
      Total CPU Time (Seconds): 5083.03

  Cores: 27
      Elapsed Time (Seconds): 221.71
      CPU Time in User Mode (Seconds): 5122.98
      CPU Time in Kernel Mode (Seconds): 3.18
      Total CPU Time (Seconds): 5126.16

  Cores: 28
      Elapsed Time (Seconds): 216.84
      CPU Time in User Mode (Seconds): 5162.70
      CPU Time in Kernel Mode (Seconds): 3.60
      Total CPU Time (Seconds): 5166.30

  Cores: 29
      Elapsed Time (Seconds): 214.83
      CPU Time in User Mode (Seconds): 5251.23
      CPU Time in Kernel Mode (Seconds): 3.51
      Total CPU Time (Seconds): 5254.74

  Cores: 30
      Elapsed Time (Seconds): 209.79
      CPU Time in User Mode (Seconds): 5271.26
      CPU Time in Kernel Mode (Seconds): 3.87
      Total CPU Time (Seconds): 5275.13

  Cores: 31
      Elapsed Time (Seconds): 208.68
      CPU Time in User Mode (Seconds): 5395.89
      CPU Time in Kernel Mode (Seconds): 2.98
      Total CPU Time (Seconds): 5398.87

  Cores: 32
      Elapsed Time (Seconds): 207.08
      CPU Time in User Mode (Seconds): 5467.51
      CPU Time in Kernel Mode (Seconds): 3.37
      Total CPU Time (Seconds): 5470.88

  Cores: 33
      Elapsed Time (Seconds): 206.95
      CPU Time in User Mode (Seconds): 5577.30
      CPU Time in Kernel Mode (Seconds): 3.79
      Total CPU Time (Seconds): 5581.09

  Cores: 34
      Elapsed Time (Seconds): 203.94
      CPU Time in User Mode (Seconds): 5654.65
      CPU Time in Kernel Mode (Seconds): 4.34
      Total CPU Time (Seconds): 5658.98

  Cores: 35
      Elapsed Time (Seconds): 201.44
      CPU Time in User Mode (Seconds): 5716.14
      CPU Time in Kernel Mode (Seconds): 5.79
      Total CPU Time (Seconds): 5721.93

  Cores: 36
      Elapsed Time (Seconds): 203.92
      CPU Time in User Mode (Seconds): 5892.75
      CPU Time in Kernel Mode (Seconds): 5.10
      Total CPU Time (Seconds): 5897.85

  Cores: 37
      Elapsed Time (Seconds): 201.41
      CPU Time in User Mode (Seconds): 5934.62
      CPU Time in Kernel Mode (Seconds): 6.02
      Total CPU Time (Seconds): 5940.64

  Cores: 38
      Elapsed Time (Seconds): 202.69
      CPU Time in User Mode (Seconds): 6101.84
      CPU Time in Kernel Mode (Seconds): 4.74
      Total CPU Time (Seconds): 6106.58

  Cores: 39
      Elapsed Time (Seconds): 203.55
      CPU Time in User Mode (Seconds): 6220.95
      CPU Time in Kernel Mode (Seconds): 5.58
      Total CPU Time (Seconds): 6226.53

  Cores: 40
      Elapsed Time (Seconds): 205.38
      CPU Time in User Mode (Seconds): 6440.61
      CPU Time in Kernel Mode (Seconds): 5.44
      Total CPU Time (Seconds): 6446.05

  Cores: 41
      Elapsed Time (Seconds): 203.72
      CPU Time in User Mode (Seconds): 6498.35
      CPU Time in Kernel Mode (Seconds): 5.79
      Total CPU Time (Seconds): 6504.13

  Cores: 42
      Elapsed Time (Seconds): 203.78
      CPU Time in User Mode (Seconds): 6576.22
      CPU Time in Kernel Mode (Seconds): 6.27
      Total CPU Time (Seconds): 6582.49

  Cores: 43
      Elapsed Time (Seconds): 207.47
      CPU Time in User Mode (Seconds): 6835.87
      CPU Time in Kernel Mode (Seconds): 4.66
      Total CPU Time (Seconds): 6840.53

  Cores: 44
      Elapsed Time (Seconds): 208.18
      CPU Time in User Mode (Seconds): 7000.56
      CPU Time in Kernel Mode (Seconds): 4.38
      Total CPU Time (Seconds): 7004.94

  Cores: 45
      Elapsed Time (Seconds): 208.14
      CPU Time in User Mode (Seconds): 7100.90
      CPU Time in Kernel Mode (Seconds): 5.55
      Total CPU Time (Seconds): 7106.45

  Cores: 46
      Elapsed Time (Seconds): 206.81
      CPU Time in User Mode (Seconds): 7162.22
      CPU Time in Kernel Mode (Seconds): 7.25
      Total CPU Time (Seconds): 7169.48

  Cores: 47
      Elapsed Time (Seconds): 210.04
      CPU Time in User Mode (Seconds): 7352.03
      CPU Time in Kernel Mode (Seconds): 4.49
      Total CPU Time (Seconds): 7356.52

  Cores: 48
      Elapsed Time (Seconds): 208.88
      CPU Time in User Mode (Seconds): 7458.50
      CPU Time in Kernel Mode (Seconds): 6.15
      Total CPU Time (Seconds): 7464.65

      The elapsed time dropped to 201.44 seconds with 35 cores. Enabling more than 35 cores did not reduce it further: the shortest elapsed time, 201.41 seconds on 37 cores, is practically the same, and the elapsed time rose slightly to 208.88 seconds on 48 cores. The timing results show that the bottleneck is the total CPU time, which grew with the number of cores enabled, from 4501.96 seconds on 1 core to 7464.65 seconds on 48 cores. Beyond about 35 cores, the extra CPU time cancelled out the benefit of the additional cores, so the computation could not be sped up any further. However, this could be improved by tuning. Here we show one set of performance results. Speedup and efficiency are listed below.
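The CPU-time argument can be made quantitative with a simple identity (our own framing, not part of the original analysis). Writing T_p for the elapsed time and C_p for the total CPU time on p cores, the speedup splits into a CPU-time inflation factor C_1/C_p and the average number of busy cores C_p/T_p. The numbers below are taken directly from the timing results:

```latex
\begin{aligned}
S_p &= \frac{T_1}{T_p}
     = \frac{T_1}{C_1}\cdot\frac{C_1}{C_p}\cdot\frac{C_p}{T_p},
  \qquad \frac{T_1}{C_1} = \frac{4501.97}{4501.96} \approx 1,\\
S_{35} &\approx \frac{4501.96}{5721.93}\cdot\frac{5721.93}{201.44}
        \approx 0.7868 \times 28.405 \approx 22.35,\\
S_{48} &\approx \frac{4501.96}{7464.65}\cdot\frac{7464.65}{208.88}
        \approx 0.6031 \times 35.737 \approx 21.55.
\end{aligned}
```

From 35 to 48 cores, the average number of busy cores rose from about 28.4 to 35.7, but the inflation factor fell from about 0.787 to 0.603, so the speedup slightly decreased. This is consistent with the observation that the added CPU time offset the added cores.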

Speedup and Efficiency

Cores  Elapsed Time (sec)  Speedup  Efficiency (%)
    1             4501.97   1.0000          100.00
    2             2136.72   2.1070          105.35
    3             1433.56   3.1404          104.68
    4             1079.81   4.1692          104.23
    5              870.33   5.1727          103.45
    6              729.94   6.1676          102.79
    7              638.90   7.0464          100.66
    8              567.95   7.9267           99.08
    9              513.43   8.7684           97.43
   10              469.56   9.5876           95.88
   11              434.03  10.3725           94.30
   12              402.70  11.1795           93.16
   13              375.71  11.9826           92.17
   14              352.75  12.7625           91.16
   15              333.64  13.4935           89.96
   16              317.20  14.1928           88.71
   17              302.72  14.8717           87.48
   18              288.65  15.5966           86.65
   19              277.84  16.2035           85.28
   20              266.92  16.8664           84.33
   21              258.49  17.4164           82.94
   22              248.96  18.0831           82.20
   23              242.85  18.5381           80.60
   24              237.93  18.9214           78.84
   25              230.57  19.5254           78.10
   26              226.73  19.8561           76.37
   27              221.71  20.3057           75.21
   28              216.84  20.7617           74.15
   29              214.83  20.9560           72.26
   30              209.79  21.4594           71.53
   31              208.68  21.5736           69.59
   32              207.08  21.7402           67.94
   33              206.95  21.7539           65.92
   34              203.94  22.0750           64.93
   35              201.44  22.3489           63.85
   36              203.92  22.0771           61.33
   37              201.41  22.3523           60.41
   38              202.69  22.2111           58.45
   39              203.55  22.1173           56.71
   40              205.38  21.9202           54.80
   41              203.72  22.0988           53.90
   42              203.78  22.0923           52.60
   43              207.47  21.6994           50.46
   44              208.18  21.6254           49.15
   45              208.14  21.6295           48.06
   46              206.81  21.7686           47.32
   47              210.04  21.4339           45.60
   48              208.88  21.5529           44.90
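The speedup and efficiency columns appear to follow the usual definitions, S_p = T_1 / T_p and E_p = 100 × S_p / p, where T_p is the elapsed time on p cores. As a hedged cross-check (a small Python sketch, not part of the original benchmark tooling; the function name speedup_efficiency is our own), the script below recomputes both columns from the elapsed times. It also locates the core count with the shortest elapsed time and the largest core count that still reaches 90% efficiency.

```python
# Recompute speedup and efficiency from the elapsed times in the table.
# Definitions (standard, assumed to match the table):
#   speedup    S_p = T_1 / T_p
#   efficiency E_p = 100 * S_p / p   (percent)

# Elapsed time in seconds on p = 1..48 cores, copied from the timing results.
ELAPSED = [
    4501.97, 2136.72, 1433.56, 1079.81, 870.33, 729.94, 638.90, 567.95,
    513.43, 469.56, 434.03, 402.70, 375.71, 352.75, 333.64, 317.20,
    302.72, 288.65, 277.84, 266.92, 258.49, 248.96, 242.85, 237.93,
    230.57, 226.73, 221.71, 216.84, 214.83, 209.79, 208.68, 207.08,
    206.95, 203.94, 201.44, 203.92, 201.41, 202.69, 203.55, 205.38,
    203.72, 203.78, 207.47, 208.18, 208.14, 206.81, 210.04, 208.88,
]


def speedup_efficiency(elapsed):
    """Return (cores, speedup, efficiency_percent) for each core count."""
    t1 = elapsed[0]  # the single-core run is the baseline
    rows = []
    for p, tp in enumerate(elapsed, start=1):
        s = t1 / tp
        rows.append((p, s, 100.0 * s / p))
    return rows


rows = speedup_efficiency(ELAPSED)
for p, s, e in rows:
    print(f"{p:>3}  {ELAPSED[p - 1]:>8.2f}  {s:>8.4f}  {e:>7.2f}")

# Core count with the shortest elapsed time.
best_p = min(range(1, len(ELAPSED) + 1), key=lambda p: ELAPSED[p - 1])
# Largest core count whose efficiency is still at least 90%.
last_90 = max(p for p, _, e in rows if e >= 90.0)
print(f"shortest elapsed: {ELAPSED[best_p - 1]:.2f} s on {best_p} cores")
print(f"efficiency >= 90% up to {last_90} cores")
```

Running it should reproduce the listed values to the printed precision, and it reports 201.41 seconds on 37 cores as the shortest elapsed time and 14 cores as the last count at or above 90% efficiency.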


In this example, the computation kept speeding up through 35 cores, and the efficiency stayed above 90% on up to 14 cores (89.96% on 15 cores).