Equation Solution  
    High Performance by Design

 



Parallel Performance of laipe$Decompose_DAG_8 on 48 Cores


[Posted by Jenn-Ching Luo on Feb. 18, 2016 ]

        This post presents parallel performance results for the LAIPE2 subroutine laipe$Decompose_DAG_8 on 48 cores. The subroutine decomposes an 8-byte dense matrix [A] into [L][U]. The test example is an 8-byte dense matrix of order 20,000-by-20,000.
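        As a rough illustration of what decomposing [A] into [L][U] means, the Python sketch below factors a small matrix with the textbook Doolittle scheme, without pivoting. It is not the LAIPE2 algorithm: it is sequential, the name lu_decompose is hypothetical, and it only shows the form of the result. It also works out the storage needed for the test matrix, assuming "8-byte" means each element occupies 8 bytes.

```python
def lu_decompose(a):
    """Textbook Doolittle LU without pivoting (illustration only, not LAIPE2).

    Returns (lower, upper) with a unit diagonal in lower, so that
    lower * upper reproduces a.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = 1.0
        # Row i of [U]
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        if upper[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        # Column i of [L]
        for j in range(i + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


def matmul(x, y):
    """Plain matrix product, used to check that [L][U] gives back [A]."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Assumption: "8-byte" means 8 bytes per matrix element.
order = 20_000
matrix_bytes = order * order * 8  # storage for the dense test matrix

if __name__ == "__main__":
    a = [[4.0, 3.0], [6.0, 3.0]]
    lower, upper = lu_decompose(a)
    print("L =", lower)
    print("U =", upper)
    print("bytes for the 20,000-by-20,000 matrix:", matrix_bytes)
```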

        The example was run on the homogeneous softcore of neuLoop. The machine was a Dell PowerEdge R815 with four 1.9 GHz 12-core Opterons, 48 cores in total, running Windows Server 2008. The compiler was gfortran. The timing results are listed below.

Timing Result

  Core: 1
      Elapsed Time (Seconds): 8943.55
      CPU Time in User Mode (Seconds): 8943.30
      CPU Time in Kernel Mode (Seconds): 0.25
      Total CPU Time (Seconds): 8943.55

  Cores: 2
      Elapsed Time (Seconds): 4031.88
      CPU Time in User Mode (Seconds): 8039.15
      CPU Time in Kernel Mode (Seconds): 0.42
      Total CPU Time (Seconds): 8039.57

  Cores: 3
      Elapsed Time (Seconds): 2692.89
      CPU Time in User Mode (Seconds): 8024.25
      CPU Time in Kernel Mode (Seconds): 0.39
      Total CPU Time (Seconds): 8024.65

  Cores: 4
      Elapsed Time (Seconds): 2025.16
      CPU Time in User Mode (Seconds): 8023.21
      CPU Time in Kernel Mode (Seconds): 0.36
      Total CPU Time (Seconds): 8023.57

  Cores: 5
      Elapsed Time (Seconds): 1627.99
      CPU Time in User Mode (Seconds): 8025.97
      CPU Time in Kernel Mode (Seconds): 0.59
      Total CPU Time (Seconds): 8026.56

  Cores: 6
      Elapsed Time (Seconds): 1362.15
      CPU Time in User Mode (Seconds): 8029.42
      CPU Time in Kernel Mode (Seconds): 0.84
      Total CPU Time (Seconds): 8030.26

  Cores: 7
      Elapsed Time (Seconds): 1185.90
      CPU Time in User Mode (Seconds): 8126.01
      CPU Time in Kernel Mode (Seconds): 0.67
      Total CPU Time (Seconds): 8126.69

  Cores: 8
      Elapsed Time (Seconds): 1051.68
      CPU Time in User Mode (Seconds): 8204.19
      CPU Time in Kernel Mode (Seconds): 0.81
      Total CPU Time (Seconds): 8205.00

  Cores: 9
      Elapsed Time (Seconds): 948.00
      CPU Time in User Mode (Seconds): 8281.95
      CPU Time in Kernel Mode (Seconds): 0.84
      Total CPU Time (Seconds): 8282.80

  Cores: 10
      Elapsed Time (Seconds): 861.08
      CPU Time in User Mode (Seconds): 8331.50
      CPU Time in Kernel Mode (Seconds): 0.98
      Total CPU Time (Seconds): 8332.48

  Cores: 11
      Elapsed Time (Seconds): 790.88
      CPU Time in User Mode (Seconds): 8386.71
      CPU Time in Kernel Mode (Seconds): 1.05
      Total CPU Time (Seconds): 8387.75

  Cores: 12
      Elapsed Time (Seconds): 733.28
      CPU Time in User Mode (Seconds): 8449.59
      CPU Time in Kernel Mode (Seconds): 1.56
      Total CPU Time (Seconds): 8451.15

  Cores: 13
      Elapsed Time (Seconds): 687.06
      CPU Time in User Mode (Seconds): 8545.05
      CPU Time in Kernel Mode (Seconds): 2.64
      Total CPU Time (Seconds): 8547.69

  Cores: 14
      Elapsed Time (Seconds): 646.69
      CPU Time in User Mode (Seconds): 8628.88
      CPU Time in Kernel Mode (Seconds): 2.76
      Total CPU Time (Seconds): 8631.64

  Cores: 15
      Elapsed Time (Seconds): 609.98
      CPU Time in User Mode (Seconds): 8693.11
      CPU Time in Kernel Mode (Seconds): 2.07
      Total CPU Time (Seconds): 8695.18

  Cores: 16
      Elapsed Time (Seconds): 579.37
      CPU Time in User Mode (Seconds): 8759.64
      CPU Time in Kernel Mode (Seconds): 1.06
      Total CPU Time (Seconds): 8760.70

  Cores: 17
      Elapsed Time (Seconds): 552.65
      CPU Time in User Mode (Seconds): 8841.06
      CPU Time in Kernel Mode (Seconds): 1.68
      Total CPU Time (Seconds): 8842.75

  Cores: 18
      Elapsed Time (Seconds): 527.42
      CPU Time in User Mode (Seconds): 8907.00
      CPU Time in Kernel Mode (Seconds): 2.50
      Total CPU Time (Seconds): 8909.50

  Cores: 19
      Elapsed Time (Seconds): 503.40
      CPU Time in User Mode (Seconds): 8940.46
      CPU Time in Kernel Mode (Seconds): 2.59
      Total CPU Time (Seconds): 8943.05

  Cores: 20
      Elapsed Time (Seconds): 481.00
      CPU Time in User Mode (Seconds): 8945.30
      CPU Time in Kernel Mode (Seconds): 2.64
      Total CPU Time (Seconds): 8947.94

  Cores: 21
      Elapsed Time (Seconds): 462.15
      CPU Time in User Mode (Seconds): 8987.75
      CPU Time in Kernel Mode (Seconds): 2.43
      Total CPU Time (Seconds): 8990.18

  Cores: 22
      Elapsed Time (Seconds): 444.67
      CPU Time in User Mode (Seconds): 9001.79
      CPU Time in Kernel Mode (Seconds): 2.06
      Total CPU Time (Seconds): 9003.85

  Cores: 23
      Elapsed Time (Seconds): 430.80
      CPU Time in User Mode (Seconds): 9080.46
      CPU Time in Kernel Mode (Seconds): 2.76
      Total CPU Time (Seconds): 9083.22

  Cores: 24
      Elapsed Time (Seconds): 417.68
      CPU Time in User Mode (Seconds): 9146.51
      CPU Time in Kernel Mode (Seconds): 2.92
      Total CPU Time (Seconds): 9149.43

  Cores: 25
      Elapsed Time (Seconds): 403.65
      CPU Time in User Mode (Seconds): 9175.43
      CPU Time in Kernel Mode (Seconds): 2.57
      Total CPU Time (Seconds): 9178.01

  Cores: 26
      Elapsed Time (Seconds): 392.51
      CPU Time in User Mode (Seconds): 9221.62
      CPU Time in Kernel Mode (Seconds): 2.03
      Total CPU Time (Seconds): 9223.65

  Cores: 27
      Elapsed Time (Seconds): 382.40
      CPU Time in User Mode (Seconds): 9282.06
      CPU Time in Kernel Mode (Seconds): 3.40
      Total CPU Time (Seconds): 9285.46

  Cores: 28
      Elapsed Time (Seconds): 372.70
      CPU Time in User Mode (Seconds): 9336.22
      CPU Time in Kernel Mode (Seconds): 3.32
      Total CPU Time (Seconds): 9339.55

  Cores: 29
      Elapsed Time (Seconds): 363.84
      CPU Time in User Mode (Seconds): 9396.42
      CPU Time in Kernel Mode (Seconds): 3.57
      Total CPU Time (Seconds): 9400.00

  Cores: 30
      Elapsed Time (Seconds): 356.74
      CPU Time in User Mode (Seconds): 9496.44
      CPU Time in Kernel Mode (Seconds): 2.84
      Total CPU Time (Seconds): 9499.28

  Cores: 31
      Elapsed Time (Seconds): 349.85
      CPU Time in User Mode (Seconds): 9567.93
      CPU Time in Kernel Mode (Seconds): 2.65
      Total CPU Time (Seconds): 9570.58

  Cores: 32
      Elapsed Time (Seconds): 344.43
      CPU Time in User Mode (Seconds): 9673.73
      CPU Time in Kernel Mode (Seconds): 3.57
      Total CPU Time (Seconds): 9677.30

  Cores: 33
      Elapsed Time (Seconds): 336.04
      CPU Time in User Mode (Seconds): 9700.64
      CPU Time in Kernel Mode (Seconds): 4.01
      Total CPU Time (Seconds): 9704.65

  Cores: 34
      Elapsed Time (Seconds): 330.35
      CPU Time in User Mode (Seconds): 9759.14
      CPU Time in Kernel Mode (Seconds): 4.43
      Total CPU Time (Seconds): 9763.57

  Cores: 35
      Elapsed Time (Seconds): 326.24
      CPU Time in User Mode (Seconds): 9864.89
      CPU Time in Kernel Mode (Seconds): 4.20
      Total CPU Time (Seconds): 9869.09

  Cores: 36
      Elapsed Time (Seconds): 321.58
      CPU Time in User Mode (Seconds): 10004.19
      CPU Time in Kernel Mode (Seconds): 4.66
      Total CPU Time (Seconds): 10008.85

  Cores: 37
      Elapsed Time (Seconds): 318.38
      CPU Time in User Mode (Seconds): 10096.28
      CPU Time in Kernel Mode (Seconds): 5.19
      Total CPU Time (Seconds): 10101.47

  Cores: 38
      Elapsed Time (Seconds): 315.51
      CPU Time in User Mode (Seconds): 10221.64
      CPU Time in Kernel Mode (Seconds): 4.41
      Total CPU Time (Seconds): 10226.05

  Cores: 39
      Elapsed Time (Seconds): 310.39
      CPU Time in User Mode (Seconds): 10272.34
      CPU Time in Kernel Mode (Seconds): 4.73
      Total CPU Time (Seconds): 10277.06

  Cores: 40
      Elapsed Time (Seconds): 309.29
      CPU Time in User Mode (Seconds): 10455.56
      CPU Time in Kernel Mode (Seconds): 4.71
      Total CPU Time (Seconds): 10460.27

  Cores: 41
      Elapsed Time (Seconds): 307.68
      CPU Time in User Mode (Seconds): 10595.63
      CPU Time in Kernel Mode (Seconds): 4.90
      Total CPU Time (Seconds): 10600.53

  Cores: 42
      Elapsed Time (Seconds): 304.22
      CPU Time in User Mode (Seconds): 10695.66
      CPU Time in Kernel Mode (Seconds): 4.37
      Total CPU Time (Seconds): 10700.03

  Cores: 43
      Elapsed Time (Seconds): 304.59
      CPU Time in User Mode (Seconds): 10885.03
      CPU Time in Kernel Mode (Seconds): 5.79
      Total CPU Time (Seconds): 10890.82

  Cores: 44
      Elapsed Time (Seconds): 302.02
      CPU Time in User Mode (Seconds): 10985.79
      CPU Time in Kernel Mode (Seconds): 5.40
      Total CPU Time (Seconds): 10991.19

  Cores: 45
      Elapsed Time (Seconds): 301.17
      CPU Time in User Mode (Seconds): 11173.10
      CPU Time in Kernel Mode (Seconds): 5.88
      Total CPU Time (Seconds): 11178.98

  Cores: 46
      Elapsed Time (Seconds): 299.02
      CPU Time in User Mode (Seconds): 11305.94
      CPU Time in Kernel Mode (Seconds): 5.18
      Total CPU Time (Seconds): 11311.12

  Cores: 47
      Elapsed Time (Seconds): 297.96
      CPU Time in User Mode (Seconds): 11454.33
      CPU Time in Kernel Mode (Seconds): 5.73
      Total CPU Time (Seconds): 11460.05

  Cores: 48
      Elapsed Time (Seconds): 298.57
      CPU Time in User Mode (Seconds): 11601.56
      CPU Time in Kernel Mode (Seconds): 5.69
      Total CPU Time (Seconds): 11607.25

        The timing results include elapsed time, CPU time in user mode, CPU time in kernel mode, and total CPU time. In general, total CPU time increases with the number of cores, except where superlinearity occurs. The reason is straightforward: more cores bring more local caches, and more local caches generate more traffic to keep shared data consistent, which drives up the cost.
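        The post does not say how the four quantities were collected. As a minimal sketch of one way to gather the same kind of figures, the Python helper below (time_call is a hypothetical name) wraps a call with the standard-library functions time.perf_counter and os.times. It is an assumption for illustration, not the measurement code behind the results above.

```python
import os
import time


def time_call(func, *args):
    """Run func(*args) and report elapsed, user, kernel, and total CPU time.

    All values are in seconds. CPU times cover the whole process, so they
    include every thread the call uses.
    """
    wall_start = time.perf_counter()
    cpu_start = os.times()
    result = func(*args)
    cpu_end = os.times()
    wall_end = time.perf_counter()

    user = cpu_end.user - cpu_start.user        # CPU time in user mode
    kernel = cpu_end.system - cpu_start.system  # CPU time in kernel mode
    timings = {
        "elapsed": wall_end - wall_start,
        "user": user,
        "kernel": kernel,
        "total_cpu": user + kernel,
    }
    return result, timings


if __name__ == "__main__":
    value, timings = time_call(sum, range(1_000_000))
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.2f}")
```

        With several threads busy at once, total CPU time can exceed elapsed time, which is why the two are reported separately in the results.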

        The timing results show superlinearity. For example, one core took 8943.55 seconds of total CPU time, while two cores took 8039.57 seconds, an occurrence of superlinearity. By contrast, 48 cores took 11607.25 seconds of total CPU time, in line with the general trend.
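        The comparison can be made explicit by dividing each total CPU time by the one-core figure. The short Python sketch below does this for the three values quoted in this post; the function name cpu_time_ratio is hypothetical. A ratio below 1 means the run used less total CPU time than one core did, which is the superlinear case.

```python
# Total CPU time in seconds, copied from the timing results in this post.
total_cpu = {1: 8943.55, 2: 8039.57, 48: 11607.25}


def cpu_time_ratio(total_cpu_n, total_cpu_1):
    """Total CPU time on n cores relative to the one-core total."""
    return total_cpu_n / total_cpu_1


if __name__ == "__main__":
    for cores, seconds in total_cpu.items():
        ratio = cpu_time_ratio(seconds, total_cpu[1])
        label = "superlinear" if ratio < 1.0 else "no superlinearity"
        print(f"{cores:2d} cores: ratio {ratio:.4f} ({label})")
```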

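        Speedup is measured from elapsed time: the one-core elapsed time divided by the elapsed time on n cores. Efficiency is the speedup divided by the number of cores. Both are listed below.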

Speedup and Efficiency

Number of Cores   Elapsed Time (sec)   Speedup   Efficiency (%)
1 8943.55 1.0000 100.00
2 4031.88 2.2182 110.91
3 2692.89 3.3212 110.71
4 2025.16 4.4162 110.41
5 1627.99 5.4936 109.87
6 1362.15 6.5658 109.43
7 1185.90 7.5416 107.74
8 1051.68 8.5041 106.30
9 948.00 9.4341 104.82
10 861.08 10.3864 103.86
11 790.88 11.3083 102.80
12 733.28 12.1966 101.64
13 687.06 13.0171 100.13
14 646.69 13.8297 98.78
15 609.98 14.6620 97.75
16 579.37 15.4367 96.48
17 552.65 16.1830 95.19
18 527.42 16.9572 94.21
19 503.40 17.7663 93.51
20 481.00 18.5937 92.97
21 462.15 19.3521 92.15
22 444.67 20.1128 91.42
23 430.80 20.7603 90.26
24 417.68 21.4124 89.22
25 403.65 22.1567 88.63
26 392.51 22.7855 87.64
27 382.40 23.3879 86.62
28 372.70 23.9966 85.70
29 363.84 24.5810 84.76
30 356.74 25.0702 83.57
31 349.85 25.5640 82.46
32 344.43 25.9662 81.14
33 336.04 26.6145 80.65
34 330.35 27.0730 79.63
35 326.24 27.4140 78.33
36 321.58 27.8113 77.25
37 318.38 28.0908 75.92
38 315.51 28.3463 74.60
39 310.39 28.8139 73.88
40 309.29 28.9164 72.29
41 307.68 29.0677 70.90
42 304.22 29.3983 70.00
43 304.59 29.3626 68.29
44 302.02 29.6124 67.30
45 301.17 29.6960 65.99
46 299.02 29.9095 65.02
47 297.96 30.0159 63.86
48 298.57 29.9546 62.41

        In the table above, speedup is in the third column. Speedup grew as more cores were enabled, with slight dips only at 43 and 48 cores, and peaked at 30.0159X on 47 cores. Up to 23 cores, laipe$Decompose_DAG_8 yielded an efficiency over 90%, with a speedup of about 20.8X on 23 cores.

        Further, with 33 cores the example yielded an efficiency of 80.65%; with 41 cores, 70.90%; with 42 cores, 70.00%; and with 48 cores, 62.41%.
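        The speedup and efficiency columns follow directly from the elapsed times. The Python sketch below recomputes them for a few rows of the table; the function names are hypothetical, and the elapsed times are copied from this post.

```python
# Elapsed time in seconds for selected core counts, copied from the table.
elapsed = {1: 8943.55, 2: 4031.88, 23: 430.80, 33: 336.04, 48: 298.57}


def speedup(elapsed_1, elapsed_n):
    """One-core elapsed time divided by the elapsed time on n cores."""
    return elapsed_1 / elapsed_n


def efficiency(elapsed_1, elapsed_n, cores):
    """Speedup divided by the number of cores, as a percentage."""
    return 100.0 * speedup(elapsed_1, elapsed_n) / cores


if __name__ == "__main__":
    print("cores  elapsed(s)  speedup  efficiency(%)")
    for cores, seconds in elapsed.items():
        s = speedup(elapsed[1], seconds)
        e = efficiency(elapsed[1], seconds, cores)
        print(f"{cores:5d}  {seconds:10.2f}  {s:7.4f}  {e:13.2f}")
```

        An efficiency above 100%, as on 2 cores, is what the post calls superlinearity.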