Equation Solution  
    High Performance by Design
Parallel Performance of laipe$decompose_DAG_10 on 48 Cores


[Posted by Jenn-Ching Luo on Dec. 15, 2015 ]

      This post shares a set of parallel performance results for the subroutine laipe$decompose_DAG_10 on 48 cores. The platform is a Dell PowerEdge R815 with four 1.9 GHz 12-core Opterons. The compiler is gfortran. The test problem is a 10-byte dense matrix of order 7000 (7000 x 7000). The test program was linked against the homogeneous cores of neuloop. Timing results, speedup, and efficiency are listed below.

Timing Result

  Core: 1
      Elapsed Time (seconds): 1287.82
      cpu Time in User Mode (seconds): 1287.69
      cpu Time in Kernel Mode (seconds): 0.09
      Total cpu Time (seconds): 1287.79

  Cores: 2
      Elapsed Time (seconds): 636.91
      cpu Time in User Mode (seconds): 1269.10
      cpu Time in Kernel Mode (seconds): 0.14
      Total cpu Time (seconds): 1269.24

  Cores: 3
      Elapsed Time (seconds): 430.28
      cpu Time in User Mode (seconds): 1278.27
      cpu Time in Kernel Mode (seconds): 0.20
      Total cpu Time (seconds): 1278.47

  Cores: 4
      Elapsed Time (seconds): 326.14
      cpu Time in User Mode (seconds): 1285.67
      cpu Time in Kernel Mode (seconds): 0.31
      Total cpu Time (seconds): 1285.98

  Cores: 5
      Elapsed Time (seconds): 263.72
      cpu Time in User Mode (seconds): 1291.36
      cpu Time in Kernel Mode (seconds): 0.50
      Total cpu Time (seconds): 1291.86

  Cores: 6
      Elapsed Time (seconds): 221.79
      cpu Time in User Mode (seconds): 1297.23
      cpu Time in Kernel Mode (seconds): 0.58
      Total cpu Time (seconds): 1297.80

  Cores: 7
      Elapsed Time (seconds): 192.04
      cpu Time in User Mode (seconds): 1303.86
      cpu Time in Kernel Mode (seconds): 0.64
      Total cpu Time (seconds): 1304.50

  Cores: 8
      Elapsed Time (seconds): 169.74
      cpu Time in User Mode (seconds): 1311.33
      cpu Time in Kernel Mode (seconds): 0.87
      Total cpu Time (seconds): 1312.20

  Cores: 9
      Elapsed Time (seconds): 153.01
      cpu Time in User Mode (seconds): 1319.88
      cpu Time in Kernel Mode (seconds): 0.87
      Total cpu Time (seconds): 1320.75

  Cores: 10
      Elapsed Time (seconds): 139.31
      cpu Time in User Mode (seconds): 1328.61
      cpu Time in Kernel Mode (seconds): 0.70
      Total cpu Time (seconds): 1329.32

  Cores: 11
      Elapsed Time (seconds): 128.08
      cpu Time in User Mode (seconds): 1338.21
      cpu Time in Kernel Mode (seconds): 1.29
      Total cpu Time (seconds): 1339.50

  Cores: 12
      Elapsed Time (seconds): 118.72
      cpu Time in User Mode (seconds): 1345.62
      cpu Time in Kernel Mode (seconds): 1.25
      Total cpu Time (seconds): 1346.87

  Cores: 13
      Elapsed Time (seconds): 110.39
      cpu Time in User Mode (seconds): 1347.47
      cpu Time in Kernel Mode (seconds): 1.45
      Total cpu Time (seconds): 1348.93

  Cores: 14
      Elapsed Time (seconds): 103.29
      cpu Time in User Mode (seconds): 1353.87
      cpu Time in Kernel Mode (seconds): 1.54
      Total cpu Time (seconds): 1355.41

  Cores: 15
      Elapsed Time (seconds): 97.25
      cpu Time in User Mode (seconds): 1356.18
      cpu Time in Kernel Mode (seconds): 1.53
      Total cpu Time (seconds): 1357.71

  Cores: 16
      Elapsed Time (seconds): 92.04
      cpu Time in User Mode (seconds): 1363.73
      cpu Time in Kernel Mode (seconds): 1.83
      Total cpu Time (seconds): 1365.55

  Cores: 17
      Elapsed Time (seconds): 87.49
      cpu Time in User Mode (seconds): 1368.46
      cpu Time in Kernel Mode (seconds): 1.70
      Total cpu Time (seconds): 1370.16

  Cores: 18
      Elapsed Time (seconds): 83.49
      cpu Time in User Mode (seconds): 1377.55
      cpu Time in Kernel Mode (seconds): 1.86
      Total cpu Time (seconds): 1379.41

  Cores: 19
      Elapsed Time (seconds): 79.72
      cpu Time in User Mode (seconds): 1380.84
      cpu Time in Kernel Mode (seconds): 2.01
      Total cpu Time (seconds): 1382.86

  Cores: 20
      Elapsed Time (seconds): 76.28
      cpu Time in User Mode (seconds): 1384.17
      cpu Time in Kernel Mode (seconds): 2.12
      Total cpu Time (seconds): 1386.29

  Cores: 21
      Elapsed Time (seconds): 73.45
      cpu Time in User Mode (seconds): 1387.60
      cpu Time in Kernel Mode (seconds): 2.57
      Total cpu Time (seconds): 1390.17

  Cores: 22
      Elapsed Time (seconds): 70.90
      cpu Time in User Mode (seconds): 1398.46
      cpu Time in Kernel Mode (seconds): 2.62
      Total cpu Time (seconds): 1401.08

  Cores: 23
      Elapsed Time (seconds): 68.52
      cpu Time in User Mode (seconds): 1403.04
      cpu Time in Kernel Mode (seconds): 2.78
      Total cpu Time (seconds): 1405.82

  Cores: 24
      Elapsed Time (seconds): 66.38
      cpu Time in User Mode (seconds): 1412.36
      cpu Time in Kernel Mode (seconds): 2.98
      Total cpu Time (seconds): 1415.33

  Cores: 25
      Elapsed Time (seconds): 64.63
      cpu Time in User Mode (seconds): 1426.35
      cpu Time in Kernel Mode (seconds): 2.82
      Total cpu Time (seconds): 1429.17

  Cores: 26
      Elapsed Time (seconds): 62.90
      cpu Time in User Mode (seconds): 1436.66
      cpu Time in Kernel Mode (seconds): 3.03
      Total cpu Time (seconds): 1439.69

  Cores: 27
      Elapsed Time (seconds): 61.32
      cpu Time in User Mode (seconds): 1445.15
      cpu Time in Kernel Mode (seconds): 2.78
      Total cpu Time (seconds): 1447.92

  Cores: 28
      Elapsed Time (seconds): 59.97
      cpu Time in User Mode (seconds): 1459.87
      cpu Time in Kernel Mode (seconds): 3.06
      Total cpu Time (seconds): 1462.93

  Cores: 29
      Elapsed Time (seconds): 58.80
      cpu Time in User Mode (seconds): 1474.85
      cpu Time in Kernel Mode (seconds): 4.04
      Total cpu Time (seconds): 1478.89

  Cores: 30
      Elapsed Time (seconds): 57.49
      cpu Time in User Mode (seconds): 1482.18
      cpu Time in Kernel Mode (seconds): 3.76
      Total cpu Time (seconds): 1485.94

  Cores: 31
      Elapsed Time (seconds): 56.55
      cpu Time in User Mode (seconds): 1501.51
      cpu Time in Kernel Mode (seconds): 3.65
      Total cpu Time (seconds): 1505.16

  Cores: 32
      Elapsed Time (seconds): 55.36
      cpu Time in User Mode (seconds): 1506.69
      cpu Time in Kernel Mode (seconds): 3.31
      Total cpu Time (seconds): 1510.00

  Cores: 33
      Elapsed Time (seconds): 54.32
      cpu Time in User Mode (seconds): 1521.85
      cpu Time in Kernel Mode (seconds): 3.62
      Total cpu Time (seconds): 1525.47

  Cores: 34
      Elapsed Time (seconds): 53.49
      cpu Time in User Mode (seconds): 1529.93
      cpu Time in Kernel Mode (seconds): 3.70
      Total cpu Time (seconds): 1533.63

  Cores: 35
      Elapsed Time (seconds): 52.59
      cpu Time in User Mode (seconds): 1542.32
      cpu Time in Kernel Mode (seconds): 3.98
      Total cpu Time (seconds): 1546.30

  Cores: 36
      Elapsed Time (seconds): 51.87
      cpu Time in User Mode (seconds): 1556.28
      cpu Time in Kernel Mode (seconds): 4.31
      Total cpu Time (seconds): 1560.59

  Cores: 37
      Elapsed Time (seconds): 51.17
      cpu Time in User Mode (seconds): 1570.12
      cpu Time in Kernel Mode (seconds): 4.13
      Total cpu Time (seconds): 1574.25

  Cores: 38
      Elapsed Time (seconds): 50.54
      cpu Time in User Mode (seconds): 1589.21
      cpu Time in Kernel Mode (seconds): 4.60
      Total cpu Time (seconds): 1593.82

  Cores: 39
      Elapsed Time (seconds): 50.09
      cpu Time in User Mode (seconds): 1600.91
      cpu Time in Kernel Mode (seconds): 4.98
      Total cpu Time (seconds): 1605.89

  Cores: 40
      Elapsed Time (seconds): 49.61
      cpu Time in User Mode (seconds): 1620.65
      cpu Time in Kernel Mode (seconds): 5.13
      Total cpu Time (seconds): 1625.78 (user + kernel)

  Cores: 41
      Elapsed Time (seconds): 49.05
      cpu Time in User Mode (seconds): 1633.97
      cpu Time in Kernel Mode (seconds): 4.74
      Total cpu Time (seconds): 1638.71

  Cores: 42
      Elapsed Time (seconds): 48.64
      cpu Time in User Mode (seconds): 1655.05
      cpu Time in Kernel Mode (seconds): 4.59
      Total cpu Time (seconds): 1659.63

  Cores: 43
      Elapsed Time (seconds): 48.35
      cpu Time in User Mode (seconds): 1669.62
      cpu Time in Kernel Mode (seconds): 6.22
      Total cpu Time (seconds): 1675.84

  Cores: 44
      Elapsed Time (seconds): 48.02
      cpu Time in User Mode (seconds): 1697.81
      cpu Time in Kernel Mode (seconds): 5.32
      Total cpu Time (seconds): 1703.13

  Cores: 45
      Elapsed Time (seconds): 47.74
      cpu Time in User Mode (seconds): 1718.40
      cpu Time in Kernel Mode (seconds): 5.74
      Total cpu Time (seconds): 1724.14

  Cores: 46
      Elapsed Time (seconds): 47.52
      cpu Time in User Mode (seconds): 1735.01
      cpu Time in Kernel Mode (seconds): 5.60
      Total cpu Time (seconds): 1740.61

  Cores: 47
      Elapsed Time (seconds): 47.38
      cpu Time in User Mode (seconds): 1763.54
      cpu Time in Kernel Mode (seconds): 5.57
      Total cpu Time (seconds): 1769.11

  Cores: 48
      Elapsed Time (seconds): 47.31
      cpu Time in User Mode (seconds): 1784.98
      cpu Time in Kernel Mode (seconds): 5.69
      Total cpu Time (seconds): 1790.67

These timing results show that the total CPU time increased significantly with the number of cores. With one core, the total CPU time was 1287.79 seconds; with 48 cores, it rose to 1790.67 seconds. The difference of 502.88 seconds degraded the performance. Even so, the elapsed time decreased as the number of cores increased. Speedup and efficiency are summarized below.
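The CPU-time difference quoted above can be checked from the reported totals. The following is a minimal sketch in Python, used here only for illustration; the function name `cpu_overhead` is hypothetical and is not part of LAIPE.

```python
def cpu_overhead(total_cpu_one_core, total_cpu_n_cores):
    """Return (extra CPU seconds, extra CPU time as a percentage of the one-core total)."""
    extra = total_cpu_n_cores - total_cpu_one_core
    return extra, 100.0 * extra / total_cpu_one_core

# Total cpu times (seconds) reported above for 1 core and for 48 cores.
extra, percent = cpu_overhead(1287.79, 1790.67)
print(f"extra CPU time: {extra:.2f} s ({percent:.1f}% more than on one core)")
```

With the reported totals, the extra CPU time on 48 cores comes to roughly 39% of the one-core total.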

Speedup and Efficiency

Number of Cores   Elapsed Time (sec)   Speedup   Efficiency (%)
1 1287.82 1.0000 100.00
2 636.91 2.0220 101.10
3 430.28 2.9930 99.77
4 326.14 3.9487 98.72
5 263.72 4.8833 97.67
6 221.79 5.8065 96.77
7 192.04 6.7060 95.80
8 169.74 7.5870 94.84
9 153.01 8.4166 93.52
10 139.31 9.2442 92.44
11 128.08 10.0548 91.41
12 118.72 10.8475 90.40
13 110.39 11.6661 89.74
14 103.29 12.4680 89.06
15 97.25 13.2424 88.28
16 92.04 13.9920 87.45
17 87.49 14.7196 86.59
18 83.49 15.4248 85.69
19 79.72 16.1543 85.02
20 76.28 16.8828 84.41
21 73.45 17.5333 83.49
22 70.90 18.1639 82.56
23 68.52 18.7948 81.72
24 66.38 19.4007 80.84
25 64.63 19.9260 79.70
26 62.90 20.4741 78.75
27 61.32 21.0016 77.78
28 59.97 21.4744 76.69
29 58.80 21.9017 75.52
30 57.49 22.4008 74.67
31 56.55 22.7731 73.46
32 55.36 23.2626 72.70
33 54.32 23.7080 71.84
34 53.49 24.0759 70.81
35 52.59 24.4879 69.97
36 51.87 24.8278 68.97
37 51.17 25.1675 68.02
38 50.54 25.4812 67.06
39 50.09 25.7101 65.92
40 49.61 25.9589 64.90
41 49.05 26.2552 64.04
42 48.64 26.4766 63.04
43 48.35 26.6354 61.94
44 48.02 26.8184 60.95
45 47.74 26.9757 59.95
46 47.52 27.1006 58.91
47 47.38 27.1807 57.83
48 47.31 27.2208 56.71

The efficiency dropped below 90% at 13 cores. Most LAIPE subroutines can yield an efficiency of up to 90% when using 10 cores.
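The tabulated values appear consistent with the usual definitions: speedup is the one-core elapsed time divided by the elapsed time on p cores, and efficiency is the speedup divided by p. The sketch below assumes those definitions and uses illustrative helper names that are not LAIPE routines. It recomputes a few rows and finds the first of the selected core counts whose efficiency falls below 90%.

```python
def speedup(t1, tp):
    """Speedup: one-core elapsed time divided by p-core elapsed time."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Parallel efficiency in percent: speedup divided by the core count."""
    return 100.0 * speedup(t1, tp) / p

# Elapsed times (seconds) copied from the timing results for selected core counts.
elapsed = {1: 1287.82, 2: 636.91, 10: 139.31, 12: 118.72, 13: 110.39, 48: 47.31}
t1 = elapsed[1]

for p in sorted(elapsed):
    print(f"{p:2d} cores: speedup {speedup(t1, elapsed[p]):7.4f}, "
          f"efficiency {efficiency(t1, elapsed[p], p):6.2f}%")

# First of the selected core counts whose efficiency is below 90%.
first_below_90 = next(p for p in sorted(elapsed)
                      if efficiency(t1, elapsed[p], p) < 90.0)
print("efficiency first drops below 90% at", first_below_90, "cores")
```

Only a subset of rows is included to keep the example short; adding the remaining elapsed times to `elapsed` would recompute the full table in the same way.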