Equation Solution
High Performance by Design


How Fast 64 Cores Can Improve

[Posted by Jenn-Ching Luo on June 2, 2016 ]

This post presents a set of timing results that show how much 64 cores can improve computing speed. The short answer is that 64 cores can improve the computing speed up to 60x compared with the speed of one core. The parallel performance is detailed in the following.

TESTING EXAMPLE

Compute [C]=[A][B], where [A], [B], and [C] are 16-byte real matrices. Matrix [A] is of order (4700-by-3600), matrix [B] is of order (3600-by-3700), and matrix [C] is of order (4700-by-3700).
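To give a feel for the size of this testing example, the following is a minimal sketch that counts the multiply-add operations and the storage for the three matrices. It assumes a conventional matrix product with one multiply-add per (i, j, k) triple; the post does not say which algorithm laipe$matmul_16 uses internally, and the function name matmul_workload is hypothetical.

```python
# Dimensions from the testing example: [A] is m-by-k, [B] is k-by-n, [C] is m-by-n.
m, k, n = 4700, 3600, 3700
BYTES_PER_ELEMENT = 16  # 16-byte real


def matmul_workload(m, k, n, elem_bytes):
    """Return (multiply-add count, bytes needed to store A, B and C).

    Assumes the conventional product: one multiply-add per (i, j, k) triple.
    """
    multiply_adds = m * k * n
    storage_bytes = (m * k + k * n + m * n) * elem_bytes
    return multiply_adds, storage_bytes


multiply_adds, storage_bytes = matmul_workload(m, k, n, BYTES_PER_ELEMENT)
print(f"multiply-add operations: {multiply_adds:,}")
print(f"storage for A, B and C:  {storage_bytes / 1e6:.2f} MB")
```

Under these assumptions the product needs about 62.6 billion multiply-adds and roughly 762 MB for the three matrices.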

COMPUTING ENVIRONMENT

Computer: A Dell PowerEdge R815 with quad Opteron 6276, a total of 64 cores. Because the Opteron 6276 can run at a higher frequency when 8 or fewer cores are in use, the processor's turbo boost was disabled for the purpose of measuring parallel performance. The machine becomes very slow with turbo boost disabled.
Operating System: Windows Server 2008 R2
Compiler: gfortran with optimization -O3. The application links against neuloop4 for parallel processing.
Subroutine: laipe$matmul_16, which performs matrix multiplication in parallel

TIMING RESULT

Timing results include "elapsed time", "CPU time in user mode", "CPU time in kernel mode", and "total CPU time". The timing result is as follows.
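The post does not say how these four quantities were collected. As an illustration only, the sketch below gathers the same four measures for a small stand-in workload using Python's standard library. The helper timed and the workload sum_of_squares are hypothetical names, and the total CPU time is taken to be the sum of user-mode and kernel-mode time, which is consistent with the figures listed below.

```python
import os
import time


def timed(func, *args):
    """Run func(*args); return (result, elapsed, user, kernel, total) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()          # process CPU times before the call
    result = func(*args)
    cpu_end = os.times()            # process CPU times after the call
    wall_end = time.perf_counter()

    user = cpu_end.user - cpu_start.user        # CPU time in user mode
    kernel = cpu_end.system - cpu_start.system  # CPU time in kernel mode
    return result, wall_end - wall_start, user, kernel, user + kernel


def sum_of_squares(n):
    # Hypothetical stand-in workload; the post times a matrix product instead.
    return sum(i * i for i in range(n))


result, elapsed_s, user_s, kernel_s, total_s = timed(sum_of_squares, 200_000)
print(f"Elapsed Time (Seconds):            {elapsed_s:.4f}")
print(f"CPU Time in User Mode (Seconds):   {user_s:.4f}")
print(f"CPU Time in Kernel Mode (Seconds): {kernel_s:.4f}")
print(f"Total CPU Time (Seconds):          {total_s:.4f}")
```

When several cores work at once, the CPU times add up across cores while the elapsed time shrinks, which is the pattern in the results that follow.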

number of cores: 1
Elapsed Time (Seconds): 8062.54
CPU Time in User Mode (Seconds): 8061.76
CPU Time in Kernel Mode (Seconds): 0.72
Total CPU Time (Seconds): 8062.48

number of cores: 2
Elapsed Time (Seconds): 4041.61
CPU Time in User Mode (Seconds): 8076.83
CPU Time in Kernel Mode (Seconds): 1.09
Total CPU Time (Seconds): 8077.92

number of cores: 3
Elapsed Time (Seconds): 2715.17
CPU Time in User Mode (Seconds): 8136.34
CPU Time in Kernel Mode (Seconds): 0.84
Total CPU Time (Seconds): 8137.18

number of cores: 4
Elapsed Time (Seconds): 2036.36
CPU Time in User Mode (Seconds): 8131.47
CPU Time in Kernel Mode (Seconds): 0.90
Total CPU Time (Seconds): 8132.38

number of cores: 5
Elapsed Time (Seconds): 1639.68
CPU Time in User Mode (Seconds): 8180.47
CPU Time in Kernel Mode (Seconds): 1.14
Total CPU Time (Seconds): 8181.61

number of cores: 6
Elapsed Time (Seconds): 1365.85
CPU Time in User Mode (Seconds): 8173.42
CPU Time in Kernel Mode (Seconds): 0.94
Total CPU Time (Seconds): 8174.36

number of cores: 7
Elapsed Time (Seconds): 1175.58
CPU Time in User Mode (Seconds): 8204.72
CPU Time in Kernel Mode (Seconds): 1.05
Total CPU Time (Seconds): 8205.76

number of cores: 8
Elapsed Time (Seconds): 1032.18
CPU Time in User Mode (Seconds): 8221.86
CPU Time in Kernel Mode (Seconds): 1.03
Total CPU Time (Seconds): 8222.89

number of cores: 9
Elapsed Time (Seconds): 917.29
CPU Time in User Mode (Seconds): 8215.04
CPU Time in Kernel Mode (Seconds): 0.81
Total CPU Time (Seconds): 8215.86

number of cores: 10
Elapsed Time (Seconds): 827.82
CPU Time in User Mode (Seconds): 8231.94
CPU Time in Kernel Mode (Seconds): 0.89
Total CPU Time (Seconds): 8232.83

number of cores: 11
Elapsed Time (Seconds): 749.87
CPU Time in User Mode (Seconds): 8222.45
CPU Time in Kernel Mode (Seconds): 1.26
Total CPU Time (Seconds): 8223.72

number of cores: 12
Elapsed Time (Seconds): 691.04
CPU Time in User Mode (Seconds): 8236.59
CPU Time in Kernel Mode (Seconds): 1.05
Total CPU Time (Seconds): 8237.63

number of cores: 13
Elapsed Time (Seconds): 635.53
CPU Time in User Mode (Seconds): 8227.32
CPU Time in Kernel Mode (Seconds): 1.12
Total CPU Time (Seconds): 8228.44

number of cores: 14
Elapsed Time (Seconds): 592.84
CPU Time in User Mode (Seconds): 8238.65
CPU Time in Kernel Mode (Seconds): 1.01
Total CPU Time (Seconds): 8239.66

number of cores: 15
Elapsed Time (Seconds): 552.40
CPU Time in User Mode (Seconds): 8213.16
CPU Time in Kernel Mode (Seconds): 1.42
Total CPU Time (Seconds): 8214.58

number of cores: 16
Elapsed Time (Seconds): 518.03
CPU Time in User Mode (Seconds): 8243.12
CPU Time in Kernel Mode (Seconds): 1.06
Total CPU Time (Seconds): 8244.18

number of cores: 17
Elapsed Time (Seconds): 489.41
CPU Time in User Mode (Seconds): 8224.98
CPU Time in Kernel Mode (Seconds): 0.95
Total CPU Time (Seconds): 8225.93

number of cores: 18
Elapsed Time (Seconds): 462.68
CPU Time in User Mode (Seconds): 8224.64
CPU Time in Kernel Mode (Seconds): 1.25
Total CPU Time (Seconds): 8225.89

number of cores: 19
Elapsed Time (Seconds): 437.21
CPU Time in User Mode (Seconds): 8221.96
CPU Time in Kernel Mode (Seconds): 0.90
Total CPU Time (Seconds): 8222.86

number of cores: 20
Elapsed Time (Seconds): 417.83
CPU Time in User Mode (Seconds): 8232.24
CPU Time in Kernel Mode (Seconds): 1.26
Total CPU Time (Seconds): 8233.50

number of cores: 21
Elapsed Time (Seconds): 394.01
CPU Time in User Mode (Seconds): 8242.03
CPU Time in Kernel Mode (Seconds): 1.31
Total CPU Time (Seconds): 8243.34

number of cores: 22
Elapsed Time (Seconds): 378.58
CPU Time in User Mode (Seconds): 8245.62
CPU Time in Kernel Mode (Seconds): 1.11
Total CPU Time (Seconds): 8246.73

number of cores: 23
Elapsed Time (Seconds): 363.87
CPU Time in User Mode (Seconds): 8233.26
CPU Time in Kernel Mode (Seconds): 1.08
Total CPU Time (Seconds): 8234.34

number of cores: 24
Elapsed Time (Seconds): 348.13
CPU Time in User Mode (Seconds): 8233.81
CPU Time in Kernel Mode (Seconds): 1.11
Total CPU Time (Seconds): 8234.92

number of cores: 25
Elapsed Time (Seconds): 331.14
CPU Time in User Mode (Seconds): 8244.96
CPU Time in Kernel Mode (Seconds): 1.19
Total CPU Time (Seconds): 8246.15

number of cores: 26
Elapsed Time (Seconds): 321.61
CPU Time in User Mode (Seconds): 8232.24
CPU Time in Kernel Mode (Seconds): 1.31
Total CPU Time (Seconds): 8233.55

number of cores: 27
Elapsed Time (Seconds): 310.99
CPU Time in User Mode (Seconds): 8237.95
CPU Time in Kernel Mode (Seconds): 1.03
Total CPU Time (Seconds): 8238.97

number of cores: 28
Elapsed Time (Seconds): 298.26
CPU Time in User Mode (Seconds): 8247.18
CPU Time in Kernel Mode (Seconds): 1.59
Total CPU Time (Seconds): 8248.77

number of cores: 29
Elapsed Time (Seconds): 286.36
CPU Time in User Mode (Seconds): 8250.89
CPU Time in Kernel Mode (Seconds): 1.20
Total CPU Time (Seconds): 8252.09

number of cores: 30
Elapsed Time (Seconds): 277.34
CPU Time in User Mode (Seconds): 8257.98
CPU Time in Kernel Mode (Seconds): 1.50
Total CPU Time (Seconds): 8259.47

number of cores: 31
Elapsed Time (Seconds): 268.34
CPU Time in User Mode (Seconds): 8251.59
CPU Time in Kernel Mode (Seconds): 1.29
Total CPU Time (Seconds): 8252.89

number of cores: 32
Elapsed Time (Seconds): 259.57
CPU Time in User Mode (Seconds): 8254.29
CPU Time in Kernel Mode (Seconds): 1.76
Total CPU Time (Seconds): 8256.06

number of cores: 33
Elapsed Time (Seconds): 252.72
CPU Time in User Mode (Seconds): 8264.40
CPU Time in Kernel Mode (Seconds): 1.23
Total CPU Time (Seconds): 8265.63

number of cores: 34
Elapsed Time (Seconds): 249.07
CPU Time in User Mode (Seconds): 8238.41
CPU Time in Kernel Mode (Seconds): 1.65
Total CPU Time (Seconds): 8240.07

number of cores: 35
Elapsed Time (Seconds): 241.21
CPU Time in User Mode (Seconds): 8234.68
CPU Time in Kernel Mode (Seconds): 1.39
Total CPU Time (Seconds): 8236.07

number of cores: 36
Elapsed Time (Seconds): 232.65
CPU Time in User Mode (Seconds): 8237.26
CPU Time in Kernel Mode (Seconds): 1.25
Total CPU Time (Seconds): 8238.51

number of cores: 37
Elapsed Time (Seconds): 224.13
CPU Time in User Mode (Seconds): 8263.37
CPU Time in Kernel Mode (Seconds): 1.42
Total CPU Time (Seconds): 8264.79

number of cores: 38
Elapsed Time (Seconds): 223.33
CPU Time in User Mode (Seconds): 8243.84
CPU Time in Kernel Mode (Seconds): 1.19
Total CPU Time (Seconds): 8245.03

number of cores: 39
Elapsed Time (Seconds): 214.83
CPU Time in User Mode (Seconds): 8234.23
CPU Time in Kernel Mode (Seconds): 1.48
Total CPU Time (Seconds): 8235.71

number of cores: 40
Elapsed Time (Seconds): 213.29
CPU Time in User Mode (Seconds): 8250.14
CPU Time in Kernel Mode (Seconds): 1.34
Total CPU Time (Seconds): 8251.49

number of cores: 41
Elapsed Time (Seconds): 205.78
CPU Time in User Mode (Seconds): 8243.42
CPU Time in Kernel Mode (Seconds): 1.64
Total CPU Time (Seconds): 8245.06

number of cores: 42
Elapsed Time (Seconds): 201.71
CPU Time in User Mode (Seconds): 8260.17
CPU Time in Kernel Mode (Seconds): 1.64
Total CPU Time (Seconds): 8261.81

number of cores: 43
Elapsed Time (Seconds): 196.83
CPU Time in User Mode (Seconds): 8250.88
CPU Time in Kernel Mode (Seconds): 1.64
Total CPU Time (Seconds): 8252.51

number of cores: 44
Elapsed Time (Seconds): 193.58
CPU Time in User Mode (Seconds): 8263.03
CPU Time in Kernel Mode (Seconds): 1.54
Total CPU Time (Seconds): 8264.57

number of cores: 45
Elapsed Time (Seconds): 187.93
CPU Time in User Mode (Seconds): 8248.83
CPU Time in Kernel Mode (Seconds): 1.53
Total CPU Time (Seconds): 8250.36

number of cores: 46
Elapsed Time (Seconds): 186.81
CPU Time in User Mode (Seconds): 8261.49
CPU Time in Kernel Mode (Seconds): 1.62
Total CPU Time (Seconds): 8263.11

number of cores: 47
Elapsed Time (Seconds): 179.10
CPU Time in User Mode (Seconds): 8249.79
CPU Time in Kernel Mode (Seconds): 1.62
Total CPU Time (Seconds): 8251.41

number of cores: 48
Elapsed Time (Seconds): 178.71
CPU Time in User Mode (Seconds): 8261.38
CPU Time in Kernel Mode (Seconds): 1.70
Total CPU Time (Seconds): 8263.08

number of cores: 49
Elapsed Time (Seconds): 170.37
CPU Time in User Mode (Seconds): 8265.93
CPU Time in Kernel Mode (Seconds): 1.56
Total CPU Time (Seconds): 8267.49

number of cores: 50
Elapsed Time (Seconds): 170.12
CPU Time in User Mode (Seconds): 8258.32
CPU Time in Kernel Mode (Seconds): 1.65
Total CPU Time (Seconds): 8259.97

number of cores: 51
Elapsed Time (Seconds): 169.51
CPU Time in User Mode (Seconds): 8266.52
CPU Time in Kernel Mode (Seconds): 1.58
Total CPU Time (Seconds): 8268.10

number of cores: 52
Elapsed Time (Seconds): 161.40
CPU Time in User Mode (Seconds): 8271.14
CPU Time in Kernel Mode (Seconds): 1.76
Total CPU Time (Seconds): 8272.90

number of cores: 53
Elapsed Time (Seconds): 161.12
CPU Time in User Mode (Seconds): 8271.86
CPU Time in Kernel Mode (Seconds): 1.67
Total CPU Time (Seconds): 8273.53

number of cores: 54
Elapsed Time (Seconds): 160.59
CPU Time in User Mode (Seconds): 8262.76
CPU Time in Kernel Mode (Seconds): 1.72
Total CPU Time (Seconds): 8264.48

number of cores: 55
Elapsed Time (Seconds): 152.49
CPU Time in User Mode (Seconds): 8268.58
CPU Time in Kernel Mode (Seconds): 1.75
Total CPU Time (Seconds): 8270.33

number of cores: 56
Elapsed Time (Seconds): 152.29
CPU Time in User Mode (Seconds): 8265.14
CPU Time in Kernel Mode (Seconds): 2.01
Total CPU Time (Seconds): 8267.15

number of cores: 57
Elapsed Time (Seconds): 151.95
CPU Time in User Mode (Seconds): 8264.31
CPU Time in Kernel Mode (Seconds): 1.67
Total CPU Time (Seconds): 8265.98

number of cores: 58
Elapsed Time (Seconds): 143.72
CPU Time in User Mode (Seconds): 8273.98
CPU Time in Kernel Mode (Seconds): 1.89
Total CPU Time (Seconds): 8275.87

number of cores: 59
Elapsed Time (Seconds): 143.41
CPU Time in User Mode (Seconds): 8259.55
CPU Time in Kernel Mode (Seconds): 1.83
Total CPU Time (Seconds): 8261.38

number of cores: 60
Elapsed Time (Seconds): 143.26
CPU Time in User Mode (Seconds): 8262.69
CPU Time in Kernel Mode (Seconds): 1.97
Total CPU Time (Seconds): 8264.65

number of cores: 61
Elapsed Time (Seconds): 142.90
CPU Time in User Mode (Seconds): 8257.69
CPU Time in Kernel Mode (Seconds): 1.93
Total CPU Time (Seconds): 8259.63

number of cores: 62
Elapsed Time (Seconds): 134.61
CPU Time in User Mode (Seconds): 8264.65
CPU Time in Kernel Mode (Seconds): 1.89
Total CPU Time (Seconds): 8266.54

number of cores: 63
Elapsed Time (Seconds): 134.47
CPU Time in User Mode (Seconds): 8268.13
CPU Time in Kernel Mode (Seconds): 1.68
Total CPU Time (Seconds): 8269.82

number of cores: 64
Elapsed Time (Seconds): 134.36
CPU Time in User Mode (Seconds): 8265.95
CPU Time in Kernel Mode (Seconds): 2.15
Total CPU Time (Seconds): 8268.10

From the above, it can be seen that the elapsed time fell roughly in proportion to the reciprocal of the number of cores used. For example, one core took 8062.54 seconds to complete the computation, two cores cut the elapsed time to 4041.61 seconds, three cores took 2715.17 seconds, and so on.
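The reciprocal relationship can be checked against a few of the measurements above. This is a small sketch, not part of the original test program: it takes the ideal elapsed time on p cores to be the one-core time divided by p, and reports how far each measured time lies above that ideal. The name ideal_time is hypothetical; the elapsed times are copied from the timing results.

```python
# Elapsed times (seconds) copied from the timing results for selected core counts.
measured = {
    1: 8062.54,
    2: 4041.61,
    3: 2715.17,
    4: 2036.36,
    8: 1032.18,
    16: 518.03,
    32: 259.57,
    64: 134.36,
}


def ideal_time(one_core_seconds, cores):
    """Elapsed time if the work scaled perfectly with the number of cores."""
    return one_core_seconds / cores


for cores, seconds in measured.items():
    ideal = ideal_time(measured[1], cores)
    excess_percent = (seconds / ideal - 1.0) * 100.0
    print(f"{cores:2d} cores: measured {seconds:8.2f} s, "
          f"ideal {ideal:8.2f} s, {excess_percent:5.2f}% above ideal")
```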

64 cores completed the computation in 134.36 seconds, which is about 60 times faster than one core. In this testing example, 64 cores yielded a 60x speedup. Such highly efficient performance is not seen very often. Parallel performance varies case by case, and we cannot predict which computations will yield a 60x speedup on 64 cores.

This testing example also demonstrates a benefit of multicore applications. One core took 8062.54 seconds, which is about 2 hours and 15 minutes, while 64 cores completed the same job in about 2 minutes. There is no reason to reject multicore. In the following, we are going to see parallel speedup and efficiency.

SPEEDUP AND EFFICIENCY

Speedup and efficiency are summarized in the following table. The first column is the number of cores, the second column is the elapsed time in seconds, and the third column is the parallel speedup. The table shows an almost linear speedup: on 64 cores, the speed improved to 60x. This kind of highly efficient performance is not seen very often.

The fourth column is the parallel efficiency. It shows that 64 cores yielded a parallel efficiency of about 94%.

Number of Cores   Elapsed Time (sec)   Speedup   Efficiency (%)
       1               8062.54          1.0000       100.00
       2               4041.61          1.9949        99.74
       3               2715.17          2.9694        98.98
       4               2036.36          3.9593        98.98
       5               1639.68          4.9171        98.34
       6               1365.85          5.9029        98.38
       7               1175.58          6.8584        97.98
       8               1032.18          7.8112        97.64
       9                917.29          8.7895        97.66
      10                827.82          9.7395        97.39
      11                749.87         10.7519        97.74
      12                691.04         11.6673        97.23
      13                635.53         12.6863        97.59
      14                592.84         13.5999        97.14
      15                552.40         14.5955        97.30
      16                518.03         15.5638        97.27
      17                489.41         16.4740        96.91
      18                462.68         17.4257        96.81
      19                437.21         18.4409        97.06
      20                417.83         19.2962        96.48
      21                394.01         20.4628        97.44
      22                378.58         21.2968        96.80
      23                363.87         22.1577        96.34
      24                348.13         23.1596        96.50
      25                331.14         24.3478        97.39
      26                321.61         25.0693        96.42
      27                310.99         25.9254        96.02
      28                298.26         27.0319        96.54
      29                286.36         28.1553        97.09
      30                277.34         29.0710        96.90
      31                268.34         30.0460        96.92
      32                259.57         31.0611        97.07
      33                252.72         31.9031        96.68
      34                249.07         32.3706        95.21
      35                241.21         33.4254        95.50
      36                232.65         34.6552        96.26
      37                224.13         35.9726        97.22
      38                223.33         36.1015        95.00
      39                214.83         37.5299        96.23
      40                213.29         37.8008        94.50
      41                205.78         39.1804        95.56
      42                201.71         39.9709        95.17
      43                196.83         40.9619        95.26
      44                193.58         41.6497        94.66
      45                187.93         42.9018        95.34
      46                186.81         43.1590        93.82
      47                179.10         45.0170        95.78
      48                178.71         45.1152        93.99
      49                170.37         47.3237        96.58
      50                170.12         47.3933        94.79
      51                169.51         47.5638        93.26
      52                161.40         49.9538        96.06
      53                161.12         50.0406        94.42
      54                160.59         50.2057        92.97
      55                152.49         52.8726        96.13
      56                152.29         52.9420        94.54
      57                151.95         53.0605        93.09
      58                143.72         56.0989        96.72
      59                143.41         56.2202        95.29
      60                143.26         56.2791        93.80
      61                142.90         56.4209        92.49
      62                134.61         59.8956        96.61
      63                134.47         59.9579        95.17
      64                134.36         60.0070        93.76
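
The post does not spell out the formulas behind the third and fourth columns, but the tabulated values appear consistent with the usual definitions: speedup is the one-core elapsed time divided by the elapsed time on p cores, and efficiency is the speedup divided by p, expressed as a percentage. The sketch below recomputes a few rows under that assumption. The function names speedup and efficiency are hypothetical.

```python
T1 = 8062.54  # elapsed time on one core, in seconds (from the table)

# Elapsed times (seconds) for a few rows of the table.
rows = {2: 4041.61, 16: 518.03, 32: 259.57, 64: 134.36}


def speedup(one_core_seconds, p_core_seconds):
    """Parallel speedup: how many times faster than one core."""
    return one_core_seconds / p_core_seconds


def efficiency(one_core_seconds, p_core_seconds, cores):
    """Parallel efficiency in percent: speedup per core used."""
    return 100.0 * speedup(one_core_seconds, p_core_seconds) / cores


for cores, seconds in rows.items():
    s = speedup(T1, seconds)
    e = efficiency(T1, seconds, cores)
    print(f"{cores:2d} cores: speedup {s:7.4f}, efficiency {e:6.2f}%")
```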