Equation Solution High Performance by Design
How Fast 64 Cores Can Improve
[Posted by Jenn-Ching Luo on June 2, 2016]
This post presents a set of timing results to answer how much 64 cores can improve computing speed. In brief, 64 cores can improve the computing speed by up to 60x compared with one core. The following sections examine the parallel performance.
TESTING EXAMPLE
Perform [C]=[A][B], where [A], [B], and [C] are 16-byte real matrices. Matrix [A] is 4700-by-3600, matrix [B] is 3600-by-3700, and matrix [C] is 4700-by-3700.
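To get a feel for the size of this problem, the sketch below estimates the operation count and the storage needed for the three matrices. It assumes the textbook algorithm, which uses about 2*m*k*n floating-point operations; the post does not say which algorithm laipe$matmul_16 uses, so the count is only an illustration. The names matmul_flops and matrix_bytes are hypothetical helpers, not part of any library.

```python
# Sizes from the testing example: [A] is M-by-K, [B] is K-by-N, [C] is M-by-N.
M, K, N = 4700, 3600, 3700
BYTES_PER_ELEMENT = 16  # 16-byte real, as stated in the post


def matmul_flops(m, k, n):
    """Operation count for a textbook product (assumption): one multiply
    and one add per inner-loop step, so 2*m*k*n in total."""
    return 2 * m * k * n


def matrix_bytes(rows, cols, elem_bytes=BYTES_PER_ELEMENT):
    """Storage for a dense rows-by-cols matrix."""
    return rows * cols * elem_bytes


flops = matmul_flops(M, K, N)
storage = {
    "A": matrix_bytes(M, K),
    "B": matrix_bytes(K, N),
    "C": matrix_bytes(M, N),
}

print(f"operations (assumed 2*m*k*n): {flops:,}")
for name, size in storage.items():
    print(f"matrix {name}: {size:,} bytes")
print(f"total storage: {sum(storage.values()):,} bytes")
```

Under these assumptions the product needs about 1.25e11 operations, and the three matrices occupy roughly 762 MB in total.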
COMPUTING ENVIRONMENT
Computer: A Dell PowerEdge R815 with four Opteron 6276 processors, 64 cores in total. Because the Opteron 6276 can run at a higher frequency when 8 or fewer cores are in use, the test disabled the processors' turbo boost for the purpose of measuring parallel performance. The computer runs slower with turbo boost disabled.
Operating System: Windows Server 2008 R2
Compiler: gfortran with optimization -O3. The application links against neuloop4 for parallel processing.
Subroutine: laipe$matmul_16, which performs matrix multiplication in parallel.
TIMING RESULT
Timing results include "elapsed time," "CPU time in user mode," "CPU time in kernel mode," and "total CPU time." The timing results are as follows.
All times are in seconds.

Cores  Elapsed  CPU user  CPU kernel  CPU total
    1  8062.54   8061.76        0.72    8062.48
    2  4041.61   8076.83        1.09    8077.92
    3  2715.17   8136.34        0.84    8137.18
    4  2036.36   8131.47        0.90    8132.38
    5  1639.68   8180.47        1.14    8181.61
    6  1365.85   8173.42        0.94    8174.36
    7  1175.58   8204.72        1.05    8205.76
    8  1032.18   8221.86        1.03    8222.89
    9   917.29   8215.04        0.81    8215.86
   10   827.82   8231.94        0.89    8232.83
   11   749.87   8222.45        1.26    8223.72
   12   691.04   8236.59        1.05    8237.63
   13   635.53   8227.32        1.12    8228.44
   14   592.84   8238.65        1.01    8239.66
   15   552.40   8213.16        1.42    8214.58
   16   518.03   8243.12        1.06    8244.18
   17   489.41   8224.98        0.95    8225.93
   18   462.68   8224.64        1.25    8225.89
   19   437.21   8221.96        0.90    8222.86
   20   417.83   8232.24        1.26    8233.50
   21   394.01   8242.03        1.31    8243.34
   22   378.58   8245.62        1.11    8246.73
   23   363.87   8233.26        1.08    8234.34
   24   348.13   8233.81        1.11    8234.92
   25   331.14   8244.96        1.19    8246.15
   26   321.61   8232.24        1.31    8233.55
   27   310.99   8237.95        1.03    8238.97
   28   298.26   8247.18        1.59    8248.77
   29   286.36   8250.89        1.20    8252.09
   30   277.34   8257.98        1.50    8259.47
   31   268.34   8251.59        1.29    8252.89
   32   259.57   8254.29        1.76    8256.06
   33   252.72   8264.40        1.23    8265.63
   34   249.07   8238.41        1.65    8240.07
   35   241.21   8234.68        1.39    8236.07
   36   232.65   8237.26        1.25    8238.51
   37   224.13   8263.37        1.42    8264.79
   38   223.33   8243.84        1.19    8245.03
   39   214.83   8234.23        1.48    8235.71
   40   213.29   8250.14        1.34    8251.49
   41   205.78   8243.42        1.64    8245.06
   42   201.71   8260.17        1.64    8261.81
   43   196.83   8250.88        1.64    8252.51
   44   193.58   8263.03        1.54    8264.57
   45   187.93   8248.83        1.53    8250.36
   46   186.81   8261.49        1.62    8263.11
   47   179.10   8249.79        1.62    8251.41
   48   178.71   8261.38        1.70    8263.08
   49   170.37   8265.93        1.56    8267.49
   50   170.12   8258.32        1.65    8259.97
   51   169.51   8266.52        1.58    8268.10
   52   161.40   8271.14        1.76    8272.90
   53   161.12   8271.86        1.67    8273.53
   54   160.59   8262.76        1.72    8264.48
   55   152.49   8268.58        1.75    8270.33
   56   152.29   8265.14        2.01    8267.15
   57   151.95   8264.31        1.67    8265.98
   58   143.72   8273.98        1.89    8275.87
   59   143.41   8259.55        1.83    8261.38
   60   143.26   8262.69        1.97    8264.65
   61   142.90   8257.69        1.93    8259.63
   62   134.61   8264.65        1.89    8266.54
   63   134.47   8268.13        1.68    8269.82
   64   134.36   8265.95        2.15    8268.10

The results show that elapsed time falls roughly in proportion to the reciprocal of the number of cores used. For example, one core took 8062.54 seconds to complete the computation, two cores cut the elapsed time to 4041.61 seconds, three cores took 2715.17 seconds, and so on. 64 cores completed the computation in 134.36 seconds, which is about 60 times faster than one core.
In this testing example, 64 cores yielded a 60x speedup. Such highly efficient performance is not seen very often. Parallel performance varies case by case, and we cannot predict which computations will yield a 60x speedup on 64 cores.
This testing example also shows a benefit of multicore applications. One core took 8062.54 seconds, which is about 2 hours and 15 minutes, while 64 cores completed the computation in about 2 minutes. In other words, 64 cores can complete a 2-hour-and-15-minute job in about 2 minutes. There is no reason to reject multicore. In the following, we look at parallel speedup and efficiency.
SPEEDUP AND EFFICIENCY
The following table summarizes speedup and efficiency. The first column is the number of cores, the second column is the elapsed time in seconds, and the third column is the parallel speedup. The table shows that the performance yielded an almost linear speedup.
64 cores improve the speed by about 60x, a level of efficiency that is not often seen. The fourth column is the parallel efficiency; it shows that 64 cores yielded a parallel efficiency of about 94%. The following table summarizes the parallel performance.
[Table: number of cores, elapsed time (seconds), parallel speedup, parallel efficiency]
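The speedup and efficiency quoted above can be checked directly from the elapsed times. The sketch below uses the usual definitions, speedup S(p) = T(1)/T(p) and efficiency E(p) = S(p)/p, on a handful of core counts copied from the timing results. The function names are illustrative only and do not come from the post.

```python
# Elapsed times in seconds for selected core counts, copied from the
# timing results in this post.
ELAPSED = {
    1: 8062.54,
    2: 4041.61,
    4: 2036.36,
    8: 1032.18,
    16: 518.03,
    32: 259.57,
    64: 134.36,
}


def speedup(elapsed, cores):
    """Parallel speedup S(p) = T(1) / T(p)."""
    return elapsed[1] / elapsed[cores]


def efficiency(elapsed, cores):
    """Parallel efficiency E(p) = S(p) / p."""
    return speedup(elapsed, cores) / cores


for p in sorted(ELAPSED):
    s = speedup(ELAPSED, p)
    e = efficiency(ELAPSED, p)
    print(f"{p:3d} cores: speedup {s:6.2f}, efficiency {e:6.1%}")
```

For 64 cores this gives a speedup of about 60.0 and an efficiency of about 93.8%, in line with the figures stated in the text.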