Equation Solution: High Performance by Design
Parallel Performance of laipe$Decompose_DAG_8 on 48 Cores
[Posted by Jenn-Ching Luo on Feb. 18, 2016]
This post presents a set of parallel performance results for the LAIPE2 subroutine laipe$Decompose_DAG_8 on 48 cores. The subroutine laipe$Decompose_DAG_8 decomposes a dense matrix [A] of 8-byte elements into [L][U]. The test example is a dense matrix of 8-byte elements, of order 20,000-by-20,000.
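To make the result of the decomposition concrete, here is a minimal serial sketch in Python of what [A] = [L][U] means, using Doolittle's method without pivoting. It is only an illustration under that assumption and is not LAIPE2's algorithm: laipe$Decompose_DAG_8 is a parallel Fortran subroutine whose scheduling and pivoting strategy are not described in this post. The helper names (lu_decompose, matmul, dense_matrix_bytes) are hypothetical. Python floats are 8-byte IEEE doubles, matching the element size of the test matrix.

```python
def lu_decompose(a):
    """Doolittle LU without pivoting: return (L, U) with A = L U.

    L is unit lower-triangular, U is upper-triangular.
    Serial illustration only; it assumes every pivot U[i][i] is nonzero.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U uses rows 0..i-1 of U and row i of L computed earlier.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        # Column i of L (unit diagonal), divided by the pivot U[i][i].
        lower[i][i] = 1.0
        for j in range(i + 1, n):
            s = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - s) / upper[i][i]
    return lower, upper


def matmul(x, y):
    """Plain triple-loop product, used here only to check L U against A."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def dense_matrix_bytes(n, bytes_per_element=8):
    """Storage needed for an n-by-n dense matrix of 8-byte elements."""
    return n * n * bytes_per_element


A = [[4.0, 3.0, 2.0], [2.0, 4.0, 1.0], [6.0, 3.0, 5.0]]
L, U = lu_decompose(A)
product = matmul(L, U)
print(max(abs(product[i][j] - A[i][j]) for i in range(3) for j in range(3)))
print(dense_matrix_bytes(20_000))  # 3200000000
```

The last line shows the scale of the test case: a 20,000-by-20,000 matrix of 8-byte elements occupies 20,000 x 20,000 x 8 = 3.2 x 10^9 bytes.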
This example was run on the homogeneous softcore of neuLoop. The computation was performed on a Dell PowerEdge R815 with four 1.9 GHz 12-core Opteron processors (a total of 48 cores) running Windows Server 2008. The compiler was gfortran. The timing results are listed as follows.
Timing Result
All times are in seconds.

Cores  Elapsed Time  User-Mode CPU  Kernel-Mode CPU  Total CPU
    1       8943.55        8943.30             0.25    8943.55
    2       4031.88        8039.15             0.42    8039.57
    3       2692.89        8024.25             0.39    8024.65
    4       2025.16        8023.21             0.36    8023.57
    5       1627.99        8025.97             0.59    8026.56
    6       1362.15        8029.42             0.84    8030.26
    7       1185.90        8126.01             0.67    8126.69
    8       1051.68        8204.19             0.81    8205.00
    9        948.00        8281.95             0.84    8282.80
   10        861.08        8331.50             0.98    8332.48
   11        790.88        8386.71             1.05    8387.75
   12        733.28        8449.59             1.56    8451.15
   13        687.06        8545.05             2.64    8547.69
   14        646.69        8628.88             2.76    8631.64
   15        609.98        8693.11             2.07    8695.18
   16        579.37        8759.64             1.06    8760.70
   17        552.65        8841.06             1.68    8842.75
   18        527.42        8907.00             2.50    8909.50
   19        503.40        8940.46             2.59    8943.05
   20        481.00        8945.30             2.64    8947.94
   21        462.15        8987.75             2.43    8990.18
   22        444.67        9001.79             2.06    9003.85
   23        430.80        9080.46             2.76    9083.22
   24        417.68        9146.51             2.92    9149.43
   25        403.65        9175.43             2.57    9178.01
   26        392.51        9221.62             2.03    9223.65
   27        382.40        9282.06             3.40    9285.46
   28        372.70        9336.22             3.32    9339.55
   29        363.84        9396.42             3.57    9400.00
   30        356.74        9496.44             2.84    9499.28
   31        349.85        9567.93             2.65    9570.58
   32        344.43        9673.73             3.57    9677.30
   33        336.04        9700.64             4.01    9704.65
   34        330.35        9759.14             4.43    9763.57
   35        326.24        9864.89             4.20    9869.09
   36        321.58       10004.19             4.66   10008.85
   37        318.38       10096.28             5.19   10101.47
   38        315.51       10221.64             4.41   10226.05
   39        310.39       10272.34             4.73   10277.06
   40        309.29       10455.56             4.71   10460.27
   41        307.68       10595.63             4.90   10600.53
   42        304.22       10695.66             4.37   10700.03
   43        304.59       10885.03             5.79   10890.82
   44        302.02       10985.79             5.40   10991.19
   45        301.17       11173.10             5.88   11178.98
   46        299.02       11305.94             5.18   11311.12
   47        297.96       11454.33             5.73   11460.05
   48        298.57       11601.56             5.69   11607.25

The timing results include elapsed time, CPU time in user mode, CPU time in kernel mode, and total CPU time (the sum of the user-mode and kernel-mode times). In general, total CPU time increases with the number of cores, except when superlinearity occurs. The reason is straightforward: more cores always maintain more local caches, and more local caches generate more requests to keep shared resources consistent, which drives up the cost. That is why total CPU time generally increases with the number of cores.
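The CPU-time columns can be cross-checked with a short script. The Python sketch below assumes that total CPU time is the sum of the user-mode and kernel-mode times (the listed figures agree to within their 0.01-second rounding) and, following the comparison made in this post, flags the core counts whose total CPU time falls below the one-core total. The helper names total_mismatches and below_one_core are hypothetical.

```python
# CPU times (seconds) transcribed from the timing table.
# Index 0 is the 1-core run, index 47 is the 48-core run.
USER = [
    8943.30, 8039.15, 8024.25, 8023.21, 8025.97, 8029.42, 8126.01, 8204.19,
    8281.95, 8331.50, 8386.71, 8449.59, 8545.05, 8628.88, 8693.11, 8759.64,
    8841.06, 8907.00, 8940.46, 8945.30, 8987.75, 9001.79, 9080.46, 9146.51,
    9175.43, 9221.62, 9282.06, 9336.22, 9396.42, 9496.44, 9567.93, 9673.73,
    9700.64, 9759.14, 9864.89, 10004.19, 10096.28, 10221.64, 10272.34, 10455.56,
    10595.63, 10695.66, 10885.03, 10985.79, 11173.10, 11305.94, 11454.33, 11601.56,
]
KERNEL = [
    0.25, 0.42, 0.39, 0.36, 0.59, 0.84, 0.67, 0.81,
    0.84, 0.98, 1.05, 1.56, 2.64, 2.76, 2.07, 1.06,
    1.68, 2.50, 2.59, 2.64, 2.43, 2.06, 2.76, 2.92,
    2.57, 2.03, 3.40, 3.32, 3.57, 2.84, 2.65, 3.57,
    4.01, 4.43, 4.20, 4.66, 5.19, 4.41, 4.73, 4.71,
    4.90, 4.37, 5.79, 5.40, 5.88, 5.18, 5.73, 5.69,
]
TOTAL = [
    8943.55, 8039.57, 8024.65, 8023.57, 8026.56, 8030.26, 8126.69, 8205.00,
    8282.80, 8332.48, 8387.75, 8451.15, 8547.69, 8631.64, 8695.18, 8760.70,
    8842.75, 8909.50, 8943.05, 8947.94, 8990.18, 9003.85, 9083.22, 9149.43,
    9178.01, 9223.65, 9285.46, 9339.55, 9400.00, 9499.28, 9570.58, 9677.30,
    9704.65, 9763.57, 9869.09, 10008.85, 10101.47, 10226.05, 10277.06, 10460.27,
    10600.53, 10700.03, 10890.82, 10991.19, 11178.98, 11311.12, 11460.05, 11607.25,
]


def total_mismatches(user, kernel, total, tol=0.02):
    """Return core counts whose total differs from user + kernel by more than tol.

    Each figure is rounded to 0.01 s, so a sum may legitimately differ by 0.01 s.
    """
    return [
        p
        for p, (u, k, t) in enumerate(zip(user, kernel, total), start=1)
        if abs(u + k - t) > tol
    ]


def below_one_core(total):
    """Return core counts (p > 1) whose total CPU time is lower than the 1-core total."""
    return [p for p, t in enumerate(total, start=1) if p > 1 and t < total[0]]


mismatches = total_mismatches(USER, KERNEL, TOTAL)
superlinear = below_one_core(TOTAL)
print("rows where total != user + kernel:", mismatches)
print("core counts using less total CPU time than 1 core:", superlinear)
print(f"48-core total CPU time relative to 1 core: {TOTAL[-1] / TOTAL[0]:.2f}x")
```

With these figures every row passes the sum check, and 2 through 19 cores use less total CPU time than one core; from 20 cores on, the overhead grows until 48 cores consume about 1.30 times the one-core CPU time.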
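The results above show superlinearity. For example, one core took 8943.55 seconds of total CPU time, while two cores took only 8039.57 seconds, an occurrence of superlinearity; the two-core elapsed time of 4031.88 seconds likewise corresponds to a speedup of about 2.22, more than the ideal 2. By contrast, 48 cores took 11607.25 seconds of total CPU time. In general, total CPU time increases with the number of cores. Speedup is measured from elapsed time: the speedup on p cores is the one-core elapsed time divided by the p-core elapsed time, and efficiency is that speedup divided by p. Speedup and efficiency are listed in the following.
Speedup and Efficiency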
In the table above, speedup is shown in the third column. The computation generally kept speeding up as more cores were enabled, although the gains flatten beyond about 40 cores; 43 cores (304.59 seconds) and 48 cores (298.57 seconds) were marginally slower than 42 cores (304.22 seconds) and 47 cores (297.96 seconds), respectively. Up to 23 cores, the subroutine laipe$Decompose_DAG_8 yields an efficiency over 90%; on 23 cores it sped up about 21X (20.76X). Further, with 33 cores the example yields an efficiency of 80.65%; with 41 cores, 70.90%; and with 48 cores, 62.41%.
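Speedup and efficiency follow from the elapsed times alone. The Python sketch below assumes the usual definitions, speedup S_p = T_1 / T_p and efficiency E_p = S_p / p, which agree with the 33-core and 48-core efficiencies quoted in this post (80.65% and 62.41%). The function name speedup_and_efficiency is hypothetical.

```python
# Elapsed times (seconds) transcribed from the timing table.
# Index 0 is the 1-core run, index 47 is the 48-core run.
ELAPSED = [
    8943.55, 4031.88, 2692.89, 2025.16, 1627.99, 1362.15, 1185.90, 1051.68,
    948.00, 861.08, 790.88, 733.28, 687.06, 646.69, 609.98, 579.37,
    552.65, 527.42, 503.40, 481.00, 462.15, 444.67, 430.80, 417.68,
    403.65, 392.51, 382.40, 372.70, 363.84, 356.74, 349.85, 344.43,
    336.04, 330.35, 326.24, 321.58, 318.38, 315.51, 310.39, 309.29,
    307.68, 304.22, 304.59, 302.02, 301.17, 299.02, 297.96, 298.57,
]


def speedup_and_efficiency(elapsed):
    """Return (cores, speedup, efficiency in percent) for each core count.

    Speedup S_p = T_1 / T_p and efficiency E_p = S_p / p, where T_1 is the
    one-core elapsed time (the first entry) and T_p the p-core elapsed time.
    """
    t1 = elapsed[0]
    rows = []
    for p, tp in enumerate(elapsed, start=1):
        s = t1 / tp
        rows.append((p, s, 100.0 * s / p))
    return rows


rows = speedup_and_efficiency(ELAPSED)
print("Cores  Speedup  Efficiency (%)")
for p, s, e in rows:
    print(f"{p:5d}  {s:7.2f}  {e:14.2f}")
```

Running it prints one row per core count. Efficiency stays above 90% through 23 cores (a speedup of about 20.76 on 23 cores), the 41-core row gives about 70.90%, and the 2-core row shows a speedup above 2, the elapsed-time counterpart of the superlinearity discussed above.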