Equation Solution
High Performance by Design

Parallel Performance of Skyline Solver on Soft Cores (2)

[Posted by Jenn-Ching Luo on Mar. 24, 2012]

There are two options for implementing soft core computing. The first option is to let the operating system place soft cores so as to balance the workload on physical cores. Multiple soft cores may share a physical core. The performance reported in Part (I) falls under the first option.

The second option is to call the function nlp\$staySoftCore so that a soft core stays on the physical core to which it was assigned.
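To make the second option concrete, here is a minimal sketch of pinning at the operating-system level. It is not the nlp\$staySoftCore interface, whose signature is not shown in this post. It assumes a Linux system, where Python exposes the scheduler affinity calls os.sched_getaffinity and os.sched_setaffinity, and the helper name pin_to_core is hypothetical.

```python
import os


def pin_to_core(core_id):
    """Restrict the calling process to a single CPU (hypothetical helper).

    Returns the resulting affinity set, or None when the platform does not
    offer the call or refuses the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows: no affinity API in the os module
    try:
        os.sched_setaffinity(0, {core_id})  # pid 0 means "the caller"
        return os.sched_getaffinity(0)
    except OSError:
        return None


if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)  # CPUs the scheduler may use right now
    first = min(allowed)               # choose a CPU we are allowed to run on
    result = pin_to_core(first)
    print("pinned to:", result)
    os.sched_setaffinity(0, allowed)   # restore: let the OS balance again
else:
    allowed, first, result = None, None, None
    print("CPU affinity is not available on this platform")
```

Restoring the original affinity set at the end corresponds to going back to the first option, where the operating system is free to move the work between physical cores.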

A few days ago, this writer set out to measure the difference in performance between these two options. The first option was used in Part (I). This writer ran the same example and timed the function laipe\$decompose_vsp_8 for comparison. The timing results with the second option, shown below, also exhibit an almost linear speedup, with efficiency staying above 95% on up to 8 cores.

[A physical core is set for a specified soft core]

| number of cores | elapsed time (sec.) | speedup | efficiency (%) |
|---|---|---|---|
| 1 | 243.19 | 1.00 | 100.00 |
| 2 | 121.90 | 1.99 | 99.75 |
| 3 | 81.53 | 2.98 | 99.43 |
| 4 | 61.53 | 3.95 | 98.81 |
| 5 | 49.51 | 4.91 | 98.24 |
| 6 | 41.56 | 5.85 | 97.53 |
| 7 | 36.22 | 6.71 | 95.92 |
| 8 | 31.87 | 7.63 | 95.38 |
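The speedup and efficiency columns follow from the elapsed times alone: speedup is the one-core time divided by the time on p cores, and efficiency is the speedup divided by p, expressed as a percentage. The short sketch below recomputes both columns from the elapsed times in the table above; the function name speedup_and_efficiency is only illustrative.

```python
# Elapsed times in seconds from the table above, keyed by number of cores.
elapsed = {
    1: 243.19, 2: 121.90, 3: 81.53, 4: 61.53,
    5: 49.51, 6: 41.56, 7: 36.22, 8: 31.87,
}


def speedup_and_efficiency(times):
    """Return {cores: (speedup, efficiency in %)} relative to the 1-core time."""
    t1 = times[1]
    rows = {}
    for cores, t in sorted(times.items()):
        speedup = t1 / t
        rows[cores] = (speedup, 100.0 * speedup / cores)
    return rows


rows = speedup_and_efficiency(elapsed)
for cores, (speedup, efficiency) in rows.items():
    print(f"{cores:2d}  {elapsed[cores]:7.2f}  {speedup:5.2f}  {efficiency:6.2f}")
```

The computed values agree with the tabulated ones to within rounding in the second decimal place.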

However, compare these results with the performance obtained when the operating system balanced the workload and dynamically scheduled soft cores onto physical cores [copied from Part (I) below]. When the operating system balances the workload among physical cores, the soft cores run slightly faster at every core count except 5 and 6. That contradicts this writer's prediction. This writer expected that setting a physical core for a particular soft core would run more efficiently. However, the timing results show the opposite in this example.

[Operating System balances workload among physical cores]

| number of cores | elapsed time (sec.) | speedup | efficiency (%) |
|---|---|---|---|
| 1 | 242.93 | 1.00 | 100.00 |
| 2 | 121.82 | 1.99 | 99.71 |
| 3 | 81.48 | 2.98 | 99.38 |
| 4 | 61.49 | 3.95 | 98.77 |
| 5 | 49.61 | 4.90 | 97.94 |
| 6 | 41.72 | 5.82 | 97.05 |
| 7 | 36.05 | 6.74 | 96.27 |
| 8 | 31.84 | 7.63 | 95.37 |
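To see how small the difference between the two options is, the sketch below takes the elapsed times from both tables and prints, for each core count, how much longer the fixed-core run took relative to the run balanced by the operating system. The variable and function names are illustrative only.

```python
# Elapsed times in seconds from the two tables, keyed by number of cores.
fixed_core = {
    1: 243.19, 2: 121.90, 3: 81.53, 4: 61.53,
    5: 49.51, 6: 41.56, 7: 36.22, 8: 31.87,
}
os_balanced = {
    1: 242.93, 2: 121.82, 3: 81.48, 4: 61.49,
    5: 49.61, 6: 41.72, 7: 36.05, 8: 31.84,
}


def relative_gap(fixed_times, balanced_times):
    """Percent by which the fixed-core time exceeds the OS-balanced time."""
    return {
        cores: 100.0 * (fixed_times[cores] - balanced_times[cores])
        / balanced_times[cores]
        for cores in sorted(fixed_times)
    }


gaps = relative_gap(fixed_core, os_balanced)
for cores, gap in gaps.items():
    print(f"{cores} cores: fixed-core run is {gap:+.2f}% vs. OS-balanced")

# Core counts at which the OS-balanced run finished first.
balanced_faster = [cores for cores, gap in gaps.items() if gap > 0]
print("OS-balanced faster on:", balanced_faster)
```

On these numbers the run balanced by the operating system finishes first at six of the eight core counts, and every gap is below 0.5% of the elapsed time.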