Is parallel computing easy?

[Posted by Jenn-Ching Luo on May 10, 2011 ]

        This blog has been showing a series of performance tests of grandpa LAIPE on multicore computers. Some readers may have seen too many parallel performance results from grandpa LAIPE, so this post changes the flavor. We are going to look at the performance achieved by other implementations. That gives us more evidence of how much "real applications" can speed up on multicore computers.

Real Application

        The words "real applications" are used here deliberately. We are not going to look at how a single loop is parallelized. The topic is parallel equation solvers. In the article "Grandpa on Multicores (2)", this blog showed some parallel performance results for LAIPE sparse solvers. A LAIPE sparse solver can yield a speedup of up to 7.85x, or 98% efficiency, on 8 cores. If we only look at LAIPE performance, with its 98% efficiency on 8 cores, parallel computing seems easy.

        Is parallel computing easy? We can look at other sparse solvers for an answer.

Parallel Performance of SuperLU

        SuperLU is a general-purpose, freely available library for solving sparse nonsymmetric (unsymmetric) systems of linear equations. It is a popular library, installed on computers worldwide.

        SuperLU and the LAIPE sparse solvers belong to the same family, direct solvers, which decompose the coefficient matrix of the system of equations into [L][U] and then perform forward and backward substitutions to obtain the solution. But their performance differs.

        Previous posts have shown the highly efficient performance of LAIPE sparse solvers. For example, the article "Grandpa on Multicores (2)" shows that grandpa LAIPE can speed up to 7.85x on 8 cores. As mentioned before, if we only look at LAIPE performance, we might think parallel computing is easy.
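        To make the [L][U] idea concrete, here is a minimal sketch of a direct solve on a small dense nonsymmetric matrix. It is only an illustration: it uses plain Doolittle factorization without pivoting, ignores sparsity, and runs sequentially, so it is not how SuperLU or LAIPE is implemented. The function names and the 3x3 example are made up for this sketch.

```python
def lu_decompose(a):
    """Doolittle factorization without pivoting: A = L U, with unit diagonal in L.

    Assumes every pivot U[i][i] is nonzero (no row exchanges are performed).
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1.0
        # Row i of U.
        for j in range(i, n):
            U[i][j] = a[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        # Column i of L.
        for j in range(i + 1, n):
            L[j][i] = (a[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U


def forward_substitute(L, b):
    """Solve L y = b, where L is unit lower triangular."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    return y


def backward_substitute(U, y):
    """Solve U x = y, where U is upper triangular."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


def solve(a, b):
    """Direct solve of A x = b: factorize, then forward and backward substitution."""
    L, U = lu_decompose(a)
    return backward_substitute(U, forward_substitute(L, b))


# Hypothetical nonsymmetric example whose exact solution is x = [1, 2, 3].
A = [[4.0, 3.0, 0.0],
     [6.0, 3.0, 1.0],
     [0.0, 2.0, 5.0]]
b = [10.0, 15.0, 19.0]
x = solve(A, b)
print(x)
```

        In a sparse direct solver the factorization step is where most of the time goes, which is why the tables in this post report factorization time.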

        The following cites the article "Evaluation of SuperLU on multicore architectures" to show the performance of SuperLU. A PDF version of that article is available online.

        SuperLU has three related subroutine libraries: sequential SuperLU for uniprocessors, the multithreaded version (SuperLU_MT), and the MPI version (SuperLU_DIST). Because LAIPE is implemented for a shared-memory environment, the following cites the performance of SuperLU_MT from that evaluation article.

        The "Evaluation of SuperLU" article tested SuperLU_MT with four examples, identified by their matrices: g7jac200, stomach, torsol, and twotone. The following cites the timing results from Table 3:

matrix: g7jac200
number of cores	factorization time (sec)	speedup	efficiency (%)
1	32.78	1.00	100.00
2	17.91	1.83	91.51
4	12.41	2.64	66.04
8	10.60	3.09	38.66

matrix: stomach
number of cores	factorization time (sec)	speedup	efficiency (%)
1	64.38	1.00	100.00
2	37.15	1.73	86.65
4	20.39	3.16	78.94
8	17.24	3.73	46.68

matrix: torsol
number of cores	factorization time (sec)	speedup	efficiency (%)
1	9.43	1.00	100.00
2	4.92	1.92	95.83
4	2.87	3.29	82.14
8	2.20	4.29	53.58

matrix: twotone
number of cores	factorization time (sec)	speedup	efficiency (%)
1	6.80	1.00	100.00
2	4.05	1.68	83.95
4	2.32	2.93	73.28
8	1.83	3.72	46.45
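The speedup and efficiency columns above can be reproduced from the factorization times. The short sketch below assumes the usual definitions, speedup = (time on 1 core) / (time on p cores) and efficiency = speedup / p, which agree with the tabulated values. The dictionary layout and function names are chosen only for this illustration.

```python
# Factorization times in seconds, copied from the tables above.
# Outer keys are matrix names; inner keys are core counts.
timings = {
    "g7jac200": {1: 32.78, 2: 17.91, 4: 12.41, 8: 10.60},
    "stomach":  {1: 64.38, 2: 37.15, 4: 20.39, 8: 17.24},
    "torsol":   {1: 9.43, 2: 4.92, 4: 2.87, 8: 2.20},
    "twotone":  {1: 6.80, 2: 4.05, 4: 2.32, 8: 1.83},
}


def speedup(t1, tp):
    """Speedup: single-core time divided by the time on p cores."""
    return t1 / tp


def efficiency(t1, tp, cores):
    """Efficiency in percent: speedup divided by the number of cores."""
    return 100.0 * speedup(t1, tp) / cores


for matrix, times in timings.items():
    t1 = times[1]
    for cores in sorted(times):
        s = speedup(t1, times[cores])
        e = efficiency(t1, times[cores], cores)
        print(f"{matrix:10s} cores={cores}  speedup={s:.2f}  efficiency={e:.2f}%")
```

By the same formula, the LAIPE figure quoted earlier, a speedup of 7.85x on 8 cores, corresponds to 7.85 / 8, or about 98% efficiency.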

On 8 cores, SuperLU_MT has a speedup of about 3x to 4x. The best result is the torsol example, which yielded a speedup of 4.29x on 8 cores, equivalent to about 54% efficiency; the lowest, g7jac200, reached about 39%. So on 8 cores, SuperLU_MT reaches only about 50% efficiency at best. What about the other 50% of system resources? They were wasted. If we look at SuperLU_MT performance, we may think parallel computing is not easy.

Certainly, if we look at the performance of LAIPE, which may speed up to 7.85x on 8 cores, equivalent to 98% efficiency, we may say parallel computing is easy. SuperLU and the LAIPE sparse solvers belong to the same family, direct solvers. On 8 cores, SuperLU_MT shows at most about 54% efficiency, while LAIPE can yield 98% efficiency. Whether parallel computing is easy depends on who does it.