Parallel Performance of Skyline Solver on Soft Cores (3)

[Posted by Jenn-Ching Luo on Apr. 22, 2012 ]

        Co-array is a feature of the Fortran standard (introduced in Fortran 2008) in which cooperative tasks, called images, communicate with each other by passing messages. Message passing differs from memory sharing, as used by OpenMP, in which cooperative tasks directly access common data.
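To make the distinction concrete, here is a minimal sketch in Python. It is only an illustration, not co-array, OpenMP, or neuLoop code: Python threads stand in for the cooperative tasks, and the function names sum_shared and sum_messages are hypothetical. In the first function every worker reads the common data in place; in the second, each worker sees only the private copy delivered to its inbox.

```python
import queue
import threading


def sum_shared(data, n_workers):
    """Memory-sharing style: every worker reads the same list directly."""
    partial = [0] * n_workers

    def work(rank):
        # Read elements of the common list in place; no data is copied.
        partial[rank] = sum(data[i] for i in range(rank, len(data), n_workers))

    threads = [threading.Thread(target=work, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def sum_messages(data, n_workers):
    """Message-passing style: each worker only sees what arrives in its inbox."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()

    def work(rank):
        chunk = inboxes[rank].get()   # receive a private copy of the data
        results.put(sum(chunk))       # send the partial sum back as a message

    threads = [threading.Thread(target=work, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for rank, inbox in enumerate(inboxes):
        # Slicing builds a new list, so the data is duplicated for each worker.
        inbox.put(data[rank::n_workers])
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(n_workers))
```

Both functions return the same sum; the difference is that the message-passing version duplicates the data before any work starts.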

        Some people claim that co-array, being based on message passing, is more efficient than OpenMP, which is implemented in a memory-sharing environment. One of their concerns is that a memory-sharing environment incurs an extra cost for maintaining cache coherence. However, we have not seen an actual comparison that supports this claim, so we do not know whether co-array is more efficient than OpenMP.

        As introduced before, neuLoop has two types of soft core: the homogeneous core, which is based on memory-sharing communication, and the heterogeneous core, which is based on message-passing communication. This post does not compare co-array with OpenMP directly; instead, it uses the laipe2 function laipe$decompose_vsp_8 to show the performance difference between message-passing and memory-sharing communication.

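        Comparing performance under the two kinds of communication is straightforward. First, we link the example against homogeneous cores and run it to get timing results; then we link against heterogeneous cores and run the program again. In the tables below, speedup is the one-core elapsed time divided by the elapsed time on the given number of cores, and efficiency is the speedup divided by the number of cores. The timing results are as follows: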

With Homogeneous Cores
number of cores	elapsed time (sec.)	speedup	efficiency (%)
1	242.93	1.00	100.00
2	121.82	1.99	99.71
3	81.48	2.98	99.38
4	61.49	3.95	98.77
5	49.61	4.90	97.94
6	41.72	5.82	97.05
7	36.05	6.74	96.27
8	31.84	7.63	95.37

With Heterogeneous Cores
number of cores	elapsed time (sec.)	speedup	efficiency (%)
1	248.91	1.00	100.00
2	125.02	1.99	99.55
3	83.77	2.97	99.05
4	63.32	3.93	98.27
5	51.11	4.87	97.40
6	43.06	5.78	96.34
7	37.38	6.66	95.13
8	33.31	7.47	93.41
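
The speedup and efficiency columns can be recomputed from the elapsed times alone. The sketch below assumes the usual definitions (speedup is the one-core time divided by the p-core time; efficiency is speedup divided by p, as a percentage). The function name speedup_table is hypothetical, and the lists simply copy the elapsed-time columns above.

```python
def speedup_table(elapsed):
    """Return (cores, elapsed, speedup, efficiency %) rows.

    `elapsed` holds the elapsed times for 1, 2, ..., n cores, in order.
    """
    t1 = elapsed[0]
    rows = []
    for cores, t in enumerate(elapsed, start=1):
        speedup = t1 / t                       # one-core time over p-core time
        efficiency = 100.0 * speedup / cores   # percent of ideal linear speedup
        rows.append((cores, t, round(speedup, 2), round(efficiency, 2)))
    return rows


# Elapsed times (sec.) copied from the two tables above.
homogeneous = [242.93, 121.82, 81.48, 61.49, 49.61, 41.72, 36.05, 31.84]
heterogeneous = [248.91, 125.02, 83.77, 63.32, 51.11, 43.06, 37.38, 33.31]

if __name__ == "__main__":
    for name, times in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
        print(name)
        for row in speedup_table(times):
            print("\t".join(str(value) for value in row))
```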

From the above data, we can see that directly accessing memory runs faster than message passing at every core count; on 8 cores, for example, the elapsed time is 31.84 sec. with homogeneous cores versus 33.31 sec. with heterogeneous cores. This is understandable: in a memory-sharing environment, cooperative tasks can directly access common data, so passing messages, which duplicates data, becomes an extra burden with an extra cost. Whether co-array is more efficient than OpenMP remains unclear. In this writer's experience, if the additional cost of maintaining cache coherence is not significant, direct memory access does not degrade parallel performance.
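
One way to quantify the gap is the extra elapsed time of the heterogeneous (message-passing) run relative to the homogeneous (memory-sharing) run at each core count. The following sketch uses only the timings reported above; the function name overhead_percent is hypothetical.

```python
def overhead_percent(shared, message):
    """Extra elapsed time of message passing over memory sharing, in percent."""
    return [100.0 * (m - s) / s for s, m in zip(shared, message)]


# Elapsed times (sec.) for 1 to 8 cores, copied from the tables above.
homogeneous = [242.93, 121.82, 81.48, 61.49, 49.61, 41.72, 36.05, 31.84]
heterogeneous = [248.91, 125.02, 83.77, 63.32, 51.11, 43.06, 37.38, 33.31]

if __name__ == "__main__":
    overheads = overhead_percent(homogeneous, heterogeneous)
    for cores, pct in enumerate(overheads, start=1):
        print(f"{cores} cores: {pct:.2f}% slower with heterogeneous cores")
```

For these timings, the overhead is roughly 2.5% on one core and roughly 4.6% on eight cores.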