Parallel Performance of Skyline Solver on Soft Cores (3)
[Posted by Jenn-Ching Luo on Apr. 22, 2012]
Co-arrays are part of the Fortran standard (introduced in Fortran 2008). With co-arrays, cooperative tasks, called images, communicate with each other by passing messages. Message passing differs from memory sharing, as used by OpenMP for example, in which cooperative tasks directly access common data.
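To make the distinction concrete, here is a minimal sketch in Python. It is only an analogy: it uses neither co-arrays nor OpenMP, and the function names `sum_shared` and `sum_messages` are hypothetical. Both functions split a sum across workers. The first lets every worker read the common list directly, while the second hands each worker a private copy of its block through a queue and collects the partial results the same way.

```python
import queue
import threading


def _bounds(rank, n_items, n_workers):
    """Return the [lo, hi) index range owned by worker `rank`."""
    chunk = (n_items + n_workers - 1) // n_workers
    lo = min(rank * chunk, n_items)
    hi = min(lo + chunk, n_items)
    return lo, hi


def sum_shared(data, n_workers=2):
    """Memory-sharing style: workers read the common list directly."""
    partial = [0] * n_workers  # one shared result slot per worker

    def work(rank):
        lo, hi = _bounds(rank, len(data), n_workers)
        # The worker indexes straight into `data`; nothing is copied.
        partial[rank] = sum(data[i] for i in range(lo, hi))

    threads = [threading.Thread(target=work, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def sum_messages(data, n_workers=2):
    """Message-passing style: a worker only sees what it is sent."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()

    def work(rank):
        block = inboxes[rank].get()  # receive a private copy of the block
        outbox.put(sum(block))       # send the partial result back

    threads = [threading.Thread(target=work, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for rank in range(n_workers):
        lo, hi = _bounds(rank, len(data), n_workers)
        inboxes[rank].put(data[lo:hi])  # slicing duplicates the data
    for t in threads:
        t.join()
    return sum(outbox.get() for _ in range(n_workers))


if __name__ == "__main__":
    values = list(range(1000))
    print(sum_shared(values), sum_messages(values))
```

Both variants produce the same total. The difference is that the message-passing variant must copy every block before a worker can touch it. The sketch shows the semantics only and says nothing about the relative speed of the two models.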
Some people claim that co-arrays, being based on message passing, are more efficient than OpenMP, which is implemented in a memory-sharing environment. One of their concerns is that a memory-sharing environment incurs an extra cost for maintaining cache coherence. However, we have not seen an actual comparison that supports this claim, so we do not know whether co-arrays are more efficient than OpenMP.
As introduced before, neuLoop has two types of soft core: the homogeneous core, which is based on memory-sharing communication, and the heterogeneous core, which is based on message-passing communication. This post does not compare co-arrays with OpenMP directly; instead, it uses the laipe2 function laipe2$decompose_vsp_8 to show the difference in performance between message-passing and memory-sharing communication. The comparison is easy to set up: we first run the example linked against homogeneous cores to get timing results, and then run the same program again with heterogeneous cores. The timing results are as follows:
[With Homogeneous Cores by Direct Memory Access, i.e., memory-sharing]
number of cores   elapsed time (sec.)   speedup   efficiency (%)
-------------------------------------------------------------------
       1                242.93            1.00        100.00
       2                121.82            1.99         99.71
       3                 81.48            2.98         99.38
       4                 61.49            3.95         98.77
       5                 49.61            4.90         97.94
       6                 41.72            5.82         97.05
       7                 36.05            6.74         96.27
       8                 31.84            7.63         95.37
[With Heterogeneous Cores by Passing Messages, i.e., message-passing]
number of cores   elapsed time (sec.)   speedup   efficiency (%)
-------------------------------------------------------------------
       1                248.91            1.00        100.00
       2                125.02            1.99         99.55
       3                 83.77            2.97         99.05
       4                 63.32            3.93         98.27
       5                 51.11            4.87         97.40
       6                 43.06            5.78         96.34
       7                 37.38            6.66         95.13
       8                 33.31            7.47         93.41
From the above data, we can see that directly accessing memory runs faster than message passing: with homogeneous cores the elapsed time is shorter at every core count, and the efficiency on 8 cores is 95.37% versus 93.41%. This is understandable because, in a memory-sharing environment, cooperative tasks can directly access common data. Passing messages, which duplicates data, is an extra burden in a memory-sharing environment and inevitably takes extra time. Whether co-arrays are more efficient than OpenMP remains unclear. In this writer's experience, if the additional cost of maintaining cache coherence is not significant, direct memory access does not degrade parallel performance.
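For readers who want to check the figures, the sketch below recomputes the speedup and efficiency columns from the elapsed times. It assumes the usual definitions, which the post does not state explicitly but which match the tabulated values: speedup is the 1-core time divided by the p-core time, and efficiency is speedup divided by p. The helper names are hypothetical. It also reports how much slower the message-passing run is at each core count.

```python
# Elapsed times in seconds, copied from the two tables (1 to 8 cores).
SHARED = [242.93, 121.82, 81.48, 61.49, 49.61, 41.72, 36.05, 31.84]
MESSAGE = [248.91, 125.02, 83.77, 63.32, 51.11, 43.06, 37.38, 33.31]


def speedup_and_efficiency(times):
    """Return a list of (speedup, efficiency in %) relative to the 1-core run.

    Assumed definitions: speedup = T(1) / T(p), efficiency = speedup / p.
    """
    t1 = times[0]
    rows = []
    for cores, elapsed in enumerate(times, start=1):
        speedup = t1 / elapsed
        rows.append((speedup, 100.0 * speedup / cores))
    return rows


def overhead_percent(slow, fast):
    """Return how much longer `slow` takes than `fast`, in percent, per entry."""
    return [100.0 * (s - f) / f for s, f in zip(slow, fast)]


if __name__ == "__main__":
    shared_rows = speedup_and_efficiency(SHARED)
    message_rows = speedup_and_efficiency(MESSAGE)
    extra = overhead_percent(MESSAGE, SHARED)
    print("cores  shared eff.(%)  message eff.(%)  message overhead (%)")
    for cores in range(1, len(SHARED) + 1):
        i = cores - 1
        print(f"{cores:5d}  {shared_rows[i][1]:14.2f}  "
              f"{message_rows[i][1]:15.2f}  {extra[i]:20.2f}")
```

On these numbers the message-passing run takes about 2.5% longer on one core and about 4.6% longer on eight, so the gap widens as cores are added.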