Equation Solution: High Performance by Design
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Parallel Performance of Skyline Solver on Soft Cores (3)
[Posted by Jenn-Ching Luo on Apr. 22, 2012]
Co-array is part of the Fortran standard (since Fortran 2008). In co-array, cooperating tasks (called images) communicate with each other by passing messages. Message passing differs from memory sharing (e.g., OpenMP), in which cooperating tasks access common data directly.
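To make the distinction concrete, here is a minimal Python sketch, not taken from neuLoop or laipe2, that sums a list with a few threads in both styles. The names shared_memory_sum, message_passing_sum and _bounds are hypothetical. In the memory-sharing version every worker reads its block of the common list in place; in the message-passing version each worker receives a private copy of its block through a queue.Queue and sends its partial sum back. Python threads are used only to illustrate the two communication patterns; because of the global interpreter lock, this sketch says nothing about parallel speed.

```python
import queue
import threading


def _bounds(w, n_workers, n):
    """Half-open index range [lo, hi) of worker w's block (hypothetical helper)."""
    chunk = (n + n_workers - 1) // n_workers
    lo = min(w * chunk, n)
    hi = min(lo + chunk, n)
    return lo, hi


def shared_memory_sum(data, n_workers=4):
    """Memory sharing: workers read their blocks of the common list in place."""
    partial = [0] * n_workers  # common result array, one slot per worker

    def worker(w):
        lo, hi = _bounds(w, n_workers, len(data))
        s = 0
        for i in range(lo, hi):  # direct access to shared data, no copy is made
            s += data[i]
        partial[w] = s  # each worker writes only its own slot

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def message_passing_sum(data, n_workers=4):
    """Message passing: workers get private copies of their blocks and reply with results."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    replies = queue.Queue()

    def worker(w):
        block = inboxes[w].get()  # this worker's private copy of its block
        replies.put(sum(block))  # send the partial sum back as a message

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for w in range(n_workers):
        lo, hi = _bounds(w, n_workers, len(data))
        inboxes[w].put(data[lo:hi])  # slicing a list duplicates the data
    total = sum(replies.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return total
```

Both functions return the same total. The difference is that message_passing_sum builds a second copy of every block before any arithmetic happens, which is the duplication of data discussed in this post.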
Some people have claimed that co-array, being based on message passing, is more efficient than OpenMP, which is implemented in a memory-sharing environment. One of their arguments is that a memory-sharing environment incurs an extra cost to maintain cache coherence. However, we have not seen an actual comparison that supports this claim, so we do not know whether co-array is more efficient than OpenMP.
As introduced before, neuLoop provides two types of soft core: the homogeneous core, which is based on memory-sharing communication, and the heterogeneous core, which is based on message-passing communication. This post does not compare co-array with OpenMP directly; instead, it uses the laipe2 function laipe$decompose_vsp_8 to show the performance difference between message-passing and memory-sharing communication. The comparison is easy to set up: first, we run the example linked against homogeneous cores to get timing results, and then we run the same program again linked against heterogeneous cores. The timing results are as follows:
From the above data, we can see that direct memory access runs faster than message passing. This is understandable: in a memory-sharing environment, cooperating tasks can access common data directly, so passing messages, which duplicates data, is an extra burden that necessarily adds cost. Whether co-array is more efficient than OpenMP remains unclear. In this writer's experience, if the additional cost of maintaining cache coherence is not significant, direct memory access does not degrade parallel performance.
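The duplication argument can be checked with a small experiment. The following Python sketch is an illustration under our own assumptions, not the laipe2 benchmark: it times a sum over a buffer read in place against the same sum over a copy of the buffer, where the copy stands in for a message. The names direct_access, via_message and time_it are hypothetical, and timings depend on the machine, so no figures are quoted here.

```python
import time
from array import array


def direct_access(buf):
    """Memory sharing: read the common buffer in place."""
    return sum(buf)


def via_message(buf):
    """Message passing: duplicate the buffer (the 'message'), then read the copy."""
    msg = buf[:]  # slicing an array.array allocates a new array and copies every element
    return sum(msg)


def time_it(fn, buf, repeats=5):
    """Return fn(buf) and the best wall-clock time over `repeats` runs."""
    result, best = None, float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = fn(buf)
        best = min(best, time.perf_counter() - t0)
    return result, best


if __name__ == "__main__":
    buf = array("d", range(200_000))
    r1, t1 = time_it(direct_access, buf)
    r2, t2 = time_it(via_message, buf)
    assert r1 == r2  # same answer either way; only the cost differs
    print(f"direct access: {t1:.6f} s, copy then read: {t2:.6f} s")
```

Both paths compute the same answer; via_message simply pays for allocating and filling a second buffer first. Taking the best of several runs reduces noise from other activity on the machine.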