

Parallel Performance of Skyline Solver
on Soft Cores (3)


[Posted by Jenn-Ching Luo on Apr. 22, 2012]

      Co-array is part of the Fortran standard (since Fortran 2008), in which cooperative tasks, called images, communicate with each other by passing messages. Message-passing is different from memory-sharing, e.g., OpenMP, in which cooperative tasks directly access common data.
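
      To make the co-array model concrete, the following is a minimal sketch (an illustrative example only): each image owns its own copy of x, and data moves between images only through an explicit co-indexed reference, which is the message transfer.

    program coarray_demo
       implicit none
       integer :: x[*]               ! every image holds its own copy of x

       x = this_image()              ! each image writes only its local copy
       sync all                      ! make all local writes visible

       if (this_image() == 1) then
          ! image 1 pulls the value held by the last image: a message transfer
          print *, 'image 1 received', x[num_images()], 'from image', num_images()
       end if
    end program coarray_demo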

      Some people have claimed that co-array, being based on message-passing, is more efficient than OpenMP, which is implemented in a memory-sharing environment. One of their arguments is that a memory-sharing environment pays an extra cost to maintain cache coherence. However, we have not seen an actual comparison to support that claim, so we do not know whether co-array is more efficient than OpenMP.
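
      For contrast, here is an equally minimal memory-sharing sketch with OpenMP (again illustrative only): all threads read and write the same shared array directly, and no data is copied between tasks.

    program openmp_demo
       use omp_lib
       implicit none
       integer, parameter :: n = 8
       integer :: a(n), i

       !$omp parallel do shared(a)
       do i = 1, n
          a(i) = omp_get_thread_num()   ! threads write directly into shared memory
       end do
       !$omp end parallel do

       print *, a
    end program openmp_demo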

      As introduced before, neuLoop has two types of soft core: homogeneous cores, which communicate by sharing memory, and heterogeneous cores, which communicate by passing messages. This post does not directly compare co-array with OpenMP; instead, it uses the laipe2 function laipe$decompose_vsp_8 to show how message-passing and memory-sharing communication perform differently.
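
      One way to obtain elapsed times such as those reported below is to wrap the decomposition call with the standard system_clock intrinsic, as in the sketch here. The argument list of laipe$decompose_vsp_8 is left as a comment because it depends on the laipe2 interface and should be filled in per the library documentation.

    program time_decompose
       implicit none
       integer :: t0, t1, rate

       call system_clock(count_rate=rate)
       call system_clock(t0)
       ! call laipe$decompose_vsp_8(...)   <- fill in per the laipe2 documentation
       call system_clock(t1)

       write (*, '(a,f10.2,a)') ' elapsed time: ', real(t1 - t0) / real(rate), ' sec.'
    end program time_decompose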

      It is easy to compare the performance of message-passing and memory-sharing communication: we first run the example linked against homogeneous cores to obtain timing results, and then run the same program again linked against heterogeneous cores. The timing results are as follows:
 [With Homogeneous Cores by Direct Memory Access, i.e., memory-sharing]
 number of cores    elapsed time (sec.)    speedup    efficiency (%)
 -------------------------------------------------------------------
        1                 242.93             1.00        100.00
        2                 121.82             1.99         99.71
        3                  81.48             2.98         99.38
        4                  61.49             3.95         98.77
        5                  49.61             4.90         97.94
        6                  41.72             5.82         97.05
        7                  36.05             6.74         96.27
        8                  31.84             7.63         95.37
 [With Heterogeneous Cores by Passing Messages, i.e., message-passing]
 number of cores    elapsed time (sec.)    speedup    efficiency (%)
 -------------------------------------------------------------------
        1                 248.91             1.00        100.00
        2                 125.02             1.99         99.55
        3                  83.77             2.97         99.05
        4                  63.32             3.93         98.27
        5                  51.11             4.87         97.40
        6                  43.06             5.78         96.34
        7                  37.38             6.66         95.13
        8                  33.31             7.47         93.41
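
      In the tables above, speedup and efficiency follow the usual definitions, with the one-core run as the baseline:

    speedup(p)    = T(1) / T(p)
    efficiency(p) = speedup(p) / p * 100%

      For example, with 8 homogeneous cores: 242.93 / 31.84 = 7.63, and 7.63 / 8 = 95.4%.
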
From the above data, we can see that direct memory access runs faster than message-passing: message-passing took about 2.5% longer on one core (248.91 vs. 242.93 seconds) and about 4.6% longer on eight cores (33.31 vs. 31.84 seconds). This is easy to understand: in a memory-sharing environment, cooperative tasks can directly access common data, so passing messages, which duplicates data, becomes an extra burden that definitely carries extra cost. Whether co-array is more efficient than OpenMP remains unclear. In this writer's experience, as long as the additional cost of maintaining cache coherence is not significant, direct memory access does not degrade parallel performance.