Equation Solution



Parallel Computing: Web Log


Mar. 18, 2009
A 64-bit LAIPE benchmark for Windows, compiled with gfortran, is available for download here.

Dec. 03, 2008
An article entitled "OpenMP: Parallel Computing for Dummies" is posted. For the article, click here.

Nov. 08, 2008
An article entitled "Is Co-Array the Future of Parallel Computing?" is available. For the article, click here.

Aug. 21, 2008
Cache coherency may degrade parallel performance on shared-memory machines. To see an illustration, click here.

June 19, 2008
In our previous log, we indicated that Silverfrost FTN95 users could, as a workaround, link against the LAIPE library for ifort. That is not always correct. The workaround works only when the object code of a LAIPE subroutine references symbols that are the same in ifort and FTN95. Every compiler has its own supporting symbols. If a referenced symbol differs, an FTN95 program cannot link against the LAIPE library for ifort.

June 15, 2008
Equation Solution has attempted to port its package of parallel solvers, LAIPE, together with MTASK, to Silverfrost FTN95 (formerly Salford FTN95). For more information about Silverfrost FTN95, please visit their website at www.silverfrost.com. We have good news and bad news.

Good News

Porting the parallel packages to Silverfrost FTN95 poses no problem, and highly efficient parallel performance can be expected. For example, the following timing results are from solving a sparse system [A]{X}={B} of order 20000, where [A] is a constant-bandwidth matrix with a half bandwidth of 600:
       E:\examples\salford>f1
       Enter order: 20000
       Enter half bandwidth: 600
  
       Processor:  1
       Elapsed Time (Seconds):   406.64
       CPU Time in User Mode (Seconds):   406.55
       CPU Time in Kernel Mode (Seconds):     0.08
       Total CPU Time (Seconds):   406.63
       Lower and upper bound of relative error:
           -4.070835E-03
            1.824608E-02
  
       Processors:  2
       Elapsed Time (Seconds):   205.48
       CPU Time in User Mode (Seconds):   408.89
       CPU Time in Kernel Mode (Seconds):     0.23
       Total CPU Time (Seconds):   409.13
       Lower and upper bound of relative error:
           -4.070835E-03
            1.824608E-02
One processor took 406 seconds to complete the job. With LAIPE, two processors cut the elapsed time to 205 seconds, almost half, which is nearly perfect parallel performance. These timings were obtained on a dual 200 MHz Pentium Pro (no kidding), and the executable was generated without the /optimise option.
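To make "almost perfect" concrete, the usual speedup and efficiency figures can be worked out from the two elapsed times above. The sketch below is only arithmetic on the printed timings; the function names are illustrative and are not part of LAIPE or MTASK.

```python
def speedup(t_serial, t_parallel):
    """Ratio of one-processor elapsed time to multi-processor elapsed time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speedup divided by the number of processors (1.0 would be ideal)."""
    return speedup(t_serial, t_parallel) / processors


# Elapsed times in seconds, copied from the unoptimized FTN95 run above.
t1 = 406.64  # one processor
t2 = 205.48  # two processors

s = speedup(t1, t2)
e = efficiency(t1, t2, 2)
print(f"speedup    = {s:.2f}")
print(f"efficiency = {e:.1%}")
```

With these inputs the speedup comes out at roughly 1.98 on two processors, an efficiency of about 99%.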

Bad News

The bad news is that Equation Solution could not find a way to control Silverfrost FTN95's optimization. FTN95's optimization may reorder statements. Parallel code contains statements that perform synchronization and control critical sections. If statements are moved across a synchronization point or out of a critical section, the code cannot function correctly. For example, the following is a run of the parallel code optimized by FTN95:
       E:\examples\salford>f1
       Enter order: 20000
       Enter half bandwidth: 600
  
       Processor:  1
       Elapsed Time (Seconds):   127.38
       CPU Time in User Mode (Seconds):   127.28
       CPU Time in Kernel Mode (Seconds):     0.09
       Total CPU Time (Seconds):   127.38
       Lower and upper bound of relative error:
           -4.070835E-03
            1.824608E-02
  
       Processors:  2
       Elapsed Time (Seconds):  16.50
       CPU Time in User Mode (Seconds):   16.50
       CPU Time in Kernel Mode (Seconds):     0.0
       Total CPU Time (Seconds):   16.50
       Elapsed Time (Seconds):     1.45
       The system is not positive definite.
With one processor, which is equivalent to sequential execution, the output is correct. The optimized code took 127 seconds to complete the job, whereas the unoptimized code required 406 seconds, so FTN95's optimization significantly improves speed. With two processors, execution terminated before completion. Silverfrost FTN95's optimization reordered the statements and thereby altered the parallel procedure. The reordered program cannot function correctly in parallel.
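Why reordering is harmless sequentially but fatal in parallel can be shown with a toy model. The sketch below is not FTN95 output or LAIPE code. It is a deterministic Python simulation, with hypothetical names, in which a producer publishes a value and then raises a flag, while a consumer is allowed to look at the shared state after the producer's first step.

```python
def run(producer_steps):
    """Run the producer; a consumer peeks after the producer's first step.

    Returns the data value the consumer read if it saw the 'ready' flag
    set, or None if it found the flag still clear and read nothing.
    """
    shared = {"data": 0, "ready": False}
    observed = None
    for i, step in enumerate(producer_steps):
        step(shared)
        if i == 0 and shared["ready"]:
            # The consumer trusts the flag and reads the data.
            observed = shared["data"]
    return observed


def write_data(shared):
    shared["data"] = 42


def set_ready(shared):
    shared["ready"] = True


# Intended order: publish the data, then raise the flag.
intended = run([write_data, set_ready])

# Reordered: the two stores are independent in sequential code, so an
# optimizer that is unaware of the other thread may swap them.
reordered = run([set_ready, write_data])

print("intended order, consumer saw:", intended)
print("reordered,      consumer saw:", reordered)
```

In the intended order the consumer finds the flag clear and reads nothing. In the reordered version it sees the flag set and reads the stale value 0 instead of 42. A single-threaded run ends with the same final state either way, which is consistent with the one-processor result above being correct.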

Workaround

Silverfrost FTN95 compiles quickly. Because FTN95 cannot be used to produce an optimized LAIPE library, FTN95 users can instead link against the LAIPE library for ifort (the Intel Fortran compiler). For the time being, the LAIPE library for ifort is compatible with FTN95. When FTN95 links the example against the LAIPE library for ifort, the executable works perfectly. The following is a timing result:
       E:\examples\salford>f1
       Enter order: 20000
       Enter half bandwidth: 600
  
       Processor:  1
       Elapsed Time (Seconds):   102.26
       CPU Time in User Mode (Seconds):   102.08
       CPU Time in Kernel Mode (Seconds):     0.19
       Total CPU Time (Seconds):   102.27
       Lower and upper bound of relative error:
           -6.082316E-03
            2.158996E-02
  
       Processors:  2
       Elapsed Time (Seconds):   55.88
       CPU Time in User Mode (Seconds):   110.97
       CPU Time in Kernel Mode (Seconds):     0.27
       Total CPU Time (Seconds):   111.23
       Lower and upper bound of relative error:
           -6.082316E-03
            2.158996E-02
With two processors, the elapsed time drops from 102.26 seconds to 55.88 seconds, almost half. This is a workaround that lets FTN95 users apply LAIPE to their applications.
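The same transcript also shows how much extra CPU time the two-processor run consumed. The sketch below only does arithmetic on the printed figures and makes no claim about where the extra time goes; the variable names are illustrative.

```python
# Figures in seconds, copied from the ifort-library run above.
elapsed_1, cpu_1 = 102.26, 102.27  # one processor: elapsed, total CPU
elapsed_2, cpu_2 = 55.88, 111.23   # two processors: elapsed, total CPU

# Speedup in wall-clock time.
speedup_2 = elapsed_1 / elapsed_2

# Additional CPU time spent by the two-processor run, in seconds and
# as a fraction of the one-processor CPU time.
extra_cpu = cpu_2 - cpu_1
extra_cpu_fraction = extra_cpu / cpu_1

print(f"speedup        = {speedup_2:.2f}")
print(f"extra CPU time = {extra_cpu:.2f} s ({extra_cpu_fraction:.1%})")
```

Here the speedup is about 1.83, and the two-processor run used roughly 9 seconds (about 9%) more total CPU time than the one-processor run.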