High Performance by Design
Is Parallelizing Loops the Future of Parallel Computing?
[Posted by Jenn-Ching Luo on May 04, 2009 ]
Recently, we have heard demands for new programming methods or technologies for multicores. The concern arose when multicore became a common feature of today's personal computers, which makes people demand that applications be capable of multicore computing (i.e., parallel computing).
Parallel computing, as originally conceived, never intended to parallelize every application; it is "selective". Parallel computing partitions the data or functions of an application and then distributes the subsets of data or functions onto computing elements (or cores) so as to speed up the computation. Applications whose data or functions cannot be partitioned are not a target for the original parallel computing.
Parallel computing is not a new concept. Researchers have investigated it for more than two decades. In the past, parallel algorithms were developed through mathematical derivations, for example, domain decomposition or space decomposition. That original style of parallel computing was out of reach for a programmer without a background in mathematics; it was not for every programmer.
Now, parallel computing is for every programmer. We need to find a new way to parallelize an application, instead of attacking the problem from the beginning with mathematical formulations.
A potential alternative is to parallelize loops or independent blocks, which can be done on the face of a program. Parallelizing loops has become a trend because most of today's parallel programs rely heavily on it, and it is the most straightforward method for most programmers. Whether or not parallelizing loops is the future of parallel computing, it is clearly in demand, and we cannot ignore that fact.
Our concern is efficiency. Today's loop parallelization is inefficient. Many users share a similar experience: a loop parallelized with, for example, OpenMP may speed up on 2 cores, show no further speedup on 4 cores, and actually slow down on 8 cores.
Dissatisfaction does not mean it is wrong to parallelize loops. As mentioned before, most parallel programs rely on parallelizing loops, and an efficient way to do so would definitely benefit programmers. Recently, there have been reports showing that GPU-assisted computing can parallelize loops more efficiently than multicores. I personally have not done any comparisons, but those impressive results suggest that the GPU may be an answer to parallelizing loops.