High Performance by Design
Powerful Multicores Need Powerful Methods
[Posted by Jenn-Ching Luo on Apr. 21, 2009 ]
Multicores (or multi-processors) provide extra computing power. Some users may think the extra computing power may allow us to have a brute-force search, instead of having a rational analysis. Brute-force search on a powerful multicore system may be defeated by a rational analysis on a single core.
I can give you an example.
Bandwidth of a sparse matrix can be reduced by a proper permutation of columns and rows of matrix. For a matrix of order nxn, there exist n! possible permutations. Theoretically, we can have a brute-force search to find the minimal bandwidth among the n! possible permutations. However, brute-force search is impractical, because brute-force search may have too many possibilities. For example,
1) When n=10, we have 3628800 possibilities;
2) When n=20, we have 2.43x1018 possibilities;
3) When n=100000, how many possibilities we might have? 100000! What a value!
Brute-force search is not always possible no matter how many cores are available.
As compared with a rational analysis, brute-force search is very time-consuming. Brute-force search on multicores may not catch up a rational analysis on a single core. For example, the web page (http://www.equation.com/servlet/equation.cmd?fa=jclexact) has a small example of n=37. A rational analysis could find the minimal bandwidth in 0.02 seconds; While brute-force search required 74.58 seconds. Indeed, the brute-force search that found the minimal bandwidth among the n! possible permutations could be implemented in parallel. The point is how many cores are required to complete the search in 0.02 seconds. It required 74.58/0.02=3729 cores for the brute-force search to catch up a rational analysis that took only 0.02 seconds.
Another example, in the same reference of web page, of size n=48, a rational analysis can find minimal bandwidth in 0.03 seconds; While brute-force search was aborted after 4 hours (not finding the minimal bandwidth yet), which is about 480000 times more cpu time. That means even we have 480000 cores, a brute-force search could not catch up a rational analysis on a single core on this example.
Powerful multicores also need a rational method, and do not allow us to have a brute-force search.