Equation Solution  
    High Performance by Design

Basic Concept


DO LOOPS

        FORTRAN is well suited to scientific and engineering programming. neuLoop was initially designed for FORTRAN programming, although other programming languages can also call it. The instructions here use FORTRAN as the base language.

        A loop construct has two components: a do variable with its initial value, terminating bound, and step; and the statements to repeat (the "do-statements"), which may number anywhere from a few to thousands or more. The FORTRAN loop construct is:

          do var = start, stop, step
                statements
          end do

where
var -- loop variable, also called the loop index
start -- the initial value of var
stop -- the terminating bound of var
step -- either the increment or the decrement of var. By default, step is one.
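
As a quick check of the roles of start, stop, and step, the following sketch counts the iterations of three small loops and compares each count with the standard iteration-count rule for integer do-loops, max((stop - start + step)/step, 0). The program name, the helper trip_count, and its argument names first, last, and stride (standing for start, stop, and step) are hypothetical names chosen for this illustration.

```fortran
program do_trip_count
   implicit none
   integer :: i, n

   ! Default step of 1: i takes the values 1, 2, 3, 4, 5.
   n = 0
   do i = 1, 5
      n = n + 1
   end do
   print *, 'step  1: counted', n, ' rule gives', trip_count(1, 5, 1)

   ! Positive step: i takes the values 1, 4, 7, 10.
   n = 0
   do i = 1, 10, 3
      n = n + 1
   end do
   print *, 'step  3: counted', n, ' rule gives', trip_count(1, 10, 3)

   ! Negative step (a decrement): i takes the values 10, 6, 2.
   n = 0
   do i = 10, 1, -4
      n = n + 1
   end do
   print *, 'step -4: counted', n, ' rule gives', trip_count(10, 1, -4)

contains

   ! Iteration count of "do var = first, last, stride" for integers.
   ! Integer division truncates toward zero; a negative quotient means
   ! the loop body is not executed at all.
   pure integer function trip_count(first, last, stride) result(trips)
      integer, intent(in) :: first, last, stride
      trips = max((last - first + stride) / stride, 0)
   end function trip_count

end program do_trip_count
```

The three loops run 5, 4, and 3 times respectively, and the rule gives the same counts.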

        To apply neuLoop, a do-loop must be rewritten in a form that soft cores can access. First, the do-statements are moved into a subroutine. In other words, in neuLoop the do-statements must be a subroutine, which we call the do-subroutine.

DO-SUBROUTINE

        In neuLoop, the do-statements are written as a subroutine (the do-subroutine). We use the following do-loop as an example and assume all variables are INTEGER.
   do i = 1, 10000
      a(i) = b(i)+c(i) 
      d(i) = i 
   end do
The do-loop has two statements. A direct conversion rewrites the two statements into a subroutine:

     recursive subroutine sub(i,a,b,c,d)
     integer :: i,a(*),b(*),c(*),d(*)
     a(i) = b(i)+c(i)
     d(i) = i
     return
     end subroutine sub

Then, the original do-loop becomes
   do i = 1, 10000
      call sub(i,a,b,c,d) 
   end do
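
The equivalence of the original loop and the converted loop can be checked with a small serial program. The following is a minimal sketch, not part of neuLoop: the module and program names, the test data in b and c, and the reference arrays a_ref and d_ref are assumptions made for the illustration, and the do-subroutine here declares its arrays as assumed-size (a(*)) with explicit intents.

```fortran
module do_sub_direct
   implicit none
contains
   ! Do-subroutine for the direct conversion: one call handles one index i.
   recursive subroutine sub(i, a, b, c, d)
      integer, intent(in)    :: i
      integer, intent(in)    :: b(*), c(*)
      integer, intent(inout) :: a(*), d(*)
      a(i) = b(i) + c(i)
      d(i) = i
   end subroutine sub
end module do_sub_direct

program check_direct
   use do_sub_direct
   implicit none
   integer, parameter :: n = 10000
   integer :: i
   integer :: a(n), b(n), c(n), d(n)   ! filled through the do-subroutine
   integer :: a_ref(n), d_ref(n)       ! filled by the original do-loop

   ! Arbitrary test data (hypothetical, not from the original text).
   do i = 1, n
      b(i) = 2*i
      c(i) = 3*i + 1
   end do

   ! Original do-loop.
   do i = 1, n
      a_ref(i) = b(i) + c(i)
      d_ref(i) = i
   end do

   ! Converted do-loop: the body is now a single subroutine call.
   do i = 1, n
      call sub(i, a, b, c, d)
   end do

   if (all(a == a_ref) .and. all(d == d_ref)) then
      print *, 'direct conversion matches the original loop'
   else
      print *, 'MISMATCH between converted and original loop'
   end if
end program check_direct
```
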
Once the do-subroutine is ready, the do-loop can be rewritten in neuLoop as the following statement:
   call nlp$loop_8(sub,1,10000,1,a,b,c,d)
Soft cores can then access the loop and execute it in parallel. The syntax for calling neuLoop is described later. neuLoop provides a simple, straightforward, and efficient way to parallelize loops: the only work the programmer must supply is the do-subroutine. A direct conversion, such as the example above, uses the default step (i.e., 1). Next, we consider a more general conversion.

        We now apply a "do-step" other than 1, for example 222. The loop is then rewritten as
   do i = 1, 10000, 222
      do j = i, min0(i+222-1,10000) 
         a(j) = b(j)+c(j) 
         d(j) = j 
      end do 
   end do
        A do-step other than 1 can improve parallel performance. The do-statements now consist of an inner loop containing the two statements. The choice of do-step depends on the amount of computation in the statements. With a do-step, the do-statements are rewritten into a subroutine as:

     recursive subroutine sub(i,k,a,b,c,d)
     integer :: i,k,a(*),b(*),c(*),d(*),j
     do j = i,min0(i+k-1,10000)
            a(j) = b(j)+c(j)
            d(j) = j
     end do
     return
     end subroutine sub

The loop then becomes
   do i = 1, 10000, 222
      call sub(i,222,a,b,c,d) 
   end do
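
Since 10000 = 45 x 222 + 10, the outer loop makes 46 calls, and the last call covers only the 10 indices 9991 to 10000. That is why the inner loop bound is clipped to 10000. The serial sketch below, which does not use neuLoop, counts the calls and checks that every index is processed. The module and program names, the counter calls, the test data, and the use of the generic min in place of min0 are assumptions of this illustration.

```fortran
module do_sub_blocked
   implicit none
   integer, parameter :: n = 10000      ! loop upper bound from the text
contains
   ! Do-subroutine with a do-step: one call handles indices i .. i+k-1,
   ! clipped to n so the last (shorter) block does not run past the end.
   recursive subroutine sub(i, k, a, b, c, d)
      integer, intent(in)    :: i, k
      integer, intent(in)    :: b(*), c(*)
      integer, intent(inout) :: a(*), d(*)
      integer :: j
      do j = i, min(i + k - 1, n)
         a(j) = b(j) + c(j)
         d(j) = j
      end do
   end subroutine sub
end module do_sub_blocked

program check_blocked
   use do_sub_blocked
   implicit none
   integer, parameter :: k = 222        ! do-step from the text
   integer :: i, calls
   integer :: a(n), b(n), c(n), d(n)

   ! Arbitrary test data (hypothetical). b + c is never zero, so an
   ! index that was skipped would leave a(j) = 0 and fail the check.
   do i = 1, n
      b(i) = i
      c(i) = 2*i
   end do
   a = 0
   d = 0

   calls = 0
   do i = 1, n, k
      call sub(i, k, a, b, c, d)
      calls = calls + 1
   end do

   print *, 'calls made        :', calls
   print *, 'last block starts :', 1 + (calls - 1)*k
   print *, 'last block length :', n - (calls - 1)*k
   print *, 'all indices done  :', all(a == b + c) .and. all(d /= 0)
end program check_blocked
```
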
The above do-loop is then rewritten in neuLoop as the following statement:
   call nlp$loop_9(sub,1,10000,222,222,a,b,c,d)
Applying neuLoop to parallelize loops is simple and straightforward.

        When neuLoop is applied, the caller (the code that calls nlp$loop_9 in the above example) is the dispatcher. The dispatcher dispatches the loop to soft cores.

        The dispatcher has two options. It can block until the jobs dispatched to soft cores are done. Alternatively, if the output of the loop is not needed by the subsequent computation, it can return immediately and continue dispatching other jobs or executing other statements.

        Details of the neuLoop calling syntax are described later.

APPLICATIONS ARE BEYOND PARALLELIZING LOOPS

        The above uses a do-loop as an example to introduce neuLoop, but applications of neuLoop go beyond parallelizing loops. neuLoop can also parallelize independent sections. For example, suppose we have three independent sections:
   [Independent Section 1]
   [Independent Section 2]
   [Independent Section 3]
Each independent section can be viewed as a loop with one iteration. For example,
   do i = 1, 1, 1
      [Independent Section 1] 
   end do
   
   do i = 1, 1, 1
      [Independent Section 2] 
   end do
   
   do i = 1, 1, 1
      [Independent Section 3] 
   end do
Each independent section can be written into a subroutine. For example,
   do i = 1, 1, 1
      call section_1(,,,) 
   end do
   
   do i = 1, 1, 1
      call section_2(,,,) 
   end do
   
   do i = 1, 1, 1
      call section_3(,,,) 
   end do
These three independent sections can be rewritten in neuLoop as follows:
   call nlp$loop(section_1,1,1,1,,,)
   call nlp$loop(section_2,1,1,1,,,)
   call nlp$loop(section_3,1,1,1,,,)
   call nlp$barrier
The caller dispatches the three independent sections to soft cores and blocks until the jobs are complete. neuLoop can thus be used for general parallel programming in multicore and multiprocessor environments.
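
To make the idea of a section as a one-iteration loop concrete, here is a serial sketch that does not call neuLoop. Everything in it is hypothetical: serial_loop is a stand-in that simply calls its subroutine argument once per index on the calling thread, and the three section bodies, their argument lists (i, n, x), and the arrays x1, x2, x3 are invented for the illustration, since the original argument lists are elided.

```fortran
module section_demo
   implicit none

   ! Shape shared by the three hypothetical section subroutines:
   ! the loop index comes first, then the data the section works on.
   abstract interface
      subroutine section_sub(i, n, x)
         integer, intent(in)    :: i, n
         integer, intent(inout) :: x(n)
      end subroutine section_sub
   end interface

contains

   ! Serial stand-in for a loop dispatcher (NOT neuLoop): calls sub
   ! once per index, in order, on the calling thread.
   subroutine serial_loop(sub, first, last, stride, n, x)
      procedure(section_sub) :: sub
      integer, intent(in)    :: first, last, stride, n
      integer, intent(inout) :: x(n)
      integer :: i
      do i = first, last, stride
         call sub(i, n, x)
      end do
   end subroutine serial_loop

   subroutine section_1(i, n, x)        ! fills x with ones
      integer, intent(in)    :: i, n
      integer, intent(inout) :: x(n)
      x = 1
   end subroutine section_1

   subroutine section_2(i, n, x)        ! fills x with 1, 2, ..., n
      integer, intent(in)    :: i, n
      integer, intent(inout) :: x(n)
      integer :: j
      do j = 1, n
         x(j) = j
      end do
   end subroutine section_2

   subroutine section_3(i, n, x)        ! fills x with 1, 4, 9, ...
      integer, intent(in)    :: i, n
      integer, intent(inout) :: x(n)
      integer :: j
      do j = 1, n
         x(j) = j*j
      end do
   end subroutine section_3

end module section_demo

program run_sections
   use section_demo
   implicit none
   integer, parameter :: n = 5
   integer :: x1(n), x2(n), x3(n)

   ! Each section touches only its own array, so the three calls are
   ! independent: their order does not affect the result.
   call serial_loop(section_1, 1, 1, 1, n, x1)
   call serial_loop(section_2, 1, 1, 1, n, x2)
   call serial_loop(section_3, 1, 1, 1, n, x3)

   print *, sum(x1), sum(x2), sum(x3)
end program run_sections
```

Because the sections share no data, swapping the order of the three calls leaves the printed sums unchanged, which is the property that makes such sections candidates for parallel dispatch.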