neuLoop: Parallelizing Loops

Basic Concept

DO LOOPS

FORTRAN is well suited to scientific and engineering programming. neuLoop was initially designed for FORTRAN programs, but programs written in other languages can also call it. neuLoop uses FORTRAN as the base language for its instructions.

A loop construct has two components: the loop control, which consists of the do variable with its initial value, terminating bound, and step; and the do-statements (the loop body), which may range from a few statements to hundreds, thousands, or more. In FORTRAN, a loop construct has the form:

do var = start, stop, step
statements
end do

where

var -- loop variable, also called the loop index

start -- the initial value of var

stop -- the terminating bound of var

step -- either the increment or the decrement of var. By default, step is one.
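To make the roles of start, stop, and step concrete, the sketch below computes how many times a DO loop body executes, using the Fortran standard's iteration-count rule max((stop - start + step)/step, 0) with integer division, and compares it with a loop that simply counts its own iterations. The module and function names (trip_count_demo, trip_count, counted_trips) are illustrative only and are not part of neuLoop; the dummy arguments istart, istop, and istep stand for start, stop, and step.

```fortran
module trip_count_demo
  implicit none
contains

  ! Iteration count of "do var = istart, istop, istep" per the Fortran
  ! standard: max((istop - istart + istep) / istep, 0), where "/" is
  ! integer division truncating toward zero. istep must not be zero.
  pure integer function trip_count(istart, istop, istep)
    integer, intent(in) :: istart, istop, istep
    trip_count = max((istop - istart + istep) / istep, 0)
  end function trip_count

  ! Count the iterations by actually running the loop, for comparison.
  integer function counted_trips(istart, istop, istep)
    integer, intent(in) :: istart, istop, istep
    integer :: var
    counted_trips = 0
    do var = istart, istop, istep
      counted_trips = counted_trips + 1
    end do
  end function counted_trips

end module trip_count_demo
```

For example, trip_count(1, 10000, 222) gives 46, and a loop whose stop lies below its start with a positive step executes zero times; the max(..., 0) term covers that case.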

To apply neuLoop, a do-loop must be rewritten in a form that the soft cores can access. First, the do-statements are moved into a subroutine; in other words, in neuLoop the do-statements must form a subroutine, which we call the do-subroutine.

DO-SUBROUTINE

In neuLoop, the do-statements are written in a subroutine (i.e., the do-subroutine). As an example, consider the following do-loop, and assume all variables are INTEGER:

   do i = 1, 10000
      a(i) = b(i)+c(i) 
      d(i) = i 
   end do

The do-loop has two statements. A direct conversion moves these two statements into a subroutine:

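     recursive subroutine sub(i,a,b,c,d)
     ! do-subroutine: the body of one iteration of the original loop
     implicit none
     integer :: i
     integer :: a(*),b(*),c(*),d(*)   ! assumed-size: the caller owns the full arrays
     a(i) = b(i)+c(i)
     d(i) = i
     return
     end subroutine sub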

Then, the original do-loop becomes

   do i = 1, 10000
      call sub(i,a,b,c,d) 
   end do

Once the do-subroutine is ready, the do-loop can be replaced by a single neuLoop call:
   call nlp$loop_8(sub,1,10000,1,a,b,c,d)
The soft cores can then access the loop and execute its iterations in parallel. neuLoop thus provides a simple, straightforward, and efficient way to parallelize loops: the only code the programmer must supply is the do-subroutine. A direct conversion such as this one uses the default step of 1. Next, we consider a more general conversion.
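The following sketch shows what the call above computes, without any parallelism. serial_loop_8 and the interface do_sub_4 are hypothetical names, not neuLoop routines, and the argument order (do-subroutine, start, stop, step, then the arguments passed to the do-subroutine after i) is an assumption read off the example call. neuLoop would hand the iterations to soft cores; this stand-in runs them in order on the calling thread, which can serve as a serial reference when checking a do-subroutine.

```fortran
module serial_dispatch
  implicit none

  ! Interface of a do-subroutine with the shape sub(i,a,b,c,d).
  abstract interface
    subroutine do_sub_4(i, a, b, c, d)
      integer :: i
      integer :: a(*), b(*), c(*), d(*)
    end subroutine do_sub_4
  end interface

contains

  ! HYPOTHETICAL serial stand-in for nlp$loop_8 (not the neuLoop API):
  ! performs "do i = istart, istop, istep; call do_sub(i,a,b,c,d)" in
  ! order on the calling thread instead of on soft cores.
  subroutine serial_loop_8(do_sub, istart, istop, istep, a, b, c, d)
    procedure(do_sub_4) :: do_sub
    integer, intent(in) :: istart, istop, istep
    integer :: a(*), b(*), c(*), d(*)
    integer :: i
    do i = istart, istop, istep
      call do_sub(i, a, b, c, d)
    end do
  end subroutine serial_loop_8

  ! The do-subroutine from the text (one iteration of the original loop).
  recursive subroutine sub(i, a, b, c, d)
    integer :: i
    integer :: a(*), b(*), c(*), d(*)
    a(i) = b(i) + c(i)
    d(i) = i
  end subroutine sub

end module serial_dispatch
```

With this module, call serial_loop_8(sub, 1, 10000, 1, a, b, c, d) produces the same a and d as the original loop, so a parallel run can be compared against it.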

Suppose we use a do-step other than 1, for example 222. The loop is then rewritten as

   do i = 1, 10000, 222
      do j = i, min(i+222-1,10000)
         a(j) = b(j)+c(j) 
         d(j) = j 
      end do 
   end do
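As a worked check of the blocked loop above (start 1, stop 10000, do-step 222), the outer loop makes the following number of passes, and the last block is shorter than the others:

```latex
N_{\text{blocks}} = \left\lceil \frac{10000}{222} \right\rceil = 46,
\qquad i \in \{1,\ 223,\ 445,\ \dots,\ 9991\},
\qquad
\underbrace{45 \times 222}_{\text{full blocks}}
+ \underbrace{(10000 - 9991 + 1)}_{\text{last block}}
= 9990 + 10 = 10000.
```

Every index from 1 to 10000 is therefore covered exactly once. In general, a larger do-step gives fewer, larger jobs, which reduces dispatching overhead but leaves fewer jobs to balance across the soft cores.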

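A do-step other than 1 can improve parallel performance, because each job then processes a block of iterations rather than a single one, so the cost of dispatching a job is spread over more computation. Here the do-statements consist of an inner loop containing the two original statements. The do-step is chosen based on the amount of computation in the do-statements. With a do-step, the do-statements are rewritten into a subroutine as: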

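     recursive subroutine sub(i,k,a,b,c,d)
     ! do-subroutine for the blocked loop: handles iterations i..i+k-1
     implicit none
     integer :: i,k,j
     integer :: a(*),b(*),c(*),d(*)   ! assumed-size arrays
     ! clip the last block at the loop's upper bound (10000)
     do j = i,min(i+k-1,10000)
        a(j) = b(j)+c(j)
        d(j) = j
     end do
     return
     end subroutine sub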

The loop then becomes

   do i = 1, 10000, 222
      call sub(i,222,a,b,c,d) 
   end do

In neuLoop, this do-loop becomes the single statement:
   call nlp$loop_9(sub,1,10000,222,222,a,b,c,d)
Applying neuLoop to parallelize loops is thus simple and straightforward.
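To check that the blocked conversion computes the same result as the original loop, the sketch below runs the blocked do-subroutine through a serial stand-in for nlp$loop_9. serial_loop_9 and do_sub_blocked are hypothetical names, not the neuLoop API, and the argument order (do-subroutine, start, stop, step, then the arguments passed to sub after i, here k = 222) is an assumption based on comparing the call with the loop. The stand-in runs the blocks in order on one thread.

```fortran
module blocked_dispatch
  implicit none

  ! Interface of a blocked do-subroutine such as sub(i,k,a,b,c,d).
  abstract interface
    subroutine do_sub_blocked(i, k, a, b, c, d)
      integer :: i, k
      integer :: a(*), b(*), c(*), d(*)
    end subroutine do_sub_blocked
  end interface

contains

  ! HYPOTHETICAL serial stand-in for nlp$loop_9 (not the neuLoop API):
  ! performs "do i = istart, istop, istep; call do_sub(i,k,a,b,c,d)"
  ! in order on the calling thread.
  subroutine serial_loop_9(do_sub, istart, istop, istep, k, a, b, c, d)
    procedure(do_sub_blocked) :: do_sub
    integer, intent(in) :: istart, istop, istep, k
    integer :: a(*), b(*), c(*), d(*)
    integer :: i
    do i = istart, istop, istep
      call do_sub(i, k, a, b, c, d)
    end do
  end subroutine serial_loop_9

  ! Blocked do-subroutine from the text: iterations i..i+k-1, clipped to
  ! the original upper bound 10000 so the last, shorter block stays in range.
  recursive subroutine sub(i, k, a, b, c, d)
    integer :: i, k
    integer :: a(*), b(*), c(*), d(*)
    integer :: j
    do j = i, min(i + k - 1, 10000)
      a(j) = b(j) + c(j)
      d(j) = j
    end do
  end subroutine sub

end module blocked_dispatch
```

A call such as serial_loop_9(sub, 1, 10000, 222, 222, a, b, c, d) should leave a and d identical to what the unblocked loop produces; because the inner loop is clipped by min(i+k-1, 10000), no index past 10000 is touched.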

When neuLoop is applied, the caller (for example, the routine that calls nlp$loop_9 in the example above) acts as the dispatcher, which dispatches the loop to the soft cores.

The dispatcher has two options. It can block until all jobs dispatched to the soft cores are done. Alternatively, if the subsequent computation does not depend on the loop's output, it can return immediately and continue dispatching other jobs or executing other statements.
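The two dispatcher options can be illustrated with OpenMP tasks, used here purely as an analogy; this is not how neuLoop is implemented or called. In the sketch, each block of the do-step example becomes one task. When wait_for_jobs is true, the dispatcher waits at taskwait for all blocks, as in the blocking option; otherwise it goes on immediately to other work that does not read the loop's output (here, summing b). The names dispatch, wait_for_jobs, other_work, n, and kstep are hypothetical. Compiled without OpenMP support, the directives are treated as comments and the code runs serially with the same result.

```fortran
module dispatch_options
  implicit none
  integer, parameter :: n = 10000, kstep = 222
contains

  ! Blocked do-subroutine from the text. It is recursive, as in the text,
  ! so each invocation has its own copy of the local variable j.
  recursive subroutine sub(i, k, a, b, c, d)
    integer :: i, k
    integer :: a(*), b(*), c(*), d(*)
    integer :: j
    do j = i, min(i + k - 1, n)
      a(j) = b(j) + c(j)
      d(j) = j
    end do
  end subroutine sub

  ! Dispatcher sketch using OpenMP tasks as an analogy (not neuLoop).
  ! The one thread that executes the "single" block acts as the
  ! dispatcher and creates one task per block of iterations.
  subroutine dispatch(wait_for_jobs, a, b, c, d, other_work)
    logical, intent(in) :: wait_for_jobs
    integer :: a(n), b(n), c(n), d(n)
    integer, intent(out) :: other_work
    integer :: i
    !$omp parallel shared(a, b, c, d, other_work)
    !$omp single
    do i = 1, n, kstep
      !$omp task firstprivate(i) shared(a, b, c, d)
      call sub(i, kstep, a, b, c, d)
      !$omp end task
    end do
    if (wait_for_jobs) then
      ! Blocking option: wait until every dispatched block is done.
      !$omp taskwait
    end if
    ! Work that does not read a or d; without taskwait it may overlap
    ! with the blocks still running in other threads.
    other_work = sum(b)
    !$omp end single
    ! All tasks are complete, at the latest, at the end of the parallel region.
    !$omp end parallel
  end subroutine dispatch

end module dispatch_options
```

In this analogy the non-blocking option only overlaps work inside the parallel region; the design choice it illustrates is the same, namely that the dispatcher need not wait when nothing it does next reads the loop's output.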

The neuLoop calling syntax is described in detail later.

APPLICATIONS BEYOND PARALLELIZING LOOPS

The discussion above uses a do-loop to introduce neuLoop, but neuLoop's applications go beyond parallelizing loops: it can also parallelize independent sections of code. For example, suppose we have three independent sections:

   [Independent Section 1]
   [Independent Section 2]
   [Independent Section 3]

Each independent section can be viewed as a loop with a single iteration:

   do i = 1, 1, 1
      [Independent Section 1] 
   end do
   
   do i = 1, 1, 1
      [Independent Section 2] 
   end do
   
   do i = 1, 1, 1
      [Independent Section 3] 
   end do

Each independent section can then be moved into a subroutine:

   do i = 1, 1, 1
      call section_1(,,,) 
   end do
   
   do i = 1, 1, 1
      call section_2(,,,) 
   end do
   
   do i = 1, 1, 1
      call section_3(,,,) 
   end do

These three independent sections can be rewritten in neuLoop as follows:

   call nlp$loop(section_1,1,1,1,,,)
   call nlp$loop(section_2,1,1,1,,,)
   call nlp$loop(section_3,1,1,1,,,)
   call nlp$barrier

The caller dispatches the three independent sections to the soft cores and then, at nlp$barrier, blocks until all of them are complete. neuLoop can thus be used for general parallel programming in multicore and multiprocessor environments.
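For comparison, the same pattern can be written with OpenMP sections, again only as an analogy and not as neuLoop's implementation. Each section subroutine runs as a separate unit of work, and the implicit barrier at the end of the sections construct plays the role of nlp$barrier: the caller continues only after all three have finished. The section bodies (a sum, a sum of squares, and a count of even numbers) are made-up placeholders for real independent work, and the argument lists and the name run_sections are hypothetical.

```fortran
module independent_sections
  implicit none
contains

  ! Placeholder section 1: sum of 1..n.
  recursive subroutine section_1(n, s)
    integer, intent(in) :: n
    integer, intent(out) :: s
    integer :: i
    s = 0
    do i = 1, n
      s = s + i
    end do
  end subroutine section_1

  ! Placeholder section 2: sum of squares of 1..n.
  recursive subroutine section_2(n, s)
    integer, intent(in) :: n
    integer, intent(out) :: s
    integer :: i
    s = 0
    do i = 1, n
      s = s + i * i
    end do
  end subroutine section_2

  ! Placeholder section 3: number of even integers in 1..n.
  recursive subroutine section_3(n, s)
    integer, intent(in) :: n
    integer, intent(out) :: s
    integer :: i
    s = 0
    do i = 1, n
      if (mod(i, 2) == 0) s = s + 1
    end do
  end subroutine section_3

  ! Run the three sections concurrently (OpenMP analogy, not neuLoop).
  ! Each section writes a different output, so they cannot interfere.
  ! The implicit barrier at "end parallel sections" corresponds to
  ! nlp$barrier: control returns only when all sections are done.
  subroutine run_sections(n, s1, s2, s3)
    integer, intent(in) :: n
    integer, intent(out) :: s1, s2, s3
    !$omp parallel sections
    !$omp section
    call section_1(n, s1)
    !$omp section
    call section_2(n, s2)
    !$omp section
    call section_3(n, s3)
    !$omp end parallel sections
  end subroutine run_sections

end module independent_sections
```

Because the sections share no outputs, they may run in any order or at the same time; only the barrier fixes the point at which all three results are available to the caller.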