Equation Solution
High Performance by Design

Basic Concept

DO LOOPS

FORTRAN is well suited to scientific and engineering programming. neuLoop was initially designed for FORTRAN programs, although other programming languages can also call it. neuLoop uses FORTRAN as the base language for its instructions.

A loop construct has two components: a do variable with its initial value, terminating bound, and step; and a body of statements (the "do-statements"), which may range from a few statements to thousands or more. FORTRAN writes a loop construct as:

do var = start, stop, step
   statements
end do

where
var -- loop variable, also called the loop index
start -- the initial value of var
stop -- the terminating bound of var
step -- either the increment or the decrement of var. By default, step is one.
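
As a quick check of how start, stop, and step interact, the following sketch counts the iterations of one incrementing and one decrementing loop and compares each count with the iteration-count formula from the FORTRAN standard. The program and variable names (do_step_demo, n_up, n_down) are illustrative only and are not part of neuLoop.

```fortran
program do_step_demo
   implicit none
   integer :: i, n_up, n_down

   ! Incrementing loop: i takes the values 1, 4, 7, 10.
   n_up = 0
   do i = 1, 10, 3
      n_up = n_up + 1
   end do

   ! Decrementing loop: i takes the values 10, 7, 4, 1.
   n_down = 0
   do i = 10, 1, -3
      n_down = n_down + 1
   end do

   ! The FORTRAN standard defines the iteration count as
   ! max((stop - start + step) / step, 0).
   if (n_up   /= max((10 - 1 + 3) / 3, 0))    error stop 'unexpected count'
   if (n_down /= max((1 - 10 - 3) / (-3), 0)) error stop 'unexpected count'

   print *, 'increment:', n_up, ' decrement:', n_down
end program do_step_demo
```

Both loops run four times; a loop whose bounds give a non-positive count simply does not execute.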

To apply neuLoop, a do-loop must be rewritten in a form that soft cores can access. First, the do-statements need to be rewritten as a subroutine. In other words, in neuLoop the do-statements must be a subroutine, which we call the do-subroutine.

DO-SUBROUTINE

In neuLoop, the do-statements are written as a subroutine (the do-subroutine). We use the following do-loop as an example and assume all variables are INTEGER.
```
   do i = 1, 10000
      a(i) = b(i)+c(i)
      d(i) = i
   end do
```
The do-loop has two statements. A direct conversion rewrites the two statements into a subroutine as:

recursive subroutine sub(i,a,b,c,d)
   ! do-subroutine: executes the do-statements for one value of i
   integer :: i,a(*),b(*),c(*),d(*)   ! assumed-size arrays
   a(i) = b(i)+c(i)
   d(i) = i
   return
end subroutine sub

Then, the original do-loop becomes
```
   do i = 1, 10000
      call sub(i,a,b,c,d)
   end do
```
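
To confirm that moving the do-statements into a do-subroutine leaves the results unchanged, the following serial sketch runs the original loop and the converted loop side by side and compares their outputs. It does not call neuLoop; the module name, program name, test data, and intent attributes are assumptions added for illustration.

```fortran
module do_sub_demo
   implicit none
contains
   ! do-subroutine: the do-statements for one value of the loop index i
   recursive subroutine sub(i, a, b, c, d)
      integer, intent(in)    :: i
      integer, intent(inout) :: a(*), d(*)
      integer, intent(in)    :: b(*), c(*)
      a(i) = b(i) + c(i)
      d(i) = i
   end subroutine sub
end module do_sub_demo

program check_direct_conversion
   use do_sub_demo
   implicit none
   integer, parameter :: n = 10000
   integer :: a1(n), d1(n), a2(n), d2(n), b(n), c(n), i

   ! Arbitrary input data for the comparison.
   do i = 1, n
      b(i) = 2*i
      c(i) = n - i
   end do

   ! Original do-loop.
   do i = 1, n
      a1(i) = b(i) + c(i)
      d1(i) = i
   end do

   ! Same loop with the body moved into the do-subroutine.
   a2 = 0
   d2 = 0
   do i = 1, n
      call sub(i, a2, b, c, d2)
   end do

   if (any(a1 /= a2) .or. any(d1 /= d2)) error stop 'results differ'
   print *, 'direct conversion matches the original loop'
end program check_direct_conversion
```

Because each call touches only element i of a and d, the calls are independent of one another, which is what allows them to be handed to different soft cores.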
Once the do-subroutine is ready, the do-loop can be rewritten in neuLoop as the following statement:
call nlp$loop_8(sub,1,10000,1,a,b,c,d)
Soft cores can then access the loop and execute it in parallel. The syntax for calling neuLoop is described later. neuLoop provides a simple, straightforward, and efficient way to parallelize loops: the only work the programmer must provide is the do-subroutine. A direct conversion, such as the example above, uses the default step (i.e., 1). In the following, we consider a more general conversion.

We now apply a "do-step" other than 1. For example, with 222 as the do-step, the loop is rewritten as:
```
   do i = 1, 10000, 222
      do j = i, min0(i+222-1,10000)
         a(j) = b(j)+c(j)
         d(j) = j
      end do
   end do
```
A do-step other than 1 can improve parallel performance. The do-statements now consist of an inner loop with two statements. The choice of do-step is based on the amount of computation in the do-statements. With a do-step, the do-statements are rewritten into a subroutine as:

recursive subroutine sub(i,k,a,b,c,d)
   ! do-subroutine: handles indices i to i+k-1, clipped to the bound 10000
   integer :: i,k,a(*),b(*),c(*),d(*),j
   do j = i,min0(i+k-1,10000)
      a(j) = b(j)+c(j)
      d(j) = j
   end do
   return
end subroutine sub

The loop then becomes
```
   do i = 1, 10000, 222
      call sub(i,222,a,b,c,d)
   end do
```
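
The do-step of 222 does not divide 10000 evenly, so the last call handles a shorter range. The following serial sketch, which does not call neuLoop, checks that the chunked conversion still visits every index and reports how the range is split. The module name, program name, the parameter n, the counter chunks, and the test data are assumptions added for illustration.

```fortran
module do_step_sub_demo
   implicit none
   integer, parameter :: n = 10000      ! terminating bound of the loop
contains
   ! do-subroutine: handles indices i .. i+k-1, clipped to n
   recursive subroutine sub(i, k, a, b, c, d)
      integer, intent(in)    :: i, k
      integer, intent(inout) :: a(*), d(*)
      integer, intent(in)    :: b(*), c(*)
      integer :: j
      do j = i, min(i+k-1, n)
         a(j) = b(j) + c(j)
         d(j) = j
      end do
   end subroutine sub
end module do_step_sub_demo

program check_do_step
   use do_step_sub_demo
   implicit none
   integer, parameter :: k = 222        ! do-step
   integer :: a(n), b(n), c(n), d(n), i, chunks

   ! Arbitrary input data; a and d start at -1 to mark unvisited elements.
   do i = 1, n
      b(i) = i
      c(i) = 3*i
   end do
   a = -1
   d = -1

   chunks = 0
   do i = 1, n, k
      call sub(i, k, a, b, c, d)
      chunks = chunks + 1
   end do

   ! Every index must have been visited, including the short last chunk.
   if (any(a /= b + c)) error stop 'a is wrong'
   do i = 1, n
      if (d(i) /= i) error stop 'd is wrong'
   end do
   print *, 'chunks:', chunks, ' last chunk size:', n - (chunks-1)*k
end program check_do_step
```

With these values the loop makes 46 calls: 45 full chunks of 222 indices and a final chunk of 10 indices starting at 9991.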
The above do-loop is then rewritten in neuLoop as the following statement:
call nlp$loop_9(sub,1,10000,222,222,a,b,c,d)
Applying neuLoop to parallelize loops is simple and straightforward.

When applying neuLoop, the caller (in the above example, the code that calls nlp$loop_9) is the dispatcher. The dispatcher dispatches the loop to soft cores.

The dispatcher has two options. It can block itself until the jobs dispatched to soft cores are done. Alternatively, if the output of the loop does not contribute to subsequent computation, it can return immediately to continue dispatching other jobs or executing other statements.

Details of the neuLoop calling syntax are described later.

APPLICATIONS ARE BEYOND PARALLELIZING LOOPS

The above uses a do-loop as an example to introduce neuLoop, but its applications go beyond parallelizing loops. neuLoop can also parallelize independent sections. For example, suppose we have three independent sections:
```
   [Independent Section 1]
   [Independent Section 2]
   [Independent Section 3]
```
Each independent section can be viewed as a loop with one iteration. For example,
```
   do i = 1, 1, 1
      [Independent Section 1]
   end do

   do i = 1, 1, 1
      [Independent Section 2]
   end do

   do i = 1, 1, 1
      [Independent Section 3]
   end do
```
Each independent section can be written into a subroutine. For example,
```
   do i = 1, 1, 1
      call section_1(,,,)
   end do

   do i = 1, 1, 1
      call section_2(,,,)
   end do

   do i = 1, 1, 1
      call section_3(,,,)
   end do
```
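
To make the one-iteration view concrete, the following serial sketch wraps three small independent sections in one-trip loops and runs them in a different order from their numbering, which is safe because none of them depends on another's output. It does not call neuLoop. The section bodies, their argument lists, and all names other than section_1, section_2, and section_3 are hypothetical placeholders chosen for illustration.

```fortran
module sections_demo
   implicit none
contains
   ! Hypothetical independent sections: each writes only its own output.
   subroutine section_1(x, s)
      integer, intent(in)  :: x(:)
      integer, intent(out) :: s
      s = sum(x)
   end subroutine section_1

   subroutine section_2(x, m)
      integer, intent(in)  :: x(:)
      integer, intent(out) :: m
      m = maxval(x)
   end subroutine section_2

   subroutine section_3(x, m)
      integer, intent(in)  :: x(:)
      integer, intent(out) :: m
      m = minval(x)
   end subroutine section_3
end module sections_demo

program one_trip_sections
   use sections_demo
   implicit none
   integer :: x(5) = [4, 8, 15, 16, 23]
   integer :: i, total, largest, smallest

   ! Each section is a loop with exactly one iteration; the order is arbitrary.
   do i = 1, 1, 1
      call section_3(x, smallest)
   end do
   do i = 1, 1, 1
      call section_1(x, total)
   end do
   do i = 1, 1, 1
      call section_2(x, largest)
   end do

   if (total /= 66 .or. largest /= 23 .or. smallest /= 4) error stop 'unexpected result'
   print *, total, largest, smallest
end program one_trip_sections
```

The sections share the input x but write to separate variables, which is the property that makes them independent.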
Those three independent sections can be rewritten in neuLoop as follows:
```
   call nlp$loop(section_1,1,1,1,,,)
   call nlp$loop(section_2,1,1,1,,,)
   call nlp$loop(section_3,1,1,1,,,)
   call nlp$barrier
```
The caller dispatches the three independent sections to soft cores and blocks itself until the jobs are complete. neuLoop can therefore be used for general parallel programming in multicore and multiprocessor environments.