High Performance by Design
FORTRAN is well suited to scientific and engineering programming. neuLoop was designed first for FORTRAN programs, although other programming languages can also call it. neuLoop uses FORTRAN as the base language for its instructions.
A loop construct has two components: a do variable with an initial value, a terminating bound, and a step; and the statements of the loop body (the "do-statements"), which may number from a few to thousands or more. FORTRAN writes a loop construct as:
do var = start, stop, step
where
var -- loop variable, also called the loop index
start -- the initial value of var
stop -- the terminating bound of var
step -- either the increment or the decrement of var. By default, step is one.
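As a minimal sketch of how start, stop, and step determine the number of iterations, the program below counts the trips of three loops and compares each count with the standard iteration-count formula max((stop - start + step) / step, 0). The program name, the helper trip_count, and its argument names first, last, and step are hypothetical and chosen only for this illustration.

```fortran
program do_trip_count
  implicit none
  integer :: var, n

  ! Default step of 1: var takes 1, 2, 3, 4, 5.
  n = 0
  do var = 1, 5
     n = n + 1
  end do
  if (n /= trip_count(1, 5, 1)) error stop "default step mismatch"

  ! Positive step (increment): var takes 1, 4, 7, 10.
  n = 0
  do var = 1, 10, 3
     n = n + 1
  end do
  if (n /= trip_count(1, 10, 3)) error stop "positive step mismatch"

  ! Negative step (decrement): var takes 10, 8, 6, 4, 2.
  n = 0
  do var = 10, 1, -2
     n = n + 1
  end do
  if (n /= trip_count(10, 1, -2)) error stop "negative step mismatch"

  print *, trip_count(1, 5, 1), trip_count(1, 10, 3), trip_count(10, 1, -2)

contains

  ! Hypothetical helper: number of iterations of "do var = first, last, step".
  pure integer function trip_count(first, last, step) result(trips)
    integer, intent(in) :: first, last, step
    trips = max((last - first + step) / step, 0)
  end function trip_count

end program do_trip_count
```

The three loops run 5, 4, and 5 times respectively. The program stops with an error if a counted value disagrees with the formula.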
To apply neuLoop, a do-loop must be rewritten in a form that soft cores can access: the do-statements are moved into a subroutine, which we call the do-subroutine. We use the following do-loop as an example and assume all variables are INTEGER.
do i = 1, 10000
   a(i) = b(i)+c(i)
   d(i) = i
end do
The do-statements become the do-subroutine:
recursive subroutine sub(i,a,b,c,d)
   ! Assumed-size dummy arrays; the caller supplies the actual extents.
   integer :: i,a(*),b(*),c(*),d(*)
   a(i) = b(i)+c(i)
   d(i) = i
end subroutine sub
Then, the original do-loop becomes
do i = 1, 10000
   call sub(i,a,b,c,d)
end do
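The conversion can be checked serially before any parallel dispatch is involved. The following is a minimal sketch, assuming arbitrary test data for b and c. It runs the original do-loop and the do-subroutine form side by side and confirms that both produce the same a and d. The program name and the arrays a2 and d2 are hypothetical, and no neuLoop routine is called.

```fortran
program do_subroutine_check
  implicit none
  integer, parameter :: n = 10000
  integer :: a(n), b(n), c(n), d(n)   ! results of the original loop
  integer :: a2(n), d2(n)             ! results of the do-subroutine form
  integer :: i

  ! Arbitrary test data (an assumption made for this sketch).
  do i = 1, n
     b(i) = 2*i
     c(i) = 3*i
  end do

  ! Original do-loop.
  do i = 1, n
     a(i) = b(i) + c(i)
     d(i) = i
  end do

  ! Converted form: one call of the do-subroutine per iteration.
  a2 = 0
  d2 = 0
  do i = 1, n
     call sub(i, a2, b, c, d2)
  end do

  if (any(a2 /= a) .or. any(d2 /= d)) error stop "results differ"
  print *, "ok"

contains

  ! The do-subroutine: executes the do-statements for one value of i.
  recursive subroutine sub(i, a, b, c, d)
    integer, intent(in) :: i
    integer :: a(*), b(*), c(*), d(*)
    a(i) = b(i) + c(i)
    d(i) = i
  end subroutine sub

end program do_subroutine_check
```

Because each call touches only element i of a and d, the iterations are independent of one another, which is the property that makes the loop a candidate for parallel execution.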
Soft cores can then access the loop and execute it in parallel. The syntax for calling neuLoop is described later. neuLoop provides a simple, straightforward, and efficient way to parallelize loops: the only work the programmer must supply is the do-subroutine. A direct conversion, such as the example above, uses the default step of 1. Next, we consider a more general conversion.
We apply a "do-step" other than 1. For example, we use 222 as the do-step. The loop is then re-written as
do i = 1, 10000, 222
   do j = i, min0(i+222-1,10000)
      a(j) = b(j)+c(j)
      d(j) = j
   end do
end do
The inner loop becomes the do-subroutine, which receives the do-step as the argument k:
recursive subroutine sub(i,k,a,b,c,d)
   integer :: i,k,a(*),b(*),c(*),d(*),j
   do j = i,min0(i+k-1,10000)
      a(j) = b(j)+c(j)
      d(j) = j
   end do
end subroutine sub
and the outer loop becomes
do i = 1, 10000, 222
   call sub(i,222,a,b,c,d)
end do
Applying neuLoop to parallelize loops is simple and straightforward.
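The do-step conversion can also be checked serially. The sketch below makes two assumptions: the test data for b and c are arbitrary, and the upper bound is passed to the do-subroutine as an extra argument n instead of being hard-coded as 10000. It confirms that the chunked form reproduces the original results, including the final partial chunk, and counts the chunks. The program name and the variables a2, d2, and nchunks are hypothetical, and no neuLoop routine is called.

```fortran
program do_step_check
  implicit none
  integer, parameter :: n = 10000, k = 222
  integer :: a(n), b(n), c(n), d(n)   ! results of the original loop
  integer :: a2(n), d2(n)             ! results of the chunked form
  integer :: i, nchunks

  ! Arbitrary test data (an assumption made for this sketch).
  do i = 1, n
     b(i) = 2*i
     c(i) = 3*i
  end do

  ! Original do-loop with the default step.
  do i = 1, n
     a(i) = b(i) + c(i)
     d(i) = i
  end do

  ! Chunked form: each call handles up to k consecutive iterations.
  a2 = 0
  d2 = 0
  nchunks = 0
  do i = 1, n, k
     call sub(i, k, n, a2, b, c, d2)
     nchunks = nchunks + 1
  end do

  if (any(a2 /= a) .or. any(d2 /= d)) error stop "results differ"
  if (nchunks /= (n + k - 1) / k) error stop "unexpected chunk count"
  print *, "chunks:", nchunks

contains

  ! The do-subroutine for one chunk: iterations i .. min(i+k-1, n).
  recursive subroutine sub(i, k, n, a, b, c, d)
    integer, intent(in) :: i, k, n
    integer :: a(*), b(*), c(*), d(*)
    integer :: j
    do j = i, min(i + k - 1, n)
       a(j) = b(j) + c(j)
       d(j) = j
    end do
  end subroutine sub

end program do_step_check
```

With 10000 iterations and a do-step of 222 there are 46 chunks: 45 full chunks covering iterations 1 to 9990 and a last chunk covering 9991 to 10000. The min bound in the do-subroutine keeps that last chunk from running past the end of the arrays.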
When applying neuLoop, the caller (in the above example, the code that calls nlp$loop_9) is the dispatcher. The dispatcher dispatches the loop to soft cores.
The dispatcher has two options. It can block until the jobs dispatched to soft cores are done. Alternatively, if the output of the loop does not contribute to the subsequent computation, it can return immediately and continue dispatching other jobs or executing other statements.
Details of the neuLoop calling syntax are described later.
The above uses a do-loop as an example to introduce neuLoop, but its application goes beyond parallelizing loops: neuLoop can also parallelize independent sections. For example, suppose we have three independent sections:
[Independent Section 1]
[Independent Section 2]
[Independent Section 3]
Each section can be written as a loop that executes exactly once:
do i = 1, 1, 1
   [Independent Section 1]
end do
do i = 1, 1, 1
   [Independent Section 2]
end do
do i = 1, 1, 1
   [Independent Section 3]
end do
Moving each section into its own subroutine gives
do i = 1, 1, 1
   call section_1(,,,)
end do
do i = 1, 1, 1
   call section_2(,,,)
end do
do i = 1, 1, 1
   call section_3(,,,)
end do
The three loops are then dispatched through neuLoop, followed by a call to nlp$barrier:
call nlp$loop(section_1,1,1,1,,,)
call nlp$loop(section_2,1,1,1,,,)
call nlp$loop(section_3,1,1,1,,,)
call nlp$barrier
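The idea of treating a section as a loop with a single trip can be illustrated without neuLoop. In the sketch below, run_loop is a hypothetical serial stand-in that takes a subroutine together with start, stop, and step values and simply calls it once per iteration. It is not part of neuLoop and does not reproduce the nlp$loop argument list, which is elided in the text above. The module name, the section bodies, and the result variables r1, r2, and r3 are likewise assumptions made for illustration.

```fortran
module section_demo
  implicit none
  integer :: r1 = 0, r2 = 0, r3 = 0   ! one result per independent section

  abstract interface
     subroutine do_sub(i)
       integer, intent(in) :: i
     end subroutine do_sub
  end interface

contains

  ! Hypothetical serial stand-in for a loop dispatcher (not a neuLoop routine):
  ! calls sub once for every iteration of "do i = first, last, step".
  subroutine run_loop(sub, first, last, step)
    procedure(do_sub) :: sub
    integer, intent(in) :: first, last, step
    integer :: i
    do i = first, last, step
       call sub(i)
    end do
  end subroutine run_loop

  ! Each section writes only its own variable, so the sections are independent.
  subroutine section_1(i)
    integer, intent(in) :: i
    r1 = r1 + 10*i
  end subroutine section_1

  subroutine section_2(i)
    integer, intent(in) :: i
    r2 = r2 + 20*i
  end subroutine section_2

  subroutine section_3(i)
    integer, intent(in) :: i
    r3 = r3 + 30*i
  end subroutine section_3

end module section_demo

program sections_check
  use section_demo
  implicit none

  ! A loop from 1 to 1 with step 1 runs each section exactly once.
  call run_loop(section_1, 1, 1, 1)
  call run_loop(section_2, 1, 1, 1)
  call run_loop(section_3, 1, 1, 1)

  if (r1 /= 10 .or. r2 /= 20 .or. r3 /= 30) &
       error stop "a section did not run exactly once"
  print *, r1, r2, r3
end program sections_check
```

Since no section reads or writes another section's data, the three calls could run in any order or at the same time and give the same result. That independence is the condition under which sections can be handed to separate cores.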