Equation Solution  
    High Performance by Design



neuLoop: Parallelizing Loops


        neuLoop is a parallel programming language for multi-core processors and multiprocessor systems.

Soft Core Technology

        neuLoop uses an innovative technology, soft cores, to execute programs. A soft core is a virtual counterpart of a physical core. Application programs do not access physical cores directly; instead, soft cores perform the computations on behalf of the application.

        Soft cores differ from traditional multithreading. In a traditional multithreaded application, the program must be coded as threads, each of which is a unit of execution that runs simultaneously with the others. OpenMP is an example of this multithreaded programming model.

        With soft cores, an application is not coded as threads but written so that soft cores can access it. A soft-core program is sequential in nature, yet the soft cores execute it simultaneously. In a traditional multithreaded application each thread has a predefined function, whereas a soft core is open to any computing request and has no predefined function. This is the main difference between the two approaches.
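        To make the contrast concrete, the sketch below models the idea with a pool of generic Python worker threads that execute whatever request is submitted, rather than threads that each run one predefined function. This is only a conceptual illustration under that assumption; it is not neuLoop's implementation, and the class and method names (SoftCorePool, submit, wait, shutdown) are hypothetical.

```python
import queue
import threading


class SoftCorePool:
    """Hypothetical illustration of the soft-core idea: a fixed set of
    generic workers that run whatever request is submitted, instead of
    threads that each have one predefined function."""

    def __init__(self, n_cores):
        self.results = {}
        self._requests = queue.Queue()
        self._workers = [threading.Thread(target=self._serve, daemon=True)
                         for _ in range(n_cores)]
        for worker in self._workers:
            worker.start()

    def _serve(self):
        # Each worker keeps taking requests until it receives None.
        while True:
            item = self._requests.get()
            if item is None:              # shutdown signal
                self._requests.task_done()
                return
            key, func, args = item
            try:
                self.results[key] = func(*args)
            except Exception as exc:      # keep the worker alive on errors
                self.results[key] = exc
            finally:
                self._requests.task_done()

    def submit(self, key, func, *args):
        # Any callable is accepted; no worker has a fixed job.
        self._requests.put((key, func, args))

    def wait(self):
        # Block until every submitted request has been processed.
        self._requests.join()

    def shutdown(self):
        # One shutdown signal per worker, then wait for all to exit.
        for _ in self._workers:
            self._requests.put(None)
        for worker in self._workers:
            worker.join()


pool = SoftCorePool(4)
for i in range(8):
    pool.submit(i, pow, i, 2)            # the same workers run pow ...
pool.submit("total", sum, [1, 2, 3])     # ... and sum, with no thread rewritten
pool.wait()
pool.shutdown()
print([pool.results[i] for i in range(8)], pool.results["total"])
# prints: [0, 1, 4, 9, 16, 25, 36, 49] 6
```

        The point of the design is that the workers are written once and never change; the caller decides what gets computed by what it submits.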

        neuLoop allows the operating system to schedule soft cores onto physical cores dynamically, and also allows a program to bind each soft core to a specific physical core.

        An application program should be coded so that soft cores can access its data and computing instructions. The application does not need to know how those instructions are executed; the soft cores take care of the computing.

        neuLoop includes two types of soft cores, homogeneous and heterogeneous, with the goal of determining which architecture is more efficient.

        One type is based on a homogeneous architecture, in which soft cores communicate with each other through "direct memory access". In the other type, soft cores communicate with each other by passing messages in a heterogeneous environment. To date, we have not concluded which architecture provides the better platform. Users can link an application against either type of soft core.
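        The two communication styles can be sketched side by side. In the minimal Python example below, threads stand in for soft cores (an assumption for illustration only). In the shared-memory version, workers write results directly into one list that all of them can see; in the message-passing version, workers never touch the output and instead send (index, value) messages that the receiver assembles. The function names are hypothetical, and because everything runs in one process, the "messages" here travel through a thread-safe queue rather than a real network or interconnect.

```python
import queue
import threading


def square_shared(data, n_workers):
    """Shared-memory style: workers write results directly into one output
    list visible to all of them (each worker owns a distinct set of indices)."""
    out = [0] * len(data)

    def work(start):
        for i in range(start, len(data), n_workers):
            out[i] = data[i] * data[i]

    threads = [threading.Thread(target=work, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def square_messages(data, n_workers):
    """Message-passing style: workers never touch the output; they send
    (index, value) messages, and the receiver assembles the result."""
    mailbox = queue.Queue()

    def work(start):
        for i in range(start, len(data), n_workers):
            mailbox.put((i, data[i] * data[i]))

    threads = [threading.Thread(target=work, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    out = [0] * len(data)
    for _ in range(len(data)):       # exactly one message per element
        i, value = mailbox.get()
        out[i] = value
    for t in threads:
        t.join()
    return out


print(square_shared([1, 2, 3, 4, 5], 2))    # [1, 4, 9, 16, 25]
print(square_messages([1, 2, 3, 4, 5], 2))  # [1, 4, 9, 16, 25]
```

        Both versions compute the same answer; what differs is whether workers share a data structure or only exchange messages, which is the trade-off the two soft-core types are meant to explore.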

        To reduce overhead, soft cores are designed with only the minimal functions they need to operate. The trade-off is that soft cores have limitations: keeping overhead low means they cannot offer sophisticated functions. In some preliminary tests, neuLoop showed lower overhead than OpenMP and also yielded better speedup.

New Tool for Parallel Computing

        neuLoop uses soft cores to parallelize loops, and the word "loop" is even part of its name. However, its applications go beyond parallelizing loops. For example, a block of statements can be viewed as a loop without iteration, so neuLoop can also parallelize independent blocks.
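        The idea of treating each independent block as a one-iteration loop can be sketched with Python's standard concurrent.futures module. This is an analogy, not neuLoop syntax; the block functions and run_blocks are hypothetical, and the blocks are assumed to be truly independent (no block writes data that another block reads).

```python
from concurrent.futures import ThreadPoolExecutor


def block_a():
    return sum(range(1000))


def block_b():
    return max([3, 9, 4])


def block_c():
    return len("neuLoop")


def run_blocks(blocks, n_workers=3):
    """Run independent blocks concurrently, each as a one-iteration 'loop';
    results are returned in the order the blocks were given."""
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        futures = [ex.submit(block) for block in blocks]
        return [f.result() for f in futures]


print(run_blocks([block_a, block_b, block_c]))  # [499500, 9, 7]
```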

        Scientific and engineering computing involves many do-loops. neuLoop is designed to parallelize them and provides a simple, straightforward, and efficient parallel programming method.

        In the 1980s, when parallel computing was introduced to speed up time-consuming scientific and engineering computations, it was treated as a mathematical problem: a complex problem had to be broken down into a system of related subproblems, with some degree of data dependence among them, and the related subproblems were then computed simultaneously.

        Breaking down a complex problem into a system of subproblems from a mathematical point of view is not simple, and not every complex problem can be decomposed this way. Instead of decomposing the problem, an alternative approach to parallel computing is to parallelize do-loops.
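        Loop-level parallelism of this kind can be sketched as follows: the iteration range of an ordinary loop is split into contiguous chunks, each worker runs the unchanged loop body over its own chunk, and the partial results are combined. The Python sketch below shows the decomposition for a dot product; chunk_bounds and parallel_dot are hypothetical names, not neuLoop functions. Note that CPython threads do not speed up pure-Python loops because of the global interpreter lock, so this illustrates how a loop is divided, not the performance neuLoop aims for.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, n_workers):
    """Split iterations 0..n-1 into n_workers contiguous [lo, hi) ranges
    whose sizes differ by at most one."""
    base, extra = divmod(n, n_workers)
    bounds, lo = [], 0
    for w in range(n_workers):
        hi = lo + base + (1 if w < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds


def parallel_dot(x, y, n_workers=4):
    """Sequential loop:  s = 0; for i in range(n): s += x[i] * y[i]
    Parallel version: each worker sums its own chunk, then partial sums are added."""
    def partial(bounds):
        lo, hi = bounds
        s = 0
        for i in range(lo, hi):      # same loop body, restricted to one chunk
            s += x[i] * y[i]
        return s

    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        return sum(ex.map(partial, chunk_bounds(len(x), n_workers)))


print(chunk_bounds(10, 4))                    # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(parallel_dot([1, 2, 3], [4, 5, 6]))     # 32
```

        Because the partial sums are added in a different order than the sequential loop would add them, floating-point results may differ in the last digits; integer data, as above, gives identical results.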

        neuLoop was released in December 2011 and is designed to parallelize loops in shared-memory environments for multi-core and multiprocessor programming.