Changeset 30 for anr/section-3.1.tex


Timestamp:
Jan 12, 2010, 3:27:48 PM (15 years ago)
Author:
coach
Message:

M anr/section-2.tex
M anr/section-2.2.tex
M anr/section-3.1.tex

File:
1 edited

  • anr/section-3.1.tex

    r12 r30  
    131 131 % Paul, I am not sure this is really a state of the art
    132 132 % Christophe, what you sent me is in obsolete/body.tex
    133 \mustbecompleted{
    134 Hardware is inherently parallel. On the other hand, high-level languages
    135 like C or Fortran are abstractions of the processors of the 1970s, and
    136 hence are sequential. One of the aims of an HLS tool is therefore to
    137 extract hidden parallelism from the source program, and to infer enough
    138 hardware operators for its efficient exploitation.
    139 \\
    140 Present-day HLS tools search for parallelism in linear pieces of code
    141 acting only on scalars -- the so-called basic blocks. On the other hand,
    142 it is well known that most programs, especially in the fields of signal
    143 processing and image processing, spend most of their time executing loops
    144 acting on arrays. Efficient use of the large amount of hardware available
    145 in the next generation of FPGA chips requires parallelism far beyond
    146 what can be extracted from basic blocks alone.
    147 \\
    148 The Compsys team of LIP has built an automatic parallelizer, Syntol, which
    149 handles restricted C programs -- the well-known polyhedral model --,
    150 computes dependences and builds a symbolic schedule. The schedule is
    151 a specification for a parallel program. The parallelism itself can be
    152 expressed in several ways: as a system of threads, as data-parallel
    153 operations, or as a pipeline. In the context of the COACH project, one
    154 of the tasks will be to decide which form of parallelism is best suited
    155 to hardware, and how to convey the results of Syntol to the actual
    156 synthesis tools. One of the advantages of this approach is that the
    157 resulting degree of parallelism can be easily controlled, e.g. by
    158 adjusting the number of threads, as a means of exploring the
    159 area / performance tradeoff of the resulting design.
    160 \\
    161 Another point is that potentially parallel programs necessarily involve
    162 arrays: two operations which write to the same location must be executed
    163 in sequence. In synthesis, arrays translate to memory. However, in FPGAs,
    164 the amount of on-chip memory is limited, and access to an external memory
    165 incurs a high time penalty. Hence the importance of reducing the size of
    166 temporary arrays to the minimum necessary to support the requested degree
    167 of parallelism. Compsys has developed a stand-alone tool, Bee, based
    168 on research by A. Darte, F. Baray and C. Alias, which can be extended
    169 into a memory optimizer for COACH.
    170 }
     133%\mustbecompleted{
     134%Hardware is inherently parallel. On the other hand, high-level languages
     135%like C or Fortran are abstractions of the processors of the 1970s, and
     136%hence are sequential. One of the aims of an HLS tool is therefore to
     137%extract hidden parallelism from the source program, and to infer enough
     138%hardware operators for its efficient exploitation.
     139%\\
     140%Present-day HLS tools search for parallelism in linear pieces of code
     141%acting only on scalars -- the so-called basic blocks. On the other hand,
     142%it is well known that most programs, especially in the fields of signal
     143%processing and image processing, spend most of their time executing loops
     144%acting on arrays. Efficient use of the large amount of hardware available
     145%in the next generation of FPGA chips requires parallelism far beyond
     146%what can be extracted from basic blocks alone.
     147%\\
     148%The Compsys team of LIP has built an automatic parallelizer, Syntol, which
     149%handles restricted C programs -- the well-known polyhedral model --,
     150%computes dependences and builds a symbolic schedule. The schedule is
     151%a specification for a parallel program. The parallelism itself can be
     152%expressed in several ways: as a system of threads, as data-parallel
     153%operations, or as a pipeline. In the context of the COACH project, one
     154%of the tasks will be to decide which form of parallelism is best suited
     155%to hardware, and how to convey the results of Syntol to the actual
     156%synthesis tools. One of the advantages of this approach is that the
     157%resulting degree of parallelism can be easily controlled, e.g. by
     158%adjusting the number of threads, as a means of exploring the
     159%area / performance tradeoff of the resulting design.
     160%\\
     161%Another point is that potentially parallel programs necessarily involve
     162%arrays: two operations which write to the same location must be executed
     163%in sequence. In synthesis, arrays translate to memory. However, in FPGAs,
     164%the amount of on-chip memory is limited, and access to an external memory
     165%incurs a high time penalty. Hence the importance of reducing the size of
     166%temporary arrays to the minimum necessary to support the requested degree
     167%of parallelism. Compsys has developed a stand-alone tool, Bee, based
     168%on research by A. Darte, F. Baray and C. Alias, which can be extended
     169%into a memory optimizer for COACH.
     170%}
     171
     172The problem of compiling sequential programs for parallel computers
     173has been studied since the advent of the first parallel architectures
     174in the 1970s. The basic approach consists in applying program transformations
     175which exhibit or increase the potential parallelism, while guaranteeing
     176that the program semantics is preserved. Most of these transformations
     177simply reorder the operations of the program; some of them modify its
     178data structures. Dependences (exact or conservative) are checked to guarantee
     179the legality of each transformation.
     180
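This legality check can be illustrated on a minimal, invented example (not taken from any tool mentioned here):

```latex
\begin{verbatim}
for (i = 1; i < N; i++)
  a[i] = a[i-1] + b[i];
\end{verbatim}
% Iteration $i$ reads \texttt{a[i-1]}, which was written at iteration
% $i-1$: a flow dependence of distance~$1$.
Iteration $i$ reads \texttt{a[i-1]}, written at iteration $i-1$: a flow
dependence of distance~$1$. A transformation is legal only if it preserves
the relative order of the two endpoints of every dependence. Here, any
schedule must execute iteration $i-1$ before iteration $i$: loop reversal
is therefore illegal, while loop unrolling, which keeps the order, is legal.
```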
     181This has led to the invention of many loop transformations (loop fusion,
     182loop splitting, loop skewing, loop interchange, loop unrolling, ...)
     183which interact in complicated ways. More recently, it has been noticed
     184that all of these are just changes of basis in the iteration domain of
     185the program. This has led to the invention of the polyhedral model, in
     186which the combination of two transformations is simply a matrix product.
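As a concrete instance (a standard textbook example, not specific to the tools discussed here): on a doubly nested loop with iteration vector $(i,j)$, loop interchange and loop skewing are unimodular changes of basis, and their composition is a single matrix product.

```latex
% Interchange maps $(i,j)$ to $(j,i)$; skewing by the outer loop
% maps $(i,j)$ to $(i,i+j)$:
\[
P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\qquad
S = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.
\]
% Skewing first, then interchanging, is the single transformation $PS$:
\[
P\,S = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix},
\qquad
P\,S \begin{pmatrix} i \\ j \end{pmatrix}
    = \begin{pmatrix} i+j \\ i \end{pmatrix}.
\]
```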
     187
     188As a side effect, it has been observed that the polytope model is a useful
     189tool for many other optimizations, such as memory reduction and locality
     190improvement. Another point is
     191that the polyhedral model \emph{stricto sensu} applies only to
     192very regular programs. Its extension to more general programs is
     193an active research subject.
    171 194
    172195\subsubsection{Interfaces}