% vim:set spell:
% vim:spell spelllang=en:
\anrdoc{\begin{itemize}
\item Present a national and international state of the art, surveying the current
      knowledge on the subject.
\item Highlight any contributions of the partners of the project proposal
      to this state of the art.
\item Highlight any preliminary results.
\item Include the necessary bibliographic references in appendix 7.1.
\end{itemize}}

%Our project covers several critical domains in system design in order
%to achieve high performance computing. Starting from a high level description we aim
%at generating automatically both hardware and software components of the system.

\subsubsection{High Performance Computing}
\label{soa:hpc}
% A market dominated by GPGPU architectures such as Nvidia's FERMI and the CUDA programming language
The High-Performance Computing (HPC) world is composed of three main families of architectures:
many-core, GPGPU (General-Purpose computation on Graphics Processing Units) and FPGA.
Today, the first two families dominate the market, benefiting from
the strength and influence of mass-market leaders (Intel, Nvidia).
%such as Intel for many-core CPU and Nvidia for GPGPU.
In this market, FPGA architectures are emerging and very promising.
By adapting the architecture to the software, % (the opposite is done in the other families)
FPGA architectures enable better performance
(typically an acceleration factor between 10 and 100)
while using a smaller area and less energy (and generating less heat).
However, using FPGAs presents significant challenges~\cite{hpc06a}.
First, the operating frequency of an FPGA is low compared to that of a high-end microprocessor.
Second, % based on Amdahl's law,
HPC/FPGA application performance is unusually sensitive
to the implementation quality~\cite{hpc06b}.
% Thus, the performance strongly relies on the detected parallelism.
% (to summarize the last two points)
Finally, efficient design methodologies are required to
hide the FPGA complexity and the underlying implementation subtleties from HPC users,
so that they do not have to change their habits and can achieve design productivity
equivalent to that of the other families~\cite{hpc07a}.
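
The sensitivity to implementation quality can be made concrete with Amdahl's law
(alluded to in a comment above); the following figures are purely illustrative and
not taken from the cited studies. If a fraction $p$ of the application is off-loaded
to the FPGA with an acceleration factor $s$, the overall speedup is
\[
S = \frac{1}{(1-p) + p/s}.
\]
Even with $s = 100$, off-loading only $p = 0.9$ of the computation caps the overall
speedup at $S \approx 9.2$: the residual software fraction, and thus the quality of
the partitioning and implementation, dominates the result.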

%FPGA state of the art
HPC/FPGA hardware is only now emerging and in early commercial stages,
and design techniques have not yet caught up.
Industrial (Mitrionics~\cite{hpc08}, Gidel~\cite{hpc09}, Convey Computer~\cite{hpc10}) and academic (CHREC)
research on HPC/FPGA is mainly conducted in the USA.
None of the approaches developed in this research fully meets the
challenges described above. For example, Convey Computer proposes application-specific instruction
set extensions of x86 cores in an FPGA accelerator,
but the extension generation is not automated and requires hardware design skills.
Mitrionics has an elegant solution based on a compute engine specifically
developed for high-performance execution in FPGAs. Unfortunately, the design flow
is based on a new programming language (mitrionC), which requires a significant effort from designers and offers poor portability.
% tools relying on operator libraries (XtremeData),
% Should we mention the OpenFPGA consortium, whose goal is "to accelerate the incorporation of reconfigurable computing technology in high-performance and enterprise applications"?

Thus, much effort is required to develop design tools that translate high-level
language programs to FPGA configurations.
Moreover, as already remarked in~\cite{hpc11}, Dynamic Partial Reconfiguration~\cite{hpc12}
(DPR, which enables changing a part of the FPGA while the rest is still running)
appears very interesting for improving HPC performance as well as reducing the required area.

%Yes, but it seems to me that COACH will not do anything on this topic. Is it worth
%handing our critics a stick to beat us with? Besides, I do not really see the point.
%
%Paul

\subsubsection{System Synthesis}
\label{soa:system:synthesis}
Today, several solutions for system design are proposed and commercialized.
The existing commercial or free tools do not
cover the whole system synthesis process in a fully automatic way. Moreover,
they are bound to a particular device family and to an IP library.
The most commonly used are provided by \altera and \xilinx to promote their
FPGA devices. These representative tools used to synthesize SoCs on FPGAs
are introduced below.
\\
The \xilinx System Generator for DSP~\cite{system-generateur-for-dsp} is a
plug-in to Simulink that enables designers to develop high-performance DSP
systems for \xilinx FPGAs.
Designers can specify and simulate a system using MATLAB and Simulink. The
tool then automatically generates synthesizable Hardware Description
Language (HDL) code mapped to \xilinx pre-optimized algorithms.
However, this tool targets only signal processing algorithms and \xilinx FPGAs, and
cannot handle a complete SoC. Thus, it is not really a system synthesis tool.
\\
In contrast, SOPC Builder~\cite{spoc-builder} from \altera and \xilinx
Platform Studio (XPS) allow designers to describe a system, to synthesize it,
to program it into a target FPGA and to upload a software application.
Both SOPC Builder and XPS allow designers to select and parameterize components from
an extensive drop-down list of IP cores (I/O cores, DSPs, processors, bus cores, ...)
as well as to incorporate their own IPs. Nevertheless, none of the previously introduced tools
provides facilities to synthesize coprocessors or to simulate the platform
at a high level (SystemC).
A system designer must provide the synthesizable description of their own IP cores with
a feasible bus interface.%
%what is a ``feasible bus interface''? a *standard* bus interface? Paul
%
Design Space Exploration is thus limited,
and SystemC simulation is not possible either at the transactional or at the
cycle-accurate level.
\\
In addition, \xilinx System Generator, XPS and SOPC Builder are closed worlds,
since each imposes its own IPs, which are not interchangeable.
Designers can then only generate a synthesized netlist, a VHDL/Verilog simulation test
bench and a custom software library that reflect the hardware configuration.

Consequently, a designer developing an embedded system needs to master four different
design environments:
\begin{enumerate}
  \item a virtual prototyping environment (in SystemC) for system-level exploration,
  \item an architecture compiler to define the hardware architecture (Verilog/VHDL),
  \item one or several third-party HLS tools for coprocessor synthesis (C to RTL),
  \item and finally back-end synthesis tools for the bitstream generation (RTL to bitstream).
\end{enumerate}
Furthermore, mixing these tools requires an important interfacing effort, which makes
the design process very complex and achievable only by designers skilled in many domains.

\subsubsection{High Level Synthesis}
\label{soa:hls}
High-Level Synthesis (HLS) translates a sequential algorithmic description and a
set of constraints (area, power, frequency, ...) into a micro-architecture at
the Register Transfer Level (RTL).
Several academic and commercial tools are available today. The most common
tools are SPARK~\cite{spark04}, GAUT~\cite{gaut08} and UGH~\cite{ugh08} in the
academic world, and CATAPULTC~\cite{catapult-c}, PICO~\cite{pico} and
CYNTHETIZER~\cite{cynthetizer} in the commercial world. Despite their
maturity, their usage is restrained by~\cite{IEEEDT,CATRENE,HLSBOOK}:
\begin{itemize}
\item HLS tools are not integrated into an architecture and system exploration tool.
Thus, a designer who needs to accelerate a software part of the system must adapt it manually
to the HLS input dialect and perform engineering work to exploit the synthesis result
at the system level,
\item Current HLS tools cannot target both control- and data-oriented applications,
\item HLS tools mainly take a single constraint into account, while realistic design
is multi-constrained.
The power consumption constraint, which is mandatory for embedded systems, is not yet
well handled, or not handled at all, by the HLS tools currently available,
\item The parallelism is limited to that present in the initial specification.
To get more parallelism or to reduce the amount of memory required in the SoC, the user
must rewrite the algorithmic specification, although techniques such as polyhedral
transformations can automate this process,
\item While they support limited loop transformations like loop unrolling and loop
pipelining, current HLS tools do not provide support for design space exploration, either
through automatic loop transformations or for improving the memory mapping,
\item Despite having the same input language (C/C++), they are sensitive to the style in
which the algorithm is written. Consequently, engineering work is required to switch from
one tool to another,
\item They do not accurately respect the frequency constraint when they target an FPGA device.
Their error is about 10 percent. This is annoying when the generated component is integrated
into a SoC, since it will slow down the whole system.
\end{itemize}
Regarding these limitations, it is necessary to create a new generation of tools reducing the gap
between the specification of a heterogeneous system and its hardware implementation~\cite{HLSBOOK,IEEEDT}.

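The sensitivity to coding style mentioned above can be illustrated with a small C
sketch (our own example, not taken from any specific tool's documentation): both
functions compute the same dot product, but many HLS tools analyze the
array-indexed, fixed-bound version far more easily than the pointer-chasing one.

```c
#include <stddef.h>

/* Pointer-chasing style: harder for many HLS tools to analyze,
   since the access pattern and trip count are less explicit. */
int dot_ptr(const int *a, const int *b, size_t n)
{
    int acc = 0;
    while (n--)
        acc += *a++ * *b++;
    return acc;
}

/* Array-indexed style with a constant bound: the same computation,
   but the regular loop structure is easier to pipeline and unroll. */
#define N 4
int dot_idx(const int a[N], const int b[N])
{
    int acc = 0;
    for (int i = 0; i < N; i++)
        acc += a[i] * b[i];
    return acc;
}
```

The two functions are functionally identical in software, yet an HLS tool may
generate very different hardware for them, which is precisely why porting a design
between tools requires engineering work.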
\subsubsection{Application Specific Instruction Processors}
\label{soa:asip}
ASIPs (Application-Specific Instruction-Set Processors) are programmable
processors in which both the instruction set and the micro-architecture have
been tailored to a given application domain or to a
specific application. This specialization usually offers a good compromise
between performance (w.r.t.\ a pure software implementation on an embedded
CPU) and flexibility (w.r.t.\ an application-specific hardware co-processor).
In spite of their obvious advantages, using/designing ASIPs remains a
difficult task, since it involves designing both a micro-architecture and a
compiler for this architecture. Besides, to our knowledge, there is still
no available open-source design flow for ASIP design, even though such a tool
would be valuable in the
context of a system-level design exploration tool.
\par
In this context, ASIP design based on Instruction Set Extensions (ISEs) has
received a lot of interest~\cite{NIOS2}, as it makes micro-architecture synthesis
more tractable\footnote{ISEs rely on a template micro-architecture in which
only a small fraction of the architecture has to be specialized}, and helps ASIP
designers to focus on compilers, for which there are still many open
problems~\cite{ARC08}.
This approach however has a severe weakness, since it also significantly reduces
the opportunities for achieving good speedups (most speedups remain between 1.5x and
2.5x): ISE performance is generally limited by I/O constraints, as
ISEs generally rely on the main CPU register file to access data.

181% (
182%automaticcaly extraction ISE candidates for application code \cite{CODES04},
183%performing efficient instruction selection and/or storage resource (register)
184%allocation \cite{FPGA08}). 
185To cope with this issue, recent approaches~\cite{DAC09,CODES08,TVLSI06} advocate the use of
186micro-architectural ISE models in which the coupling between the processor micro-architecture
187and the ISE component is tightened up so as to allow the ISE to overcome the register
188I/O limitations. However these approaches generally tackle the problem from a compiler/simulation
189point of view and do not address the problem of generating synthesizable representations for
190these models.
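
The ISE principle and its register-file I/O limitation can be sketched in C (a
purely hypothetical example: \texttt{ise\_mac} models a custom instruction as a
plain function, it is not the API of any real tool):

```c
#include <stdint.h>

/* Hypothetical ISE: a fused multiply-accumulate candidate, modeled
   here as a plain C function. In a real ASIP flow this pattern would
   be matched in the program and mapped to one custom instruction. */
static inline int32_t ise_mac(int32_t acc, int32_t a, int32_t b)
{
    return acc + a * b;  /* three register reads, one write */
}

/* Original software kernel: each iteration needs a multiply
   followed by an add. */
int32_t fir_sw(const int32_t *x, const int32_t *h, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += x[i] * h[i];
    return acc;
}

/* Same kernel after ISE selection: the mul+add sequence collapses
   into the custom instruction. Note the I/O pressure: ise_mac needs
   three source operands, more than the two read ports of a classic
   RISC register file -- exactly the limitation discussed above. */
int32_t fir_ise(const int32_t *x, const int32_t *h, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = ise_mac(acc, x[i], h[i]);
    return acc;
}
```

Both versions compute the same result; the speedup of the ISE version in hardware
is bounded by how many operands per cycle the register file can feed it.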

We therefore strongly believe that there is a need for an open framework which
would allow researchers and system designers to:
\begin{itemize}
\item Explore the various levels of interaction between the original CPU micro-architecture
and its extension (for example through a Domain Specific Language targeted at micro-architecture
specification and synthesis).
\item Retarget the compiler instruction-selection pass
(or prototype new passes) so as to be able to take advantage of these ISEs.
\item Provide a complete system-level integration for using ASIPs as SoC building blocks
(integration with application-specific blocks, MPSoCs, etc.).
\end{itemize}

\subsubsection{Automatic Parallelization}
\label{soa:automatic:parallelization}
The problem of compiling sequential programs for parallel computers
has been studied since the advent of the first parallel architectures
in the 1970s. The basic approach consists in applying program transformations
which exhibit or increase the potential parallelism, while guaranteeing
the preservation of the program semantics. Most of these transformations
just reorder the operations of the program; some of them modify its
data structures. Dependences (exact or conservative) are checked to guarantee
the legality of the transformation.

This has led to the invention of many loop transformations (loop fusion,
loop splitting, loop skewing, loop interchange, loop unrolling, ...)
which interact in complicated ways. More recently, it has been noticed
that all of these are just changes of basis in the iteration domain of
the program. This has led to the introduction of the polyhedral model
\cite{FP:96,DRV:2000}, in which the combination of two transformations is
simply a matrix product.

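As a simple sketch (our own example, not taken from the cited works): loop
interchange maps iteration $(i,j)$ to $(j,i)$, i.e.\ it multiplies the iteration
vector by $\left(\begin{array}{cc}0&1\\1&0\end{array}\right)$, while loop skewing
maps $(i,j)$ to $(i,i+j)$, with matrix
$\left(\begin{array}{cc}1&0\\1&1\end{array}\right)$. Applying the skewing after
the interchange is then the single transformation
\[
\left(\begin{array}{cc}1&0\\1&1\end{array}\right)
\left(\begin{array}{cc}0&1\\1&0\end{array}\right)
=
\left(\begin{array}{cc}0&1\\1&1\end{array}\right),
\]
which maps $(i,j)$ to $(j,i+j)$.
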
Since hardware is inherently parallel, finding parallelism in sequential
programs is an important prerequisite for HLS. The large FPGA chips of
today can accommodate much more parallelism than is available in basic blocks.
The polyhedral model is the ideal tool for finding more parallelism in
loops.

As a side effect, it has been observed that the polyhedral model is a useful
tool for many other optimizations, like memory reduction and locality
improvement. It should be noted
that the polyhedral model \emph{stricto sensu} applies only to
very regular programs. Its extension to more general programs is
an active research subject.

\subsubsection{SoC design flow automation using IP-XACT}
\label{soa:ip-xact}
% EV: Industrial IP integration flows based on IP-XACT standards: \cite{mds1}\\
% EV: SPIRIT IP-XACT Controlled ESL Design Tool Applied to a Network-on-Chip Platform: \cite{mds2}\\
% EV: SocKET design flow and Application on industrial use cases: \cite{socketflow}\\
% IA: http://www.design-reuse.com/articles/19895/ip-xact-xml.html \cite{dandr}\\
IP-XACT is an XML-based open standard defined by the Accellera consortium.
This non-profit organisation provides a unified set of high-quality IP-XACT
specifications for documenting IP using meta-data. This meta-data is
used for configuring, integrating, and verifying IP in advanced SoC design
and interfacing tools through TGI (Tight Generator Interface), a software API
that can be used to access the design meta-data descriptions of complete system designs.
The specification of the schema is tailored to the requirements of the industry,
and focused on enabling technologies for the efficient design of electronic
systems from concept to production. The latest IEEE 1685 release of IP-XACT incorporates
both RTL and TLM (transaction-level modelling) capabilities. Thus it can be used to
package IP portfolios~\cite{dandr} and describe their assembly in complex hardware architectures~\cite{mds1}~\cite{mds2}.
These description files are the basis for tool interoperability and data exchange
through a common structured data management~\cite{socketflow}. Today, more than two hundred companies
are members of the consortium, and the board incorporates top actors
(STM, NXP, TI, ARM, FREESCALE, LSI, Mentor, Synopsys and Cadence), ensuring
wide adoption by industry. Initiatives have already
attempted to extend this standard
to the packaging of AMS IPs (MEDEA+ Beyond Dreams Project) and to Hardware Dependent
Software layers (MEDEA+ SoftSoc project), and Accellera is reusing these results for
further releases.
\parlf
In IP-XACT, flow automation and data consistency are ensured by generators, which
are program modules that process IP-XACT XML data into something useful
for the design. They are a key portable mechanism for encapsulating specialist design
knowledge and enable designers to deploy that knowledge in their designs. It is
always possible to create generators in order to link several design or analysis tools
around a central representation of meta-data in IP-XACT. This kind of XML schema for
meta-data management is a good solution for the federation of heterogeneous design domains
(models, tools, languages, methodologies, etc.).
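
As an illustrative sketch, a minimal IP-XACT component description looks as
follows (the element names follow the IEEE 1685-2009 \texttt{spirit:} schema, but
this UART component and its values are hypothetical, and real descriptions carry
many more sections, such as bus interfaces, register maps and file sets):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical UART component packaged as IP-XACT meta-data. -->
<spirit:component
    xmlns:spirit="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009">
  <!-- The VLNV (vendor, library, name, version) uniquely identifies the IP. -->
  <spirit:vendor>example.org</spirit:vendor>
  <spirit:library>peripherals</spirit:library>
  <spirit:name>uart</spirit:name>
  <spirit:version>1.0</spirit:version>
  <spirit:model>
    <spirit:views>
      <!-- One view per implementation of the IP, here its RTL source. -->
      <spirit:view>
        <spirit:name>rtl</spirit:name>
        <spirit:language>vhdl</spirit:language>
      </spirit:view>
    </spirit:views>
  </spirit:model>
</spirit:component>
```

Generators traverse exactly this kind of description to produce netlists,
software headers or documentation for the assembled system.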