% Source: anr/section-etat-de-art.tex @ 333
% Last change on this file since 333 was 319, checked in by coach, 14 years ago:
% template for Christophe CV, minor language modifications

% vim:set spell:
% vim:spell spelllang=en:
\anrdoc{\begin{itemize}
\item Present the national and international state of the art, surveying the current
knowledge on the subject.
\item Highlight any contributions of the project partners
to this state of the art.
\item Highlight any preliminary results.
\item Include the necessary bibliographic references in annex 7.1.
\end{itemize}}

%Our project covers several critical domains in system design in order
%to achieve high performance computing. Starting from a high level description we aim
%at generating automatically both hardware and software components of the system.

\subsubsection{High Performance Computing}
\label{soa:hpc}
% A market dominated by GPGPU architectures such as Nvidia's Fermi and the CUDA programming language
The High-Performance Computing (HPC) world is composed of three main families of architectures:
many-core, GPGPU (General-Purpose computation on Graphics Processing Units) and FPGA.
The first two families dominate the market by benefiting
from the strength and influence of mass-market leaders (Intel, Nvidia).
%such as Intel for many-core CPU and Nvidia for GPGPU.
In this market, FPGA architectures are emerging and very promising.
By adapting the architecture to the software, % (the opposite is done in the other families)
FPGA architectures enable better performance
(typically speedups between $\times$10 and $\times$100)
while occupying less space and consuming less energy (and dissipating less heat).
However, using FPGAs presents significant challenges~\cite{hpc06a}.
First, the operating frequency of an FPGA is low compared to that of a high-end microprocessor.
Second, as a consequence of Amdahl's law, HPC/FPGA application performance is unusually sensitive
to the implementation quality~\cite{hpc06b}.
% Thus, the performance strongly relies on the detected parallelism.
% (to summarize the last two points)
Finally, efficient design methodologies are required in order to
hide the FPGA complexity and the underlying implementation subtleties from HPC users,
so that they do not have to change their habits and can achieve a design productivity
equivalent to that of the other families~\cite{hpc07a}.
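\par
As a hedged illustration of the second point, Amdahl's law can be written as below.
The symbols $p$ (fraction of the execution time that is moved to the FPGA) and $s$
(speedup obtained on that fraction) are notations introduced here for the example,
and the numerical values are purely illustrative, not measurements:
\[
S(p,s) = \frac{1}{(1-p) + \frac{p}{s}}
\]
\[
S(0.90,100) = \frac{1}{0.10 + 0.009} \approx 9.2,
\qquad
S(0.99,100) = \frac{1}{0.01 + 0.0099} \approx 50.3
\]
Under these assumed values, raising the accelerated fraction from 90\% to 99\%
multiplies the overall speedup by more than five for the same accelerator,
which is one way to read the sensitivity to implementation quality.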
% FPGA state of the art

HPC/FPGA hardware is only now emerging and is in its early commercial stages,
but the associated design techniques have not yet caught up.
Industrial (Mitrionics~\cite{hpc08}, Gidel~\cite{hpc09}, Convey Computer~\cite{hpc10}) and academic (CHREC)
research on HPC/FPGA is mainly conducted in the USA.
None of the approaches developed in this research fully addresses the
challenges described above. For example, Convey Computer proposes application-specific instruction set extensions of x86 cores in an FPGA accelerator,
but the generation of these extensions is not automated and requires hardware design skills.
Mitrionics has an elegant solution based on a compute engine specifically
developed for high-performance execution in FPGAs. Unfortunately, the design flow
is based on a new programming language (Mitrion-C), which implies significant designer effort and poor portability.
% tool relying on operator libraries (XtremeData),
% Should we mention the OpenFPGA consortium, whose goal is "to accelerate the incorporation of reconfigurable computing technology in high-performance and enterprise applications"?

Thus, much effort is required to develop design tools that translate high-level
language programs into FPGA configurations.
Moreover, as already noted in~\cite{hpc11}, Dynamic Partial Reconfiguration~\cite{hpc12}
(DPR, which allows part of the FPGA to be reconfigured while the rest keeps running)
appears very promising for improving HPC performance as well as for reducing the required area.

\subsubsection{System Synthesis}
\label{soa:system:synthesis}
Several solutions for system design are available today, some of them commercial.
However, the existing commercial or free tools do not
cover the whole system synthesis process in a fully automatic way. Moreover,
they are bound to a particular device family and to a particular IP library.
The most commonly used tools are provided by \altera and \xilinx to promote their
FPGA devices. These representative tools, used to synthesize SoCs on FPGAs,
are introduced below.
\\
The \xilinx System Generator for DSP~\cite{system-generateur-for-dsp} is a
plug-in to Simulink that enables designers to develop high-performance DSP
systems for \xilinx FPGAs.
Designers can design and simulate a system using MATLAB and Simulink. The
tool then automatically generates synthesizable Hardware Description
Language (HDL) code mapped to \xilinx pre-optimized algorithms.
However, this tool targets only DSP-based algorithms and \xilinx FPGAs, and
cannot handle a complete SoC. Thus, it is not really a system synthesis tool.
\\
In contrast, SOPC Builder~\cite{spoc-builder} from \altera and
\xilinx Platform Studio (XPS) from \xilinx make it possible to describe a system, to synthesize it,
to program it into a target FPGA and to upload a software application.
Both SOPC Builder and XPS allow designers to select and parameterize components from
an extensive drop-down list of IP cores (I/O core, DSP, processor, bus core, ...)
as well as to incorporate their own IPs. Nevertheless, none of the tools introduced above
provides any facility to synthesize coprocessors or to simulate the platform
at a high level (SystemC).
The system designer must provide a synthesizable description of their own IP cores with
an appropriate bus interface. Design space exploration is thus limited,
and SystemC simulation is possible neither at the transactional level nor at the
cycle-accurate level.
\\
In addition, \xilinx System Generator, XPS and SOPC Builder are closed worlds,
since each one imposes its own IPs, which are not interchangeable.
Designers can then only generate a synthesized netlist, a VHDL/Verilog simulation test
bench and a custom software library that reflects the hardware configuration.

Consequently, a designer developing an embedded system needs to master four different
design environments:
\begin{enumerate}
\item a virtual prototyping environment (in SystemC) for system-level exploration,
\item an architecture compiler to define the hardware architecture (Verilog/VHDL),
\item one or several third-party HLS tools for coprocessor synthesis (C to RTL),
\item and finally back-end synthesis tools for bitstream generation (RTL to bitstream).
\end{enumerate}
Furthermore, mixing these tools requires a significant interfacing effort, which makes
the design process very complex and achievable only by designers skilled in many domains.

\subsubsection{High Level Synthesis}
\label{soa:hls}
High Level Synthesis (HLS) translates a sequential algorithmic description and a
set of constraints (area, power, frequency, ...) into a micro-architecture at
Register Transfer Level (RTL).
Several academic and commercial tools are available today. The most common
tools are SPARK~\cite{spark04}, GAUT~\cite{gaut08} and UGH~\cite{ugh08} in the
academic world and Catapult~C~\cite{catapult-c}, PICO~\cite{pico} and
Cynthesizer~\cite{cynthetizer} in the commercial world. Despite their
maturity, their usage is restricted by the following limitations~\cite{IEEEDT,CATRENE,HLSBOOK}:
\begin{itemize}
\item HLS tools are not integrated into an architecture and system exploration tool.
Thus, a designer who needs to accelerate a software part of the system must adapt it manually
to the HLS input dialect and perform engineering work to exploit the synthesis result
at the system level,
\item Current HLS tools cannot target both control-oriented and data-oriented applications,
\item HLS tools mainly take a single constraint into account, while realistic designs
are multi-constrained.
The low power consumption constraint, which is mandatory for embedded systems, is not yet
well handled, or not handled at all, by the available HLS tools,
\item Parallelism is extracted only from the initial specification.
To get more parallelism or to reduce the amount of memory required in the SoC, the user
must rewrite the algorithmic specification, although techniques such as polyhedral
transformations exist to increase the intrinsic parallelism,
\item While they support limited loop transformations such as loop unrolling and loop
pipelining, current HLS tools do not provide support for design space exploration, either
through automatic loop transformations or through memory mapping,
\item Despite having the same input language (C/C++), they are sensitive to the style in
which the algorithm is written. Consequently, engineering work is required to switch from
one tool to another,
\item They do not accurately respect the frequency constraint when they target an FPGA device.
Their error is about 10 percent. This is a problem when the generated component is integrated
in a SoC, since it will slow down the whole system.
\end{itemize}
Given these limitations, a new generation of tools is needed to reduce the gap
between the specification of a heterogeneous system and its hardware implementation~\cite{HLSBOOK,IEEEDT}.

\subsubsection{Application Specific Instruction Processors}
\label{soa:asip}
ASIPs (Application-Specific Instruction-Set Processors) are programmable
processors in which both the instruction set and the micro-architecture have
been tailored to a given application domain or to a
specific application. This specialization usually offers a good compromise
between performance (w.r.t.\ a pure software implementation on an embedded
CPU) and flexibility (w.r.t.\ an application-specific hardware co-processor).
In spite of their obvious advantages, using and designing ASIPs remains a
difficult task, since it involves designing both a micro-architecture and a
compiler for this architecture. Besides, to our knowledge, there is still
no open-source design flow available for ASIP design, even though such a tool
would be valuable in the
context of a system-level design exploration tool.
\par
In this context, ASIP design based on Instruction Set Extensions (ISEs) has
received a lot of interest~\cite{NIOS2}, as it makes micro-architecture synthesis
more tractable\footnote{ISEs rely on a template micro-architecture in which
only a small fraction of the architecture has to be specialized.} and helps ASIP
designers to focus on compilers, for which there are still many open
problems~\cite{ARC08}.
However, this approach has a severe weakness: it also significantly reduces
the opportunities for achieving good speedups (most speedups remain between 1.5x and
2.5x), because ISE performance is generally limited by I/O constraints, as
ISEs generally rely on the main CPU register file to access data.

% (
%automatically extracting ISE candidates from application code \cite{CODES04},
%performing efficient instruction selection and/or storage resource (register)
%allocation \cite{FPGA08}).
To cope with this issue, recent approaches~\cite{DAC09,CODES08,TVLSI06} advocate the use of
micro-architectural ISE models in which the coupling between the processor micro-architecture
and the ISE component is tightened so as to allow the ISE to overcome the register
I/O limitations. However, these approaches generally tackle the problem from a compiler/simulation
point of view and do not address the problem of generating synthesizable representations for
these models.

We therefore strongly believe that there is a need for an open framework which
would allow researchers and system designers to:
\begin{itemize}
\item Explore the various levels of interaction between the original CPU micro-architecture
and its extension (for example through a Domain Specific Language targeted at micro-architecture
specification and synthesis).
\item Retarget the compiler instruction-selection pass
(or prototype new passes) so as to be able to take advantage of these ISEs.
\item Provide complete system-level integration for using ASIPs as SoC building blocks
(integration with application-specific blocks, MPSoC, etc.).
\end{itemize}

\subsubsection{Automatic Parallelization}
\label{soa:automatic:parallelization}
The problem of compiling sequential programs for parallel computers
has been studied since the advent of the first parallel architectures
in the 1970s. The basic approach consists in applying program transformations
which expose or increase the potential parallelism, while guaranteeing
the preservation of the program semantics. Most of these transformations
just reorder the operations of the program; some of them modify its
data structures. Dependences (exact or conservative) are checked to guarantee
the legality of the transformation.
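\par
A minimal sketch of such a legality check, under the simplifying assumption of
uniform dependences described by distance vectors $d$ (the matrix name $P$ and the
two example vectors are illustrative choices, not taken from the cited works):
a unimodular transformation $T$ of the iteration vector is legal if $T d$ remains
lexicographically positive for every dependence. For loop interchange on a
two-dimensional nest,
\[
P = \left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right),
\qquad
P \left( \begin{array}{c} 1 \\ 0 \end{array} \right)
  = \left( \begin{array}{c} 0 \\ 1 \end{array} \right) \succ 0,
\qquad
P \left( \begin{array}{c} 1 \\ -1 \end{array} \right)
  = \left( \begin{array}{c} -1 \\ 1 \end{array} \right) \prec 0 .
\]
In this example, interchange preserves the dependence of distance $(1,0)$ but
reverses the one of distance $(1,-1)$, so it would be rejected for a loop nest
carrying the latter.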
\par
This has led to the invention of many loop transformations (loop fusion,
loop splitting, loop skewing, loop interchange, loop unrolling, ...)
which interact in a complicated way. More recently, it has been noticed
that all of these are just changes of basis in the iteration domain of
the program. This has led to the introduction of the polyhedral model
\cite{FP:96,DRV:2000}, in which the combination of two transformations is
simply a matrix product.
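\par
As a small worked example of this composition (the matrices are the usual textbook
forms of loop skewing and loop interchange on a two-dimensional iteration vector
$(i,j)^T$; the names $S$, $P$ and $T$ are ours), skewing followed by interchange
can be obtained as a single product:
\[
S = \left( \begin{array}{cc} 1 & 0 \\ 1 & 1 \end{array} \right),
\qquad
P = \left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right),
\qquad
T = P\,S = \left( \begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array} \right),
\qquad
T \left( \begin{array}{c} i \\ j \end{array} \right)
  = \left( \begin{array}{c} i+j \\ i \end{array} \right) .
\]
The result can be checked by hand: $S$ maps $(i,j)$ to $(i,i+j)$, and $P$ then
swaps the two coordinates, giving $(i+j,i)$.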
\par
Since hardware is inherently parallel, finding parallelism in sequential
programs is an important prerequisite for HLS. The large FPGA chips of
today can accommodate much more parallelism than is available in basic blocks.
The polyhedral model is the ideal tool for finding more parallelism in
loops.

As a side effect, it has been observed that the polyhedral model is a useful
tool for many other optimizations, such as memory reduction and locality
improvement. Another point is
that the polyhedral model \emph{stricto sensu} applies only to
very regular programs. Its extension to more general programs is
an active research subject.

\subsubsection{SoC design flow automation using IP-XACT}
\label{soa:ip-xact}
% EV: Industrial IP integration flows based on IP-XACT standards: \cite{mds1}\\
% EV: SPIRIT IP-XACT Controlled ESL Design Tool Applied to a Network-on-Chip Platform: \cite{mds2}\\
% EV: SocKET design flow and Application on industrial use cases: \cite{socketflow}\\
% IA: http://www.design-reuse.com/articles/19895/ip-xact-xml.html \cite{dandr}\\
IP-XACT is an XML-based open standard defined by the Accellera consortium.
This non-profit organisation provides a unified set of high-quality IP-XACT
specifications for documenting IP using meta-data. This meta-data is
used for configuring, integrating, and verifying IP in advanced SoC design
and interfacing tools through the TGI (Tight Generator Interface), a software API
that can be used to access design meta-data descriptions of complete system designs.
The specification of the schema is tailored to the requirements of the industry
and focused on enabling technologies for the efficient design of electronic
systems from concept to production. The latest IEEE 1685 release of IP-XACT incorporates
both RTL and TLM (transaction-level modelling) capabilities. Thus it can be used to
package IP portfolios~\cite{dandr} and describe their assembly in complex hardware architectures~\cite{mds1,mds2}.
These description files are the basis for tool interoperability and data exchange
through common structured data management~\cite{socketflow}. Today more than two hundred companies
are members of the consortium and the board includes major actors
(STM, NXP, TI, ARM, FREESCALE, LSI, Mentor, Synopsys and Cadence), ensuring
wide adoption by industry. Initiatives have already % work for (paul)
attempted to extend this standard
to the AMS IP packaging domain (MEDEA+ Beyond Dreams Project) and to Hardware Dependent
Software layers (MEDEA+ SoftSoc project), and Accellera is reusing these results for
further releases.
\parlf
In IP-XACT, flow automation and data consistency are ensured by generators, which
are program modules that process IP-XACT XML data into something useful
for the design. They are a key portable mechanism for encapsulating specialist design
knowledge and enable designers to deploy this knowledge in their designs. It is
always possible to create generators in order to link several design or analysis tools
around a central meta-data representation in IP-XACT. This kind of XML schema for
meta-data management is a good solution for federating heterogeneous design domains
(models, tools, languages, methodologies, etc.).

%\subsubsection{High Performance Computing}
%Accelerating high-performance computing (HPC) applications with field-programmable
%gate arrays (FPGAs) can potentially improve performance.
%However, using FPGAs presents significant challenges~\cite{hpc06a}.
%First, the operating frequency of an FPGA is low compared to a high-end microprocessor.
%Second, based on Amdahl law, HPC/FPGA application performance is unusually sensitive
%to the implementation quality~\cite{hpc06b}.
%Finally, High-performance computing programmers are a highly sophisticated but scarce
%resource. Such programmers are expected to readily use new technology but lack the time
%to learn a completely new skill such as logic design~\cite{hpc07a} .
%\\
%HPC/FPGA hardware is only now emerging and in early commercial stages,
%but these techniques have not yet caught up.
%Thus, much effort is required to develop design tools that translate high level
%language programs to FPGA configurations.
