\section{Project context}
\hspace{2cm}\begin{scriptsize}\begin{verbatim}
% 1. CONTEXT AND POSITIONING OF THE PROJECT
% (1 page maximum) General presentation of the problem that the project
% proposes to address and of the work framework (fundamental research,
% industrial research or experimental development).
\end{verbatim}
\end{scriptsize}
An embedded system is an application integrated into one or several chips
in order to accelerate it or to embed it into a small device such as a personal
digital assistant (PDA).
This topic has been investigated since the 1980s using Application-Specific Integrated Circuits (ASIC),
Digital Signal Processors (DSP) and parallel computing on multiprocessor machines or networks.
More recently, since the end of the 1990s, other technologies have appeared, such as Very Long Instruction Word (VLIW)
processors, Application-Specific Instruction-set Processors (ASIP), Systems on Chip (SoC) and
Multi-Processor SoCs (MPSoC).
\\
During the last decades, embedded system design was reserved to major industrial companies targeting
high-volume markets because of the design and fabrication costs.
Nowadays, Field Programmable Gate Arrays (FPGA), such as the Xilinx Virtex-5 and the Altera Stratix IV,
can implement a SoC with multiple processors and several coprocessors for less than 10K euros
per device. In addition, High Level Synthesis (HLS) is becoming more mature: it automates
design and drastically decreases its cost in terms of manpower. Thus, FPGAs and HLS are both
spreading to HPC, including among small companies targeting low-volume markets.
\par
To obtain an efficient embedded system, the designer has to take the application characteristics into account
when choosing among the technologies listed above.
This choice is not easy, and in most cases the designer has to try several technologies before retaining the
most suitable one.
\\
The first objective of COACH is to provide an open-source framework to design embedded systems
on FPGA devices.
The COACH framework allows the designer to explore various software/hardware partitions of the
target application, to run timing and functional simulations, and to automatically generate both
the software and the synthesizable description of the hardware.
The main topics of the project are:
\begin{itemize}
\item
Design space exploration: it consists in analysing the application to be run on the FPGA, choosing the target
technology (SoC, MPSoC, ASIP, ...) and partitioning the tasks between hardware and software according to
the chosen technology. This exploration is driven mainly by throughput, latency and power consumption
criteria.
\item
Micro-architectural exploration: when hardware components are required, the HLS tools of the framework
generate them automatically. At this stage, the framework provides various HLS tools allowing the
exploration of the micro-architectural design space. The exploration criteria are again throughput, latency
and power consumption.
% FIXME
%CA At this stage, preliminary source-level transformations will be
%CA required to improve the efficiency of the target component.
%CA COACH will also provide such facilities, such as automatic parallelization
%CA and memory optimisation.
\item
Performance measurement: for each point of the design space, metrics are available for criteria
such as throughput, latency, power consumption, area, memory allocation and data locality.
They are evaluated using virtual prototyping, estimation or analysis methodologies.
\item
Targeted hardware technology: the COACH description of a system is independent of the FPGA family.
Every point of the design space can be implemented on any FPGA that has the required resources.
In particular, COACH handles both the Altera and the Xilinx FPGA families.
\end{itemize}
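\par
To illustrate the multi-criteria exploration described in the first item above, the Python
sketch below keeps only the Pareto-optimal design points, i.e. the hardware/software partitions
that no other candidate beats on throughput, latency and power at the same time. It is a
minimal sketch under stated assumptions, not the COACH exploration algorithm: the candidate
names and all figures are hypothetical, and each point is assumed to have already been
evaluated (for instance by virtual prototyping).
\begin{scriptsize}\begin{verbatim}
from typing import NamedTuple

class DesignPoint(NamedTuple):
    name: str          # hypothetical label of a HW/SW partition
    throughput: float  # higher is better (e.g. frames/s)
    latency: float     # lower is better (e.g. ms)
    power: float       # lower is better (e.g. W)

def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on every criterion and strictly better on one."""
    no_worse = (a.throughput >= b.throughput and a.latency <= b.latency
                and a.power <= b.power)
    better = (a.throughput > b.throughput or a.latency < b.latency
              or a.power < b.power)
    return no_worse and better

def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical figures, for illustration only.
candidates = [
    DesignPoint("all software",         10.0, 100.0, 1.0),
    DesignPoint("SoC + 1 coprocessor",  40.0,  25.0, 1.5),
    DesignPoint("SoC + 2 coprocessors", 60.0,  20.0, 2.5),
    DesignPoint("ASIP",                 30.0,  35.0, 1.2),
    DesignPoint("MPSoC, 4 cores",       35.0,  30.0, 3.0),
]

for p in pareto_front(candidates):
    print(p.name)
\end{verbatim}
\end{scriptsize}
With these hypothetical figures, the 4-core MPSoC point is discarded because the SoC with one
coprocessor is better on all three criteria; the four remaining points are genuine trade-offs
left to the designer.
\par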
As an extension of embedded system design, COACH also deals with High Performance Computing (HPC).
In HPC, the targeted application is an existing one running on a PC. COACH helps the designer
accelerate it by migrating its critical parts into a SoC implemented on an FPGA plugged into the PC bus.
\par
COACH results from the will of several laboratories to unify their know-how and skills in the
following domains: operating systems and hardware communication (TIMA, SITI), SoC and MPSoC (LIP6 and TIMA),
ASIP (IRISA) and HLS (LIP6, Lab-STIC and LIP). The objective of the project is to integrate these various
domains into a single free framework (licence ...) that hides these domains and their
different tools from the user as much as possible.

\subsection{Economic context and interest}
\hspace{2cm}\begin{scriptsize}\begin{verbatim}
% 1.1. ECONOMIC AND SOCIETAL CONTEXT AND STAKES
% (2 pages maximum)
% Describe the economic, social and regulatory context of the project,
% presenting an analysis of the social, economic, environmental and
% industrial stakes. If possible, give quantified arguments, for example the
% relevance and scope of the project with respect to economic demand (market
% analysis, trend analysis), competition analysis, cost-reduction indicators,
% market prospects (fields of application, ...). Indicators of environmental
% gains, life cycle.
\end{verbatim}
\end{scriptsize}
Microelectronics makes it possible to integrate complex functions into products, to increase their
commercial attractiveness and to improve their competitiveness. The multimedia and communication
sectors have taken advantage of microelectronics thanks to the development of
design methodologies and tools for real-time embedded systems. Many other sectors could
benefit from microelectronics if these methodologies and tools were adapted to their features.
The Non Recurring Engineering (NRE) costs involved in designing and manufacturing an ASIC are
very high: an IC factory costs several billion euros, and fabricating a specific circuit costs several
million; for example, a conservative estimate for a 65nm ASIC project is 10 million USD.
Consequently, it is generally unfeasible to design and fabricate ASICs in
low volumes, and ICs are designed to cover a broad spectrum of applications at the cost of
performance degradation.
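\par
The break-even reasoning behind this statement can be made explicit. Ignoring design costs,
an ASIC becomes cheaper than an FPGA only above the volume $N$ for which
$\mathit{NRE}_{\mathrm{ASIC}} + N c_{\mathrm{ASIC}} \le N c_{\mathrm{FPGA}}$, where $c$ denotes
a per-unit cost. The Python sketch below computes this volume from the 10 million USD estimate
quoted above; the per-unit costs are hypothetical values chosen for illustration only, not market data.
\begin{scriptsize}\begin{verbatim}
import math

def break_even_volume(nre_asic, unit_asic, unit_fpga, nre_fpga=0.0):
    """Smallest volume N such that nre_asic + N*unit_asic <= nre_fpga + N*unit_fpga."""
    if unit_fpga <= unit_asic:
        return math.inf  # the ASIC never catches up with the FPGA
    return math.ceil((nre_asic - nre_fpga) / (unit_fpga - unit_asic))

# NRE from the 65nm estimate above (USD); both unit costs are hypothetical (USD).
n = break_even_volume(nre_asic=10e6, unit_asic=20.0, unit_fpga=1000.0)
print(n)  # -> 10205
\end{verbatim}
\end{scriptsize}
Under these assumptions, the ASIC only pays off beyond roughly ten thousand units, which
illustrates why low-volume products are better served by FPGAs.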
\\
Today, FPGAs have become important actors in the computational domain that was originally dominated
by microprocessors and ASICs. Like microprocessors, FPGA-based systems can be reprogrammed
on a per-application basis. At the same time, FPGAs offer significant performance benefits over
microprocessor implementations for a number of applications. Although their performance is still
generally an order of magnitude lower than that of equivalent ASIC implementations, the low cost
(500 euros to 10K euros), fast time to market and flexibility of FPGAs make them an attractive
choice for low-to-medium volume applications.
Since their introduction in the mid-eighties, FPGAs have evolved from a simple,
low-capacity gate array technology to devices (Altera Stratix III, Xilinx Virtex-5) that
provide a mix of coarse-grained data path units, memory blocks, microprocessor cores,
on-chip A/D conversion, and gate counts in the millions. This high logic capacity makes it possible to implement
complex systems such as multi-processor platforms with application-dedicated coprocessors.
Table~\ref{fpga_market} shows the Gartner estimate of the worldwide FPGA market for the coming years
across various application domains. The ``high end'' rows concern only FPGAs with a high logic capacity,
able to implement complex systems.
The high-end segment of this market is expanding significantly and is estimated at 914\,M\$ in 2012.
Using FPGAs limits the NRE costs to the design cost. This boosts the development of methodologies
and tools to automate design and reduce its cost.
\begin{table}\leavevmode\centering
\begin{tabular}{|l|r|r|r|}\hline
Segment & 2010 & 2011 & 2012 \\\hline\hline
Communications & 1,867 & 1,946 & 2,096 \\
High end & 467 & 511 & 550 \\\hline
Consumer & 550 & 592 & 672 \\
High end & 53 & 62 & 75 \\\hline
Automotive & 243 & 286 & 358 \\
High end & - & - & - \\\hline
Industrial & 1,102 & 1,228 & 1,406 \\
High end & 177 & 188 & 207 \\\hline
Military/Aerospace & 566 & 636 & 717 \\
High end & 56 & 65 & 82 \\\hline\hline
Total FPGA/PLD & 4,659 & 5,015 & 5,583 \\
Total High-End FPGA & 753 & 826 & 914 \\\hline
\end{tabular}
\caption{\label{fpga_market} Gartner estimate of worldwide FPGA/PLD consumption (millions of US\$)}
\end{table}
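\par
The figures of the Gartner table above can be checked and summarized with a few lines of
Python. The sketch below simply re-enters the high-end rows (Automotive has no high-end
figure and is left out), verifies that they add up to the ``Total High-End FPGA'' row, and
computes the compound annual growth rate (CAGR) from 2010 to 2012; the CAGR is our own
derived indicator, not a figure given by Gartner.
\begin{scriptsize}\begin{verbatim}
# High-end rows of the Gartner table (millions of USD) for 2010, 2011, 2012.
high_end = {
    "Communications": (467, 511, 550),
    "Consumer": (53, 62, 75),
    "Industrial": (177, 188, 207),
    "Military/Aerospace": (56, 65, 82),
}
total_high_end = (753, 826, 914)
total_fpga_pld = (4659, 5015, 5583)

# Column sums of the segment rows must match the "Total High-End FPGA" row.
sums = tuple(sum(row[year] for row in high_end.values()) for year in range(3))
assert sums == total_high_end

def cagr(first, last, years):
    """Compound annual growth rate between two values `years` apart."""
    return (last / first) ** (1.0 / years) - 1.0

print(f"high-end CAGR 2010-2012: {cagr(total_high_end[0], total_high_end[2], 2):.1%}")
print(f"FPGA/PLD CAGR 2010-2012: {cagr(total_fpga_pld[0], total_fpga_pld[2], 2):.1%}")
print(f"high-end share in 2012:  {total_high_end[2] / total_fpga_pld[2]:.1%}")
\end{verbatim}
\end{scriptsize}
This gives a growth of about 10\% per year for the high-end segment, slightly above the
9.5\% of the whole FPGA/PLD market, the high-end segment representing about 16\% of the
2012 total.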
\par
Today, several companies (atipa, blue-arc, Bull, Chelsio, Convey, CRAY, DataDirect, DELL, hp,
Wild Systems, IBM, Intel, Microsoft, Myricom, NEC, nvidia, etc.) build systems where the demand
for very high performance (HPC) prevails over other requirements. They tend to use the highest-performing
devices, such as multi-core CPUs, GPUs, large FPGAs and custom ICs, and the most innovative
architectures and algorithms. These companies appear in different ``traditional'' applications and market
segments such as computing clusters (ad hoc), servers and storage, networking and telecom, ASIC
emulation and prototyping, military/aerospace, etc. FPGA vendors currently estimate the HPC market size
at 214\,M\$.
This market is dominated by multi-core CPU and GPU based solutions, and the expansion
of FPGA-based solutions is limited by the lack of design flow automation. Nowadays, there are neither commercial
nor free tools covering the whole design process.
For instance, with SOPC Builder from Altera, users can select and parameterize IP components
from an extensive drop-down list of communication, digital signal processor (DSP), microprocessor
and bus interface cores, and can also incorporate their own IP. Designers can then generate
a synthesized netlist, a simulation test bench and a custom software library that reflect the hardware
configuration.
Nevertheless, SOPC Builder does not provide any facilities to synthesize coprocessors
\emph{[Steven: I disagree, the C2H compiler bundled with SOPC Builder does a pretty good job at this]} or to
simulate the platform at a high design level (SystemC).
In addition, SOPC Builder is proprietary and only works together with Altera's Quartus compilation
tool to implement designs on Altera devices (Stratix, Arria, Cyclone).
PICO [CITATION] and CATAPULT [CITATION] can synthesize coprocessors from a C++ description.
Nevertheless, they can only deal with data-dominated applications and do not handle the
platform level.
The Xilinx System Generator for DSP [http://www.xilinx.com/tools/sysgen.htm] is a plug-in to
Simulink that enables designers to develop high-performance DSP systems for Xilinx FPGAs.
Designers can design and simulate a system using MATLAB and Simulink. The tool then
automatically generates synthesizable Hardware Description Language (HDL) code mapped to Xilinx
pre-optimized algorithms.
However, this tool targets only DSP-based algorithms.
\\
Consequently, a designer developing an embedded system needs to master, for example,
SoCLib for design exploration,
SOPC Builder at the platform level,
PICO for synthesizing the data-dominated coprocessors
and Quartus for design implementation.
This requires a significant tool-interfacing effort and makes the design process very complex
and achievable only by designers skilled in many domains.
The COACH project integrates all these tools into a single framework that hides them from the user.
The objective is to allow \textbf{pure software} developers to build embedded systems.
\par
Combining a framework dedicated to software developers with an FPGA target makes it possible for
FPGA-based solutions to gain market share over multi-core CPU and GPU based HPC solutions.
Moreover, one can expect that small and even very small companies will be able to offer embedded
systems and acceleration solutions for standard software applications at acceptable prices, thanks
to the elimination of the huge hardware investment required by ASIC-based solutions.
\\
This new market may explode as micro-computing did in the eighties. That success was due
to the low cost of the first micro-computers (compared to mainframes) and to the advent of high-level
programming languages, which allowed a large number of programmers to launch software engineering
start-ups.

\subsection{Project position}
\hspace{2cm}\begin{scriptsize}\begin{verbatim}
% 1.2. POSITIONING OF THE PROJECT
% (2 pages maximum)
% Specify:
% - the positioning of the project with respect to the context developed
%   above: with respect to competing, complementary or previous projects
%   and research, patents and standards.
% - the positioning of the project with respect to the thematic axes of
%   the call for proposals.
% - the positioning of the project at the European and international levels.
\end{verbatim}
\end{scriptsize}
The aim of this project is to propose an open-source framework for architecture synthesis
targeting mainly field programmable gate arrays (FPGA).
\\% LIP6/TIMA
To evaluate the different architectures, the project uses the prototyping platform
of the SoCLib ANR project (2006-2009).
\\% IRISA
The project will also build on the ROMA ANR project (2007-2009) and on the ongoing
joint INRIA-STMicroelectronics Nano2012 project. In particular, we will adapt existing pattern
extraction algorithms and datapath merging techniques to the synthesis of customized
ASIP processors.
\\
\textcolor{gris75}{Steven: I propose adding a link with the BioWic project:~on the HPC
application side, we also hope to benefit from the experience in hardware acceleration of
bioinformatics algorithms/workflows gathered by the CAIRN group in the context of the ANR
BioWic project (2009-2011), so as to be able to validate the framework on
real-life HPC applications.}

\par
%%% 1 -- COULD EACH OF YOU PLEASE ADD (IF POSSIBLE) ONE LINE
%%% 1 -- REFERRING TO AN ANR OR EUROPEAN PROJECT
%%% 1 -- European or ANR projects reused or continued
%%% 1 LIP6/TIMA/LAB-STIC OK
Regarding expertise in High Level Synthesis (HLS), the project leverages know-how acquired over 15 years
with the GAUT project developed at the Lab-STIC laboratory and the UGH project developed at the LIP6
and TIMA laboratories. \\
Regarding architecture synthesis skills, the project builds on know-how acquired over 10 years
with the COSY European project (1998-2000) and the DISYDENT project developed at LIP6. \\
%%% 1 IRISA OK
Regarding Application-Specific Instruction-set Processor (ASIP) design, the CAIRN group at INRIA Bretagne
Atlantique has several years of expertise in the domain of retargetable compilers (Armor/Calife
since 1996, and the Gecos compiler since 2002).


% LIP FIXME: A BIT LONG AND OFF-TOPIC
%CA% The source-level transformations required by the HLS tools will be
%CA% designed in the {\em polyhedral model}, a general framework
%CA% initiated by Paul Feautrier 20 years ago. The programs handled in
%CA% the polyhedral model are such that loop iterators describe a
%CA% polyhedron (hence the name). This includes most of the kernels used
%CA% in embedded applications. This property allows precise analyses
%CA% to be designed by means of integer programming techniques.
%CA% %active & international community
%CA% %technology transfer (Reservoir)
%CA% The polyhedral community is very active, and the technological
%CA% transfer has now started. Reservoir Labs Inc., a company based in
%CA% New York, is currently integrating the latest polyhedral developments
%CA% into its commercial compiler.
%CA% %technology transfer (gcc)
%CA% Also, polyhedra are progressively migrating into the {\sc GNU Gcc}
%CA% compiler, via {\sc Graphite}, a module initially developed by
%CA% Sebastian Pop.
%CA% %existing tools
%CA% Several tools have been developed in the polyhedral community,
%CA% such as {\sc Piplib} (parametric integer programming library), and
%CA% {\sc Polylib}, a library providing set operations on polyhedra. Both
%CA% tools are almost mandatory in polyhedral tools, and have reached
%CA% a sufficient level of maturity to be considered as standard.
%syntol & bee ???
% END
% and on more than 15 years of experience on parallel hardware generation
% in the polyhedral model in the CAIRN group (MMAlpha software
% developed in the group since 1996).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% 2 -- TO BE COMPLETED (SHORT)
%%% 2 -- For polyhedral transformation and memory optimization ... LIP
%%% 2 -- For ASIP IRISA
%%% 2 -- For ... CITI
%%% 2 -- For ... TIMA
\par
The SoCLib ANR platform was developed by 11 laboratories and 6 companies. It makes it possible to
describe hardware architectures with a shared memory space and to deploy software
applications on them to evaluate their performance.
The heart of this platform is a library containing simulation models (in SystemC)
of hardware IP cores such as processors, buses, networks, memories and IO controllers.
The platform also provides embedded operating systems and software/hardware
communication components that help implement applications quickly.
However, the synthesizable descriptions of the IPs have to be provided by the users. \\
This project enhances SoCLib by providing synthesizable VHDL for standard IPs.
In addition, HLS tools such as UGH and GAUT automatically produce a synthesizable
description of an IP (coprocessor) from a sequential algorithm.
%\par
%%% 2 IRISA ?
%%% 2 ASIP tool such as ... IRISA
%%% 2 ...
%%% 2 Coach uses pattern extractions from ROMA
%\par
%%% 2 LIP ?
\par
The different points addressed by this project cover priorities defined by the Commission
experts in the field of Information Society Technologies (IST) for embedded
systems: ``Concepts, methods and tools for designing systems dealing with systems complexity
and allowing to apply efficiently applications and various products on embedded platforms,
considering resource constraints (delays, power, memory, etc.), security and quality of
service''.
\\
Our team aims at covering all the steps of the architecture synthesis design flow.
Our project overcomes the complexity of using the various synthesis tools and description
languages required today to design architectures.

\section{Scientific and Technical Description}
\subsection{State of the art}
\hspace{2cm}\begin{scriptsize}\begin{verbatim}
% 2. SCIENTIFIC AND TECHNICAL DESCRIPTION
% 2.1. STATE OF THE ART
% (3 pages maximum)
% Describe the scientific context and stakes of the project by presenting
% a national and international state of the art summarizing the current
% knowledge on the subject. Show any preliminary results.
% Include the necessary bibliographic references in appendix 7.1.
\end{verbatim}
\end{scriptsize}
Our project covers several critical domains of system design in order
to achieve high performance computing. Starting from a high-level description, we aim
at automatically generating both the hardware and the software components of the system.

\subsubsection{High Performance Computing}
Accelerating high-performance computing (HPC) applications with field-programmable
gate arrays (FPGAs) can potentially improve performance.
However, using FPGAs presents significant challenges [1].
First, the operating frequency of an FPGA is low compared to that of a high-end microprocessor.
Second, because of Amdahl's law, HPC/FPGA application performance is unusually sensitive
to the implementation quality [2].
Finally, high-performance computing programmers are a highly sophisticated but scarce
resource. Such programmers are expected to readily use new technology but lack the time
to learn a completely new skill such as logic design [3].
\\
HPC/FPGA hardware is only now emerging and is in its early commercial stages,
but the design techniques have not yet caught up.
Thus, much effort is required to develop design tools that translate high-level
language programs into FPGA configurations.
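\par
The sensitivity mentioned in the second point follows from Amdahl's law: if a fraction $p$
of the run time is moved to the FPGA and accelerated by a factor $s$, the overall speedup is
\[ S(p, s) = \frac{1}{(1 - p) + p/s}. \]
The short Python sketch below evaluates this formula for a few hypothetical values of $p$
and $s$; they are illustrative values, not measurements.
\begin{scriptsize}\begin{verbatim}
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the run time is accelerated by a factor s."""
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("expected 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical accelerated fractions p and kernel speedups s.
for p in (0.90, 0.99):
    for s in (0.5, 10, 100):
        print(f"p = {p:.2f}, s = {s:5.1f} -> overall speedup {amdahl_speedup(p, s):6.2f}")
\end{verbatim}
\end{scriptsize}
With $p = 0.9$, improving the accelerator from $s = 10$ to $s = 100$ only raises the overall
speedup from about 5.3 to about 9.2, whereas with $p = 0.99$ the same improvement raises it
from about 9.2 to about 50. Since the FPGA clock is slower than the CPU clock, a poor
implementation may even yield $s < 1$, i.e. an overall slowdown.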

\hspace{2cm}\begin{scriptsize}\begin{verbatim}
[1] M.B. Gokhale et al., Promises and Pitfalls of Reconfigurable
    Supercomputing, Proc. 2006 Conf. Eng. of Reconfigurable Systems and
    Algorithms, CSREA Press, 2006, pp. 11-20;
    http://nis-www.lanl.gov/~maya/papers/ersa06_gokhale_paper.pdf
[2] D. Buell, Programming Reconfigurable Computers: Language Lessons
    Learned, keynote address, Reconfigurable Systems Summer Institute
    2006, 12 July 2006;
    http://gladiator.ncsa.uiuc.edu/PDFs/rssi06/presentations/00_Duncan_Buell.pdf
[3] T. Van Court et al., Achieving High Performance with FPGA-Based
    Computing, Computer, vol. 40, no. 3, pp. 50-57, Mar. 2007,
    doi:10.1109/MC.2007.79
\end{verbatim}
\end{scriptsize}

\subsubsection{System Synthesis}
Today, several system design solutions are proposed and commercialized. The most common are
those provided by Altera and Xilinx to promote their FPGA devices.
\\
The Xilinx System Generator for DSP [http://www.xilinx.com/tools/sysgen.htm] is a plug-in to
Simulink that enables designers to develop high-performance DSP systems for Xilinx FPGAs.
Designers can design and simulate a system using MATLAB and Simulink. The tool then
automatically generates synthesizable Hardware Description Language (HDL) code mapped to Xilinx
pre-optimized algorithms.
However, this tool targets only DSP-based algorithms and Xilinx FPGAs, and cannot handle a complete
SoC. Thus, it is not really a system synthesis tool.
\\
In contrast, SOPC Builder [CITATION] can describe a system, synthesize it,
program it into a target FPGA and upload a software application.
% FIXME (C2H from Altera: runs fast but uses monstrous amounts of resources)
Nevertheless, SOPC Builder does not provide any facilities to synthesize coprocessors.
Users have to provide the synthesizable description with the appropriate bus interface.
\\
In addition, Xilinx System Generator and SOPC Builder are closed worlds, since each one imposes
its own IPs, which are not interchangeable.
We can conclude that the existing commercial or free tools do not cover the whole system
synthesis process in a fully automatic way. Moreover, they are bound to a particular device family
and IP library.

\subsubsection{High Level Synthesis}
High Level Synthesis translates a sequential algorithmic description and a set of constraints
(area, power, frequency, ...) into a micro-architecture at the Register Transfer Level (RTL).
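\par
As an illustration of one core step of this translation, the Python sketch below performs
resource-constrained list scheduling: each operation of a small data-flow graph is assigned
to a control step (clock cycle) so that data dependences are respected and no more operators
of each kind are used per step than are available. It is a textbook simplification, not the
algorithm of any of the tools cited below: the graph (computing $a \cdot b + c \cdot d + e \cdot f$),
the one-cycle latencies and the priority order (simply the dictionary order) are assumptions.
\begin{scriptsize}\begin{verbatim}
from collections import defaultdict

# Hypothetical data-flow graph of y = a*b + c*d + e*f.
# operation -> (operator kind, operations it depends on); every operation takes one cycle.
DFG = {
    "m1": ("mul", []),
    "m2": ("mul", []),
    "m3": ("mul", []),
    "a1": ("add", ["m1", "m2"]),
    "a2": ("add", ["a1", "m3"]),
}

def list_schedule(dfg, resources):
    """Return {operation: control step}.

    `resources` gives, for each operator kind, how many operators exist (at least one).
    An operation is ready once all its predecessors finished in an earlier step; ready
    operations are taken in dictionary order until the operators of their kind run out.
    """
    step_of = {}
    step = 0
    while len(step_of) < len(dfg):
        used = defaultdict(int)  # operators of each kind already busy in this step
        for op, (kind, preds) in dfg.items():
            if op in step_of:
                continue
            ready = all(p in step_of and step_of[p] < step for p in preds)
            if ready and used[kind] < resources[kind]:
                step_of[op] = step
                used[kind] += 1
        step += 1
    return step_of

for n_mul in (1, 2, 3):
    schedule = list_schedule(DFG, {"mul": n_mul, "add": 1})
    print(n_mul, "multiplier(s):", max(schedule.values()) + 1, "cycles", schedule)
\end{verbatim}
\end{scriptsize}
Under these assumptions, one multiplier yields a 4-cycle schedule, while two or three
multipliers both yield 3 cycles; this is the kind of area/latency trade-off that the
synthesis constraints are used to arbitrate.
\par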
Several academic and commercial tools are available today.
The most common are SPARK [HLS1], GAUT [HLS2] and UGH [HLS3] in the academic world,
and Catapult C [HLS4], PICO [HLS5] and Cynthesizer [HLS6] in the commercial world.
Despite their maturity, their use is restricted by the following limitations:
\begin{itemize}
\item They do not accurately respect the frequency constraint when they target an FPGA device;
their error is about 10 percent. This is a problem when the generated component is integrated
into a SoC, since it slows down the whole system.
\item These tools take into account only one or a few constraints simultaneously, while realistic
designs are multi-constrained.
Moreover, a low power consumption constraint is mandatory for embedded systems,
yet it is not well handled by common synthesis tools.
\item The parallelism is extracted from the initial algorithm as written. To obtain more parallelism or
to reduce the amount of required memory, the user must rewrite the algorithm, although techniques such as
polyhedral transformations exist to increase its intrinsic parallelism.
\item Although they share the same input language (C/C++), they are sensitive to the style in
which the algorithm is written. Consequently, engineering work is required to switch from
one tool to another.
\item The HLS tools are not integrated into an architecture and system exploration tool.
Thus, a designer who needs to accelerate a software part of the system must manually adapt it
to the HLS input dialect and perform engineering work to exploit the synthesis result
at the system level.
\end{itemize}
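\par
To illustrate the third limitation, consider a loop nest in which iteration $(i, j)$ reads the
results of iterations $(i-1, j)$ and $(i, j-1)$: as written, neither loop can run in parallel.
Skewing the iteration space with the schedule $t = i + j$, a classical transformation in the
polyhedral model, turns it into a sequence of wavefronts whose iterations are mutually
independent. The Python sketch below uses a hypothetical kernel, written in Python only to
check the transformation (an HLS tool would take the equivalent C loops), and verifies that
the skewed order computes the same result as the original one.
\begin{scriptsize}\begin{verbatim}
def original(n, m):
    """A[i][j] = A[i-1][j] + A[i][j-1]; dependence vectors (1, 0) and (0, 1)."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a

def wavefront(n, m):
    """Same computation, scanned along the skewed schedule t = i + j."""
    a = [[1] * m for _ in range(n)]
    n_fronts = 0
    for t in range(2, n + m - 1):
        # All (i, j) with i + j = t, 1 <= i <= n-1 and 1 <= j <= m-1.
        front = [(i, t - i) for i in range(max(1, t - m + 1), min(n - 1, t - 1) + 1)]
        # Each iteration of `front` only reads values computed on front t - 1,
        # so these iterations could be executed in parallel.
        for i, j in front:
            a[i][j] = a[i - 1][j] + a[i][j - 1]
        n_fronts += 1
    return a, n_fronts

n, m = 5, 7
result, n_fronts = wavefront(n, m)
assert result == original(n, m)
print(n_fronts, "wavefronts for", (n - 1) * (m - 1), "iterations")
\end{verbatim}
\end{scriptsize}
For an $n \times m$ array, the $(n-1)(m-1)$ iterations are executed in $n + m - 3$ wavefronts,
so for square arrays and enough operators the latency grows linearly instead of quadratically
with the problem size; with the tools discussed here, the designer would have to perform this
rewriting by hand.
\par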
Given these limitations, a new generation of tools is needed to reduce the gap
between the specification of a heterogeneous system and its hardware implementation.

\hspace{2cm}\begin{scriptsize}\begin{verbatim}
[HLS1] SPARK, University of California, San Diego
[HLS2] GAUT, UBS/Lab-STIC
[HLS3] UGH, LIP6/TIMA
[HLS4] Catapult C, Mentor Graphics
[HLS5] PICO, Synfora
[HLS6] Cynthesizer, Forte Design Systems
\end{verbatim}
\end{scriptsize}

\subsubsection{Application Specific Instruction Processors}

ASIPs (Application-Specific Instruction-set Processors) are programmable processors in
which both the instruction set and the micro-architecture have been tailored to a given
application domain (e.g. video processing) or to a specific application.
This specialization usually offers a good compromise between performance (w.r.t. a pure software
implementation on an embedded CPU) and flexibility (w.r.t. an application-specific
hardware coprocessor).
In spite of their obvious advantages, using and designing ASIPs remains a difficult
task, since it involves designing both a micro-architecture and a compiler for this
architecture. Besides, to our knowledge, there is still no open-source
design flow\footnote{There are commercial tools such as ...} for ASIP design, even though such a tool would
be valuable in the context of a system-level design exploration tool.

In this context, ASIP design based on Instruction Set Extensions (ISEs) has
received a lot of interest [NIOSII,TENSILICA]%~\cite{NIOS2,ST70},
as it makes micro-architecture synthesis
more tractable\footnote{ISEs rely on a template micro-architecture in which
only a small fraction of the architecture has to be specialized.} and helps ASIP
designers to focus on compilers, for which there are still many open problems
[CODES04,FPGA08].
However, this approach has a strong weakness: it also significantly reduces
opportunities for achieving good speedups (most speedups remain between 1.5x and
2.5x), since ISE performance is generally bounded by I/O constraints, as
ISEs generally rely on the main CPU register file to access data.
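\par
The effect of this register-file bottleneck can be illustrated with a deliberately simplified
cost model, sketched below in Python. We assume that a custom instruction replacing $k$ base
instructions (one cycle each) cannot complete faster than the number of cycles needed to read
its $n_{in}$ operands and write its $n_{out}$ results through the register-file ports; the port
counts (two read ports and one write port by default) and the example pattern are hypothetical.
\begin{scriptsize}\begin{verbatim}
import math

def ise_cycles(n_in, n_out, read_ports=2, write_ports=1, compute_cycles=1):
    """Simplified lower bound on the cycles taken by one custom instruction: operands
    must be read from and results written to the CPU register file, and the ISE
    datapath itself takes `compute_cycles`."""
    return max(math.ceil(n_in / read_ports),
               math.ceil(n_out / write_ports),
               compute_cycles)

def local_speedup(n_ops, n_in, n_out, **ports):
    """Speedup of the replaced pattern alone, assuming one cycle per base instruction."""
    return n_ops / ise_cycles(n_in, n_out, **ports)

# Hypothetical pattern: 8 base operations, 6 input operands, 2 results.
print(f"{local_speedup(8, 6, 2):.2f}x with a 2-read/1-write register file")
print(f"{local_speedup(8, 6, 2, read_ports=6, write_ports=2):.2f}x with wider operand access")
\end{verbatim}
\end{scriptsize}
With these hypothetical numbers, the 8-operation pattern is limited to a local speedup of
about 2.7x by the 2-read/1-write register file, whereas it would reach 8x if its operands
could be transferred in a single cycle; the application-level speedup is further reduced
by Amdahl's law, since only part of the code is covered by custom instructions.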
---|
434 | |
---|
435 | % ( |
---|
436 | %automaticcaly extraction ISE candidates for application code \cite{CODES04}, |
---|
437 | %performing efficient instruction selection and/or storage resource (register) |
---|
438 | %allocation \cite{FPGA08}). |
---|
439 | |
---|
440 | |
---|
441 | To cope with this issue, recent approaches~[DAC09,DAC08]%\cite{DAC09,DAC08} |
---|
442 | advocate the use of |
---|
443 | micro-architectural ISE models in which the coupling between the processor micro-architecture |
---|
444 | and the ISE component is thightened up so as to allow the ISE to overcome the register |
---|
445 | I/O limitations, however these approaches tackle the problem for a compiler/simulation |
---|
446 | point of view and not address the problem of generating synthesizable representations for |
---|
447 | these models. |
---|
448 | |
---|
We therefore strongly believe that there is a need for an open framework that
would allow researchers and system designers to:
\begin{itemize}
\item Explore the various levels of interaction between the original CPU micro-architecture
and its extension (for example through a Domain Specific Language targeted at micro-architecture
specification and synthesis).
\item Retarget the compiler instruction-selection passes (or prototype new passes) so as
to take advantage of these ISEs.
\item Provide complete system-level integration for using ASIPs as SoC building blocks
(integration with application-specific blocks, MPSoC, etc.).
\end{itemize}
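
As an illustration of the second point, retargeting instruction selection can, in its
simplest form, amount to tree-pattern matching on the compiler's intermediate
representation. The C sketch below rewrites every \texttt{add(mul(x, y), z)} subtree into a
single three-operand multiply-accumulate instruction. The \texttt{MAC} opcode, the
\texttt{Node} type and the \texttt{select\_ise} function are hypothetical and do not
correspond to any existing compiler API; a production selector would also have to check
that the ISE is profitable and, on a DAG rather than a tree, that intermediate results are
not used elsewhere.
\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>

/* Base-ISA operations plus one hypothetical ISE opcode (multiply-accumulate). */
typedef enum { LEAF, ADD, MUL, MAC } Op;

typedef struct Node {
    Op op;
    const char *name;      /* variable name, LEAF nodes only */
    struct Node *kid[3];   /* operands; only MAC uses all three */
} Node;

static Node *mk(Op op, const char *name, Node *a, Node *b, Node *c) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    n->op = op;
    n->name = name;
    n->kid[0] = a; n->kid[1] = b; n->kid[2] = c;
    return n;
}

static Node *leaf(const char *name) { return mk(LEAF, name, NULL, NULL, NULL); }

/* Bottom-up tree-pattern matching: rewrite add(mul(x, y), z) and
   add(z, mul(x, y)) into mac(x, y, z). */
static Node *select_ise(Node *n) {
    if (n == NULL || n->op == LEAF) return n;
    for (int i = 0; i < 3; i++) n->kid[i] = select_ise(n->kid[i]);
    if (n->op == ADD) {
        for (int i = 0; i < 2; i++) {
            Node *m = n->kid[i];
            if (m->op == MUL) {
                Node *mac = mk(MAC, NULL, m->kid[0], m->kid[1], n->kid[1 - i]);
                free(m);   /* the matched add and mul nodes are replaced */
                free(n);
                return mac;
            }
        }
    }
    return n;
}

/* Number of instructions, i.e. non-leaf nodes. */
static int count_ops(const Node *n) {
    if (n == NULL || n->op == LEAF) return 0;
    return 1 + count_ops(n->kid[0]) + count_ops(n->kid[1]) + count_ops(n->kid[2]);
}

static void print_tree(const Node *n) {
    static const char *mnemonic[] = { "", "add", "mul", "mac" };
    if (n->op == LEAF) { printf("%s", n->name); return; }
    printf("%s(", mnemonic[n->op]);
    for (int i = 0; i < 3 && n->kid[i] != NULL; i++) {
        if (i > 0) printf(", ");
        print_tree(n->kid[i]);
    }
    printf(")");
}

int main(void) {
    /* acc + a*b + c*d, parsed as add(add(acc, mul(a, b)), mul(c, d)) */
    Node *t = mk(ADD, NULL,
                 mk(ADD, NULL, leaf("acc"), mk(MUL, NULL, leaf("a"), leaf("b"), NULL), NULL),
                 mk(MUL, NULL, leaf("c"), leaf("d"), NULL), NULL);
    printf("before: %d instructions\n", count_ops(t));
    t = select_ise(t);
    printf("after:  %d instructions: ", count_ops(t));
    print_tree(t);
    printf("\n");
    return 0;
}
\end{verbatim}
Here the four base operations of \texttt{acc + a*b + c*d} become two \texttt{mac}
instructions. Each \texttt{mac} reads three registers, which is exactly the kind of
register-file I/O pressure that limits ISE speedups.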

\hspace{2cm}
\begin{scriptsize}\begin{verbatim}
[CODES08] Theo Kluter, Philip Brisk, Paolo Ienne, Edoardo Charbon. Speculative DMA for
          Architecturally Visible Storage in Instruction Set Extensions.

[DAC09]   Theo Kluter, Philip Brisk, Paolo Ienne, Edoardo Charbon. Way Stealing:
          Cache-assisted Automatic Instruction Set Extensions.

[CODES04] Pan Yu, Tulika Mitra. Scalable Custom Instructions Identification for
          Instruction Set Extensible Processors.

[FPGA08]  Quang Dinh, Deming Chen, Martin D. F. Wong. Efficient ASIP Design for
          Configurable Processors with Fine-Grained Resource Sharing.

[NIOSII]  Altera. Nios II Custom Instruction User Guide.
\end{verbatim}

\end{scriptsize}
%, either
%because the target architecture is proprietary, or because the compiler
%technology is closed/commercial.

% We propose to explore how to tighten the coupling of the extensions and
% the underlying template micro-architecture.
% * Tighten Even if such
% an approach offers less flexibility and forbids very tight coupling
% between the extensions and the template micro-architecture, it makes the
% design of the micro-architecture more tractable and amenable to a fully
% automated flow.
% \\
% \\
% In the context of the COACH project, we propose to add to the
% infrastructure a design flow targeted at automatic instruction set
% extension for the MIPS-based CPU, which will come as a complement or an
% alternative to the other proposed approaches (hardware accelerator,
% multi processors).
%

\subsubsection{Automatic Parallelization}
\begin{Large}\begin{verbatim}
-- TO BE COMPLETED BY LIP
\end{verbatim}
\end{Large}
%CA% Parallel machines are often difficult and painful to program
%CA% directly, and one would like the compiler to %do the job, that is to
%CA% turn automatically a sequential program into a parallel form. This
%CA% transformation is referred to as {\em automatic parallelization}, and has
%CA% been widely addressed since the 70s. Automatic parallelization
%CA% relies on data dependences, which cannot be computed in general.%, as
%CA% %one cannot predict at compile time the variable values on a given
%CA% %execution point.
%CA% This negative result led researchers to (i) find a
%CA% program model in which no approximation is needed (i.e., the polyhedral
%CA% model), (ii) make conservative approximations, (iii) remark that
%CA% variable values are known at runtime, and make the decisions during
%CA% program execution. The latter approach is obviously not suitable
%CA% here, as we target hardware generation. We give here a short
%CA% history of the approaches that fall in the first category.
%CA%
%CA%% In the real world, we deal with a limited number of processors,
%CA%% and the communication between processors takes time, and is
%CA%% critical for performance. %Whenever we have synchronisation-free
%CA%% parallelism, like for embarrassingly parallel kernels, this is not an
%CA%% issue. But in case of pipelined parallelism, we need to reduce
%CA%% communications as much as possible.
%CA%% So we also need to find parallelism together with a proper mapping
%CA%% of operations and data on physical processors.
%CA%
%CA% As programs spend most of their time in loops, the community has
%CA% focused on loop transformations that reveal parallelism.
%CA%%unimodular
%CA% The first approaches worked on perfect loop nests, where the tree
%CA% formed by the nested loops is linear. In this program model, the
%CA% loops can be seen as a basis that drives the way the iteration
%CA% domain is described. Hence, a first idea was to change this
%CA% basis such that at least one vector (one loop) is parallel. To ease
%CA% code generation, the parallelepiped spanned by the new vectors must
%CA% have unit volume. %Otherwise, one would produce a homothetic
%CA%% expansion of the iteration domain, which would force modulo
%CA%% operations in the target code.
%CA% For this reason, these transformations are called {\em unimodular
%CA% transformations}.
%CA%%tiling
%CA%
%CA% The next approaches include {\em loop tiling}, a simple
%CA% partitioning of the iteration domain, whose initial purpose is to
%CA% execute every partition on a different processor. %In the same way,
%CA% The execution order is modified with a proper unimodular
%CA% transformation, then the tiles are obtained by cutting the
%CA% iteration domain with the hyperplanes directed by every vector of
%CA% the new (unimodular) basis, at regular intervals. When the tiling
%CA% hyperplanes are properly chosen, we can both improve data locality
%CA% on every processor and reduce the communication between two
%CA% different tiles (which will be mapped on processors). This last
%CA% property implies that one tends to look for a degree of parallelism as
%CA% large as possible.
%CA%
%CA%%affine scheduling
%CA% The previous approaches were restricted to kernels with perfect
%CA% loop nests (linear loop tree) and unimodular transformations. The
%CA% latest generation of approaches broke with these limitations. We now
%CA% choose a different basis for every assignment, without the
%CA% unimodularity restriction. A dual way to present this is the
%CA% notion of {\em affine schedule}, introduced by Feautrier [part1],
%CA% which simply assigns an abstract execution date to every assignment
%CA% execution. As an assignment execution is exactly characterised by
%CA% the current value of the loop counters (iteration vector), the
%CA% affine schedule is defined as an affine form of the iteration
%CA% vector (hence the name 'affine'). The affine property allows the use of
%CA% integer programming techniques to compute the schedule. With this
%CA% approach, additional techniques are required to allocate the
%CA% parallel operations and the data to processors in an efficient way
%CA% [griebl, feautrier].
%CA%
%CA%%modularity??
%CA%%% As loop nests are no longer perfect, we deal with (transformed)
%CA%%% iteration domains of different dimensions, which can possibly (and
%CA%%% certainly) overlap. At this point, a new code generation technique
%CA%%% was needed. The first attempt is due to Chamsky et al. [??], and
%CA%%% was improved by Quillere et al. [QRW]. The code is now implemented
%CA%%% in an efficient tool [cloog], which gave a new life to polyhedral
%CA%%% techniques.
%CA%
%CA%%pluto's tiling
%CA% The tiling techniques were extended to non-perfect loop nests with
%CA% {\em affine partitioning}. Affine partitioning is to affine
%CA% scheduling what (original) tiling was to unimodular
%CA% transformations. An affine partitioning assigns to every assignment
%CA% its coordinates in the basis defined by the normals to the tiling
%CA% hyperplanes. Recently, a way to compute efficient hyperplanes was
%CA% found [uday], with good data locality and communications
%CA% confined in a small neighborhood around every processor.
%CA%
%CA%\subsubsection{Source-level Memory Optimisation}
%CA% The HLS process allows memory to be customised, which impacts the final
%CA% circuit size and power consumption. Though most HLS tools already
%CA% try to optimise memory usage, it is better to provide an independent
%CA% source-level pass that could be reused for different tools and in
%CA% other contexts.
%CA%
%CA% There exist many approaches to evaluate and reduce the memory
%CA% requirement of a program. The first approaches are concerned with
%CA% {\em memory size estimation}, which can be defined as the maximum
%CA% number of memory cells used at the same time [clauss,zhao]. These
%CA% approaches provide an estimation as a symbolic expression of program
%CA% parameters, which can be used further to guide loop optimisations.
%CA% However, no explicit way to reduce the memory size is given. {\em
%CA% Intra-array reuse} approaches break with this limitation, and
%CA% collapse the array cells which are not alive at the same time. The
%CA% collapse is done by means of a data layout transformation, specified
%CA% with a linear (modular) mapping. The first approaches were
%CA% developed at IMEC [balasa,catthoor], and basically try to linearize
%CA% the arrays and fold them using a modulo operator. Then, Lefebvre et
%CA% al. proposed a solution to fold the array dimensions independently
%CA% [lefebvre]. Finally, Darte et al. provided a general formalisation of
%CA% the problem, together with a solution that subsumes the previous
%CA% approaches [darte]. A first implementation was made with the tool
%CA% {\sc Bee}, but there are still many limitations.
%CA%
%CA% \begin{itemize}
%CA% \item The tool is restricted to regular programs, whereas more
%CA% general programs could be handled with a conservative array liveness
%CA% analysis.
%CA%
%CA% \item Programs depending on parameters (inputs) are not handled,
%CA% which prevents handling, for example, the body of tiled loops.
%CA%
%CA% \item The new array layout can break spatial locality, and thus impact
%CA% performance and power consumption. One would like to get a mapping
%CA% that improves or, at least, preserves the spatial locality of the
%CA% program.
%CA%
%CA% \item Finally, the final memory compaction strongly depends on the
%CA% program schedule, and is naturally hindered by
%CA% parallelism. Consequently, a trade-off must be found with
%CA% automatic parallelization. An ideal solution would reduce
%CA% memory usage while preserving parallelism.
%CA% \end{itemize}

\subsubsection{Interfaces}
\begin{Large}\begin{verbatim}
-- TO BE COMPLETED BY INSA: state of the art
\end{verbatim}
\end{Large}
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Objectives and innovation aspects}
\hspace{2cm}\begin{scriptsize}\begin{verbatim}
% 2.2. OBJECTIVES AND AMBITIOUS/INNOVATIVE NATURE OF THE PROJECT
% (2 pages maximum)
% Describe the scientific/technical objectives of the project.
% Present the expected scientific advance. Specify the originality and the
% ambitious nature of the project.
% Detail the scientific and technical obstacles to be overcome by the project.
% Where applicable, describe the final product(s) developed at the end of the
% project, showing its innovative nature.
% Present the expected results, proposing if possible success and evaluation
% criteria suited to the type of project, allowing the results to be assessed
% at the end of the project.
% Where applicable (programmes requiring multidisciplinarity), show how the
% scientific disciplines are articulated.
\end{verbatim}
\end{scriptsize}

% scientific/technical objectives of the project.
The objective of the COACH project is to develop a complete framework for
HPC (accelerating existing software applications)
and for embedded applications (implementing an application on a low-power standalone device).
The design steps are presented in Figure~\ref{coach-flow}.
\begin{figure}[hbtp]\centering
\includegraphics[width=.8\linewidth]{flow}
\caption{\label{coach-flow} COACH flow.}
\end{figure}
\begin{description}
\item[HPC setup] Here the user splits the application into two parts: the host application,
which remains on the PC, and the SoC application, which migrates to the SoC.
The framework provides a simulation model to evaluate this partitioning.
\item[SoC design] In this phase,
the user obtains simulators of the SoC at different abstraction levels by giving the COACH framework
a SoC description.
This description consists of a process network corresponding to the SoC application,
an OS, an instance of a generic hardware platform,
and a mapping of the processes onto the platform components. The supported mappings are
software (the process runs on a SoC processor),
XXXpeci (the process runs on a SoC processor enhanced with dedicated instructions),
and hardware (the process runs on a coprocessor generated by HLS and plugged into the SoC bus).
\item[Application compilation] Once the SoC description is validated, COACH automatically generates
an FPGA bitstream containing the hardware platform with the SoC application software, and
an executable containing the host application. The user can launch the application by
loading the bitstream onto the FPGA and running the executable on the PC.
\end{description}
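
The structure of a SoC description can be pictured as a small data model. The C sketch
below is purely hypothetical: the type names, the field layout and the mapping rules are
assumptions for illustration, not the COACH input format. It records the processes and
channels of the process network, the OS, the platform components and the mapping of each
process, and checks that each process is mapped onto a component of a compatible kind and
that every channel connects existing processes. \texttt{MAP\_EXTENDED\_CPU} stands for the
mapping onto a processor enhanced with dedicated instructions.
\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Hypothetical data model of a SoC description (not the COACH format). */
typedef enum { MAP_SOFTWARE, MAP_EXTENDED_CPU, MAP_HARDWARE } MappingKind;
typedef enum { COMP_CPU, COMP_EXTENDED_CPU, COMP_COPROCESSOR } ComponentKind;

typedef struct { const char *name; ComponentKind kind; } Component;
typedef struct { const char *name; MappingKind mapping; const char *target; } Process;
typedef struct { const char *from, *to; } Channel;   /* process-network edge */

typedef struct {
    const char *os;
    const Component *components; int n_components;
    const Process *processes;    int n_processes;
    const Channel *channels;     int n_channels;
} SocDescription;

static const Component *find_component(const SocDescription *d, const char *name) {
    for (int i = 0; i < d->n_components; i++)
        if (strcmp(d->components[i].name, name) == 0) return &d->components[i];
    return NULL;
}

static int has_process(const SocDescription *d, const char *name) {
    for (int i = 0; i < d->n_processes; i++)
        if (strcmp(d->processes[i].name, name) == 0) return 1;
    return 0;
}

/* Assumed rules: software runs on any processor, the extended mapping needs
   a processor with dedicated instructions, hardware needs a coprocessor. */
static int mapping_ok(const SocDescription *d, const Process *p) {
    const Component *c = find_component(d, p->target);
    if (c == NULL) return 0;
    switch (p->mapping) {
    case MAP_SOFTWARE:     return c->kind == COMP_CPU || c->kind == COMP_EXTENDED_CPU;
    case MAP_EXTENDED_CPU: return c->kind == COMP_EXTENDED_CPU;
    case MAP_HARDWARE:     return c->kind == COMP_COPROCESSOR;
    }
    return 0;
}

/* Returns the number of errors found in the description. */
static int check_description(const SocDescription *d) {
    int errors = 0;
    for (int i = 0; i < d->n_processes; i++)
        if (!mapping_ok(d, &d->processes[i])) {
            printf("bad mapping for process %s\n", d->processes[i].name);
            errors++;
        }
    for (int i = 0; i < d->n_channels; i++)
        if (!has_process(d, d->channels[i].from) || !has_process(d, d->channels[i].to)) {
            printf("dangling channel %s -> %s\n", d->channels[i].from, d->channels[i].to);
            errors++;
        }
    return errors;
}

/* Example instance with hypothetical names. */
static const Component platform[] = {
    { "cpu0", COMP_CPU }, { "cpu1", COMP_EXTENDED_CPU }, { "coproc0", COMP_COPROCESSOR },
};
static const Process processes[] = {
    { "src", MAP_SOFTWARE, "cpu0" },
    { "filter", MAP_HARDWARE, "coproc0" },
    { "sink", MAP_EXTENDED_CPU, "cpu1" },
};
static const Channel channels[] = { { "src", "filter" }, { "filter", "sink" } };

static const SocDescription example = {
    "hypothetical-os", platform, 3, processes, 3, channels, 2,
};

int main(void) {
    int errors = check_description(&example);
    printf("OS %s, description %s: %d error(s)\n",
           example.os, errors ? "invalid" : "valid", errors);
    return errors != 0;
}
\end{verbatim}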

% expected scientific advance. Specify the originality and the
% ambitious nature of the project.
The main scientific contribution of the project is to unify various synthesis techniques
(same input and output formats), allowing the user to switch from one to another without
engineering effort, and even to chain them, for example to run polyhedral transformations
before synthesis.
Another advantage of this framework is that it provides different abstraction levels from
a single description.
Finally, this description is independent of the device family, and its hardware implementation
is automatically generated.

% Detail the scientific and technical obstacles to be overcome by the project.
System design is a very complicated task, and in this project we try to simplify it
as much as possible. For this purpose, we have to overcome the following scientific
and technological barriers.
\begin{itemize}
\item The main problem in HPC is the communication between the PC and the SoC.
This problem has two aspects. The first one is efficiency. The second is to
eliminate the engineering effort needed to implement it at different abstraction levels.
\item The COACH design flow follows a top-down approach. In such a case,
the required performance of a coprocessor (clock frequency, maximum number of cycles for
a given computation, power consumption, etc.) is imposed by the other system
components. The challenge is to let the user accurately control the synthesis
process. For instance, the clock frequency must not be a result of RTL synthesis
but a strict synthesis constraint.
\item HLS tools are sensitive to the style in which the algorithm is written.
In addition, they are not integrated into an architecture and system
exploration tool.
Consequently, engineering work is required to switch from one tool to another,
to integrate the resulting simulation model into an architectural exploration tool,
and to synthesize the generated RTL description.
%CA Additional preprocessing, source-level transformations, are thus
%CA required to improve the process.
%CA Particularly, this includes parallelism exposure and efficient memory mapping.
\item Most HLS tools translate a sequential algorithm into a coprocessor
containing a single data-path and finite state machine (FSM). In this way,
only fine-grained parallelism (instruction-level parallelism, ILP) is exploited.
The challenge is to identify coarse-grained parallelism and to generate,
from a sequential algorithm, a coprocessor containing multiple communicating
tasks (data-paths and FSMs).
\end{itemize}
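
The second barrier can be illustrated with operation chaining, a classical HLS scheduling
step. When the clock frequency is an input constraint rather than an outcome of RTL
synthesis, the scheduler may place dependent operations in the same cycle only while their
accumulated combinational delay fits in the clock period. The C sketch below does this
greedily for a single chain of dependent operations; the operator delays are hypothetical,
and a real scheduler works on a full dependence graph with resource constraints.
\begin{verbatim}
#include <stdio.h>

/* Greedy operation chaining for a single chain of dependent operations.
   delay_ps[i] is the (assumed) combinational delay of operation i in
   picoseconds; period_ps is the clock period derived from the target
   frequency. Operation i is placed in cycle_of[i]. Returns the number of
   cycles (FSM states) used, or -1 if one operation alone exceeds the period
   (it would then need a multi-cycle or pipelined operator). */
static int chain_schedule(const int *delay_ps, int n, int period_ps, int *cycle_of) {
    int cycle = 0, used = 0;   /* current cycle and delay already chained in it */
    for (int i = 0; i < n; i++) {
        if (delay_ps[i] > period_ps) return -1;
        if (used + delay_ps[i] > period_ps) {   /* does not fit: open a new cycle */
            cycle++;
            used = 0;
        }
        used += delay_ps[i];
        cycle_of[i] = cycle;
    }
    return n > 0 ? cycle + 1 : 0;
}

int main(void) {
    /* Hypothetical chain mul -> add -> add -> shift -> mul. */
    const int delay_ps[] = { 4000, 2000, 2000, 1000, 4000 };
    const int freqs_mhz[] = { 100, 200, 250, 400 };
    int cycle_of[5];

    for (int k = 0; k < 4; k++) {
        int period_ps = 1000000 / freqs_mhz[k];
        int cycles = chain_schedule(delay_ps, 5, period_ps, cycle_of);
        if (cycles < 0)
            printf("%d MHz: infeasible without multi-cycle operators\n", freqs_mhz[k]);
        else
            printf("%d MHz: %d cycles\n", freqs_mhz[k], cycles);
    }
    return 0;
}
\end{verbatim}
With the assumed delays, raising the target from 100~MHz to 250~MHz grows the schedule from
2 to 4 cycles, and at 400~MHz the multiplier no longer fits in one cycle: the frequency
constraint shapes the generated FSM instead of being discovered after RTL synthesis.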
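
The last barrier can be illustrated on a toy loop whose body is the composition of two
stages, \texttt{y[i] = g(f(x[i]))}. A classical HLS tool would produce one data-path and one
FSM executing both stages for each iteration in turn. The C sketch below models instead two
communicating tasks, one per stage, connected by a bounded FIFO in the spirit of a process
network; each call to a \texttt{*\_step} function stands for one step of that task's FSM.
The stage functions, the FIFO depth and this one-step-per-call timing model are assumptions
for illustration only, not the output of any HLS tool.
\begin{verbatim}
#include <stdio.h>

#define N 8
#define FIFO_DEPTH 2

/* Bounded FIFO channel between the two tasks. */
typedef struct { int buf[FIFO_DEPTH]; int head, count; } Fifo;

static int fifo_push(Fifo *q, int v) {
    if (q->count == FIFO_DEPTH) return 0;            /* full: writer must wait */
    q->buf[(q->head + q->count) % FIFO_DEPTH] = v;
    q->count++;
    return 1;
}

static int fifo_pop(Fifo *q, int *v) {
    if (q->count == 0) return 0;                     /* empty: reader must wait */
    *v = q->buf[q->head];
    q->head = (q->head + 1) % FIFO_DEPTH;
    q->count--;
    return 1;
}

/* The two stages of the loop body (arbitrary example functions). */
static int f(int x) { return 3 * x + 1; }
static int g(int t) { return t * t; }

/* Reference sequential version: one data-path, both stages per iteration. */
static void sequential(const int *x, int *y, int n) {
    for (int i = 0; i < n; i++) y[i] = g(f(x[i]));
}

/* Task-level version: each task performs at most one iteration per step. */
static void producer_step(int *i, const int *x, int n, Fifo *q) {
    if (*i < n && q->count < FIFO_DEPTH) {
        fifo_push(q, f(x[*i]));
        (*i)++;
    }
}

static void consumer_step(int *j, int *y, int n, Fifo *q) {
    int t;
    if (*j < n && fifo_pop(q, &t)) {
        y[*j] = g(t);
        (*j)++;
    }
}

/* Runs both tasks until the consumer has produced n results and returns the
   number of steps. The consumer runs first in each step, so it only sees
   values pushed in earlier steps, as with a registered FIFO. */
static int task_level(const int *x, int *y, int n) {
    Fifo q = { {0}, 0, 0 };
    int i = 0, j = 0, steps = 0;
    while (j < n) {
        consumer_step(&j, y, n, &q);
        producer_step(&i, x, n, &q);
        steps++;
    }
    return steps;
}

int main(void) {
    int x[N], y_seq[N], y_task[N];
    for (int k = 0; k < N; k++) x[k] = k;
    sequential(x, y_seq, N);
    int steps = task_level(x, y_task, N);
    for (int k = 0; k < N; k++)
        if (y_seq[k] != y_task[k]) { printf("mismatch at %d\n", k); return 1; }
    printf("same results, %d steps for %d iterations\n", steps, N);
    return 0;
}
\end{verbatim}
Both versions compute the same values, but because the two tasks are active in the same
step, the task-level version needs $N+1 = 9$ steps for $N = 8$ iterations, whereas running
the two stages one after the other would need $2N = 16$ steps under the same timing model.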

%Present the expected results, proposing if possible success and evaluation
%criteria suited to the type of project, allowing the results to be assessed
%at the end of the project.
The main result is the framework itself. Concretely, it is composed of:
\begin{itemize}
\item two HPC communication schemes with their implementation,
\item five HLS tools (control-dominated HLS, data-dominated HLS, coarse-grained HLS,
memory-optimisation HLS and ASIP),
\item three SystemC-based virtual prototyping environments extended with synthesizable
RTL IP cores (generic, ALTERA/NIOS/AVALON, XILINX/MICROBLAZE/OPB),
\item one design space exploration tool,
\item one operating system (OS).
\end{itemize}
The framework functionality will be demonstrated with XXX-EXAMPLE1, XXX-EXAMPLE2
and XXX-EXAMPLE3 on four architectures (generic/XILINX, generic/ALTERA,
proprietary/XILINX, proprietary/ALTERA).

%% \section{}
%% %3. SCIENTIFIC AND TECHNICAL PROGRAMME, PROJECT ORGANISATION
%% \subsection{}
%% %3.1. SCIENTIFIC PROGRAMME AND PROJECT STRUCTURE
%% %(2 pages maximum)
%% %Present the scientific programme and justify the breakdown of the work
%% %programme into tasks, consistently with the stated objectives.
%% %Use a diagram to present the links between the different tasks
%% %(technical organisation chart).
%% %The tasks represent the major phases of the project. They are limited in number.
%% %Do not forget the activities and actions related to dissemination and
%% %exploitation.
%%
%% %PUT A FIGURE HERE DESCRIBING THE TASKS AND THEIR INTERACTIONS (WITH THE FLOW
%% %AS A WATERMARK?)
%% \subsection{}
%% %3.2. PROJECT MANAGEMENT
%% %(2 pages maximum)
%% %Specify the organisational aspects of the project and the coordination arrangements
%% %(if possible, define a separate coordination task: see task 0 of submission
%% %document A).
%% \subsection{}
%% %3.3. DESCRIPTION OF THE WORK BY TASK
%% %(ideally 1 or 2 pages per task)
%% %For each task, describe:
%% %- the objectives of the task and possible success indicators,
%% %- the task leader and the partners involved (possibly shown
%% %graphically),
%% %- the detailed work programme for the task,
%% %- the deliverables of the task,
%% %- the partners' contributions ("who does what"),
%% %- the description of the methods and technical choices, and of how
%% %the solutions will be delivered,
%% %- the risks of the task and the fallback solutions envisaged.