% vim:set spell:
% vim:spell spelllang=en:
\anrdoc{\begin{itemize}
\item Present the national and international state of the art, summarizing the current
knowledge on the subject.
\item Highlight any contributions of the project partners
to this state of the art.
\item Highlight any preliminary results.
\item Include the necessary bibliographic references in Appendix 7.1.
\end{itemize}}

%Our project covers several critical domains in system design in order
%to achieve high performance computing. Starting from a high level description we aim
%at generating automatically both hardware and software components of the system.

\subsubsection{High Performance Computing}
\label{soa:hpc}
% A market swallowed up by GPGPU architectures such as Nvidia's FERMI; CUDA programming language
The High-Performance Computing (HPC) world is composed of three main families of architectures:
many-core, GPGPU (General-Purpose computation on Graphics Processing Units) and FPGA.
Today, the first two families dominate the market, benefiting from
the strength and influence of mass-market leaders (Intel, Nvidia).
%such as Intel for many-core CPU and Nvidia for GPGPU.
In this market, FPGA architectures are emerging and very promising.
By adapting the architecture to the software, % (the opposite is done in the other families)
FPGA architectures enable better performance
(typically an acceleration factor between 10 and 100)
while occupying less space and consuming less energy (and generating less heat).
However, using FPGAs presents significant challenges~\cite{hpc06a}.
First, the operating frequency of an FPGA is low compared to that of a high-end microprocessor.
Second, % based on Amdahl's law,
HPC/FPGA application performance is unusually sensitive
to the implementation quality~\cite{hpc06b}.
% Thus, the performance strongly relies on the detected parallelism.
% (to summarize the last 2 points)
Finally, efficient design methodologies are required in order to
hide FPGA complexity and the underlying implementation subtleties from HPC users,
so that they do not have to change their habits and can reach a design productivity
equivalent to that of the other families~\cite{hpc07a}.

% FPGA state of the art
HPC/FPGA hardware is only now emerging and is in its early commercial stages,
but the design techniques mentioned above have not yet caught up.
Industrial (Mitrionics~\cite{hpc08}, Gidel~\cite{hpc09}, Convey Computer~\cite{hpc10}) and academic (CHREC)
research on HPC/FPGA is mainly conducted in the USA.
None of the approaches developed in this research fully meets the
challenges described above. For example, Convey Computer proposes application-specific instruction
set extensions of x86 cores in an FPGA accelerator,
but the generation of these extensions is not automated and requires hardware design skills.
Mitrionics has an elegant solution based on a compute engine specifically
developed for high-performance execution in FPGAs. Unfortunately, the design flow
is based on a new programming language (mitrionC), implying significant designer effort and poor portability.
% tool relying on operator libraries (XtremeData),
% Should we mention the OpenFPGA consortium, whose goal is: "to accelerate the incorporation of reconfigurable computing technology in high-performance and enterprise applications"?

Thus, much effort is required to develop design tools that translate high-level
language programs to FPGA configurations.

\subsubsection{System Synthesis}
\label{soa:system:synthesis}
Today, several solutions for system design are proposed and commercialized.
The existing commercial or free tools do not
cover the whole system synthesis process in a fully automatic way. Moreover,
they are bound to a particular device family and to an IP library.
The most commonly used are provided by \altera and \xilinx to promote their
FPGA devices. These representative tools, used to synthesize SoCs on FPGAs,
are introduced below.
\\
The \xilinx System Generator for DSP~\cite{system-generateur-for-dsp} is a
plug-in to Simulink that enables designers to develop high-performance DSP
systems for \xilinx FPGAs.
Designers can specify and simulate a system using MATLAB and Simulink. The
tool then automatically generates synthesizable Hardware Description
Language (HDL) code mapped to \xilinx pre-optimized algorithms.
However, this tool targets only signal processing algorithms and \xilinx FPGAs, and
cannot handle a complete SoC. Thus, it is not really a system synthesis tool.
\\
In contrast, SOPC Builder~\cite{spoc-builder} from \altera and \xilinx
Platform Studio (XPS) from \xilinx allow designers to describe a system, synthesize it,
program it into a target FPGA and upload a software application.
Both SOPC Builder and XPS allow designers to select and parameterize components from
an extensive drop-down list of IP cores (I/O core, DSP, processor, bus core, ...)
as well as to incorporate their own IP. Nevertheless, none of the previously introduced tools
provides any facility to synthesize coprocessors or to simulate the platform
at a high level (SystemC).
A system designer must provide the synthesizable description of his or her own IP cores
and interface them to the SoC bus.
Design Space Exploration is thus limited and SystemC simulation is not possible,
either at transactional or at cycle-accurate level.
\\
In addition, \xilinx System Generator, XPS and SOPC Builder are closed worlds,
since each one imposes its own IPs, which are not interchangeable.
Designers can then only generate a synthesized netlist, a VHDL/Verilog simulation test
bench and a custom software library that reflect the hardware configuration.
\\
Consequently, a designer developing an embedded system needs to master four
design environments:
\begin{enumerate}
\item a virtual prototyping environment (in SystemC) for system level exploration,
\item an architecture compiler to define the hardware architecture (Verilog/VHDL),
\item one or several third-party HLS tools for coprocessor synthesis (C to RTL),
\item and finally back-end synthesis tools for the bitstream generation (RTL to bitstream).
\end{enumerate}
Furthermore, mixing these tools requires a significant interfacing effort, which makes
the design process very complex and achievable only by designers skilled in many domains.

\subsubsection{High Level Synthesis}
\label{soa:hls}
High Level Synthesis translates a sequential algorithmic description and a
set of constraints (area, power, frequency, ...) to a micro-architecture at
Register Transfer Level (RTL).
Several academic and commercial tools are available today. The most common
tools are SPARK~\cite{spark04}, GAUT~\cite{gaut08}, UGH~\cite{ugh08} in the
academic world and CATAPULTC~\cite{catapult-c}, PICO~\cite{pico} and
CYNTHETIZER~\cite{cynthetizer} in the commercial world. Despite their
maturity, their usage is restricted by the following limitations~\cite{IEEEDT,CATRENE,HLSBOOK}:
\begin{itemize}
\item HLS tools are not integrated into an architecture and system exploration tool.
Thus, a designer who needs to accelerate a software part of the system must adapt it manually
to the HLS input dialect and perform engineering work to exploit the synthesis result
at the system level,
\item Current HLS tools cannot target both control-oriented and data-oriented applications,
\item HLS tools mainly take into account a single constraint, while realistic designs
are multi-constrained.
The power consumption constraint, which is mandatory for embedded systems, is handled
poorly or not at all by the available HLS tools,
\item The parallelism is limited to that present in the initial specification.
To get more parallelism or to reduce the amount of memory required in the SoC, the user
must rewrite the algorithmic specification, although techniques such as polyhedral
transformations can automate this process,
\item While they support limited loop transformations like loop unrolling and loop
pipelining, current HLS tools do not provide support for design space exploration, either
through automatic loop transformations or through improved memory mapping,
\item Despite having the same input language (C/C++), they are sensitive to the style in
which the algorithm is written. Consequently, engineering work is required to switch from
one tool to another,
\item They do not accurately respect the frequency constraint when they target an FPGA device.
Their error is about 10 percent. This is problematic when the generated component is integrated
in a SoC, since it will slow down the whole system.
\end{itemize}
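To make the last limitation concrete, here is a rough estimate under two assumptions
that are ours and are not taken from the cited studies: the 10 percent error is a shortfall
with respect to the requested frequency, and the SoC uses a single clock domain.
If the requested frequency is $f_{\mathrm{target}}$ and the generated component only reaches
$f_{\mathrm{comp}} = 0.9\, f_{\mathrm{target}}$, the whole SoC has to be clocked at $f_{\mathrm{comp}}$.
For a workload of $N$ cycles, the execution time then becomes
\[
T = \frac{N}{0.9\, f_{\mathrm{target}}} \approx 1.11\, \frac{N}{f_{\mathrm{target}}},
\]
that is, about 11 percent longer for every part of the system, including the parts that
could have met the original constraint.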
In view of these limitations, it is necessary to create a new generation of tools that reduces the gap
between the specification of a heterogeneous system and its hardware implementation~\cite{HLSBOOK,IEEEDT}.

\subsubsection{Application Specific Instruction Processors}
\label{soa:asip}
ASIPs (Application-Specific Instruction-Set Processors) are programmable
processors in which both the instruction set and the micro-architecture have
been tailored to a given application domain or to a
specific application. This specialization usually offers a good compromise
between performance (w.r.t.\ a pure software implementation on an embedded
CPU) and flexibility (w.r.t.\ an application-specific hardware co-processor).
In spite of their obvious advantages, using/designing ASIPs remains a
difficult task, since it involves designing both a micro-architecture and a
compiler for this architecture. Besides, to our knowledge, there is still
no available open-source design flow for ASIP design, even though such a tool
would be valuable in the
context of a System Level design exploration tool.
\\
In this context, ASIP design based on Instruction Set Extensions (ISEs) has
received a lot of interest~\cite{NIOS2}, as it makes micro-architecture synthesis
more tractable\footnote{ISEs rely on a template micro-architecture in which
only a small fraction of the architecture has to be specialized.} and helps ASIP
designers to focus on compilers, for which there are still many open
problems~\cite{ARC08}.
This approach, however, has a severe weakness: it also significantly reduces
opportunities for achieving good speedups (most speedups remain between 1.5x and
2.5x), because ISE performance is generally limited by I/O constraints, as
ISEs generally rely on the main CPU register file to access data.
\\
To cope with this issue, recent approaches~\cite{DAC09,CODES08,TVLSI06} advocate the use of
micro-architectural ISE models in which the coupling between the processor micro-architecture
and the ISE component is tightened so as to allow the ISE to overcome the register
I/O limitations. However, these approaches generally tackle the problem from a compiler/simulation
point of view and do not address the problem of generating synthesizable representations for
these models.
\\
We therefore strongly believe that there is a need for an open framework which
would allow researchers and system designers to:
\begin{itemize}
\item Explore the various levels of interaction between the original CPU micro-architecture
and its extension (for example through a Domain Specific Language targeted at micro-architecture
specification and synthesis).
\item Retarget the compiler instruction-selection pass
(or prototype new passes) so as to be able to take advantage of these ISEs.
\item Provide a complete system-level integration for using ASIPs as SoC building blocks
(integration with application-specific blocks, MPSoC, etc.).
\end{itemize}

\subsubsection{Automatic Parallelization}
\label{soa:automatic:parallelization}
The problem of compiling sequential programs for parallel computers
has been studied since the advent of the first parallel architectures
in the 1970s. The basic approach consists in applying program transformations
which expose or increase the potential parallelism, while guaranteeing
the preservation of the program semantics. Most of these transformations
just reorder the operations of the program; some of them modify its
data structures. Dependences (exact or conservative) are checked to guarantee
the legality of the transformation.
\\
This has led to the invention of many loop transformations (loop fusion,
loop splitting, loop skewing, loop interchange, loop unrolling, ...)
which interact in a complicated way. More recently, it has been noticed
that all of these are just changes of basis in the iteration domain of
the program. This has led to the introduction of the polyhedral model
\cite{FP:96,DRV:2000}, in which the combination of two transformations is
simply a matrix product.
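As a small illustration of this composition (the loop nest and the matrices below are chosen
for this example and are not taken from the cited works), consider a two-dimensional loop nest
with iteration vector $(i,j)^T$. Loop interchange and loop skewing correspond to the matrices
\[
T_{\mathrm{int}} = \left(\begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array}\right),
\qquad
T_{\mathrm{skew}} = \left(\begin{array}{cc} 1 & 0 \\ 1 & 1 \end{array}\right),
\]
and applying the interchange followed by the skewing amounts to the single transformation
\[
T_{\mathrm{skew}}\, T_{\mathrm{int}} = \left(\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\right),
\qquad
\left(\begin{array}{c} i \\ j \end{array}\right) \mapsto \left(\begin{array}{c} j \\ i+j \end{array}\right).
\]
Such a transformation is legal when every dependence distance vector remains
lexicographically positive after multiplication by the matrix.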
\\
Since hardware is inherently parallel, finding parallelism in sequential
programs is an important prerequisite for HLS. The large FPGA chips of
today can accommodate much more parallelism than is available in basic blocks.
The polyhedral model is the ideal tool for finding more parallelism in
loops.
\\
As a side effect, it has been observed that the polyhedral model is a useful
tool for many other optimizations, such as memory reduction and locality
improvement. It should be noted
that the polyhedral model \emph{stricto sensu} applies only to
very regular programs. Its extension to more general programs is
an active research subject.

\subsubsection{SoC design flow automation using IP-XACT}
\label{soa:ip-xact}
% EV: Industrial IP integration flows based on IP-XACT standards: \cite{mds1}\\
% EV: SPIRIT IP-XACT Controlled ESL Design Tool Applied to a Network-on-Chip Platform: \cite{mds2}\\
% EV: SocKET design flow and Application on industrial use cases: \cite{socketflow}\\
% IA: http://www.design-reuse.com/articles/19895/ip-xact-xml.html \cite{dandr}\\
IP-XACT is an XML-based open standard defined by the Accellera consortium.
This non-profit organisation provides a unified set of high-quality IP-XACT
specifications for documenting IP using meta-data. This meta-data is
used for configuring, integrating, and verifying IP in advanced SoC design,
and for interfacing tools through the TGI (Tight Generator Interface), a software API
that can be used to access design meta-data descriptions of complete system designs.
The specification for the schema is tailored to the requirements of the industry,
and focused on enabling technologies for the efficient design of electronic
systems from concept to production. The latest IEEE 1685 release of IP-XACT incorporates
both RTL and TLM (transaction level modelling) capabilities. Thus it can be used to
package IP portfolios~\cite{dandr} and describe their assembly in complex hardware architectures~\cite{mds1,mds2}.
These description files are the basis for tool interoperability and data exchange
through common structured data management~\cite{socketflow}. Today more than two hundred companies
are members of the consortium and the board includes major players
(STM, NXP, TI, ARM, FREESCALE, LSI, Mentor, Synopsys and Cadence), ensuring
wide adoption by industry. Initiatives have already
attempted to extend this standard
to the packaging of AMS IPs (MEDEA+ Beyond Dreams project) and to Hardware Dependent
Software layers (MEDEA+ SoftSoc project), and Accellera is reusing these results for
further releases.
\parlf
In IP-XACT, flow automation and data consistency are ensured by generators, which
are program modules that process IP-XACT XML data into something useful
for the design. They are a key portable mechanism for encapsulating specialist design
knowledge and enable designers to deploy this knowledge in their designs. It is
always possible to create generators in order to link several design or analysis tools
around a central IP-XACT representation of the meta-data. This kind of XML schema for
meta-data management is a good solution for the federation of heterogeneous design domains
(models, tools, languages, methodologies, etc.).