% The scientific/technical objectives of the project.
The objective of the COACH project is to develop a complete framework for both HPC
applications (providing acceleration solutions for existing software applications) and embedded
applications (implementing an application on a low-power standalone
device). The design steps are presented in Figure~\ref{coach-flow}.
\begin{figure}[hbtp]\leavevmode\centering
\includegraphics[width=.8\linewidth]{flow}
\caption{\label{coach-flow} COACH flow}
\end{figure}
\begin{description}
\item[HPC setup] The user splits the application into two parts: the host application,
which remains on the PC, and the SoC application, which migrates to the SoC.
The framework provides a simulation model to evaluate this partitioning.
\item[SoC design] In this phase, the user obtains simulators of the SoC at different
abstraction levels by giving a SoC description to the COACH framework.
This description consists of a process network corresponding to the SoC application,
an OS, an instance of a generic hardware platform,
and a mapping of the processes onto the platform components. The supported mappings are
software (the process runs on a SoC processor),
XXXpeci (the process runs on a SoC processor enhanced with dedicated instructions),
and hardware (the process runs in a coprocessor generated by HLS and connected to the SoC bus).
\item[Application compilation] Once the SoC description is validated, COACH automatically
generates an FPGA bitstream containing the hardware platform together with the SoC application
software, and an executable containing the host application. The user launches the application by
loading the bitstream onto the FPGA and running the executable on the PC.
\end{description}
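To make the structure of a SoC description more concrete, the following Python sketch models one as plain data: a process network (processes and producer/consumer channels), an OS, a platform instance, and a mapping of each process onto one of the three supported kinds. It is only an illustration under our own assumptions: the class and field names (\texttt{SocDescription}, \texttt{Mapping}, \texttt{Process}, \texttt{target}) are hypothetical and do not correspond to the actual COACH input format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Mapping(Enum):
    """The three mapping kinds listed in the SoC design step."""
    SOFTWARE = "software"  # process runs on a SoC processor
    XXXPECI = "XXXpeci"    # processor enhanced with dedicated instructions
    HARDWARE = "hardware"  # HLS-generated coprocessor on the SoC bus


@dataclass
class Process:
    name: str
    mapping: Mapping
    target: str  # hypothetical: platform component the process is mapped onto


@dataclass
class SocDescription:
    os: str        # hypothetical OS identifier
    platform: str  # hypothetical platform-instance identifier
    processes: list[Process] = field(default_factory=list)
    channels: list[tuple[str, str]] = field(default_factory=list)  # (producer, consumer)

    def validate(self) -> None:
        """Reject channels that refer to processes missing from the network."""
        names = {p.name for p in self.processes}
        for src, dst in self.channels:
            if src not in names or dst not in names:
                raise ValueError(f"channel {src}->{dst} references an unknown process")

    def hardware_processes(self) -> list[str]:
        """Processes that would need a coprocessor generated by HLS."""
        return [p.name for p in self.processes if p.mapping is Mapping.HARDWARE]


desc = SocDescription(
    os="hypothetical-os",
    platform="generic-platform",
    processes=[
        Process("producer", Mapping.SOFTWARE, "cpu0"),
        Process("filter", Mapping.HARDWARE, "copro0"),
        Process("consumer", Mapping.XXXPECI, "cpu1"),
    ],
    channels=[("producer", "filter"), ("filter", "consumer")],
)
desc.validate()
print(desc.hardware_processes())  # ['filter']
```

A real description would also carry the platform parameters and the OS configuration; the sketch only shows how the mapping kind decides which processes must go through HLS.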

% The expected scientific advance. Specify the originality and the ambitious
% nature of the project.
The main scientific contribution of the project is to unify various synthesis techniques
(with the same input and output formats), allowing the user to switch from one to another
without engineering effort and even to chain them, for example, to run a polyhedral
transformation before synthesis.
Another advantage of this framework is that it provides different abstraction levels from
a single description.
Finally, this description is independent of the device family, and its hardware
implementation is generated automatically.
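As an illustration of chaining a source-level transformation before synthesis, the sketch below applies loop tiling, a transformation commonly expressed in the polyhedral model, to a matrix product and checks that the result is unchanged. The tile size and the function names are assumptions chosen for the example; this is not the polyhedral tool used in COACH.

```python
def matmul(a, b, n):
    """Reference loop nest, as it might be written before transformation."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile=4):
    """Same computation after tiling the i, j and k loops by `tile`.

    Tiling only reorders iterations: each c[i][j] still accumulates the
    same products a[i][k] * b[k][j], in increasing k order.
    """
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # min() bounds handle the partial tiles at the edges
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        for k in range(kk, min(kk + tile, n)):
                            c[i][j] += a[i][k] * b[k][j]
    return c


n = 10
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i * j % 7 for j in range(n)] for i in range(n)]
assert matmul_tiled(a, b, n) == matmul(a, b, n)
```

Because both versions compute the same values, either can be handed to a synthesis tool; the tiled form works on smaller blocks of data, which can help when mapping arrays to local memories.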

% Detail the scientific and technical barriers to be overcome by the project.
System design is a very complex task, and this project aims to simplify it
as much as possible. To this end, we have to overcome the following scientific
and technological barriers.
\begin{itemize}
\item The main problem in HPC is the communication between the PC and the SoC.
This problem has two aspects: the first is efficiency; the second is
eliminating the engineering effort required to implement it at different abstraction levels.
\item The COACH design flow follows a top-down approach. In such a case,
the required performance of a coprocessor (clock frequency, maximum number of cycles for
a given computation, power consumption, etc.) is imposed by the other system
components. The challenge is to allow the user to control the synthesis
process accurately. For instance, the clock frequency must not be a result of RTL synthesis
but a strict synthesis constraint.
\item HLS tools are sensitive to the style in which the algorithm is written.
In addition, they are not integrated into an architecture and system
exploration tool.
Consequently, engineering work is required to switch from one tool to another,
to integrate the resulting simulation model into an architectural exploration tool,
and to synthesize the generated RTL description.
%CA Additional preprocessing (source-level transformations) is thus
%CA required to improve the process.
%CA In particular, this includes parallelism exposure and efficient memory mapping.
\item Most HLS tools translate a sequential algorithm into a coprocessor
containing a single data-path and finite state machine (FSM). In this way,
only fine-grained parallelism (instruction-level parallelism, ILP) is exploited.
The challenge is to identify coarse-grained parallelism and to generate,
from a sequential algorithm, a coprocessor containing multiple communicating
tasks (data-paths and FSMs).
\end{itemize}
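The last barrier can be illustrated in software. The sketch below rewrites a sequential loop that applies two stages to each sample as two tasks connected by a bounded FIFO, the kind of decomposition a coarse-grained HLS tool would have to find automatically before giving each task its own data-path and FSM. Threads and \texttt{queue.Queue} stand in for hardware tasks and FIFO channels, and \texttt{stage1} and \texttt{stage2} are arbitrary placeholder computations, not part of COACH.

```python
import queue
import threading


# Placeholder stages of the algorithm (arbitrary arithmetic, for illustration).
def stage1(x):
    return 3 * x + 1


def stage2(y):
    return y * y % 97


def sequential(samples):
    """Original sequential algorithm: both stages in one loop body."""
    return [stage2(stage1(x)) for x in samples]


_END = object()  # end-of-stream token sent through the FIFO


def pipelined(samples, depth=4):
    """Same algorithm as two communicating tasks linked by a bounded FIFO."""
    fifo = queue.Queue(maxsize=depth)  # blocking put/get, like a hardware FIFO
    out = []

    def task1():
        for x in samples:
            fifo.put(stage1(x))  # blocks while the FIFO is full
        fifo.put(_END)

    def task2():
        while True:
            y = fifo.get()       # blocks while the FIFO is empty
            if y is _END:
                break
            out.append(stage2(y))

    threads = [threading.Thread(target=task1), threading.Thread(target=task2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


data = list(range(20))
assert pipelined(data) == sequential(data)
```

In hardware the two tasks would run concurrently; in CPython the threads only model the structure and the blocking FIFO semantics, not the speed-up. Since the FIFO preserves order and has a single consumer, the pipelined version produces exactly the sequential results.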

% Present the expected results, proposing if possible success and evaluation
% criteria suited to the type of project, so that the results can be assessed
% at the end of the project.
The main result is the framework itself. Concretely, it is composed of:
\begin{itemize}
\item two HPC communication schemes with their implementations,
\item five HLS tools (control-dominated HLS, data-dominated HLS, coarse-grained HLS,
memory-optimization HLS, and ASIP),
\item three SystemC-based virtual prototyping environments extended with synthesizable
RTL IP cores (generic, ALTERA/NIOS/AVALON, XILINX/MICROBLAZE/OPB),
\item one design space exploration tool,
\item one operating system (OS).
\end{itemize}
The framework functionality will be demonstrated with XXX-EXAMPLE1, XXX-EXAMPLE2,
and XXX-EXAMPLE3 on four architectures (generic/XILINX, generic/ALTERA,
proprietary/XILINX, proprietary/ALTERA).
---|
| 86 | |
---|