| 1 | % The scientific/technical objectives of the project. |
|---|
| 2 | The objective of the COACH project is to develop a complete framework for HPC |
|---|
| 3 | (acceleration solutions for existing software applications) and embedded |
|---|
| 4 | applications (implementing an application on a low-power standalone |
|---|
| 5 | device). The design steps are presented in Figure~\ref{coach-flow}. |
|---|
| 6 | \begin{figure}[hbtp]\leavevmode\center |
|---|
| 7 | \includegraphics[width=.8\linewidth]{flow} |
|---|
| 8 | \caption{\label{coach-flow} COACH design flow} |
|---|
| 9 | \end{figure} |
|---|
| 10 | \begin{description} |
|---|
| 11 | \item[HPC setup:] During this step, the user splits the application into two parts: the host application, |
|---|
| 12 | which remains on a PC, and the SoC application, which is mapped onto the FPGA. |
|---|
| 13 | COACH will automatically translate high-level language programs into FPGA configurations. |
|---|
| 14 | In addition, it will provide a SystemC simulation model of the whole system (PC+communication+FPGA-SoC), |
|---|
| 15 | which will allow the performance of the partitioning to be evaluated. |
|---|
| 16 | \item[SoC design:] In this phase, |
|---|
| 17 | the user gives the COACH framework a SoC description and obtains simulators for the SoC at different abstraction levels. |
|---|
| 18 | This description will consist of a process network corresponding to the application, |
|---|
| 19 | an OS, an instance of a generic hardware platform |
|---|
| 20 | and a mapping of processes onto the platform components. COACH will offer different targets onto which processes can be mapped: |
|---|
| 21 | software (the process runs on a SoC processor), |
|---|
| 22 | ASIP (the process runs on a SoC processor enhanced with dedicated instructions), |
|---|
| 23 | and hardware (the process runs in a coprocessor that is generated by HLS and plugged into the SoC bus). |
|---|
| 24 | \item[Application compilation:] Once the SoC description is validated through performance analysis, COACH will automatically generate |
|---|
| 25 | an FPGA bitstream containing the hardware platform with the SoC application software and |
|---|
| 26 | an executable containing the host application. The user will be able to launch the application by |
|---|
| 27 | loading the bitstream onto an FPGA and running the executable on the PC. |
|---|
| 28 | \end{description} |
|---|
| 29 | |
|---|
| 30 | % The expected scientific advance. Specify the originality and the ambitious |
|---|
| 31 | % nature of the project. |
|---|
| 32 | %FIXME == {NO, this is not a scientific contribution. To be rewritten} |
|---|
| 33 | |
|---|
| 34 | %The main scientific contribution of the project is to unify various synthesis techniques |
|---|
| 35 | %(same input and output formats) allowing the user to swap without engineering effort |
|---|
| 36 | %from one to another and even to chain them. For instance, it will be possible to run loop transformations before synthesis. |
|---|
| 37 | %Another advantage of this framework is to provide different abstraction levels from |
|---|
| 38 | %a single description. |
|---|
| 39 | %Finally, this description is device family independent and its hardware implementation |
|---|
| 40 | %is automatically generated. |
|---|
| 41 | |
|---|
| 42 | % Detail the scientific and technical barriers to be lifted by carrying out the project. |
|---|
| 43 | System design is a very complex task, and this project aims to simplify it |
|---|
| 44 | as much as possible. To this end, the following scientific and technological barriers |
|---|
| 45 | have to be addressed. |
|---|
| 46 | |
|---|
| 47 | \begin{description} |
|---|
| 48 | \item[Design Space Exploration:] |
|---|
| 49 | The COACH environment will make it easy to map an application described with a process |
|---|
| 50 | network Model of Computation (MoC) onto a shared-memory MPSoC architecture. COACH will |
|---|
| 51 | support design space exploration by letting the system designer select and |
|---|
| 52 | parameterize the target architecture, and define the best hardware/software |
|---|
| 53 | partitioning of the application. |
|---|
| 54 | \item[Hardware Accelerators Synthesis (HAS):] |
|---|
| 55 | COACH will allow the automatic generation of hardware accelerators when required. |
|---|
| 56 | To this end, High-Level Synthesis (HLS) tools, an Application-Specific Instruction-set Processor |
|---|
| 57 | (ASIP) design environment and source-level transformation tools (loop transformations |
|---|
| 58 | and memory optimisation) will be provided. |
|---|
| 59 | This will allow further exploration of the micro-architectural design space. |
|---|
| 60 | HLS tools are sensitive to the coding style of the input specification and to the domain |
|---|
| 61 | they target (control-dominated vs. data-dominated). |
|---|
| 62 | The HLS tools of COACH will support a common language and coding style, so that the designer |
|---|
| 63 | does not have to re-engineer the specification when switching tools. |
|---|
| 64 | \item[Platform based design:] |
|---|
| 65 | COACH will handle both \altera and \xilinx FPGA devices. |
|---|
| 66 | COACH will define architectural templates that can be customized by adding |
|---|
| 67 | dedicated coprocessors and ASIPs and by fixing template parameters such as |
|---|
| 68 | the number of embedded processors, the number and sizes of embedded memory banks, |
|---|
| 69 | or the embedded operating system. |
|---|
| 70 | However, the specification of the application will be independent of both the |
|---|
| 71 | architectural template and the target FPGA device. |
|---|
| 72 | The following three architectural templates will be provided: |
|---|
| 73 | \begin{enumerate} |
|---|
| 74 | \item A \mustbecompleted{FIXME :: ``Neutral'' is very pejorative. Technology independent, independent, standard ???} Neutral architectural template based on the SoCLib IP core library and the |
|---|
| 75 | VCI/OCP communication infrastructure. |
|---|
| 76 | \item An \altera architectural template based on the \altera IP core library, the |
|---|
| 77 | AVALON system bus and the NIOS processor. |
|---|
| 78 | \item A \xilinx architectural template based on the \xilinx IP core library, the PLB |
|---|
| 79 | system bus and the Microblaze processor. |
|---|
| 80 | \end{enumerate} |
|---|
| 81 | \item[Hardware/Software communication middleware:] |
|---|
| 82 | COACH will implement a homogeneous HW/SW communication infrastructure and |
|---|
| 83 | communication APIs (Application Programming Interfaces) that will be used for |
|---|
| 84 | communication between software tasks running on embedded processors and |
|---|
| 85 | dedicated hardware coprocessors. |
|---|
| 86 | \end{description} |
|---|
| 87 | |
|---|
| 88 | |
|---|
| 89 | |
|---|
| 90 | ---------------------------------------------------------------------------------------------- |
|---|
| 91 | |
|---|
| 92 | |
|---|
| 93 | \begin{itemize} |
|---|
| 94 | \item HLS tools are sensitive to the style in which the algorithm is written. |
|---|
| 95 | In addition, they are not integrated into an architecture and system |
|---|
| 96 | exploration tool. Consequently, engineering work is required to switch from one tool to another, |
|---|
| 97 | to integrate the resulting simulation model into an architectural exploration tool |
|---|
| 98 | and to synthesize the generated RTL description. |
|---|
| 99 | %CA Additionnal preprocessing, source-level transformations, are thus |
|---|
| 100 | %CA required to improve the process. |
|---|
| 101 | %CA Particularly, this includes parallelism exposure and efficient memory mapping. |
|---|
| 102 | \item Most HLS tools translate a sequential algorithm into a coprocessor |
|---|
| 103 | containing a single data-path and finite state machine (FSM). In this way, |
|---|
| 104 | only fine-grained parallelism (instruction-level parallelism, ILP) is exploited. |
|---|
| 105 | The challenge is to identify the coarse-grained parallelism and to generate, |
|---|
| 106 | from a sequential algorithm, a coprocessor containing multiple communicating |
|---|
| 107 | tasks (data-paths and FSMs). To this aim, one may adapt techniques which |
|---|
| 108 | were developed in the 1990s for the construction of distributed programs. |
|---|
| 109 | However, in the context of HLS, there are still several original problems |
|---|
| 110 | to be solved, mainly to do with the construction of FIFO communication |
|---|
| 111 | channels and with memory optimization. |
|---|
| 112 | \item The COACH design flow follows a top-down approach. In such a case, |
|---|
| 113 | the performance requirements of a coprocessor (clock frequency, maximum cycles for |
|---|
| 114 | a given computation, power consumption, etc.) are imposed by the other system |
|---|
| 115 | components. The challenge is to allow the user to accurately control the synthesis |
|---|
| 116 | process. For instance, the clock frequency must not be a result of the RTL synthesis |
|---|
| 117 | but a strict synthesis constraint. |
|---|
| 118 | \item The main problem in HPC is the communication between the PC and the SoC. |
|---|
| 119 | This problem has two aspects. The first is run-time efficiency. The second is |
|---|
| 120 | its engineering cost, especially if one wants to refine an implementation |
|---|
| 121 | at several abstraction levels. |
|---|
| 122 | |
|---|
| 123 | \end{itemize} |
|---|
| 124 | |
|---|
| 125 | %Present the expected results, proposing if possible success and evaluation |
|---|
| 126 | %criteria suited to the type of project, so that the results can be evaluated at |
|---|
| 127 | %the end of the project. |
|---|
| 128 | The main result is the framework. Concretely, it is composed of: |
|---|
| 129 | a communication middleware for HPC, |
|---|
| 130 | five HAS tools (control-dominated HLS, data-dominated HLS, coarse-grained HLS, |
|---|
| 131 | memory optimisation HLS and ASIP), |
|---|
| 132 | three architectural templates that are synthesizable and that can be prototyped, |
|---|
| 133 | one design space exploration tool, and |
|---|
| 134 | two operating systems (DNA/OS and MUTEKH). |
|---|
| 135 | \\ |
|---|
| 136 | The framework functionality will be demonstrated with the demonstrators |
|---|
| 137 | (see task-7 page~\pageref{task-7}) and the tutorial example (see task-8 |
|---|
| 138 | page~\pageref{subtask-tutorial}). |
|---|