The project structure
Online monitoring, or instrumentation, consists of adding software and/or hardware probes (sensors) to the running architecture in order to detect and collect events corresponding to one or several physical phenomena occurring in the MP2SoC: temperature, power consumption or processor workload reaching a threshold, the contents of a hardware register or of a variable differing from the expected value, etc. Monitoring is intended to be a non-intrusive stage that essentially reads analog, digital, and software sensors and stores the results in local memories.
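The monitoring stage described above can be illustrated with a minimal sketch of a threshold probe. The names `RawEvent` and `ThresholdProbe`, the bounded buffer standing in for a local memory, and the replayed sensor trace are all assumptions made for illustration; they are not part of the ADAM proposal.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass(frozen=True)
class RawEvent:
    tile: int        # index of the processing tile hosting the probe
    kind: str        # e.g. "temperature", "power", "workload"
    value: float     # sampled value that reached the threshold
    timestamp: int   # sample counter, standing in for a hardware timer


class ThresholdProbe:
    """Hypothetical probe: reads one sensor, records an event on threshold."""

    def __init__(self, tile: int, kind: str, read_sensor: Callable[[], float],
                 threshold: float, capacity: int = 8) -> None:
        self.tile = tile
        self.kind = kind
        self.read_sensor = read_sensor
        self.threshold = threshold
        # Bounded buffer modelling a small local memory; when full, the
        # oldest event is dropped to make room for the newest one.
        self.local_memory: Deque[RawEvent] = deque(maxlen=capacity)
        self._ticks = 0

    def sample(self) -> None:
        """Take one reading and store an event if the threshold is reached."""
        self._ticks += 1
        value = self.read_sensor()
        if value >= self.threshold:
            self.local_memory.append(
                RawEvent(self.tile, self.kind, value, self._ticks))


# Usage: a fake temperature sensor replaying a fixed trace.
trace = iter([61.0, 72.5, 85.0, 90.5, 70.0])
probe = ThresholdProbe(tile=3, kind="temperature",
                       read_sensor=lambda: next(trace), threshold=85.0)
for _ in range(5):
    probe.sample()
print([(e.timestamp, e.value) for e in probe.local_memory])
```

The probe only reads the sensor and appends to its own buffer, which is one way to keep the stage non-intrusive: nothing in the monitored application is modified or suspended.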
Online diagnosis is the stage responsible for thorough analysis once events corresponding to an alteration or malfunction have been detected in the previous stage. This stage interprets and formats the raw results, logs them into efficient data structures such as databases, and manages their history. Diagnosis also performs intrusive tests, such as functional or structural tests on IPs, computes an annotated representation of the running architecture, and finally builds a database of audited architecture views. These views, or maps, give an instant picture of the architecture showing the exact physical locations where the analyzed phenomenon occurs. In other words, an event map represents the audited architecture with respect to the monitored event. Because it is intrusive, the diagnosis stage generally suspends or simply stops the running application. For instance, in the case of the structural test of a component, the running application must be stopped and entirely replaced by the test application.
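As an illustration of what an event map could look like, the sketch below turns a list of raw events into one grid per event kind, where each cell is a tile of the architecture. The 2D mesh layout, the `(x, y, kind, value)` event tuple, and the choice of keeping the highest value per tile are assumptions, not the project's actual formalism.

```python
from typing import List, Optional, Tuple

# Hypothetical raw event layout: (x, y, kind, value)
Event = Tuple[int, int, str, float]
EventMap = List[List[Optional[float]]]


def build_event_map(events: List[Event], kind: str,
                    width: int, height: int) -> EventMap:
    """Return a height x width grid with the highest value seen per tile.

    Tiles with no event of the requested kind hold None.
    """
    grid: EventMap = [[None] * width for _ in range(height)]
    for x, y, event_kind, value in events:
        if event_kind != kind:
            continue
        current = grid[y][x]
        grid[y][x] = value if current is None else max(current, value)
    return grid


events: List[Event] = [
    (0, 0, "temperature", 86.0),
    (2, 1, "temperature", 91.5),
    (2, 1, "temperature", 88.0),
    (1, 0, "fault", 1.0),
]
temperature_map = build_event_map(events, "temperature", width=3, height=2)
fault_map = build_event_map(events, "fault", width=3, height=2)
print(temperature_map)
print(fault_map)
```

Each map answers a single question (where did this phenomenon occur, and how severely), which matches the idea of one audited view per monitored event.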
Online constrained application remapping exploits the database of event maps, and possibly its history, to determine how and under what conditions the application graph can be remapped onto the architecture. The instant map is used to constrain the placement of the monitored application graph. Different placement strategies for the application graph are possible, ranging from a centralized scheme, which statically assigns threads to processors once and for all, to a distributed and dynamic placement algorithm that allows task migration/replication and local optimization.
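The centralized end of that spectrum can be sketched as a one-shot placement constrained by an instant map. The greedy "heaviest thread first, least-loaded tile" heuristic, the boolean `blocked` grid, and the thread names and costs below are illustrative assumptions; the text does not prescribe a particular placement algorithm.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (x, y) coordinates in a hypothetical 2D mesh


def place_threads(thread_costs: Dict[str, float],
                  blocked: List[List[bool]]) -> Dict[str, Tile]:
    """Statically assign threads to tiles, skipping tiles flagged in the map."""
    usable = [(x, y)
              for y, row in enumerate(blocked)
              for x, is_blocked in enumerate(row)
              if not is_blocked]
    if not usable:
        raise ValueError("no usable tile left in the instant map")

    load = {tile: 0.0 for tile in usable}
    placement: Dict[str, Tile] = {}
    # Heaviest threads first; names break ties so the result is deterministic.
    ordered = sorted(thread_costs.items(), key=lambda item: (-item[1], item[0]))
    for name, cost in ordered:
        target = min(usable, key=lambda tile: (load[tile], tile))
        placement[name] = target
        load[target] += cost
    return placement


# 2 x 2 mesh where tile (1, 0) has been flagged by the diagnosis stage.
blocked = [[False, True],
           [False, False]]
threads = {"decode": 4.0, "filter": 3.0, "parse": 2.0, "output": 1.0}
placement = place_threads(threads, blocked)
print(placement)
```

The map acts purely as a constraint here: the heuristic is unchanged, only the set of candidate tiles shrinks.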
The ADAM project addresses a major part of the issues related to MP2SoC self-adaptability and aims at determining the common hardware and software mechanisms needed for the three stages. For the sake of readability, each of these stages has been assigned a work package in the proposal. CEA-LETI is responsible for the “online monitoring” work package, LIP6 for the “online diagnosis” work package, and LIRMM for the “online constrained application remapping” work package. As a proof of concept, and to validate the whole work, three distinct applications will be mapped onto the three hardware architectures maintained by the partners: a 3GPP-LTE telecom application, an H264 decoder, and an MP3 decoder.
The following figure shows a synthetic view of the ADAM project. The project is composed of three Work-Packages (WP). The first one, WP1, is dedicated to online non-intrusive monitoring; the second one, WP2, addresses the problem of online diagnosis and event database management; and the third one, WP3, deals with constraint-driven application remapping.
WP1 contains three tasks: task 1a) is dedicated to performance measurement, task 1b) provides figures for power consumption, temperature and voltage, and task 1c) addresses fault detection techniques. The information provided by these three tasks is then gathered in the Distributed Raw Event Tables (DRET), which are tables of “first-level” measurements distributed among the different processing tiles of the architecture. The objective of this WP is to obtain a complete set of information that makes a correct diagnosis of the architecture possible.
WP2 takes the DRET from WP1 as input, and its objective is to obtain a consolidated database of multi-parameter Architecture Instant Maps (AIM), called AIM-DB. This database will follow a formalism that enables the different application remapping scenarios developed in WP3. To meet the WP2 objective, four tasks are defined: task 2a) provides access to the database and manages its history when needed, task 2b) periodically analyses the DRET to provide inputs to the AIM, task 2c) triggers alerts when important events are detected in the DRET, and task 2d) performs intrusive diagnosis/test tasks after alerts have been triggered.
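The flow from DRET to AIM-DB can be sketched as follows. The class `AimDatabase`, the dictionary-based DRET layout, and the per-kind alert limits are hypothetical; the sketch loosely mirrors tasks 2a (access and history), 2b (periodic analysis) and 2c (alerts), and leaves out the intrusive tests of task 2d.

```python
from typing import Dict, List, Tuple

# Hypothetical DRET layout: tile id -> list of (kind, value) raw measurements
Dret = Dict[int, List[Tuple[str, float]]]
# Hypothetical AIM layout: tile id -> {kind: consolidated value}
Aim = Dict[int, Dict[str, float]]
Alert = Tuple[int, str, float]


class AimDatabase:
    """Keeps a history of consolidated maps and reports threshold alerts."""

    def __init__(self, alert_limits: Dict[str, float]) -> None:
        self.alert_limits = alert_limits
        self.history: List[Aim] = []  # oldest snapshot first

    def analyse(self, dret: Dret) -> List[Alert]:
        """Consolidate one DRET snapshot into an AIM and return its alerts."""
        aim: Aim = {}
        alerts: List[Alert] = []
        for tile, rows in sorted(dret.items()):
            summary: Dict[str, float] = {}
            for kind, value in rows:
                # Keep the highest value observed for each kind on this tile.
                summary[kind] = max(value, summary.get(kind, value))
            aim[tile] = summary
            for kind, value in sorted(summary.items()):
                limit = self.alert_limits.get(kind)
                if limit is not None and value >= limit:
                    alerts.append((tile, kind, value))
        self.history.append(aim)
        return alerts

    def latest(self) -> Aim:
        """Return the most recent AIM, or an empty map if none exists yet."""
        return self.history[-1] if self.history else {}


db = AimDatabase(alert_limits={"temperature": 90.0})
dret: Dret = {
    0: [("temperature", 70.0), ("temperature", 75.0)],
    1: [("temperature", 95.0), ("workload", 0.8)],
}
alerts = db.analyse(dret)
print(alerts)
print(db.latest())
```

In a fuller design, each returned alert would be the trigger for an intrusive test on the corresponding tile, and `history` would be the basis for trend analysis across snapshots.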
AIM-DB is the direct input of WP3. The objective of this WP is to dynamically adapt the application using this knowledge of the architecture's present state with respect to the monitored events. The outputs are the different methods that will be developed to achieve this self-adaptability, together with the associated software code and hardware developments. Task 3a) will study a centralized remapping scenario in which the information found in AIM-DB is exploited globally to perform application remapping and binary relinking/reloading, while task 3b) will examine a distributed remapping scenario that takes advantage of AIM-DB to issue local or global remapping orders. Finally, task 3c) defines a common set of example applications that will be used to validate the different concepts.
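To contrast with a global, centralized decision, the sketch below shows one possible form of a local remapping order: a task running on a degraded tile migrates to a healthy direct neighbour in a 2D mesh. The mesh topology, the `degraded` set derived from AIM-DB, and the "least occupied neighbour" rule are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Tile = Tuple[int, int]  # (x, y) coordinates in a hypothetical 2D mesh


def neighbours(tile: Tile, width: int, height: int) -> List[Tile]:
    """Return the direct (north/south/east/west) neighbours inside the mesh."""
    x, y = tile
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height]


def migrate_locally(placement: Dict[str, Tile], degraded: Set[Tile],
                    width: int, height: int) -> Dict[str, Tile]:
    """Move each task on a degraded tile to its least occupied healthy neighbour.

    A task with no healthy neighbour stays in place; handling that case
    would require a global remapping order, which is not modelled here.
    """
    new_placement = dict(placement)
    occupancy = Counter(new_placement.values())
    for task, tile in sorted(placement.items()):
        if tile not in degraded:
            continue
        healthy = [n for n in neighbours(tile, width, height)
                   if n not in degraded]
        if not healthy:
            continue
        target = min(healthy, key=lambda t: (occupancy[t], t))
        occupancy[tile] -= 1
        occupancy[target] += 1
        new_placement[task] = target
    return new_placement


placement = {"parse": (1, 0), "decode": (0, 0), "output": (2, 1)}
degraded = {(1, 0)}
updated = migrate_locally(placement, degraded, width=3, height=2)
print(updated)
```

Only tasks on degraded tiles are touched, so the decision needs no global view beyond the state of the immediate neighbourhood.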