[[PageOutline]]

= TLMDT Modeling for tightly interdependent architectures with several levels of interconnections =

== 0. Introduction

''This document is still under development''.

This TLMDT specification is strongly based on the [http://www.soclib.fr/trac/dev/wiki/WritingRules/Tlmt TLMDT for SOCLIB] specification.[[BR]]
Several rules are preserved:
 * Method used to model a VCI / PDES transaction (payload with extension / phase / time)
 * Global modeling of the VCI Initiator and VCI Target
 * Global description of how the interconnect works
 * PDES: activity message

Some others are not:
 * PDES: null-message
 * PDES: token (not described in the link)
 * The interconnect's inner synchronisation mechanisms

This TLMDT specification is needed to prevent deadlocks and to improve performance and parallelization, with a minor loss of precision, on architectures that are not only composed of simple initiators and targets linked through a single network.[[BR]]
Component behaviors which exist in TSAR and cannot be efficiently modeled without this specification:
 * Multi-transactional initiators
 * Multiple networks on which a single component can (directly or not) be an initiator for several ports of an interconnect.
 * Components which are target and initiator at the same time.

== 1. VCI Transactions

The representation of VCI transactions mostly remains the same. Two new concepts are introduced for optimization and performance gain. Both have a default value, so they can be ignored for convenience. However, ignoring them will, in some cases, increase the share of PDES communication (compared to the VCI one) and slow down the simulation.

=== 1.1. Blocking type

Basically, when an initiator sends a blocking transaction, it is stopped and waits for the 'response caught' event to be awakened. When the response to the blocking transaction is caught, if the initiator is mono-transactional, or handled as such, its time is updated with the response's time.
In case of a multi-transactional initiator, another transaction could be sent before the response's time, so the component needs to simulate those cycles too. If the initiator does not advance its time while waiting for the response, it is called Passive_Sync. Otherwise, it is called Active_Sync. The mechanisms of this concept are detailed later on this page.

Sometimes, the handling of a transaction's response has no impact on the rest of the simulation. In this case, waiting for a response which could be ignored is wasted effort, so the initiator can carry on with its processing. Such a non-blocking transaction does not prevent the initiator from exceeding the defined time quantum if the synchronization timer is reset when it is sent.

However, when an initiator sends a transaction, the transaction needs to be stored until the response comes back. In practice, the data structure which stores the requests has a fixed maximum size. When the buffer is full, a response needs to be caught in order to free a slot, so the initiator is stopped and waits for the 'response caught' event from any of these transactions to be awakened. This is an alternative to non-blocking transactions which is more precise, especially when the buffer is often full, but more difficult to implement. This type of transaction is called conditionally-blocking.

The blocking type of a transaction is a 2-bit value (stored in a char) in its payload extension. If the blocking type is not specified by the user, the transaction is considered blocking.

{{{#!c++
tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
soclib_payload_extension *extension_ptr = new soclib_payload_extension();

//Fill the transaction with the necessary data
...

//Set the transaction as blocking - 2 methods
extension_ptr->set_blocking();
extension_ptr->set_blocking_type(BLOCKING); //Default

//Set the transaction as non-blocking - 2 methods
extension_ptr->set_non_blocking();
extension_ptr->set_blocking_type(NON_BLOCKING);

//Set the transaction as conditionally-blocking - 2 methods
extension_ptr->set_cond_blocking();
extension_ptr->set_blocking_type(COND_BLOCKING);

//Send the transaction the usual way
...

//Retrieve the information
extension_ptr->get_cond_blocking(); //returns char
extension_ptr->is_blocking(); //returns bool
extension_ptr->is_non_blocking(); //returns bool
extension_ptr->is_cond_blocking(); //returns bool
}}}

=== 1.2. Primarity

A VCI transaction is either primary or secondary. It is primary only if it is not related to another transaction, that is to say, when its response is caught and handled, no response to another transaction will be sent. A pure Initiator only sends primary transactions, while a Target-Initiator can send both.

The primarity of a transaction is a boolean stored in its payload extension. If the primarity is not specified by the user, the transaction is considered secondary.

{{{#!c++
tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
soclib_payload_extension *extension_ptr = new soclib_payload_extension();

//Fill the transaction with the necessary data
...

//Set the transaction as primary - 2 methods
extension_ptr->set_primary();
extension_ptr->set_primarity(true);

//Set the transaction as secondary - 2 methods
extension_ptr->set_secondary();
extension_ptr->set_primarity(false); //Default

//Send the transaction the usual way
...

//Retrieve the information
extension_ptr->get_primarity(); //returns bool
extension_ptr->is_primary(); //returns bool
extension_ptr->is_secondary(); //returns bool
}}}

== 2. PDES Messages

=== 2.1. PDES Activity message

The PDES activity message remains unmodified.
The activity message connects an initiator to, or disconnects it from, the interconnect it is linked to. If the activity of an initiator is true, the associated centralized buffer slot on the interconnect is taken into account for the temporal arbitration; otherwise it is not. The following code is a reminder from the linked TLMDT for SOCLIB specification.

{{{#!c++
// send the initiator activity status to the interconnect
void my_initiator::sendActivity()
{
  tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
  soclib_payload_extension *extension_ptr = new soclib_payload_extension();
  tlm::tlm_phase phase;
  sc_core::sc_time time;

  // set the active or inactive command
  if(m_pdes_activity_status->get())
    extension_ptr->set_active();
  else
    extension_ptr->set_inactive();
  // attach the extension to the tlm payload
  payload_ptr->set_extension(extension_ptr);
  // set the tlm phase
  phase = tlm::BEGIN_REQ;
  // set the transaction time to the local time
  time = m_pdes_local_time->get();
  // send a message whose command is PDES_ACTIVE or PDES_INACTIVE
  p_vci_init->nb_transport_fw(*payload_ptr, phase, time);
  // wait for the response
  wait(m_rspEvent);
}
}}}

Usage:

{{{#!c++
// activate the initiator and inform the interconnect
m_pdes_activity_status->set(true);
sendActivity();

// deactivate the initiator and inform the interconnect
m_pdes_activity_status->set(false);
sendActivity();
}}}

=== 2.2. PDES Null_command

The Null_command is a message which does not require a response. Its only goal is to deliver temporal information, in order to preserve the synchronization between components. In most cases, the initiator does not stop when the null_command message is sent, except if it is waiting for the response to a transaction. This message allows the interconnects to respect their time quanta with regard to their targets. It is also used to perform the Active_Sync on initiators. The message scope is local: it cannot be routed or redirected. Receiving a null_command wakes up the target if it is waiting.
Initiators and interconnects can send Null_commands.

{{{#!c++
tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
soclib_payload_extension *extension_ptr = new soclib_payload_extension();
tlm::tlm_phase phase;
sc_core::sc_time time;

//set as a null_command - 2 methods
extension_ptr->set_null_command();
extension_ptr->set_command(PDES_NULL_COMMAND);

// attach the extension to the tlm payload
payload_ptr->set_extension(extension_ptr);
//set the tlm phase
phase = tlm::BEGIN_REQ;
//set the transaction time to the local time
time = m_pdes_local_time->get();

p_vci_init->nb_transport_fw(*payload_ptr, phase, time);

//Retrieve information
extension_ptr->is_null_command();
}}}

=== 2.3. PDES Null_response

The Null_response is a message which travels on networks like a VCI response. Only transactions which are blocking or conditionally-blocking need to receive Null_responses. The meaning of this message is: "The response to the associated VCI transaction won't be caught before this time". The Null_response is part of the Passive/Active_Sync mechanisms. It is used to predict the future of the simulation. The only useful data contained in the Null_response is the temporal information.

When an initiator receives a Null_response instead of a VCI response, it can carry on with its processing, ignoring the need for the VCI response, until the Null_response time. In order to reach the right transaction on the right initiator, the VCI transaction is reused for the Null_response. When the Null_response is sent, only the target of this message is awakened.

Multiple Null_responses can be sent for a single VCI transaction. The times of successive Null_responses related to the same transaction must increase. Interconnects and targets can generate and transmit Null_responses. Null_responses are useful for preventing deadlocks related to synchronization. As a performance optimization, an interconnect can skip generating a Null_response when the associated transaction is primary.
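
The emission rules above can be sketched as a small guard. This is only an illustrative sketch: `BlockingType`, `NullResponseGate` and its methods are hypothetical names, not part of the SoCLib API, and transactions are identified here by an assumed integer id.

{{{#!c++
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical mirror of the blocking types of section 1.1.
enum class BlockingType { Blocking, NonBlocking, CondBlocking };

// Hypothetical helper: decides whether an interconnect or a target
// may emit a Null_response for a pending transaction.
class NullResponseGate {
public:
    // Returns true (and records `time`) if a Null_response may be sent.
    bool should_send(std::uint64_t trans_id, BlockingType type,
                     bool primary, std::uint64_t time) {
        // A non-blocking transaction has no waiting initiator to release.
        if (type == BlockingType::NonBlocking) return false;
        // Optional optimization: skip primary transactions.
        if (primary) return false;
        // Successive Null_response times for one transaction must increase.
        auto it = m_last_time.find(trans_id);
        if (it != m_last_time.end() && time <= it->second) return false;
        m_last_time[trans_id] = time;
        return true;
    }

    // Forget the transaction once its real VCI response has been sent.
    void on_response(std::uint64_t trans_id) { m_last_time.erase(trans_id); }

private:
    // Time of the last Null_response sent, per transaction id.
    std::map<std::uint64_t, std::uint64_t> m_last_time;
};
}}}

In this sketch the caller would still build and send the Null_response itself; the guard only answers whether sending one at the given time respects the rules.
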
Since a Null_response releases parallelism in the simulation, it should be sent with the highest time possible.

{{{#!c++
soclib_payload_extension *extension_ptr;
tlm::tlm_phase phase;
sc_core::sc_time time;

//payload_ptr points to the VCI transaction being reused
payload_ptr->get_extension(extension_ptr);

//Set the Null_response flag - 2 methods
extension_ptr->set_null_response();
extension_ptr->set_command(PDES_NULL_RESPONSE);

//set the tlm phase
phase = tlm::BEGIN_RSP;
//set the transaction time to the local time
time = m_pdes_local_time->get();

p_vci_target->nb_transport_bw(*payload_ptr, phase, time);

//Retrieve information
extension_ptr->is_null_response();
}}}

=== 2.4. PDES Sync transaction

The Sync transaction is composed of a command and a response. It is used by initiators which have gone past their quantum, in order to keep the synchronization between initiators connected to the same interconnect. The transaction's scope is local. Its response is sent when it is arbitrated on the interconnect. This means that every other initiator connected to this interconnect has a higher time, so the initiator can carry on with its processing until the next quantum.

{{{#!c++
//... Initiator
tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
soclib_payload_extension *extension_ptr = new soclib_payload_extension();
tlm::tlm_phase phase;
sc_core::sc_time time;

//Set the Sync_cmd flag - 2 methods
extension_ptr->set_sync();
extension_ptr->set_command(PDES_SYNC);

//attach the extension to the tlm payload
payload_ptr->set_extension(extension_ptr);
//set the tlm phase
phase = tlm::BEGIN_REQ;
//set the transaction time to the local time
time = m_pdes_local_time->get();

p_vci_init->nb_transport_fw(*payload_ptr, phase, time);
wait(m_rspEvent);

//... Interconnect
//Retrieve information
extension_ptr->is_sync();

//when arbitrated, send the response
phase = tlm::BEGIN_RSP;
p_vci_target->nb_transport_bw(*payload_ptr, phase, time);

//... Initiator (callback function)
//Sync is done, the initiator can carry on with its processing
if(extension_ptr->is_sync()){
  m_rspEvent.notify(sc_core::SC_ZERO_TIME);
}
}}}

=== 2.5. Passive_Sync / Active_Sync

According to the TLMDT for SOCLIB specification, an initiator which sends a blocking request is completely locked until the response comes back. However, targets are not always purely reactive and will not always answer a request immediately: a target may wait for another request which could be handled before the first one, due to the target structure. In this case, if the initiator does not transmit a time greater than the one of its last request, the related interconnect cannot route any other request to the waiting target. The null_response is the message which informs an initiator that it needs to increase its own time up to the one in the message.

There are two methods for handling null_responses.

The first and easiest one is to keep considering that an initiator is fully locked until the real response is caught. This way, the only thing to do when a null_response is caught is to send a null_command to the interconnect with the same time as the null_response. This does not even require waking up the initiator. This method is called Passive_Sync.

The second one is dedicated to advanced modeling of multi-transactional components. There is a gap between the null_response time and the initiator's time. During this gap, there may be useful cycles to simulate, which can also initiate a transaction. In order to prevent this potential request from being delayed, the cycles in the gap need to be simulated. When a null_response is caught, the initiator is woken up and is allowed to carry on with its processing until a new transaction is sent or the null_response time is reached, which results in sending a null_command. This method is called Active_Sync.

== 3. Efficient time modeling in a multi-transactional VCI Component

In most cases, a multi-transactional component is composed of several threads in its CABA model.
Those threads are used to model various behaviors, such as the control of access to a hardware resource or the resource usage by a dataflow. In a CABA simulation, a multi-thread component works well because threads advance their time all together. The only issue which can occur is concurrent access to a single hardware resource. In a TLMDT simulation, the possible desynchronization between threads has to be taken into account. This induces the need for strong synchronization between threads in order to prevent accidental reordering of transactions. Moreover, the cost of this synchronisation is not negligible.

However, in TLMDT a component is modeled using a single thread. In order to represent the multi-thread behavior, every CABA thread is modeled by a timer in the TLMDT model. The internal modeling of a component is then divided into two major sections. The first one is a scheduler whose job is to determine which action can be computed, while the second one performs the processing of the elected action. Once started, a processing step is not stopped unless it is over or an access to a shared hardware resource is requested.

== 4. VCI Initiator modeling

The initiators need to be modeled using Active_Sync or Passive_Sync and should take the primarity and blocking_type mechanisms into account. For anything else, the initiators remain unchanged.

== 5. VCI Target-Initiator (decoupled) modeling

A decoupled Target-Initiator component can be seen as two components, a regular target and a regular initiator. The interesting part is the interaction between the two. When the target receives a message, it is transmitted to the initiator, which immediately sends an activity message if it was inactive. The initiator part carries on with its processing until it has nothing left to do but wait, and then sends an inactivity message.

== 6. VCI Target-Initiator (coupled) modeling

A coupled Target-Initiator acts like a simple initiator but needs to consider the input timer and some assumptions on the capabilities of the next incoming transaction. It can also generate null_responses.

== 7. VCI Target modeling

There are no changes in the target modeling.

== 8. VCI Local Crossbar modeling

The interconnects are the components most modified by the "TLMDT for tightly interdependent architectures with several levels of interconnections" specification. The Local Crossbar has to handle the new synchronisation protocol. It has a time quantum (Δqlc, written Qqlc in the pseudo code), which determines the maximum allowed desynchronization for each target.

Pseudo code:

{{{#!c++
// T = Time - Q = quantum

//Global Synchronisation
If (T.global_input + Qqlc < T.local_crossbar)
    If (null_command or vci_transaction received from global crossbar)
        T.global_input = T.local_crossbar
Else If (arbitration ok : Req = handled request)
    T.local_crossbar = T.Req
    //Local Synchronisation
    For every local target
        If (T.local_target + Qqlc < T.local_crossbar)
            send a null_command to the target
            T.local_target = T.local_crossbar
    //Global Synchronisation
    If (T.global_input + Qqlc < T.local_crossbar)
        send a null_command to the global_crossbar
    //Initiators Synchronisation
    Else If (Req.type == Sync_request)
        send the response to this transaction
    //Cluster Unlock
    Else If (Req.type == Null_command && input != global_input)
        send a Null_command to every local target and global crossbar
    //Routing
    Else If (Req.type == vci_transaction && input == global_input)
        T.global_input = T.local_crossbar
        Routing
    Else If (Req.type == vci_transaction && input != global_input)
        Routing
Else //arbitration ko
    send a null_response for any non-primary blocking or conditionally-blocking request
      for which the interconnect did not send one before
    wait for a new incoming transaction
}}}

== 9. VCI Global Crossbar modeling

The Global Crossbar is even more impacted than the Local Crossbar, because synchronisation is relaxed on it in order to break the strong global dependencies and always allow every cluster to feed its targets with times. Its duty is to release all clusters which are not too far ahead (as determined by its time quantum, Δqgc, written Qqgc in the pseudo code) and to route transactions. Since a cluster can be released even if another one has a lower timer, and since a cluster does not need to synchronize with the Global Crossbar until the desynchronization timer reaches a certain value, there is a loss in precision, but this allows more parallelization of the simulation and fewer PDES transactions.

Pseudo code:

{{{#!c++
T.global_crossbar = min(all T inputs)
For every input
    If (T.input <= T.global_crossbar + Qqgc)
        //Synchronization
        If (Req.type == Null_command)
            Send back another null_command with the same temporal information.
        //Routing
        If (Req.type == vci_transaction)
            T.Req = T.target_port
            Routing
    Else
        //Init Unlock
        If (Req.type == vci_transaction)
            Send a null_response
}}}

== 10. Proof of the deadlock free feature

What is to be proven: if a vci_transaction (local) waits on the external_crossbar (TSAR), it will be arbitrated. If the vci_transaction is global, the local external_crossbar input will have a greater time due to the null_response handling. If there is more than one vci_transaction, only the cluster with the lowest one is considered; the others already allow the arbitration and can be neglected.
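
The statement relies on the release rule of the Global Crossbar described in section 9: an input is served only while its time does not exceed the lowest input time by more than the quantum. A minimal sketch of that rule, assuming times and the quantum are plain cycle counts and using a hypothetical helper name (`released_inputs`) that does not exist in SoCLib:

{{{#!c++
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: for each Global Crossbar input, tells whether it is
// released, i.e. whether T.input <= min(all T inputs) + Qqgc.
std::vector<bool> released_inputs(const std::vector<std::uint64_t>& input_times,
                                  std::uint64_t qqgc) {
    assert(!input_times.empty());
    // T.global_crossbar = min(all T inputs)
    const std::uint64_t t_min =
        *std::min_element(input_times.begin(), input_times.end());
    std::vector<bool> released;
    released.reserve(input_times.size());
    for (std::uint64_t t : input_times) {
        // Inputs that are too far ahead are locked until the others catch up.
        released.push_back(t <= t_min + qqgc);
    }
    return released;
}
}}}

With input times 100, 150 and 201 and a quantum of 100, the first two inputs are released and the third is locked, which is the "locked cluster" situation the proof reasons about.
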
{{{#!c++
//Qqt : quantum targets - Qqlc : quantum local crossbar - Qqgc : quantum global crossbar
//Δlocal - Δglobal : routing delays
//EC : External_crossbar
//C : Cluster which transmitted the transaction and need arbitration
}}}

{{{#!c++
//Demonstrating that with condition : for every cluster else than C : T.EC_input > T.EC_input.C

//Time of the pending request
T.EC_input.C = minT.C.initiator = minT.global_crossbar_inputs - Δlocal   //due to null_responses

//Condition for locking the cluster :
minT.initiator + Δlocal > minT.global_crossbar_inputs + Qqgc
minT.initiator > T.EC_input.C + Qqgc

//Condition for a memory cache to get a temporal information
T.local_crossbar.memory_cache_output + Qqlc <= T.local_crossbar
T.local_crossbar - Qqlc >= T.memory_cache

//Condition for an input from the external_crossbar to get a temporal information
T.memory_cache.EC_output + Qqt <= T.memory_cache
T.memory_cache - Qqt >= T.EC_input

T.local_crossbar - Qqlc - Qqt >= T.memory_cache - Qqt >= T.EC_input

//minimum timer without update
minT.EC_input = T.local_crossbar - Qqlc - Qqt + 1

//The clusters are locked
T.local_crossbar > T.EC_input.C + Qqgc
T.local_crossbar - Qqlc - Qqt + 1 > T.EC_input.C + Qqgc - Qqlc - Qqt + 1
minT.EC_input > T.EC_input.C + Qqgc - (Qqlc + Qqt) + 1

Then if Qqgc - (Qqlc + Qqt) + 1 > 0 ,
the inequality -- minT.EC_input > T.EC_input.C -- is true too

Thus
Qqgc - (Qqlc + Qqt) + 1 > 0
Qqgc > Qqlc + Qqt - 1
-------------------
Qqgc >= Qqlc + Qqt
-------------------
QED
}}}

== 11. Locating the loss in precision

== 12. Adjust precision / performance for a simulation
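
Whatever trade-off is chosen, the quanta still have to satisfy the condition obtained in section 10, Qqgc >= Qqlc + Qqt. A minimal sketch of such a sanity check, assuming the quanta are expressed as cycle counts; the `Quanta` structure and both functions are hypothetical helpers, not SoCLib code:

{{{#!c++
#include <cassert>
#include <cstdint>

// Hypothetical container for the three quanta used in the proof.
struct Quanta {
    std::uint64_t qqt;   // targets quantum
    std::uint64_t qqlc;  // local crossbar quantum
    std::uint64_t qqgc;  // global crossbar quantum
};

// Smallest global crossbar quantum allowed by section 10.
inline std::uint64_t min_global_quantum(std::uint64_t qqlc, std::uint64_t qqt) {
    return qqlc + qqt;
}

// True if the configuration satisfies Qqgc >= Qqlc + Qqt.
inline bool is_deadlock_free(const Quanta& q) {
    return q.qqgc >= min_global_quantum(q.qqlc, q.qqt);
}
}}}

For instance, with Qqt = 10 and Qqlc = 20 the check accepts Qqgc = 30 and rejects Qqgc = 29.
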