
Version 13 (modified by gioja, 14 years ago)

--

TLMDT Modeling for tightly interdependent architectures with several levels of interconnections

0. Introduction

This document is still under development.

This TLMDT specification is largely based on the TLMDT for SOCLIB specification.
Several rules are preserved:

  • Method used to model a VCI / PDES transaction (payload with extension / phase / time)
  • Global modeling of the VCI Initiator and VCI Target
  • Global description of how the interconnect works
  • PDES : activity message

Others are not:

  • PDES : null-message
  • PDES : token (not described in the link)
  • Interconnect's inner synchronisation mechanisms

This TLMDT specification is needed to prevent deadlocks and to improve performance and parallelization, with a minor loss of precision, on architectures that are not simply composed of initiators and targets linked through a single network.
The following component behaviors exist in TSAR and cannot be modeled efficiently without this specification:

  • Multi-transactional initiators
  • Multiple networks on which a single component can (directly or not) be an initiator for several ports of an interconnect.
  • Components which are target and initiator at the same time.

1. VCI Transactions

The representation of VCI transactions mostly remains the same. Two new concepts are introduced for optimization and performance gain. Both have a default value, so they can be ignored for convenience. However, ignoring them will, in some cases, increase the share of PDES communication (compared to VCI communication) and slow down the simulation.

1.1. Blocking type

Basically, when an initiator sends a blocking transaction, it must stop and wait for the 'response caught' event to be woken up. When the response to the blocking transaction is caught, if the initiator is mono-transactional, or handled as such, its time is updated with the time of the response. A multi-transactional initiator could send another transaction before the response time, so the component needs to simulate those cycles too. If the initiator does not advance its time while waiting for the response, the mechanism is called Passive_Sync. Otherwise, it is called Active_Sync. The mechanisms behind this concept are detailed later on this page.

Sometimes, the processing of a transaction's response has no impact on the rest of the simulation. In this case, waiting for a response which could be neglected is wasteful, so the initiator can carry on with its processing. Note that such a non-blocking transaction does not prevent the initiator from exceeding the defined time quantum if the synchronization timer is reset when the transaction is sent.

However, when an initiator sends a transaction, the request needs to be stored until the response comes back. In practice, the data structure which stores the requests has a fixed maximum size. When this buffer is full, a response needs to be caught in order to free a slot, so the initiator must stop and wait for the 'response caught' event of any of these transactions to be woken up. This is an alternative to non-blocking transactions which is more precise, especially when the buffer is often full, but more difficult to implement. This type of transaction is called conditionally-blocking.

The blocking type of a transaction is a 2-bit value (stored in a char) held in its payload extension. If the blocking type is not specified by the user, the transaction is considered blocking.

tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
soclib_payload_extension *extension_ptr = new soclib_payload_extension();

//Fill the transaction with the necessary data
...

//Set the transaction as blocking - 2 methods
extension_ptr->set_blocking();
extension_ptr->set_blocking_type(BLOCKING); //Default

//Set the transaction as non-blocking - 2 methods
extension_ptr->set_non_blocking();
extension_ptr->set_blocking_type(NON_BLOCKING);

//Set the transaction as conditionally-blocking - 2 methods
extension_ptr->set_cond_blocking();
extension_ptr->set_blocking_type(COND_BLOCKING);

//Send the transaction the usual way
...

//Retrieve the information
extension_ptr->get_cond_blocking(); //returns char
extension_ptr->is_blocking();       //returns bool
extension_ptr->is_non_blocking();   //returns bool
extension_ptr->is_cond_blocking();  //returns bool
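
The three blocking types can be read as a rule deciding whether the initiator must stop right after sending a request. The following is a minimal sketch in plain C++, independent of SystemC and of the SoCLib sources. `BlockingType`, `PendingTable` and `must_wait_after_send` are hypothetical names introduced for illustration, the numeric encoding is an assumption, and only conditionally-blocking requests are assumed to occupy a slot in the table.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2-bit encoding of the blocking type (values are assumptions).
enum class BlockingType : unsigned char {
    Blocking = 0,      // default when the user specifies nothing
    NonBlocking = 1,
    CondBlocking = 2
};

// Hypothetical fixed-size table of requests still awaiting a response.
class PendingTable {
public:
    explicit PendingTable(std::size_t capacity) : m_capacity(capacity), m_used(0) {
        assert(capacity > 0);
    }
    bool full() const { return m_used == m_capacity; }
    // Records a sent request; the caller must make sure a slot is free.
    void on_send() { assert(!full()); ++m_used; }
    // Frees one slot when a response is caught.
    void on_response() { assert(m_used > 0); --m_used; }
private:
    std::size_t m_capacity;
    std::size_t m_used;
};

// Returns true when the initiator has to stop and wait for the
// 'response caught' event right after sending a request of this type.
bool must_wait_after_send(BlockingType type, PendingTable &table) {
    switch (type) {
    case BlockingType::Blocking:
        return true;             // always waits for the response
    case BlockingType::NonBlocking:
        return false;            // the response is neglected
    case BlockingType::CondBlocking:
        table.on_send();         // the request takes a slot
        return table.full();     // waits only when no slot is left
    }
    return true; // not reached; falls back to the default blocking behaviour
}
```

With a table of two slots, the first conditionally-blocking request lets the initiator continue and the second one stops it until `on_response()` frees a slot.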

1.2. Primarity

A VCI transaction is either primary or secondary. It is primary only if it is not related to another transaction, that is to say, when its response is caught and processed, no response to another transaction is sent. A pure initiator only sends primary transactions, while a Target-Initiator can send both.

The primarity of a transaction is a boolean stored in its payload extension. If the primarity is not specified by the user, the transaction is considered secondary.

tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
soclib_payload_extension *extension_ptr = new soclib_payload_extension();

//Fill the transaction with the necessary data
...

//Set the transaction as primary - 2 methods
extension_ptr->set_primary();
extension_ptr->set_primarity(true);

//Set the transaction as secondary - 2 methods
extension_ptr->set_secondary();
extension_ptr->set_primarity(false); //Default

//Send the transaction the usual way
...

//Retrieve the information
extension_ptr->get_primarity(); //returns bool
extension_ptr->is_primary();    //returns bool
extension_ptr->is_secondary();  //returns bool

2. PDES Messages

2.1. PDES Activity message

The PDES activity message remains unmodified. It connects an initiator to, or disconnects it from, the interconnect to which it is linked. If the activity of an initiator is true, the associated slot of the interconnect's centralized buffer takes part in the temporal arbitration; otherwise it does not. The following code is recalled from the TLMDT for SOCLIB specification.

// send the initiator activity status to the interconnect
void my_initiator::sendActivity()
{
  tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
  soclib_payload_extension *extension_ptr = new soclib_payload_extension();
  tlm::tlm_phase            phase;
  sc_core::sc_time          time;

  // set the active or inactive command
  if(m_pdes_activity_status->get()) extension_ptr->set_active();
  else extension_ptr->set_inactive();
  // set the extension to tlm payload
  payload_ptr->set_extension (extension_ptr);
  //set the tlm phase
  phase = tlm::BEGIN_REQ;
  //set the local time to transaction time
  time = m_pdes_local_time->get();
  //send a message with command equals to PDES_ACTIVE or PDES_INACTIVE
  p_vci_init->nb_transport_fw(*payload_ptr, phase, time);
  //wait a response
  wait(m_rspEvent);
}

Usage :

//activate the initiator and inform the interconnect
m_pdes_activity_status->set(true);

//deactivate the initiator and inform the interconnect
m_pdes_activity_status->set(false);

sendActivity();

2.2. PDES Nolock_command

The Nolock_command is a message which does not require a response. Its only goal is to deliver temporal information in order to preserve the lock-free property. When such a message is initiated, further ones have to be sent in order to reach the bottom of the architecture. This locally broadcasts temporal information across the different levels of interconnection in order to prevent a possible lock. The initiator needs to wait when a Nolock_command message is sent. It is used to perform the Active_Sync on initiators. The message scope is local: it cannot be routed or redirected. Receiving a Nolock_command wakes up the target if it is waiting. Initiators, local interconnects and Target-Initiators can send Nolock_commands.

tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
soclib_payload_extension *extension_ptr = new soclib_payload_extension();

//set as a null_command - 2 methods
extension_ptr->set_null_command();
extension_ptr->set_command(PDES_NULL_COMMAND);
// set the extension to tlm payload
payload_ptr->set_extension (extension_ptr);
//set the tlm phase
tlm::tlm_phase phase = tlm::BEGIN_REQ;
//set the local time to transaction time
sc_core::sc_time time = m_pdes_local_time->get();
//send the message; no response is expected
p_vci_init->nb_transport_fw(*payload_ptr, phase, time);

//Retrieve information
extension_ptr->is_null_command();

2.3. PDES Sync_response

The Sync_response is a message which travels on networks like a VCI response. Only blocking or conditionally-blocking transactions need to receive Sync_responses. The meaning of this message is: "The response to the associated VCI transaction won't be caught before this time". The Sync_response is part of the Passive_Sync / Active_Sync mechanisms. It is used to predict the future of the simulation. The only useful data contained in the Sync_response is the temporal information. When an initiator receives a Sync_response instead of a VCI response, it is allowed to carry on with its processing, ignoring the need for the VCI response, until the Sync_response time. In order to reach the right transaction on the right initiator, the VCI transaction is reused for the Sync_response. When the Sync_response is sent, only the recipient of this message is woken up. Multiple Sync_responses can be sent for a single VCI transaction. The times of successive Sync_responses related to the same transaction must increase.

Interconnects and targets can generate and transmit Sync_responses. Sync_responses are useful for preventing deadlocks related to synchronization. As a performance optimization, an interconnect can skip generating a Sync_response when the associated transaction is primary. Since a Sync_response releases parallelism in the simulation, it should be sent with the highest possible time.

soclib_payload_extension *extension_ptr;
payload_ptr->get_extension(extension_ptr);

//Set the Sync_response flag - 2 methods
extension_ptr->set_sync_response();
extension_ptr->set_command(PDES_SYNC_RESPONSE);
//set the tlm phase
tlm::tlm_phase phase = tlm::BEGIN_RSP;
//set the local time to transaction time
sc_core::sc_time time = m_pdes_local_time->get();
//send the Sync_response on the backward path
p_vci_target->nb_transport_bw(*payload_ptr, phase, time);

//Retrieve information
extension_ptr->is_sync_response();
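
The rules stated for Sync_responses (only waiting transactions need them, an interconnect may skip primary ones, successive times must increase) can be summarised in a small plain C++ sketch. `TransactionInfo`, `interconnect_should_send_sync` and `record_sync_response` are hypothetical names, and reading "must increase" as strictly greater is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of the transaction fields that matter here.
struct TransactionInfo {
    bool primary;                 // true: not related to another transaction
    bool waits_for_response;      // true for blocking and conditionally-blocking
    bool sync_sent;               // a Sync_response was already sent for it
    std::uint64_t last_sync_time; // time carried by the last Sync_response
};

// An interconnect may skip the Sync_response of a primary transaction, and
// non-blocking transactions never need one.
bool interconnect_should_send_sync(const TransactionInfo &t) {
    return t.waits_for_response && !t.primary;
}

// Accepts a new Sync_response only if its time is strictly greater than the
// previous one sent for the same transaction; updates the record on success.
bool record_sync_response(TransactionInfo &t, std::uint64_t time) {
    assert(t.waits_for_response); // non-blocking transactions get no Sync_response
    if (t.sync_sent && time <= t.last_sync_time) return false;
    t.sync_sent = true;
    t.last_sync_time = time;
    return true;
}
```

A second Sync_response carrying the same time as the first is rejected, while one carrying a later time is accepted.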

2.4. PDES Sync_request

The Sync_request is composed of a command and a response. It is used by initiators which have run past their quantum, in order to keep the initiators connected to the same interconnect synchronized. The scope of the transaction is local. Its response is sent when it is arbitrated on the interconnect. This means that every other initiator connected to this interconnect has a higher time, so the requesting initiator can carry on with its processing until the next quantum.

//... Initiator  
tlm::tlm_generic_payload *payload_ptr = new tlm::tlm_generic_payload();
soclib_payload_extension *extension_ptr = new soclib_payload_extension();

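//Set the Sync_cmd flag - 2 methods
extension_ptr->set_sync();
extension_ptr->set_command(PDES_SYNC);
// set the extension to tlm payload
payload_ptr->set_extension (extension_ptr);

//set the tlm phase
tlm::tlm_phase phase = tlm::BEGIN_REQ;
//set the local time to transaction time
sc_core::sc_time time = m_pdes_local_time->get();
//send the Sync_request, then wait for its response
p_vci_init->nb_transport_fw(*payload_ptr, phase, time);
wait(m_rspEvent);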

//... Interconnect
//Retrieve information
extension_ptr->is_sync();
//when arbitrated, send the response
phase = tlm::BEGIN_RSP;
p_vci_target->nb_transport_bw(*payload_ptr, phase, time);

//... Initiator (callback function)
//Sync is done, initiator can pursue its treatment
if(extension_ptr->is_sync()){
   m_rspEvent.notify();
}
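
The statement that a Sync_request is answered once it is arbitrated can be illustrated with a plain C++ model of the centralized buffer. This is only a sketch: `Slot`, `arbitrate` and `sync_request_answered` are hypothetical names, and the rule that every active slot must hold a pending message before the lowest timestamp is elected is an assumption about the arbitration policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slot of the interconnect's centralized buffer (one per initiator).
struct Slot {
    bool active;        // set by the PDES activity messages
    bool has_request;   // a message is pending in this slot
    std::uint64_t time; // timestamp of the pending message
};

// Returns the index of the elected slot, or -1 when arbitration is not
// possible yet. Assumption: every active slot must hold a pending message
// before the lowest timestamp can be elected safely.
int arbitrate(const std::vector<Slot> &slots) {
    int elected = -1;
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (!slots[i].active) continue;       // inactive initiators are ignored
        if (!slots[i].has_request) return -1; // this initiator may still send an earlier message
        if (elected < 0 ||
            slots[i].time < slots[static_cast<std::size_t>(elected)].time)
            elected = static_cast<int>(i);
    }
    return elected;
}

// A Sync_request pending in slot `requester` is answered only when that slot
// is the one elected, i.e. no other active initiator has a lower time.
bool sync_request_answered(const std::vector<Slot> &slots, std::size_t requester) {
    assert(requester < slots.size());
    return arbitrate(slots) == static_cast<int>(requester);
}
```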

2.5. PDES Sync_command

The Sync_command is similar to a Sync_request, except that it does not need a response. It is used by interconnects and Target-Initiator components, which depend on the requests they receive to increase their time.

2.6. Passive_Sync / Active_Sync

According to the TLMDT for SOCLIB specification, an initiator which sends a blocking request is completely locked until the response comes back. However, targets are not always purely reactive and will not always answer a request immediately: due to its structure, a target may wait for another request which can be handled before the first one. In this case, if the initiator does not transmit a time greater than that of its last request, the related interconnect cannot route any other request to the waiting target. The sync_response is the message which informs an initiator that it needs to increase its own time up to the one in the message.

There are two methods for handling sync_responses. The first and easiest one is to keep considering that an initiator is fully locked until the real response is caught. This way, the only thing to do when a sync_response is caught is to send a nolock_command to the interconnect with the same time information as the sync_response. The initiator does not even need to be woken up. This method is called Passive_Sync.

The second one is dedicated to the modeling of advanced multi-transactional components. There is a gap between the sync_response time and the initiator time. During this gap, there can be useful cycles to simulate, which can also initiate a transaction. In order to prevent such a request from being delayed, the cycles in the gap need to be simulated. When a sync_response is caught, the initiator is woken up and is allowed to carry on with its processing until a new transaction is sent or the sync_response time is reached, which results in the sending of a nolock_command. This method is called Active_Sync.
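
The difference between the two methods can be sketched in plain C++, without SystemC. `Outcome`, `SyncResult`, `passive_sync` and `active_sync` are hypothetical names, and `wants_to_send` is an assumed hook standing for the component model deciding, cycle by cycle, whether it issues a transaction.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

enum class Outcome { NolockSent, TransactionSent };

struct SyncResult {
    Outcome outcome;
    std::uint64_t message_time; // time carried by the message that was sent
    bool initiator_woken;       // whether the initiator had to be woken up
};

// Passive_Sync: forward the sync_response time as a nolock_command;
// the initiator stays asleep.
SyncResult passive_sync(std::uint64_t sync_time) {
    return {Outcome::NolockSent, sync_time, false};
}

// Active_Sync: simulate the cycles between the initiator time and the
// sync_response time. Stops early if a transaction is issued; otherwise a
// nolock_command is sent once the sync_response time is reached.
SyncResult active_sync(std::uint64_t local_time, std::uint64_t sync_time,
                       const std::function<bool(std::uint64_t)> &wants_to_send) {
    assert(local_time <= sync_time);
    for (std::uint64_t t = local_time; t < sync_time; ++t) {
        if (wants_to_send(t)) return {Outcome::TransactionSent, t, true};
    }
    return {Outcome::NolockSent, sync_time, true};
}
```

With an initiator at time 100 and a sync_response at time 200, Active_Sync sends a transaction at cycle 150 if the model asks for one there, whereas Passive_Sync would only forward time 200.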

3. Efficient time modeling in a multi-transactional VCI Component

In most cases, a multi-transactional component is composed of several threads in its CABA model. Those threads are used to model various behaviors, such as the control of access to a hardware resource or the use of a resource by a dataflow.

In a CABA simulation, a multi-thread component works well because all threads advance their time together. The only issue which can occur is concurrent access to a single hardware resource. In a TLMDT simulation, the possible desynchronization between threads has to be taken into account. This requires a strong synchronization between the threads in order to prevent accidental reordering of transactions. Moreover, the cost of this synchronization is not negligible.

However, in TLMDT a component is modeled using a single thread. In order to represent the multi-thread behavior, every CABA thread is modeled by a timer in the TLMDT model. The internal modeling of a component is then divided into two major sections. The first one is a scheduler whose job is to determine which action can be computed, while the second one performs the processing of the elected action. Once started, a processing is not stopped until it is over or until access to a shared hardware resource is requested.
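
The scheduler / processing split can be sketched as follows in plain C++. `TimedTask`, `elect` and `step` are hypothetical names, and electing the task with the lowest timer is an assumed policy chosen to keep transactions in timestamp order; the text above does not prescribe it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// One former CABA thread: a local timer plus the action it performs.
// The action returns the number of cycles it consumed before finishing or
// before requesting a shared hardware resource.
struct TimedTask {
    std::uint64_t timer;
    std::function<std::uint64_t()> action;
};

// Scheduler section: elect the task with the lowest timer (assumed policy).
std::size_t elect(const std::vector<TimedTask> &tasks) {
    assert(!tasks.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < tasks.size(); ++i)
        if (tasks[i].timer < tasks[best].timer) best = i;
    return best;
}

// Processing section: run the elected action to completion and advance only
// the timer of that task. Returns the index of the task that ran.
std::size_t step(std::vector<TimedTask> &tasks) {
    std::size_t i = elect(tasks);
    tasks[i].timer += tasks[i].action();
    return i;
}
```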

4. VCI Initiator modeling

The initiators need to be modeled using Active_Sync or Passive_Sync and should consider the primarity and blocking_type mechanisms. For anything else, the initiators remain unchanged.

5. VCI Target-Initiator (decoupled) modeling

A decoupled Target-Initiator component can be seen as two components: a regular target and a regular initiator. The interesting part is the interaction between the two. When the target receives a message, it transmits it to the initiator, which immediately sends an activity message if it was inactive. The initiator part carries on with its processing until it has nothing left to do but wait, and then sends an inactivity message.

6. VCI Target-Initiator (coupled) modeling

A coupled Target-Initiator acts like a simple initiator but needs to take into account the input timer and some assumptions about the capabilities of the next incoming transaction. It can also generate sync_responses.

7. VCI Target modeling

There are no changes in the target modeling.

8. VCI Local Crossbar modeling

The interconnects are the most modified components in the "TLMDT for tightly interdependent architectures with several levels of interconnections" specification. The Local Crossbar has to handle the new synchronisation protocol. It has a time quantum (Δqlc), which determines the maximum allowed desynchronization for each target.

Pseudo code :

//Global Synchronisation
// T = Time - Q = quantum
If (T.global_input + Qqlc < T.local_crossbar)
   If (sync_command or vci_transaction received from global crossbar)
      T.global_input = T.local_crossbar
Else If (arbitration ok : Req = handled request)
   T.local_crossbar = T.Req
   //Local Synchronisation
   For every local target
      If (T.local_target + Qqlc < T.local_crossbar)
         send a sync_command to the target
         T.local_target = T.local_crossbar
   //Global Synchronisation
   If (T.global_input + Qqlc < T.local_crossbar)
      send a sync_command to the global_crossbar
   //Initiators Synchronisation
   Else If (Req.type == Sync_request)
      send the response to this transaction
   Else If (Req.type == Sync_command)
      nothing else to do //time update already done
   //Cluster Unlock
   Else If (Req.type == Nolock_command)
      send a Nolock_command to every local target and to the global crossbar
   //Routing
   Else If (Req.type == vci_transaction && input == global_input)
      T.global_input = T.local_crossbar
      Routing
   Else If (Req.type == vci_transaction && input != global_input)
      Routing
Else //arbitration failed
   send a sync_response for any non-primary blocking or conditionally-blocking request for which the interconnect has not sent one before
   wait for a new incoming transaction
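
The "Local Synchronisation" loop of the pseudo code can be written as a small plain C++ function. `sync_local_targets` is a hypothetical name, the vector of per-target times is an assumed representation of T.local_target, and sending a sync_command is reduced to recording the target index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the indices of the local targets that must receive a sync_command
// and updates their recorded time. `target_times[i]` stands for
// T.local_target of target i, `quantum` for Qqlc.
std::vector<std::size_t> sync_local_targets(std::vector<std::uint64_t> &target_times,
                                            std::uint64_t crossbar_time,
                                            std::uint64_t quantum) {
    std::vector<std::size_t> notified;
    for (std::size_t i = 0; i < target_times.size(); ++i) {
        if (target_times[i] + quantum < crossbar_time) {
            notified.push_back(i);           // send a sync_command to target i
            target_times[i] = crossbar_time; // T.local_target = T.local_crossbar
        }
    }
    // Afterwards no target lags behind the crossbar by more than the quantum.
    for (std::uint64_t t : target_times) assert(t + quantum >= crossbar_time);
    return notified;
}
```

With target times 100, 180 and 250, a crossbar time of 200 and a quantum of 50, only the first target is late enough to receive a sync_command.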

9. VCI Global Crossbar modeling

The Global Crossbar is even more impacted than the Local Crossbar, because synchronization is relaxed on it in order to break the strong global dependencies and to always allow every cluster to feed its targets with times. Its duty is to release all clusters which are not too far ahead (as determined by its time quantum, Δqgc) and to route transactions. Since a cluster can be released even if another one has a lower timer, and since a cluster does not need to synchronize with the Global Crossbar until the desynchronization timer reaches a certain value, there is a loss of precision, but this increases the parallelization of the simulation and reduces the number of PDES transactions.

Pseudo code :

T.global_crossbar = min(all T inputs)
For every input
   If (T.input <= T.global_crossbar + Qqgc)
      //Synchronization
      If(Req.type == Sync_command || Req.type == Nolock_command)
         Send back another Sync_command with the same temporal information.
      //Routing
      If(Req.type == vci_transaction)
         T.Req = T.target_port
         Routing
   Else
      //Init Unlock
      If(Req.type == vci_transaction)
         Send a null_response
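
The release test at the top of the pseudo code (T.input <= T.global_crossbar + Qqgc) can be sketched in plain C++. `released_inputs` is a hypothetical name and the vector of input times is an assumed representation of the crossbar inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns, for each input (cluster), whether it is released by the global
// crossbar: its time must not exceed the lowest input time by more than
// `quantum` (Qqgc).
std::vector<bool> released_inputs(const std::vector<std::uint64_t> &input_times,
                                  std::uint64_t quantum) {
    assert(!input_times.empty());
    // T.global_crossbar = min(all T inputs)
    const std::uint64_t global_time =
        *std::min_element(input_times.begin(), input_times.end());
    std::vector<bool> released;
    for (std::uint64_t t : input_times)
        released.push_back(t <= global_time + quantum);
    return released;
}
```

With input times 100, 140 and 260 and a quantum of 100, the first two clusters are released and the third one is held back.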

10. Proof of the deadlock free feature

What is to be proven: if a (local) vci_transaction waits on the external_crossbar (TSAR), it will be arbitrated. If the vci_transaction is global, the local external_crossbar input will have a greater time due to the processing of the null_response. If there is more than one vci_transaction, only the cluster with the lowest one is considered; the others already allow the arbitration and can be neglected.

//Qqt : quantum targets - Qqlc : quantum local crossbar - Qqgc : quantum global crossbar
//Δlocal - Δglobal : routing delays
//EC  : External_crossbar
//C   : Cluster which transmitted the transaction and need arbitration
//To be demonstrated, for every cluster other than C :
T.EC_input > T.EC_input.C

//Time of the pending request
T.EC_input.C = minT.C.initiator = minT.global_crossbar_inputs - Δlocal //due to null_responses

//Condition for locking the cluster :
minT.initiator + Δlocal > minT.global_crossbar_inputs + Qqgc
minT.initiator > T.EC_input.C + Qqgc

//Condition for a memory cache to get a temporal information
T.local_crossbar.memory_cache_output +Qqlc <= T.local_crossbar
T.local_crossbar - Qqlc >= T.memory_cache

//Condition for an input from the external_crossbar to get a temporal information
T.memory_cache.EC_output + Qqt <= T.memory_cache
T.memory_cache - Qqt >= T.EC_input
T.local_crossbar - Qqlc - Qqt >= T.memory_cache - Qqt >= T.EC_input

//minimum timer without update
minT.EC_input = T.local_crossbar - Qqlc - Qqt + 1

//The clusters are locked
T.local_crossbar > T.EC_input.C + Qqgc
T.local_crossbar - Qqlc - Qqt + 1 > T.EC_input.C + Qqgc - Qqlc - Qqt + 1
minT.EC_input > T.EC_input.C + Qqgc - (Qqlc + Qqt) +1

Then, if Qqgc - (Qqlc + Qqt) + 1 > 0, the inequality minT.EC_input > T.EC_input.C holds too.

Thus

Qqgc - (Qqlc + Qqt) + 1 > 0
Qqgc > Qqlc + Qqt - 1
-------------------
Qqgc >= Qqlc + Qqt
-------------------

QED
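
The last step of the derivation can be checked numerically on small integer values. The plain C++ sketch below is only a sanity check, not a replacement for the proof; `check_quantum_condition` is a hypothetical name and the ranges explored are arbitrary.

```cpp
#include <cassert>

// Exhaustively checks, on small integer values, that Qqgc >= Qqlc + Qqt
// guarantees minT.EC_input > T.EC_input.C whenever the clusters are locked
// (T.local_crossbar > T.EC_input.C + Qqgc).
// Returns the number of locked configurations checked, or -1 if a
// counter-example is found.
long check_quantum_condition(int max_quantum, int max_time) {
    assert(max_quantum >= 1 && max_time >= 0);
    long checked = 0;
    for (int qlc = 1; qlc <= max_quantum; ++qlc)
        for (int qt = 1; qt <= max_quantum; ++qt)
            for (int qgc = qlc + qt; qgc <= 2 * max_quantum + 1; ++qgc)
                for (int t_c = 0; t_c <= max_time; ++t_c)
                    for (int t_lc = t_c + qgc + 1; t_lc <= t_c + qgc + 3; ++t_lc) {
                        // minimum timer without update
                        int min_ec_input = t_lc - qlc - qt + 1;
                        if (!(min_ec_input > t_c)) return -1;
                        ++checked;
                    }
    return checked;
}
```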

11. Locating the loss in precision

Due to the relaxed synchronization on the global crossbar, a global vci_transaction may be handled with up to Qqgc + Qqlc cycles of positive delay. This also allows unexpected reordering of transactions, but this was already the case because the interconnect topology is not accurately modeled in TLMDT. Apart from this increase in an existing lack of precision, no new imprecision is introduced. In any case, the imprecision is positive: if the result is used to check whether the architecture is fast enough, TLMDT can be pessimistic but will not be optimistic.

12. Adjust precision / performance for a simulation

As said above, the increase in imprecision is due to the relaxed approach of the global crossbar model. However, the imprecision on a single transaction is bounded by the values of the local and global quanta. With extremely low values for those quanta, the increase in imprecision should be negligible.

On the other hand, reducing the quanta to extremely low values increases the overall number of PDES transactions needed to maintain synchronization, while extremely high values almost eliminate the PDES transactions and allow high performance.