Changes between Version 45 and Version 46 of InterconnexionNetworks


Timestamp:
Mar 19, 2013, 12:53:36 PM (12 years ago)
Author:
alain
Comment:

--

  • InterconnexionNetworks

    v45 v46  
    33= Communication Infrastructure =
    44
    5 == 1. The 3 interconnection networks ==
    6 
    7 The TSAR architecture defines three logically independent VCI compliant networks, that are fully separated for dead-lock prevention :
     5== 1. The interconnection networks ==
     6
     7The TSAR architecture uses the DSPIN network-on-chip infrastructure to define three independent networks, which are fully separated for deadlock prevention:
    88 
    9  * The '''Direct Network''' implements the 40 bits TSAR physical address space that is visible by the software. It  transports the direct READ, WRITE, LL, SC and CAS transactions from any VCI initiator (typically a L1 cache controller or another hardware coprocessor with a DMA capability) to any VCI target (typically a memory cache controller, or a memory mapped peripheral).
    10 
    11  * The '''Coherence Network''' implements a separated address space, used to transport the coherence transactions between memory cache controllers and L1 cache controllers. This address space is not visible by the software.
    12 
    13  * The '''External Network''' implements a 34 bits physical address space.This network transports the PUT and GET transactions from the memory cache controller to the external RAM controller, in case of MISS or cache line replacement in the memory cache. This address space is not visible by the software.
    14 
    15 == 2.  VCI initiators & targets indexing ==
    16 
    17 A given hardware component can have several VCI ports. For example the L1 cache has three VCI ports : one initiator port to the direct network, one initiator port to the coherence network, and one target port on the coherence network. Each VCI port can have a different identifier that is defined by three indexes :
     9 * The '''Direct Network''' implements the 40-bit TSAR physical address space that is visible to the software. It transports the direct READ, WRITE, LL, SC and CAS transactions from any VCI initiator (typically an L1 cache controller or another hardware coprocessor with a DMA capability) to any VCI target (typically a memory cache controller, or a memory mapped peripheral). All VCI packets are translated to DSPIN packets by specific VCI/DSPIN wrappers. There are actually two physically separated networks, one for command packets and one for response packets. Both networks have a two-level hierarchical structure with a local interconnect in each cluster (that can be implemented as a local crossbar, or as a local ring), and a global interconnect (implemented as a 2D mesh).
     10
     11 * The '''Coherence Network''' is used to transport the coherence packets implementing the DHCCP coherence protocol between L2 cache controllers and L1 cache controllers. This network is not visible to the software, and does not use wrappers, as the L1 and L2 cache controllers directly use the DSPIN packet format. Here again there are two physically separated networks, one to transport L2-to-L1 packets and one to transport L1-to-L2 packets. Both networks have a two-level hierarchical structure with a local interconnect in each cluster (that can be implemented as a local crossbar, or as a local ring), and a global interconnect (implemented as a 2D mesh).
     12
     13 * The '''Direct Network''' and the '''Coherence Network''' are physically separated in each cluster, but
     14they are only logically separated for the global communications. Regarding the local interconnect, there are four physically separated local crossbars (or local rings) transporting the ''direct command'', ''direct response'', ''coherence L1-to-L2'' and ''coherence L2-to-L1'' packets. Regarding the global interconnect, since the DSPIN infrastructure supports virtual channels, the ''direct command'' and ''coherence L2-to-L1'' packets are multiplexed on the same 2D mesh (40 bits DSPIN flit width). Similarly, the ''direct response'' and ''coherence L1-to-L2'' packets are multiplexed on the same 2D mesh (33 bits DSPIN flit width).
     15
     16 * The '''External Network''' supports communications between the L2 caches and the ''tiles'' implementing the 3D L3 cache, in case of MISS or cache line replacement in the L2 caches. It has a 3D mesh topology and the DSPIN flit width is 64 bits. This external network addressing space is not visible by the software.
     17
     18 * A given hardware component can be connected to several networks. For example the L1 cache has one VCI initiator port to the direct network, and one DSPIN port to the coherence network. The L2 cache has one VCI target port on the direct network, one DSPIN port on the coherence network, and one DSPIN port to the external network.
     19
     20== 2.  VCI initiators & targets indexing on direct network ==
     21
     22On the direct network, each VCI port has an identifier that is defined by three indexes :
    1823
    1924 * '''X_ID''' is the cluster X-coordinate.
    2025 * '''Y_ID''' is the cluster Y-coordinate.
    21  * '''L_ID''' is the local index inside the cluster.
    22 
    23 An hardware component that has several VCI ports can have several different values for the L_ID local index.
     26 * '''L_ID''' is the local index inside the cluster.
    2427
    2528The X_ID, Y_ID and L_ID are coded on NX, NY, NL bits respectively.
    26 The NX, NY and NL parameters are global for a given instance of the TSAR architecture.  NX & NY cannot be larger than 5 (no more than 1024 clusters),
    27 but can be smaller, if the number of clusters is smaller than 1024. NL is equal to 4 (no more than 16 target ports or 16 initiator ports per cluster).
    28 
    29 In order to simplify the hardware implementation of the memory coherence protocol, the L_ID values are standardized on the coherence network, and the same value is used for an initiator port and for a target port: If the number of processors per cluster is NPROCS, the processor L_ID value is between 0 and (NPROCS-1).  The memory cache L_ID is equal to NPROCS.
    30 
     29The NX, NY and NL parameters are global for a given instance of the TSAR architecture.  NX & NY cannot be larger than 5 (no more than 1024 clusters), but can be smaller, if the number of clusters is smaller than 1024. NL is equal to 4 (no more than 16 target ports or 16 initiator ports per cluster).
     30
     31In order to simplify the hardware implementation, the L_ID values defined for the direct network are
     32also used on the coherence network, and the same value is used for an initiator port and for a target port: If the number of processors per cluster is NPROCS, the L1 cache L_ID value is between 0 and (NPROCS-1).  The L2 cache L_ID is equal to NPROCS.
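
As an illustration of this indexing scheme, the following Python sketch packs X_ID, Y_ID and L_ID into a single identifier of NX + NY + NL bits. The field order (X_ID in the most significant bits, L_ID in the least significant bits) and all function names are assumptions made for this example; they are not taken from this page.

```python
NL = 4  # per the text: no more than 16 target or initiator ports per cluster


def pack_port_id(x_id, y_id, l_id, nx, ny, nl=NL):
    """Concatenate X_ID | Y_ID | L_ID (assumed order, MSB first)."""
    if not (0 <= nx <= 5 and 0 <= ny <= 5):
        raise ValueError("NX and NY cannot be larger than 5")
    if not (0 <= x_id < (1 << nx) and 0 <= y_id < (1 << ny)
            and 0 <= l_id < (1 << nl)):
        raise ValueError("index does not fit in its field")
    return (x_id << (ny + nl)) | (y_id << nl) | l_id


def unpack_port_id(port_id, nx, ny, nl=NL):
    """Inverse of pack_port_id: return (X_ID, Y_ID, L_ID)."""
    l_id = port_id & ((1 << nl) - 1)
    y_id = (port_id >> nl) & ((1 << ny) - 1)
    x_id = (port_id >> (ny + nl)) & ((1 << nx) - 1)
    return x_id, y_id, l_id


def l2_cache_l_id(nprocs):
    """L1 caches use L_ID 0..NPROCS-1, the L2 cache uses NPROCS."""
    return nprocs
```

With NX = NY = 5 and NL = 4 the packed identifier is 14 bits wide, which matches the maximum SRCID width mentioned later for the DSPIN format.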
    3133
    3234=== 2.1 Target identification ===
     
    3941
    4042 * According to the NUMA characteristics of the TSAR architecture, there is no transcoding of the X & Y fields, that directly define the target cluster coordinates  (X_INDEX, Y_INDEX).
    41  * The network decodes the LADR field to obtain the target L_ID, using a local routing table (implemented as a wired decoder in each local interconnect controller). The local routing tables and the number of bits NLADR to be decoded depend on the cluster.
     43 * The network hardware decodes the LADR field to obtain the target L_ID, using a local routing table (implemented as a wired decoder in each local interconnect controller). The local routing tables and the number of bits NLADR to be decoded can depend on the cluster.
    4244
    4345=== 2.2 Initiator identification ===
    4446
    45 The initiator identification is required to route a response packet. For both the direct and coherence networks,
     47The initiator identification is required to route a response packet.
    4648a VCI initiator is identified by the VCI SRCID & RSRCID fields (NX + NY + NL bits) :
    4749
     
    6769The TSAR architecture uses one single bit for the VCI RERROR field, even if the DSPIN infrastructure supports 2 bits for the error field.
    6870
    69 There are 8 transaction types on the direct network, that are encoded through the VCI fields '''CMD''' and '''PKTID'''. The '''PKTID''' field in TSAR is 4 bits long, but the MSB is ignored (reserved for future use).
     71There are 8 transaction types on the direct network, that are encoded through the VCI fields '''CMD''' and '''PKTID'''. The PKTID MSB bit is ignored (reserved for future use).
     72This redundant encoding makes it possible to reuse, in the TSAR architecture, existing hardware components that do not
     73decode the PKTID field and use only the CMD field.
    7074
    7175||TYPE          ||CMD (2 bits)||PKTID (4 bits)|| '''PKTID''' mnemo   || '''CMD''' mnemo ||
     
    8084||SC            ||00          ||X111          || TYPE_SC             || CMD_STORE_COND  ||
    8185
    82 Remarks on the '''PKTID''' field encoding :
    83  * for a TYPE_READ, bit 0 is set (resp. not set) for a miss (resp. uncached) request
    84  * for a TYPE_READ, bit 1 is set (resp. not set) for an instruction (resp. data) request
    85  * bit 2 can be used to check for a TYPE_READ (bit 2 = 0)
    86 
    87 When a given initiator can send several simultaneous transactions of a given type (such as several simultaneous '''WRITE''' transactions), the VCI '''TRDID''' field is used to discriminate them. The '''TRDID''' field is 4 bits, supporting up to 16 simultaneous transactions for a given initiator.
     86When a given initiator can send several simultaneous transactions of a given type (such as several simultaneous WRITE transactions), the VCI '''TRDID''' field is used to discriminate them. The '''TRDID''' field is 4 bits, supporting up to 16 simultaneous transactions for a given initiator.
    8887
    8988=== 3.1 VCI READ transaction ===
    9089
    91 A VCI '''READ''' command packet contains one flit. In case of burst, all addresses must within the same cache line.
    92  * The VCI '''CMD''' field must be set to CMD_READ.
    93  * The VCI '''TRDID''' field is not used by the L1 cache, but can be used by multi-channel DMA controllers to transmit the channel index.
    94  * The VCI '''PKTID''' field can be any of the 4 TYPE_READ_* of the previous table.
    95 
    96 A VCI '''READ''' response packet returns either
    97  * Up to 16 flits containing the uncached data in the '''RDATA''' field (for a '''PKTID''' = TYPE_READ_*_UNC).
    98  * Exactly 16 flits containing one word per flit in the '''RDATA''' field (for a '''PKTID''' = TYPE_READ_*_MISS).
     90A VCI '''READ''' command packet contains one flit. In case of burst, all addresses must be within the same cache line. The VCI '''TRDID''' field can be used by multi-channel DMA controllers to transmit the channel index. A VCI '''READ''' response packet returns up to 16 flits.
    9991
    10092=== 3.2 VCI WRITE transaction ===
    10193
    102 A VCI '''WRITE''' command packet contains from 1 to 16 flits. In case of burst, all addresses must within the same cache line.
    103  * The VCI '''CMD''' field must be set to CMD_WRITE.
    104  * The VCI '''TRDID''' field is used by the L1 cache to index its write buffer. It can be used by multi-channel DMA controllers to transmit the channel index.
    105  * The VCI '''PKTID''' field must be TYPE_WRITE.
    106 
    107 A VCI '''WRITE''' response packet always returns a single flit with a 0 value in the '''RDATA''' field.
     94 * A VCI '''WRITE''' command packet contains from 1 to 16 flits. In case of burst, all addresses must be within the same cache line. The VCI '''TRDID''' field is used by the L1 cache to index its write buffer. It can be used by multi-channel DMA controllers to transmit the channel index.
     95 * A VCI '''WRITE''' response packet contains one single flit.
    10896
    10997=== 3.3 VCI LL (Linked Load) transaction ===
    11098
    111 A VCI '''LL (Linked Load)''' command packet contains one single flit.
    112 ('''N.B.''': this request is only sent by a L1 cache and can only target a memory cache)
    113  * The VCI '''CMD''' field must be set to CMD_LOCKED_READ.
    114  * The VCI '''TRDID''' field is not used by the L1 cache.
    115  * The VCI '''PKTID''' field must be TYPE_LL.
    116 
    117 A VCI '''LL (Linked Load)''' response packet contains 2 flits :
    118  * The first flit contains in the '''RDATA''' field a signature returned by the memory cache for this LL reservation.
    119  * The second flit contains in the '''RDATA''' field the data that has been read in the memory cache.
     99 * '''N.B.''': this request is only sent by an L1 cache and can only target a memory cache.
     100 * A VCI '''LL''' command packet contains one single flit.
     101 * A VCI '''LL''' response packet contains 2 flits: The first flit contains in the '''RDATA''' field a signature returned by the memory cache for this LL reservation. The second flit contains in the '''RDATA''' field the data that has been read in the memory cache.
    120102
    121103=== 3.4 VCI SC (Store Conditional) transaction ===
    122104
    123 A VCI '''SC (Store Conditionnal)''' command packet contains 2 flits.
    124 ('''N.B.''': this request is only sent by a L1 cache and can only target a memory cache)
    125  * The VCI '''CMD''' field must be set to CMD_STORE_COND.
    126  * The VCI '''TRDID''' field is not used by the L1 cache.
    127  * The VCI '''PKTID''' field must be TYPE_SC.
    128  * The first flit contains in the '''WDATA''' field the signature obtained with the last LL operation at this address.
    129  * The second flit contains in the '''WDATA''' field the data to be written.
    130 
    131 A VCI '''SC (Store Conditional)''' response packet contains 1 flit.
    132  * The '''RDATA''' field contains 0 (resp. 1) to indicate an SC success (resp. failure).
     105 * '''N.B.''': this request is only sent by an L1 cache and can only target a memory cache.
     106 * A VCI '''SC''' command packet contains 2 flits. The first flit contains in the '''WDATA''' field the signature obtained with the last LL operation at this address. The second flit contains in the '''WDATA''' field the data to be written.
     107 * A VCI '''SC''' response packet contains 1 flit. The '''RDATA''' field contains 0 (resp. 1) to indicate an SC success (resp. failure).
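
The LL and SC packet contents described above can be illustrated with a toy Python model of the memory cache side. This is only a sketch: how the real memory cache generates signatures and tracks reservations is not described on this page, so the counter-based signature, the rule that a plain write cancels a reservation, and the class and method names are all assumptions.

```python
class ToyMemoryCache:
    """Toy model of LL/SC reservations; not the TSAR hardware algorithm."""

    def __init__(self):
        self.data = {}           # address -> stored word
        self.reservation = {}    # address -> signature of the live reservation
        self.next_signature = 1  # assumption: signatures come from a counter

    def ll(self, address):
        """Return the 2 response flits: [signature, data]."""
        signature = self.next_signature
        self.next_signature += 1
        self.reservation[address] = signature
        return [signature, self.data.get(address, 0)]

    def write(self, address, value):
        """A plain write cancels any reservation on the address (assumption)."""
        self.data[address] = value
        self.reservation.pop(address, None)

    def sc(self, address, flits):
        """flits = [signature, data]; return RDATA: 0 on success, 1 on failure."""
        signature, value = flits
        if self.reservation.get(address) != signature:
            return 1
        self.data[address] = value
        del self.reservation[address]
        return 0
```

In this model an SC succeeds only if it carries the signature returned by the latest LL at the same address and no write has intervened.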
    133108
    134109=== 3.5 VCI CAS (Compare & Swap) transaction ===
    135110
    136 A VCI '''CAS (Compare & Swap)''' command packet contains 2 flits.
    137 ('''N.B.''': this request is only sent by a L1 cache and can only target a memory cache)
    138  * The VCI '''CMD''' field must be set to CMD_STORE_COND.
    139  * The VCI '''TRDID''' field is not used by the L1 cache.
    140  * The VCI '''PKTID''' field must be TYPE_CAS.
    141  * The first flit contains in the '''WDATA''' field the old value of the data to be overwritten.
    142  * The second flit contains in the '''WDATA''' field the new value to be written.
    143 
    144 A VCI '''CAS (Compare & Swap)''' response packet contains 1 flit.
    145  * The '''RDATA''' field contains 0 (resp. 1) to indicate a CAS success (resp. failure).
     111 * '''N.B.''': this request is only sent by an L1 cache and can only target a memory cache.
     112 * A VCI '''CAS''' command packet contains 2 flits. The first flit contains in the '''WDATA''' field the old value of the data to be overwritten. The second flit contains in the '''WDATA''' field the new value to be written.
     113 * A VCI '''CAS''' response packet contains 1 flit. The '''RDATA''' field contains 0 (resp. 1) to indicate a CAS success (resp. failure).
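
The CAS packet contents can be sketched the same way. The function below is a hypothetical software model (the name `cas` and the dictionary standing in for memory are assumptions); it only shows how the two command flits map to the single response value.

```python
def cas(memory, address, flits):
    """flits = [old_value, new_value]; return RDATA: 0 on success, 1 on failure."""
    old_value, new_value = flits
    # Unwritten addresses read as 0 in this toy model (assumption).
    if memory.get(address, 0) != old_value:
        return 1  # the stored value differs from the expected one: failure
    memory[address] = new_value
    return 0      # the swap has been performed: success
```

For example, with `memory = {0x40: 5}`, the call `cas(memory, 0x40, [5, 6])` returns 0 and stores 6, while a second identical call returns 1 and leaves the stored value unchanged.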
    146114
    147115== 4.  DSPIN encoding of the various transaction types on the direct network ==
    148116
    149 The VCI command & response packets are translated (actually serialized) to a more convenient DSPIN network format by the VCI/RING wrappers (in platform using the RING local interconnect) or by the VCI/DSPIN wrappers (in platforms using a XBAR local interconnect). These wrappers are located between the VCI initiator and target components and the DSPIN network. The DSPIN command packet width is 40 bits, and the DSPIN response packet width is 33 bits. The DSPIN interconnexion network uses only the following information to route both the DSPIN packets to the proper destination:
     117The VCI command & response packets are translated (actually serialized) to the DSPIN network format by the VCI/RING wrappers (in platforms using the RING local interconnect) or by the VCI/DSPIN wrappers (in platforms using an XBAR local interconnect). These wrappers are located between the VCI initiator and target components and the DSPIN network. The DSPIN command packet width is 40 bits, and the DSPIN response packet width is 33 bits. The DSPIN interconnection network uses only the following information to route the DSPIN packets to the proper destination:
    150118 * The EOP flag, defining the last flit of a DSPIN packet.
    151119 * The LSB bit of the first flit is the BC flag,  defining a DSPIN broadcast packet.
    152  * For a non broadcast command packet (BC = 0), the (NX+NY+NL) MSB bits of the first field are used to route the packet to the proper destination.
    153  * For a broadcast packet (BC = 1), and the XMIN, XMAX, YMIN, YMAX fields (5 bits each), are used by the network to limit the broadcast.
    154 
    155 The DSPIN format can transport 40 bits VCI ADDRESS, and 14 bits VCI SRCID.
     120 * For a non broadcast packet (BC = 0), the (NX+NY+NL) MSB bits of the first field are used to route the packet to the proper destination.
     121 * For a broadcast packet (BC = 1), the XMIN, XMAX, YMIN, YMAX fields (5 bits each) are used by the network to limit the broadcast.
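
The routing information listed above can be sketched in Python for a command packet. The sketch assumes that the first flit is available as a 40-bit integer without the EOP flag, and that "first field" means the first flit; the function names are hypothetical. The position of the XMIN, XMAX, YMIN, YMAX fields is not given in this section, so they are not decoded here.

```python
CMD_FLIT_WIDTH = 40  # DSPIN command flit width given in the text


def bc_flag(first_flit):
    """The LSB of the first flit is the broadcast (BC) flag."""
    return first_flit & 1


def destination_bits(first_flit, nx, ny, nl=4, width=CMD_FLIT_WIDTH):
    """Return the (NX+NY+NL) MSB bits used to route a non-broadcast packet."""
    if bc_flag(first_flit):
        raise ValueError("broadcast packets are routed with XMIN/XMAX/YMIN/YMAX")
    return first_flit >> (width - (nx + ny + nl))
```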
     122
     123The DSPIN format can transport up to 40 bits VCI ADDRESS, and up to 14 bits VCI SRCID.
    156124If the VCI ADDRESS uses fewer than 40 bits (for example 32 bits), the DSPIN ADDRESS field is left aligned, and the LSB bits of the DSPIN field are completed with "0".
    157 If the SRCID field uses less than 14 bits (NX < 5 or NY < 5), the SRCID field is left aligned, and the LSB bits of the DSPIN field are completed with "O".
    158 
    159 The DSPIN packets formats are defined below:
     125If the SRCID field uses less than 14 bits (NX < 5 or NY < 5), the SRCID field is left aligned, and the LSB bits of the DSPIN field are completed with "0".
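
The left alignment rule amounts to a simple shift, as in the following Python sketch (the function name `left_align` is an assumption made for this example):

```python
def left_align(value, used_bits, field_bits):
    """Place a used_bits-wide value in the MSBs of a field_bits-wide field.

    The remaining LSB bits of the field are filled with 0.
    """
    if used_bits > field_bits:
        raise ValueError("value is wider than the DSPIN field")
    if value < 0 or value >> used_bits:
        raise ValueError("value does not fit in used_bits")
    return value << (field_bits - used_bits)
```

For instance, a 32-bit VCI address `0xDEADBEEF` becomes `0xDEADBEEF00` in the 40-bit DSPIN ADDRESS field, and an 8-bit SRCID (NX = NY = 2, NL = 4) is shifted left by 6 bits in the 14-bit DSPIN SRCID field.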
    160126
    161127=== 4.1 DSPIN Read Command packet format (40 bits) ===
     
    210176
    211177The coherence transactions are directly transmitted to the coherence network by the L1 caches and L2 caches in DSPIN format. The L2-to-L1 network uses 40 bits flits. The L1-to-L2 network uses 33 bits flits.
    212 There is 4 packets types from L2 to L1, and 2 packet types from L1 to L2. 
     178Broadcast commands are only used on the L2-to-L1 network, and use the BC bit in the first flit.
     179
     180 * Other than BROADCAST, there are 5 packet types from L2 to L1 (3 bits encoding)
     181
     182|| TYPE      || BIT2 || BIT1 || BIT0 ||
     183||           ||      ||      ||      ||
     184||CLEANUP_ACK||  1   ||  *   ||  *   ||
     185||UPDATE_DATA||  0   ||  1   ||  1   ||
     186||UPDATE_INS ||  0   ||  1   ||  0   ||
     187||INVAL_DATA ||  0   ||  0   ||  1   ||
     188||INVAL_INS  ||  0   ||  0   ||  0   ||
     189
     190 * There are 3 packet types from L1 to L2 (2 bits encoding)
     191
     192|| TYPE       || BIT1 || BIT0 ||
     193||            ||      ||      ||   
     194||CLEANUP_DATA||  1   ||  0   ||
     195||CLEANUP_INS ||  1   ||  1   ||
     196||MULTI-ACK   ||  0   ||  *   ||
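
The two encoding tables above can be read as the following Python decoders. The functions take the type bits already extracted from the flit, since the position of these bits in the DSPIN flit is not given in this section; the function names are hypothetical.

```python
def decode_l2_to_l1(type_bits):
    """Decode the 3-bit type of a non-broadcast L2-to-L1 packet."""
    if type_bits & 0b100:
        return "CLEANUP_ACK"  # BIT2 = 1, BIT1 and BIT0 are don't-care
    return {
        0b011: "UPDATE_DATA",
        0b010: "UPDATE_INS",
        0b001: "INVAL_DATA",
        0b000: "INVAL_INS",
    }[type_bits & 0b111]


def decode_l1_to_l2(type_bits):
    """Decode the 2-bit type of an L1-to-L2 packet."""
    if not type_bits & 0b10:
        return "MULTI-ACK"    # BIT1 = 0, BIT0 is don't-care
    return "CLEANUP_INS" if type_bits & 0b01 else "CLEANUP_DATA"
```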
    213197
    214198=== 5.1 DSPIN MULTI-UPDATE packet format (L2-to-L1 : 40 bits) ===
     
    308292== 6.  External Network ==
    309293
    310 This network has a 3D mesh topology: All PUT/GET transactions are from N initiators to M targets (the M tiles of the L3 cache).
    311 
    312 === 4.1 VCI parameters ===
    313 
    314 The external network, that is only transporting cache lines does not use all VCI fields. The
    315 address is coded on 34 bits (it is actually a cache line index), and the data field is 64 bits,
    316 to increase the bandwidth.
    317 
    318 || VCI Field             ||  width  ||
    319 ||                       ||         ||
    320 ||ADDRESS                || 34 bits ||
    321 ||WDATA , RDATA          || 64 bits ||
    322 ||PLEN                   || unused  ||
    323 ||SRCID, RSRCID          || 10 bits ||
    324 ||TRDID, RTRDID          || 4 bits  ||
    325 ||PKTID, RPKTID          || unused  || 
    326 ||RERROR                 || 1 bit    ||         
     294TBD
     295
     296