Version 83 (modified by cfuguet, 11 years ago)

Communication Infrastructure

1. The interconnection networks

The TSAR architecture uses the DSPIN network on chip infrastructure to define three independent networks.

  • The Direct Network implements the 40 bits TSAR physical address space and supports software driven transactions. It transports the direct READ, WRITE, LL, SC and CAS transactions from a VCI initiator (typically an L1 cache controller) to a VCI target (typically a memory cache controller, or a memory mapped peripheral). All VCI packets are translated to DSPIN packets by specific VCI/DSPIN wrappers. There are two physically separated networks, one for command packets and one for response packets. Both networks have a two-level hierarchical structure, with a local interconnect in each cluster (implemented as either a local crossbar or a local ring) and a global interconnect (implemented as a 2D mesh).
  • The Coherence Network transports the coherence packets implementing the DHCCP coherence protocol between L2 cache controllers and L1 cache controllers. This network is not visible to software and does not use wrappers, as the L1 and L2 cache controllers directly use the DSPIN packet format. Here there are three physically separated networks: one for L2-to-L1 packets (M2P network), one for L1-to-L2 packets (P2M network) and one for CLACK packets (CLACK network). These networks have a two-level hierarchical structure, with a local interconnect in each cluster (implemented as either a local crossbar or a local ring) and a global interconnect (implemented as a 2D mesh).
  • The RAM Network supports communication between the L2 caches and the tiles implementing the L3 cache, in case of a MISS or a cache line replacement in the L2 caches. It also supports direct communication between the L3 cache and the external peripherals that have a DMA capability (disk controllers or network controllers). It has a 2D mesh topology, and each DSPIN flit carries a 64 bits DATA field (65 bits including the EOP flag).

Regarding implementation, the Direct Network and the Coherence Network are physically separated for the local interconnect, as there are five physically separated local crossbars (or local rings) transporting the direct command, direct response, coherence P2M, coherence M2P and coherence CLACK packets. They are only logically separated for the global interconnect, where they use the DSPIN virtual channels: the direct command, coherence M2P and coherence CLACK packets are multiplexed on the same 2D mesh (40 bits DSPIN flit width). Similarly, the direct response and coherence P2M packets are multiplexed on the same 2D mesh (33 bits DSPIN flit width). Because the "Direct Network" and the "Coherence Network" share the same hardware infrastructure, they are together also called the "INT network".

2. VCI initiators & targets identifiers on direct network

On the direct network, each VCI port has an identifier that is defined by three indexes :

  • X_ID is the cluster X-coordinate in the 2D mesh.
  • Y_ID is the cluster Y-coordinate in the 2D mesh.
  • L_ID is the local index inside the cluster.

The X_ID, Y_ID and L_ID are coded on NX, NY, NL bits respectively. The NX, NY and NL parameters are global for a given instance of the TSAR architecture. NX & NY cannot be larger than 5 (no more than 1024 clusters), but can be smaller when the number of clusters is smaller than 1024. NL is equal to 4 (no more than 16 target ports or 16 initiator ports per cluster).

2.1 Target identification

The target identification is required to route a command packet. For both the direct and coherence networks, a VCI target is identified by the (NX + NY + NLADR) most significant bits of the VCI ADDRESS field :

|| X       || Y       || LADR       || OFFSET              ||
|| NX bits || NY bits || NLADR bits || 40-NX-NY-NLADR bits ||

  • According to the NUMA characteristics of the TSAR architecture, there is no transcoding of the X & Y address fields, that directly define the target cluster coordinates (X_ID, Y_ID).
  • The network hardware decodes the LADR field to obtain the target L_ID, using a local routing table (implemented as a wired decoder in each local interconnect controller). The local routing tables and the number of bits NLADR to be decoded can depend on the cluster.
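
The address split described above can be sketched in a few lines of Python. This is only an illustration of the field layout, not TSAR code: the function name `split_address` is hypothetical, and the mapping from LADR to L_ID is left out because it depends on the per-cluster routing table.

```python
ADDRESS_BITS = 40  # width of the TSAR physical address


def split_address(address, nx, ny, nladr):
    """Split a physical address into its (X, Y, LADR, OFFSET) fields.

    nx, ny and nladr are the field widths in bits; the fields are taken
    from the most significant bits downwards, as in the table above.
    """
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address must fit in 40 bits")
    offset_bits = ADDRESS_BITS - nx - ny - nladr
    if offset_bits < 0:
        raise ValueError("NX + NY + NLADR exceeds the address width")
    offset = address & ((1 << offset_bits) - 1)
    ladr = (address >> offset_bits) & ((1 << nladr) - 1)
    y = (address >> (offset_bits + nladr)) & ((1 << ny) - 1)
    x = (address >> (offset_bits + nladr + ny)) & ((1 << nx) - 1)
    return x, y, ladr, offset


# Example: a 4x4 mesh (NX = NY = 2) with 4 LADR bits.
example = (1 << 38) | (2 << 36) | (0xA << 32) | 0x12345678
fields = split_address(example, nx=2, ny=2, nladr=4)
```

The X and Y values are used directly as cluster coordinates; only LADR needs a table lookup to obtain the target L_ID.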

2.2 Initiator identification

The initiator identification is required to route a response packet. A VCI initiator is identified by the VCI SRCID field, coded on (NX + NY + NL) bits:

|| X_ID    || Y_ID    || L_ID    ||
|| NX bits || NY bits || NL bits ||

  • The max value of the SRCID_SIZE is 14 bits.
  • NX can have any value from 0 to 5 (from 1 to 32 clusters per row).
  • NY can have any value from 0 to 5 (from 1 to 32 clusters per column).
  • NL is always equal to SRCID_SIZE - (NX + NY), but only the 4 LSB bits are significant.

As we want to support configurations of up to 1024 clusters, the X_ID and Y_ID fields can require up to 10 bits. Therefore, the local index L_ID cannot use more than 4 bits, even if NL is larger than 4.
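
As an illustration of the SRCID layout, the following Python sketch concatenates and splits the three indexes. The function names are hypothetical, and NL defaults to 4 as stated in section 2.

```python
def make_srcid(x_id, y_id, l_id, nx, ny, nl=4):
    """Concatenate X_ID | Y_ID | L_ID into one SRCID value."""
    for name, value, width in (("x_id", x_id, nx),
                               ("y_id", y_id, ny),
                               ("l_id", l_id, nl)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (x_id << (ny + nl)) | (y_id << nl) | l_id


def split_srcid(srcid, nx, ny, nl=4):
    """Recover (X_ID, Y_ID, L_ID) from a SRCID value."""
    l_id = srcid & ((1 << nl) - 1)
    y_id = (srcid >> nl) & ((1 << ny) - 1)
    x_id = (srcid >> (ny + nl)) & ((1 << nx) - 1)
    return x_id, y_id, l_id


# Example: cluster (3, 1) in a 4x4 mesh, local index 2.
srcid = make_srcid(3, 1, 2, nx=2, ny=2)
```

With NX = NY = 5 and NL = 4 the largest identifier uses exactly the 14 bits allowed for SRCID.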

3. Component identifiers on the coherence network

The only components connected to the coherence network are the processors (L1) and the memory cache controller (L2). There are NPROCS (L1) components and only one (L2) component per cluster. To route a coherence packet from a source component to a destination component, the DSPIN network uses the DEST field (left justified in the first flit of the packet).

  • When the destination is a processor (L1), the DEST value is the processor SRCID (X_ID|Y_ID|L_ID), with L_ID between 0 and (NPROCS-1).
  • When the destination is a memory cache (L2), the DEST value has the same structure (X_ID|Y_ID|L_ID), with L_ID = NPROCS.
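
The two rules above can be written as a pair of small helpers. This is a sketch under the assumption that L_ID uses NL = 4 bits, so NPROCS must stay below 16 for the L2 index to fit; the function names are hypothetical.

```python
def l1_dest(x_id, y_id, proc, nprocs, ny, nl=4):
    """DEST value for processor number `proc` of cluster (x_id, y_id)."""
    if not 0 <= proc < nprocs:
        raise ValueError("processor index out of range")
    return (x_id << (ny + nl)) | (y_id << nl) | proc


def l2_dest(x_id, y_id, nprocs, ny, nl=4):
    """DEST value for the memory cache of cluster (x_id, y_id)."""
    if nprocs >= (1 << nl):
        raise ValueError("L_ID = NPROCS does not fit in NL bits")
    return (x_id << (ny + nl)) | (y_id << nl) | nprocs


# Example: cluster (1, 2) with 4 processors, NY = 2.
dest_proc3 = l1_dest(1, 2, proc=3, nprocs=4, ny=2)
dest_memc = l2_dest(1, 2, nprocs=4, ny=2)
```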

4. VCI encoding of the transaction on the direct network

All hardware components connected to the direct network respect the VCI/OCP communication interface.

|| VCI Field      || width   ||
|| ADDRESS        || 40 bits ||
|| WDATA, RDATA   || 32 bits ||
|| PLEN           || 8 bits  ||
|| SRCID, RSRCID  || 14 bits ||
|| TRDID, RTRDID  || 4 bits  ||
|| PKTID, RPKTID  || 4 bits  ||
|| RERROR         || 1 bit   ||

The TSAR architecture uses one single bit for the VCI RERROR field, even if the DSPIN infrastructure supports 2 bits for the error field.

There are 8 transaction types on the direct network, encoded through the VCI fields CMD and PKTID. The PKTID MSB bit is ignored (reserved for future use). This redundant encoding makes it possible to reuse, in the TSAR architecture, existing hardware components that do not decode the PKTID field and use only the CMD field.

|| TYPE           || CMD (2 bits) || PKTID (4 bits) || PKTID mnemo         || CMD mnemo       ||
|| READ_DATA_UNC  || 01           || X000           || TYPE_READ_DATA_UNC  || CMD_READ        ||
|| READ_DATA_MISS || 01           || X001           || TYPE_READ_DATA_MISS || CMD_READ        ||
|| READ_INS_UNC   || 01           || X010           || TYPE_READ_INS_UNC   || CMD_READ        ||
|| READ_INS_MISS  || 01           || X011           || TYPE_READ_INS_MISS  || CMD_READ        ||
|| WRITE          || 10           || X100           || TYPE_WRITE          || CMD_WRITE       ||
|| CAS            || 00           || X101           || TYPE_CAS            || CMD_STORE_COND  ||
|| LL             || 11           || X110           || TYPE_LL             || CMD_LOCKED_READ ||
|| SC             || 00           || X111           || TYPE_SC             || CMD_STORE_COND  ||
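
The table can be turned into a small decoder to show how the redundant encoding works. The sketch below is illustrative only: the names `TRANSACTIONS`, `BY_PKTID` and `decode_transaction` are hypothetical, and rejecting a CMD that disagrees with the PKTID is a choice made for the example, not a documented hardware behaviour.

```python
# (CMD, PKTID[2:0]) pairs copied from the transaction table.
TRANSACTIONS = {
    "READ_DATA_UNC":  (0b01, 0b000),
    "READ_DATA_MISS": (0b01, 0b001),
    "READ_INS_UNC":   (0b01, 0b010),
    "READ_INS_MISS":  (0b01, 0b011),
    "WRITE":          (0b10, 0b100),
    "CAS":            (0b00, 0b101),
    "LL":             (0b11, 0b110),
    "SC":             (0b00, 0b111),
}

# The three low PKTID bits alone identify the transaction type.
BY_PKTID = {pktid: (name, cmd) for name, (cmd, pktid) in TRANSACTIONS.items()}


def decode_transaction(cmd, pktid):
    """Return the transaction type name for a (CMD, PKTID) pair."""
    name, expected_cmd = BY_PKTID[pktid & 0b111]  # PKTID MSB is ignored
    if cmd != expected_cmd:
        raise ValueError(f"CMD {cmd:02b} is inconsistent with type {name}")
    return name
```

A component that only decodes CMD still distinguishes reads, writes, locked reads and conditional stores, while CAS and SC can only be told apart through PKTID.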

When a given initiator can send several simultaneous transactions of a given type (such as several simultaneous WRITE transactions), the VCI TRDID field is used to discriminate them. The TRDID field is 4 bits, supporting up to 16 simultaneous transactions for a given initiator.

4.1 VCI READ transaction

  • A VCI READ command packet contains one flit. In case of burst, all addresses must be within the same cache line. The VCI TRDID field is not used by L1 cache, but can be used by multi-channel DMA controllers to transmit the channel index.
  • A VCI READ response packet returns up to 16 flits.

4.2 VCI WRITE transaction

  • A VCI WRITE command packet contains from 1 to 16 flits. In case of burst, all addresses must be within the same cache line. The VCI TRDID field is used by the L1 cache to index its write buffer. It can be used by multi-channel DMA controllers to transmit the channel index.
  • A VCI WRITE response packet contains one single flit.

4.3 VCI LL (Linked Load) transaction

  • This request is only sent by a L1 cache and can only target a memory cache.
  • A VCI LL command packet contains one single flit.
  • A VCI LL response packet contains 2 flits: The first flit contains in the RDATA field a signature returned by the memory cache for this LL reservation. The second flit contains in the RDATA field the data that has been read in the memory cache.

4.4 VCI SC (Store Conditional) transaction

  • This request is only sent by a L1 cache and can only target a memory cache.
  • A VCI SC command packet contains 2 flits. The first flit contains in the WDATA field the signature obtained with the last LL operation at this address. The second flit contains in the WDATA field the data to be written.
  • A VCI SC response packet contains 1 flit. The RDATA field contains 0 (resp. 1) to indicate an SC success (resp. failure).

4.5 VCI CAS (Compare & Swap) transaction

  • This request is only sent by a L1 cache and can only target a memory cache.
  • A VCI CAS command packet contains 2 flits. The first flit contains in the WDATA field the old value of the data to be overwritten. The second flit contains in the WDATA field the new value to be written.
  • A VCI CAS response packet contains 1 flit. The RDATA field contains 0 (resp. 1) to indicate a CAS success (resp. failure).

5. DSPIN packet encoding on the direct network

The VCI command & response packets are translated (actually serialized) to the DSPIN network format by the VCI/DSPIN wrappers. These wrappers are located between the VCI initiator and target components and the DSPIN network. The DSPIN command flit width is 40 bits (including EOP), and the DSPIN response flit width is 33 bits (including EOP). The DSPIN interconnection network uses only the following information to route the DSPIN packets to the proper destination:

  • The EOP flag, defining the last flit of a DSPIN packet.
  • The (NX+NY+NL) MSB bits of the first flit are used to route the packet to the proper destination.
  • In a DSPIN command packet, the first flit LSB bit (BC) must be 0.

The DSPIN format can transport up to 40 bits of VCI ADDRESS, and up to 14 bits of VCI SRCID. If the VCI ADDRESS uses fewer than 40 bits (for example 32 bits), the DSPIN ADDRESS field is left aligned, and the LSB bits of the DSPIN field are filled with "0". If the SRCID field uses fewer than 14 bits (NX < 5 or NY < 5), the SRCID field is left aligned, and the LSB bits of the DSPIN field are filled with "0".
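
Left alignment amounts to a shift by the number of unused bits. A minimal sketch, where `left_align` is a hypothetical helper name:

```python
def left_align(value, used_bits, field_bits):
    """Place a `used_bits` wide value in the MSBs of a `field_bits` wide field.

    The unused LSBs of the field are filled with zeros.
    """
    if used_bits > field_bits:
        raise ValueError("value is wider than the field")
    if not 0 <= value < (1 << used_bits):
        raise ValueError("value does not fit in used_bits")
    return value << (field_bits - used_bits)


# A 32-bit VCI address carried in a 40-bit address field.
aligned_address = left_align(0x80000000, used_bits=32, field_bits=40)

# A 6-bit SRCID carried in the 14-bit SRCID field.
aligned_srcid = left_align(0b110110, used_bits=6, field_bits=14)
```

Because the routing bits are always the most significant bits of the field, the routers can decode them at a fixed position whatever the values of NX and NY.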

5.1 DSPIN Read Command packet format (40 bits)

A single flit VCI Read Command packet (this includes LL packets) is translated to a 2 flits DSPIN Read Command packet :

Flit 0 :

||EOP||----------------ADDRESS-----------------||BC ||
|| 0 ||                (38)                    || 0 ||

Flit 1 :

||EOP||SRCID||CMD||CGT||PLEN||TRDID||PKTID||BE ||res||
|| 1 || (14)||(2)||(2)|| (8)|| (4) || (4) ||(4)||(1)||
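
The two flits above can be assembled by concatenating the fields from MSB to LSB. The following Python sketch does this with a generic packer. The helper names are hypothetical, and the 38-bit ADDRESS field value is taken as an input because this section does not state which VCI address bits it holds.

```python
def pack_fields(fields, flit_bits):
    """Concatenate (value, width) pairs, MSB first, into one flit."""
    flit = 0
    total = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        flit = (flit << width) | value
        total += width
    if total != flit_bits:
        raise ValueError(f"fields add up to {total} bits, not {flit_bits}")
    return flit


def read_command_flits(address_field, srcid, cmd, cgt, plen, trdid, pktid, be):
    """Build the two 40-bit flits of a DSPIN Read Command packet."""
    flit0 = pack_fields([(0, 1), (address_field, 38), (0, 1)], 40)
    flit1 = pack_fields([(1, 1), (srcid, 14), (cmd, 2), (cgt, 2), (plen, 8),
                         (trdid, 4), (pktid, 4), (be, 4), (0, 1)], 40)
    return [flit0, flit1]


# Example: READ_DATA_MISS (CMD = 01, PKTID = 0001) of 4 bytes.
flits = read_command_flits(address_field=1, srcid=0x3FFF, cmd=0b01, cgt=0,
                           plen=4, trdid=0, pktid=0b0001, be=0xF)
```

The width check in `pack_fields` catches a field list that does not add up to the 40-bit flit.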

5.2 DSPIN write Command packet format (40 bits)

A N flits VCI Write Command packet (this includes SC and CAS packets) is translated to a N+2 flits DSPIN Write Command packet :

Flit 0 :

||EOP||-----------------ADDRESS-----------------||BC||
|| 0 ||                  (38)                   || 0||

Flit 1 :

||EOP||SRCID||CMD||CGT||PLEN||TRDID||PKTID||--res---||
|| 0 || (14)||(2)||(2)|| (8)|| (4) || (4) ||  (5)   ||

Flit N :

||EOP||-res-||BE ||------------WDATA----------------||
|| 1 || (3) ||(4)||            (32)                 ||

5.3 DSPIN single flit Response packet format (33 bits)

A single flit DSPIN Response packet is built for the following VCI response packets:

  • a single flit VCI response packet to a WRITE command (no data transmitted),
  • a single flit VCI response packet to a READ or LL command, where the RDATA field has value 0,
  • a single flit VCI response packet to a SC or CAS command, where the RDATA field has value 0.

Flit 0 :

||EOP||RSRCID||RERROR||RTRDID||RPKTID||res||BC||
|| 1 || (14) || (2)  || (4)  || (4)  ||(7)|| 0||

5.4 DSPIN multi-flit Response packet format (33 bits)

For all other VCI response packets (multi-flits VCI response packet, or non-zero RDATA value) a multi-flits DSPIN response packet is built : a N flits VCI response packet is translated to a N+1 flits DSPIN response packet.

Flit 0 :

||EOP||RSRCID||RERROR||RTRDID||RPKTID||res||BC||
|| 0 || (14) || (2)  || (4)  || (4)  ||(7)|| 0||

Flit 1 :

||EOP||---------------RDATA-------------------||
|| 1 ||                (32)                   ||
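
The choice between the single flit and the multi-flit response formats can be sketched as follows. This is an illustration, not the wrapper implementation: `response_flits` is a hypothetical name, and a WRITE response is modelled here as a single RDATA word equal to 0.

```python
EOP = 1 << 32  # EOP is the MSB of a 33-bit response flit


def response_flits(rsrcid, rerror, rtrdid, rpktid, rdata):
    """Serialize a VCI response into 33-bit DSPIN flits.

    `rdata` holds one 32-bit RDATA word per VCI response flit.
    """
    if not rdata:
        raise ValueError("a VCI response has at least one flit")
    for value, width in ((rsrcid, 14), (rerror, 2), (rtrdid, 4), (rpktid, 4)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
    if any(not 0 <= word < (1 << 32) for word in rdata):
        raise ValueError("RDATA words must fit in 32 bits")
    # Header layout below EOP: RSRCID(14) RERROR(2) RTRDID(4) RPKTID(4) res(7) BC(1)
    header = (rsrcid << 18) | (rerror << 16) | (rtrdid << 12) | (rpktid << 8)
    if len(rdata) == 1 and rdata[0] == 0:
        return [EOP | header]          # single flit format
    flits = [header] + list(rdata)     # N VCI flits become N+1 DSPIN flits
    flits[-1] |= EOP                   # EOP marks the last flit only
    return flits
```

A successful SC (RDATA = 0) therefore costs one flit, while a failed one (RDATA = 1) costs two.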

6. DSPIN packet encoding on the coherence network

The coherence transactions are directly transmitted to the coherence network by the L1 caches and L2 caches in DSPIN format. The M2P (L2-L1) network uses 40 bits flits (including EOP). The P2M (L1-L2) network uses 33 bits flits (including EOP). The CLACK (L2-L1) network uses 40 bits flits (including EOP). Broadcast commands are only used on the M2P network, and use the BC bit in the first flit.

  • Other than BROADCAST, there are 4 packet types on the M2P network (2 bits encoding)
|| TYPE        || BIT1 || BIT0 ||
|| UPDATE_DATA || 0    || 0    ||
|| UPDATE_INS  || 0    || 1    ||
|| INVAL_DATA  || 1    || 0    ||
|| INVAL_INS   || 1    || 1    ||
  • There are 2 packet types on the CLACK network (1 bit encoding)
|| TYPE       || BIT0 ||
|| CLACK_DATA || 0    ||
|| CLACK_INS  || 1    ||
  • There are 3 packet types on the P2M network (2 bits encoding)
|| TYPE         || BIT1 || BIT0 ||
|| MULTI-ACK    || 0    || *    ||
|| CLEANUP_DATA || 1    || 0    ||
|| CLEANUP_INS  || 1    || 1    ||

6.1 DSPIN MULTI-UPDATE packet format (M2P : 40 bits)

This DSPIN packet contains 2+N flits.

  • The DEST field contains the target L1 cache identifier (SRCID).
  • The SOURCE field contains the source L2 cache identifier (SRCID).
  • The TRDID field contains the UPDATE Table index.
  • The WORD field contains the first modified word index.
  • The NLINE field contains the cache line identifier (34 bits).

Flit 0 :

||EOP||----DEST----||-res-||--SOURCE--||TRDID||TYPE||BC||
|| 0 ||    (14)    || (4) ||   (14)   || (4) ||(2) ||0 ||

Flit 1 :

||EOP||res||WORD||---------------NLINE-----------------||
|| 0 ||(1)|| (4)||                (34)                 ||

Flit 2 :

||EOP||-res-||-BE-||-------------WDATA-----------------||
|| 0 || (3) ||(4) ||             (32)                  ||

Flit N :

||EOP||-res-||-BE-||-------------WDATA-----------------||
|| 1 || (3) ||(4) ||             (32)                  ||

6.2 DSPIN MULTI-INVAL packet format (M2P : 40 bits)

This DSPIN packet contains 2 flits.

  • The DEST field contains the target L1 cache identifier (SRCID).
  • The SOURCE field contains the source L2 cache identifier (SRCID).
  • The TRDID field contains the INVALIDATE Table index.
  • The NLINE field contains the cache line identifier (34 bits).

Flit 0 :

||EOP||----DEST----||-res-||--SOURCE--||TRDID||TYPE||BC||
|| 0 ||    (14)    || (4) ||   (14)   || (4) || (2)||0 ||

Flit 1 :

||EOP||---res---||--------------NLINE------------------||
|| 1 ||   (5)   ||              (34)                   ||

6.3 DSPIN BROADCAST packet format (M2P : 40 bits)

This DSPIN packet contains 2 flits.

  • The SOURCE field contains the source L2 cache identifier (right justified SRCID).
  • The XMIN,XMAX, YMIN, YMAX fields define the limits of the broadcast.
  • The NLINE field contains the cache line identifier (34 bits).

Flit 0 :

||EOP||XMIN||XMAX||YMIN||YMAX||---SOURCE---||-res--||BC||
|| 0 ||(5) ||(5) ||(5) ||(5) ||    (14)    || (4)  ||1 ||

Flit 1 :

||EOP||---res---||------------NLINE--------------------||
|| 1 ||   (5)   ||            (34)                     ||
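
To illustrate how the first flit of a BROADCAST packet can be interpreted, the sketch below extracts the four limits and tests whether a cluster lies inside the area. The function names are hypothetical, and treating the limits as inclusive is an assumption that this page does not state.

```python
def parse_broadcast_flit0(flit):
    """Extract the fields of the first flit of a 40-bit BROADCAST packet."""
    if flit & 1 != 1:
        raise ValueError("BC bit is not set: not a broadcast packet")
    return {
        "xmin": (flit >> 34) & 0x1F,
        "xmax": (flit >> 29) & 0x1F,
        "ymin": (flit >> 24) & 0x1F,
        "ymax": (flit >> 19) & 0x1F,
        "source": (flit >> 5) & 0x3FFF,
    }


def cluster_in_area(x, y, area):
    """True if cluster (x, y) is inside the broadcast area (inclusive limits assumed)."""
    return area["xmin"] <= x <= area["xmax"] and area["ymin"] <= y <= area["ymax"]


# Example flit: X in [0, 3], Y in [1, 2], SOURCE = 0x150, BC = 1.
example_flit = (0 << 34) | (3 << 29) | (1 << 24) | (2 << 19) | (0x150 << 5) | 1
area = parse_broadcast_flit0(example_flit)
```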

6.4 DSPIN CLACK packet format (CLACK : 40 bits)

This DSPIN packet contains one flit.

  • The DEST field contains the target L1 cache identifier (SRCID).
  • The SET field contains the cleared set index.
  • The WAY field contains the cleared way index.

Flit 0 :

||EOP||---DEST---||--res-||----SET----||-WAY-||TYPE||BC||
|| 1 ||   (14)   || (15) ||    (6)    || (2) ||(1) ||0 ||

6.5 DSPIN CLEANUP packet format (P2M : 33 bits)

This DSPIN packet contains 2 flits.

  • The DEST field contains the target (X,Y) cluster coordinates.
  • The SOURCE field contains the source L1 cache identifier (SRCID).
  • The NL32 field contains the 32 LSB bits of the cache line index.
  • The NL2 field contains the 2 MSB bits of the cache line index.
  • The WAY field contains the cleared way index.

Flit 0 :

||EOP||--DEST--||---SOURCE---||NL2||res||WAY||TYPE||BC||
|| 0 ||  (10)  ||    (14)    ||(2)||(1)||(2)||(2) ||0 ||

Flit 1 :

||EOP||--------------------NL32-----------------------||
|| 1 ||                    (32)                       ||
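
The split of the 34-bit cache line index between the two CLEANUP flits can be sketched as follows. The function name and argument names are hypothetical; the TYPE codes are the CLEANUP_DATA and CLEANUP_INS values listed for the P2M network.

```python
CLEANUP_DATA = 0b10
CLEANUP_INS = 0b11


def cleanup_flits(dest_xy, source, nline, way, is_instruction):
    """Build the two 33-bit flits of a CLEANUP packet."""
    for value, width in ((dest_xy, 10), (source, 14), (nline, 34), (way, 2)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
    nl2 = nline >> 32            # 2 MSB bits of the cache line index
    nl32 = nline & 0xFFFFFFFF    # 32 LSB bits of the cache line index
    ptype = CLEANUP_INS if is_instruction else CLEANUP_DATA
    # Flit 0 below EOP: DEST(10) SOURCE(14) NL2(2) res(1) WAY(2) TYPE(2) BC(1)
    flit0 = (dest_xy << 22) | (source << 8) | (nl2 << 6) | (way << 3) | (ptype << 1)
    flit1 = (1 << 32) | nl32     # EOP = 1 on the last flit
    return [flit0, flit1]


def cleanup_nline(flits):
    """Rebuild the 34-bit cache line index from the two flits."""
    return (((flits[0] >> 6) & 0b11) << 32) | (flits[1] & 0xFFFFFFFF)


packet = cleanup_flits(dest_xy=0x3FF, source=0x3FFF, nline=0x2DEADBEEF,
                       way=1, is_instruction=False)
```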

6.6 DSPIN MULTI-ACK packet format (P2M : 33 bits)

This DSPIN packet contains one flit.

  • The DEST field contains the target L2 cache identifier (SRCID).
  • The UPDTID field contains the UPDATE Table index.
  • The TYPE field contains the MULTI-ACK code.

Flit 0 :

||EOP||--DEST--||---SOURCE---||res||-UPDTID-||TYPE||BC||
|| 1 ||  (10)  ||    (14)    ||(1)||  (4)   ||(2) ||0 ||

7. Initiators & targets on the external network

The L3 cache is implemented as a set of physical memory banks, with one single memory bank per cluster.

The only targets on the external network are the physical memory banks (one target per cluster). The initiators are the memory cache (one initiator per cluster) and the I/O bridge (one extra initiator in the I/O cluster). The I/O bridge acts as a multiplexer for the various DMA commands sent by the external peripherals.

As for the direct network, each initiator is identified by three indexes X_ID (cluster X-coordinate), Y_ID (cluster Y-coordinate), and L_ID (local index). The L_ID value 0 is reserved for the memory cache. L_ID values larger than 0 must be allocated to the various external peripherals.

  • The only supported transactions are READ and WRITE burst transactions,
  • All addresses must be aligned on 32 bits word boundary,
  • All addresses in the same burst must be in the same cache line (64 bytes),
  • Byte operations are not supported.

The external network is implemented as a 2D mesh, using DSPIN routers with a 65 bits flit width, where the DATA field in a flit is 64 bits. The memory cache and I/O bridge components are directly connected to the external network through DSPIN_65 interfaces.

The various fields in the DSPIN_65 command and response packets are defined below:

  • ADDRESS: Only the 38 MSB bits of the physical 40 bits address are transported in a DSPIN command packet.
  • WLEN: This field defines the number of 32 bits words in a burst (WLEN = (PLEN/4) - 1).
  • SRCID: Initiator identifier (X_ID | Y_ID | L_ID) coded on 14 bits.
  • TRDID: Transaction identifier for simultaneous transactions from a given initiator.
  • CMD: Transaction type ( 00 == READ / 01 == WRITE / 1* == reserved )
  • ERROR: Transaction status ( 00 == Read Success / 01 == Read Error / 10 == Write Success / 11 == Write Error)

8. DSPIN_65 packet encoding on the external network

The DSPIN_65 network uses only the following information to route the DSPIN packets to the proper destination:

  • The EOP flag, defining the last flit of a DSPIN packet.
  • The 14 MSB bits of the first field are used to route the DSPIN packet.

8.1 DSPIN Read Command packet format (65 bits)

A single flit VCI Read Command packet is translated to a single flit DSPIN Read Command packet.

Flit 0 :

||EOP||----------------ADDRESS---------------||res||WLEN||CMD||SRCID||TRDID||
|| 1 ||                (38)                  ||(2)|| (4)||(2)|| (14)|| (4) ||
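
As an illustration, the single 65-bit read command flit can be assembled from a VCI request as shown below. The function name is hypothetical; the address truncation to 38 MSB bits, the WLEN formula and the READ code 00 follow the field definitions of section 7.

```python
CMD_READ = 0b00  # READ code on the external network


def dspin65_read_command(address, plen, srcid, trdid):
    """Build the single 65-bit flit of a DSPIN_65 Read Command packet."""
    if not 0 <= address < (1 << 40) or address % 4:
        raise ValueError("address must be a word-aligned 40-bit value")
    if plen <= 0 or plen % 4:
        raise ValueError("PLEN must be a positive multiple of 4 bytes")
    wlen = plen // 4 - 1          # number of 32-bit words, minus one
    if wlen >= (1 << 4):
        raise ValueError("burst is longer than the 4-bit WLEN field allows")
    if not 0 <= srcid < (1 << 14) or not 0 <= trdid < (1 << 4):
        raise ValueError("SRCID or TRDID out of range")
    addr38 = address >> 2         # only the 38 MSB bits are transported
    # Layout below EOP: ADDRESS(38) res(2) WLEN(4) CMD(2) SRCID(14) TRDID(4)
    return ((1 << 64) | (addr38 << 26) | (wlen << 20) | (CMD_READ << 18)
            | (srcid << 4) | trdid)


# Example: read of a full 64-byte cache line.
flit = dspin65_read_command(address=0x1234567840, plen=64, srcid=0x0041, trdid=3)
```

A full cache line read gives WLEN = 15, the largest value the 4-bit field can hold.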

8.2 DSPIN write Command packet format (65 bits)

A N flits VCI Write Command packet is translated to a (N+1) flits DSPIN Write Command packet. As a DATA flit contains 8 bytes

Flit 0 :

||EOP||----------------ADDRESS---------------||res||WLEN||CMD||SRCID||TRDID||
|| 0 ||                (38)                  ||(2)|| (4)||(2)||(14) || (4) ||

Flit N :

||EOP||----------------------------WDATA-----------------------------------||
|| 1 ||                            (64)                                    ||

8.3 DSPIN Read Response packet format (65 bits)

A N flits VCI Read Response packet is translated to a (N+1) flits DSPIN Read Response packet.

Flit 0 :

||EOP||RSRCID||--------------- res -------------------------||ERROR||RTRDID||
|| 0 || (14) ||                (44)                         || (2) || (4)  ||

Flit N :

||EOP||-----------------------------RDATA----------------------------------||
|| 1 ||                             (64)                                   ||

8.4 DSPIN Write Response packet format (65 bits)

A single flit VCI Write Response packet is translated to a single flit DSPIN Write Response packet.

Flit 0 :

||EOP||RSRCID||--------------- res ------------------------||RERROR||RTRDID||
|| 1 || (14) ||                (44)                        || (2)  || (4)  ||