Communication Infrastructure
- 1. The 3 interconnection networks
- 2. VCI initiators & targets indexing
- 3. VCI encoding of the various transaction types on the direct network
- 4. DSPIN encoding of the various transaction types on the direct network
- 5. DSPIN encoding of the coherence transactions
- 5.1 DSPIN MULTI-UPDATE packet format (L2-to-L1 : 40 bits)
- 5.2 DSPIN MULTI-INVAL packet format (L2-to-L1 : 40 bits)
- 5.3 DSPIN BROADCAST packet format (L2-to-L1 : 40 bits)
- 5.4 DSPIN CLEANUP-ACK packet format (L2-to-L1 : 40 bits)
- 5.5 DSPIN CLEANUP packet format (L1-to-L2 : 33 bits)
- 5.6 DSPIN MULTI-ACK packet format (L1-to-L2 : 33 bits)
- 6. External Network
Communication Infrastructure
1. The 3 interconnection networks
The TSAR architecture defines three logically independent, VCI-compliant networks, which are fully separated to prevent deadlocks :
- The Direct Network implements the 40-bit TSAR physical address space that is visible to the software. It transports the direct READ, WRITE, LL, SC and CAS transactions from any VCI initiator (typically an L1 cache controller or another hardware coprocessor with a DMA capability) to any VCI target (typically a memory cache controller, or a memory mapped peripheral).
- The Coherence Network implements a separate address space, used to transport the coherence transactions between memory cache controllers and L1 cache controllers. This address space is not visible to the software.
- The External Network implements a 34-bit physical address space. This network transports the PUT and GET transactions from the memory cache controller to the external RAM controller, in case of a MISS or a cache line replacement in the memory cache. This address space is not visible to the software.
2. VCI initiators & targets indexing
A given hardware component can have several VCI ports. For example the L1 cache has three VCI ports : one initiator port to the direct network, one initiator port to the coherence network, and one target port on the coherence network. Each VCI port can have a different identifier that is defined by three indexes :
- X_ID is the cluster X-coordinate.
- Y_ID is the cluster Y-coordinate.
- L_ID is the local index inside the cluster.
A hardware component that has several VCI ports can have several different values for the L_ID local index.
The X_ID, Y_ID and L_ID are coded on NX, NY, NL bits respectively. The NX, NY and NL parameters are global for a given instance of the TSAR architecture. NX & NY cannot be larger than 5 (no more than 1024 clusters), but can be smaller, if the number of clusters is smaller than 1024. NL is equal to 4 (no more than 16 target ports or 16 initiator ports per cluster).
In order to simplify the hardware implementation of the memory coherence protocol, the L_ID values are standardized on the coherence network, and the same value is used for an initiator port and for a target port: If the number of processors per cluster is NPROCS, the processor L_ID value is between 0 and (NPROCS-1). The memory cache L_ID is equal to NPROCS.
2.1 Target identification
The target identification is required to route a command packet. For both the direct and coherence networks, a VCI target is identified by the (NX + NY + NLADR) most significant bits of the VCI ADDRESS field :
X | Y | LADR | OFFSET |
NX bits | NY bits | NLADR bits | 40-NX-NY-NLADR |
- According to the NUMA characteristics of the TSAR architecture, there is no transcoding of the X & Y fields, that directly define the target cluster coordinates (X_INDEX, Y_INDEX).
- The network decodes the LADR field to obtain the target L_ID, using a local routing table (implemented as a wired decoder in each local interconnect controller). The local routing tables and the number of bits NLADR to be decoded depend on the cluster.
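As an illustration of the address layout above, the following Python sketch splits a 40-bit VCI address into its X, Y, LADR and OFFSET fields. The function name `decode_target` and its parameters are hypothetical and not part of TSAR. The LADR-to-L_ID routing table is cluster dependent and is deliberately not modeled here.

```python
ADDRESS_WIDTH = 40  # width of the VCI ADDRESS field on the direct network


def decode_target(address, nx, ny, nladr):
    """Split a VCI address into (x, y, ladr, offset), most significant field first.

    Hypothetical helper: nx, ny and nladr are the field widths in bits.
    """
    offset_bits = ADDRESS_WIDTH - nx - ny - nladr
    if offset_bits < 0 or not 0 <= address < (1 << ADDRESS_WIDTH):
        raise ValueError("address or field widths out of range")
    offset = address & ((1 << offset_bits) - 1)
    ladr = (address >> offset_bits) & ((1 << nladr) - 1)
    y = (address >> (offset_bits + nladr)) & ((1 << ny) - 1)
    x = (address >> (offset_bits + nladr + ny)) & ((1 << nx) - 1)
    return x, y, ladr, offset
```

With NX = NY = 2 and NLADR = 4, for instance, the 8 most significant address bits select the target and the remaining 32 bits form the offset.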
2.2 Initiator identification
The initiator identification is required to route a response packet. For both the direct and coherence networks, a VCI initiator is identified by the VCI SRCID & RSRCID fields (NX + NY + NL bits) :
X_ID | Y_ID | L_ID |
NX bits | NY bits | NL bits |
Therefore, the total SRCID width cannot be larger than 14 bits. It can use less than 14 bits when the number of clusters is smaller than 1024.
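The SRCID layout can be sketched in Python as follows. This is a minimal illustration, assuming the X_ID, Y_ID and L_ID fields are simply concatenated from MSB to LSB as in the table above; the helper names `make_srcid` and `split_srcid` are hypothetical.

```python
NL = 4  # L_ID width in bits, as stated in the text


def make_srcid(x_id, y_id, l_id, nx, ny):
    """Concatenate X_ID | Y_ID | L_ID into a SRCID of (nx + ny + NL) bits."""
    for value, width in ((x_id, nx), (y_id, ny), (l_id, NL)):
        if not 0 <= value < (1 << width):
            raise ValueError("index does not fit in its field")
    return (x_id << (ny + NL)) | (y_id << NL) | l_id


def split_srcid(srcid, nx, ny):
    """Inverse of make_srcid: return (x_id, y_id, l_id)."""
    l_id = srcid & ((1 << NL) - 1)
    y_id = (srcid >> NL) & ((1 << ny) - 1)
    x_id = (srcid >> (ny + NL)) & ((1 << nx) - 1)
    return x_id, y_id, l_id
```

With NX = NY = 5 the largest identifier is 0x3FFF, which matches the 14-bit upper bound.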
3. VCI encoding of the various transaction types on the direct network
All hardware components connected to the direct network respect the VCI/OCP communication interface.
VCI Field | width |
ADDRESS | 40 bits |
WDATA , RDATA | 32 bits |
PLEN | 8 bits |
SRCID, RSRCID | 14 bits |
TRDID, RTRDID | 4 bits |
PKTID, RPKTID | 4 bits |
RERROR | 1 bit |
The TSAR architecture uses one single bit for the VCI RERROR field, even if the DSPIN infrastructure supports 2 bits for the error field.
There are 8 transaction types on the direct network, which are encoded through the VCI fields CMD and PKTID. The PKTID field in TSAR is 4 bits long, but the MSB is ignored (reserved for future use).
TYPE | CMD (2 bits) | PKTID (4 bits) | PKTID mnemo | CMD mnemo |
READ_DATA_UNC | 01 | X000 | TYPE_READ_DATA_UNC | CMD_READ |
READ_DATA_MISS | 01 | X001 | TYPE_READ_DATA_MISS | CMD_READ |
READ_INS_UNC | 01 | X010 | TYPE_READ_INS_UNC | CMD_READ |
READ_INS_MISS | 01 | X011 | TYPE_READ_INS_MISS | CMD_READ |
WRITE | 10 | X100 | TYPE_WRITE | CMD_WRITE |
CAS | 00 | X101 | TYPE_CAS | CMD_STORE_COND |
LL | 11 | X110 | TYPE_LL | CMD_LOCKED_READ |
SC | 00 | X111 | TYPE_SC | CMD_STORE_COND |
Remarks on the PKTID field encoding :
- for a TYPE_READ, bit 0 is set (resp. not set) for a miss (resp. uncached) request
- for a TYPE_READ, bit 1 is set (resp. not set) for an instruction (resp. data) request
- bit 2 can be used to check for a TYPE_READ (bit 2 = 0)
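The table and the remarks above can be transcribed into a small Python decoder. The constant values come from the table; the function names are hypothetical and only illustrate how the three low-order PKTID bits can be tested.

```python
# CMD encodings, taken from the table above
CMD_STORE_COND = 0b00
CMD_READ = 0b01
CMD_WRITE = 0b10
CMD_LOCKED_READ = 0b11

# (type name, expected CMD), indexed by the 3 low-order PKTID bits
TRANSACTIONS = {
    0b000: ("READ_DATA_UNC", CMD_READ),
    0b001: ("READ_DATA_MISS", CMD_READ),
    0b010: ("READ_INS_UNC", CMD_READ),
    0b011: ("READ_INS_MISS", CMD_READ),
    0b100: ("WRITE", CMD_WRITE),
    0b101: ("CAS", CMD_STORE_COND),
    0b110: ("LL", CMD_LOCKED_READ),
    0b111: ("SC", CMD_STORE_COND),
}


def decode_pktid(pktid):
    """Return (type name, expected CMD); the PKTID MSB is ignored."""
    return TRANSACTIONS[pktid & 0b111]


def is_read(pktid):
    return (pktid & 0b100) == 0  # bit 2 = 0 for the four TYPE_READ codes


def is_miss(pktid):
    return is_read(pktid) and bool(pktid & 0b001)  # bit 0: miss / uncached


def is_instruction(pktid):
    return is_read(pktid) and bool(pktid & 0b010)  # bit 1: instruction / data
```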
When a given initiator can send several simultaneous transactions of a given type (such as several simultaneous WRITE transactions), the VCI TRDID field is used to discriminate them. The TRDID field is 4 bits, supporting up to 16 simultaneous transactions for a given initiator.
3.1 VCI READ transaction
A VCI READ command packet contains one flit. In case of burst, all addresses must be within the same cache line.
- The VCI CMD field must be set to CMD_READ.
- The VCI TRDID field is not used by the L1 cache, but can be used by multi-channel DMA controllers to transmit the channel index.
- The VCI PKTID field can be any of the 4 TYPE_READ_* of the previous table.
A VCI READ response packet returns either
- Up to 16 flits containing the uncached data in the RDATA field (for a PKTID = TYPE_READ_*_UNC).
- Exactly 16 flits containing one word per flit in the RDATA field (for a PKTID = TYPE_READ_*_MISS).
3.2 VCI WRITE transaction
A VCI WRITE command packet contains from 1 to 16 flits. In case of burst, all addresses must be within the same cache line.
- The VCI CMD field must be set to CMD_WRITE.
- The VCI TRDID field is used by the L1 cache to index its write buffer. It can be used by multi-channel DMA controllers to transmit the channel index.
- The VCI PKTID field must be TYPE_WRITE.
A VCI WRITE response packet always returns a single flit with a 0 value in the RDATA field.
3.3 VCI LL (Linked Load) transaction
A VCI LL (Linked Load) command packet contains one single flit. (N.B.: this request is only sent by an L1 cache and can only target a memory cache)
- The VCI CMD field must be set to CMD_LOCKED_READ.
- The VCI TRDID field is not used by the L1 cache.
- The VCI PKTID field must be TYPE_LL.
A VCI LL (Linked Load) response packet contains 2 flits :
- The first flit contains in the RDATA field a signature returned by the memory cache for this LL reservation.
- The second flit contains in the RDATA field the data that has been read in the memory cache.
3.4 VCI SC (Store Conditional) transaction
A VCI SC (Store Conditional) command packet contains 2 flits. (N.B.: this request is only sent by an L1 cache and can only target a memory cache)
- The VCI CMD field must be set to CMD_STORE_COND.
- The VCI TRDID field is not used by the L1 cache.
- The VCI PKTID field must be TYPE_SC.
- The first flit contains in the WDATA field the signature obtained with the last LL operation at this address.
- The second flit contains in the WDATA field the data to be written.
A VCI SC (Store Conditional) response packet contains 1 flit.
- The RDATA field contains 0 (resp. 1) to indicate an SC success (resp. failure).
3.5 VCI CAS (Compare & Swap) transaction
A VCI CAS (Compare & Swap) command packet contains 2 flits. (N.B.: this request is only sent by an L1 cache and can only target a memory cache)
- The VCI CMD field must be set to CMD_STORE_COND.
- The VCI TRDID field is not used by the L1 cache.
- The VCI PKTID field must be TYPE_CAS.
- The first flit contains in the WDATA field the old value of the data to be overwritten.
- The second flit contains in the WDATA field the new value to be written.
A VCI CAS (Compare & Swap) response packet contains 1 flit.
- The RDATA field contains 0 (resp. 1) to indicate a CAS success (resp. failure).
4. DSPIN encoding of the various transaction types on the direct network
The VCI command & response packets are translated (actually serialized) to a more convenient DSPIN network format by the VCI/RING wrappers (in platforms using the RING local interconnect) or by the VCI/DSPIN wrappers (in platforms using an XBAR local interconnect). These wrappers are located between the VCI initiator and target components and the DSPIN network. DSPIN command flits are 40 bits wide, and DSPIN response flits are 33 bits wide. The DSPIN interconnection network uses only the following information to route DSPIN packets to the proper destination:
- The EOP flag, defining the last flit of a DSPIN packet.
- The LSB bit of the first flit is the BC flag, defining a DSPIN broadcast packet.
- For a non broadcast command packet (BC = 0), the (NX+NY+NL) MSB bits of the first field are used to route the packet to the proper destination.
- For a broadcast packet (BC = 1), the XMIN, XMAX, YMIN, YMAX fields (5 bits each) are used by the network to limit the broadcast.
The DSPIN format can transport a 40-bit VCI ADDRESS and a 14-bit VCI SRCID. If the VCI ADDRESS uses less than 40 bits (for example 32 bits), the DSPIN ADDRESS field is left aligned, and the LSB bits of the DSPIN field are completed with "0". If the SRCID field uses less than 14 bits (NX < 5 or NY < 5), the SRCID field is left aligned, and the LSB bits of the DSPIN field are completed with "0".
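The left-alignment rule amounts to a shift by the number of unused bits. A minimal Python sketch, where the helper name `left_align` and its parameter names are hypothetical:

```python
def left_align(value, used_bits, field_bits):
    """Place a used_bits-wide value in the MSBs of a field_bits-wide field.

    The (field_bits - used_bits) low-order bits are filled with zeros.
    """
    if used_bits > field_bits or not 0 <= value < (1 << used_bits):
        raise ValueError("value does not fit in the field")
    return value << (field_bits - used_bits)
```

For example, an 8-bit SRCID (NX = NY = 2, NL = 4) is shifted left by 6 bits to fill the 14-bit DSPIN SRCID field, and a 32-bit address is shifted left by 8 bits to occupy the upper part of a 40-bit address.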
The DSPIN packets formats are defined below:
4.1 DSPIN Read Command packet format (40 bits)
A single flit VCI Read Command packet (this includes LL packets) is translated to a 2 flits DSPIN Read Command packet :
Flit 0 :
EOP | ----------------ADDRESS-------------------- | BC |
0 | (38) | 0 |
Flit 1 :
EOP | SRCID | CMD | CGT | PLEN | TRDID | PKTID | BE | res |
1 | (14) | (2) | (2) | (8) | (4) | (4) | (4) | (1) |
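The layout of flit 1 can be sketched in Python as follows, assuming the fields are listed from MSB (left) to LSB (right), which is consistent with BC being the LSB of the first flit. The helpers `pack` and `read_cmd_flit1` are hypothetical names. Flit 0 is left out because the mapping of the VCI address onto the 38-bit ADDRESS field is not detailed here.

```python
def pack(fields):
    """Concatenate (value, width) pairs, most significant field first."""
    word = 0
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in %d bits" % width)
        word = (word << width) | value
    return word


def read_cmd_flit1(srcid, cmd, plen, trdid, pktid, be, cgt=0):
    """Build the second (and last) flit of a DSPIN Read Command packet."""
    return pack([
        (1, 1),       # EOP: last flit of the packet
        (srcid, 14),
        (cmd, 2),
        (cgt, 2),
        (plen, 8),
        (trdid, 4),
        (pktid, 4),
        (be, 4),
        (0, 1),       # reserved
    ])
```

The field widths add up to 40 bits, the DSPIN command flit width.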
4.2 DSPIN Write Command packet format (40 bits)
A N flits VCI Write Command packet (this includes SC packets) is translated to a N+2 flits DSPIN Write Command packet :
Flit 0 :
EOP | ----------------ADDRESS-------------------- | BC |
0 | (38) | 0 |
Flit 1 :
EOP | SRCID | CMD | CGT | PLEN | TRDID | PKTID | BE | res |
0 | (14) | (2) | (2) | (8) | (4) | (4) | (4) | (1) |
Flit N :
EOP | -res- | BE | --------------WDATA--------------- |
1 | (3) | (4) | (32) |
4.3 DSPIN single flit Response packet format (33 bits)
A single flit DSPIN Response packet is built for the following VCI response packets:
- a single flit VCI response packet to a WRITE command (no data transmitted),
- a single flit VCI response packet to a READ or LL command, where the RDATA field has value 0,
- a single flit VCI response packet to a SC or CAS command, where the RDATA field has value 0.
Flit 0 :
EOP | RSRCID | RERROR | RTRDID | RPKTID | res | BC |
1 | (14) | (2) | (4) | (4) | (7) | 0 |
4.4 DSPIN multi-flit Response packet format (33 bits)
For all other VCI response packets (multi-flits VCI response packet, or non-zero RDATA value) a multi-flits DSPIN response packet is built : a N flits VCI response packet is translated to a N+1 flits DSPIN response packet.
Flit 0 :
EOP | RSRCID | RERROR | RTRDID | RPKTID | res | BC |
0 | (14) | (2) | (4) | (4) | (7) | 0 |
Flit 1 :
EOP | ---------------RDATA------------------------ |
1 | (32) |
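The choice between the single-flit and multi-flit response formats of sections 4.3 and 4.4 can be sketched as follows. This is only an illustration under the assumption that fields are packed from MSB to LSB in the order shown; `pack` and `dspin_response` are hypothetical names, and `rdata` stands for the list of RDATA words of the VCI response (one per VCI flit).

```python
def pack(fields):
    """Concatenate (value, width) pairs, most significant field first."""
    word = 0
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in %d bits" % width)
        word = (word << width) | value
    return word


def dspin_response(rsrcid, rerror, rtrdid, rpktid, rdata):
    """Serialize a VCI response into a list of 33-bit DSPIN flits."""
    if not rdata:
        raise ValueError("a VCI response has at least one flit")
    # Single-flit format: one VCI flit whose RDATA value is 0
    single = len(rdata) == 1 and rdata[0] == 0
    header = pack([
        (1 if single else 0, 1),  # EOP
        (rsrcid, 14),
        (rerror, 2),
        (rtrdid, 4),
        (rpktid, 4),
        (0, 7),                   # reserved
        (0, 1),                   # BC
    ])
    flits = [header]
    if not single:
        for i, word in enumerate(rdata):
            eop = 1 if i == len(rdata) - 1 else 0
            flits.append(pack([(eop, 1), (word, 32)]))
    return flits
```

A successful SC (RDATA = 0) therefore produces one DSPIN flit, while a failed SC (RDATA = 1) produces two.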
5. DSPIN encoding of the coherence transactions
The coherence transactions are directly transmitted to the coherence network by the L1 caches and L2 caches in DSPIN format. The L2-to-L1 network uses 40-bit flits. The L1-to-L2 network uses 33-bit flits. There are 4 packet types from L2 to L1, and 2 packet types from L1 to L2.
5.1 DSPIN MULTI-UPDATE packet format (L2-to-L1 : 40 bits)
This DSPIN packet contains 2+N flits.
- The DEST field contains the target L1 cache identifier (SRCID).
- The SOURCE field contains the source L2 cache identifier (SRCID).
- The UPTID field contains the UPDATE Table index.
- The WORD field contains the first modified word index.
- The NLINE field contains the cache line identifier (34 bits).
Flit 0 :
EOP | ---DEST--- | -res- | --SOURCE-- | UPTID | TYPE | BC |
0 | (14) | (3) | (14) | (4) | (3) | 0 |
Flit 1 :
EOP | res | WORD | ---------------NLINE----------------- |
0 | (1) | (4) | (34) |
Flit 2 :
EOP | -res- | -BE- | -------------WDATA----------------- |
0 | (3) | (4) | (32) |
Flit N+1 :
EOP | -res- | -BE- | -------------WDATA----------------- |
1 | (3) | (4) | (32) |
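The 2+N flit structure of a MULTI-UPDATE packet can be sketched as follows, assuming MSB-first field order. The numeric TYPE code is not given in this section, so it is passed in as the `type_code` parameter; `pack` and `multi_update` are hypothetical names.

```python
def pack(fields):
    """Concatenate (value, width) pairs, most significant field first."""
    word = 0
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in %d bits" % width)
        word = (word << width) | value
    return word


def multi_update(dest, source, uptid, type_code, word, nline, data):
    """Build the 2+N flits (40 bits each); data is a list of (be, wdata) pairs."""
    if not data:
        raise ValueError("at least one data flit is expected")
    flits = [
        pack([(0, 1), (dest, 14), (0, 3), (source, 14),
              (uptid, 4), (type_code, 3), (0, 1)]),   # flit 0, BC = 0
        pack([(0, 1), (0, 1), (word, 4), (nline, 34)]),  # flit 1
    ]
    for i, (be, wdata) in enumerate(data):
        eop = 1 if i == len(data) - 1 else 0  # EOP only on the last data flit
        flits.append(pack([(eop, 1), (0, 3), (be, 4), (wdata, 32)]))
    return flits
```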
5.2 DSPIN MULTI-INVAL packet format (L2-to-L1 : 40 bits)
This DSPIN packet contains 2 flits.
- The DEST field contains the target L1 cache identifier (SRCID).
- The SOURCE field contains the source L2 cache identifier (SRCID).
- The UPTID field contains the UPDATE Table index.
- The WORD field contains the first modified word index.
- The NLINE field contains the cache line identifier (34 bits).
Flit 0 :
EOP | ---DEST--- | -res- | --SOURCE-- | UPTID | TYPE | BC |
0 | (14) | (3) | (14) | (4) | (3) | 0 |
Flit 1 :
EOP | res | WORD | ---------------NLINE----------------- |
1 | (1) | (4) | (34) |
5.3 DSPIN BROADCAST packet format (L2-to-L1 : 40 bits)
This DSPIN packet contains 2 flits.
- The SOURCE field contains the source L2 cache identifier (SRCID).
- The XMIN, XMAX, YMIN, YMAX fields define the limits of the broadcast.
- The UPTID field contains the UPDATE Table index.
- The NLINE field contains the cache line identifier (34 bits).
Flit 0 :
EOP | XMIN | XMAX | YMIN | YMAX | --SOURCE-- | -res- | BC |
0 | (5) | (5) | (5) | (5) | (14) | (4) | 1 |
Flit 1 :
EOP | res | UPTID | ------------NLINE------------------- |
1 | (1) | (4) | (34) |
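On the receiving side, the first flit of a BROADCAST packet can be split into its fields as sketched below, assuming MSB-first field order. The names `unpack` and `decode_broadcast_header` are hypothetical, and the way the network compares cluster coordinates against the XMIN, XMAX, YMIN and YMAX limits is not modeled.

```python
def unpack(word, widths):
    """Split a word into fields of the given widths, most significant first."""
    values = []
    for width in reversed(widths):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return values[::-1]


# EOP, XMIN, XMAX, YMIN, YMAX, SOURCE, res, BC
BROADCAST_FLIT0 = (1, 5, 5, 5, 5, 14, 4, 1)


def decode_broadcast_header(flit):
    """Return the routing fields of flit 0 of a BROADCAST packet."""
    eop, xmin, xmax, ymin, ymax, source, _res, bc = unpack(flit, BROADCAST_FLIT0)
    if bc != 1:
        raise ValueError("BC flag not set: not a broadcast packet")
    return {"eop": eop, "xmin": xmin, "xmax": xmax,
            "ymin": ymin, "ymax": ymax, "source": source}
```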
5.4 DSPIN CLEANUP-ACK packet format (L2-to-L1 : 40 bits)
This DSPIN packet contains one flit.
- The DEST field contains the target L1 cache identifier (SRCID).
- The SET field contains the cleared set index.
- The WAY field contains the cleared way index.
Flit 0 :
EOP | ---DEST--- | -res- | --SET----- | -WAY- | TYPE | BC |
1 | (14) | (3) | (16) | (2) | (3) | 0 |
5.5 DSPIN CLEANUP packet format (L1-to-L2 : 33 bits)
This DSPIN packet contains 2 flits.
- The DEST field contains the target (X,Y) cluster coordinates.
- The SOURCE field contains the source L1 cache identifier (SRCID).
- The NL32 field contains the 32 LSB bits of the cache line index.
- The NL2 field contains the 2 MSB bits of the cache line index.
- The WAY field contains the cleared way index.
Flit 0 :
EOP | --DEST-- | --SOURCE-- | NL2 | res | WAY | TYPE | BC |
0 | (10) | (14) | (2) | (1) | (2) | (2) | 0 |
Flit 1 :
EOP | ---------------NL32-------------------------- |
1 | (32) |
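The split of the 34-bit cache line index between the two flits of a CLEANUP packet can be sketched as follows, assuming MSB-first field order. The numeric TYPE code is not given in this section, so it is passed in as `type_code`; `pack` and `cleanup` are hypothetical names.

```python
def pack(fields):
    """Concatenate (value, width) pairs, most significant field first."""
    word = 0
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in %d bits" % width)
        word = (word << width) | value
    return word


def cleanup(dest, source, way, type_code, nline):
    """Build the two 33-bit flits of a CLEANUP packet."""
    if not 0 <= nline < (1 << 34):
        raise ValueError("cache line index must fit in 34 bits")
    nl2 = nline >> 32          # 2 MSB bits of the line index, carried in flit 0
    nl32 = nline & 0xFFFFFFFF  # 32 LSB bits of the line index, carried in flit 1
    flit0 = pack([(0, 1), (dest, 10), (source, 14), (nl2, 2),
                  (0, 1), (way, 2), (type_code, 2), (0, 1)])
    flit1 = pack([(1, 1), (nl32, 32)])
    return [flit0, flit1]
```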
5.6 DSPIN MULTI-ACK packet format (L1-to-L2 : 33 bits)
This DSPIN packet contains one flit.
- The DEST field contains the target (X,Y) cluster coordinates.
- The UPTID field contains the UPDATE Table index.
- The WAY field contains the cleared way index.
Flit 0 :
EOP | --DEST-- | ------res--------- | UPTID | TYPE | BC |
1 | (10) | (15) | (4) | (2) | 0 |
6. External Network
This network has a 3D mesh topology: All PUT/GET transactions are from N initiators to M targets (the M tiles of the L3 cache).
6.1 VCI parameters
The external network, which only transports cache lines, does not use all VCI fields. The address is coded on 34 bits (it is actually a cache line index), and the data field is 64 bits wide, to increase the bandwidth.
VCI Field | width |
ADDRESS | 34 bits |
WDATA , RDATA | 64 bits |
PLEN | unused |
SRCID, RSRCID | 10 bits |
TRDID, RTRDID | 4 bits |
PKTID, RPKTID | unused |
RERROR | 1 bit |