Communication Infrastructure
- 1. The interconnection networks
- 2. VCI initiators & targets indexing on direct network
- 3. VCI encoding of the transaction on the direct network
- 4. DSPIN packet encoding on the direct network
- 5. DSPIN packet encoding on the coherence network
- 5.1 DSPIN MULTI-UPDATE packet format (L2-to-L1 : 40 bits)
- 5.2 DSPIN MULTI-INVAL packet format (L2-to-L1 : 40 bits)
- 5.3 DSPIN BROADCAST packet format (L2-to-L1 : 40 bits)
- 5.4 DSPIN CLEANUP-ACK packet format (L2-to-L1 : 40 bits)
- 5.5 DSPIN CLEANUP packet format (L1-to-L2 : 33 bits)
- 5.6 DSPIN MULTI-ACK packet format
- 6. External Network
Communication Infrastructure
1. The interconnection networks
The TSAR architecture uses the DSPIN network-on-chip infrastructure to define three independent networks.
- The Direct Network implements the 40-bit TSAR physical address space that is visible to the software. It transports the direct READ, WRITE, LL, SC and CAS transactions from any VCI initiator (typically an L1 cache controller or another hardware coprocessor with a DMA capability) to any VCI target (typically a memory cache controller, or a memory mapped peripheral). All VCI packets are translated to DSPIN packets by specific VCI/DSPIN wrappers. There are actually two physically separated networks for command packets and response packets. Both networks have a two-level hierarchical structure with a local interconnect in each cluster (that can be implemented as a local crossbar, or as a local ring), and a global interconnect (implemented as a 2D mesh).
- The Coherence Network is used to transport the coherence packets implementing the DHCCP coherence protocol between L2 cache controllers and L1 cache controllers. This network is not visible to the software, and does not use wrappers, as the L1 and L2 cache controllers directly use the DSPIN packet format. Here again there are two physically separated networks, one to transport L2-to-L1 packets and one to transport L1-to-L2 packets. Both networks have a two-level hierarchical structure with a local interconnect in each cluster (that can be implemented as a local crossbar, or as a local ring), and a global interconnect (implemented as a 2D mesh).
- The Direct Network and the Coherence Network are physically separated in each cluster, but they are only logically separated for global communications. For the local interconnect, there are four physically separated local crossbars (or local rings) transporting the direct command, direct response, coherence L1-to-L2, and coherence L2-to-L1 packets. For the global interconnect, since the DSPIN infrastructure supports virtual channels, the direct command and the coherence L2-to-L1 packets are multiplexed on the same 2D mesh (40-bit DSPIN flit width). Similarly, the direct response and coherence L1-to-L2 packets are multiplexed on the same 2D mesh (33-bit DSPIN flit width).
- The External Network supports communications between the L2 caches and the tiles implementing the 3D L3 cache, in case of a MISS or a cache line replacement in the L2 caches. It has a 3D mesh topology and the DSPIN flit width is 64 bits. This external network addressing space is not visible to the software.
2. VCI initiators & targets indexing on direct network
On the direct network, each VCI port has an identifier that is defined by three indexes :
- X_ID is the cluster X-coordinate.
- Y_ID is the cluster Y-coordinate.
- L_ID is the local index inside the cluster.
The X_ID, Y_ID and L_ID are coded on NX, NY, NL bits respectively. The NX, NY and NL parameters are global for a given instance of the TSAR architecture. NX & NY cannot be larger than 5 (no more than 1024 clusters), but can be smaller, if the number of clusters is smaller than 1024. NL is equal to 4 (no more than 16 target ports or 16 initiator ports per cluster).
In order to simplify the hardware implementation, the L_ID values defined for the direct network are also used on the coherence network, and the same value is used for an initiator port and for a target port: If the number of processors per cluster is NPROCS, the L1 cache L_ID value is between 0 and (NPROCS-1). The L2 cache L_ID is equal to NPROCS.
2.1 Target identification
The target identification is required to route a command packet. For both the direct and coherence networks, a VCI target is identified by the (NX + NY + NLADR) most significant bits of the VCI ADDRESS field :
X | Y | LADR | OFFSET |
NX bits | NY bits | NLADR bits | 40-NX-NY-NLADR |
- In keeping with the NUMA characteristics of the TSAR architecture, there is no transcoding of the X & Y fields, which directly define the target cluster coordinates (X_INDEX, Y_INDEX).
- The network hardware decodes the LADR field to obtain the target L_ID, using a local routing table (implemented as a wired decoder in each local interconnect controller). The local routing tables and the number of bits NLADR to be decoded can depend on the cluster.
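The address layout above can be illustrated with a short Python sketch that splits a 40-bit address into its X, Y, LADR and OFFSET fields. The function name `split_address` and the parameter values used in any call are hypothetical. The LADR-to-L_ID routing table depends on the cluster and is not modelled here.

```python
ADDRESS_BITS = 40  # TSAR physical address width


def split_address(address, nx, ny, nladr):
    """Return (x, y, ladr, offset) for a 40-bit physical address."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit on 40 bits")
    offset_bits = ADDRESS_BITS - nx - ny - nladr
    offset = address & ((1 << offset_bits) - 1)
    ladr = (address >> offset_bits) & ((1 << nladr) - 1)
    y = (address >> (offset_bits + nladr)) & ((1 << ny) - 1)
    x = (address >> (offset_bits + nladr + ny)) & ((1 << nx) - 1)
    return x, y, ladr, offset
```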
2.2 Initiator identification
The initiator identification is required to route a response packet. A VCI initiator is identified by the VCI SRCID & RSRCID fields (NX + NY + NL bits) :
X_ID | Y_ID | L_ID |
NX bits | NY bits | NL bits |
Therefore, the total SRCID width cannot be larger than 14 bits. It can use less than 14 bits when the number of clusters is smaller than 1024.
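As a minimal sketch of this encoding, the following Python functions build a SRCID from the three indexes and split it again. The names `make_srcid` and `split_srcid` are hypothetical, and the field order (X_ID in the most significant bits) is taken from the table above.

```python
def make_srcid(x_id, y_id, l_id, nx, ny, nl=4):
    """Concatenate X_ID | Y_ID | L_ID into a (nx + ny + nl)-bit SRCID."""
    for name, value, width in (("X_ID", x_id, nx), ("Y_ID", y_id, ny), ("L_ID", l_id, nl)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit on {width} bits")
    return (x_id << (ny + nl)) | (y_id << nl) | l_id


def split_srcid(srcid, nx, ny, nl=4):
    """Inverse of make_srcid: return (x_id, y_id, l_id)."""
    l_id = srcid & ((1 << nl) - 1)
    y_id = (srcid >> nl) & ((1 << ny) - 1)
    x_id = (srcid >> (ny + nl)) & ((1 << nx) - 1)
    return x_id, y_id, l_id
```

With NX = NY = 5 and NL = 4, the largest value is 14 bits wide, which matches the bound stated above.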
3. VCI encoding of the transaction on the direct network
All hardware components connected to the direct network comply with the VCI/OCP communication interface.
VCI Field | width |
ADDRESS | 40 bits |
WDATA , RDATA | 32 bits |
PLEN | 8 bits |
SRCID, RSRCID | 14 bits |
TRDID, RTRDID | 4 bits |
PKTID, RPKTID | 4 bits |
RERROR | 1 bit |
The TSAR architecture uses one single bit for the VCI RERROR field, even if the DSPIN infrastructure supports 2 bits for the error field.
There are 8 transaction types on the direct network, which are encoded through the VCI fields CMD and PKTID. The PKTID MSB bit is ignored (reserved for future use). This redundant encoding makes it possible to reuse, in the TSAR architecture, existing hardware components that do not decode the PKTID field and use only the CMD field.
TYPE | CMD (2 bits) | PKTID (4 bits) | PKTID mnemo | CMD mnemo |
READ_DATA_UNC | 01 | X000 | TYPE_READ_DATA_UNC | CMD_READ |
READ_DATA_MISS | 01 | X001 | TYPE_READ_DATA_MISS | CMD_READ |
READ_INS_UNC | 01 | X010 | TYPE_READ_INS_UNC | CMD_READ |
READ_INS_MISS | 01 | X011 | TYPE_READ_INS_MISS | CMD_READ |
WRITE | 10 | X100 | TYPE_WRITE | CMD_WRITE |
CAS | 00 | X101 | TYPE_CAS | CMD_STORE_COND |
LL | 11 | X110 | TYPE_LL | CMD_LOCKED_READ |
SC | 00 | X111 | TYPE_SC | CMD_STORE_COND |
When a given initiator can send several simultaneous transactions of a given type (such as several simultaneous WRITE transactions), the VCI TRDID field is used to discriminate them. The TRDID field is 4 bits, supporting up to 16 simultaneous transactions for a given initiator.
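The encoding table above can be written down as a small lookup, shown here as a Python sketch. The dictionary and the function name `transaction_type` are hypothetical. The values are copied from the table, and the reserved PKTID MSB is masked out before decoding.

```python
# (CMD, PKTID low 3 bits) for each transaction type, copied from the table above.
TRANSACTIONS = {
    "READ_DATA_UNC":  (0b01, 0b000),
    "READ_DATA_MISS": (0b01, 0b001),
    "READ_INS_UNC":   (0b01, 0b010),
    "READ_INS_MISS":  (0b01, 0b011),
    "WRITE":          (0b10, 0b100),
    "CAS":            (0b00, 0b101),
    "LL":             (0b11, 0b110),
    "SC":             (0b00, 0b111),
}

PKTID_TO_TYPE = {pktid: name for name, (_, pktid) in TRANSACTIONS.items()}


def transaction_type(pktid):
    """Decode a 4-bit PKTID; the MSB is reserved and ignored."""
    return PKTID_TO_TYPE[pktid & 0b0111]
```

A component that only decodes CMD cannot tell CAS from SC (both use CMD = 00), whereas the PKTID distinguishes all eight types.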
3.1 VCI READ transaction
- A VCI READ command packet contains one flit. In case of burst, all addresses must be within the same cache line. The VCI TRDID field is not used by L1 cache, but can be used by multi-channel DMA controllers to transmit the channel index.
- A VCI READ response packet returns up to 16 flits.
3.2 VCI WRITE transaction
- A VCI WRITE command packet contains from 1 to 16 flits. In case of burst, all addresses must be within the same cache line. The VCI TRDID field is used by the L1 cache to index its write buffer. It can be used by multi-channel DMA controllers to transmit the channel index.
- A VCI WRITE response packet contains one single flit.
3.3 VCI LL (Linked Load) transaction
- This request is only sent by a L1 cache and can only target a memory cache.
- A VCI LL command packet contains one single flit.
- A VCI LL response packet contains 2 flits: The first flit contains in the RDATA field a signature returned by the memory cache for this LL reservation. The second flit contains in the RDATA field the data that has been read in the memory cache.
3.4 VCI SC (Store Conditional) transaction
- This request is only sent by a L1 cache and can only target a memory cache.
- A VCI SC command packet contains 2 flits. The first flit contains in the WDATA field the signature obtained with the last LL operation at this address. The second flit contains in the WDATA field the data to be written.
- A VCI SC response packet contains 1 flit. The RDATA field contains 0 (resp. 1) to indicate an SC success (resp. failure).
3.5 VCI CAS (Compare & Swap) transaction
- This request is only sent by a L1 cache and can only target a memory cache.
- A VCI CAS command packet contains 2 flits. The first flit contains in the WDATA field the old value of the data to be overwritten. The second flit contains in the WDATA field the new value to be written.
- A VCI CAS response packet contains 1 flit. The RDATA field contains 0 (resp. 1) to indicate a CAS success (resp. failure).
4. DSPIN packet encoding on the direct network
The VCI command & response packets are translated (actually serialized) to the DSPIN network format by the VCI/RING wrappers (in platforms using the RING local interconnect) or by the VCI/DSPIN wrappers (in platforms using an XBAR local interconnect). These wrappers are located between the VCI initiator and target components and the DSPIN network. The DSPIN command flit width is 40 bits, and the DSPIN response flit width is 33 bits. The DSPIN interconnection network uses only the following information to route the DSPIN packets to the proper destination:
- The EOP flag, defining the last flit of a DSPIN packet.
- The LSB bit of the first flit is the BC flag, defining a DSPIN broadcast packet.
- For a non broadcast packet (BC = 0), the (NX+NY+NL) MSB bits of the first field are used to route the packet to the proper destination.
- For a broadcast packet (BC = 1), the XMIN, XMAX, YMIN, YMAX fields (5 bits each), are used by the network to limit the broadcast.
The DSPIN format can transport up to 40 bits of VCI ADDRESS, and up to 14 bits of VCI SRCID. If the VCI ADDRESS uses fewer than 40 bits (for example 32 bits), the DSPIN ADDRESS field is left-aligned, and the LSB bits of the DSPIN field are padded with 0. If the SRCID field uses fewer than 14 bits (NX < 5 or NY < 5), the SRCID field is left-aligned, and the LSB bits of the DSPIN field are padded with 0.
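The left-alignment rule amounts to a shift by the number of unused bits. The Python sketch below shows this; the function name `left_align` is hypothetical.

```python
def left_align(value, used_bits, field_bits):
    """Place a used_bits-wide value in the MSBs of a field_bits-wide field.

    The remaining LSBs of the field are zero.
    """
    if used_bits > field_bits or not 0 <= value < (1 << used_bits):
        raise ValueError("value does not fit in the field")
    return value << (field_bits - used_bits)
```

For example, a 32-bit address is shifted left by 8 bits in a 40-bit field, and an 8-bit SRCID (NX = NY = 2, NL = 4) is shifted left by 6 bits in the 14-bit field.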
4.1 DSPIN Read Command packet format (40 bits)
A single flit VCI Read Command packet (this includes LL packets) is translated to a 2 flits DSPIN Read Command packet :
Flit 0 :
EOP | ----------------ADDRESS-------------------- | BC |
0 | (38) | 0 |
Flit 1 :
EOP | SRCID | CMD | CGT | PLEN | TRDID | PKTID | BE | res |
1 | (14) | (2) | (2) | (8) | (4) | (4) | (4) | (1) |
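The two flits above can be assembled as in the following Python sketch. It assumes that the fields are listed from MSB to LSB (EOP first, consistent with BC being the LSB), and that the 38-bit ADDRESS field carries address bits 39 to 2. The second point is an assumption, since the text does not say which address bits are dropped. The names `pack` and `read_command` are hypothetical.

```python
def pack(fields):
    """Concatenate (value, width) pairs, first pair in the most significant bits."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit on {width} bits")
        word = (word << width) | value
    return word


def read_command(address, srcid, cmd, plen, trdid, pktid, be, cgt=0):
    """Return the two 40-bit flits of a DSPIN Read Command packet."""
    # Assumption: the ADDRESS field holds address bits 39..2 (word address).
    flit0 = pack([(0, 1), (address >> 2, 38), (0, 1)])  # EOP = 0, BC = 0
    flit1 = pack([(1, 1), (srcid, 14), (cmd, 2), (cgt, 2), (plen, 8),
                  (trdid, 4), (pktid, 4), (be, 4), (0, 1)])  # EOP = 1, res = 0
    return [flit0, flit1]
```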
4.2 DSPIN Write Command packet format (40 bits)
An N-flit VCI Write Command packet (this includes SC packets) is translated to an (N+2)-flit DSPIN Write Command packet :
Flit 0 :
EOP | ----------------ADDRESS-------------------- | BC |
0 | (38) | 0 |
Flit 1 :
EOP | SRCID | CMD | CGT | PLEN | TRDID | PKTID | BE | res |
0 | (14) | (2) | (2) | (8) | (4) | (4) | (4) | (1) |
Flit N :
EOP | -res- | BE | --------------WDATA--------------- |
1 | (3) | (4) | (32) |
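A possible serialization of a write burst is sketched below in Python. It assumes the MSB-first field order of the tables, ADDRESS holding address bits 39 to 2, and a BE value of 0 in flit 1 by default; the last two are assumptions, as the text does not specify them. The name `write_command` and the `header_be` parameter are hypothetical.

```python
def write_command(address, srcid, cmd, plen, trdid, pktid, words, cgt=0, header_be=0):
    """Serialize a VCI write burst; words is a list of (be, wdata) pairs.

    Returns N+2 flits for N data words; only the last flit has EOP = 1.
    """
    if not words:
        raise ValueError("a write packet needs at least one data word")
    flits = [(address >> 2) << 1]  # flit 0: EOP = 0, ADDRESS, BC = 0
    flits.append((srcid << 25) | (cmd << 23) | (cgt << 21) | (plen << 13)
                 | (trdid << 9) | (pktid << 5) | (header_be << 1))  # flit 1
    for i, (be, wdata) in enumerate(words):
        if not (0 <= be < 16 and 0 <= wdata < (1 << 32)):
            raise ValueError("BE is 4 bits and WDATA is 32 bits")
        eop = 1 if i == len(words) - 1 else 0
        flits.append((eop << 39) | (be << 32) | wdata)  # data flit
    return flits
```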
4.3 DSPIN single flit Response packet format (33 bits)
A single flit DSPIN Response packet is built for the following VCI response packets:
- a single flit VCI response packet to a WRITE command (no data transmitted),
- a single flit VCI response packet to a READ or LL command, where the RDATA field has value 0,
- a single flit VCI response packet to a SC or CAS command, where the RDATA field has value 0.
Flit 0 :
EOP | RSRCID | RERROR | RTRDID | RPKTID | res | BC |
1 | (14) | (2) | (4) | (4) | (7) | 0 |
4.4 DSPIN multi-flit Response packet format (33 bits)
For all other VCI response packets (multi-flit VCI response packets, or a non-zero RDATA value), a multi-flit DSPIN response packet is built : an N-flit VCI response packet is translated to an (N+1)-flit DSPIN response packet.
Flit 0 :
EOP | RSRCID | RERROR | RTRDID | RPKTID | res | BC |
0 | (14) | (2) | (4) | (4) | (7) | 0 |
Flit 1 :
EOP | ---------------RDATA------------------------ |
1 | (32) |
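The choice between the single-flit and multi-flit response formats can be summarized by the Python sketch below. It assumes the MSB-first field order of the tables; the name `response_packet` is hypothetical, and a WRITE response (no data) is represented here as a single RDATA value of 0.

```python
def response_packet(rsrcid, rerror, rtrdid, rpktid, rdata):
    """Serialize a VCI response; rdata is the list of 32-bit RDATA values."""
    def header(eop):
        # EOP | RSRCID(14) | RERROR(2) | RTRDID(4) | RPKTID(4) | res(7) | BC = 0
        return (eop << 32) | (rsrcid << 18) | (rerror << 16) | (rtrdid << 12) | (rpktid << 8)

    if len(rdata) == 1 and rdata[0] == 0:
        return [header(1)]  # single-flit format: no data flit is sent
    flits = [header(0)]
    for i, word in enumerate(rdata):
        eop = 1 if i == len(rdata) - 1 else 0
        flits.append((eop << 32) | word)
    return flits
```

A successful SC or CAS (RDATA = 0) therefore costs one flit, while a failure (RDATA = 1) costs two.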
5. DSPIN packet encoding on the coherence network
The coherence transactions are directly transmitted to the coherence network by the L1 caches and L2 caches in DSPIN format. The L2-to-L1 network uses 40 bits flits. The L1-to-L2 network uses 33 bits flits. Broadcast commands are only used on the L2-to-L1 network, and use the BC bit in first flit.
- Other than BROADCAST, there are 5 packet types from L2 to L1 (3 bits encoding)
TYPE | BIT2 | BIT1 | BIT0 |
CLEANUP_ACK | 1 | * | * |
UPDATE_DATA | 0 | 1 | 1 |
UPDATE_INS | 0 | 1 | 0 |
INVAL_DATA | 0 | 0 | 1 |
INVAL_INS | 0 | 0 | 0 |
- There are 3 packet types from L1 to L2 (2 bits encoding)
TYPE | BIT1 | BIT0 |
CLEANUP_DATA | 1 | 0 |
CLEANUP_INS | 1 | 1 |
MULTI-ACK | 0 | * |
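The two TYPE tables, including their don't-care bits, can be decoded as in this Python sketch. The function names are hypothetical; the bit patterns are copied from the tables above.

```python
def decode_l2_to_l1(type_bits):
    """Decode the 3-bit TYPE field of a non-broadcast L2-to-L1 packet."""
    if type_bits & 0b100:
        return "CLEANUP_ACK"  # BIT2 = 1, BIT1 and BIT0 are don't-care
    return {0b011: "UPDATE_DATA", 0b010: "UPDATE_INS",
            0b001: "INVAL_DATA", 0b000: "INVAL_INS"}[type_bits & 0b011]


def decode_l1_to_l2(type_bits):
    """Decode the 2-bit TYPE field of an L1-to-L2 packet."""
    if not type_bits & 0b10:
        return "MULTI-ACK"  # BIT1 = 0, BIT0 is don't-care
    return "CLEANUP_INS" if type_bits & 0b01 else "CLEANUP_DATA"
```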
5.1 DSPIN MULTI-UPDATE packet format (L2-to-L1 : 40 bits)
This DSPIN packet contains 2+N flits.
- The DEST field contains the target L1 cache identifier (SRCID).
- The SOURCE field contains the source L2 cache identifier (SRCID).
- The UPTID field contains the UPDATE Table index.
- The WORD field contains the first modified word index.
- The NLINE field contains the cache line identifier (34 bits).
Flit 0 :
EOP | ---DEST--- | -res- | --SOURCE-- | UPTID | TYPE | BC |
0 | (14) | (3) | (14) | (4) | (3) | 0 |
Flit 1 :
EOP | res | WORD | ---------------NLINE----------------- |
0 | (1) | (4) | (34) |
Flit 2 :
EOP | -res- | -BE- | -------------WDATA----------------- |
0 | (3) | (4) | (32) |
Flit N :
EOP | -res- | -BE- | -------------WDATA----------------- |
1 | (3) | (4) | (32) |
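On the receiving side, an L1 cache could extract the fields of a MULTI-UPDATE packet as in the Python sketch below. It assumes the MSB-first field order of the tables; the name `parse_multi_update` and the dictionary keys are hypothetical.

```python
def parse_multi_update(flits):
    """Split a MULTI-UPDATE packet (list of 40-bit flits) into its fields."""
    if len(flits) < 3:
        raise ValueError("expected 2 header flits and at least one data flit")
    flit0, flit1, *data = flits
    return {
        "dest":   (flit0 >> 25) & 0x3FFF,
        "source": (flit0 >> 8) & 0x3FFF,
        "uptid":  (flit0 >> 4) & 0xF,
        "type":   (flit0 >> 1) & 0x7,
        "word":   (flit1 >> 34) & 0xF,
        "nline":  flit1 & ((1 << 34) - 1),
        # One (BE, WDATA) pair per data flit, starting at word index "word".
        "data":   [((f >> 32) & 0xF, f & 0xFFFFFFFF) for f in data],
    }
```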
5.2 DSPIN MULTI-INVAL packet format (L2-to-L1 : 40 bits)
This DSPIN packet contains 2 flits.
- The DEST field contains the target L1 cache identifier (SRCID).
- The SOURCE field contains the source L2 cache identifier (SRCID).
- The UPTID field contains the UPDATE Table index.
- The WORD field contains the first modified word index.
- The NLINE field contains the cache line identifier (34 bits).
Flit 0 :
EOP | ---DEST--- | -res- | --SOURCE-- | UPTID | TYPE | BC |
0 | (14) | (3) | (14) | (4) | (3) | 0 |
Flit 1 :
EOP | res | WORD | ---------------NLINE----------------- |
1 | (1) | (4) | (34) |
5.3 DSPIN BROADCAST packet format (L2-to-L1 : 40 bits)
This DSPIN packet contains 2 flits.
- The SOURCE field contains the source L2 cache identifier (SRCID).
- The XMIN,XMAX, YMIN, YMAX fields define the limits of the broadcast.
- The UPTID field contains the UPDATE Table index.
- The NLINE field contains the cache line identifier (34 bits).
Flit 0 :
EOP | XMIN | XMAX | YMIN | YMAX | --SOURCE-- | -res- | BC |
0 | (5) | (5) | (5) | (5) | (14) | (4) | 1 |
Flit 1 :
EOP | res | UPTID | ------------NLINE------------------- |
1 | (1) | (4) | (34) |
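The broadcast limits can be read back from flit 0 as in the following Python sketch. It assumes the MSB-first field order of the table and, as a further assumption not stated in the text, that the XMIN/XMAX and YMIN/YMAX bounds are inclusive. The names `broadcast_box` and `in_box` are hypothetical.

```python
def broadcast_box(flit0):
    """Return (xmin, xmax, ymin, ymax) from the first flit of a BROADCAST packet."""
    if flit0 & 1 != 1:
        raise ValueError("BC flag is not set: not a broadcast packet")
    xmin = (flit0 >> 34) & 0x1F
    xmax = (flit0 >> 29) & 0x1F
    ymin = (flit0 >> 24) & 0x1F
    ymax = (flit0 >> 19) & 0x1F
    return xmin, xmax, ymin, ymax


def in_box(x, y, box):
    """True if cluster (x, y) is inside the box (inclusive bounds assumed)."""
    xmin, xmax, ymin, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax
```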
5.4 DSPIN CLEANUP-ACK packet format (L2-to-L1 : 40 bits)
This DSPIN packet contains one flit.
- The DEST field contains the target L1 cache identifier (SRCID).
- The SET field contains the cleared set index.
- The WAY field contains the cleared way index.
Flit 0 :
EOP | ---DEST--- | -res- | --SET----- | -WAY- | TYPE | BC |
1 | (14) | (3) | (16) | (2) | (3) | 0 |
5.5 DSPIN CLEANUP packet format (L1-to-L2 : 33 bits)
This DSPIN packet contains 2 flits.
- The DEST field contains the target (X,Y) cluster coordinates.
- The SOURCE field contains the source L1 cache identifier (SRCID).
- The NL32 field contains the 32 LSB bits of the cache line index.
- The NL2 field contains the 2 MSB bits of the cache line index.
- The WAY field contains the cleared way index.
Flit 0 :
EOP | --DEST-- | --SOURCE-- | NL2 | res | WAY | TYPE | BC |
0 | (10) | (14) | (2) | (1) | (2) | (2) | 0 |
Flit 1 :
EOP | ---------------NL32-------------------------- |
1 | (32) |
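Because the 34-bit cache line index does not fit in one 33-bit flit, it is split between the NL2 and NL32 fields. The Python sketch below shows the split and the reassembly, assuming the MSB-first field order of the tables. The names `cleanup_packet` and `cleanup_nline` are hypothetical, and `dest` is taken as an already encoded 10-bit value.

```python
def cleanup_packet(dest, source, way, pkt_type, nline):
    """Return the two 33-bit flits of a CLEANUP packet."""
    if not 0 <= nline < (1 << 34):
        raise ValueError("cache line index does not fit on 34 bits")
    nl2 = nline >> 32            # 2 MSBs, carried in flit 0
    nl32 = nline & 0xFFFFFFFF    # 32 LSBs, carried in flit 1
    flit0 = (dest << 22) | (source << 8) | (nl2 << 6) | (way << 3) | (pkt_type << 1)
    flit1 = (1 << 32) | nl32     # EOP = 1
    return [flit0, flit1]


def cleanup_nline(flits):
    """Rebuild the 34-bit cache line index from a CLEANUP packet."""
    return (((flits[0] >> 6) & 0x3) << 32) | (flits[1] & 0xFFFFFFFF)
```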
5.6 DSPIN MULTI-ACK packet format
This DSPIN packet contains one flit.
- The DEST field contains the target (X,Y) cluster coordinates.
- The UPTID field contains the UPDATE Table index.
- The WAY field contains the cleared way index.
Flit 0 :
EOP | --DEST-- | ------res--------- | UPTID | TYPE | BC |
1 | (10) | (15) | (4) | (2) | 0 |
6. External Network
TBD