Communication Infrastructure
1. The 3 interconnection networks
The TSAR architecture defines three logically independent, VCI-compliant networks, which are fully separated to prevent deadlocks:
- The Direct Network implements the 40-bit TSAR physical address space that is visible to the software. It transports the direct READ, WRITE, LL, and SC transactions from any VCI initiator (typically an L1 cache controller or another hardware coprocessor with a DMA capability) to any VCI target (typically a memory cache controller or a memory-mapped peripheral).
- The Coherence Network implements a separate 40-bit physical address space, used to transport the coherence transactions: MULTI_UPDATE, MULTI_INVAL, BROADCAST_INVAL (from memory cache controllers to L1 cache controllers) and CLEANUP (from the L1 cache controllers to the memory cache controllers). This address space is not visible to the software.
- The External Network implements a 34-bit physical address space. This network transports the PUT and GET transactions from the memory cache controller to the external RAM controller, in case of a MISS or a cache line replacement in the memory cache. This address space is not visible to the software.
2. VCI initiators & targets indexing
As a given hardware component can have several VCI ports (for example the L1 cache has three VCI ports : one initiator port to the direct network, one initiator port to the coherence network, and one target port on the coherence network), each VCI port has a different identifier that is defined by three indexes :
- X_ID is the cluster X-coordinate.
- Y_ID is the cluster Y-coordinate.
- L_ID is the local index inside the cluster.
The X_ID, Y_ID and L_ID are coded on NX, NY, NL bits respectively. NX, NY and NL are global parameters for the TSAR architecture, but NX & NY cannot be larger than 5 (no more than 1024 clusters), and NL cannot be larger than 4 (no more than 16 ports per cluster).
In order to simplify the hardware implementation of the memory coherence protocol, the L_ID values are standardized on the coherence network, and the same value is used for an initiator port and for a target port:
COMPONENT | LOCAL_INDEX |
Memory Cache | 0000 |
Processor 0 (L1 cache) | 0001 |
Processor 1 (L1 cache) | 0010 |
Processor 2 (L1 cache) | 0011 |
Processor 3 (L1 cache) | 0100 |
2.1 Target identification
The target identification is required to route a command packet. For both the direct and coherence networks, a VCI target is identified by the (NX + NY + NLADR) most significant bits of the VCI ADDRESS field :
X (NX bits) | Y (NY bits) | LADR (NLADR bits) | OFFSET (40 - NX - NY - NLADR bits) |
- According to the NUMA characteristics of the TSAR architecture, there is no transcoding of the X & Y fields, that directly define the target cluster coordinates (X_INDEX, Y_INDEX).
- The network decodes the LADR field to obtain the target LOCAL_INDEX, using a local routing table (implemented as a wired decoder in each local interconnect controller). The local routing tables and the number of bits NLADR to be decoded depend on the cluster.
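The target decoding described above can be sketched as follows. This is a minimal illustration, not the hardware implementation: the function names and the example routing table are hypothetical, and the LADR-to-LOCAL_INDEX mapping is cluster-specific in the real design.

```python
ADDRESS_BITS = 40  # width of the VCI ADDRESS field on the direct and coherence networks


def split_address(address, nx, ny, nladr):
    """Split a 40-bit VCI address into its (x, y, ladr, offset) fields."""
    offset_bits = ADDRESS_BITS - nx - ny - nladr
    offset = address & ((1 << offset_bits) - 1)
    ladr = (address >> offset_bits) & ((1 << nladr) - 1)
    y = (address >> (offset_bits + nladr)) & ((1 << ny) - 1)
    x = (address >> (offset_bits + nladr + ny)) & ((1 << nx) - 1)
    return x, y, ladr, offset


# Hypothetical local routing table for one cluster: LADR value -> LOCAL_INDEX.
# In hardware this is a wired decoder; the values here are invented for illustration.
EXAMPLE_ROUTING_TABLE = {0b00: 0, 0b01: 5, 0b10: 6, 0b11: 7}


def decode_target(address, nx, ny, nladr, routing_table):
    """Return (x, y, local_index): X and Y are used as-is, LADR goes through the table."""
    x, y, ladr, _ = split_address(address, nx, ny, nladr)
    return x, y, routing_table[ladr]
```

With NX = NY = 4 and NLADR = 2, an address whose top ten bits are `0011 0010 01` is routed to cluster (3, 2) and, with the example table, to local index 5.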
2.2 Initiator identification
The initiator identification is required to route a response packet. For both the direct and coherence networks, a VCI initiator is identified by the VCI SRCID & RSRCID fields (NX + NY + NL bits) :
X_ID (NX bits) | Y_ID (NY bits) | L_ID (NL bits) |
Therefore, the total SRCID width cannot be larger than 14 bits.
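As an illustration of the identifier layout, the sketch below packs and unpacks a SRCID from its three indexes and applies the NX, NY, NL limits stated earlier. The function names are hypothetical; the standardized L_ID values come from the table in section 2.

```python
MAX_NX, MAX_NY, MAX_NL = 5, 5, 4  # limits given for the TSAR global parameters

MEMORY_CACHE_L_ID = 0b0000  # standardized L_ID of the memory cache on the coherence network


def l1_cache_l_id(processor):
    """Standardized coherence L_ID of the L1 cache of processor 0..3."""
    if not 0 <= processor <= 3:
        raise ValueError("the table only defines processors 0 to 3")
    return processor + 1


def pack_srcid(x_id, y_id, l_id, nx, ny, nl):
    """Build a SRCID laid out as X_ID | Y_ID | L_ID (X_ID in the most significant bits)."""
    if nx > MAX_NX or ny > MAX_NY or nl > MAX_NL:
        raise ValueError("NX and NY are limited to 5 bits, NL to 4 bits")
    for value, width in ((x_id, nx), (y_id, ny), (l_id, nl)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
    return (x_id << (ny + nl)) | (y_id << nl) | l_id


def unpack_srcid(srcid, nx, ny, nl):
    """Recover (x_id, y_id, l_id) from a SRCID."""
    l_id = srcid & ((1 << nl) - 1)
    y_id = (srcid >> nl) & ((1 << ny) - 1)
    x_id = (srcid >> (ny + nl)) & ((1 << nx) - 1)
    return x_id, y_id, l_id
```

With the maximum widths (5 + 5 + 4), the largest SRCID is `2**14 - 1`, which matches the 14-bit bound.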
3. Direct Network & Coherence Network
These two networks are implemented by the DSPIN network on chip general infrastructure :
- The local interconnect is implemented as two physically independent local rings, and the coherence ring supports a broadcast service for single flit VCI commands.
Note : These two physically independent rings could be implemented later as one single physical ring supporting two virtual networks.
- The global interconnect is implemented as one DSPIN network, supporting two virtual sub-networks, and the coherence sub-network supports a broadcast service for single flit VCI commands.
3.1 VCI Address generation on the direct network
On the direct network, the addresses are controlled by the software.
3.2 VCI Address generation on the coherence network
On the coherence network, the addresses are defined by the hardware with the following policy:
- In a multicast command packet from a memory cache controller to a L1 cache controller, the address is obtained by copying the target L1 cache SRCID in the MSB bits of the VCI ADDRESS (left aligned) : The L1 cache L_ID is actually used as the LADR address field. UPDATE/INVAL requests are distinguished by the bit ADDRESS[2] (0 for INVAL, 1 for UPDATE).
- In a cleanup command packet from a L1 cache controller to a memory cache controller, the address is obtained by copying the (NX + NY) MSB bits of the line address in the VCI ADDRESS field (left aligned). The 0 value for the LADR address field is used to select the memory cache.
- In a broadcast_invalidate command packet, from a memory cache controller to a L1 cache controller, the ADDRESS[1:0] bits must be equal to 0x3. The 20 bits ADDRESS[39:20] contain the XMIN,XMAX,YMIN,YMAX values defining the bounding box of the broadcast:
XMIN | XMAX | YMIN | YMAX | RESERVED | 11 |
5 | 5 | 5 | 5 | 18 | 2 |
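The three address-generation rules above can be sketched as follows. This is an illustrative reading of the policy, not the hardware description: the function names are hypothetical, and the cleanup case assumes the "line address" is the 40-bit physical address of the cache line.

```python
ADDRESS_BITS = 40


def multicast_address(target_srcid, nx, ny, nl, update):
    """MULTI_UPDATE / MULTI_INVAL: target L1 SRCID left-aligned, ADDRESS[2] = 1 for UPDATE."""
    srcid_bits = nx + ny + nl
    if not 0 <= target_srcid < (1 << srcid_bits):
        raise ValueError("SRCID does not fit in NX + NY + NL bits")
    address = target_srcid << (ADDRESS_BITS - srcid_bits)
    if update:
        address |= 1 << 2
    return address


def cleanup_address(line_address, nx, ny):
    """CLEANUP: keep the (NX + NY) MSBs of the line address, LADR = 0 selects the memory cache."""
    cluster_bits = nx + ny
    mask = ((1 << cluster_bits) - 1) << (ADDRESS_BITS - cluster_bits)
    return line_address & mask


def broadcast_address(xmin, xmax, ymin, ymax):
    """BROADCAST_INVAL: bounding box in ADDRESS[39:20], ADDRESS[1:0] = 0b11."""
    for value in (xmin, xmax, ymin, ymax):
        if not 0 <= value < 32:
            raise ValueError("bounding-box fields are 5 bits wide")
    return (xmin << 35) | (xmax << 30) | (ymin << 25) | (ymax << 20) | 0b11
```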
3.3 VCI parameters
All hardware components connected to the direct network or to the coherence network comply with the VCI/OCP communication interface.
Because the direct network and the coherence network are time-multiplexed on the DSPIN infrastructure, they have identical VCI formats:
VCI Field | width |
ADDRESS | 40 bits |
WDATA , RDATA | 32 bits |
PLEN | 8 bits |
SRCID, RSRCID | 14 bits |
TRDID, RTRDID | 4 bits |
PKTID, RPKTID | 4 bits |
RERROR | 2 bits |
The TSAR architecture uses two bits for the VCI RERROR field, in order to simplify the VCI/DSPIN wrapper, and to reduce the DSPIN Write Response packet length to one flit :
RERROR | code |
READ_OK | 00 |
WRITE_OK | 10 |
READ_ERROR | 01 |
WRITE_ERROR | 11 |
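The encoding in this table can be read as two independent bits: bit 0 flags an error and bit 1 flags a write. The sketch below makes that reading explicit; it assumes the `00` code denotes a successful read, and the helper names are hypothetical.

```python
from enum import IntEnum


class RError(IntEnum):
    """Two-bit VCI RERROR codes as listed in the table above."""
    READ_OK = 0b00
    READ_ERROR = 0b01
    WRITE_OK = 0b10
    WRITE_ERROR = 0b11


def is_error(code):
    """Bit 0 is set for both error codes."""
    return bool(code & 0b01)


def is_write(code):
    """Bit 1 is set for both write responses."""
    return bool(code & 0b10)
```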
3.3 DSPIN Packet format
The VCI command and response packets are translated (actually serialized) into a more convenient DSPIN network format by the VCI/RING wrappers located between the VCI initiator and target components and the DSPIN network. DSPIN command packets use 40-bit flits, and DSPIN response packets use 33-bit flits. The DSPIN interconnection network uses only the following information to route DSPIN packets to the proper destination:
- the MSB bit is the EOP flag, defining the last flit of a DSPIN packet.
- the LSB bit of the first flit is the BC flag, defining a DSPIN broadcast packet.
- For a non broadcast packet (BC = 0), the first flit contains a 38 bits ADDRESS field (defining an aligned 32 bits word address). The (NX+NY+NL) MSB bits of this ADDRESS field are used to route the packet to the proper destination.
- For a broadcast packet (BC = 1), the 20 MSB bits of the ADDRESS field in the first flit contain the XMIN, XMAX, YMIN, YMAX fields (5 bits each), which are used by the network to limit the broadcast.
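The routing information listed above can be extracted from the first flit of a 40-bit command packet as sketched below. The function names and the returned dictionary are hypothetical; the bit positions follow from the EOP, ADDRESS, and BC widths (1, 38, 1).

```python
CMD_FLIT_BITS = 40


def is_last_flit(flit, flit_bits=CMD_FLIT_BITS):
    """The MSB of every flit is the EOP flag."""
    return bool((flit >> (flit_bits - 1)) & 1)


def routing_info(first_flit, nx, ny, nl):
    """Decode what the network needs from the first flit of a command packet."""
    if first_flit & 1:  # BC flag is the LSB of the first flit
        box = (first_flit >> 19) & 0xFFFFF  # 20 bits right below the EOP flag
        return {
            "broadcast": True,
            "xmin": (box >> 15) & 0x1F,
            "xmax": (box >> 10) & 0x1F,
            "ymin": (box >> 5) & 0x1F,
            "ymax": box & 0x1F,
        }
    word_address = (first_flit >> 1) & ((1 << 38) - 1)
    dest = word_address >> (38 - (nx + ny + nl))  # (NX + NY + NL) MSBs of ADDRESS
    return {
        "broadcast": False,
        "x": dest >> (ny + nl),
        "y": (dest >> nl) & ((1 << ny) - 1),
        "local": dest & ((1 << nl) - 1),
    }
```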
There are five types of DSPIN packets:
3.3.1 DSPIN Read Command packet format
A single flit VCI Read Command packet (this includes LL packets) is translated to a 2 flits DSPIN Read Command packet :
Flit 0 :
EOP | ----------------ADDRESS----------------------- | BC |
1 | 38 | 1 |
Flit 1 :
EOP | SRCID | CMD | CST | PLEN | TRDID | PKTID | reserved |
1 | 14 | 2 | 2 | 8 | 4 | 4 | 5 |
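A minimal serialization sketch for this format is shown below. The helper and function names are hypothetical, the CMD and CST encodings are not specified in this section and are passed through as opaque values, and the EOP flag is assumed to be set only on the last flit.

```python
def pack_fields(fields):
    """Pack (value, width) pairs into one integer, first pair in the most significant bits."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def read_command_flits(address, srcid, cmd, cst, plen, trdid, pktid):
    """Serialize a single-flit VCI read command into two 40-bit DSPIN flits."""
    # Flit 0: EOP = 0, 38-bit word address (byte address without its 2 LSBs), BC = 0.
    flit0 = pack_fields([(0, 1), (address >> 2, 38), (0, 1)])
    # Flit 1: EOP = 1 (last flit), then the VCI fields and 5 reserved bits.
    flit1 = pack_fields([(1, 1), (srcid, 14), (cmd, 2), (cst, 2),
                         (plen, 8), (trdid, 4), (pktid, 4), (0, 5)])
    return [flit0, flit1]
```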
3.3.2 DSPIN Write Command packet format
A N flits VCI Write Command packet (this includes SC packets) is translated to a N+2 flits DSPIN Write Command packet :
Flit 0 :
EOP | ----------------ADDRESS----------------------- | BC |
1 | 38 | 1 |
Flit 1 :
EOP | SRCID | CMD | CST | PLEN | TRDID | PKTID | reserved |
1 | 14 | 2 | 2 | 8 | 4 | 4 | 5 |
Flit N :
EOP | reserved | BE | -------------WDATA---------------- |
1 | 3 | 4 | 32 |
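The N+2 flit structure can be sketched as follows: two header flits followed by one data flit per VCI write flit. The names are hypothetical, CMD and CST are opaque values here, and the EOP flag is assumed to be set only on the final data flit.

```python
def pack_fields(fields):
    """Pack (value, width) pairs into one integer, first pair in the most significant bits."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def write_command_flits(address, srcid, cmd, cst, plen, trdid, pktid, words):
    """Serialize an N-flit VCI write command; `words` is a list of (be, wdata) pairs."""
    if not words:
        raise ValueError("a write command carries at least one data word")
    flits = [
        pack_fields([(0, 1), (address >> 2, 38), (0, 1)]),
        pack_fields([(0, 1), (srcid, 14), (cmd, 2), (cst, 2),
                     (plen, 8), (trdid, 4), (pktid, 4), (0, 5)]),
    ]
    for index, (be, wdata) in enumerate(words):
        eop = 1 if index == len(words) - 1 else 0  # only the last data flit closes the packet
        flits.append(pack_fields([(eop, 1), (0, 3), (be, 4), (wdata, 32)]))
    return flits
```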
3.3.3 DSPIN Broadcast Command packet format
The single flit VCI Write Broadcast is translated to a 2 flits DSPIN Broadcast Command packet. The CID field contains the 10 MSB bits of the VCI SRCID (actually the source cluster coordinates). The XMIN,XMAX, YMIN, YMAX fields are the 20 MSB bits of the VCI ADDRESS, used by the network to limit the broadcast.
Flit 0 :
EOP | XMI | XMA | YMI | YMA | CID | TRDID | PKTID | BC |
1 | 5 | 5 | 5 | 5 | 10 | 4 | 4 | 1 |
Flit 1 :
EOP | reserved | ----------------NLINE------------------- |
1 | 5 | 34 |
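The broadcast format can be sketched in the same way. The names are hypothetical; the CID is taken as the 10 MSBs of a 14-bit SRCID, as stated above, and NLINE is passed in as an already-computed 34-bit value.

```python
def pack_fields(fields):
    """Pack (value, width) pairs into one integer, first pair in the most significant bits."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def broadcast_command_flits(xmin, xmax, ymin, ymax, srcid, trdid, pktid, nline):
    """Serialize a single-flit VCI write broadcast into two 40-bit DSPIN flits."""
    cid = srcid >> 4  # 10 MSBs of the 14-bit SRCID: the source cluster coordinates
    flit0 = pack_fields([(0, 1), (xmin, 5), (xmax, 5), (ymin, 5), (ymax, 5),
                         (cid, 10), (trdid, 4), (pktid, 4), (1, 1)])  # BC = 1
    flit1 = pack_fields([(1, 1), (0, 5), (nline, 34)])  # EOP = 1, reserved, NLINE
    return [flit0, flit1]
```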
3.3.4 DSPIN Read Response packet format
A N flits VCI Read Response packet is translated to a N+1 flits DSPIN Read Response packet :
Flit 0 :
EOP | RSRCID | RERROR | RTRDID | RPKTID | reserved | BC |
1 | 14 | 2 | 4 | 4 | 5 | 1 |
Flit 1 :
EOP | ---------------RDATA-------------------------- |
1 | 32 |
3.3.5 DSPIN Write Response packet format
A single flit VCI Write Response packet is translated to a single flit DSPIN Write Response packet.
Flit 0 :
EOP | RSRCID | RERROR | RTRDID | RPKTID | reserved | BC |
1 | 14 | 2 | 4 | 4 | 5 | 1 |
Note : This format is also used for the response packets to a broadcast command, as each VCI response packet to a broadcast command is actually a VCI response packet to a single flit write command.
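A quick way to sanity-check the flit layouts listed in sections 3.3.1 to 3.3.5 is to sum the field widths of each flit. The sketch below does only that; the dictionary keys are descriptive labels chosen for this example. As listed, every command flit totals 40 bits and the read response data flit totals 33 bits, while the response header fields total 31 bits, so the tables do not say how that header is placed within a 33-bit flit.

```python
# Field widths copied from the flit tables above, most significant field first.
LAYOUTS = {
    "command flit 0 (read/write)": [1, 38, 1],
    "command flit 1 (read/write)": [1, 14, 2, 2, 8, 4, 4, 5],
    "write data flit": [1, 3, 4, 32],
    "broadcast flit 0": [1, 5, 5, 5, 5, 10, 4, 4, 1],
    "broadcast flit 1": [1, 5, 34],
    "response header flit": [1, 14, 2, 4, 4, 5, 1],
    "read response data flit": [1, 32],
}


def flit_widths(layouts=LAYOUTS):
    """Return the total width in bits of each listed flit layout."""
    return {name: sum(widths) for name, widths in layouts.items()}


if __name__ == "__main__":
    for name, width in flit_widths().items():
        print(f"{name}: {width} bits")
```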
4. External Network
This network has a specific topology because its communication scheme is unusual: all PUT/GET transactions go from N initiators (one initiator per cluster) to a single target (the external RAM controller).
4.1 VCI parameters
The external network, which only transports cache lines, does not use all VCI fields. The address is coded on 34 bits (it is actually a cache line index), and the data field is 64 bits wide to increase the bandwidth.
VCI Field | width |
ADDRESS | 34 bits |
WDATA , RDATA | 64 bits |
PLEN | unused |
SRCID, RSRCID | 10 bits |
TRDID, RTRDID | 4 bits |
PKTID, RPKTID | unused |
RERROR | 1 bit |
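The relationship between the 40-bit physical address and the 34-bit external address can be sketched as follows. This rests on an assumption not stated in this section: that the cache line index is the physical address with its 6 low-order bits dropped, which corresponds to 64-byte cache lines. The function names are hypothetical.

```python
PHYSICAL_ADDRESS_BITS = 40
LINE_INDEX_BITS = 34
# Assumption: the 6 remaining bits are the byte offset inside a 64-byte cache line.
LINE_OFFSET_BITS = PHYSICAL_ADDRESS_BITS - LINE_INDEX_BITS
DATA_BYTES = 8  # 64-bit WDATA / RDATA on the external network


def line_index(physical_address):
    """Map a 40-bit physical address to the 34-bit address used on the external network."""
    if not 0 <= physical_address < (1 << PHYSICAL_ADDRESS_BITS):
        raise ValueError("physical address must fit in 40 bits")
    return physical_address >> LINE_OFFSET_BITS


def data_words_per_line():
    """Number of 64-bit data words needed to transfer one cache line (under the assumption)."""
    return (1 << LINE_OFFSET_BITS) // DATA_BYTES
```

Under this assumption, a PUT or GET moves one line in 8 data words of 64 bits.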