9 | | * The '''Direct Network''' implements the 40 bits TSAR physical address space that is visible by the software. It transports the direct READ, WRITE, LL, SC and CAS transactions from any VCI initiator (typically a L1 cache controller or another hardware coprocessor with a DMA capability) to any VCI target (typically a memory cache controller, or a memory mapped peripheral). |
10 | | |
11 | | * The '''Coherence Network''' implements a separated address space, used to transport the coherence transactions between memory cache controllers and L1 cache controllers. This address space is not visible by the software. |
12 | | |
13 | | * The '''External Network''' implements a 34 bits physical address space. This network transports the PUT and GET transactions from the memory cache controller to the external RAM controller, in case of MISS or cache line replacement in the memory cache. This address space is not visible by the software. |
14 | | |
15 | | == 2. VCI initiators & targets indexing == |
16 | | |
17 | | A given hardware component can have several VCI ports. For example the L1 cache has three VCI ports : one initiator port to the direct network, one initiator port to the coherence network, and one target port on the coherence network. Each VCI port can have a different identifier that is defined by three indexes : |
| 9 | * The '''Direct Network''' implements the 40-bit TSAR physical address space that is visible by the software. It transports the direct READ, WRITE, LL, SC and CAS transactions from any VCI initiator (typically an L1 cache controller or another hardware coprocessor with a DMA capability) to any VCI target (typically a memory cache controller, or a memory mapped peripheral). All VCI packets are translated to DSPIN packets by specific VCI/DSPIN wrappers. There are actually two physically separated networks, one for command packets and one for response packets. Both networks have a two-level hierarchical structure with a local interconnect in each cluster (that can be implemented as a local crossbar, or as a local ring), and a global interconnect (implemented as a 2D mesh). |
| 10 | |
| 11 | * The '''Coherence Network''' is used to transport the coherence packets implementing the DHCCP coherence protocol between L2 cache controllers and L1 cache controllers. This network is not visible by the software, and does not use wrappers, as the L1 and L2 cache controllers directly use the DSPIN packet format. Here again there are two physically separated networks, one to transport L2-to-L1 packets and one to transport L1-to-L2 packets. Both networks have a two-level hierarchical structure with a local interconnect in each cluster (that can be implemented as a local crossbar, or as a local ring), and a global interconnect (implemented as a 2D mesh). |
| 12 | |
| 13 | * The '''Direct Network''' and the '''Coherence Network''' are physically separated in each cluster, but |
| 14 | they are only logically separated for the global communications. Regarding the local interconnect, there are four physically separated local crossbars (or local rings) transporting the ''direct command'', ''direct response'', ''coherence L1-to-L2'' and ''coherence L2-to-L1'' packets. Regarding the global interconnect, since the DSPIN infrastructure supports virtual channels, the ''direct command'' and the ''coherence L2-to-L1'' packets are multiplexed on the same 2D mesh (40 bits DSPIN flit width). Similarly, the ''direct response'' and ''coherence L1-to-L2'' packets are multiplexed on the same 2D mesh (33 bits DSPIN flit width). |
| 15 | |
| 16 | * The '''External Network''' supports communications between the L2 caches and the ''tiles'' implementing the 3D L3 cache, in case of MISS or cache line replacement in the L2 caches. It has a 3D mesh topology and the DSPIN flit width is 64 bits. This external network addressing space is not visible by the software. |
| 17 | |
| 18 | * A given hardware component can be connected to several networks. For example the L1 cache has one VCI initiator port to the direct network, and one DSPIN port to the coherence network. The L2 cache has one VCI target port on the direct network, one DSPIN port on the coherence network, and one DSPIN port to the external network. |
| 19 | |
| 20 | == 2. VCI initiators & targets indexing on direct network == |
| 21 | |
| 22 | On the direct network, each VCI port has an identifier that is defined by three indexes : |
26 | | The NX, NY and NL parameters are global for a given instance of the TSAR architecture. NX & NY cannot be larger than 5 (no more than 1024 clusters), |
27 | | but can be smaller, if the number of clusters is smaller than 1024. NL is equal to 4 (no more than 16 target ports or 16 initiator ports per cluster). |
28 | | |
29 | | In order to simplify the hardware implementation of the memory coherence protocol, the L_ID values are standardized on the coherence network, and the same value is used for an initiator port and for a target port: If the number of processors per cluster is NPROCS, the processor L_ID value is between 0 and (NPROCS-1). The memory cache L_ID is equal to NPROCS. |
30 | | |
| 29 | The NX, NY and NL parameters are global for a given instance of the TSAR architecture. NX & NY cannot be larger than 5 (no more than 1024 clusters), but can be smaller, if the number of clusters is smaller than 1024. NL is equal to 4 (no more than 16 target ports or 16 initiator ports per cluster). |
| 30 | |
| 31 | In order to simplify the hardware implementation, the L_ID values defined for the direct network are |
| 32 | also used on the coherence network, and the same value is used for an initiator port and for a target port: If the number of processors per cluster is NPROCS, the L1 cache L_ID value is between 0 and (NPROCS-1). The L2 cache L_ID is equal to NPROCS. |
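
The indexing rules above can be illustrated with a minimal Python sketch. It assumes that NX, NY and NL are bit widths (5, 5 and 4 at most, as stated above) and that the three indexes are packed in X | Y | L order from most to least significant bits. That packing order and the function names are assumptions made for illustration; they are not taken from this page.

```python
# Assumed maximum field widths, taken from the NX / NY / NL limits in the text.
NX_BITS = 5
NY_BITS = 5
NL_BITS = 4


def make_port_id(x_id, y_id, l_id, nx=NX_BITS, ny=NY_BITS, nl=NL_BITS):
    """Pack the three indexes into one identifier (assumed X | Y | L order)."""
    for name, value, width in (("x_id", x_id, nx), ("y_id", y_id, ny), ("l_id", l_id, nl)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (x_id << (ny + nl)) | (y_id << nl) | l_id


def split_port_id(port_id, nx=NX_BITS, ny=NY_BITS, nl=NL_BITS):
    """Recover (x_id, y_id, l_id) from an identifier built by make_port_id."""
    l_id = port_id & ((1 << nl) - 1)
    y_id = (port_id >> nl) & ((1 << ny) - 1)
    x_id = (port_id >> (ny + nl)) & ((1 << nx) - 1)
    return x_id, y_id, l_id


def l1_cache_l_id(proc_index, nprocs):
    """L1 caches use L_ID values 0 .. NPROCS-1, one per processor."""
    if not 0 <= proc_index < nprocs:
        raise ValueError("processor index out of range")
    return proc_index


def l2_cache_l_id(nprocs):
    """The L2 (memory) cache uses L_ID = NPROCS."""
    return nprocs
```

With the maximum widths, the packed identifier fits in 5 + 5 + 4 = 14 bits, which matches the 14 bits VCI SRCID width mentioned later on this page.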
82 | | Remarks on the '''PKTID''' field encoding : |
83 | | * for a TYPE_READ, bit 0 is set (resp. not set) for a miss (resp. uncached) request |
84 | | * for a TYPE_READ, bit 1 is set (resp. not set) for an instruction (resp. data) request |
85 | | * bit 2 can be used to check for a TYPE_READ (bit 2 = 0) |
86 | | |
87 | | When a given initiator can send several simultaneous transactions of a given type (such as several simultaneous '''WRITE''' transactions), the VCI '''TRDID''' field is used to discriminate them. The '''TRDID''' field is 4 bits, supporting up to 16 simultaneous transactions for a given initiator. |
| 86 | When a given initiator can send several simultaneous transactions of a given type (such as several simultaneous WRITE transactions), the VCI '''TRDID''' field is used to discriminate them. The '''TRDID''' field is 4 bits, supporting up to 16 simultaneous transactions for a given initiator. |
91 | | A VCI '''READ''' command packet contains one flit. In case of burst, all addresses must be within the same cache line. |
92 | | * The VCI '''CMD''' field must be set to CMD_READ. |
93 | | * The VCI '''TRDID''' field is not used by the L1 cache, but can be used by multi-channel DMA controllers to transmit the channel index. |
94 | | * The VCI '''PKTID''' field can be any of the 4 TYPE_READ_* of the previous table. |
95 | | |
96 | | A VCI '''READ''' response packet returns either |
97 | | * Up to 16 flits containing the uncached data in the '''RDATA''' field (for a '''PKTID''' = TYPE_READ_*_UNC). |
98 | | * Exactly 16 flits containing one word per flit in the '''RDATA''' field (for a '''PKTID''' = TYPE_READ_*_MISS). |
| 90 | A VCI '''READ''' command packet contains one flit. In case of burst, all addresses must be within the same cache line. The VCI '''TRDID''' field can be used by multi-channel DMA controllers to transmit the channel index. A VCI '''READ''' response packet returns up to 16 flits. |
102 | | A VCI '''WRITE''' command packet contains from 1 to 16 flits. In case of burst, all addresses must be within the same cache line. |
103 | | * The VCI '''CMD''' field must be set to CMD_WRITE. |
104 | | * The VCI '''TRDID''' field is used by the L1 cache to index its write buffer. It can be used by multi-channel DMA controllers to transmit the channel index. |
105 | | * The VCI '''PKTID''' field must be TYPE_WRITE. |
106 | | |
107 | | A VCI '''WRITE''' response packet always returns a single flit with a 0 value in the '''RDATA''' field. |
| 94 | * A VCI '''WRITE''' command packet contains from 1 to 16 flits. In case of burst, all addresses must be within the same cache line. The VCI '''TRDID''' field is used by the L1 cache to index its write buffer. It can be used by multi-channel DMA controllers to transmit the channel index. |
| 95 | * A VCI '''WRITE''' response packet contains one single flit. |
111 | | A VCI '''LL (Linked Load)''' command packet contains one single flit. |
112 | | ('''N.B.''': this request is only sent by a L1 cache and can only target a memory cache) |
113 | | * The VCI '''CMD''' field must be set to CMD_LOCKED_READ. |
114 | | * The VCI '''TRDID''' field is not used by the L1 cache. |
115 | | * The VCI '''PKTID''' field must be TYPE_LL. |
116 | | |
117 | | A VCI '''LL (Linked Load)''' response packet contains 2 flits : |
118 | | * The first flit contains in the '''RDATA''' field a signature returned by the memory cache for this LL reservation. |
119 | | * The second flit contains in the '''RDATA''' field the data that has been read in the memory cache. |
| 99 | * '''N.B.''': this request is only sent by a L1 cache and can only target a memory cache. |
| 100 | * A VCI '''LL''' command packet contains one single flit. |
| 101 | * A VCI '''LL''' response packet contains 2 flits: The first flit contains in the '''RDATA''' field a signature returned by the memory cache for this LL reservation. The second flit contains in the '''RDATA''' field the data that has been read in the memory cache. |
123 | | A VCI '''SC (Store Conditional)''' command packet contains 2 flits. |
124 | | ('''N.B.''': this request is only sent by a L1 cache and can only target a memory cache) |
125 | | * The VCI '''CMD''' field must be set to CMD_STORE_COND. |
126 | | * The VCI '''TRDID''' field is not used by the L1 cache. |
127 | | * The VCI '''PKTID''' field must be TYPE_SC. |
128 | | * The first flit contains in the '''WDATA''' field the signature obtained with the last LL operation at this address. |
129 | | * The second flit contains in the '''WDATA''' field the data to be written. |
130 | | |
131 | | A VCI '''SC (Store Conditional)''' response packet contains 1 flit. |
132 | | * The '''RDATA''' field contains 0 (resp. 1) to indicate an SC success (resp. failure). |
| 105 | * '''N.B.''': this request is only sent by a L1 cache and can only target a memory cache. |
| 106 | * A VCI '''SC''' command packet contains 2 flits. The first flit contains in the '''WDATA''' field the signature obtained with the last LL operation at this address. The second flit contains in the '''WDATA''' field the data to be written. |
| 107 | * A VCI '''SC''' response packet contains 1 flit. The '''RDATA''' field contains 0 (resp. 1) to indicate an SC success (resp. failure). |
136 | | A VCI '''CAS (Compare & Swap)''' command packet contains 2 flits. |
137 | | ('''N.B.''': this request is only sent by a L1 cache and can only target a memory cache) |
138 | | * The VCI '''CMD''' field must be set to CMD_STORE_COND. |
139 | | * The VCI '''TRDID''' field is not used by the L1 cache. |
140 | | * The VCI '''PKTID''' field must be TYPE_CAS. |
141 | | * The first flit contains in the '''WDATA''' field the old value of the data to be overwritten. |
142 | | * The second flit contains in the '''WDATA''' field the new value to be written. |
143 | | |
144 | | A VCI '''CAS (Compare & Swap)''' response packet contains 1 flit. |
145 | | * The '''RDATA''' field contains 0 (resp. 1) to indicate a CAS success (resp. failure). |
| 111 | * '''N.B.''': this request is only sent by a L1 cache and can only target a memory cache. |
| 112 | * A VCI '''CAS''' command packet contains 2 flits. The first flit contains in the '''WDATA''' field the old value of the data to be overwritten. The second flit contains in the '''WDATA''' field the new value to be written. |
| 113 | * A VCI '''CAS''' response packet contains 1 flit. The '''RDATA''' field contains 0 (resp. 1) to indicate a CAS success (resp. failure). |
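
As a reading aid for the SC and CAS packet layouts above, here is a small Python sketch. It only models the WDATA flit ordering and the response convention (0 means success, 1 means failure). The function names and the dictionary standing in for the memory cache are hypothetical, and the compare step reflects the usual compare-and-swap behaviour rather than a documented TSAR implementation.

```python
def sc_command_wdata(signature, data):
    """WDATA flits of an SC command: LL signature first, then the data to write."""
    return [signature, data]


def cas_command_wdata(old_value, new_value):
    """WDATA flits of a CAS command: expected old value first, then the new value."""
    return [old_value, new_value]


def is_success(rdata):
    """SC and CAS responses carry 0 in RDATA on success and 1 on failure."""
    return rdata == 0


def cas_apply(memory, address, old_value, new_value):
    """Toy model of the target side: write only if the stored value matches.

    `memory` is a plain dict used as a stand-in for the memory cache.
    Returns the value that would be placed in the RDATA response field.
    """
    if memory.get(address) == old_value:
        memory[address] = new_value
        return 0  # success
    return 1  # failure
```

Note the inverted convention compared with most software APIs: a response value of 0 means the operation succeeded.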
152 | | * For a non broadcast command packet (BC = 0), the (NX+NY+NL) MSB bits of the first field are used to route the packet to the proper destination. |
153 | | * For a broadcast packet (BC = 1), and the XMIN, XMAX, YMIN, YMAX fields (5 bits each), are used by the network to limit the broadcast. |
154 | | |
155 | | The DSPIN format can transport 40 bits VCI ADDRESS, and 14 bits VCI SRCID. |
| 120 | * For a non broadcast packet (BC = 0), the (NX+NY+NL) MSB bits of the first field are used to route the packet to the proper destination. |
| 121 | * For a broadcast packet (BC = 1), the XMIN, XMAX, YMIN, YMAX fields (5 bits each) are used by the network to limit the broadcast. |
| 122 | |
| 123 | The DSPIN format can transport up to 40 bits VCI ADDRESS, and up to 14 bits VCI SRCID. |
212 | | There are 4 packet types from L2 to L1, and 2 packet types from L1 to L2. |
| 178 | Broadcast commands are only used on the L2-to-L1 network, and use the BC bit in the first flit. |
| 179 | |
| 180 | * Other than BROADCAST, there are 5 packet types from L2 to L1 (3 bits encoding) |
| 181 | |
| 182 | || TYPE || BIT2 || BIT1 || BIT0 || |
| 183 | || || || || || |
| 184 | ||CLEANUP_ACK|| 1 || * || * || |
| 185 | ||UPDATE_DATA|| 0 || 1 || 1 || |
| 186 | ||UPDATE_INS || 0 || 1 || 0 || |
| 187 | ||INVAL_DATA || 0 || 0 || 1 || |
| 188 | ||INVAL_INS || 0 || 0 || 0 || |
| 189 | |
| 190 | * There are 3 packet types from L1 to L2 (2 bits encoding) |
| 191 | |
| 192 | || TYPE || BIT1 || BIT0 || |
| 193 | || || || || |
| 194 | ||CLEANUP_DATA|| 1 || 0 || |
| 195 | ||CLEANUP_INS || 1 || 1 || |
| 196 | ||MULTI-ACK || 0 || * || |
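
The two encoding tables above translate directly into small decoders. The following Python sketch is one possible reading, where `*` in the tables is treated as a "don't care" bit. The function names are hypothetical, and where the type bits sit inside a DSPIN flit is not covered here.

```python
def decode_l2_to_l1(type_bits):
    """Decode the 3-bit L2-to-L1 packet type (non-broadcast packets)."""
    if not 0 <= type_bits <= 0b111:
        raise ValueError("L2-to-L1 type is a 3-bit field")
    if type_bits & 0b100:
        return "CLEANUP_ACK"  # BIT2 = 1, lower bits are don't care
    # BIT2 = 0: BIT1 selects UPDATE vs INVAL, BIT0 selects DATA vs INS.
    return ("INVAL_INS", "INVAL_DATA", "UPDATE_INS", "UPDATE_DATA")[type_bits & 0b011]


def decode_l1_to_l2(type_bits):
    """Decode the 2-bit L1-to-L2 packet type."""
    if not 0 <= type_bits <= 0b11:
        raise ValueError("L1-to-L2 type is a 2-bit field")
    if not type_bits & 0b10:
        return "MULTI-ACK"  # BIT1 = 0, BIT0 is don't care
    return "CLEANUP_INS" if type_bits & 0b01 else "CLEANUP_DATA"
```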
310 | | This network has a 3D mesh topology: All PUT/GET transactions are from N initiators to M targets (the M tiles of the L3 cache). |
311 | | |
312 | | === 4.1 VCI parameters === |
313 | | |
314 | | The external network, that is only transporting cache lines does not use all VCI fields. The |
315 | | address is coded on 34 bits (it is actually a cache line index), and the data field is 64 bits, |
316 | | to increase the bandwidth. |
317 | | |
318 | | || VCI Field || width || |
319 | | || || || |
320 | | ||ADDRESS || 34 bits || |
321 | | ||WDATA , RDATA || 64 bits || |
322 | | ||PLEN || unused || |
323 | | ||SRCID, RSRCID || 10 bits || |
324 | | ||TRDID, RTRDID || 4 bits || |
325 | | ||PKTID, RPKTID || unused || |
326 | | ||RERROR || 1 bit || |
| 294 | TBD |
| 295 | |
| 296 | |