18 | | The TSAR architecture wants to guaranty the cache coherence by hardware, for both the data and instruction L1 caches. Reflecting the different behaviour of data & instruction caches, the DHCCP protocol defines two different strategies, depending on the number of copies : |
19 | | * '''MULTICAST_UPDATE''' : the modifications of shared data are very frequent events, but – in average – the number of copies is not very high. Therefore, when the number of copies is smaller than a given threshold, the cache controller registers the locations of all the copies, and use a ''multicast/update'' transaction. |
20 | | * Regarding the instructions, the modifications of shared code are rather rare events ( in case of self modifying code, or dynamic libraries ), but the number of replicated copies can be very large ( the system call handler, or the libc are likely replicated in all L1 caches ). Therefore, the DHCCP ptotocol will generally use a ''broadcast/invalidate'' policy for instruction caches. |
| 18 | The TSAR architecture aims to guarantee cache coherence in hardware, for both the data and instruction L1 caches. Reflecting the different behaviour of data and instruction caches, the ''hybrid'' cache coherence protocol (DHCCP) defines two different strategies, depending on the number of copies: |
| 19 | * '''MULTICAST_UPDATE''' : modifications of shared data are very frequent events, but the number of copies is generally not very high. When the number of copies does not exceed the DHCCP threshold, the memory cache controller registers the locations of all the copies, and sends a ''multicast_update'' transaction
| 20 | to the L1 caches concerned. |
| 21 | * '''BROADCAST_INVAL''' : modifications of shared code are rare events (self-modifying code, or dynamic libraries), but the number of replicated copies can be very large (the exception handler or the libc are generally replicated in all L1 caches). When the number of copies is larger than the DHCCP threshold, the memory cache controller simply stores the number of copies (without their locations) and sends a ''broadcast_inval'' transaction to all L1 caches. |
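As an illustration of the two strategies above, here is a minimal Python sketch of the selection rule. The function name `choose_policy` and the threshold value used in the example are hypothetical. The boundary case (count equal to the threshold) is assumed to use multicast, following the "does not exceed" wording used for the write transaction.

```python
def choose_policy(n_copies, threshold):
    """Return the coherence strategy for a line with n_copies replicas.

    Assumption: a count equal to the threshold still uses multicast;
    only a count strictly larger than the threshold uses broadcast.
    """
    if n_copies <= threshold:
        return "MULTICAST_UPDATE"   # locations are known: update each copy
    return "BROADCAST_INVAL"        # only the count is known: invalidate everywhere


# Example with a hypothetical threshold of 4 copies.
print(choose_policy(2, 4))
print(choose_policy(64, 4))
```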
47 | | These 4 transactions implement the DHCCP protocol : For each cache line stored in the memory cache, the memory cache implement a Registration Table that contain the copies replicated in the L1 caches. Each entry in this Registration Table contains the SRCID of a L1 cache that contains a copy, as well as the type of the copy (instruction/data). When the same cache line is replicated in both the instruction cache and the data cache of a processor, this defines two separated entries in the Registration Table. When the number copies for a given cache line L exceeds the DHCCP threshold, the corresponding Registration Table is flushed, and the memory cache register only the number of copies. |
| 48 | These transactions implement the DHCCP protocol: for each cache line stored in the memory cache, the memory cache implements a Registration Table that records the copies replicated in the L1 caches. Each entry in this Registration Table contains the SRCID of the L1 cache that contains a copy, as well as the type of the copy (instruction/data). When the same cache line is replicated in both the instruction cache and the data cache of a processor, this defines two separate entries in the Registration Table. When the number of copies for a given cache line L exceeds the DHCCP threshold, the corresponding Registration Table is flushed, and the memory cache registers only the number of copies. |
| 49 | |
| 50 | The coherence transactions use a logically separate ''coherence network'', implementing a separate address space. |
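The Registration Table behaviour described above can be sketched as follows. This is a software model under stated assumptions, not the hardware implementation: the class name `RegistrationTable` and its fields are hypothetical, and the sketch assumes an L1 cache never re-requests a line it already holds once the table has been flushed (duplicates cannot be detected in that state).

```python
class RegistrationTable:
    """Per-line copy tracking (hypothetical model of one directory entry)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.copies = set()        # {(srcid, is_instruction)} while locations are kept
        self.count = 0             # number of copies, always maintained
        self.counter_only = False  # True once the table has been flushed

    def register(self, srcid, is_instruction):
        key = (srcid, is_instruction)
        if not self.counter_only and key in self.copies:
            return                 # this copy is already registered
        self.count += 1
        if self.counter_only:
            return                 # locations are no longer recorded
        self.copies.add(key)
        if self.count > self.threshold:
            self.copies.clear()    # flush: keep only the number of copies
            self.counter_only = True


table = RegistrationTable(threshold=2)
table.register(srcid=1, is_instruction=False)
table.register(srcid=1, is_instruction=True)   # same processor, separate entry
table.register(srcid=2, is_instruction=False)  # exceeds the threshold: flush
```

Storing the instruction/data flag in the key is what makes the instruction and data copies of one processor count as two entries.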
49 | | * A '''MULTI_UPDATE''' transaction is a multi-cast transaction sent by the memory cache controller when it receives a WRITE request to a replicated cache line and the number of copies does not exceeds the DHCCP threshold. It sends as many VCI transactions as the number of registered copies (but the writer). The VCI command packet contains (N+2) flits. The VCI ADDRESS field is constant & contains the address of the memory mapped UPDATE register in the L1 cache. The VCI CMD field contains the WRITE value. As the memory cache controller can handle several simultaneous update/invalidate transactions, the VCI TRDID field contains the transaction index. The VCI PLEN field contains the value 4*N, where N is the actual number of modified words in the cache line. The line index (34 bits) is transported in the VCI WDATA and VCI BE fields, of the first flit. The first modified word index (3 bits) is transported in the WDATA field of the second flit, and the N modified words in the WDATA and BE fields of the N following flits. For each modified word, the VCI BE field can have a different value (including the 0x0 value). The VCI response packet contains one single flit. The memory cache controller counts the number of VCI responses to detect the completion of the MULTI_UPDATE transaction. |
| 52 | * A '''MULTICAST_UPDATE''' transaction is a multicast transaction sent by the memory cache controller when it receives a WRITE request to a replicated cache line and the number of copies does not exceed the DHCCP threshold. It sends as many VCI transactions as the number of registered copies (excluding the writer). The VCI command packet contains (N+2) flits. The VCI ADDRESS field is constant and contains the address of the memory-mapped UPDATE register in the L1 cache. The VCI CMD field contains the WRITE value. As the memory cache controller can handle several simultaneous update/invalidate transactions, the VCI TRDID field contains the transaction index. The VCI PLEN field contains the value 4*N, where N is the actual number of modified words in the cache line. The line index (34 bits) is transported in the VCI WDATA and VCI BE fields of the first flit. The first modified word index (3 bits) is transported in the WDATA field of the second flit, and the N modified words in the WDATA and BE fields of the N following flits. For each modified word, the VCI BE field can have a different value (including the 0x0 value). The VCI response packet contains one single flit. The memory cache controller counts the number of VCI responses to detect the completion of the MULTICAST_UPDATE transaction. |
53 | | • A BROADCAST_INVAL transaction is a broadcast transaction. This transaction is initiated when a memory cache controller replace a line that has the instruction type (INS = 1), or when the memory cache receives a WRITE request to a replicated cache line that has the instruction type (INS = 1). The VCI command packet contains one single flit. This packet is replicated & dynamically broadcasted by the network itself. The VCI CMD field contains the WRITE value. The VCI ADDRESS field contains the global broadcast address 0x000000003 (only the two LSB bits are set). The VCI WDATA field contains the line index. This VCI command is broadcasted to all L1 caches in the system, but only L1 caches that have a copy send a VCI response packet. All VCI response packets are independently returned to the memory cache initiator, that counts the number of VCI responses to detect the completion of the BROADCAST_INVAL transaction. If a L1 cache contains two copies of a cache line (i.e. the line is replicated in both the DATA cache, and the INSTRUCTION cache), it must send two VCI responses. |
| 56 | * A '''BROADCAST_INVAL''' transaction is a broadcast transaction. This transaction is initiated when a memory cache controller replaces a line, or receives a WRITE request to a replicated cache line, that has a number of copies larger than the DHCCP threshold. The VCI command packet contains one single flit. This packet is replicated and dynamically broadcast by the network itself. The VCI CMD field contains the WRITE value. The VCI ADDRESS field contains the global broadcast address 0x000000003 (only the two LSB bits are set). The VCI WDATA field contains the line index. This VCI command is broadcast to all L1 caches in the system, but only L1 caches that have a copy send a VCI response packet. All VCI response packets are independently returned to the memory cache initiator, which counts the number of VCI responses to detect the completion of the BROADCAST_INVAL transaction. If an L1 cache contains two copies of a cache line (i.e. the line is replicated in both the DATA cache and the INSTRUCTION cache), it must send two VCI responses. |
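The packet sizes and the completion rule for these transactions can be illustrated with a short Python sketch. All names (`multicast_update_plan`, `PendingTransaction`) are hypothetical. Two assumptions go beyond the text: the writer is identified by its own (SRCID, type) entry, and for BROADCAST_INVAL the expected number of responses is taken to be the stored number of copies.

```python
def multicast_update_plan(registered, writer, n_words):
    """Summarise one MULTICAST_UPDATE for n_words modified words.

    registered: list of (srcid, is_instruction) entries holding a copy.
    writer:     the entry that issued the WRITE (assumed excluded as a whole entry).
    """
    targets = [entry for entry in registered if entry != writer]
    return {
        "targets": targets,
        "flits_per_command": n_words + 2,  # line index + first word index + N words
        "plen": 4 * n_words,               # VCI PLEN, in bytes
        "expected_responses": len(targets),
    }


class PendingTransaction:
    """Counts single-flit VCI responses until the transaction completes."""

    def __init__(self, expected):
        self.expected = expected
        self.received = 0

    def on_response(self):
        self.received += 1
        return self.received >= self.expected  # True when complete


plan = multicast_update_plan([(1, False), (2, False), (3, True)], (1, False), n_words=3)
pending = PendingTransaction(plan["expected_responses"])
```

For a BROADCAST_INVAL, the same counter would be initialised with the stored number of copies, since an L1 cache holding both an instruction and a data copy answers twice.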