Changes between Version 1 and Version 2 of AtomicOperations
- Timestamp:
- Jun 30, 2009, 7:59:30 PM (15 years ago)
Legend:
- Unmodified
- Added
- Removed
- Modified
-
AtomicOperations
v1 v2 3 3 = Atomic Operations = 4 4 5 == 1 )Goals ==5 == 1. Goals == 6 6 7 7 The TSAR architecture implements atomic read-then write operations to support various software synchronization mechanisms. The constraints are the following : … … 10 10 * As the atomic access can be used to implement spin-locks, the lock address must be cachable in order to benefit from the general coherence protocol, and avoid unnecessary transactions on the interconnection network. 11 11 12 == 2 )LL/SC mechanism ==12 == 2. LL/SC mechanism == 13 13 14 The TSAR memory sub-system supports the LL/SC mechanism , as the LL & SC commands are defined in the VCI/OCP protocol. The LL and SC instructions must be defined bythe processor Instructon Set Architecture. This is natively the case for the MIPS32 & PowerPC processors.15 On the direct network, the VCI CMD field can take four values : READ, WRITE, LINKED_LOAD (LL), and STORE_CONDITIONAL (SC). From a conceptual point of view, the atomicity his handled on the memory controller side (actually the memory cache controllers), as the memory controllers must maintain a list of all pending atomic operations in a reservation table:14 The TSAR memory sub-system supports the LL/SC mechanism. The LL & SC commands are defined in the VCI/OCP protocol, and the LL and SC instructions must be defined in the processor Instructon Set Architecture. This is natively the case for the MIPS32 & PowerPC processors. 15 On the direct network, the VCI CMD field can take four values : READ, WRITE, LINKED_LOAD (LL), and STORE_CONDITIONAL (SC). From a conceptual point of view, the atomicity his handled on the memory controller side (actually the memory cache controllers), as the memory controllers must maintain a list of all pending atomic operations in a ''reservation table'' : 16 16 17 - When a processor, identified by its SRCID, executes the LL(X) instruction, the memory controller registers an entry (SRCID, X) in the reservation table, and returns the memory value stored at address X in the VCI RDATA field. If there was another reservation for the same processor SRCID, but for another address X’, the previous reservation for X’ is canceled.18 - When a processor, identified by its SRCID, executes the SC(X) instruction, there is two possibilities. If there is a pending entry (SRCID, X) indicating that there no other access to the X address has been received, the atomic operation is a success : the write is done, the memory cache controller returns a “true” value in he RDATA VCI field, and all entries in the reservation table for the X address are cancelled. If there is no pendingentry (SRCID, X) in the reservation table, the atomic operation is a failure : The write is not done, and the memory cache returns a “false” value in the RDATA field.17 * When a processor, identified by its SRCID, executes the LL(X) instruction to an address X, the memory controller registers an entry (SRCID, X) in the reservation table, and returns the memory value stored at address X in the VCI RDATA field. If there was another reservation for the same processor SRCID, but for another address X’, the previous reservation for X’ is lost (it means that the previous reservation can be cancelled). 18 * When a processor, identified by its SRCID, executes the SC(X) instruction, there is two possibilities. If there is a valid reservation entry (SRCID, X) indicating that no other access to the X address has been received, the atomic operation is a success : the write is done, the memory cache controller returns a “true” value in he RDATA VCI field, and all entries in the reservation table for the X address are cancelled. If there is no valid reservation entry (SRCID, X) in the reservation table, the atomic operation is a failure : The write is not done, and the memory cache returns a “false” value in the RDATA field. 19 19 20 This mechanism can be used to implement a spin-lock, using any memory address : 21 - The lock acquisition is done by an atomic LL/SC operation. 22 - The lock release is done by a simple WRITE instruction. 20 Clearly, in case of concurrent access, the winner is defined by the first SC instruction received by the memory controller. 23 21 22 As described below, this mechanism can be used to implement a spin-lock, using any memory address : 23 * The lock acquisition is done by an atomic LL/SC operation. 24 * The lock release is done by a simple WRITE instruction. 25 26 {{{ 24 27 _itmask # enter critical section 25 28 # lock acquisition 26 29 loop LL Reg1 @ # Reg1 <= M[@] 27 30 BNE Reg1 loop # continue if lock not taken (Reg1 == 0) 28 SC 1 @ # M[@] <= 1 / Reg2 <= KO31 SC 1 @ # M[@] <= 1 / Reg2 <= KO 29 32 BNE Reg2 loop # retry if not atomic (Reg2 != 0) 30 33 ... 31 34 ... 32 35 # lock release 33 SW 0 @ # M[@] <= 036 SW 0 @ # M[@] <= 0 34 37 _itunmask # exit critical section 38 }}} 35 39 40 == 3. Cachable atomic operations == 36 41 42 In order to support cachable spin-locks, the memory cache controller, and the L1 cache controller must cooperate to implement the LL/SC mechanism. 37 43 44 === 3.1 memory cache controller === 38 45 39 40 5.3 Cachable atomic operations 41 In order to support cachable spin-locks, the memory cache controller, and the L1 cache controller must cooperate to implement the LL/SC mechanism. 42 5.3.1 memory cache controller 43 The memory cache controller contains a dedicated storage that is used to register, for each cache line the set of L1 caches that have copies. These sets of copies are implemented as linked lists of SRCIDs. To implement the Reservation Table, we just introduce, for each registered copy of a cache line, (i.e. each entry in this Reservation tTable) one extra bit to register a pending LL/SC atomic operation. 46 The memory cache controller contains a dedicated storage that is used to register, for each cache line the set of L1 caches that have copies. These sets of copies are implemented as linked lists of SRCIDs. To implement the Reservation Table, we just introduce, for each registered copy of a cache line, (i.e. each entry in this Reservation Table) one extra bit to register a pending LL/SC atomic operation. 44 47 This approach is scalable, but creates the possibility of “false conflicts”, when several atomic access are done to the same cache line. 45 48 46 Request type Actions in the memory cache controller 47 48 LL(SRCID, X) 49 50 49 The actions done by the memory cache controller for the various commands are described below : 50 51 '''LL(SRCID, X)''' 52 {{{ 53 Scan all copies associated to the cache line containing the X address 51 54 If ( a copy corresponding to SRCID.exists) { 52 55 RESERVED = true … … 55 58 and marked RESERVED in the linked list 56 59 } 57 58 SC(SRCID, X) 60 }}} 59 61 60 Scan all copies associated to the cache line containing the X address 62 '''SC(SRCID, X) 63 {{{ 64 Scan all copies associated to the cache line containing the X address 61 65 If ( a copy corresponding to SRCID.exists and RESERVED == true ) { 62 66 - scan again the linked list of copies to send an UPDATE request … … 68 72 return false to the L1 cache 69 73 } 70 71 WRITE(SRCID, X) 72 73 - Scan the linked list of copies to send an UPDATE request 74 }}} 75 76 '''WRITE(SRCID, X)''' 77 {{{ 78 - Scan the linked list of copies to send an UPDATE request 74 79 to the L1 caches (other than SRCID), and invalidate all RESERVED bits 75 80 - Write data in the memory cache 76 81 - after all responses to UPDATE have been received, acknowledge the 77 82 write request. 83 }}} 78 84 79 READ(SRCID, X) If ( cachable request ) { 80 - register the SRCID in the list of copies associated to the X address. 81 - return the complete cache line 85 '''READ(SRCID, X)''' 86 {{{ 87 If ( cachable request ) { 88 - register the SRCID in the list of copies associated to the X address. 89 - return the complete cache line 82 90 } else { 83 91 - return a single word. 84 92 } 93 }}} 85 94 95 === 3.2 L1 cache controller === 86 96 87 88 89 5.3.2 L1 cache controller90 97 The L1 cache controller receiving a new LL(X) request from the processor must locally register this reservation on the X address to validate the use of the locally cached copy, and to check the address when it receives a SC(X) request from the processor. This requires an extra register to store the address, and a RESERVED flip-flop in the L1 cache controller. 91 98 92 Request type Actions in the L1 cache controller 93 94 LL(X) 99 The actions done by the L1 cache controller for the various commands are described below : 95 100 96 (from processor) If (RESERVED = true & ADDRESS = X) { // local spin-lock 101 '''LL(X) from processor''' 102 {{{ 103 If (RESERVED = true & ADDRESS = X) { // local spin-lock 97 104 return the read data to the processor 98 105 } else { // first LL access … … 102 109 and return the read value to the processor 103 110 } 104 105 SC(X) 111 }}} 106 112 107 (from processor) If (RESERVED = true & ADDRESS = X) { // possible succès 113 '''SC(X) from processor''' 114 {{{ 115 If (RESERVED = true & ADDRESS = X) { // possible succès 108 116 send a SC(X) request to the memory cache, 109 117 and return the Boolean response to the processor … … 113 121 RESERVED <= false 114 122 } 123 }}} 115 124 116 INVAL(L) or 117 UPDATE(L) 118 119 (from memory) 120 If (ADDRESS = L) { // invalidate reservation 125 '''INVAL(L) or UPDATE(L) from memory controller''' 126 {{{ 127 If (ADDRESS = L) { // invalidate reservation 121 128 RESERVED <= false 122 129 } 130 and the L1 cache is updated or invalidated. 131 }}} 123 132 124 and the L1 cache must be updated or invalidated.125 126