Changes between Version 9 and Version 10 of AtomicOperations
- Timestamp: Dec 14, 2011, 5:46:07 PM
AtomicOperations
v9 v10
15 15 On the direct network, the VCI CMD field can take four values: READ, WRITE, LINKED_LOAD (LL), and STORE_CONDITIONAL (SC). From a conceptual point of view, atomicity is handled on the memory controller side (actually by the memory cache controller), as the memory controller must maintain a list of all pending atomic operations in a ''reservation table'':
16 16
   17 === 2.1 General principle ===
   18
17 19 * When a processor, identified by its SRCID, executes the LL(X) instruction to an address X, the memory controller registers an entry (SRCID, X) in the reservation table, and returns the memory value stored at address X in the VCI RDATA field. If there was another reservation for the same processor SRCID, but for another address X’, the previous reservation for X’ is lost (that is, it is cancelled).
18    * When a processor, identified by its SRCID, executes the SC(X) instruction, there are two possibilities. If there is a valid reservation entry (SRCID, X) indicating that no other access to the X address has been received, the atomic operation is a success: the write is done, the memory cache controller returns a “true” value in the RDATA VCI field, and all entries in the reservation table for the X address are cancelled. If there is no valid reservation entry (SRCID, X) in the reservation table, the atomic operation is a failure: the write is not done, and the memory cache returns a “false” value in the RDATA field.
   20 * When a processor, identified by its SRCID, executes the SC(X) instruction, there are two possibilities. If there is a valid reservation entry (SRCID, X) indicating that no other access to the X address has been received, the atomic operation is a success: the write is done, the memory cache controller returns a ''success'' value in the RDATA VCI field, and all entries in the reservation table for the X address are cancelled. If there is no valid reservation entry (SRCID, X) in the reservation table, the write is not done, and the memory cache returns a ''fail'' value in the RDATA field.
19 21
20 22 Clearly, in case of concurrent access, the winner is defined by the first SC instruction received by the memory controller.
21 23
22    As described below (using the MIPS32 instruction set), this mechanism can be used to implement a spin-lock, using any memory address:
   24 === 2.2 Failure / Success encoding ===
   25
   26 The actual encoding of the (success/failure) return value for a SC access depends on the processor core: For the MIPS2
   27 and ARM processors, a success is encoded as a non-zero value. For the PPC processor, a success is encoded as a zero value.
   28 In the TSAR architecture, the memory cache controller returns the value 0 for a success, and the value 1 for a failure.
   29 If the architecture uses a MIPS or ARM processor, the SC value must be transcoded by the L1 cache controller before
   30 being transmitted to the processor core.
   31
   32 === 2.3 Software implementation on MIPS32 processor ===
   33
   34 As described below, the LL/SC mechanism can be used to implement a spin-lock, using any memory address:
23 35 * The lock acquisition is done by an atomic LL/SC operation.
24 36 * The lock release is done by a simple WRITE instruction.
25
   37 Remember that a SC failure is encoded by a zero value for the MIPS processor.
   38
26 39 {{{
27 40 _itmask # enter critical section
… …
41 54 == 3. Cachable atomic operations ==
42 55
43    In order to support cachable spin-locks and a better scalability, the memory cache controller and the L1 cache controller must cooperate to implement the LL/SC mechanism. But the standard semantic of the LL/SC mechanism has to be modified.
44
   56 In order to support cachable spin-locks and a better scalability, the TSAR memory cache controller and the L1 cache controller cooperate to implement the LL/SC mechanism. But the standard semantic of the LL/SC mechanism has to be modified:
   57 * The LL operation is implemented by the L1 cache controller as a standard Read operation.
   58 * The SC operation is implemented as a Compare and Swap operation.
45 59 Furthermore, the LL/SC mechanism is extended to support both 32-bit and 64-bit atomic accesses.
46 60
47 61 === 3.1 new semantic ===
48 62
49    The previous semantic is:
   63 The formal semantic of a LL/SC access is:
50 64 (1) The Store Conditional succeeds if there was no other Store Conditional at the same address since the last Linked Load.
51 65
52    The new semantic is:
   66 The implemented semantic is:
53 67 (2) The Store Conditional succeeds if the content of the memory has not changed since the last Linked Load.
54 68
… …
59 73 In the new protocol, there is no longer any LL on the network:
60 74
61    * Linked Loads become simple Reads, where the data sent to the processor is recorded in a register of the L1 cache. When the processor issues a Store Conditional, the L1 cache sends a "SC" packet, where the first flits contain the data previously read and the last flits contain the data to write in the memory. So this new SC packet is 2 flits (32-bit accesses) or 4 flits (64-bit accesses) long.
   75 * Linked Loads become simple Reads, where the data sent to the processor is recorded in a register of the L1 cache. When the processor issues a Store Conditional, the L1 cache sends a "SC" packet, where the first flits contain the data previously read and the next flits contain the data to write in the memory. So this new SC packet is 2 flits (32-bit accesses) or 4 flits (64-bit accesses) long.
62 76
63    * The memory cache controller then compares the data read by the L1 cache to the data in the memory cache. If these two values are equal, the Store Conditional is issued to the memory and the response to the SC is TRUE; otherwise the Store Conditional is not issued and the response is FALSE.
   77 * The memory cache controller then compares the data read by the L1 cache to the data in the memory cache. If these two values are equal, the Store Conditional is done, and the response to the SC is success (0 value); otherwise the Store Conditional is not done, and the response is failure (1 value).
64 78
65 79 === 3.3 memory cache controller ===
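As an illustration only, the compare-and-swap handling of an SC packet by the memory cache controller, as described in section 3.2, can be sketched in C. This is a minimal software model under stated assumptions, not the TSAR hardware implementation: the type `sc_packet32_t` and the functions `memcache_handle_sc32` and `l1_transcode_for_mips` are hypothetical names introduced for this sketch, and only the 32-bit case (one flit of previously read data, one flit of data to write) is modelled. The 0/1 response values and the non-zero-means-success convention for MIPS are taken from the text.

```c
#include <stdint.h>

/* Response values returned by the memory cache controller, as stated in
 * the text: 0 for a success, 1 for a failure. */
enum { SC_SUCCESS = 0, SC_FAILURE = 1 };

/* Hypothetical model of a 32-bit SC packet (2 flits):
 * first flit = data previously read by the LL, next flit = data to write. */
typedef struct {
    uint32_t expected; /* value recorded by the L1 cache at LL time */
    uint32_t wdata;    /* value the processor wants to store        */
} sc_packet32_t;

/* Hypothetical memory-cache-side handling: the write is done only if the
 * memory content still equals the value read by the LL (semantic (2)). */
static uint32_t memcache_handle_sc32(uint32_t *cell, sc_packet32_t pkt)
{
    if (*cell == pkt.expected) {
        *cell = pkt.wdata;      /* Store Conditional is done     */
        return SC_SUCCESS;      /* 0 value                       */
    }
    return SC_FAILURE;          /* 1 value, memory is unchanged  */
}

/* Hypothetical L1-side transcoding for a MIPS core, where a success must
 * be seen as a non-zero value and a failure as zero. */
static uint32_t l1_transcode_for_mips(uint32_t rsp)
{
    return (rsp == SC_SUCCESS) ? 1u : 0u;
}
```

Because the test is on the memory content rather than on a reservation, a location that was overwritten and then restored to its original value between the LL and the SC would still compare equal in this model; that follows directly from semantic (2).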