Changes between Initial Version and Version 1 of AtomicOperations


Timestamp:
Jun 30, 2009, 2:11:16 PM (15 years ago)
Author:
alain

  • AtomicOperations

[[PageOutline]]

= Atomic Operations =

== 1) Goals ==

The TSAR architecture implements atomic read-then-write operations to support various software synchronization mechanisms. The constraints are the following:
 * A software program must be able to read a data item at address X, test it, and write the (possibly modified) value back to the same address X, with the guarantee that no other access to this data took place between the read and the write.
 * As we want to support commodity operating systems and existing software applications, any memory address can be the target of an atomic access.
 * As atomic accesses can be used to implement spin-locks, the lock address must be cachable in order to benefit from the general coherence protocol and to avoid unnecessary transactions on the interconnection network.

== 2) LL/SC mechanism ==

The TSAR memory sub-system supports the LL/SC mechanism, as the LL and SC commands are defined in the VCI/OCP protocol. The LL and SC instructions must be defined by the processor Instruction Set Architecture. This is natively the case for the MIPS32 and PowerPC processors.
On the direct network, the VCI CMD field can take four values: READ, WRITE, LINKED_LOAD (LL), and STORE_CONDITIONAL (SC). From a conceptual point of view, atomicity is handled on the memory controller side (actually by the memory cache controllers), as the memory controllers must maintain a list of all pending atomic operations in a reservation table:

 * When a processor, identified by its SRCID, executes the LL(X) instruction, the memory controller registers an entry (SRCID, X) in the reservation table, and returns the memory value stored at address X in the VCI RDATA field. If there was another reservation for the same processor SRCID, but for another address X’, the previous reservation for X’ is cancelled.
 * When a processor, identified by its SRCID, executes the SC(X) instruction, there are two possibilities. If there is a pending entry (SRCID, X), indicating that no other access to the X address has been received, the atomic operation is a success: the write is done, the memory cache controller returns a “true” value in the VCI RDATA field, and all entries in the reservation table for the X address are cancelled. If there is no pending entry (SRCID, X) in the reservation table, the atomic operation is a failure: the write is not done, and the memory cache controller returns a “false” value in the VCI RDATA field.
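
The two rules above can be sketched as a small software model. This is only an illustration, not the TSAR hardware: the class name `MemoryController`, its method names, and the dictionary-based table are assumptions made for this example. The handling of a plain write (it also cancels the reservations on its address) is borrowed from the cachable variant described further down this page.

```python
class MemoryController:
    """Toy model of a memory controller holding an LL/SC reservation table."""

    def __init__(self):
        self.mem = {}           # address -> value (unwritten addresses read as 0)
        self.reservations = {}  # SRCID -> reserved address (at most one per processor)

    def ll(self, srcid, addr):
        # Registering (srcid, addr) replaces any earlier reservation of this SRCID.
        self.reservations[srcid] = addr
        return self.mem.get(addr, 0)

    def sc(self, srcid, addr, value):
        if self.reservations.get(srcid) != addr:
            return False        # no pending (srcid, addr) entry: the write is not done
        self.mem[addr] = value
        self._cancel(addr)      # success: every reservation on addr is cancelled
        return True

    def write(self, addr, value):
        # Assumption: a plain write counts as "another access" to addr.
        self.mem[addr] = value
        self._cancel(addr)

    def _cancel(self, addr):
        self.reservations = {
            s: a for s, a in self.reservations.items() if a != addr
        }


mc = MemoryController()
mc.ll(0, 0x100)                    # processor 0 reserves 0x100
mc.ll(1, 0x100)                    # processor 1 reserves the same address
first = mc.sc(1, 0x100, 7)         # True: processor 1 wins
second = mc.sc(0, 0x100, 9)        # False: the reservation of processor 0 was cancelled
```

In this model the first SC to arrive wins and every other pending reservation on the same address is lost, which is what forces the losing processor to restart its LL/SC sequence.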

This mechanism can be used to implement a spin-lock, using any memory address:
 * The lock acquisition is done by an atomic LL/SC operation.
 * The lock release is done by a simple WRITE instruction.

        _itmask                 # enter critical section
# lock acquisition
loop    LL   Reg1 @             # Reg1 <= M[@]
        BNE  Reg1 loop          # spin while the lock is taken (Reg1 != 0)
        SC   1 @                # M[@] <= 1 if the reservation holds / Reg2 <= KO
        BNE  Reg2 loop          # retry if the SC was not atomic (Reg2 != 0)
        ...                     # critical section
# lock release
        SW   0 @                # M[@] <= 0
        _itunmask               # exit critical section
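
The same acquire/release loop can be written as a short Python sketch, which makes the retry logic easier to follow. Everything here is hypothetical: `LockWord`, `acquire` and `release` are names invented for the example, the lock is reduced to a single word, and `sc` returns a Boolean success value instead of the KO flag tested by the assembly code.

```python
class LockWord:
    """One memory word with a toy LL/SC reservation set (hypothetical model)."""

    def __init__(self):
        self.value = 0          # 0 = lock free, 1 = lock taken
        self.reserved = set()   # SRCIDs holding a valid reservation

    def ll(self, srcid):
        self.reserved.add(srcid)
        return self.value

    def sc(self, srcid, value):
        if srcid not in self.reserved:
            return False
        self.value = value
        self.reserved.clear()   # a successful SC cancels all reservations
        return True

    def sw(self, value):
        self.value = value
        self.reserved.clear()   # assumption: a plain write also cancels them


def acquire(lock, srcid, max_tries=1000):
    """Spin as in the assembly loop; return the number of attempts used."""
    for attempt in range(1, max_tries + 1):
        if lock.ll(srcid) != 0:     # lock taken: go back to the LL
            continue
        if lock.sc(srcid, 1):       # atomic test-and-set succeeded
            return attempt
    raise RuntimeError("lock not acquired")


def release(lock):
    lock.sw(0)                      # simple write, as in the SW instruction


lock = LockWord()
lock.ll(0)                          # processor 0 sees the lock free
lock.ll(1)                          # so does processor 1
won = lock.sc(1, 1)                 # True: processor 1 takes the lock
lost = lock.sc(0, 1)                # False: processor 0 must retry
release(lock)
```

The `max_tries` bound only exists so that the sketch terminates when nobody releases the lock; the assembly loop spins without limit.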

== 3) Cachable atomic operations ==

In order to support cachable spin-locks, the memory cache controller and the L1 cache controller must cooperate to implement the LL/SC mechanism.

=== 3.1) Memory cache controller ===

The memory cache controller contains a dedicated storage that is used to register, for each cache line, the set of L1 caches that have copies. These sets of copies are implemented as linked lists of SRCIDs. To implement the reservation table, we just add, for each registered copy of a cache line (i.e. each entry in this reservation table), one extra bit to register a pending LL/SC atomic operation.
This approach is scalable, but creates the possibility of “false conflicts” when several atomic accesses are made to the same cache line.

  Request type     Actions in the memory cache controller

  LL(SRCID, X)     Scan all copies associated to the cache line containing the X address
                   If ( a copy corresponding to SRCID exists ) {
                       RESERVED = true
                   } else {
                       a new copy corresponding to SRCID is created in the linked list
                       and marked RESERVED in the linked list
                   }

  SC(SRCID, X)     Scan all copies associated to the cache line containing the X address
                   If ( a copy corresponding to SRCID exists and RESERVED == true ) {
                       - scan again the linked list of copies to send an UPDATE request
                         to the other L1 caches, and invalidate all RESERVED bits
                       - write data in the memory cache
                       - after all responses to UPDATE have been received, return true
                         to the L1 cache
                   } else {
                       return false to the L1 cache
                   }

  WRITE(SRCID, X)  - scan the linked list of copies to send an UPDATE request
                     to the L1 caches (other than SRCID), and invalidate all RESERVED bits
                   - write data in the memory cache
                   - after all responses to UPDATE have been received, acknowledge the
                     write request

  READ(SRCID, X)   If ( cachable request ) {
                       - register the SRCID in the list of copies associated to the X address
                       - return the complete cache line
                   } else {
                       - return a single word
                   }
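
The table above can be mirrored by a small Python model, which also shows how a "false conflict" arises. This is a sketch under several assumptions that are not in the text: the 64-byte line size (`LINE_BYTES`), the 4-byte word size, the `MemoryCache` class and its method names are all invented for the example, the linked list of copies is replaced by a dictionary, and the wait for UPDATE responses is not modelled (requests are only logged).

```python
LINE_BYTES = 64   # assumed cache line size, not given in the text
WORD_BYTES = 4    # assumed word size, not given in the text


class MemoryCache:
    """Toy memory cache controller with one RESERVED bit per registered copy."""

    def __init__(self):
        self.words = {}     # address -> value
        self.copies = {}    # line index -> {SRCID: RESERVED bit}
        self.updates = []   # (SRCID, line) UPDATE requests sent, for inspection

    def _line(self, addr):
        return addr // LINE_BYTES

    def _update_others(self, srcid, line):
        copies = self.copies.setdefault(line, {})
        for other in list(copies):
            if other != srcid:
                self.updates.append((other, line))
            copies[other] = False          # invalidate all RESERVED bits

    def ll(self, srcid, addr):
        # Create the copy if needed and mark it RESERVED.
        self.copies.setdefault(self._line(addr), {})[srcid] = True
        return self.words.get(addr, 0)

    def sc(self, srcid, addr, value):
        line = self._line(addr)
        if not self.copies.get(line, {}).get(srcid, False):
            return False
        self._update_others(srcid, line)
        self.words[addr] = value
        return True

    def write(self, srcid, addr, value):
        self._update_others(srcid, self._line(addr))
        self.words[addr] = value

    def read(self, srcid, addr, cachable=True):
        if not cachable:
            return self.words.get(addr, 0)             # single word
        line = self._line(addr)
        self.copies.setdefault(line, {}).setdefault(srcid, False)
        base = line * LINE_BYTES
        return [self.words.get(base + i, 0)            # complete cache line
                for i in range(0, LINE_BYTES, WORD_BYTES)]


X, Y = 0x1000, 0x1004          # two different words in the same cache line
cache = MemoryCache()
cache.ll(0, X)
cache.ll(1, Y)
sc_y = cache.sc(1, Y, 1)       # True
sc_x = cache.sc(0, X, 1)       # False, although X itself was never written
```

The failed SC on X is a false conflict: the RESERVED bit is attached to the copy of the whole line, so a successful SC (or a plain write) to any word of the line cancels the reservations held on its other words.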

=== 3.2) L1 cache controller ===

When the L1 cache controller receives a new LL(X) request from the processor, it must locally register the reservation on the X address, both to validate the use of the locally cached copy and to check the address when it later receives an SC(X) request from the processor. This requires an extra register to store the address, and a RESERVED flip-flop, in the L1 cache controller.

  Request type       Actions in the L1 cache controller

  LL(X)              If (RESERVED = true & ADDRESS = X) {     // local spin-lock
  (from processor)       return the read data to the processor
                     } else {                                 // first LL access
                         RESERVED <= true
                         ADDRESS <= X
                         send a LL(X) request to the memory cache,
                         and return the read value to the processor
                     }

  SC(X)              If (RESERVED = true & ADDRESS = X) {     // possible success
  (from processor)       send a SC(X) request to the memory cache,
                         and return the Boolean response to the processor
                         RESERVED <= false
                     } else {                                 // failure
                         return a false value to the processor
                         RESERVED <= false
                     }

  INVAL(L) or        If (ADDRESS = L) {                       // invalidate reservation
  UPDATE(L)              RESERVED <= false
  (from memory)      }
                     and the L1 cache must be updated or invalidated.
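
The L1 side can be sketched the same way. The code below is a hypothetical model, not the actual controller: `L1Controller`, `StubMemory`, the 64-byte `LINE_BYTES` and the `cached` attribute (standing in for the locally cached copy) are assumptions made for the example, and the update or invalidation of the cached data itself is left out.

```python
LINE_BYTES = 64   # assumed cache line size, not given in the text


class StubMemory:
    """Hypothetical stand-in for the memory cache; counts the LL requests it sees."""

    def __init__(self):
        self.value = 0
        self.ll_requests = 0

    def ll(self, srcid, addr):
        self.ll_requests += 1
        return self.value

    def sc(self, srcid, addr, value):
        self.value = value
        return True


class L1Controller:
    def __init__(self, srcid, memory):
        self.srcid = srcid
        self.memory = memory    # object offering ll(srcid, addr) and sc(srcid, addr, value)
        self.reserved = False   # RESERVED flip-flop
        self.address = None     # ADDRESS register
        self.cached = None      # value returned by the last LL sent to memory

    def ll(self, addr):
        if self.reserved and self.address == addr:
            return self.cached                  # local spin: no network transaction
        self.reserved, self.address = True, addr
        self.cached = self.memory.ll(self.srcid, addr)
        return self.cached

    def sc(self, addr, value):
        ok = self.reserved and self.address == addr
        self.reserved = False                   # cleared on both paths
        if not ok:
            return False                        # failure decided locally
        return self.memory.sc(self.srcid, addr, value)

    def inval_or_update(self, line):
        # INVAL(L) / UPDATE(L) from memory: drop the reservation if it is in line L.
        if self.address is not None and self.address // LINE_BYTES == line:
            self.reserved = False


mem = StubMemory()
l1 = L1Controller(0, mem)
l1.ll(0x40)
l1.ll(0x40)                                     # served locally
requests_sent = mem.ll_requests                 # 1
l1.inval_or_update(0x40 // LINE_BYTES)
sc_result = l1.sc(0x40, 1)                      # False, nothing sent to memory
```

The point of the local check is visible in `requests_sent`: a processor spinning on LL(X) generates a single request on the network until an INVAL or UPDATE for that line clears the RESERVED flip-flop.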