Changes between Version 1 and Version 2 of Specification
- Timestamp: Jun 27, 2009, 2:06:44 PM
Specification
v1 v2
  1 1 [[PageOutline]]
  2 2
- 3 = General Principles=
+ 3 = Architecture Overview =
  4 4
  5 5 The TSAR shared memory architecture is a scalable, cache coherent, general-purpose multicore architecture. It is intended to support commodity applications and operating systems running on standard PCs, such as LINUX or FreeBSD. Therefore, the cache coherence must be entirely guaranteed by the hardware. Moreover, the TSAR architecture must provide hardware support for a paginated virtual memory and efficient atomic operations for synchronization.
  … …
  25 25 == The virtual memory support ==
  26 26
- 27 The TSAR architecture implements a paginated virtual memory. It defines a generic MMU (Memory Management Unit), physically implemented in the L1 cache controller. This generic MMU is independent on the processor core, and can be used with any 32 bits, single instruction issue RISC processor. The TLB MISS are handled by an hardwired FSM, and do not use any specific instructions.
+ 27 The TSAR architecture implements a paginated virtual memory. It defines a generic MMU (Memory Management Unit), physically implemented in the L1 cache controller. This generic MMU is independent on the processor core, and can be used with any 32 bits, single instruction issue RISC processor. To be independent from the processor core, the TLB MISS are handled by an hardwired FSM, and do not use any specific instructions.
  28 28
  29 29 The virtual address is 32 bits, and the physical address has up to 40 bits. It defines two types of pages (4 Kbytes pages, and 2 Mbytes pages). The page tables are mapped in memory and have a classical two level hierarchical structure. There is of course two separated TLB (Translation Look-aside Buffers) for instruction addresses and data addresses.
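The address layout stated in the paragraph above (32-bit virtual address, 4 Kbytes and 2 Mbytes pages, two-level page table) can be illustrated with a small sketch. It assumes, and the specification text here does not state this, that one first-level entry covers exactly one 2 Mbytes region, which gives an 11-bit first-level index, a 9-bit second-level index and a 12-bit offset. The function and constant names are hypothetical.

```python
# Illustrative sketch only: the 11/9/12 bit split is an assumption derived
# from the two page sizes (2 Mbytes = 2**21, 4 Kbytes = 2**12).
VADDR_BITS = 32
PAGE_2M_BITS = 21   # offset width inside a 2 Mbytes page
PAGE_4K_BITS = 12   # offset width inside a 4 Kbytes page


def split_vaddr(vaddr, big_page=False):
    """Split a 32-bit virtual address into (ix1, ix2, offset).

    ix1 is the first-level index; ix2 is the second-level index,
    or None when the address is mapped by a 2 Mbytes page.
    """
    if not 0 <= vaddr < (1 << VADDR_BITS):
        raise ValueError("virtual address must fit in 32 bits")
    ix1 = vaddr >> PAGE_2M_BITS
    if big_page:
        # A 2 Mbytes page needs only the first-level lookup.
        return ix1, None, vaddr & ((1 << PAGE_2M_BITS) - 1)
    ix2_mask = (1 << (PAGE_2M_BITS - PAGE_4K_BITS)) - 1
    ix2 = (vaddr >> PAGE_4K_BITS) & ix2_mask
    return ix1, ix2, vaddr & ((1 << PAGE_4K_BITS) - 1)


if __name__ == "__main__":
    print(split_vaddr(0x00403ABC))
    print(split_vaddr(0x00403ABC, big_page=True))
```

Under this assumption a 2 Mbytes page is resolved with a single table access, while a 4 Kbytes page needs both levels.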
  30 30
  31 31 In order to help the operating system to implement efficient page replacement policies, each entry in the page table contains three bits that are updated by the hardware MMU : a dirty bit to indicate modifications, and two separated access bits for “local access” (processor and memory cache located in the same cluster), and “remote access” (processor and memory cache located in different clusters).
- 32 1.4 The DHCCP protocol
+ 32
+ 33 == The DHCCP cache coherence protocol ==
+ 34
  33 35 The shared memory TSAR architecture implements the DHCCP protocol (Distributed Hybrid Cache Coherence Protocol). As it is not possible to monitor all simultaneous transaction in a distributed network on chip, the DHCCP protocol is based on the global directory paradigm.
  34 36
  … …
  37 39 This choice increases the number of write transactions, and enforces the importance of a proper placement of the data on this NUMA architecture. This is the price to pay for the scalability.
  38 40
- 39 Finally, the DHCCP protocol is called “hybrid”, as it uses a multicast/update policy for data cache, and a broadcast/invaidate policy for instruction caches.
- 40 1.5 The interconnection networks
+ 41 Finally, the DHCCP protocol is called “hybrid”, as it uses a multicast/update policy when the number of copies is lower than a given threshold, and automatically switches to a broadcast/invalidate policy when this number of copies exceeds this threshold.
+ 42
+ 43 == The interconnection networks ==
+ 44
  41 45 The TSAR architecture requires a hierarchical two levels interconnect : each cluster must contain a local interconnect, and the communications between clusters relies on a global interconnect.
  42 46
  … …
  48 52
  49 53
- 50 1.6 Atomic instructions
- 51 Any multi-processor architecture must provide an hardware support for atomic operations. These “read-then-write” atomic operations are used by the software for synchronization.
+ 54 == Atomic instructions ==
  52 55
- 53 In a distributed, yet shared memory, architecture using a NoC, these atomic operations must be implemented in both the memory controller (in our case, the memory caches), and the L1 cache controller.
+ 56 Any multi-processor architecture must provide an hardware support for atomic operations. These “read-then-write” atomic operations are used by the software for synchronization.
  54 57
- 55 Each processor instruction set defines a different set of atomic instruction. The TSAR architecture implements the LL/SC mechanism, that are natively defined by the MIPS32 & PPC405 processors, and are directly supported by the VCI/OCP standard. Other atomic instructions, such as the SWAP, or LDSTUB instructions defined by the SPARC processor can be emulated using the LL/SC instructions.
+ 58 In a distributed architecture using a NoC, these atomic operations must be implemented in both the memory controller (in our case, the memory caches), and the L1 cache controller.
+ 59
+ 60 Each processor instruction set defines a different set of atomic instruction. The TSAR architecture implements the LL/SC mechanism, that are natively defined by the MIPS32 & PPC405 processors, and are directly supported by the VCI/OCP standard. Other atomic instructions, such as the SWAP, or LDSTUB instructions defined by the SPARC processor can be emulated using the LL/SC instructions.
  56 61
  57 62 With this mechanism, the TSAR architecture allows the system developers to use cachable spin-locks.
  58 63
- 59 = Virtual memory =
- 60
- 61 = Cache Coherence Protocol =
- 62
- 63 = Atomic Operations =
- 64
- 65 = Interconnection Networks =
- 66
- 67 = VCI/OCP parameters =
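The "Atomic instructions" text states that SWAP-style instructions can be emulated with LL/SC and that this enables spin-locks. The following is a minimal, single-threaded toy model of that idea, not the TSAR hardware or the VCI/OCP protocol: the class `LLSCMemory`, its per-processor reservation table, and the helper names are all hypothetical and exist only to show the retry loop.

```python
class LLSCMemory:
    """Toy word memory with one LL reservation per processor (assumed model)."""

    def __init__(self):
        self.words = {}          # address -> value, unset words read as 0
        self.reservations = {}   # processor id -> reserved address

    def ll(self, proc, addr):
        """Linked load: read the word and register a reservation."""
        self.reservations[proc] = addr
        return self.words.get(addr, 0)

    def sc(self, proc, addr, value):
        """Conditional store: succeeds only if the reservation still holds."""
        if self.reservations.get(proc) != addr:
            return False
        self.words[addr] = value
        # A successful write cancels every reservation on this address.
        stale = [p for p, a in self.reservations.items() if a == addr]
        for p in stale:
            del self.reservations[p]
        return True


def atomic_swap(mem, proc, addr, new_value):
    """Emulate a SWAP instruction: retry LL/SC until the store succeeds."""
    while True:
        old = mem.ll(proc, addr)
        if mem.sc(proc, addr, new_value):
            return old


def try_lock(mem, proc, addr):
    """Take the lock if it was free (0); return True on success."""
    return atomic_swap(mem, proc, addr, 1) == 0


def unlock(mem, proc, addr):
    """Release the lock by writing 0."""
    atomic_swap(mem, proc, addr, 0)
```

In this model a conditional store fails whenever another processor has written the same word since the linked load, which is what forces the retry in `atomic_swap`. A spinning processor would simply call `try_lock` in a loop until it returns `True`.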