This document describes the general principles of ALMOS-MKH, an operating system targeting manycore architectures with a CC-NUMA (Cache-Coherent, Non-Uniform Memory Access) shared address space, such as the TSAR architecture, which can support up to 1024 32-bit MIPS cores. ALMOS-MKH also targets Intel / AMD multi-core architectures using 64-bit x86 cores.

Targeted architectures are assumed to be clustered, with one or more cores and a physical memory bank per cluster. These architectures must support POSIX-compliant multi-threaded parallel applications.

ALMOS-MKH is the successor to the ALMOS system, developed by Ghassan Almaless. The general principles of ALMOS are described in his thesis.

A first version of ALMOS-MKH, including in particular the distributed file system and the RPC-based communication mechanism, was developed by Mohamed Karaoui. The general principles of the proposed Multi-Kernel approach are described in his thesis. That version was called ALMOS-MK, without the H.

ALMOS-MKH is based on the "Multi-Kernel" approach to ensure scalability and to support the distribution of system services. In this approach, each cluster of the architecture owns an instance of the kernel, and each instance controls the local resources (memory and computing cores). These instances cooperate to give applications the image of a single system controlling all resources. They communicate with each other on the client / server model, using Remote Procedure Calls (RPCs).

To reduce energy consumption, ALMOS-MKH supports architectures using 32-bit cores. In this case, each cluster has a 32-bit physical address space, and the local physical addresses (internal to a cluster) are therefore 32 bits wide. To access the physical address space of other clusters, ALMOS-MKH uses 64-bit global physical addresses. For example, the physical address space of the TSAR architecture uses 40 bits, and the 8 most significant bits define the target cluster number. ALMOS-MKH thus explicitly distinguishes two types of access:
* Local accesses (internal to a cluster) use 32-bit addresses.
* Remote accesses (to another cluster) use 64-bit addresses.

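The TSAR address layout mentioned above can be sketched in C. This is a minimal illustration, assuming only what the text states (a 40-bit physical address whose 8 most significant bits give the cluster number, above a 32-bit local address). The names `tsar_paddr`, `tsar_cluster` and `tsar_local` are hypothetical and are not taken from the ALMOS-MKH sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (from the TSAR example in the text):
 * bits 39..32 = target cluster number, bits 31..0 = local physical address. */
#define LOCAL_BITS   32u
#define CLUSTER_BITS 8u

/* Build a 40-bit global physical address, held in a 64-bit integer. */
static uint64_t tsar_paddr(uint32_t cluster, uint32_t local)
{
    assert(cluster < (1u << CLUSTER_BITS));   /* cluster number must fit in 8 bits */
    return ((uint64_t)cluster << LOCAL_BITS) | local;
}

/* Extract the target cluster number (the 8 most significant of the 40 bits). */
static uint32_t tsar_cluster(uint64_t paddr)
{
    return (uint32_t)(paddr >> LOCAL_BITS) & ((1u << CLUSTER_BITS) - 1u);
}

/* Extract the 32-bit local physical address. */
static uint32_t tsar_local(uint64_t paddr)
{
    return (uint32_t)(paddr & 0xFFFFFFFFu);
}
```

With this layout, `tsar_paddr(0x12, 0x1000)` gives `0x1200001000`: the local address `0x1000` inside cluster `0x12`.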
On a hardware platform containing 32-bit cores, ALMOS-MKH runs entirely in physical addressing: the MMU is only used by application code. It is deactivated on kernel entry and reactivated on kernel exit. 32-bit physical addresses allow the kernel instance of a cluster K to directly access all local resources (memory or devices). To directly access the address space of another cluster, ALMOS-MKH uses the ''remote_read'' and ''remote_write'' primitives, which take 64-bit extended physical addresses (CXY / PTR): CXY is the 32-bit identifier of the target cluster, and PTR is the 32-bit local physical address in the target cluster. These primitives are used to implement the RPC mechanism, but also to speed up accesses to distributed kernel data structures that are performance-critical.

On a hardware platform containing 64-bit cores, it is no longer necessary to run the kernel in physical addressing, since the whole physical space can be mapped into the 64-bit virtual space. However, to improve access locality and minimize contention points, ALMOS-MKH still distinguishes between local and remote accesses, and the communication model between kernel instances is unchanged.

In both cases, communications between kernel instances are therefore implemented by a mix of RPCs (on the client / server model) and direct accesses to remote memory (when this is useful for performance). This hybrid approach is the main originality of ALMOS-MKH, and it is the reason for the H added after MK.

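The hybrid model can be illustrated by a small host-side simulation. The sketch below is not the ALMOS-MKH implementation: each cluster's memory is modelled as a plain array, and the names `xptr_t`, `XPTR`, `cluster_mem`, `rpc_increment`, as well as the signatures given to `remote_read` and `remote_write`, are assumptions made for illustration. It shows an extended pointer built from (CXY / PTR), direct remote accesses, and the same kind of update performed through an RPC-style call.

```c
#include <assert.h>
#include <stdint.h>

#define NB_CLUSTERS 4u
#define MEM_WORDS   1024u      /* 32-bit words of simulated memory per cluster */

typedef uint64_t xptr_t;       /* extended pointer: CXY (high 32 bits) | PTR (low 32 bits) */

#define XPTR(cxy, ptr)  (((uint64_t)(cxy) << 32) | (uint32_t)(ptr))
#define XPTR_CXY(xp)    ((uint32_t)((xp) >> 32))
#define XPTR_PTR(xp)    ((uint32_t)((xp) & 0xFFFFFFFFu))

/* Simulated physical memory: one bank per cluster. */
static uint32_t cluster_mem[NB_CLUSTERS][MEM_WORDS];

/* Direct remote access: read one 32-bit word through an extended pointer. */
static uint32_t remote_read(xptr_t xp)
{
    uint32_t cxy = XPTR_CXY(xp), ptr = XPTR_PTR(xp);
    assert(cxy < NB_CLUSTERS && ptr % 4 == 0 && ptr / 4 < MEM_WORDS);
    return cluster_mem[cxy][ptr / 4];
}

/* Direct remote access: write one 32-bit word through an extended pointer. */
static void remote_write(xptr_t xp, uint32_t value)
{
    uint32_t cxy = XPTR_CXY(xp), ptr = XPTR_PTR(xp);
    assert(cxy < NB_CLUSTERS && ptr % 4 == 0 && ptr / 4 < MEM_WORDS);
    cluster_mem[cxy][ptr / 4] = value;
}

/* Server side of a hypothetical RPC: stands for code running in cluster cxy,
 * which only touches its own bank through a 32-bit local address. */
static uint32_t rpc_increment_server(uint32_t cxy, uint32_t ptr)
{
    assert(cxy < NB_CLUSTERS && ptr % 4 == 0 && ptr / 4 < MEM_WORDS);
    return ++cluster_mem[cxy][ptr / 4];
}

/* Client side: a real system would send the request to the target cluster;
 * in this simulation the server function is simply called. */
static uint32_t rpc_increment(uint32_t target_cxy, uint32_t ptr)
{
    return rpc_increment_server(target_cxy, ptr);
}
```

In this sketch, the direct form touches the remote bank from the caller, whereas the RPC form delegates the update to code that only uses a local address in the target cluster.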
{{{#!comment
To avoid contention when several threads access the same page table to handle TLB misses, ALMOS-MKH replicates the page tables. For each multi-threaded user application P, the Generic Page Table (GPT) and the Virtual Segments List (VSL) are replicated in each cluster K containing at least one thread of the application. Following the "on-demand paging" principle, these replicated structures GPT(K,P) and VSL(K,P) are dynamically updated when page faults are detected. This section describes this building mechanism and the coherence protocol required by these multiple copies.
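The on-demand filling of the replicated page tables can be sketched as follows. This is a simplified host-side model, not the ALMOS-MKH code: it assumes a single reference table from which the local copies are filled, it ignores the VSL and the coherence protocol, and all names (`ref_gpt`, `local_gpt`, `gpt_init`, `gpt_translate`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NB_CLUSTERS 4u
#define NB_PAGES    16u
#define UNMAPPED    0xFFFFFFFFu

/* Assumed reference mapping for one process P: virtual page -> physical page. */
static uint32_t ref_gpt[NB_PAGES];

/* One local copy GPT(K,P) per cluster K, filled on demand. */
static uint32_t local_gpt[NB_CLUSTERS][NB_PAGES];
static unsigned page_faults[NB_CLUSTERS];

static void gpt_init(void)
{
    for (uint32_t vpn = 0; vpn < NB_PAGES; vpn++) {
        ref_gpt[vpn] = 100u + vpn;              /* arbitrary mapping for the demo */
        for (uint32_t k = 0; k < NB_CLUSTERS; k++)
            local_gpt[k][vpn] = UNMAPPED;       /* local copies start empty */
    }
    for (uint32_t k = 0; k < NB_CLUSTERS; k++)
        page_faults[k] = 0;
}

/* Translate virtual page vpn for a thread running in cluster k. */
static uint32_t gpt_translate(uint32_t k, uint32_t vpn)
{
    assert(k < NB_CLUSTERS && vpn < NB_PAGES);
    if (local_gpt[k][vpn] == UNMAPPED) {        /* miss in the local copy: page fault */
        page_faults[k]++;
        local_gpt[k][vpn] = ref_gpt[vpn];       /* update GPT(K,P) on demand */
    }
    return local_gpt[k][vpn];
}
```

In this model, the first access to a page from a given cluster takes one fault and fills that cluster's copy; later accesses from the same cluster are served locally, and other clusters fill their own copies independently.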