= ALMOS-MKH Specification =

[[PageOutline]]

This document describes the general principles of ALMOS-MKH, an operating system targeting manycore architectures with a CC-NUMA (Cache Coherent, Non Uniform Memory Access) shared address space, such as the TSAR architecture, which can support up to 1024 32-bit MIPS cores. ALMOS-MKH also targets Intel-based multi-core servers using 64-bit x86 cores. Targeted architectures are assumed to be clustered, with one or several cores and one physical memory bank per cluster. These architectures are expected to run POSIX-compliant multi-threaded parallel applications.

ALMOS-MKH is derived from the ALMOS system, developed by Ghassan Almaless. The general principles of the ALMOS system are described in his thesis. A first version of ALMOS-MKH, including the distributed file system and the RPC communication mechanism, was developed by Mohamed Karaoui. The general principles of the proposed Multi-Kernel approach are described in his thesis. This system was called ALMOS-MK (without H).

ALMOS-MKH is based on the "Multi-Kernel" approach, to ensure scalability and support the distribution of system services. In this approach, each cluster of the architecture contains an instance of the kernel. Each instance controls the local resources (memory and computing cores). These multiple instances cooperate with each other to give applications the image of a single system controlling all resources. They communicate with each other using both (i) the client/server model, sending a remote procedure call (RPC) to a remote cluster for a complex service, and (ii) the shared memory paradigm, making direct read/write accesses to remote memory when required.

The main ALMOS-MKH specific feature is the following: the physical address space is assumed to be distributed between the clusters, and the MSB bits of the physical address are assumed to define the target cluster.
In a physical address, the LSB bits contain the '''lpa''' (Local Physical Address), and the MSB bits define the '''cxy''' (Cluster Identifier). The physical address space is therefore described as a two-level array M[cxy][lpa]. To enforce locality, a kernel instance can use a normal pointer '''ptr''' to access the local physical memory. But to access the physical memory of a remote cluster (cxy), it must use the specific remote access functions ''remote_read(cxy,ptr)'' and ''remote_write(cxy,ptr)'', where ''ptr'' is the local pointer in the target cluster. This (cxy,ptr) couple is called an ''extended pointer''.

As mentioned above, ALMOS-MKH supports both architectures using 64-bit cores and architectures using 32-bit cores.

On a typical Intel-based hardware platform containing '''64-bit cores''', the physical address has 44 bits: the 4 MSB bits define the target cluster identifier '''cxy''', and the 40 LSB bits define the local physical address '''lpa'''. To avoid contention, the kernel code is replicated in all clusters, so that each cluster contains one KCODE physical segment, and ALMOS-MKH uses the '''Instruction MMU''' to map, in each cluster, the local copy of the kernel code in the kernel virtual space. Regarding data accesses, each cluster contains one KDATA and one KHEAP physical segment (the KHEAP physical segment contains all local physical memory not occupied by KCODE and KDATA). As the 48-bit virtual address space is large enough to map all these distributed KDATA[cxy] and KHEAP[cxy] segments, they can all be mapped in the kernel virtual space, and the '''Data MMU''' is used to translate both local and remote data accesses.

On the TSAR hardware platforms containing '''32-bit cores''', the physical address has 40 bits: the 8 MSB bits define the target cluster identifier '''cxy''', and the 32 LSB bits define the local physical address '''lpa'''.
On TSAR architectures, the virtual address is 32 bits, and this virtual space is too small to map all the distributed KDATA[cxy] and KHEAP[cxy] physical segments. On these architectures, the ALMOS-MKH kernel runs partially in physical addressing mode. The kernel code is still replicated in all clusters, and the kernel uses the '''Instruction MMU''' to map the local kernel code copy in the kernel virtual space. For data accesses, however, the '''Data MMU''' is deactivated as soon as a core enters the kernel, and it is reactivated when the core returns to user mode. ALMOS-MKH uses a software-controllable (TSAR-specific) extension register containing a '''cxy''' value. When the Data MMU is deactivated, this '''cxy''' cluster identifier is concatenated to the 32-bit '''ptr''' pointer to build (in pseudo identity mapping) a 40-bit physical address. The default value of this extension register is the local cluster identifier, which is used to access the local memory. To access memory in a remote cluster, the ''remote_read'' and ''remote_write'' primitives modify the extension register before the remote memory access, and restore it after the remote memory access.

In both cases, communications between kernel instances are therefore implemented by a mix of RPCs (following the client/server model) and direct accesses to remote memory (when this is useful for performance). This hybrid approach is the main originality of ALMOS-MKH.

== A) [wiki:arch_info Hardware Platform Definition] ==

This section describes the general assumptions made by ALMOS-MKH regarding the hardware architecture, and the mechanism used to configure ALMOS-MKH for a given target architecture.

== B) [wiki:processus_thread Process & threads creation/destruction] ==

ALMOS-MKH supports the POSIX threads API. In order to avoid contention in massively multi-threaded applications, ALMOS-MKH replicates the user process descriptors in all clusters containing threads of this process.
This section describes the mechanisms for process and thread creation/destruction.

== C) [wiki:replication_distribution Data replication & distribution policy] ==

This section describes the general ALMOS-MKH policy for the replication and distribution of information on the various physical memory banks. There are two main goals: enforce memory access locality, and avoid contention when several threads simultaneously access the same information. To control the placement and replication of information on the physical memory banks, the kernel relies on paged virtual memory.

== D) [wiki:page_tables GPT & VSL implementation] ==

To avoid contention when several threads access the same page table to handle TLB misses, ALMOS-MKH replicates the process descriptors: for each multi-threaded process P, the Generic Page Table (GPT) and the Virtual Segments List (VSL) are replicated in each cluster K containing at least one thread of the application. Following the "on-demand paging" principle, these replicated structures GPT(K,P) and VSL(K,P) are dynamically updated when page faults are detected. This section describes how these copies are built, and the coherence protocol required by these multiple copies.

== E) [wiki:thead_scheduling Trans-cluster lists] ==

ALMOS-MKH must handle dynamic sets of objects, such as the set of all threads waiting to access a given peripheral device. These sets of threads are implemented as circular doubly linked lists. As these threads can run on any cluster, these linked lists are ''trans-cluster'' lists, and they require specific techniques in a multi-kernel OS.

== F) [wiki:rpc_implementation Remote Procedure Calls] ==

To enforce locality for complex operations requiring a large number of remote memory accesses, the various kernel instances can communicate using RPCs (Remote Procedure Calls), following the client/server model. This section describes the RPC mechanism implemented by ALMOS-MKH.
== G) [wiki:io_operations Input/Output Operations] ==

This section describes the ALMOS-MKH policy regarding I/O operations and access to peripherals.

== H) [wiki:file_system Distributed File System] ==

This section describes the implementation of the ALMOS-MKH Virtual File System, which is the largest trans-cluster distributed kernel structure and uses ''extended pointers'' for remote accesses.

== I) [wiki:boot_procedure Boot procedure] ==

This section describes the ALMOS-MKH boot procedure.

== J) [wiki:scheduler Threads Scheduling] ==

This section describes the ALMOS-MKH policy for thread scheduling.

== K) [wiki:kernel_synchro Kernel level synchronisations] ==

This section describes the synchronisation primitives used by ALMOS-MKH, namely the barriers used during the parallel kernel initialization, and the locks used to protect concurrent accesses to the shared kernel data structures.

== L) [wiki:user_synchro User level synchronisations] ==

This section describes the ALMOS-MKH implementation of the POSIX-compliant user-level synchronisation services: mutex, condvar, barrier and semaphore.