ALMOS-MKH Specification

This document describes the general principles of ALMOS-MKH, an operating system targeting manycore architectures with a CC-NUMA (Cache-Coherent, Non-Uniform Memory Access) shared address space, such as the TSAR architecture, which can support up to 1024 32-bit MIPS cores. ALMOS-MKH also targets Intel-based multi-core servers using 64-bit x86 cores.

The targeted architectures are assumed to be clustered, with one or several cores and one physical memory bank per cluster. These architectures are expected to support POSIX-compliant multi-threaded parallel applications.

ALMOS-MKH is derived from the ALMOS system, developed by Ghassan Almaless. The general principles of the ALMOS system are described in his thesis.

A first version of ALMOS-MKH, including the distributed file system and the RPC communication mechanism, was developed by Mohamed Karaoui. The general principles of the proposed multi-kernel approach are described in his thesis. This system was called ALMOS-MK (without the H).

ALMOS-MKH is based on the "Multi-Kernel" approach to ensure scalability and to support the distribution of system services. In this approach, each cluster of the architecture contains an instance of the kernel. Each instance controls the local resources (memory and computing cores). These multiple instances cooperate with each other to give applications the image of a single system controlling all resources. They communicate with each other using both (i) the client/server model, sending a remote procedure call (RPC) to a remote cluster for a complex service, and (ii) the shared memory paradigm, making direct read/write accesses to remote memory when required.

The main ALMOS-MKH specific feature is the following: the physical address space is distributed between the clusters, and the MSB bits of a physical address define the target cluster. In a physical address, the LSB bits contain the lpa (Local Physical Address), and the MSB bits define the cxy (Cluster Identifier). The physical address space can therefore be described as a two-level array M[cxy][lpa]. To enforce locality, a kernel instance can use a normal pointer ptr to access the local physical memory. But, to access the physical memory of a remote cluster cxy, it must use the specific remote access functions remote_read(cxy,ptr) and remote_write(cxy,ptr), where ptr is the local pointer in the target cluster. This (cxy,ptr) couple is called an extended pointer.
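
As an illustration, the minimal C sketch below shows one possible encoding of such an extended pointer, packing the cxy identifier in the MSB bits and the local pointer in the LSB bits, together with a small helper built on the remote_read / remote_write primitives mentioned above. The XPTR / GET_CXY / GET_PTR macros and the exact function signatures are illustrative assumptions, not the actual ALMOS-MKH API.

{{{
#!c
#include <stdint.h>

/* Hypothetical encoding of an extended pointer: the cluster identifier (cxy)
 * in the MSB bits, and the local pointer (ptr) in the LSB bits.             */
typedef uint64_t xptr_t;

#define XPTR(cxy, ptr)   ((((xptr_t)(cxy)) << 32) | (uint32_t)(intptr_t)(ptr))
#define GET_CXY(xp)      ((uint32_t)((xp) >> 32))
#define GET_PTR(xp)      ((void *)(intptr_t)(uint32_t)(xp))

/* Remote access primitives mentioned in the text: the (cxy, ptr) couple
 * identifies one 32-bit word in the physical memory of cluster cxy.
 * Signatures are illustrative.                                              */
uint32_t remote_read (uint32_t cxy, uint32_t *ptr);
void     remote_write(uint32_t cxy, uint32_t *ptr, uint32_t value);

/* Example: increment a counter located in a (possibly remote) cluster,
 * identified by an extended pointer.                                        */
void remote_increment(xptr_t counter_xp)
{
    uint32_t  cxy = GET_CXY(counter_xp);
    uint32_t *ptr = (uint32_t *)GET_PTR(counter_xp);
    remote_write(cxy, ptr, remote_read(cxy, ptr) + 1);
}
}}}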

As mentioned above, ALMOS-MKH supports both architectures using 64-bit cores and architectures using 32-bit cores:

On a typical Intel-based hardware platform containing 64-bit cores, the physical address has 44 bits: the 4 MSB bits define the target cluster identifier cxy, and the 40 LSB bits define the local physical address lpa. To avoid contention, the kernel code is replicated in all clusters, defining in each cluster one KCODE physical segment, and ALMOS-MKH uses the Instruction MMU to map, in each cluster, the local kernel code copy in the kernel virtual space. Regarding data accesses, each cluster contains one KDATA and one KHEAP physical segment (the KHEAP physical segment contains all local physical memory not occupied by KCODE and KDATA). As the 48-bit virtual address space is large enough to map all these distributed KDATA[cxy] and KHEAP[cxy] segments, they can all be mapped in the kernel virtual space, and the Data MMU is used to translate both local and remote data accesses.
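
The split between the cxy and lpa fields described above can be summarized by the following sketch; only the 4 + 40 bit field widths come from the text, the macro and function names are illustrative.

{{{
#!c
#include <stdint.h>

#define PADDR_WIDTH   44                         /* total physical address bits  */
#define LPA_WIDTH     40                         /* local physical address bits  */
#define LPA_MASK      ((1ULL << LPA_WIDTH) - 1)  /* mask selecting the lpa field */

/* Split a 44-bit physical address into (cxy, lpa).                             */
static inline uint32_t paddr_to_cxy(uint64_t paddr) { return (uint32_t)(paddr >> LPA_WIDTH); }
static inline uint64_t paddr_to_lpa(uint64_t paddr) { return paddr & LPA_MASK; }

/* Rebuild a physical address from a cluster identifier and a local address.    */
static inline uint64_t make_paddr(uint32_t cxy, uint64_t lpa)
{
    return ((uint64_t)cxy << LPA_WIDTH) | (lpa & LPA_MASK);
}
}}}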

On the TSAR hardware platforms containing 32-bit cores, the physical address has 40 bits: the 8 MSB bits define the target cluster identifier cxy, and the 32 LSB bits define the local physical address lpa. On these architectures, the virtual address has 32 bits, and this virtual space is too small to map all the distributed KDATA[cxy] and KHEAP[cxy] physical segments. The ALMOS-MKH kernel therefore runs partially in physical addressing: the kernel code is still replicated in all clusters, and the Instruction MMU is used to map the local kernel code copy in the kernel virtual space. But, for data accesses, the Data MMU is deactivated as soon as a core enters the kernel, and it is reactivated when the core returns to user mode. ALMOS-MKH uses a software-controllable (TSAR-specific) extension register containing a cxy value. When the Data MMU is deactivated, this cxy cluster identifier is concatenated to the 32-bit pointer ptr to build (in pseudo identity mapping) a 40-bit physical address. The default value of this extension register is the local cluster identifier, and is used to access the local memory. To access memory in a remote cluster, the remote_read and remote_write primitives modify the extension register before the remote memory access, and restore it afterwards.
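
The sketch below illustrates how a remote read could be built on top of this extension register, assuming two hypothetical HAL accessors (hal_get_data_ext / hal_set_data_ext) to read and write the TSAR-specific register; it is a simplified illustration of the mechanism, not the actual ALMOS-MKH code.

{{{
#!c
#include <stdint.h>

/* Hypothetical HAL accessors for the TSAR data address extension register.
 * The real register name and access instructions are TSAR specific.          */
extern uint32_t hal_get_data_ext(void);
extern void     hal_set_data_ext(uint32_t cxy);

/* Sketch of a remote read on TSAR: the Data MMU is off inside the kernel,
 * so the 32-bit pointer is concatenated with the extension register value
 * to build the 40-bit physical address.                                      */
uint32_t remote_read(uint32_t cxy, uint32_t *ptr)
{
    uint32_t save = hal_get_data_ext();   /* save the current (local) value     */
    hal_set_data_ext(cxy);                /* point data accesses to cluster cxy */
    uint32_t value = *ptr;                /* this load targets remote memory    */
    hal_set_data_ext(save);               /* restore the default (local) value  */
    return value;
}
}}}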

In both cases, communications between kernel instances are therefore implemented by a mix of RPCs (following the client/server model) and direct accesses to remote memory (when this is useful for performance). This hybrid approach is the main originality of ALMOS-MKH.

A) Hardware Platform Definition

This section describes the general assumptions made by ALMOS-MKH regarding the hardware architecture, and the mechanism to configure ALMOS-MKH for a given target architecture.

B) Process & threads creation/destruction

ALMOS-MKH supports the POSIX threads API. In order to avoid contention in massively multi-threaded applications, ALMOS-MKH replicates the user process descriptors in all clusters containing at least one thread of a given process. This section describes the mechanisms for process and thread creation / destruction.

C) Data replication & distribution policy

This section describes the general ALMOS-MKH policy for replication/distribution of information on the various physical memory banks. We have two main goals: enforcing memory access locality, and avoiding contention when several threads simultaneously access the same information. To control the placement and replication of data on the physical memory banks, the kernel uses the paged virtual memory.

D) GPT & VSL implementation

To avoid contention when several threads access the same page table to handle TLB misses, ALMOS-MKH replicates the process descriptors: for each multi-threaded process P, the Generic Page Table (GPT) and the Virtual Segments List (VSL) are replicated in each cluster K containing at least one thread of this process. According to the "on-demand paging" principle, these replicated structures GPT(K,P) and VSL(K,P) are dynamically updated when page faults are detected. This section describes this building mechanism and the coherence protocol required by these multiple copies.
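
As a purely illustrative sketch, the on-demand update of the local copies could follow the structure below when a core of cluster K takes a page fault; all types and helpers (vsl_lookup, vsl_fetch, gpt_map) are hypothetical placeholders, and the actual update and coherence protocol is described in this section.

{{{
#!c
/* Sketch of the on-demand update of the local GPT and VSL copies in cluster K,
 * for process P, when a core takes a page fault on virtual address vaddr.      */

typedef struct vseg_s vseg_t;   /* one virtual segment descriptor               */
typedef struct gpt_s  gpt_t;    /* one Generic Page Table copy                  */
typedef struct vsl_s  vsl_t;    /* one Virtual Segments List copy               */

vseg_t * vsl_lookup(vsl_t *vsl, void *vaddr);             /* search the local VSL copy          */
vseg_t * vsl_fetch (vsl_t *vsl, void *vaddr);             /* get the descriptor from another
                                                             copy and register it locally       */
int      gpt_map   (gpt_t *gpt, void *vaddr, vseg_t *v);  /* allocate and map one page locally  */

int page_fault_handler(gpt_t *local_gpt, vsl_t *local_vsl, void *vaddr)
{
    /* 1. find the virtual segment in the local VSL copy,
     *    fetching the descriptor from another copy if it is missing           */
    vseg_t *vseg = vsl_lookup(local_vsl, vaddr);
    if (vseg == NULL) vseg = vsl_fetch(local_vsl, vaddr);
    if (vseg == NULL) return -1;                 /* illegal access              */

    /* 2. map the faulting page in the local GPT copy                          */
    return gpt_map(local_gpt, vaddr, vseg);
}
}}}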

E) Trans-cluster lists

ALMOS-MKH must handle dynamic sets of objects, such as the set of all threads waiting to access a given peripheral device. These sets of threads are implemented as circular doubly linked lists. As these threads can be running in any cluster, these linked lists are trans-cluster lists, and require specific techniques in a multi-kernel OS.
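
As an illustration, a node of such a trans-cluster list can simply store its two links as extended pointers, so that the linked objects can live in different clusters. The sketch below assumes the xptr_t encoding shown earlier; the type and field names are illustrative.

{{{
#!c
#include <stdint.h>

typedef uint64_t xptr_t;   /* extended pointer: (cxy, local pointer), see above */

/* One node of a trans-cluster circular doubly linked list: both links are
 * extended pointers, so the predecessor and successor can be located in any
 * cluster. The node is embedded in the object it links (thread, page, ...).   */
typedef struct xlist_entry_s
{
    xptr_t next;   /* extended pointer on the next entry                        */
    xptr_t pred;   /* extended pointer on the previous entry                    */
}
xlist_entry_t;
}}}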

F) Remote Procedure Calls

To enforce locality for complex operations requiring a large number of remote memory accesses, the various kernel instances can communicate using RPCs (Remote Procedure Calls), following the client/server model. This section describes the RPC mechanism implemented by ALMOS-MKH.
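
The sketch below gives a rough idea of the client side of such an RPC, under the assumption of a per-cluster RPC FIFO in which the client posts an extended pointer on a request descriptor allocated in its own cluster; all structure and function names are illustrative assumptions, not the actual ALMOS-MKH interface.

{{{
#!c
#include <stdint.h>

typedef uint64_t xptr_t;

#define XPTR(cxy, ptr)  ((((xptr_t)(cxy)) << 32) | (uint32_t)(intptr_t)(ptr))

extern uint32_t local_cxy;    /* identifier of the local cluster (assumed)      */

/* Hypothetical RPC request descriptor, built in the client cluster memory.     */
typedef struct rpc_desc_s
{
    uint32_t          index;      /* identifies the requested service           */
    volatile uint32_t responses;  /* cleared by the server when the job is done */
    uint64_t          args[8];    /* input / output arguments                   */
}
rpc_desc_t;

/* Hypothetical primitive posting an extended pointer into the server FIFO.     */
void rpc_fifo_post(uint32_t server_cxy, xptr_t desc_xp);

/* Client side: post the request to the server cluster and wait for completion. */
void rpc_send(uint32_t server_cxy, rpc_desc_t *desc)
{
    desc->responses = 1;
    rpc_fifo_post(server_cxy, XPTR(local_cxy, desc));
    while (desc->responses) { /* the client blocks or polls until the server answers */ }
}
}}}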

G) Input/Output Operations

This section describes the ALMOS-MKH policy regarding I/O operations and access to peripherals.

H) Distributed File System

This section describes the implementation of the ALMOS-MKH Virtual File System, which is the largest trans-cluster, distributed kernel structure, and uses the extended pointer mechanism for remote accesses.

I) Boot procedure

This section describes the ALMOS-MKH boot procedure.

J) Threads Scheduling

This section describes the ALMOS-MKH threads scheduling policy.

K) Kernel level synchronisations

This section describes the synchronisation primitives used by ALMOS-MKH, namely the barriers used during the parallel kernel initialization, and the locks used to protect concurrent accesses to the shared kernel data structures.

L) User level synchronisations

This section describes the ALMOS-MKH implementation of the POSIX-compliant, user-level synchronisation services: mutex, condvar, barrier and semaphore.
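
As a small user-level example, the program below uses two of these services (mutex and barrier) through the standard POSIX threads API, as a multi-threaded application running on top of ALMOS-MKH would.

{{{
#!c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_mutex_t   lock;
static pthread_barrier_t barrier;
static int               counter;

static void * worker(void *arg)
{
    pthread_mutex_lock(&lock);        /* mutual exclusion on the shared counter   */
    counter++;
    pthread_mutex_unlock(&lock);

    pthread_barrier_wait(&barrier);   /* wait until all threads have contributed  */
    return NULL;
}

int main(void)
{
    pthread_t th[NTHREADS];

    pthread_mutex_init(&lock, NULL);
    pthread_barrier_init(&barrier, NULL, NTHREADS);

    for (int i = 0; i < NTHREADS; i++) pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++) pthread_join(th[i], NULL);

    printf("counter = %d\n", counter);
    return 0;
}
}}}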