Changes between Version 52 and Version 53 of replication_distribution


Timestamp: Dec 9, 2019, 4:19:53 PM
Author: alain
The data to be placed are the virtual segments defined - at compilation time - in the virtual space of the various user processes currently running, or in the virtual space of the operating system itself.

== __1. General principles__ ==

To actually control the placement of all these virtual segments on the physical memory banks, the kernel uses the paged virtual memory MMU to map a virtual segment to a given physical memory bank in a given cluster.

A '''vseg''' is a contiguous memory zone in the process virtual space, defined by the two (base, size) values. All addresses in this interval can be accessed without segmentation violation: if the corresponding page is not mapped, the page fault will be handled by the kernel, and a physical page will be dynamically allocated (and initialized if required). A '''vseg''' always occupies an integer number of pages, as a given page cannot be shared by two different vsegs.

In all UNIX systems (including almos-mkh), a '''vseg''' has specific attributes defining access rights (readable, writable, executable, cachable, etc).
In almos-mkh, the vseg type also defines the replication and distribution policy:
 * A vseg is '''public''' when it can be accessed by any thread T of the involved process, whatever the cluster running the thread T. It is '''private''' when it can only be accessed by the threads running in the cluster containing the physical memory bank where this vseg is defined and mapped.
 * For a '''public''' vseg, ALMOS-MKH implements a global mapping: in all clusters, a given virtual address is mapped to the same physical address. For a '''private''' vseg, ALMOS-MKH implements a local mapping: the same virtual address can be mapped to different physical addresses, in different clusters.
 * A '''public''' vseg can be '''localized''' (all vseg pages are mapped in the same cluster), or '''distributed''' (different pages are mapped on different clusters). A '''private''' vseg is always '''localized'''.

The '''vseg''' structure and API are defined in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vseg.h almos-mkh/kernel/mm/vseg.h] and [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vseg.c almos-mkh/kernel/mm/vseg.c] files.
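To make the (base, size, type, access rights) description above concrete, here is a simplified sketch of a vseg descriptor. The field and constant names are NOT taken from vseg.h: they are assumptions used only for this illustration.

{{{
#include <stdint.h>

/* Illustrative vseg types (the actual enum lives in kernel/mm/vseg.h) */
typedef enum
{
    VSEG_TYPE_CODE, VSEG_TYPE_DATA, VSEG_TYPE_STACK,
    VSEG_TYPE_ANON, VSEG_TYPE_FILE, VSEG_TYPE_REMOTE,
} vseg_type_t;

/* Illustrative attribute flags: access rights and placement policy */
#define VSEG_EXEC     0x01   /* executable                 */
#define VSEG_WRITE    0x02   /* writable                   */
#define VSEG_CACHE    0x04   /* cachable                   */
#define VSEG_PRIVATE  0x08   /* private (vs public)        */
#define VSEG_DISTRIB  0x10   /* distributed (vs localized) */

typedef struct vseg_s
{
    uintptr_t    min;    /* base virtual address (page aligned)       */
    uintptr_t    max;    /* base + size (page aligned)                */
    vseg_type_t  type;   /* defines replication / distribution policy */
    uint32_t     flags;  /* access rights and placement attributes    */
} vseg_t;

/* A virtual address belongs to the vseg when  min <= vaddr < max */
static inline int vseg_contains( vseg_t * vseg, uintptr_t vaddr )
{
    return (vaddr >= vseg->min) && (vaddr < vseg->max);
}
}}}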

In all UNIX systems, the process descriptor contains the table used by the MMU to make the virtual to physical address translation. An important feature of almos-mkh is the following: to avoid contention, in parallel applications creating a large number of threads in one single process P, almos-mkh replicates the process descriptor in all clusters containing at least one thread of this process. These clusters are called ''active'' clusters.

In almos-mkh, the structure used by the MMU for address translation is called VMM (Virtual Memory Manager).
For a process P in cluster K, the '''VMM(P,K)''' structure contains two main sub-structures:
 * The '''VSL(P,K)''' is the list of virtual segments registered for process P in cluster K,
 * The '''GPT(P,K)''' is the generic page table, defining the actual physical mapping for each page of each vseg.
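As a mental model of the VMM(P,K) described above, here is a simplified sketch; the field names and container types are assumptions, not the actual declarations of vmm.h (the real structure uses kernel list and lock types for the VSL, and a hardware-specific GPT).

{{{
#include <stdint.h>

/* Illustrative GPT entry: one entry per virtual page number (VPN) */
typedef struct gpt_entry_s
{
    uint32_t attr;    /* MAPPED, WRITABLE, ... attributes */
    uint32_t ppn;     /* physical page number             */
} gpt_entry_t;

/* Illustrative per-cluster VMM(P,K) */
typedef struct vmm_s
{
    struct vseg_s * vsl;        /* VSL(P,K): vsegs registered in this cluster */
    unsigned int    vsegs_nr;   /* number of vsegs currently registered       */
    gpt_entry_t   * gpt;        /* GPT(P,K): indexed by VPN                   */
} vmm_t;
}}}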

For a given process P, the different VMM(P,K) in different clusters can have different contents for several reasons :
     
 1. Similarly, the mapping of a given virtual page VPN of a given vseg (i.e. the allocation of a physical page PPN to a virtual page VPN, and the registration of this PPN in the GPT(P,K)) is ''on demand'': the page table entry will be updated in the GPT(P,K) only when a thread of process P in cluster K tries to access this VPN.

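The ''on demand'' policy can be summarized by the following hypothetical page-fault handler skeleton, reusing the illustrative vmm_t / gpt_entry_t types sketched above; ppm_alloc_page() and the GPT_* flags are placeholders, not the actual almos-mkh API.

{{{
#define GPT_MAPPED    0x1
#define GPT_WRITABLE  0x2

extern uint32_t ppm_alloc_page( void );   /* placeholder: allocate one physical page */

/* Hypothetical skeleton: nothing is mapped in GPT(P,K) until a thread of P
 * running in cluster K actually touches the page. */
int vmm_handle_page_fault( vmm_t * vmm, uint32_t vpn )
{
    gpt_entry_t * pte = &vmm->gpt[vpn];

    if( pte->attr & GPT_MAPPED ) return 0;   /* already mapped: spurious fault */

    uint32_t ppn = ppm_alloc_page();         /* allocate a physical page */
    if( ppn == 0 ) return -1;

    /* initialize the page if required (e.g. from the .elf file or the file
     * cache, depending on the vseg type), then register the mapping */
    pte->ppn  = ppn;
    pte->attr = GPT_MAPPED | GPT_WRITABLE;
    return 0;
}
}}}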
We have the following properties for the '''private''' vsegs:
 * The VSL(P,K) always contains all private vsegs in cluster K,
 * The GPT(P,K) contains all mapped entries corresponding to a private vseg in cluster K.

We have the following properties for the '''public''' vsegs:
 * The VSL(P,K) contains only the public vsegs that have been actually accessed by a thread of P running in cluster K.
 * Only the reference cluster KREF contains the complete VSL(P,KREF) of all public vsegs for the P process.
 * The GPT(P,K) contains only the entries that have been accessed by a thread running in cluster K.
 * Only the reference cluster KREF contains the complete GPT(P,KREF) of all mapped entries of public vsegs for the P process.

For the '''public''' vsegs, the VMM(P,K) structures - other than the reference one - can be considered as local caches.
This creates a coherence problem, that is solved by the following rules:
 1. For the '''private''' vsegs, and the corresponding entries in the page table, the VSL(P,K) and the GPT(P,K) are only shared by the threads of P running in cluster K, and these structures can be privately handled by the local kernel instance in cluster K.
 1. When a given public vseg in the VSL, or a given entry in the GPT, must be removed or modified, this modification must be done first in the reference cluster, and broadcast to all other clusters for update of the local VSL or GPT copies.
 1. When a miss is detected in a non-reference cluster, the reference VMM(P,KREF) must be accessed first to check a possible ''false segmentation fault'' or a ''false page fault'' (see the sketch below).

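These rules lead to a two-level handling of misses on '''public''' vsegs, sketched below with purely hypothetical helper names: the reference cluster KREF is always consulted first, because a local miss may be a ''false'' one (the vseg or the page table entry already exists in VMM(P,KREF)).

{{{
#define GPT_MAPPED  0x1

/* Placeholders, not the actual almos-mkh API: read / create a GPT entry,
 * either in a remote (reference) cluster or in the local cluster. */
extern int  gpt_remote_get( unsigned int cxy, uint32_t vpn, uint32_t * attr, uint32_t * ppn );
extern int  gpt_remote_map( unsigned int cxy, uint32_t vpn, uint32_t * attr, uint32_t * ppn );
extern void gpt_local_set ( uint32_t vpn, uint32_t attr, uint32_t ppn );

/* Hypothetical miss handler for a page of a PUBLIC vseg, executed in a
 * non-reference cluster K (kref is the reference cluster identifier). */
int public_page_miss( unsigned int kref, uint32_t vpn )
{
    uint32_t attr;
    uint32_t ppn;

    /* 1) false page fault: the page is already mapped in GPT(P,KREF) */
    if( (gpt_remote_get( kref, vpn, &attr, &ppn ) == 0) && (attr & GPT_MAPPED) )
    {
        gpt_local_set( vpn, attr, ppn );   /* update the local cache GPT(P,K) */
        return 0;
    }

    /* 2) true page fault: the mapping is created in KREF first, then copied locally */
    if( gpt_remote_map( kref, vpn, &attr, &ppn ) ) return -1;
    gpt_local_set( vpn, attr, ppn );
    return 0;
}
}}}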
For more details on the VMM implementation, the API is defined in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.h almos-mkh/kernel/mm/vmm.h] and [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.c almos-mkh/kernel/mm/vmm.c] files.

== __2. User segments__ ==

This section describes the six types of user virtual segments, and the associated replication / distribution policy, defined and implemented by almos-mkh:

=== 2.1 CODE vsegs ===

This '''private''' vseg contains the application code. It is replicated in all clusters. ALMOS-MK creates one CODE vseg per active cluster. For a process P, the CODE vseg is registered in the VSL(P,KREF) when the process is created in reference cluster KREF. In the other clusters K, the CODE vseg is registered in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. In each active cluster K, the CODE vseg is mapped in cluster K.

=== 2.2 DATA vseg ===

This '''public''' vseg contains the user application global data. ALMOS-MK creates one single DATA vseg, that is registered in the reference VSL(P,KREF) when the process P is created in reference cluster KREF. In the other clusters K, the DATA vseg is registered in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. To avoid contention, this vseg is physically '''distributed''' on all clusters, with a page granularity. For each page, the physical mapping is defined by the LSB bits of the VPN.
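As an illustration of this '''distributed''' policy, the sketch below selects the target cluster from the least significant bits of the VPN; the mesh size and the cluster numbering are examples only, as the actual encoding of the cluster identifier depends on the hardware platform.

{{{
#include <stdint.h>

#define X_SIZE  4    /* example mesh width  */
#define Y_SIZE  4    /* example mesh height */

/* Example only: the LSB bits of the VPN select the cluster that will
 * contain the physical page of the DATA vseg. */
static inline uint32_t data_page_cluster( uint32_t vpn )
{
    return vpn % (X_SIZE * Y_SIZE);
}
}}}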

=== 2.3 STACK vseg ===

This '''private''' vseg contains the execution stack of a thread. Almos-mkh creates one STACK vseg for each thread of P running in cluster K. This vseg is registered in the VSL(P,K) when the thread descriptor is created in cluster K. To enforce locality, this vseg is of course mapped in cluster K.

=== 2.4 ANON vseg ===

This '''public''' vseg is dynamically created by ALMOS-MK to serve an anonymous [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in a cluster K. The vseg is registered in VSL(P,KREF), but the vseg is mapped in the client cluster K.

=== 2.5 FILE vseg ===

This '''public''' vseg is dynamically created by ALMOS-MK to serve a file based [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in a cluster K. The vseg is registered in VSL(P,KREF), but the vseg is mapped in cluster Y containing the file cache.

=== 2.6 REMOTE vseg ===

This '''public''' vseg is dynamically created by ALMOS-MK to serve a remote [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call, where a client thread running in a cluster X requests to create a new vseg mapped in another cluster Y. The vseg is registered in VSL(P,KREF), but the vseg is mapped in the cluster Y specified by the user.

=== 2.7 Summary ===

This table summarizes the replication, distribution & mapping rules for user vsegs:

|| Type ||  ||  || Access || Replication || Mapping in physical space || Allocation policy in virtual space ||
     

== __3. Kernel segments__ ==

For any process descriptor P in a cluster K, the VMM(P,K) contains not only the user vsegs defined above, but also the kernel vsegs, because all user threads can make system calls that must access both the kernel instructions and the kernel data structures, and this requires address translation. This section describes the four types of kernel virtual segments defined by almos-mkh.

=== 3.1. KCODE vsegs ===

The KCODE vseg contains the kernel code defined in the ''kernel.elf'' file. Almos-mkh creates one KCODE vseg in each cluster K, to avoid contention. It is a ''private'' vseg, that is accessed only by the threads running in cluster K. It can be a user thread executing a syscall, or it can be a specialized kernel thread (such as an IDLE thread, a DEV thread, or an RPC thread). In each cluster K, the KCODE vseg is registered in the VMM(0,K) associated to the kernel ''process_zero'', that contains all kernel threads, and in the VMM(P,K) of each user process P that has at least one thread running in cluster K. This vseg uses only big pages, that are mapped by the kernel_init function (no on-demand paging for this vseg).

=== 3.2. KDATA vsegs ===

This '''public''' vseg contains the global data, statically allocated at compilation time. This vseg is also replicated in all clusters. The values initially contained in these KDATA vsegs are identical, as they are defined in the ''kernel.elf'' file. But they are not read-only, and can evolve differently in different clusters. As the KDATA vsegs are replicated in all clusters, most accesses to a KDATA segment are expected to be done by local threads. These local accesses can use the normal pointers in virtual kernel space.
     
But there is only one KDATA vseg defined in the ''kernel.elf'' file, and there are as many KDATA segments as the number of clusters. Even if most accesses are local, a thread running in cluster K must be able to access a global variable stored in another cluster X, or to send a request to another kernel instance in cluster X, or to scan a globally distributed structure, such as the DQDT or the VFS. To support this cooperation between kernel instances, almos-mkh defines the ''remote_load( cxy , ptr )'' and ''remote_store( cxy , ptr )'' functions, where ''ptr'' is a normal pointer in kernel virtual space on a variable stored in the KDATA vseg, and ''cxy'' is the remote cluster identifier. Notice that a given global variable is now identified by an extended pointer ''XPTR( cxy , ptr )''. With these remote access primitives, any kernel instance in cluster K can access any global variable in any cluster.
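The example below illustrates the intended use of these primitives. The exact prototypes (access width, argument types, XPTR encoding) are defined by the hardware abstraction layer and should be checked in the almos-mkh sources, so the signatures used here are assumptions.

{{{
#include <stdint.h>

/* Assumed prototypes, for illustration only */
extern uint32_t remote_load ( uint32_t cxy, void * ptr );
extern void     remote_store( uint32_t cxy, void * ptr, uint32_t value );

/* One instance of this counter exists in the KDATA vseg of every cluster */
uint32_t local_counter;

/* Increment the counter located in cluster cxy: the local pointer
 * &local_counter identifies the variable inside the KDATA vseg, and cxy
 * selects which physical copy is accessed (this pair is what the
 * XPTR( cxy , ptr ) extended pointer encodes). */
uint32_t remote_counter_increment( uint32_t cxy )
{
    uint32_t value = remote_load( cxy, &local_counter );
    remote_store( cxy, &local_counter, value + 1 );
    return value + 1;
}
}}}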

=== 3.3. KHEAP vsegs ===

Besides the statically allocated global variables, a large number of kernel structures, such as the user ''process'' descriptors, the ''thread'' descriptors, the ''vseg'' descriptors, the ''file'' descriptors, etc. are dynamically allocated in the local KHEAP vseg.
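As a simple illustration of this locality principle, the sketch below allocates a thread descriptor from the KHEAP of the local cluster; kheap_alloc() and the descriptor fields are invented names used only for this example, not the actual almos-mkh kernel allocator interface.

{{{
#include <stddef.h>

/* Minimal illustrative thread descriptor (the real one has many fields) */
typedef struct thread_s
{
    unsigned int trdid;   /* thread identifier (illustrative field) */
} thread_t;

/* Invented name: allocates memory from the KHEAP of the local cluster */
extern void * kheap_alloc( size_t size );

/* The descriptor is allocated in the cluster that runs the thread,
 * so that later accesses to it remain local. */
thread_t * thread_descriptor_alloc( void )
{
    return (thread_t *)kheap_alloc( sizeof(thread_t) );
}
}}}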

=== 3.4. KDEV vsegs ===

== __4. Physical mapping of kernel vsegs__ ==

The implementation of these remote access functions depends on the target architecture.
     


=== 4.1 TSAR-MIPS32 ===

As the TSAR architecture uses 32-bit cores (to reduce the power consumption), the virtual space is much smaller (4 Gbytes) than the physical space.
     


=== 4.2 Intel 64 bits ===

TODO
     
== __5. Virtual space organisation__ ==

This section describes the almos-mkh assumptions regarding the virtual space organisation, which strongly depends on the size of the virtual space.

=== 5.1 TSAR-MIPS32 ===

The virtual address space of a user process P is split into 5 fixed-size zones, defined by configuration parameters in [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h kernel_config.h]. Each zone contains one or several vsegs, as described below.
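As a reading aid for the subsections below, the sketch classifies a user virtual address into one of the five zones; the numeric base addresses are purely illustrative examples, the real boundaries being given by the CONFIG_VMM_* parameters of kernel_config.h.

{{{
#include <stdint.h>

/* Example values only: the actual zone boundaries come from the
 * CONFIG_VMM_* parameters defined in kernel_config.h */
#define UTILS_BASE  0x00200000   /* top of the kernel zone (one 2 Mbytes big page) */
#define ELF_BASE    0x00400000   /* example value                                  */
#define HEAP_BASE   0x02000000   /* example value (CONFIG_VMM_HEAP_BASE)           */
#define STACK_BASE  0xC0000000   /* example value (CONFIG_VMM_STACK_BASE)          */

typedef enum { ZONE_KERNEL, ZONE_UTILS, ZONE_ELF, ZONE_HEAP, ZONE_STACK } zone_t;

static zone_t zone_of( uint32_t vaddr )
{
    if( vaddr < UTILS_BASE ) return ZONE_KERNEL;
    if( vaddr < ELF_BASE   ) return ZONE_UTILS;
    if( vaddr < HEAP_BASE  ) return ZONE_ELF;
    if( vaddr < STACK_BASE ) return ZONE_HEAP;
    return ZONE_STACK;
}
}}}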

'''5.1.1 The ''kernel'' zone'''

It contains the ''kcode'' vseg (type KCODE), that must be mapped in all user processes.
It is located in the lower part of the virtual space, and starts at address 0. Its size cannot be less than a big page size (2 Mbytes for the TSAR architecture), because it will be mapped as one (or several) big pages.

'''5.1.2 The ''utils'' zone'''

It contains the two ''args'' and ''envs'' vsegs, whose sizes are defined by [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h specific configuration parameters]. The ''args'' vseg (DATA type) contains the process main() arguments.
     
It is located on top of the '''kernel''' zone, and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h  CONFIG_VMM_ELF_BASE] parameter.

'''5.1.3 The ''elf'' zone'''

It contains the ''text'' (CODE type) and ''data'' (DATA type) vsegs, defining the process binary code and global data. The actual vseg base addresses and sizes are defined in the .elf file, and reported in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/tools/arch_info/boot_info.h boot_info_t] structure by the boot loader.

'''5.1.4 The ''heap'' zone'''

It contains all vsegs dynamically allocated / released by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] / [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_munmap.c munmap] system calls (i.e. FILE / ANON / REMOTE types).
It is located on top of the '''elf''' zone, and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_HEAP_BASE] parameter. The VMM defines a specific MMAP allocator for this zone, implementing the ''buddy'' algorithm. The mmap( FILE ) syscall directly maps a file in user space. The user-level ''malloc'' library uses the mmap( ANON ) syscall to allocate virtual memory from the heap and map it in the same cluster as the calling thread. Besides the standard malloc() function, this library implements a non-standard [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/libs/libalmosmkh/almosmkh.c remote_malloc()] function, that uses the mmap( REMOTE ) syscall to dynamically allocate virtual memory from the heap, and map it to a remote physical cluster.
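A short user-level usage sketch follows; the remote_malloc() prototype shown here (size, then target cluster identifier) is an assumption and should be checked against the libalmosmkh sources.

{{{
#include <stdlib.h>

/* Assumed prototype of the non-standard allocator (to be checked in libalmosmkh) */
extern void * remote_malloc( size_t size, unsigned int cxy );

void buffers_example( unsigned int remote_cxy )
{
    /* mapped in the heap zone, in the same cluster as the calling thread */
    int * local_buf  = malloc( 1024 * sizeof(int) );

    /* mapped in the heap zone, but in the remote cluster remote_cxy */
    int * remote_buf = remote_malloc( 1024 * sizeof(int), remote_cxy );

    local_buf[0]  = 1;    /* local access                                  */
    remote_buf[0] = 2;    /* same user pointer usage, remote physical page */

    free( local_buf );
    /* releasing the remote buffer depends on the library API (not shown) */
}
}}}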

'''5.1.5 The ''stack'' zone'''

It is located on top of the '''heap''' zone, and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_STACK_BASE] parameter. It contains an array of fixed-size slots, and each slot contains one ''stack'' vseg. The size of a slot is defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_STACK_SIZE] parameter. In each slot, the first page is not mapped, in order to detect stack overflows. As threads are dynamically created and destroyed, the VMM implements a specific STACK allocator for this zone, using a bitmap vector. As the ''stack'' vsegs are private (the same virtual address can have different mappings, depending on the cluster), the number of slots in the '''stack''' zone actually defines the max number of threads for a given process in a given cluster.
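The STACK allocator described above can be sketched as follows (one bit per fixed-size slot); the constants and function names are illustrative, CONFIG_VMM_STACK_BASE and CONFIG_VMM_STACK_SIZE being the real parameters.

{{{
#include <stdint.h>

#define STACK_BASE   0xC0000000   /* example value of CONFIG_VMM_STACK_BASE  */
#define STACK_SIZE   0x00100000   /* example value of CONFIG_VMM_STACK_SIZE  */
#define STACK_SLOTS  32           /* max threads of a process in one cluster */

static uint32_t stack_bitmap;     /* bit i set <=> slot i is allocated       */

/* Returns the base address of a free stack slot, or 0 if none is available.
 * The first page of the slot is left unmapped by the VMM to catch overflows. */
uint32_t stack_slot_alloc( void )
{
    for( uint32_t i = 0 ; i < STACK_SLOTS ; i++ )
    {
        if( (stack_bitmap & (1U << i)) == 0 )
        {
            stack_bitmap |= (1U << i);
            return STACK_BASE + (i * STACK_SIZE);
        }
    }
    return 0;
}

/* Releases the slot containing the given stack base address */
void stack_slot_free( uint32_t base )
{
    uint32_t i = (base - STACK_BASE) / STACK_SIZE;
    stack_bitmap &= ~(1U << i);
}
}}}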