Changes between Version 56 and Version 57 of replication_distribution


Timestamp: Dec 20, 2019, 12:15:15 AM
Author: alain
Comment: --

  • replication_distribution

    v56 v57  
    1111== __1. General principles__ ==
    1212
    13 To actually control the placement of all these virtual segments on the physical memory banks, the kernel uses the paged virtual memory MMU to map a virtual segment to a given physical memory bank in a given cluster.
     13To actually control the placement of all these segments on the physical memory banks, the kernel uses the paged virtual memory MMU to map a virtual segment to a given physical memory bank in a given cluster.
    1414
    1515A '''vseg''' is a contiguous memory zone in the process virtual space, defined by a (base, size) pair of values. All addresses in this interval can be accessed without a segmentation violation: if the corresponding page is not mapped, the page fault is handled by the kernel, and a physical page is dynamically allocated (and initialized if required). A '''vseg''' always occupies an integer number of pages, as a given page cannot be shared by two different vsegs.
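As an illustration, a vseg descriptor could be sketched as below. This is only a sketch: the actual structure is defined in the almos-mkh sources, and the type and field names used here are assumptions.

{{{
/* Minimal sketch of a vseg descriptor, for illustration only: the real
 * structure lives in the almos-mkh sources, and the names below are
 * assumptions. */
#include <stdint.h>

typedef enum { VSEG_CODE, VSEG_DATA, VSEG_STACK,
               VSEG_ANON, VSEG_FILE, VSEG_REMOTE } vseg_type_t;

typedef struct vseg_s
{
    uint32_t     base;    /* virtual base address (page aligned)       */
    uint32_t     size;    /* zone size (integer number of pages)       */
    vseg_type_t  type;    /* defines replication / distribution policy */
    uint32_t     flags;   /* readable / writable / executable / ...    */
} vseg_t;
}}}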
    1616
    17 In all UNIX system (including almos-mkh), a '''vseg''' has some specific attributes defining access rights (readable, writable, executable, catchable, etc).
     17In all UNIX systems (including almos-mkh), a '''vseg''' has some specific attributes defining access rights (readable, writable, executable, cacheable, etc.).
    1818But in almos-mkh, the vseg type also defines the replication and distribution policy:
    1919 * A vseg is '''public''' when it can be accessed by any thread T of the involved process, whatever cluster is running the thread T. It is '''private''' when it can only be accessed by threads running in the cluster containing the physical memory bank where this vseg is defined and mapped.
     
    5353For more details on the VMM implementation, see the API defined in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.h almos_mkh/kernel/mm/vmm.h] and [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.c almos-mkh/kernel/mm/vmm.c] files.
    5454
    55 == __2. User vsegs__  ==
    56 
    57 This section describes the six types of user virtual segments and the associated replication / distribution policy defined and implemented by almost-mkh:
    58 
    59 === 2.1 CODE vsegs ===
    60 
    61 This '''private''' vseg contains the application code. It is replicated in all clusters. ALMOS-MK creates one CODE vseg per active cluster. For a process P, the CODE vseg is registered in the VSL(P,KREF) when the process is created in reference cluster KREF. In the other clusters K, the CODE vseg is registered in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. In each active cluster K, the CODE vseg is mapped in cluster K.
    62 
    63 === 2.2 DATA vseg ===
     55== __2. User segments__  ==
     56
     57This section describes the six types of user segments and the associated replication / distribution policies defined and implemented by almos-mkh:
     58
     59=== 2.1 CODE ===
     60
     61This '''private''' segment contains the application code. It is replicated in all clusters. ALMOS-MK creates one CODE vseg per active cluster. For a process P, the CODE vseg is registered in the VSL(P,KREF) when the process is created in reference cluster KREF. In the other clusters K, the CODE vseg is registered in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. In each active cluster K, the CODE vseg is mapped in cluster K.
     62
     63=== 2.2 DATA ===
    6464 
    65 This '''public''' vseg contains the user application global data. ALMOS-MK creates one single DATA vseg, that is registered in the reference VSL(P,KREF) when the process P is created in reference cluster KREF.  In the other clusters K, the DATA vseg is registered  in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. To avoid contention, this vseg is physically '''distributed''' on all clusters, with a page granularity. For each page, the physical mapping is defined by the LSB bits of the VPN.
    66 
    67 === 2.3 STACK vseg ===
    68 
    69 This '''private''' vseg contains the execution stack of a thread. Almos-mkh creates one STACK vseg for each thread of P running in cluster K. This vseg is registered in the VSL(P,K) when the thread descriptor is created in cluster K. To enforce locality, this vseg is of course mapped in cluster K.
    70 
    71 === 2.4 ANON vseg ===
    72 
    73 This '''public''' vseg is dynamically created by ALMOS-MK to serve an anonymous [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in a cluster K. The vseg is registered in VSL(P,KREF), but the vseg is mapped in the client cluster K.
    74 
    75 === 2.5 FILE vseg ===
    76 
    77 This '''public''' vseg is dynamically created by ALMOS-MK to serve a file based [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in a cluster K. The vseg is registered in VSL(P,KREF), but the vseg is mapped in cluster Y containing the file cache.
    78 
    79 === 2.6 REMOTE vseg ===
    80 
    81 This '''public''' vseg is dynamically created by ALMOS-MK to serve a remote [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call, where a client thread running in a cluster X requests to create a new vseg mapped in another cluster Y. The vseg is registered in VSL(P,KREF), but the vseg is mapped in cluster Y specified by the user.
     65This '''public''' segment contains the user application global data. ALMOS-MK creates a single DATA vseg, which is registered in the reference VSL(P,KREF) when the process P is created in reference cluster KREF. In the other clusters K, the DATA vseg is registered in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. To avoid contention, this vseg is physically '''distributed''' on all clusters, with '''page''' granularity: two contiguous pages are generally stored in two different clusters, as the physical mapping is defined by the LSBs of the VPN.
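As an illustration of this distribution, the target cluster of a page can be computed from the VPN LSBs as sketched below; the function name and the number of clusters are assumptions, not the actual almos-mkh code.

{{{
/* Illustrative sketch only: a distributed vseg maps page VPN to the
 * cluster selected by the VPN least significant bits. CLUSTERS_NR and
 * vpn_to_cxy() are hypothetical names, assuming a power-of-two number
 * of clusters. */
typedef unsigned int vpn_t;    /* virtual page number  */
typedef unsigned int cxy_t;    /* cluster identifier   */

#define CLUSTERS_NR  256       /* assumed power of two */

static cxy_t vpn_to_cxy( vpn_t vpn )
{
    return (cxy_t)( vpn & (CLUSTERS_NR - 1) );   /* LSB selection */
}
}}}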
     66
     67=== 2.3 STACK ===
     68
     69This '''private''' segment contains the execution stack of a thread. Almos-mkh creates one STACK vseg for each thread of P running in cluster K. This vseg is registered in the VSL(P,K) when the thread descriptor is dynamically created in cluster K. To enforce locality, this vseg is of course physically mapped in cluster K.
     70
     71=== 2.4 ANON ===
     72
     73This '''public''' segment is dynamically created by ALMOS-MK to serve an anonymous [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in cluster K. The vseg is registered in VSL(P,KREF), but it is mapped in the client cluster K.
     74
     75=== 2.5 FILE ===
     76
     77This '''public''' segment is dynamically created by ALMOS-MK to serve a file-based [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in cluster K. The vseg is registered in VSL(P,KREF), but it is mapped in the cluster Y containing the file cache.
     78
     79=== 2.6 REMOTE ===
     80
     81This '''public''' segment is dynamically created by ALMOS-MK to serve a remote [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call, where a client thread running in cluster X requests to create a new vseg mapped in another cluster Y. The vseg is registered in VSL(P,KREF), but it is mapped in the cluster Y specified by the user.
    8282
    8383=== 2.7 Summary ===
    8484
    85 This table summarize the replication, distribution & mapping rules for user vsegs:
     85This table summarizes the replication, distribution & mapping rules for user segments:
    8686
    8787|| Type        ||              ||                  ||    Access     ||    Replication     ||  Mapping in physical space       ||  Allocation policy in virtual  space              ||
     
    9494
    9595
    96 == __ 3. kernel vsegs__==
    97 
    98 For any process descriptor P in a cluster K, the VMM(P,K) contains not only the user vsegs defined above, but also the kernel vsegs, because all user theads can make system calls, that must access both the kernel instructions and the kernel data structures, and this requires address translation. This section describes the four types of kernel virtual segments defined by almost-mkh.
    99 
    100 === 3.1. KCODE vsegs ===
    101 
    102 A '''KCODE''' vseg contains the kernel code defined in the ''kernel.elf'' file.  To avoid contention and improve locality, almos-mkh replicates this code in all clusters. This code has already been copied in all clusters by the bootloader. In each cluster K, and for all process P in cluster K (including the kernel process_zero), almos-mkh registers the KCODE vseg in all VSL(P,K), and map it to the local copy in all the GPT(P,K). This vseg uses only big pages, and there is no on-demand paging for this type of vseg. With this local mapping all access to the virtual instruction addresses will be simply translated by the MMU to the local physical address.
    103 
    104 '''WARNING''' : there is only one vseg defined in the ''kernel.elf'' file, but there is as many KCODE vsegs as the number of clusters. All these vsegs have the same virtual base address and the same size. but the physical adresses (defined in the GPTs) depend on the cluster, because we want to access the local copy.
     96== __3. Kernel segments__ ==
     97
     98A user thread makes system calls to access protected resources. Therefore, the VMM(P,K) of a process descriptor P in cluster K must contain not only the user segments defined above, but also the kernel segments, so that user threads can access - after a syscall - the kernel code and the kernel data structures : both the user segment virtual addresses and the kernel segment virtual addresses must be translated.
     99
     100Almos-mkh defines three types of kernel segments, described below. To avoid contention and improve locality, these segments are replicated or distributed in each cluster.
     101
     102=== 3.1. KCODE ===
     103
     104The '''KCODE''' segment contains the kernel code defined in the ''kernel.elf'' file. To avoid contention and improve locality, almos-mkh '''replicates''' this code in all clusters: it has already been copied in all clusters by the bootloader. In each cluster K, and for each process P in cluster K (including the kernel process_zero), almos-mkh registers the KCODE vseg in all VSL(P,K), and maps it to the local copy in all GPT(P,K). This vseg uses only big pages, and there is no on-demand paging for this type of vseg. With this local mapping, all accesses to virtual instruction addresses are simply translated by the MMU to local physical addresses.
     105
     106'''WARNING''' : there is only one segment defined in the ''kernel.elf'' file, but there are as many KCODE vsegs as there are clusters. All these vsegs have the same virtual base address and the same size, but the physical addresses (defined in the GPTs) depend on the cluster, because we want to access the local copy.
    105107This is not a problem, because a KCODE vseg is a ''private'' vseg that is accessed only by local threads.
    106108
    107 === 3.2. KDATA vsegs ===
    108 
    109 A '''KDATA''' vseg contains the kernel global data,  statically allocated at compilation time, and defined in the ''kernel.elf'' file. To avoid contention and improve locality, almos-mkh replicates the KDATA vseg in all clusters. The corresponding data have already been copied in all clusters.  As a physical copy of the KDATA vseg is available in any cluster K, almos-mkh can register this vseg in all VSL(P,K), and map it to this local copy in all GPT(P,K). With this local mapping we expect that most accesses to any KDATA segment will be done by a local thread.
    110 
    111 
    112 '''WARNING''' : there is only one vseg defined in the ''kernel.elf'' file, and there is as many KDATA vsegs as the number of clusters. All these vsegs have the same virtual base address and the same size, but the physical addresses (defined in the  GPTs), depends on the cluster, because we generally want to access the local copy.
    113 This is a problem, because there is two  big differences between the KCODE and the KDATA vsegs :
    114 1. The values contained in the N KDATA vsegs are initially identical, as they are all defined by the same ''kernel.elf'' file. But they are not read-only,  and can evolve differently in different clusters.
    115 1. The N KDATA vsegs are ''public'', and define an addressable storage space N times larger than one single KDATA vseg.  Even if most accesses are local, a thread running in cluster K must be able to access a global variable stored in another cluster X, or to send a request to another kernel instance in cluster X, or to scan a globally distributed structure, such as the DQDT or the VFS.
    116 To support this inter-cluster kernel-to-kernel communication, almos-mkh defines the ''hal_remote_load( cxy , ptr )'' and ''hal_remote_store( cxy , ptr ) functions, where ''ptr'' is a normal pointer (in kernel virtual space) on a variable stored in the KDATA vseg, and ''cxy'' is the remote cluster identifier. Notice that a given global variable is now identified by and extended pointer ''XPTR( cry , ptr )''. With these remote access primitives, any kernel instance in cluster K can access any global variable in any cluster. Notice that local accesses can use the normal pointers in virtual kernel space, as the virtual adresses will be simply translated by the MMU to the local physical address.
    117 
    118 In other words, almost-mkh clearly distinguish the ''local accesses'', that can use standard pointers, from the ''remote access'' that '''must''' extended pointers.
    119 This can be seen as a bad constraint, but it can also help to imp to improve the locality, and to identify (and remove) the contention.
     109=== 3.2. KDATA ===
     110
     111The '''KDATA''' segment contains the kernel global data, statically allocated at compilation time, and defined in the ''kernel.elf'' file. To avoid contention and improve locality, almos-mkh defines a KDATA vseg in each cluster. The corresponding data have already been copied by the bootloader in all clusters.
     112
     113'''WARNING''' : there is only one segment defined in the ''kernel.elf'' file, but there are as many KDATA vsegs as there are clusters. All these vsegs have the same virtual base address and the same size, but the physical addresses (defined in the GPTs) depend on the cluster, because we generally want to access the local copy.
     114This seems very similar to the KCODE replication, but there are two big differences between the KCODE and KDATA segments :
     1151. The values contained in the N KDATA vsegs are initially identical, as they are all defined by the same ''kernel.elf'' file. But they are not read-only,  and will evolve differently in different clusters.
     1161. The N KDATA vsegs are ''public'', and can be accessed by any instance of the kernel running in any cluster.  Even if most accesses are local, a thread running in cluster K must be able to access a global variable stored in another cluster X, or to send a request to another kernel instance in cluster X, or to scan a globally distributed structure, such as the DQDT or the VFS.
     117
     118To allow any thread running in any cluster to access the N KDATA vsegs, almos-mkh registers these N vsegs in all VSL(P,K), and maps them in all GPT(P,K).
     119
     120=== 3.3. KHEAP ===
     121
     122The '''KHEAP''' segment contains, in each cluster K, the kernel structures dynamically allocated by the kernel in cluster K to satisfy user requests (such as the ''process'' descriptors, the ''thread'' descriptors, the ''vseg'' descriptors, the ''file'' descriptors, etc.). To avoid contention and improve locality, almos-mkh defines one KHEAP segment in each cluster, implementing a ''physically distributed kernel heap''. In each cluster, this KHEAP segment actually contains all the physical memory that is not already allocated to store the KCODE and KDATA segments.
     123
     124'''WARNING''' : most of these structures are allocated in cluster K by a thread running in cluster K, and are mostly accessed by threads running in cluster K. But these structures are global variables : they can be created in any cluster by a thread running in any other cluster, and can be accessed by any thread executing kernel code in any cluster.
     125
     126To allow any thread running in any cluster to access the N KHEAP vsegs, almos-mkh registers these N vsegs in all VSL(P,K), and maps them in all GPT(P,K).
     127
     128=== 3.4 KSTACK ===
     129
     130Any thread entering the kernel to execute a system call needs a kernel stack, which must be located in protected memory.
     131This requires as many kernel stacks as the total number of threads (user threads + dedicated kernel threads) in the system.
     132For each thread, almos-mkh implements the kernel stack in the thread descriptor, which is dynamically allocated in the KHEAP segment when the thread is created.
     133Therefore, there is no specific KSTACK segment for the kernel stacks.
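A minimal sketch of this embedding is given below, assuming a hypothetical kheap_alloc() allocator and an arbitrary block size; the actual thread descriptor layout is defined in the almos-mkh sources.

{{{
/* Illustrative sketch only: kheap_alloc() and the sizes below are
 * hypothetical, not the actual almos-mkh API. The thread descriptor and
 * its kernel stack share a single block allocated from the local KHEAP. */
#include <stdint.h>

#define THREAD_BLOCK_SIZE  0x4000        /* assumed: descriptor + stack */

typedef struct thread_s
{
    uint32_t trdid;                      /* thread identifier           */
    /* ... scheduling state, attached process, saved context, etc. ...  */
} thread_t;

extern void * kheap_alloc( uint32_t size );  /* hypothetical allocator  */

thread_t * thread_create( void )
{
    thread_t * thread = kheap_alloc( THREAD_BLOCK_SIZE );

    /* the kernel stack occupies the rest of the block, and grows
       downward from the top toward the descriptor fields               */
    void * kstack_top = (char *)thread + THREAD_BLOCK_SIZE;
    (void)kstack_top;  /* installed in the CPU context at kernel entry  */
    return thread;
}
}}}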
     134
     135=== 3.5 Local & Remote accesses ===
     136
     137Almos-mkh defines two different policies to access the shared data stored in the N KDATA and N KHEAP segments :
     138 * The local accesses to local kernel structures can use normal pointers, which are translated by the MMU to local physical addresses.
     139 * The remote accesses to remotely allocated kernel structures '''must''' use the ''hal_remote_load( cxy , ptr )'' and ''hal_remote_store( cxy , ptr )'' functions, where ''ptr'' is a normal pointer on a variable stored in the KDATA or KHEAP vseg, and ''cxy'' is the remote cluster identifier. Notice that a given kernel global variable is now identified by an extended pointer ''XPTR( cxy , ptr )''. With these remote access primitives, any kernel instance in cluster K can access any global variable in any cluster.
     140
     141In other words, almos-mkh clearly distinguishes the ''local accesses'', which can use standard pointers, from the ''remote accesses'', which must use extended pointers.
     142This can be seen as a heavy constraint, but it clearly helps to improve locality and avoid contention.
    120143
    121144The remote_access primitives API is defined in the  [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/hal/generic/hal_remote.h  almos_mkh/hal/generic/hal_remote.h] file.
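The sketch below illustrates the two access policies. The hal_remote_load() / hal_remote_store() names come from the API above, but the exact prototypes (word-sized accesses, the stored value passed as a third argument) are assumptions.

{{{
#include <stdint.h>

typedef uint32_t cxy_t;         /* cluster identifier (assumed definition) */

/* prototypes normally provided by hal_remote.h; assumed here */
extern uint32_t hal_remote_load ( cxy_t cxy , void * ptr );
extern void     hal_remote_store( cxy_t cxy , void * ptr , uint32_t val );

extern uint32_t kdata_counter;  /* hypothetical global in the KDATA vseg   */

void example( cxy_t remote_cxy )
{
    /* local access : a normal pointer, translated by the MMU */
    uint32_t local_value  = kdata_counter;

    /* remote access : the same pointer, qualified by the remote cluster
       identifier, i.e. the extended pointer XPTR( remote_cxy , &kdata_counter ) */
    uint32_t remote_value = hal_remote_load( remote_cxy , &kdata_counter );

    hal_remote_store( remote_cxy , &kdata_counter , local_value + remote_value );
}
}}}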
    122145
    123 === 3.3. KHEAP vsegs ===
    124 
    125 The '''KHEAP''' vsegs are used to store dynamically allocated of kernel structures, such as the user ''process'' descriptors, the ''thread'' descriptors, the ''vseg'' descriptors, the ''file'' descriptors, etc. These structure can be requested by any thread running in any cluster, and are defined as global variables, that can be accessed by any thread. To avoid contention and improve locality, almos-mkh build a ''physically distributed  kernel heap''. In each cluster K, almos-mkh  can register this HEAP vseg in all VSL(P,K), and map it in all the GPT(P,K). with is local mapping, a kernel structure requested by a thread running in any cluster will always be allocated in the local physical memory.
    126 
    127 WARNING : To unify the access to remote data (i.e. data stored in a remote cluster), almos-mkh use the same policy for KHEAP and KDATA vsegs: all KHEAP segments have the same virtual base address. The local accesses to the locally allocated kernel structures can use normal pointers that will be translated by the MMU to local physical adresses. The remote access to remotely allocated kernel structures must use the ''remote_load()'' and ''remote_store()'' functions and handle extended pointers.
    128 
    129 === 3.4. KDEV vsegs ===
    130 
    131 Finally the '''KDEV''' vsegs are associated to the peripheral. There is one KDEV vseg per chdev (i.e. per channel device.
    132 
    133 
    134 
    135 
    136 == __4. Address Translation for kernel vsegs__ ==
    137 
    138 The detailed implementation of the virtual to physical address translation depends on the target architecture.
    139 
    140 
    141 === 4.1 TSAR-MIPS32 ===
    142 
    143 As the TSAR architecture uses 32 bits cores, to reduce the power consumption, the virtual space is bounded to 4 Gbytes.
    144 
    145 But the TSAR architecture provides two non standard, but very useful features to simplify the virtual to physical address translation for kernel vsegs :
     146The implementation of this API for the TSAR architecture is defined in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/hal/tsar_mip32/core/hal_remote.c almos_mkh/hal/tsar_mips32/core/hal_remote.c] file.
     147
     148== __4. Remote access implementation__ ==
     149
     150The detailed implementation of the remote access functions described  in section 3.5 depends on the target architecture.
     151
     152=== 4.1 Intel 64 ===
     153
     154On hardware architectures using 64-bit cores, the virtual space is generally much larger than the physical space.
     155The actual size of the virtual space is 256 Tbytes (virtual addresses are bounded to 48 bits) on Intel-based multi-core servers.
     156It is therefore possible to map all segments described above in the virtual space:
     157 * the user segments defined in section 2 are accessed with normal pointers. They can be mapped in user land (the lower half of the 256 Tbytes).
     158 * the three local kernel segments (KCODE, KDATA, KHEAP) defined in section 3 are accessed with normal pointers. They can be mapped in kernel land (the upper half of the 256 Tbytes).
     159 * the N distributed KDATA and N KHEAP segments are accessed using extended pointers XPTR(cxy,ptr). These 2N segments can also be mapped in kernel land, and the translation from an extended pointer to a normal pointer can be done by the remote_load() and remote_store() functions.
     160In all cases, the Intel64 MMU is in charge of translating the virtual address (defined by the normal pointer) to the relevant physical address.
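A minimal sketch of this approach is given below, assuming that the remote window of each cluster is mapped at a fixed stride in kernel land; all names and constants are hypothetical.

{{{
#include <stdint.h>

typedef uint32_t cxy_t;                     /* cluster identifier          */

#define KREMOTE_BASE  0xFFFF900000000000UL  /* assumed base in kernel land */
#define KREMOTE_SPAN  0x0000000040000000UL  /* assumed 1 Gbyte per cluster */

static inline uint64_t remote_load( cxy_t cxy , void * ptr )
{
    /* rebase the normal pointer into the window of cluster cxy;
       the Intel64 MMU then translates this virtual address as usual */
    uint64_t vaddr = KREMOTE_BASE + (uint64_t)cxy * KREMOTE_SPAN
                   + ((uint64_t)ptr & (KREMOTE_SPAN - 1));
    return *(volatile uint64_t *)vaddr;
}
}}}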
     161
     162=== 4.2 TSAR-MIPS32 ===
     163
     164The TSAR architecture uses 32-bit cores, to reduce power consumption.
     165This creates a big problem for accessing the remote KDATA and KHEAP segments : with 1 Gbyte of physical memory per cluster and 256 clusters, the total physical space covered by the N KHEAP segments is 256 Gbytes. This is much larger than the 4 Gbytes of virtual space addressable by a 32-bit virtual address.
     166The consequence is very simple : we cannot use the MIPS32 MMU to perform the virtual-to-physical address translation when accessing the KDATA and KHEAP segments.
     167
     168But the TSAR architecture provides two useful features to simplify the translation from an extended pointer XPTR(cxy,ptr) to a 40-bit physical address :
    1461691. The TSAR 40-bit physical address has a specific format : it is the concatenation of an 8-bit CXY field and a 32-bit LPADDR field, where CXY defines
    147170the cluster identifier, and LPADDR is the local physical address inside the cluster.
    1481711. The MIPS32 core used by the TSAR architecture defines, besides the standard MMU, another non-standard hardware mechanism for address translation : a 40-bit physical address is simply built by appending to each 32-bit virtual address an 8-bit extension contained in a software-controllable register, called DATA_PADDR_EXT.
    149172
    150 In the TSAR architecture, and for any process P in any cluster K, almost-mkh registers only one extra KCODE vseg in the VMM[P,K), for kernel adressing, because almos-mkh uses the INST-MMU for instruction addresses translation, but does NOT not use the DATA-MMU for data addresses translation : When a core enters the kernel, the DATA-MMU is deactivated, and it is only reactivated when the core returns to user code.
    151 
    152 When the value contained in the extension register is the local cluster identifier, any local kernel structures stored in the KDATA or the KHEAP segments is accessed by using directly the local physical addresses (identity mapping).
    153 To access a remote kernel structure, almost-mkh '''must''' use the hardware architecture dependent remote access functions presented in section C. For the TSAR architecture these load/store functions simply modify the extension register DATA_PADDR_EXT before the memory access, and restore it after the memory access.
    154 
    155 This pseudo identity mapping impose some constraints on the KCODE and the KDATA segments when compiling the kernel.
     173In the TSAR architecture, and for any process P in any cluster K, almos-mkh registers only one extra KCODE vseg in the VMM(P,K), because almos-mkh uses the INST-MMU for instruction address translation, but does NOT use the DATA-MMU for data address translation : when a core enters the kernel, the DATA-MMU is deactivated, and it is only reactivated when the core returns to user code.
     174
     175In the TSAR implementation, the default value contained in the DATA_PADDR_EXT register is (local_cxy), to access the local physical memory. For remote accesses, the remote_load() and remote_store() functions set the extension register DATA_PADDR_EXT to the target (cxy) before the remote access, and restore it to (local_cxy) after the remote access.
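This mechanism can be sketched as below; the mips32_set_paddr_ext() helper is hypothetical, as the real code in hal/tsar_mips32/core/hal_remote.c uses inline assembly to write the extension register.

{{{
#include <stdint.h>

typedef uint32_t cxy_t;

extern cxy_t local_cxy;                         /* identifier of this cluster  */
extern void  mips32_set_paddr_ext( cxy_t cxy ); /* hypothetical register write */

uint32_t hal_remote_load( cxy_t cxy , void * ptr )
{
    uint32_t value;
    mips32_set_paddr_ext( cxy );        /* extend physical addresses to cxy   */
    value = *(volatile uint32_t *)ptr;  /* 32-bit vaddr + 8-bit ext = 40 bits */
    mips32_set_paddr_ext( local_cxy );  /* restore local physical addressing  */
    return value;
}
}}}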
     176
     177The price to pay for this physical addressing is the need to precisely control the placement of the KCODE and KDATA segments when compiling the kernel.
    156178
    157179The implementation of the hal_remote_load() and hal_remote_store() functions for the TSAR architecture is available in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/hal/tsar_mips32/core/hal_remote.c almos_mkh/hal/tsar_mips32/core/hal_remote.c] file.
    158180
    159 
    160 === 4.2 Intel 64 bits ===
    161 
    162 TODO
    163 
    164181== __5. Virtual space organisation__ ==
    165182
    166 This section describes the almost-mkh assumptions regarding the virtual space organisation, that is strongly dependent on the size of the virtual space. 
     183This section describes the almos-mkh assumptions regarding the virtual space organisation. This organisation clearly depends on the size of the virtual space.
    167184
    168185=== 5.1 TSAR-MIPS32 ===
     
    194211It is located on top of the '''mmap''' zone and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_STACK_BASE] parameter. It contains an array of fixed-size slots, and each slot contains one ''stack'' vseg. The size of a slot is defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_STACK_SIZE] parameter. In each slot, the first page is not mapped, in order to detect stack overflows. As threads are dynamically created and destroyed, the VMM implements a specific STACK allocator for this zone, using a bitmap vector. As the ''stack'' vsegs are private (the same virtual address can have different mappings, depending on the cluster), the number of slots in the '''stack''' zone actually defines the max number of threads for a given process in a given cluster.
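Such a bitmap-based STACK allocator can be sketched as below, with assumed parameter values and a 32-slot bitmap; the actual allocator belongs to the VMM implementation referenced in section 1.

{{{
#include <stdint.h>

#define CONFIG_VMM_STACK_BASE  0x40000000   /* assumed values, see      */
#define CONFIG_VMM_STACK_SIZE  0x00100000   /* kernel_config.h          */
#define STACK_SLOTS_NR         32

static uint32_t stack_bitmap;               /* bit i set : slot i used  */

/* returns the virtual base of a free stack slot, or 0 if none is free */
uint32_t stack_slot_alloc( void )
{
    for( uint32_t i = 0 ; i < STACK_SLOTS_NR ; i++ )
    {
        if( (stack_bitmap & (1U << i)) == 0 )
        {
            stack_bitmap |= (1U << i);
            /* the first page of the slot stays unmapped, to catch overflows */
            return CONFIG_VMM_STACK_BASE + i * CONFIG_VMM_STACK_SIZE;
        }
    }
    return 0;
}
}}}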
    195212
    196 === 5.2 Intel 64 bits ===
     213=== 5.2 Intel 64 ===
    197214
    198215TODO