= Virtual segments replication & distribution policy =

[[PageOutline]]

The replication / distribution policy of segments has two goals: avoid contention (the main goal), and enforce locality (as much as possible).

To actually control data placement on the physical memory banks, the kernel uses the paged virtual memory MMU to map a virtual segment to a given physical memory bank in a given cluster.

A '''vseg''' is a contiguous memory zone in the process virtual space, where all addresses can be accessed by the process without segmentation violation: if the corresponding page is not mapped, the page fault is handled by the kernel, and a physical page is dynamically allocated and initialized if required. A '''vseg''' always contains an integer number of pages. Depending on its type, a '''vseg''' has specific attributes regarding access rights, replication policy, and distribution policy. The '''vseg''' API is defined in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vseg.h almos-mkh/kernel/mm/vseg.h] and [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vseg.c almos-mkh/kernel/mm/vseg.c] files.

To avoid contention, ALMOS-MKH replicates, for each process P, the process descriptor in all clusters containing at least one thread of P. These clusters are called active clusters. The virtual memory manager VMM(P,K) of process P in cluster K contains two main structures:
 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.h VSL(P,K)] is the list of all vsegs registered for process P in cluster K.
 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.h GPT(P,K)] is the generic page table, defining the actual physical mapping of those vsegs.
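
As an illustration of the vseg definition above, the following minimal sketch shows how a list of vsegs can be searched to decide whether a faulting address belongs to a registered vseg (page fault to be served) or to none (segmentation violation). All names (`demo_vseg_t`, `demo_vseg_contains`, `demo_vsl_find`) and the 4 Kbytes page size are assumptions made for this example; they are not the actual `vseg_t` / VSL definitions found in vseg.h and vmm.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed small page size, for illustration only */

/* Hypothetical, simplified vseg descriptor (not the real vseg_t). */
typedef struct {
    uintptr_t base;   /* first virtual address, page aligned        */
    size_t    size;   /* length in bytes, multiple of the page size */
} demo_vseg_t;

/* Returns true when vaddr falls inside the vseg [base, base + size). */
bool demo_vseg_contains(const demo_vseg_t *v, uintptr_t vaddr)
{
    /* A vseg always contains an integer number of pages. */
    assert(v->base % DEMO_PAGE_SIZE == 0 && v->size % DEMO_PAGE_SIZE == 0);
    return vaddr >= v->base && (vaddr - v->base) < v->size;
}

/* Linear search in a hypothetical VSL stored as an array.
 * Returns the matching vseg, or NULL when no vseg covers vaddr. */
const demo_vseg_t *demo_vsl_find(const demo_vseg_t *vsl, size_t count,
                                 uintptr_t vaddr)
{
    for (size_t i = 0; i < count; i++) {
        if (demo_vseg_contains(&vsl[i], vaddr)) {
            return &vsl[i];
        }
    }
    return NULL;
}
```

A NULL result corresponds to an access outside any vseg; a non-NULL result corresponds to an address the process is allowed to access, even if the page is not mapped yet.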
The Virtual Memory Manager API is defined in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.h almos-mkh/kernel/mm/vmm.h] and [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.c almos-mkh/kernel/mm/vmm.c] files.

== __1. User segments types__ ==

 * A vseg is '''public''' when it must be accessible by any thread T of the process, whatever the cluster running the thread T. It is '''private''' when it only needs to be accessed by the threads running in the cluster containing the physical memory bank where this vseg is mapped. A '''private''' vseg is entirely mapped in one single cluster K.
 * For a '''public''' vseg, ALMOS-MKH implements a global mapping: in all clusters, a given virtual address is mapped to the same physical address. For a '''private''' vseg, ALMOS-MKH implements a local mapping: the same virtual address can be mapped to different physical addresses in different clusters.
 * A '''public''' vseg can be '''localized''' (all vseg pages are mapped in the same cluster), or '''distributed''' (different pages are mapped on different clusters, using the least significant bits of the virtual page number (VPN) as distribution key). A '''private''' vseg is always '''localized'''.
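
The distribution rule described above for '''distributed''' vsegs can be sketched as follows. This is only an illustration under stated assumptions: the helper name `demo_cluster_of_page`, the 4 Kbytes page size, and the requirement that the number of clusters is a power of two are hypothetical choices, and the actual computation in vmm.c may differ.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u   /* assumed 4 Kbytes small pages */

/* Hypothetical helper: select the target cluster index for one page of a
 * distributed vseg, using the VPN least significant bits as distribution key.
 * Assumes nb_clusters is a non-zero power of two, so that a simple mask
 * keeps exactly the low-order bits of the VPN. */
uint32_t demo_cluster_of_page(uint32_t vaddr, uint32_t nb_clusters)
{
    assert(nb_clusters != 0 && (nb_clusters & (nb_clusters - 1)) == 0);

    uint32_t vpn = vaddr >> DEMO_PAGE_SHIFT;   /* virtual page number */
    return vpn & (nb_clusters - 1);            /* VPN LSBs = cluster index */
}
```

With 16 clusters, consecutive pages of the vseg land in consecutive clusters and wrap around every 16 pages, which spreads accesses to a large data vseg over all memory banks.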

ALMOS-MKH defines six vseg types in [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vseg.h#L41 vseg_type_t]:

|| Type || Public / private || Localized / distributed || Access || Replication || Placement || Allocation policy in user space ||
|| STACK || private || localized || Read Write || one physical mapping per thread || same cluster as thread using it || dynamic (one stack allocator per cluster) ||
|| CODE || private || localized || Read Only || one physical mapping per cluster || same cluster as thread using it || static (defined in .elf file) ||
|| DATA || public || distributed || Read Write || same mapping for all threads || distributed on all clusters || static (defined in .elf file) ||
|| ANON || public || localized || Read Write || same mapping for all threads || same cluster as calling thread || dynamic (one heap allocator per process) ||
|| FILE || public || localized || Read Write || same mapping for all threads || same cluster as the file cache || dynamic (one heap allocator per process) ||
|| REMOTE || public || localized || Read Write || same mapping for all threads || cluster defined by user || dynamic (one heap allocator per process) ||

 1. '''CODE''' : This private vseg contains the user application code. ALMOS-MKH creates one CODE vseg per active cluster. For a process P, the CODE vseg is registered in the VSL(P,Z) when the process is created in reference cluster Z. In the other clusters X, the CODE vseg is registered in VSL(P,X) when a page fault is signaled by a thread of P running in cluster X. In each active cluster X, the CODE vseg is localized, and physically mapped in cluster X.
 1. '''DATA''' : This vseg contains the user application global data. ALMOS-MKH creates one single DATA vseg per process, which is registered in the reference VSL(P,Z) when the process P is created in reference cluster Z. In the other clusters X, the DATA vseg is registered in VSL(P,X) when a page fault is signaled by a thread of P running in cluster X.
    To avoid contention, this vseg is physically distributed on all clusters, with a page granularity. For each page, the physical mapping is defined by the least significant bits of the page VPN.
 1. '''STACK''' : This private vseg contains the execution stack of a thread. For each thread T of process P running in cluster X, ALMOS-MKH creates one STACK vseg. This vseg is registered in the VSL(P,X) when the thread descriptor is created in cluster X. To enforce locality, this vseg is physically mapped in cluster X.
 1. '''ANON''' : This type of vseg is dynamically created by ALMOS-MKH to serve an anonymous [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c#L38 mmap()] system call executed by a client thread running in a cluster X. The first vseg registration and the physical mapping are done by the reference cluster Z, but the vseg is mapped in the client cluster X.
 1. '''FILE''' : This type of vseg is dynamically created by ALMOS-MKH to serve a file based mmap() system call executed by a client thread running in a cluster X. The first vseg registration and the physical mapping are done by the reference cluster Z, but the vseg is mapped in the cluster Y containing the file cache.
 1. '''REMOTE''' : This type of vseg is dynamically created by ALMOS-MKH to serve a remote mmap() system call, where a client thread running in a cluster X requests the creation of a new vseg mapped in another cluster Y. The first vseg registration and the physical mapping are done by the reference cluster Z, but the vseg is mapped in the cluster Y specified by the user.

The replication of the VSL(P,K) and GPT(P,K) kernel structures creates a coherence problem for the non-private vsegs.
 * A VSL(P,K) contains all private vsegs in cluster K, but contains only the public vsegs that have actually been accessed by a thread of P running in cluster K. Only the '''reference''' process descriptor, stored in the reference cluster Z, contains the complete list VSL(P,Z) of all public vsegs for process P.
 * A GPT(P,K) contains all mapped entries corresponding to private vsegs. For public vsegs, it contains only the entries corresponding to pages that have been accessed by a thread running in cluster K. Only the reference cluster Z contains the complete GPT(P,Z) page table of all mapped entries for process P.

Therefore, the process descriptors other than the reference one can be considered as read-only caches. When a given vseg or a given entry in the page table must be removed by the kernel, this modification must be done first in the reference cluster, and then broadcast to all other clusters for update.

== __2. kernel segments types__ ==

 * The read-only segment containing the user code is replicated in all clusters where there is at least one thread using it.
 * The private segment containing the stack for a given thread is placed in the same cluster as the thread using it.
 * The shared segment containing the global data is distributed on all clusters as regularly as possible to avoid contention.
 * The segments dynamically allocated by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap()] system call are placed as described below.

== __3. 32 bits virtual space organisation__ ==

The virtual address space of a user process P is split into five fixed-size zones, defined by configuration parameters in [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h kernel_config.h]. Each zone contains one or several vsegs, as described below.

=== The ''kernel'' zone ===

It contains the ''kcode'' vseg (type KCODE), which must be mapped in all user processes to support syscalls. It is located in the lower part of the virtual space, and starts at address 0. Its size cannot be less than a big page size (2 Mbytes for the TSAR architecture), because it is mapped as one or several big pages.

=== The ''utils'' zone ===

It contains the two ''args'' and ''envs'' vsegs, whose sizes are defined by [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config. specific configuration parameters]. The ''args'' vseg (DATA type) contains the process main() arguments. The ''envs'' vseg (DATA type) contains the process environment variables. It is located on top of the '''kernel''' zone, and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_ELF_BASE] parameter.

=== The ''elf'' zone ===

It contains the ''text'' (CODE type) and ''data'' (DATA type) vsegs, defining the process binary code and global data. The actual vseg base addresses and sizes are defined in the .elf file and reported in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/tools/arch_info/boot_info.h boot_info_t] structure by the boot loader.

=== The ''heap'' zone ===

It contains all vsegs dynamically allocated / released by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap()] / [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_munmap.c munmap()] system calls (i.e. FILE / ANON / REMOTE types). It is located on top of the '''elf''' zone, and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_HEAP_BASE] parameter. The VMM defines a specific MMAP allocator for this zone, implementing the ''buddy'' algorithm. The mmap( FILE ) syscall directly maps a file in user space. The user level ''malloc'' library uses the mmap( ANON ) syscall to allocate virtual memory from the heap and map it in the same cluster as the calling thread.
Besides the standard malloc() function, this library implements a non-standard [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/libs/libalmosmkh/almosmkh.c remote_malloc()] function, which uses the mmap( REMOTE ) syscall to dynamically allocate virtual memory from the heap, and map it to a remote physical cluster.

=== The ''stack'' zone ===

It is located on top of the '''heap''' zone and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_STACK_BASE] parameter. It contains an array of fixed-size slots, and each slot contains one ''stack'' vseg. The size of a slot is defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_STACK_SIZE] parameter. In each slot, the first page is not mapped, in order to detect stack overflows. As threads are dynamically created and destroyed, the VMM implements a specific STACK allocator for this zone, using a bitmap vector. As the ''stack'' vsegs are private (the same virtual address can have different mappings, depending on the cluster), the number of slots in the '''stack''' zone actually defines the maximum number of threads for a given process in a given cluster.
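
The slot-based STACK allocator described above can be sketched with a single 32-bit bitmap. This is a minimal illustration, not the ALMOS-MKH implementation: the `demo_*` names, the number of slots, and the numeric values standing in for CONFIG_VMM_STACK_BASE and CONFIG_VMM_STACK_SIZE are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real ones come from kernel_config.h. */
#define DEMO_STACK_BASE   0xC0000000u   /* stand-in for CONFIG_VMM_STACK_BASE */
#define DEMO_STACK_SIZE   0x00100000u   /* stand-in for CONFIG_VMM_STACK_SIZE */
#define DEMO_STACK_SLOTS  32u           /* assumed number of slots in the zone */
#define DEMO_PAGE_SIZE    4096u         /* assumed small page size             */

/* Hypothetical per-cluster stack manager: bit i set means slot i is in use. */
typedef struct {
    uint32_t bitmap;
} demo_stack_mgr_t;

/* Reserves the first free slot. Returns its index, or -1 when all slots
 * are in use (maximum number of threads reached in this cluster). */
int demo_stack_alloc(demo_stack_mgr_t *m)
{
    for (uint32_t i = 0; i < DEMO_STACK_SLOTS; i++) {
        if ((m->bitmap & (UINT32_C(1) << i)) == 0) {
            m->bitmap |= UINT32_C(1) << i;
            return (int)i;
        }
    }
    return -1;
}

/* Releases a slot when the thread is destroyed. */
void demo_stack_free(demo_stack_mgr_t *m, int slot)
{
    assert(slot >= 0 && (uint32_t)slot < DEMO_STACK_SLOTS);
    m->bitmap &= ~(UINT32_C(1) << slot);
}

/* First mapped address of the stack vseg in a slot: the first page of the
 * slot is left unmapped so that a stack overflow triggers a fault. */
uint32_t demo_stack_vseg_base(int slot)
{
    return DEMO_STACK_BASE + (uint32_t)slot * DEMO_STACK_SIZE + DEMO_PAGE_SIZE;
}
```

Because the bitmap is local to one cluster, the same slot index (and therefore the same virtual addresses) can be reused by threads of the same process running in other clusters.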