Changes between Version 102 and Version 103 of processus_thread
- Timestamp: Jun 26, 2018, 2:57:25 PM
processus_thread
v102 v103
14   14   As ALMOS-MKH supports process migration, the '''reference cluster''' can be different from the '''owner cluster'''. The '''owner cluster''' cannot change (because the PID is fixed), but the '''reference cluster''' can change in case of process migration.
15   15
16        In each cluster K, the local cluster manager ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h cluster_t] type in ALMOS-MKH ) contains a process manager ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h pmgr_t] type in ALMOS-MKH ) that maintains three structures for all processes owned by K :
17        * The '''PREF_TBL[lpid]''' is an array indexed by the local process index. Each entry contains an extended pointer on the reference process descriptor.
18        * The '''COPIES_ROOT[lpid]''' array is also indexed by the local process index. Each entry contains the root of the global list of copies for each process owned by cluster K.
19        * The '''LOCAL_ROOT''' is the local list of all process descriptors in cluster K. A process descriptor copy of P is present in K, as soon as P has a thread in cluster K.
     16   In each cluster K, the local cluster manager ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L97 cluster_t] type in ALMOS-MKH ) contains a process manager ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L75 pmgr_t] type in ALMOS-MKH ) that maintains three structures for all processes owned by K :
     17   * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L77 pref_tbl[lpid]] is an array indexed by the local process index. Each entry contains an extended pointer on the reference process descriptor.
     18   * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L85 copies_root[lpid]] array is also indexed by the local process index. Each entry contains the root of the global list of copies for each process owned by cluster K.
     19   * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L81 local_root] is the local list of all process descriptors in cluster K. A process descriptor copy of P is present in K, as soon as P has a thread in cluster K.
20   20
21   21   A process can be in four states:
…    …
25   25   * '''KILLED''' : the process received a SIGKILL signal. It will be destroyed by the parent process executing a wait() sys call.
26   26
27        You can find below a partial list of information stored in a process descriptor ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/process.h process_t] in ALMOS-MKH ):
     27   You can find below a partial list of information stored in a process descriptor ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/process.h#L111 process_t] in ALMOS-MKH ):
28   28   - '''PID''' : proces identifier.
29   29   - '''PPID''' : parent process identifier,
…    …
39   39   - '''CHILDREN_ROOT''' : root of global list of children process.
40   40
41        All elements of a ''local'' list are in the same cluster, so ALMOS-MKH uses local pointers. Elements of a ''global'' list can be distributed on all clusters, so ALMOS-MKH uses extended pointers.
     41   All elements of a ''local'' list are in the same cluster, so ALMOS-MKH uses local pointers. Elements of a ''global'' list can be distributed on all clusters, so ALMOS-MKH uses extended pointers.
42   42
43   43   == __2) Thread definition__ ==
44   44
45        ALMOS-MKH defines in [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h thread_type_t] four types of threads :
     45   ALMOS-MKH defines in [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L58 thread_type_t] four types of threads :
46   46   * one '''USR''' thread is created by a pthread_create() system call.
47   47   * one '''DEV''' thread is created by the kernel to execute all I/O operations for a given channel device.
48   48   * one '''RPC''' thread is activated by the kernel to execute pending RPC requests in the local RPC FIFO.
49        * the
     49   * the '''IDL''' thread is executed when there is no other thread to execute on a core.
50   50
51   51   From the point of view of scheduling, a thread can be in three states : RUNNING, RUNNABLE or BLOCKED.
52   52
53        In a given process, a thread is identified by a fixed format TRDID kernel identifier, coded on 32 bits : The 16 MSB bits (CXY) define the cluster K where the thread has been created. The 16 LSB bits (LTID) define the thread local index in the local TH_TBL[K,P] of a process descriptor P in a cluster K. This LTID index is allocated by the local process descriptor when the thread is created.
     53   In a given process, a thread is identified by a fixed format TRDID kernel identifier, coded on 32 bits : The 16 MSB bits (CXY) define the cluster K where the thread has been created. The 16 LSB bits (LTID) define the thread local index in the local [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/process.h#L139 TH_TBL[K,P]] of a process descriptor P in a cluster K. This LTID index is allocated by the local process descriptor when the thread is created.
54   54
55   55   Therefore, the TH_TBL(K,P) thread table for a given process in a given cluster contains only the threads of P placed in cluster K. The set of all threads of a given process is defined by the union of all TH_TBL(K,P) for all active clusters K.
…    …
58   58   This implementation of ALMOS-MKH does not support thread migration: a thread is pinned on a given core in a given cluster. In the future process migration mechanism, all threads of given process in a given cluster can migrate to another cluster for load balancing. This mechanism is not implemented yet (february 2018), and will require to distinguish the kernel thread identifier (TRDID, that will be modified by a migration), and the user thread identifier (THREAD, that cannot be modified by a migration). In the current implementation, the user identifier (returned by the pthread_create() sys call) is identical to the kernel identifier.
59   59
60        You can find below a partial list of information stored in a thread descriptor ([https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h thread_t] in ALMOS-MKH):
     60   You can find below a partial list of information stored in a thread descriptor ([https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L134 thread_t] in ALMOS-MKH):
61   61   * '''TRDID''' : thread identifier
62   62   * '''TYPE''' : KERNEL / USER / IDLE / RPC
…    …
115  115
116  116  Any user thread T of any process P, running in any cluster K, can create a new thread NT in any cluster K'. This creation is initiated by the ''pthread_create'' system call.
117       * The target cluster K' can be specified by the user application, using the CXY field of the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_pthread.h pthread_attr_t] argument. If the CXY is not defined by the user, the target cluster K' is selected by the kernel K, using the DQDT.
     117  * The target cluster K' can be specified by the user application, using the CXY field of the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_pthread.h#L44 pthread_attr_t] argument. If the CXY is not defined by the user, the target cluster K' is selected by the kernel K, using the DQDT.
118  118  * The target core in cluster K' can be specified by the user application, using the CORE_LID field of the pthread_attr_t argument. If the CORE_LID is not defined by the user, the target core is selected by the target kernel K'.
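The placement rules above, together with the 32-bit TRDID format from section 2 (CXY in the 16 MSB bits, LTID in the 16 LSB bits), can be illustrated by the following minimal C sketch. It is not the ALMOS-MKH code: the `placement_attr_t` and `placement_t` types, the `cxy_defined` / `core_defined` booleans, and the two `kernel_select_*` functions are hypothetical stand-ins. In particular, the real cluster choice relies on the DQDT, which is replaced here by a trivial placeholder policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the placement part of pthread_attr_t. */
typedef struct {
    bool     cxy_defined;   /* the application selected the target cluster */
    bool     core_defined;  /* the application selected the target core    */
    uint16_t cxy;           /* requested target cluster K'                 */
    uint16_t core_lid;      /* requested local core index in K'            */
} placement_attr_t;

typedef struct {
    uint16_t cxy;
    uint16_t core_lid;
} placement_t;

/* Placeholder policies (assumptions): the real kernel K uses the DQDT to
 * select a cluster, and the target kernel K' selects the core. */
static inline uint16_t kernel_select_cluster(uint16_t client_cxy) { return client_cxy; }
static inline uint16_t kernel_select_core(uint16_t cxy) { (void)cxy; return 0; }

/* Resolve the target cluster and core for a new thread. */
static inline placement_t resolve_placement(const placement_attr_t *attr,
                                            uint16_t client_cxy)
{
    placement_t p;
    p.cxy = (attr != NULL && attr->cxy_defined)
          ? attr->cxy : kernel_select_cluster(client_cxy);
    p.core_lid = (attr != NULL && attr->core_defined)
               ? attr->core_lid : kernel_select_core(p.cxy);
    return p;
}

/* TRDID = CXY (16 MSB bits) | LTID (16 LSB bits). */
static inline uint32_t trdid_make(uint16_t cxy, uint16_t ltid)
{
    return ((uint32_t)cxy << 16) | (uint32_t)ltid;
}

static inline uint16_t trdid_cxy(uint32_t trdid)  { return (uint16_t)(trdid >> 16); }
static inline uint16_t trdid_ltid(uint32_t trdid) { return (uint16_t)(trdid & 0xFFFFu); }
```

With these definitions, a thread created in cluster 0x0012 with local index 3 gets `trdid_make(0x0012, 3) == 0x00120003`, and `trdid_cxy()` / `trdid_ltid()` recover the two fields.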
119  119
…    …
125  125  == __5) Thread destruction__ ==
126  126
127       The destruction of a target thread T can be caused by another thread K, executing the '''pthread_cancel()''' sys call (see [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_cancel.c sys_thread_cancel.c]) requesting the target thread T to stop execution. It can be caused by the thread T itself, executing the '''pthread_exit()''' syscall (see [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_exit.c sys_thread_exit.c]) to suicide. Finally, it can be caused by the ''exit()'' or ''kill()'' syscalls requesting the destruction of all threads of a given process.
     127  The destruction of a target thread T can be caused by another thread K, executing the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_cancel.c pthread_cancel()] syscall requesting the target thread T to stop execution. It can be caused by the thread T itself, executing the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_exit.c pthread_exit()] syscall to suicide. Finally, it can be caused by the ''exit()'' or ''kill()'' syscalls requesting the destruction of all threads of a given process.
128  128
129  129  The unique method to destroy a thread is to call the '''thread_kill()''' function, that sets the THREAD_FLAG_REQ_DELETE bit in the ''flags'' field of the target thread descriptor. The thread will be asynchronously deleted by the scheduler at the next scheduling point.
130  130  The scheduler calls the ''thread_destroy()'' function that detaches the thread from the scheduler, detaches the thread from the local process descriptor, and releases the memory allocated to the thread descriptor. The '''thread_kill()''' function can be called by the target thread itself (for an exit), or by another thread (for a kill).
131  131
132       If the target thread is running in attached mode, the '''thread_kill()''' function synchronizes with the joining thread, waiting the actual execution of the pthread_join() syscall (see [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_join.c sys_thread_join.c]) before marking the target thread for delete.
     132  If the target thread is running in attached mode, the '''thread_kill()''' function synchronizes with the joining thread, waiting the actual execution of the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_join.c pthread_join()] syscall before marking the target thread for delete.
133  133
134  134  If the target thread is the main thread (i.e. the thread 0 in the process owner cluster) the '''thread_kill()''' does not mark the target thread for delete, because this must be done by the parent process main thread executing the ''sys_wait()'' syscall (see section [6] below).
…    …
138  138  The scenario is rather simple when the target thread T is not running in ATTACHED mode.
139  139  The killer thread (that can be the target thread itself for an exit) calls the thread_kill() function that does the following actions:
140       * the killer thread sets the THREAD_BLOCKED_GLOBAL bit in the target thread ''blocked'' field,
141       * the killer thread sets the THREAD_FLAG_REQ_DELETE bit in the target thread ''flags'' field,
     140  * the killer thread sets the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L82 THREAD_BLOCKED_GLOBAL] bit in the target thread ''blocked'' field,
     141  * the killer thread sets the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L76 THREAD_FLAG_REQ_DELETE] bit in the target thread ''flags'' field,
142  142  * the killer thread returns without waiting the actual deletion.
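The three actions listed above for a thread that is not in ATTACHED mode can be sketched in C as follows. This is only an illustration under stated assumptions: the `thread_sketch_t` type, the `SK_` prefixed bit values, and both function names are hypothetical, and C11 atomics stand in for whatever local or remote atomic accesses the kernel actually uses on the ''blocked'' and ''flags'' fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values (assumptions): the real values are defined in thread.h. */
#define SK_THREAD_BLOCKED_GLOBAL   0x0001u
#define SK_THREAD_FLAG_REQ_DELETE  0x0002u

/* Hypothetical reduced thread descriptor: only the two fields used here. */
typedef struct {
    _Atomic uint32_t blocked;  /* blocking causes bit-vector */
    _Atomic uint32_t flags;    /* thread flags bit-vector    */
} thread_sketch_t;

/* Killer side, detached case: block the target, mark it for deletion,
 * and return without waiting for the actual deletion. */
static inline void thread_kill_detached_sketch(thread_sketch_t *target)
{
    atomic_fetch_or(&target->blocked, SK_THREAD_BLOCKED_GLOBAL);
    atomic_fetch_or(&target->flags, SK_THREAD_FLAG_REQ_DELETE);
}

/* Scheduler side: test evaluated at a scheduling point to decide whether
 * the thread must be destroyed. */
static inline bool sched_must_destroy_sketch(thread_sketch_t *t)
{
    return (atomic_load(&t->flags) & SK_THREAD_FLAG_REQ_DELETE) != 0;
}
```

Setting the blocking bit first means the target is no longer schedulable by the time the delete request becomes visible to the scheduler.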
143  143
…    …
148  148  This destruction mechanism can involve three threads: the target thread T, the killer thread K, and the joining thread J:
149  149
150       It uses two specific fields in the thread descriptor: the ''join_lock'' field is a remote_spin_lock, and the ''join_xp'' field contains an extended pointer on the first arrived thread. It uses also two specific THREAD_FLAG_JOIN_DONE and THREAD_FLAG_KILL_DONE flags in the target thread descriptor ''flags'' field, and one specific blocking bit THREAD_BLOCKED_JOIN, in the ''blocked'' field.
     150  It uses two specific fields in the thread descriptor: the ''join_lock'' field is a remote_spin_lock, and the ''join_xp'' field contains an extended pointer on the first arrived thread. It uses also two specific [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L72 THREAD_FLAG_JOIN_DONE] and [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L73 THREAD_FLAG_KILL_DONE] flags in the target thread descriptor ''flags'' field, and one specific blocking bit [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L86 THREAD_BLOCKED_JOIN], in the ''blocked'' field.
151  151
152  152  * Both the killer thread K, executing the thread_kill() function), and the joining thread J, executing the sys_thread_join() function, try to take the ''join_lock'' implemented in the T thread descriptor (the ''join_lock'' in the J thread is not used).
…    …
166  166  The process descriptors copies (other than the process descriptor in owner cluster) are simply deleted by the scheduler when the last thread of a given process in a given cluster is deleted. The process descriptor copy is removed from the list of copies in the owner process cluster descriptor, and the process copy disappears.
167  167
168       The process destruction in the owner cluster is more complex, because the child process destruction must be reported to the parent process when the main thread of the parent process executes the blocking [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_wait.c sys_wait()] system call (in the parent owner cluster). Therefore, the child process in owner cluster cannot be destroyed before the parent calls the sys_wait() function. As the '''sys_wait()''' and the '''sys_kill()''' (or '''sys_exit()''') functions are executed by different threads running in different clusters, this requires a parent/child synchronization. To keep a process descriptor in ''zombie'' state after a sys-kill() or sys_exit(), the main thread (i.e. thread 0 in process owner cluster) is not deleted until the sys_wait() syscall is executed by the parent process main thread. This synchronization uses the '''term_state''' field in process descriptor, that contains the following information :
169       * The PROCESS_FLAG_KILL indicates that a KILL request has been received by the child;
170       * The PROCESS_FLAG_EXIT indicates that an EXIT request has been made by the child;
171       * The PROCESS_FLAG_BLOCK indicates that a SIGSTOP signal has been received by the child;
172       * The PROCESS_FLAG_WAIT flag indicates that a WAIT request from parent has been received by the child.
173       * Moreover, for an exit(), the exit() argument value is registered in this ''term_state'' field.
     168  The process destruction in the owner cluster is more complex, because the child process destruction must be reported to the parent process when the main thread of the parent process executes the blocking [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_wait.c#L34 sys_wait()] system call (in the parent owner cluster). Therefore, the child process in owner cluster cannot be destroyed before the parent calls the sys_wait() function. As the '''sys_wait()''' and the '''sys_kill()''' (or '''sys_exit()''') functions are executed by different threads running in different clusters, this requires a parent/child synchronization. To keep a process descriptor in ''zombie'' state after a sys_kill() or sys_exit(), the main thread (i.e. thread 0 in process owner cluster) is not deleted until the sys_wait() syscall is executed by the parent process main thread. This synchronization uses the '''term_state''' field in process descriptor, that contains the following information :
     169  * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_wait.h#L36 PROCESS_TERM_KILL] indicates that a KILL request has been received by the child;
     170  * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_wait.h#L37 PROCESS_TERM_EXIT] indicates that an EXIT request has been made by the child;
     171  * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_wait.h#L35 PROCESS_TERM_BLOCK] indicates that a SIGSTOP signal has been received by the child;
     172  * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_wait.h#L38 PROCESS_TERM_WAIT] flag indicates that a WAIT request from parent has been received by the child.
     173  * Moreover, for an exit(), the exit() argument value is registered in this [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/process.h#L147 term_state] field.
174  174
175       The actual deletion of the child owner process descriptor and child main thread are done by the sys_wait() function, executed by the parent main thread (i.e. thread 0 in parent owner cluster). This sys_wait() function executes an infinite loop. At each iteration, the parent main thread scans all children owner descriptors. When it detects that one child terminated, it sets the PROCESS_FLAG_WAIT in child owner process descriptor, sets the THREAD_FLAG_DELETE in the child main thread, and returns to report the child termination state to parent process. It is the responsibility of the parent process to re-enter the sys_wait() syscall for the other children. When the parent process does not detect a terminated child at the end of an iteration, it deschedules without blocking.
     175  The actual deletion of the child owner process descriptor and child main thread are done by the sys_wait() function, executed by the parent main thread (i.e. thread 0 in parent owner cluster). This sys_wait() function executes an infinite loop. At each iteration, the parent main thread scans all children owner descriptors. When it detects that one child terminated, it sets the PROCESS_FLAG_WAIT in child owner process descriptor, sets the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L76 THREAD_FLAG_REQ_DELETE] in the child main thread, and returns to report the child termination state to parent process. It is the responsibility of the parent process to re-enter the sys_wait() syscall for the other children. When the parent process does not detect a terminated child at the end of an iteration, it deschedules without blocking.
176  176
177  177
178       === 6.2) detailed exit scenario ===
     178  === 6.2) Detailed exit scenario ===
179  179
180  180  This section describes the termination of a process caused by a sys_exit().
…    …
187  187  1. The main thread, and the owner process descriptor on one hand, the calling thread and the associated process will be destroyed by the scheduler at the next scheduling point.
188  188
189       === 6.3) detailed kill scenario ===
     189  === 6.3) Detailed kill scenario ===
190  190
191  191  This section describes the termination of a target process caused by a sys_kill( SIGKILL ).