In this section, we describe the scheduling mechanisms used by ALMOS-MKH.
A) General principles
Our main goal is to remove all contention points in the kernel. Therefore, we chose a fully distributed approach: there is one private scheduler per core, and ALMOS-MKH does not support thread migration. A thread is assigned to a given core when it is created, and is never executed by another core until it is destroyed. ALMOS-MKH implements a preemptive policy for time sharing between all threads assigned to a given core. The <cpu_context> and <fpu_context> fields in the thread descriptor provide the storage used to save (or initialize) the core register values while the thread is not running. A fixed number of TICK periods, called a quantum, is allocated to the running thread.
- The TICK period (in milliseconds) is defined by the CONFIG_SCHED_TICK_MS_PERIOD configuration parameter.
- The number of TICK periods in a quantum is defined by the CONFIG_SCHED_TICKS_PER_QUANTUM configuration parameter.
- The maximum number of threads assigned to a given core is defined by the CONFIG_SCHED_MAX_THREADS_NR configuration parameter.
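As a rough illustration of how these parameters interact, the C sketch below computes the quantum duration (the product CONFIG_SCHED_TICK_MS_PERIOD * CONFIG_SCHED_TICKS_PER_QUANTUM, in milliseconds) and counts TICK interrupts to decide when the running thread should yield. The numeric values, the sched_sketch_t structure, and the quantum_reset(), tick_handler() and sched_can_accept_thread() functions are hypothetical; they are not the ALMOS-MKH definitions.

```c
#include <stdbool.h>

/* Hypothetical values: the real ones are set in the ALMOS-MKH configuration. */
#define CONFIG_SCHED_TICK_MS_PERIOD    10
#define CONFIG_SCHED_TICKS_PER_QUANTUM 5
#define CONFIG_SCHED_MAX_THREADS_NR    32

/* Duration of one quantum, in milliseconds. */
#define QUANTUM_MS (CONFIG_SCHED_TICK_MS_PERIOD * CONFIG_SCHED_TICKS_PER_QUANTUM)

/* Hypothetical per-core scheduler state (one instance per core). */
typedef struct {
    unsigned ticks_left;   /* TICK periods remaining in the current quantum */
    unsigned threads_nr;   /* threads currently assigned to this core */
} sched_sketch_t;

/* Start a new quantum for the thread that has just been selected. */
static void quantum_reset(sched_sketch_t *s)
{
    s->ticks_left = CONFIG_SCHED_TICKS_PER_QUANTUM;
}

/* Called on each TICK interrupt: returns true when the quantum is exhausted,
 * i.e. when a context switch (sched_yield()) should be requested. */
static bool tick_handler(sched_sketch_t *s)
{
    if (s->ticks_left > 0)
        s->ticks_left--;
    if (s->ticks_left == 0) {
        quantum_reset(s);  /* the next selected thread gets a full quantum */
        return true;
    }
    return false;
}

/* Hypothetical check: a core never holds more than
 * CONFIG_SCHED_MAX_THREADS_NR threads. */
static bool sched_can_accept_thread(const sched_sketch_t *s)
{
    return s->threads_nr < CONFIG_SCHED_MAX_THREADS_NR;
}
```

With these example values, a quantum lasts 50 ms, and tick_handler() returns true on every fifth TICK.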
B) Thread states
ALMOS-MKH defines two types of threads:
- USER threads are POSIX-compliant threads belonging to a given user process. A main thread is always created for a user process; other threads are created by the pthread_create() syscall.
- KERNEL threads implement kernel services: RPC threads execute the Remote Procedure Calls, DEV threads implement I/O channel operations, and the IDLE thread is the default (low-power) thread.
From the scheduler's point of view, any thread (KERNEL or USER) can be in one of three states:
- RUNNING: the thread is running on the core. The scheduler <current> field contains a pointer to the running thread.
- READY: the thread is ready to execute, but is not currently running.
- BLOCKED: the thread is blocked on a given condition, and cannot be selected for execution.
The thread state is implemented as the <blocked> bit-vector field in the thread descriptor. A thread generally enters the BLOCKED state when a given resource is not available, by calling the thread_block() function, which sets the relevant bit in the <blocked> bit-vector. It returns to the READY state when another thread releases the blocking resource and calls the thread_unblock() function, which resets the relevant bit. The thread_unblock() function can be called by any thread running in any cluster.
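A minimal sketch of this bit-vector, assuming C11 atomics: each blocking cause owns one bit, and a thread can be selected only when the whole vector is zero. The function names follow the text, but their signatures, the BLOCKED_* cause bits, thread_is_runnable() and the thread_sketch_t structure are hypothetical simplifications, not the ALMOS-MKH code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical blocking causes: one bit per reason a thread can be blocked. */
#define BLOCKED_IO   (1u << 0)   /* waiting for an I/O completion */
#define BLOCKED_LOCK (1u << 1)   /* waiting for a queuelock / rwlock */
#define BLOCKED_RPC  (1u << 2)   /* waiting for an RPC response */

/* Minimal stand-in for the thread descriptor (hypothetical layout). */
typedef struct {
    _Atomic uint32_t blocked;    /* bit-vector: 0 means not BLOCKED */
} thread_sketch_t;

/* Set one blocking cause. An atomic OR is used because other bits may be
 * modified concurrently by threads running in other clusters. */
static void thread_block(thread_sketch_t *t, uint32_t cause)
{
    atomic_fetch_or(&t->blocked, cause);
}

/* Clear one blocking cause; may be called by any thread in any cluster. */
static void thread_unblock(thread_sketch_t *t, uint32_t cause)
{
    atomic_fetch_and(&t->blocked, ~cause);
}

/* True when no blocking bit is set (thread READY or RUNNING). */
static bool thread_is_runnable(thread_sketch_t *t)
{
    return atomic_load(&t->blocked) == 0;
}
```

Using atomic OR / AND rather than plain assignments lets several causes be set and cleared independently; the thread becomes selectable again only when its last cause is cleared.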
This simple blocking / unblocking mechanism is well suited to the Multi-Kernel-Hybrid architecture, as it does not require moving the blocked thread from one queue to another:
- when a thread A is blocked on a busy shared resource, it makes a local_write to its own <blocked> bit-vector, registers itself in the resource waiting queue, and calls the sched_yield() function.
- when the resource is released by the owner thread B (running in any cluster), thread B pops the first waiting thread from the waiting queue, and uses a remote_write access to unblock thread A.
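The two steps above can be sketched as a single-threaded C model. The resource_sketch_t type, its intrusive FIFO waiting queue, and the choice that a woken thread retries the acquisition (rather than receiving ownership directly from B) are assumptions made for illustration; the busylock that would protect the resource descriptor in the kernel is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCKED_LOCK (1u << 1)   /* hypothetical blocking cause */

typedef struct thread_sketch {
    int id;
    _Atomic uint32_t blocked;    /* <blocked> bit-vector */
    struct thread_sketch *next;  /* link in a waiting queue */
} thread_sketch_t;

/* Hypothetical shared resource with a FIFO waiting queue. */
typedef struct {
    bool busy;
    thread_sketch_t *head;       /* first waiting thread */
    thread_sketch_t *tail;       /* last waiting thread */
} resource_sketch_t;

/* Thread A: returns true if the resource was obtained. Otherwise A has set
 * its own <blocked> bit (local write) and registered in the waiting queue;
 * it must then call sched_yield(), and retries when it runs again. */
static bool resource_try_acquire(resource_sketch_t *r, thread_sketch_t *self)
{
    if (!r->busy) {
        r->busy = true;
        return true;
    }
    atomic_fetch_or(&self->blocked, BLOCKED_LOCK);  /* local write */
    self->next = NULL;                               /* append to the FIFO */
    if (r->tail != NULL)
        r->tail->next = self;
    else
        r->head = self;
    r->tail = self;
    return false;                                    /* caller: sched_yield() */
}

/* Thread B (any cluster): release the resource, pop the first waiter and
 * clear its <blocked> bit (a remote write in ALMOS-MKH). Returns the
 * unblocked thread, or NULL if nobody was waiting. */
static thread_sketch_t *resource_release(resource_sketch_t *r)
{
    r->busy = false;
    thread_sketch_t *first = r->head;
    if (first == NULL)
        return NULL;
    r->head = first->next;
    if (r->head == NULL)
        r->tail = NULL;
    first->next = NULL;
    atomic_fetch_and(&first->blocked, ~BLOCKED_LOCK);
    return first;
}
```

Because the woken thread retries in this sketch, another thread may obtain the resource first; the woken thread then simply registers again in the waiting queue.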
C) Scheduling policy
Each scheduler maintains two separate circular lists of threads: one list of KERNEL threads, and one list of USER threads. KERNEL threads have a higher priority than USER threads, and each list is handled with a round-robin policy. When the sched_yield() function is called to perform a context switch on a given core, it implements the following policy:
- It scans the KERNEL list to find a READY thread (other than the IDLE thread). It executes this KERNEL thread if found.
- If no KERNEL thread is found, it scans the USER list to find a READY thread. It executes this USER thread if found.
- If there is no READY KERNEL thread and no READY USER thread other than the calling thread, the calling thread continues execution (if it is still READY).
- If there is no READY thread at all, it executes the IDLE thread.
The kernel can force the selection of a given thread by passing a non-null argument to the sched_yield() function. This is used to reduce the RPC latency: when a core executing the rpc_check() function detects a pending RPC, it selects an RPC thread and forces the execution of this RPC thread.
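The selection policy, including the forced selection used for RPCs, can be sketched as follows. Arrays with a round-robin cursor stand in for the kernel's circular lists, and a ready boolean stands in for a null <blocked> bit-vector; sched_select(), rr_scan() and the *_sketch_t types are hypothetical names, not the ALMOS-MKH code, and the context switch itself is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    bool ready;      /* true when the <blocked> bit-vector is 0 */
    bool is_idle;    /* true for the IDLE kernel thread */
} thread_sketch_t;

#define LIST_MAX 8   /* hypothetical list capacity */

/* Circular list: each scan starts just after the last selected thread. */
typedef struct {
    thread_sketch_t *threads[LIST_MAX];
    size_t nr;       /* number of threads in the list */
    size_t last;     /* index of the last thread selected from this list */
} rr_list_t;

typedef struct {
    rr_list_t k_list;          /* KERNEL threads (including IDLE) */
    rr_list_t u_list;          /* USER threads */
    thread_sketch_t *current;  /* RUNNING (calling) thread */
    thread_sketch_t *idle;     /* IDLE thread */
} sched_sketch_t;

/* Round-robin scan for a READY thread, skipping IDLE and <exclude>. */
static thread_sketch_t *rr_scan(rr_list_t *l, const thread_sketch_t *exclude)
{
    for (size_t i = 1; i <= l->nr; i++) {
        size_t idx = (l->last + i) % l->nr;
        thread_sketch_t *t = l->threads[idx];
        if (t != exclude && t->ready && !t->is_idle) {
            l->last = idx;
            return t;
        }
    }
    return NULL;
}

/* Select the next thread; a non-NULL <forced> bypasses the policy. */
static thread_sketch_t *sched_select(sched_sketch_t *s, thread_sketch_t *forced)
{
    if (forced != NULL)
        return forced;                                      /* e.g. RPC thread */
    thread_sketch_t *next = rr_scan(&s->k_list, s->current); /* 1: KERNEL */
    if (next == NULL)
        next = rr_scan(&s->u_list, s->current);             /* 2: USER */
    if (next == NULL && s->current->ready)
        next = s->current;                                  /* 3: continue */
    if (next == NULL)
        next = s->idle;                                     /* 4: IDLE */
    return next;
}
```

Excluding the calling thread from both scans is what implements the third rule: the caller keeps the core only when no other READY thread exists.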
Finally, ALMOS-MKH implements the following scheduling priority: RPC > DEV > USER > IDLE.
D) Context Switches
As a general rule, a thread can deschedule while it holds a high-level kernel lock (queuelock or rwlock), but cannot deschedule while it holds a kernel busy-waiting lock.
Context switches can have two causes. First, the RUNNING thread can explicitly ask to be descheduled when it is blocked on a shared resource; before descheduling, it must release all the kernel busy locks it holds. Second, because the scheduling policy is preemptive, the RUNNING thread can receive a TICK interrupt while holding one (or several) kernel busy lock(s). To prevent a context switch in this situation, the busylock_acquire() function disables IRQs before it takes the lock, and the busylock_release() function restores IRQs only after it releases the lock.
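The IRQ discipline of busy locks can be illustrated with the sketch below. A global boolean simulates the core's interrupt mask, irq_disable_save() and irq_restore() are hypothetical stand-ins for the kernel's hardware-abstraction functions, and the lock is a C11 atomic_flag; the busylock_acquire() / busylock_release() names follow the text, but their bodies and the choice of storing the saved IRQ state in the lock are assumptions of this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simulated IRQ mask of the current core (hypothetical stand-in for the
 * hardware status register). */
static bool irq_enabled = true;

/* Hypothetical HAL stand-in: disable IRQs and return the previous state. */
static bool irq_disable_save(void)
{
    bool saved = irq_enabled;
    irq_enabled = false;
    return saved;
}

/* Hypothetical HAL stand-in: restore a previously saved IRQ state. */
static void irq_restore(bool saved)
{
    irq_enabled = saved;
}

typedef struct {
    atomic_flag taken;
    bool saved_irq;    /* IRQ state of the owner before acquisition */
} busylock_sketch_t;

/* IRQs are disabled BEFORE spinning, so no TICK interrupt (hence no context
 * switch) can occur while the lock is held. */
static void busylock_acquire(busylock_sketch_t *lock)
{
    bool saved = irq_disable_save();
    while (atomic_flag_test_and_set_explicit(&lock->taken, memory_order_acquire))
        ;                         /* busy-wait */
    lock->saved_irq = saved;      /* written only by the lock owner */
}

/* IRQs are restored only AFTER the lock has been released. */
static void busylock_release(busylock_sketch_t *lock)
{
    bool saved = lock->saved_irq;
    atomic_flag_clear_explicit(&lock->taken, memory_order_release);
    irq_restore(saved);
}
```

Saving the previous IRQ state, instead of unconditionally re-enabling IRQs, keeps nested busy locks correct as long as they are released in reverse order of acquisition.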
Moreover, ALMOS-MKH supports descheduling a USER thread that is executing in kernel mode as a result of a syscall. In other words, some syscalls (not all) can be interrupted, either by an interrupt signaling the completion of an I/O operation or by the TICK interrupt, and this interruption can require a context switch.
E) Floating Point Unit
To be documented.