Version 73 (modified by alain, 4 years ago) (diff)


Boot procedure

A) General Principles

The ALMOS-MKH boot procedure can be decomposed in two phases:

  • The architecture dependent phase, implemented by an architecture specific bootloader procedure.
  • The architecture independent phase, implemented by a generic (architecture independent) kernel-init procedure.

As the generic (i.e. architecture independent) kernel initialization procedure is executed in parallel by all kernel instances in all clusters containing at least one core and one memory bank, the main task of the bootloader is to load - in each cluster - a local copy of the ALMOS-MKH kernel code, and a description of the hardware architecture, contained in a local boot_info data-structure.

This fixed size boot_info structure is built by the boot-loader, and stored at the beginning of the local copy of the kdata segment. As it contains both general and cluster specific information, its content depends on the cluster:

  • general hardware architecture features : number of clusters, topology, etc.
  • available external (shared) peripherals : types and features.
  • number of cores in cluster,
  • available internal (private) peripherals in cluster : types and features.
  • available physical memory in cluster.

This boot_info structure is defined in the almos-mkh/tools/arch_info/boot_info.h and almos-mkh/tools/arch_info/boot_info.c files.

To build the various boot_info structures (one per cluster), the boot-loader uses the arch_info binary structure, that is described in section Hardware Platform Definition.

This arch_info structure is defined in the almos-mkh/tools/arch_info/arch_info.h and almos-mkh/tools/arch_info/arch_info.c files.

To be accessed by the boot loader, the arch_info.bin binary file must be stored on disk, in the file system root directory.

This method allows an intelligent boot_loader to check the hardware components, to guarantee that the generated boot_info structures contain only functionally tested hardware components.

We describe below the boot_loader for the TSAR architecture, the boot_loader for the I86 architecture, and the generic kernel initialization procedure.

B) Boot-loader for the TSAR architecture

The bootloader uses an OS-independent preloader, stored in an addressable but non-volatile device, that loads the bootloader code from an external block-device to the cluster 0 physical memory. This preloader is specific to the TSAR architecture, but is independent of the operating system: it is used by ALMOS-MKH, but also by LINUX, NetBSD, and the GIET-VM.

The TSAR boot_loader allocates - in each cluster containing a physical memory bank - six fixed size memory zones, to store various binary files or data structures. The first two zones are permanently allocated: the PRE_LOADER zone can contain (for example in the TSAR-LETI architecture) the preloader code; the KERNEL_CODE zone, containing the kcode and kdata segments, is directly used by the kernel when the bootloader transfers control - in each cluster - to the kernel_init procedure. The BOOT_CODE, ARCH_INFO, KERNEL_ELF, and BOOT_STACK zones are temporary: they are only used - in each cluster - by the boot-loader code, and the corresponding physical memory can be freely re-allocated by the local kernel instance when it starts execution.

name       | description                | base address (physical) | size
BOOT_CODE  | boot-loader code and data  | BOOT_CODE_BASE (2 MB)   | BOOT_CODE_MAX_SIZE (1 MB)
ARCH_INFO  | arch_info.bin file copy    | ARCH_INFO_BASE (3 MB)   | ARCH_INFO_MAX_SIZE (1 MB)
BOOT_STACK | boot stacks (one per core) | BOOT_STACK_BASE (6 MB)  | BOOT_STACK_MAX_SIZE (1 MB)

The values given in this array are indicative. The actual values are defined by configuration parameters in the boot_config.h file. The two main constraints are the following:

  • the kcode and kdata segments (in the KERNEL_CODE zone) must be entirely contained in one single big physical page (2 Mbytes), because it will be mapped as one single big page in all process virtual spaces.
  • the BOOT_CODE zone (containing the boot loader instructions and data) must be entirely contained in the next big physical page, because it will be mapped in the boot-loader page table to allow the cores to access locally the boot code as soon as it has been copied in the local cluster.
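These alignment constraints rely on the 2 Mbytes big-page size: with identity mapping, a physical address is converted to a big-page index by a simple shift. A minimal sketch (the helper name is illustrative, not an actual almos-mkh function):

```c
#include <stdint.h>

#define BIG_PAGE_SHIFT 21   /* 2 Mbytes big pages on TSAR */

/* Big-page index of a physical address: with identity mapping, the
 * KERNEL_CODE zone (first 2 MB) falls in big page 0, and the BOOT_CODE
 * zone (next 2 MB) in big page 1. Illustrative helper, not actual code. */
static inline uint32_t big_page_index( uint32_t paddr )
{
    return paddr >> BIG_PAGE_SHIFT;
}
```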

For almos-mkh, a core is identified by two - architecture independent - indexes: cxy is the cluster identifier, and lid is the core local index in the cluster.

For the TSAR architecture, the core gid (global hardware identifier) is contained in a MIPS32 CP0 register, and has a fixed format: gid = (cxy << 2) + lid
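This fixed format can be illustrated by a few helper macros (the names are illustrative; they are not the actual almos-mkh macros):

```c
#include <stdint.h>

/* TSAR gid format: gid = (cxy << 2) + lid. The two low-order bits
 * hold the core local index (at most 4 cores per cluster), and the
 * remaining bits hold the cluster identifier. Illustrative names. */
#define GID_TO_CXY( gid )    ((uint32_t)(gid) >> 2)
#define GID_TO_LID( gid )    ((uint32_t)(gid) & 0x3)
#define MAKE_GID( cxy, lid ) (((uint32_t)(cxy) << 2) | ((uint32_t)(lid) & 0x3))
```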

All cores contribute to the boot procedure, but not all cores are simultaneously active:

  • in the first phase - fully sequential - only core[0][0] is running.
  • in the second phase - partially parallel - only core[cxy][0] is running in each cluster.
  • in the last phase - fully parallel - all cores [cxy][lid] are running.

We describe below the five phases of the TSAR bootloader:

B1. Preloader

  • In the TSAR_LETI architecture, the preloader code is stored in the first 16 kbytes of the physical address space in cluster 0.
  • In the TSAR_IOB architecture, the preloader is stored in an external ROM, that is accessed through the IO_bridge located in cluster 0.

At reset, the MMU is de-activated (for both data and instructions), and the extension address register supporting direct access to remote memory banks (for data only) contains the value 0. Therefore, in this first phase, all cores can only access the physical address space of cluster 0.

All cores execute the same preloader code, but the work done depends on the core identifier:

  • The core[0][0] loads into the BOOT_CODE zone of cluster 0 the boot-loader code stored on disk.
  • All other cores do only one task before going to sleep (i.e. entering a low-power state): each core activates its private WTI channel in the local ICU (Interrupt Controller Unit), to be later activated by an IPI (Inter-Processor Interrupt).

B2. Bootloader entry

The first instructions of the bootloader are defined in the almos-mkh/boot/tsar_mips32/boot_entry.S file. This assembly code is executed by all cores entering the boot-loader, but not at the same time.

Each core running this assembly code performs the three following actions:

  • It initializes the core stack pointer, depending on the lid value extracted from the gid, using the BOOT_STACK_BASE and BOOT_STACK_SIZE parameters defined in the boot_config.h file.
  • It changes the value of the DATA address extension CP2 register, using the cxy value extracted from the gid, to force each core to access its stack in local physical memory.
  • It jumps to the boot_loader() C function defined in the boot.c file, passing the two arguments (cxy, lid).
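The stack pointer computation of the first action can be sketched in C (the actual code is MIPS32 assembly; the per-core slot formula and the numerical values below are assumptions, not the actual boot_config.h values):

```c
#include <stdint.h>

#define BOOT_STACK_BASE  0x600000   /* 6 Mbytes : illustrative value    */
#define BOOT_STACK_SIZE  0x4000     /* per-core stack size : assumption */

/* Each core carves a private stack out of the BOOT_STACK zone, indexed
 * by its local index lid. MIPS32 stacks grow downward, so the initial
 * stack pointer is the top of the core's slot. */
static inline uint32_t boot_stack_top( uint32_t lid )
{
    return BOOT_STACK_BASE + (lid + 1) * BOOT_STACK_SIZE;
}
```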

B3. Bootloader sequential phase

In this phase, only core [0][0] is running, while all other cores are blocked in the preloader, waiting to be activated by an IPI.

In this sequential phase, the core[0][0] executes the boot_loader() C function, defined in the almos-mkh/boot/tsar_mips32/boot.c file.

It performs the following actions:

  • The core[0][0] initializes two peripherals: the TTY terminal (channel 0) to display log messages, and the IOC peripheral to access the disk file system.
  • The core[0][0] initializes the boot-loader FAT32 structures, allowing the boot loader to access files stored in the FAT32 file system on disk.
  • The core[0][0] loads into the KERNEL_ELF zone the kernel.elf file from the disk file system.
  • Then it copies into the KERNEL_CODE zone the kcode and kdata segments, using the addresses contained in the .elf file (identity mapping).
  • The core[0][0] loads into the ARCH_INFO zone the arch_info.bin file from the disk file system.
  • Then it builds from this arch_info_t structure the specific boot_info_t structure for cluster 0, and stores it in the kdata segment.
  • The core[0][0] sends IPIs to activate all cores [cxy][0] in all other clusters.

B4. Bootloader partially parallel phase

In this phase all cores [cxy][0] other than the core[0][0] are running (one core per cluster).

At this point, all DATA extension registers already point to the local cluster (to use the local stack).

The core[cxy][0] executes the following tasks:

  • To access the global data stored in cluster cxy, the core[cxy][0] copies the boot-loader code from the BOOT_CODE zone in cluster 0 to the BOOT_CODE zone in cluster cxy.
  • The core[cxy][0] creates a minimal page table containing two big pages, mapping the local BOOT_CODE zone and the local KERNEL_CODE zone.
  • To access the boot code stored in cluster cxy, the core[cxy][0] activates the instruction MMU.
  • The core[cxy][0] copies the arch_info.bin structure from the ARCH_INFO zone in cluster 0 to the ARCH_INFO zone in cluster cxy.
  • The core[cxy][0] copies the kcode and kdata segments from the KERNEL_CODE zone in cluster 0 to the KERNEL_CODE zone in cluster cxy.
  • The core[cxy][0] builds from the arch_info_t structure the specific boot_info_t structure for cluster cxy, and stores it in the local kdata segment.
  • All cores [cxy][0], including core[0][0], synchronize using a global barrier.
  • In each cluster cxy, the core[cxy][0] activates the other cores that are blocked in the pre-loader.

B5. Bootloader fully parallel phase

In this phase all cores in all clusters are running.

Each core must initialize a few registers, as described below, and jump to the kernel_entry address. This address is defined in the kernel.elf file, and registered in the kernel_entry global variable.

  • argument : the unique argument of the kernel_init() function is a pointer to the boot_info_t structure, that is the first variable in the kdata segment.
  • stack pointer : in each cluster, an array of idle thread descriptors, indexed by the core local index, is defined in the kdata segment, on top of the boot_info_t structure. For any thread, the thread descriptor contains the kernel stack, and this is used to initialize the stack pointer.
  • base register : in each core, the cp0_ebase register defines the kernel entry point in case of interrupt, exception, or syscall, and must be initialized. [TO BE MOVED to kernel_init()]
  • status register : in each core, the cp0_sr register defines the core state, and must be initialized (UM bit reset / IE bit reset / BEV bit reset).
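The status register initialization amounts to clearing three bits; a sketch using the standard MIPS32 cp0_sr bit positions (the helper name is illustrative, not an actual almos-mkh function):

```c
#include <stdint.h>

/* Standard MIPS32 status register bits involved at kernel entry:
 * clearing them selects kernel mode, masks interrupts, and selects
 * the normal (non-bootstrap) exception vectors. */
#define SR_IE   (1u << 0)    /* Interrupt Enable            */
#define SR_UM   (1u << 4)    /* User Mode                   */
#define SR_BEV  (1u << 22)   /* Bootstrap Exception Vectors */

/* Illustrative helper: the value to write back into cp0_sr. */
static inline uint32_t sr_kernel_entry( uint32_t sr )
{
    return sr & ~( SR_IE | SR_UM | SR_BEV );
}
```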

At this point, the bootloader has completed its job:

  • The kernel kcode and kdata segments are loaded - in all clusters - in the first physical pages of the local memory.
  • The hardware architecture described by the arch_info.bin file has been analyzed, and copied - in each cluster - into the boot_info structure, stored in the kdata segment.
  • Each local kernel instance can use all the physical memory that is not used to store the kernel kcode and kdata segments themselves.

C) Bootloader for the I86 architecture


D) Generic kernel initialization procedure

The kernel_init() function is the kernel entry point when the boot_loader transfers control to the kernel. Its argument is a pointer to the fixed size boot_info structure, stored in the local kdata segment.

The source code for this function is defined in the almos-mkh/kernel/kern/kernel_init.c file.

All the kernel_init() code is independent of the architecture, but the MMU status depends on the target architecture:

  • For the TSAR architecture, the instruction MMU has been activated and uses the page table defined by the boot-loader. The data MMU is de-activated, and the DATA address extension register points to the local physical memory.
  • For the I86 architecture, both the instruction and the data MMUs have been activated, and use the page table defined by the boot-loader.

In both cases, the kernel_init() function must create in each cluster a new kernel GPT (Generic Page Table), and a new kernel VSL (Virtual Segments List), to be used by the local core MMUs to access the local kcode and kdata segments. In each cluster, a unique kernel process_zero contains all kernel threads running in this cluster. These kernel VSL and GPT structures are registered in the process_zero descriptor, to be used by the kernel threads. The information registered in these kernel VSL and GPT structures will later be copied into the user VSL and GPT structures contained in all user process descriptors created in the cluster, to be used by the user threads associated to these processes.

In each cluster, all local cores execute this procedure in parallel, but most tasks are only executed by core[0]. This procedure uses two synchronization barriers, defined as global variables in the kdata segment:

  • the global_barrier variable is used to synchronize all core[0] in all clusters containing a kernel instance.
  • the local_barrier variable is used to synchronize all cores in a given cluster.
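The behaviour of these barrier variables can be sketched with a minimal sense-reversing spin barrier (a sketch using C11 atomics; the actual almos-mkh implementation differs):

```c
#include <stdatomic.h>

/* Minimal sense-reversing spin barrier: each participant decrements
 * the counter; the last arrival re-arms the counter and flips the
 * sense flag, releasing all spinning waiters. */
typedef struct
{
    atomic_uint count;     /* participants still expected       */
    atomic_uint sense;     /* flips each time the barrier opens */
    unsigned    expected;  /* total number of participants      */
} barrier_t;

static void barrier_init( barrier_t *b, unsigned n )
{
    atomic_init( &b->count, n );
    atomic_init( &b->sense, 0 );
    b->expected = n;
}

static void barrier_wait( barrier_t *b )
{
    unsigned local_sense = 1 - atomic_load( &b->sense );

    if( atomic_fetch_sub( &b->count, 1 ) == 1 )    /* last arrival    */
    {
        atomic_store( &b->count, b->expected );    /* re-arm          */
        atomic_store( &b->sense, local_sense );    /* release waiters */
    }
    else
    {
        while( atomic_load( &b->sense ) != local_sense ) ;  /* spin   */
    }
}
```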

The kernel initialization procedure executes the following steps sequentially:

D1. Core and cluster identification

Each core is supposed to have a unique hardware identifier, called gid, hard-wired in a read-only register. From the kernel point of view, a core is identified by a composite index (cxy,lid), where cxy is the cluster identifier, and lid is a local (continuous) index in the cluster. The association between the gid hardware index and the (cxy,lid) composite index is defined in the boot_info structure. In this first step, each core makes an associative search in the boot_info structure to obtain the (cxy,lid) indexes from the gid index.
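The associative search can be sketched as a linear scan of a core table (the entry layout below is a simplification of the actual boot_info_t content defined in boot_info.h):

```c
#include <stdint.h>

/* Simplified entry of the boot_info core table: the actual structure
 * defined in boot_info.h contains more fields. */
typedef struct
{
    uint32_t gid;   /* hardware global identifier  */
    uint32_t cxy;   /* cluster identifier          */
    uint32_t lid;   /* core local index in cluster */
} core_entry_t;

/* Associative search: scan the table for the matching gid, and return
 * the (cxy,lid) composite index. Returns 1 on success, 0 on failure. */
static int core_lookup( const core_entry_t *table, unsigned n, uint32_t gid,
                        uint32_t *cxy, uint32_t *lid )
{
    for( unsigned i = 0 ; i < n ; i++ )
    {
        if( table[i].gid == gid )
        {
            *cxy = table[i].cxy;
            *lid = table[i].lid;
            return 1;
        }
    }
    return 0;
}
```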

The core[cxy][0] initializes the global variable local_cxy defining the local cluster identifier, and initializes the local cluster descriptor from information found in the boot_info structure. All cores make a first initialization of their private kernel IDLE thread. Finally, the core[0][0] initializes the kernel TXT0 terminal. This terminal is used by the kernel code, running on any core, to display log or debug messages. This terminal is configured in non-descheduling mode: the calling thread directly executes the relevant TXT driver, without using the dedicated DEV kernel thread.

A first synchronization barrier is used to prevent other cores from using the TXT0 terminal before it is initialized.

D2. Cluster manager initialization

In each cluster, the core[0] performs the cluster manager initialization. The cluster manager contains the structures describing the main kernel resources in the cluster:

  • the hardware architecture parameters (both local and global),
  • an array of core descriptors, and the associated schedulers,
  • the physical memory allocator(s) in this cluster,
  • the DQDT, that is a global, distributed structure registering the current level of resource (cores and memory) availability.

The cluster manager is defined in the almos-mkh/kernel/kern/cluster.h and almos-mkh/kernel/kern/cluster.c files.

As all cluster managers are global variables, accessible by any kernel instance running in any cluster, a synchronization barrier is required to avoid access to a cluster manager before initialization.

D3. Kernel entry point and process_zero initialization

All cores initialize the registers defining the kernel entry point(s) in case of interrupt, exception, or system call. This must be done here because the VFS initialization uses RPCs requiring Inter-Processor Interrupts. All cores initialize their (currently running) IDLE thread descriptor. In each cluster, the core[cxy][0] initializes the local kernel process_zero descriptor. This includes the creation of the local kernel GPT and VSL.

Here again, a synchronization barrier is required to avoid access to VSL/GPT before initialization.

D4. MMU activation

In each cluster, all cores activate their private MMU, as required by the architecture. For TSAR, only the instruction MMU is activated, and the data MMU remains de-activated. Moreover, the core[0] in cluster[0] initializes the external IOPIC device.

A synchronization barrier is required to avoid access to IOPIC before initialization.

D5. Internal & external devices initialization

In each cluster[cxy], the core[cxy][0] performs the devices initialization (device == peripheral).

The almos-mkh policy regarding peripherals and I/O operations is described here.

For multi-channel devices, there is one channel device (called chdev) per channel. For each chdev, almos-mkh creates a dedicated kernel DEV thread, that is in charge of executing all requests targeting this chdev. All these requests are registered in a waiting queue rooted in the chdev. The chdev API is defined in the almos-mkh/kernel/kern/chdev.h and almos-mkh/kernel/kern/chdev.c files.

For internal (replicated) devices, the chdev descriptors are allocated in the local cluster. For external (shared) devices, the chdev descriptors are evenly distributed over all clusters. These external chdevs are indexed by a global index, and the host cluster is computed from this index by a modulo.
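The placement rule for external chdevs can be sketched as follows (an illustrative helper; the actual almos-mkh distribution code may differ):

```c
/* External chdevs are distributed over the clusters by a simple
 * modulo on their global index; illustrative helper, not the
 * actual almos-mkh code. */
static inline unsigned chdev_host_cluster( unsigned chdev_index,
                                           unsigned nb_clusters )
{
    return chdev_index % nb_clusters;
}
```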

The internal device descriptors are created first (ICU, then MMC, then DMA), because the ICU device is used by all other devices. Then the WTI mailboxes used for IPIs (Inter-Processor Interrupts) are allocated in the local ICU: one WTI mailbox per core. Then each external chdev descriptor is created in its host cluster.

As any thread, running in any cluster, can access any chdev, located in any other cluster, a synchronization barrier is required to avoid access to devices before initialization.

D6. IPI, Idle thread, and VFS root initialization

Each core enables its private input IPI, and completes the initialization of its (currently running) idle thread descriptor. Then the core[0] in cluster[0] creates the VFS root in cluster[0]. This requires access to the file system on disk.

A synchronization barrier is required to avoid access to VFS root before initialization.

D7. VFS root initialisation in all clusters

In each cluster other than cluster[0], the core[0] initializes the VFS and FS contexts in the local cluster, from values registered in cluster[0].

A synchronization barrier is required to avoid access to VFS before initialization.

D8. DEVFS global initialization

The core[0] in cluster[0] performs the DEVFS global initialization: it initializes the DEVFS context, and creates the DEVFS dev and external directory inodes in cluster[0].

A synchronization barrier is required to avoid access to DEVFS root before initialization.

D9. DEVFS local initialization

In each cluster[cxy], the core[0] completes the DEVFS initialization in parallel. Each core[0] gets the extended pointers on the dev and external directories from values stored in cluster[0]. Then each core[0] creates the DEVFS internal directory, and creates the pseudo-files for all chdevs in cluster[cxy].

A synchronization barrier is used to avoid access to DEVFS before initialization.

D10. Process init creation

The core[0] in cluster[0] creates (i.e. allocates memory for, and initializes) the process descriptor for the first user process. This includes the VMM initialization: the user process GPT and VSL inherit the relevant information from the kernel process GPT and VSL.

The core[0] in cluster[0] displays the ALMOS-MK banner.

A last synchronization barrier is used before jumping to the idle_thread() function.

D11. Scheduler activation

Finally, all cores perform the following actions:

  • set the TICK timer, and unmask interrupts to activate the scheduler.
  • jump to the idle_thread() function, and wait for a useful thread to be scheduled.
