Course "Architecture of Multi-Processor Systems"
TP2: Code deployment on a programmable processor
(pirouz.bazargan-sabet@…)
A. Objectives
The goal of this second lab is to deploy and run a software application (written in C) on a hardware architecture with a MIPS32 processor. You will use a cross-compiler to generate the binary code for this application, as well as the binary code for the operating system. You will then load this binary code into the memory of the virtual prototype, run the simulation, and analyse how the processor accesses the memory and the TTY terminal.
The hardware architecture of this second practical is very similar to that of the first one. The only difference is that we replace the "wired" master with a MIPS32 programmable processor with an instruction cache and a data cache, and that we introduce a second ROM containing the boot code. So we have one master and three targets:
- PibusSegBcu: Bus arbiter
- PibusMips32Xcache: MIPS32 processor with its caches
- PibusSimpleRam: ROM containing the boot code
- PibusSimpleRam: RAM containing instructions and data
- PibusMultiTty: TTY terminal controller
The MIPS32 processor can start an instruction every cycle thanks to its pipelined architecture. The detailed operation of the caches will be studied in labs 3 and 4. For this lab, we just need to know that the processor has two separate caches for instructions and data. The cache lines are 16 bytes wide (i.e. 4 words of 32 bits), and the cache controller triggers a transaction on the bus in the following four cases:
- instruction miss: the processor tries to read an instruction that is not in its instruction cache. The processor is frozen while the miss is processed, and the cache controller performs a burst transaction that reads a complete cache line from memory (i.e. 4 elementary transfers corresponding to 4 words of 32 bits on the Pibus).
- data miss: the processor tries to execute a cacheable data read (lw or lb instruction), and the data is not in its data cache. The processor is frozen, and the cache controller performs a burst transaction that reads a complete cache line (i.e. 4 elementary transfers corresponding to 4 words of 32 bits on the Pibus).
- uncached read: the processor tries to execute a non-cacheable data read (for example, a read of an addressable register of the PibusMultiTty component). The processor is frozen for as long as it takes the cache controller to perform a simple transaction (transferring a single 32-bit word) on the Pibus.
- write: the processor tries to execute a data write (sw or sb instruction). The processor is usually not frozen, thanks to the posted-write buffer, and the cache controller performs a simple transaction that writes a single 32-bit word on the bus.
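The four cases above can be summarised in a few lines of C. This is only an illustrative sketch, not code taken from the PibusMips32Xcache model: the type and function names are hypothetical, and line_base() assumes that a line refill starts at the 16-byte-aligned address containing the missing word.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 16u  /* cache line size given in the text: 4 words of 32 bits */

/* The four cases in which the cache controller uses the bus
   (hypothetical names, chosen for this sketch). */
typedef enum { INS_MISS, DATA_MISS, READ_UNCACHED, WRITE } bus_request_t;

/* Number of 32-bit words transferred on the Pibus for each case. */
unsigned transfer_words(bus_request_t req)
{
    switch (req) {
    case INS_MISS:
    case DATA_MISS:
        return LINE_BYTES / 4;   /* burst: one complete cache line */
    case READ_UNCACHED:
    case WRITE:
        return 1;                /* simple transaction: a single word */
    }
    return 0;
}

/* Whether the processor is frozen while the transaction is performed.
   For writes the text says "usually not": this sketch ignores the
   cases where the posted-write buffer cannot absorb the write. */
bool processor_frozen(bus_request_t req)
{
    return req != WRITE;
}

/* Address of the first byte of the cache line containing addr
   (assumption: refills are aligned on the line size). */
uint32_t line_base(uint32_t addr)
{
    return addr & ~(uint32_t)(LINE_BYTES - 1);
}
```

For example, under these assumptions a miss on the instruction at address 0x0040001C would fetch the four words located at 0x00400010, 0x00400014, 0x00400018 and 0x0040001C.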
B. Getting Started
The archive multi_tp2.tgz contains the files you will need. Create a working directory tp2, and unpack the archive into this directory. In addition to the files tp2_top.cpp, and tp2.desc, you will find a soft directory used to generate the software executed by the MIPS32 processor.
C. Modeling the hardware architecture
Edit the file tp2_top.cpp which contains an incomplete description of the system architecture. Modify the values of the arguments of the constructor of the proc component. These arguments define the characteristics of the caches and the buffer for posted writes. We will choose cache lines of 4 words of 32 bits, no associativity, and a total capacity of 1 Kbyte for each of the two caches. We will choose a depth of 8 words for the posted write buffer. You must consult the file pibus_mips32_xcache.h to obtain information on the prototype of the constructor and on the possible values for the arguments.
We remind you that the models of the hardware components are available in the directory:
/users/tool/soc/soclib-lip6/pibus
Question C1 What values must the parameters icache_words, icache_sets, icache_ways, dcache_words, dcache_sets, dcache_ways, wbuf_depth take in order to give the caches the characteristics requested above?
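As a hint for checking a candidate set of values, the relation between these parameters and the total capacity of a cache can be written as a small helper. This is a sketch under assumptions that should be verified in pibus_mips32_xcache.h: that the words parameter is the number of 32-bit words per line, the sets parameter the number of sets, and the ways parameter the associativity. The function name is hypothetical.

```c
#include <stddef.h>

/* Total capacity in bytes of a cache described by:
   words = number of 32-bit words per cache line (assumed meaning),
   sets  = number of sets (assumed meaning),
   ways  = number of lines per set, i.e. associativity (assumed meaning). */
size_t cache_capacity_bytes(size_t words, size_t sets, size_t ways)
{
    const size_t bytes_per_word = 4;   /* 32-bit words */
    return words * bytes_per_word * sets * ways;
}
```

With a configuration different from the one requested in this lab, for instance 8 words per line, 32 sets and 2 ways, the helper gives 8 x 4 x 32 x 2 = 2048 bytes.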
The addressable space defined by this hardware architecture contains eight addressable segments:
- The seg_reset segment contains the boot code. It will be assigned to the rom component, with the base address 0xBFC00000, and it has a size of 4 Kbytes.
- The seg_kcode segment contains the operating system code. It will be assigned to the ram component, with the base address 0x80000000, and it has a size of 16 Kbytes.
- The seg_kunc segment contains the non-cacheable data of the operating system. It will be assigned to the ram component, with the base address 0x81000000, and has a size of 4 Kbytes.
- The seg_kdata segment contains the cacheable data of the operating system. It will be assigned to the ram component, with the base address 0x82000000, and has a size of 64 Kbytes.
- The seg_code segment contains the application code. It will be assigned to the ram component, with the base address 0x00400000, and it has a size of 16 Kbytes.
- The seg_data segment contains the global data of the application. It will be assigned to the ram component, at the base address 0x01000000, and has a size of 16 Kbytes.
- The seg_stack segment contains the program execution stack. It will be assigned to the ram component at base address 0x02000000, and has a size of 16 Kbytes.
- The seg_tty segment corresponding to the addressable registers of the PIBUS_MULTI_TTY component. It has a base address of 0x90000000, and a size of 16 bytes.
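The segmentation listed above can be pictured as a table of (name, base address, size) entries that is searched for each address. The following C sketch is only an illustration of that idea: the segment_t type and the find_segment() function are hypothetical and do not reproduce the segment table API used in tp2_top.cpp. The base addresses and sizes are those given in the list.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    base;   /* base address */
    uint32_t    size;   /* size in bytes */
} segment_t;

static const segment_t segments[] = {
    { "seg_reset", 0xBFC00000u, 0x00001000u },  /*  4 Kbytes, rom */
    { "seg_kcode", 0x80000000u, 0x00004000u },  /* 16 Kbytes, ram */
    { "seg_kunc",  0x81000000u, 0x00001000u },  /*  4 Kbytes, ram */
    { "seg_kdata", 0x82000000u, 0x00010000u },  /* 64 Kbytes, ram */
    { "seg_code",  0x00400000u, 0x00004000u },  /* 16 Kbytes, ram */
    { "seg_data",  0x01000000u, 0x00004000u },  /* 16 Kbytes, ram */
    { "seg_stack", 0x02000000u, 0x00004000u },  /* 16 Kbytes, ram */
    { "seg_tty",   0x90000000u, 0x00000010u },  /* 16 bytes, tty  */
};

/* Returns the segment containing addr, or NULL if addr belongs to
   no segment of the table. */
const segment_t *find_segment(uint32_t addr)
{
    for (size_t i = 0; i < sizeof segments / sizeof segments[0]; i++) {
        /* Unsigned subtraction: if addr < base the result wraps to a
           value larger than any segment size, so the test fails. */
        if (addr - segments[i].base < segments[i].size)
            return &segments[i];
    }
    return NULL;
}
```

With this table, find_segment(0x9000000C) returns the seg_tty entry, while find_segment(0x90000010) returns NULL because the address lies just past the end of that 16-byte segment.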
Question C2 Why is the seg_reset segment not assigned to the same hardware component as the other 6 memory segments seg_kcode, seg_kdata, seg_kunc, seg_stack, seg_code, and seg_data?
Question C3 Explain why the segment seg_tty must be non-cacheable.
Question C4 Among the 8 segments used in this architecture, which are the protected segments (i.e. accessible only when the processor is in supervisor mode)? How is this protection achieved?
Complete the tp2_top.cpp file to define the segmentation of the addressable space. The values for the base addresses and segment lengths must be defined at the beginning of the file, and the segment table must be completed.
The ram and rom hardware components of this architecture must be preloaded. This preloading is a convenience offered by virtual prototyping: in a real machine, the executable binary code is loaded into memory by a dedicated program (called a loader), which reads the file containing the binary code from the disk and copies it into memory at the addresses specified at compile time. Since this loading is very slow (it requires, in principle, adding a disk controller to the architecture, and a disk is a slow device), a shortcut is used: the loading is carried out (in zero time) by the constructor of the PibusSimpleRam hardware component itself, before the simulation is launched. The executable binary code can be contained in one or more files.
To load the binary code, a loader must be used. It is a C++ object that must be passed as an argument to the constructor of the PibusSimpleRam component. The loader constructor itself takes as argument the pathname of the file containing the binary code. If the binary code is contained in several separate files, one pathname must be given per file.
In our case, we will compile the system code and the user code separately, and we will therefore have two files containing binary code, sys.bin and app.bin, which will be stored in the soft directory.
In the file tp2_top.cpp, complete the constructor of the loader object, which preloads into the two components rom and ram the binary code that will be executed by the MIPS32 processor.
When all these modifications are made, you can compile the file tp2_top.cpp to generate the simulation executable simul.x by launching the command:
soclib-cc -p tp2.desc -t systemcass -o simul.x
At this point you have a simulator of the hardware platform, but you still have to generate the binary code that has to be executed on that platform...
D. Operating System: GIET
The program executed in this lab is a software application that runs in user mode, under the control of a small operating system called GIET ("Gestionnaire d'Interruptions, Exceptions et Trappes", i.e. interrupt, exception and trap manager).
The GIET provides three main services:
- an exception handler to handle errors in user programs.
- an interrupt handler supporting a vectorised interrupt mechanism.
- a system call handler providing in particular access functions to peripherals.
The two main limitations of GIET, which differentiate it from a real operating system, are the lack of support for virtual memory, and the lack of support for dynamic task creation.
The code is therefore separated into two parts: the files containing the code that runs in supervisor mode are contained in the sys directory, while the files containing the code that runs in user mode are contained in the app directory.
- The file sys_handler.c is written in C. It is in the sys directory, and contains the code of the system call handler, responsible for calling the system function corresponding to the requested service.
- The exc_handler.c file is written in C. It is in the sys directory, and contains the code of the exception handler, responsible for reporting and handling errors detected in user programs.
- The file irq_handler.c is written in C. It is in the sys directory, and contains the interrupt handler code, as well as the interrupt handling routines (ISR).
- The file ctx_handler.c is written in C. It is in the sys directory, and contains the code for the context switch handler, which is used when a processor runs several tasks by time multiplexing.
- The drivers.c file contains the peripheral access functions. It gathers the drivers of all the machine's peripherals.
- The common.c file is written in C. It is in the sys directory, and contains the general functions of the operating system, such as the primitives of synchronisation between tasks.
- The giet.s file is written in MIPS32 assembler. It is in the sys directory, and contains the function that analyses the cause of the GIET call, and the context save/restore function.
- The stdio.c file is written in C. It is in the app directory because it contains all the system calls that can be used by a user program written in C. The name of this file comes from the fact that most system calls are used to access devices.
The GIET source code is accessible and stored in the following directory:
/users/enseig/alain/giet_2011/
A good way to get into the GIET code is to analyse the proctime() system call, which involves no device and takes no argument. The MIPS32 processor has several protected registers (i.e. accessible only in supervisor mode, using the mtc0 and mfc0 instructions). Among these registers, the COUNT register is initialised to 0 at boot time and incremented at each cycle. The system call proctime() returns the value of the COUNT register, which makes it possible, for example, to measure the execution time of a computation.
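Measuring a computation with proctime() amounts to reading COUNT before and after it and taking the difference. The helper below is a minimal sketch of that subtraction; its name is hypothetical, and it assumes that proctime() returns COUNT as an unsigned 32-bit value (check the prototype in stdio.h).

```c
#include <stdint.h>

/* Number of cycles elapsed between two readings of the COUNT register.
   Unsigned 32-bit arithmetic gives the right result even if COUNT
   wrapped around once between the two readings. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Intended use in a user program running on the GIET (not compiled
   here, since proctime() only exists on the target):
 *
 *     uint32_t t0 = proctime();
 *     compute();                      // code to be measured
 *     uint32_t t1 = proctime();
 *     uint32_t cycles = elapsed_cycles(t0, t1);
 */
```

Note that the value obtained this way also includes part of the cost of the proctime() system calls themselves.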
Question D1 What information must a user program provide to the operating system when it executes a system call? What is the technique used by the GIET to transmit this information?
Question D2 Open the file giet.s. What are the two arrays _cause_vector[16] and _syscall_vector[32]? What are they indexed by? In which files are these arrays initialised?
Question D3 By successively analyzing the contents of the files stdio.c, giet.s, sys_handler.c, drivers.c, give precisely the sequence of function calls triggered by the system call proctime().
Question D4 Give an estimate of the cost (in number of cycles) of this system call, between the branch to the function proctime(), and the return from the function.
E. Generating the binary code
In this part you will generate the binary code that will be executed by the MIPS32 processor, using the GCC cross-compiler. Go to the soft directory.
The user code and the system code must be compiled separately, to generate two separate binary files sys.bin and app.bin.
The soft directory contains six files, which you will find in all the tutorials.
- The main.c file contains the C code of the user application. In this tutorial, this application simply displays the famous "hello world" message in a loop on the TTY terminal screen. This code runs in user mode, and uses system calls to access the TTY device. After compilation, the main.o object file must be linked to the stdio.o object file, which contains the system call code, to generate the app.bin file containing all the user mode code.
- The reset.s file contains the assembler code that boots the machine and launches the application. This code is executed in supervisor mode, and the object file reset.o must therefore, after compilation, be linked to the object files containing the system code (giet.o, drivers.o, common.o, ctx_handler.o, irq_handler.o, sys_handler.o, exc_handler.o), in order to generate the binary file sys.bin containing all the executable code in supervisor mode.
- The config.h file is used to configure the GIET: it defines the values of the two parameters NB_PROCS (number of processors of the hardware platform) and NB_MAXTASKS (maximum number of tasks executed by a processor).
- The sys.ld file contains the directives used by the linker to generate the sys.bin file.
- The app.ld file contains the directives used by the linker when generating the app.bin file.
- The seg.ld file defines the base addresses of the different segments known to the software. This information concerns both the system code and the user code, and the seg.ld file is included in both the sys.ld and app.ld files.
We start with the generation of the system code: file sys.bin.
The reset.s file contains the boot code, which is always executed when the system starts (i.e. when the processor unconditionally branches to the address 0xBFC00000, after activation of the RESET signal). In general, the main role of the boot code is to initialise the peripherals and to load the system code into memory. In our case, the boot code is very simple, since we consider that the system code is already loaded in memory when the machine is started. On the other hand, since the GIET does not provide support for dynamic launching of applications, it is the boot code that must launch the user program.
To access the protected registers of the processor, the boot code is written in assembler. It is obviously very dependent on the architecture of the hardware platform, since it depends on the number of processors and the number and type of peripherals. It will therefore vary from one lab to another.
Question E1 Give three reasons why the boot code must necessarily run in supervisor mode.
Complete the reset.s file, to initialise the stack pointer (register $29).
The GIET does not allow the user to interactively launch a new application (through a shell), when the machine has already started. To get around this difficulty, the boot code is responsible for launching the user application.
Question E2' What is the (non-standard) convention that allows the GIET boot code to retrieve the address of the first instruction of the main() function?
Before starting the compilation, first set the environment variables GIET_SYS_PATH, GIET_APP_PATH, AS, CC, LD and DU:
> export GIET_SYS_PATH=/users/enseig/alain/giet_2011/sys
> export GIET_APP_PATH=/users/enseig/alain/giet_2011/app
> export AS=/opt/gcc-cross-mipsel/8.2.0/bin/mipsel-unknown-elf-as
> export CC=/opt/gcc-cross-mipsel/8.2.0/bin/mipsel-unknown-elf-gcc
> export LD=/opt/gcc-cross-mipsel/8.2.0/bin/mipsel-unknown-elf-ld
> export DU=/opt/gcc-cross-mipsel/8.2.0/bin/mipsel-unknown-elf-objdump
Assemble the two assembler files reset.s and giet.s in turn to generate the corresponding object files in the soft directory.
> $AS -g -mips32 -o reset.o reset.s
> $AS -g -mips32 -o giet.o $GIET_SYS_PATH/giet.s
Compile in turn the 6 files drivers.c, common.c, ctx_handler.c, irq_handler.c, sys_handler.c and exc_handler.c to generate the object files in the soft directory.
> $CC -Wall -mno-gpopt -ffreestanding -mips32 -I$GIET_SYS_PATH -I. -c -o drivers.o $GIET_SYS_PATH/drivers.c
> $CC -Wall -mno-gpopt -ffreestanding -mips32 -I$GIET_SYS_PATH -I. -c -o common.o $GIET_SYS_PATH/common.c
> $CC -Wall -mno-gpopt -ffreestanding -mips32 -I$GIET_SYS_PATH -I. -c -o ctx_handler.o $GIET_SYS_PATH/ctx_handler.c
> $CC -Wall -mno-gpopt -ffreestanding -mips32 -I$GIET_SYS_PATH -I. -c -o irq_handler.o $GIET_SYS_PATH/irq_handler.c
> $CC -Wall -mno-gpopt -ffreestanding -mips32 -I$GIET_SYS_PATH -I. -c -o sys_handler.o $GIET_SYS_PATH/sys_handler.c
> $CC -Wall -mno-gpopt -ffreestanding -mips32 -I$GIET_SYS_PATH -I. -c -o exc_handler.o $GIET_SYS_PATH/exc_handler.c
In the linking phase that follows compilation, the main role of the sys.ld directive file is to indicate in which memory segments the various objects produced by the compilation should be grouped. The base addresses of these segments are defined in the seg.ld file. The base addresses of the segments corresponding to the peripherals are also specified, since these addresses are used by the OS to access these peripherals.
Complete the seg.ld file to define the base addresses of the 8 segments known to the software. Defining the base addresses twice is a common source of errors: the base addresses used by the software are defined in the seg.ld file, while the base addresses used by the hardware are defined in the tp2_top.cpp file.
Question E3 What happens if the addresses defined in these two files are not equal to each other? What happens if the address constructed by the processor does not correspond to any segment defined in the architecture?
Question E4 By analysing the contents of the file sys.ld, determine which software objects are placed in each of the 2 segments that contain executable system code: seg_reset, seg_kcode.
Run the linker to generate the sys.bin file containing the executable binary code, following the guidelines contained in the sys.ld file.
> $LD -o sys.bin -T sys.ld reset.o giet.o drivers.o common.o ctx_handler.o irq_handler.o sys_handler.o exc_handler.o
Disassemble this binary code to obtain a readable version in the file sys.bin.txt.
> $DU -D sys.bin > sys.bin.txt
Question E5 By analysing the contents of the file sys.bin.txt, determine the effective length of the two segments seg_reset and seg_kcode.
We now turn to the generation of the user code: file app.bin.
Question E6' Complete the file main.c. This program must perform the same operations as the wired processor used in TP1: It executes an infinite loop, in which it displays the string hello world\n on the TTY terminal, and then freezes while waiting for a character to be entered on the keyboard, before proceeding to the next iteration of the loop. We will use the two system calls defined in the stdio.c file: tty_puts() and tty_getc(). The prototypes defining the arguments of these two functions can be found in the file stdio.h.
The system call tty_getc(), which runs in user mode, calls (through a syscall statement), the system function _tty_read(), which runs in supervisor mode.
Question E7 By analysing the code of the system call tty_getc() (which you will find in the file stdio.c) and the code of the system function _tty_read() (which you will find in the file drivers.c), explain the mechanism that makes this system call blocking (i.e. it only returns control to the calling program when at least one character has been entered on the keyboard). In other words, which of the two functions contains the waiting loop? Explain why.
Compile in turn the 2 files stdio.c and main.c to generate the object files in the soft directory.
> $CC -Wall -mno-gpopt -ffreestanding -mips32 -I$GIET_APP_PATH -I. -c -o stdio.o $GIET_APP_PATH/stdio.c
> $CC -Wall -mno-gpopt -ffreestanding -mips32 -I$GIET_APP_PATH -I. -c -o main.o main.c
Run the linker to generate the app.bin file containing the executable binary code, following the guidelines contained in the app.ld file.
> $LD -o app.bin -T app.ld stdio.o main.o
Disassemble this binary code to obtain a readable version in the file app.bin.txt.
> $DU -D app.bin > app.bin.txt
Question E8 By analysing the contents of the app.bin.txt file, determine the effective length of the seg_code segment.
Question E9 Write a makefile that automates all the steps in generating the two files sys.bin and app.bin.
F. Running the binary code on the virtual prototype
The boot code used here (contained in the reset.s file) is very simple. In a real machine the system binary code is stored on the disk, so the boot code must load the operating system code into memory, using a disk controller, which can take several million cycles. This is why this boot code is called the boot-loader. As we are in virtual prototyping (i.e. simulation), the loader described in section C above is used, and all the binary code contained in the sys.bin and app.bin files is pre-loaded into the two memory components rom and ram before starting the simulation. This shortcut greatly simplifies the boot code...
Start the execution of the binary code on the virtual prototype defined by the simul.x file, by going to the tp2 directory and launching the command:
$ ./simul.x
To understand what is going on, re-run the simulation, tracing the bus signal values and the internal states of the hardware components, and saving the trace to a file named tmp. Use the -DEBUG argument to activate the trace, and the -NCYCLES argument to limit the number of simulated cycles (and thus the size of the generated file).
$ ./simul.x -DEBUG 0 -NCYCLES 5000 > tmp
Question F1 By analysing the execution trace, what is the first transaction on the bus? At what cycle does the processor execute the first instruction of the boot code? What does the second transaction on the bus correspond to?
Question F2 At which cycle does the first instruction of the function main() execute?
Question F3 At what cycle does the first transaction corresponding to the reading of the string hello world begin?
Question F4 At which cycle does the first writing of a character to the TTY terminal occur?
G. Report
The answers to the above questions must be written in a text editor and this report must be handed in at the beginning of the next practical session. Similarly, the simulator will be checked (in pairs) at the beginning of the next week's practical session.