
Course "Architecture of Multi-Processor Systems"

TP6: Vectored interrupts / communication with peripherals

(franck.wajsburt@…)

A. Objectives

The goal of this lab is to analyze the mechanisms of interrupt-based communication between the peripherals and the operating system. In the first part, we use a bi-processor architecture to illustrate the mechanism of vectored interrupts, with a programmable timer capable of generating periodic interrupts. In the second part, we analyse in detail the mechanism that allows a program to read characters from a TTY terminal.

A device is said to be "memory mapped" when it has registers that can be addressed by the software (by means of "lw" or "sw" type read or write instructions).

  • The registers accessible in writing allow the operating system to configure the peripherals, or to send them commands.
  • Readable registers allow the operating system to obtain information about the status of the device.
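As a rough illustration of memory-mapped access, the sketch below models the registers of a device as a C struct reached through a volatile pointer. The layout and the names sketch_dev_t, command and status are hypothetical and do not correspond to any component of this lab. On the real platform, the pointer would hold the base address of the device segment rather than the address of a C variable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
typedef struct {
    uint32_t command;  /* written by the OS to send a command        */
    uint32_t status;   /* read by the OS to observe the device state */
} sketch_dev_t;

/* Write access: on MIPS, a word store like this one is typically
   compiled to a "sw" instruction. */
void sketch_dev_write_command(volatile sketch_dev_t *dev, uint32_t cmd)
{
    assert(dev != 0);
    dev->command = cmd;
}

/* Read access: on MIPS, a word load like this one is typically
   compiled to a "lw" instruction. */
uint32_t sketch_dev_read_status(volatile sketch_dev_t *dev)
{
    assert(dev != 0);
    return dev->status;
}
```

The volatile qualifier tells the compiler that each access must really be performed, since the value of a device register can change without any action from the program.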

To communicate with the operating system, devices use interrupts: an interrupt, or IRQ (Interrupt ReQuest), is a Boolean signal, active in the high state, which allows a device to "steal" a few cycles from a processor in order to execute an ISR (Interrupt Service Routine). These ISRs usually write to memory buffers owned by the operating system. These communication buffers are specific to each type of device.

The hardware architecture is identical to the one defined for TP5, but only two processors will be used.

B. Hardware architecture

The archive multi_tp6.tgz contains the files you will need. Create a working directory tp6, and copy into it the files tp5_top.cpp and tp5.desc that you have already used. Unpack the archive and copy into tp6 the soft directory containing the two software applications used in this tutorial: files main_prime.c and main_pgcd.c.

C. Peripheral components

The hardware component PibusIcu is a vectored interrupt hub. This kind of component is often called a PIC (Programmable Interrupt Controller) in PCs. The interrupt hub used here can concentrate up to 32 input interrupt lines IRQ_IN[i] (coming from different devices) onto a single output interrupt line IRQ_OUT (connected to a processor). It contains a large wired "OR": a single IRQ_IN[i] input that is active (high) and not masked is enough for IRQ_OUT to go high.

The ICU component used contains a priority encoder that implements a fixed priority mechanism: if several IRQ_IN[i] incoming interrupt lines are active simultaneously, the addressable register IT_VECTOR contains the index of the active interrupt that has the smallest index. Finally, this component allows the software to mask each of the 32 incoming interrupt lines individually, by writing a 32-bit word to the IT_MASK register of the PibusIcu component.
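The behaviour described in the two paragraphs above can be modelled in a few lines of C. This is only a sketch under stated assumptions, not the code of pibus_icu.h: it assumes that a bit set to 1 in the mask lets the interrupt through, that the priority encoder only considers unmasked inputs, and that the value 32 is returned when nothing is pending. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NO_IRQ 32u  /* assumption: value used when no interrupt is pending */

/* Enable one input line in a mask word (assumed convention: 1 = not masked). */
uint32_t sketch_icu_unmask(uint32_t it_mask, unsigned irq)
{
    assert(irq < 32);
    return it_mask | (UINT32_C(1) << irq);
}

/* The wired "OR": IRQ_OUT is high as soon as one input is active and not masked. */
int sketch_icu_irq_out(uint32_t irq_in, uint32_t it_mask)
{
    return (irq_in & it_mask) != 0;
}

/* Fixed-priority encoder: smallest index among the active, unmasked inputs. */
uint32_t sketch_icu_it_vector(uint32_t irq_in, uint32_t it_mask)
{
    uint32_t pending = irq_in & it_mask;
    for (uint32_t i = 0; i < 32; i++) {
        if (pending & (UINT32_C(1) << i)) {
            return i;
        }
    }
    return SKETCH_NO_IRQ;
}
```

For example, with lines 2 and 5 unmasked and lines 0, 2 and 5 active, the model raises IRQ_OUT and reports index 2. Line 0 is ignored because it is masked.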

The PibusIcu component is a multi-channel device: when there are several P[k] processors, the interrupt controller has as many IRQ_OUT[k] outputs as there are P[k] processors. Each channel [k] corresponds to an IRQ_OUT[k] output, and behaves as an independent interrupt hub. The only thing shared by the different channels is the set of 32 incoming IRQ_IN[i] signals. If there are multiple processors, the PibusIcu component contains a specific mask register for each channel, which allows the operating system to decide, for each IRQ_IN[i] interrupt, to which processor it will be transmitted.
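The per-channel masks can be pictured as an array with one mask word per processor. The sketch below is a simplified software model with hypothetical names (sketch_multi_icu_t, sketch_icu_route, sketch_icu_channel_out). It assumes two channels and a mask convention where a 1 bit lets the interrupt through; the actual register map is the one documented in pibus_icu.h.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NPROC 2  /* two processors, as in this lab */

typedef struct {
    uint32_t mask[SKETCH_NPROC];  /* one mask register per channel */
} sketch_multi_icu_t;

/* Route input line irq to processor proc: set its bit in the mask of that
   channel and clear it in the masks of all other channels. */
void sketch_icu_route(sketch_multi_icu_t *icu, unsigned irq, unsigned proc)
{
    assert(icu != 0 && irq < 32 && proc < SKETCH_NPROC);
    for (unsigned k = 0; k < SKETCH_NPROC; k++) {
        if (k == proc) {
            icu->mask[k] |= (UINT32_C(1) << irq);
        } else {
            icu->mask[k] &= ~(UINT32_C(1) << irq);
        }
    }
}

/* Every channel sees the same 32 inputs and filters them with its own mask. */
int sketch_icu_channel_out(const sketch_multi_icu_t *icu, uint32_t irq_in, unsigned proc)
{
    assert(icu != 0 && proc < SKETCH_NPROC);
    return (irq_in & icu->mask[proc]) != 0;
}
```

Clearing the bit in the other channels is a choice made for this sketch, so that each interrupt reaches exactly one processor. Nothing in the model prevents the same line from being enabled in several masks.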

The hardware component PibusMultiTimer is also a multi-channel device. It contains several programmable timers. Each timer generates periodic interrupts, with a period that is programmable by software, and has its own interrupt line. The code executed when a timer raises an interrupt is defined by the interrupt service routine _isr_timer.

The hardware component PibusMultiTty has already been used in previous tutorials. It controls several independent TTY terminals. Each terminal has an interrupt line which allows it to signal that a character has been entered on the keyboard. This interrupt can be used by the system when it is not desired to use a polling mechanism to acquire characters from the keyboard. The code that is executed when an interrupt is generated by the TTY is defined by the interrupt handling routine _isr_tty_get.

Read the functional specification of the components PibusMultiTty, PibusMultiTimer and PibusIcu found in the header of the files pibus_multi_tty.h, pibus_multi_timer.h and pibus_icu.h.

Question C1: Why is the PibusMultiTimer component a target, not a master on the bus? What is the meaning of the ntimer argument of the constructor? What are the addressable registers of this component, what are their addresses, and what is the functionality of each of them?

Question C2: Why is the PibusIcu component a target on the bus? What is the meaning of the constructor argument nirq? What is the meaning of the constructor argument nproc? In a multi-processor architecture, how can the software route the interrupt line connected to the IRQ_IN[i] input of the ICU component to the processor connected to the IRQ_OUT[j] output of the ICU component? For each IRQ_OUT[i] output port, the ICU contains several addressable registers. What are these registers? What are their addresses? What is the functionality of each of them?

Question C3: Why must the base address of the segment associated with the PibusIcu component be aligned to a multiple of 32 * 8 bytes? What would be the hardware cost of relaxing this constraint?

The generic architecture defined in the tp5_top.cpp file can instantiate from 1 to 8 processors. When there are several processors, each processor has its own TTY terminal and timer. For an architecture containing 2 processors, there will be 2 IRQs from the TTY controller, and 2 IRQs from the timer controller.

Question C4: By analysing the contents of the file tp5_top.cpp, specify how these 4 interrupt lines are connected to the IRQ_IN[i] ports of the ICU controller.

We will use 4-way associative caches with a capacity of 4 Kbytes and cache lines of 32 bytes (8 words). Modify in the file tp5_top.cpp the default values of the parameters of the caches, and generate the executable of simulation simul.x.
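The cache parameters above determine the number of sets and the way an address is split. The sketch below works through that arithmetic. It is independent of the way the parameters are actually named or passed in tp5_top.cpp, which is not shown here, and the names sketch_cache_geom_t and sketch_cache_geometry are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t sets;         /* number of sets                            */
    uint32_t offset_bits;  /* address bits selecting a byte in a line   */
    uint32_t index_bits;   /* address bits selecting a set              */
} sketch_cache_geom_t;

/* Base-2 logarithm of a power of two. */
static uint32_t sketch_log2(uint32_t x)
{
    uint32_t n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* sets = capacity / (ways * line size) */
sketch_cache_geom_t sketch_cache_geometry(uint32_t capacity_bytes,
                                          uint32_t ways,
                                          uint32_t line_bytes)
{
    assert(ways > 0 && line_bytes > 0);
    assert(capacity_bytes % (ways * line_bytes) == 0);
    sketch_cache_geom_t g;
    g.sets = capacity_bytes / (ways * line_bytes);
    g.offset_bits = sketch_log2(line_bytes);
    g.index_bits = sketch_log2(g.sets);
    return g;
}
```

With a capacity of 4096 bytes, 4 ways and 32-byte lines, this gives 4096 / (4 * 32) = 32 sets, hence 5 offset bits and 5 index bits.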

D. Launching tasks

In the previous lab, all processors were running the same program, on different data. In this lab, the two processors will run different programs:

  • The file main_prime.c contains a first program that calculates the first 1000 primes and displays them on the TTY terminal. This program will be executed by processor 0.
  • The file main_pgcd.c contains an interactive program that calculates the GCD (Greatest Common Divisor) of two integers X and Y entered from the keyboard (as decimal strings), and displays the result on the TTY screen. This program will be executed by processor 1.
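For reference, the two computations named above can be sketched as follows. This is a generic illustration (Euclid's algorithm and trial division) with hypothetical function names. The algorithms actually used in main_prime.c and main_pgcd.c may differ.

```c
#include <assert.h>

/* Euclid's algorithm. gcd(0, 0) is not defined in this sketch. */
unsigned sketch_gcd(unsigned x, unsigned y)
{
    assert(x != 0 || y != 0);
    while (y != 0) {
        unsigned r = x % y;
        x = y;
        y = r;
    }
    return x;
}

/* Primality test by trial division. The loop condition is written
   d <= n / d rather than d * d <= n to avoid an overflow for large n. */
int sketch_is_prime(unsigned n)
{
    if (n < 2) {
        return 0;
    }
    for (unsigned d = 2; d <= n / d; d++) {
        if (n % d == 0) {
            return 0;
        }
    }
    return 1;
}
```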

On a machine with a real operating system such as LINUX or WINDOWS, new applications can be launched by the OS without rebooting the machine: the user only has to issue a command to the system via a shell. Since the GIET does not support dynamic task creation, the boot code must take care of starting the programs on both processors: both processors run the same boot code (stored at address 0xBFC00000), but they branch to different addresses depending on the processor number. The convention imposed by the GIET is as follows: the seg_data segment must start with a jump table, indexed by the processor number, containing the entry points of the programs to be executed by the different processors. Entry main[i] of this table contains the entry point (i.e. the address of the first instruction) of the program that will be executed by processor (i).
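The idea of a jump table indexed by the processor number can be illustrated in C with an array of function pointers. This is a sketch only: the real boot code is written in assembly (reset.s), and the sketch does not show how the table is placed at the beginning of seg_data. The names sketch_main_table and sketch_boot_dispatch, and the two stand-in programs, are hypothetical.

```c
#include <assert.h>

typedef int (*sketch_entry_t)(void);

/* Stand-ins for the two programs; the return values only identify them. */
int sketch_main_prime(void) { return 100; }  /* program of processor 0 */
int sketch_main_pgcd(void)  { return 200; }  /* program of processor 1 */

/* Entry i holds the entry point of the program run by processor i. */
sketch_entry_t sketch_main_table[] = { sketch_main_prime, sketch_main_pgcd };

/* In spirit, what the boot code does: select the entry that matches
   the processor number and jump to it. */
int sketch_boot_dispatch(unsigned proc_id)
{
    assert(proc_id < sizeof sketch_main_table / sizeof sketch_main_table[0]);
    return sketch_main_table[proc_id]();
}
```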

Question D1: Go to the soft directory, and complete the reset.s file so as to initialize the stack pointer, the SR register, and the EPC register of processor (1). You should follow the example of what is done for processor (0).

Generate the two files app.bin and sys.bin using the Makefile provided.

Question D2: Check in the file app.bin.txt that the segment seg_data starts with a table containing the addresses of the two functions main_prime() and main_pgcd(). What are these two addresses?

Question D3: How do we force GCC to build this jump table at the beginning of the seg_data segment?

Go to the tp6 directory, and run these two programs on a hardware architecture containing two processors using the command:

$ ./simul.x -NPROCS 2

Question D4: How do you explain the fact that the GCD calculation program gets stuck on the input of the operand X?

E. Enabling the Timer

We now want to activate the interrupts coming from the TIMER.

Remember that each interrupt line has an associated interrupt routine (ISR or Interrupt Service Routine) which is specific to the device that generated the interrupt, and which is executed by the processor when the interrupts are not masked. Enabling the TIMER interrupts is therefore equivalent to launching a "background task" on each of the 2 processors of the architecture, consisting in periodically executing the ISR "_isr_timer".

Question E1: Recall how a processor branches to the relevant ISR when it receives an interrupt request. Analyse the code contained in the files giet.s and irq_handler.c, and describe the sequence of function calls between the branch to address 0x80000180 (entry point into the GIET) and the branch to the _isr_timer routine.

Question E2: What does the _isr_timer interrupt routine do?

Recall that the interrupt vector is an array in memory containing the addresses of the interrupt routines associated with the different interrupt lines (IRQ) used in the architecture. This array is indexed by the IRQ number. To activate the interrupts, three additional initialisations must be made for each processor (i) in the boot code:

  • Processor (i) must initialise all entries in the 'interrupt vector' that are intended for it.
  • Processor (i) must initialise the MASK[i] register of the ICU component, which indicates which of the 32 incoming interrupts IRQ_IN are to be "routed" to processor (i).
  • Processor (i) must initialise the TIMER device, by writing to the PERIOD[i] register, then to the RUNNING[i] register.
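The interrupt vector described above can be sketched in C as an array of function pointers indexed by the IRQ number. The names used here (sketch_interrupt_vector, sketch_vector_set, sketch_irq_dispatch, sketch_isr_timer) are hypothetical; the real array and ISR names are those defined in irq_handler.c. The IRQ index used in the usage note is arbitrary and is not the answer to question C4.

```c
#include <assert.h>

#define SKETCH_NIRQ 32

typedef void (*sketch_isr_t)(void);

/* One entry per IRQ line; a null entry means that no ISR is registered. */
sketch_isr_t sketch_interrupt_vector[SKETCH_NIRQ];

/* Counts the calls of the stand-in ISR, so that the effect can be observed. */
unsigned sketch_timer_ticks;

/* Stand-in ISR: the real _isr_timer does something else. */
void sketch_isr_timer(void)
{
    sketch_timer_ticks++;
}

/* Boot-time initialisation of one entry of the interrupt vector. */
void sketch_vector_set(unsigned irq, sketch_isr_t isr)
{
    assert(irq < SKETCH_NIRQ && isr != 0);
    sketch_interrupt_vector[irq] = isr;
}

/* Interrupt handler, in spirit: given the index of the active IRQ,
   call the matching ISR. Returns 1 if an ISR was called, 0 otherwise. */
int sketch_irq_dispatch(unsigned irq_index)
{
    assert(irq_index < SKETCH_NIRQ);
    if (sketch_interrupt_vector[irq_index] == 0) {
        return 0;
    }
    sketch_interrupt_vector[irq_index]();
    return 1;
}
```

After sketch_vector_set(5, sketch_isr_timer), each call to sketch_irq_dispatch(5) runs the stand-in ISR once, while sketch_irq_dispatch(6) does nothing because no ISR is registered for that line.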

Question E3: Complete the reset.s file, to initialise the interrupt vector with the 2 entries corresponding to the 2 IRQs associated with the TIMER component, using the results of question C4. The name of the array representing the interrupt vector and the name of the ISR associated with the TIMER are defined in the file irq_handler.c.

Question E4: Complete the reset.s file to configure the TIMER component. A period of 50000 cycles for TIMER[0] and 100000 cycles for TIMER[1] will be chosen. The register map of the TIMER component is defined in the header of the file pibus_multi_timer.h.

Question E5: Complete the reset.s file to initialise the ICU component MASK[0] register to allow transmission of the IRQ from TIMER[0] to processor 0, and the ICU MASK[1] register to allow transmission of the IRQ from TIMER[1] to processor 1. The register map of the ICU component is defined in the header of the pibus_icu.h file.

Re-compile the system code.

The -TRACE option generates an execution trace that allows you to analyse the behaviour of the hardware cycle by cycle. The -NCYCLES option defines the maximum number of cycles executed. To store in the file trace an execution trace covering the cycles from from_cycle to to_cycle, you must therefore run the command:

$ ./simul.x -NPROCS 2 -TRACE from_cycle -NCYCLES to_cycle > trace

In the following questions we want to estimate the number of cycles needed to configure the devices, as well as the number of cycles needed to process an interrupt. Choose the correct values for the parameters from_cycle and to_cycle, and start the execution by activating the trace.

Question E6: At what cycle does processor 0 write the first value to the interrupt vector? To answer this question, monitor the value of the sel_ram signal. At what cycle is the MASK[0] register of the ICU set? To answer this question, monitor the value of the sel_icu signal. At what cycle is the TIMER[0] set? To answer this question, the values of the sel_tim signal must be monitored.

Question E7: At what cycle does processor 0 receive the first TIMER[0] interrupt? At what cycle is the interrupt acknowledged by the ISR? To answer these questions, the timer_irq[0] signal must be monitored.

Question E8: Describe in detail the processing of a Timer interrupt by processor 0, between the activation of the signal proc_irq[0] and the return to the execution of the interrupted program. You must analyse the execution trace, using the file sys.bin.txt to determine the address of the GIET, the address of the interrupt handler, and the address of the _isr_timer routine. By monitoring the address of the instruction requested by the processor, follow in the trace the sequence of events: the branch to the GIET (the Interrupt, Exceptions and Traps handler), the branch to the interrupt handler, the branch to the ISR, and the resumption of the interrupted program. You need to accurately record the cycle at which each of these events occurs, calculate the number of cycles spent in each processing step, and deduce the total number of cycles that were "stolen" from the interrupted application.

To minimise cache miss waits, it is advisable to wait until the second interrupt to do this analysis: with an associative cache, one can hope that the interrupt handler code, which was loaded into the cache on the first interrupt, is still in the instruction cache on the second interrupt.

Question E9: How do you explain that the messages displayed by the ISR associated with TIMER[1] continue to be displayed on the processor(1) terminal, while the program running on processor(1) is still stuck in the tty_getw_irq() system call?

F. Enabling TTY interrupts

In this part, we will activate the interrupts coming from the TTY, to unblock the interactive program that calculates the GCD.

The general principle of asynchronous communication between the TTY device and the operating system is recalled below.

For each TTY[i] terminal, there is a memory buffer that can store a character entered from the keyboard, even if the program that has to use this character is not ready to consume it. This _tty_get_buf[i] buffer is located in the protected memory area belonging to the operating system (addresses greater than 0x80000000). This buffer is associated with a _tty_get_full[i] status variable, used for synchronisation, which indicates whether a character is available. This variable has type int in C, but it is managed as a set/reset flip-flop: it is set by the producer, and reset by the consumer.

  • The producer is the TTY[i] device, which can, at any time, generate an interrupt which will itself launch the execution of the ISR code _isr_tty_get to write a character in the buffer _tty_get_buf[i], and set the state variable _tty_get_full[i]. Everything happens as if the (hardware) device had itself written these two values into memory.
  • The consumer is the operating system (software), which can, at any time, execute the code of the system function _tty_read_irq to test if a character is available, and if so, read the character from the _tty_get_buf[i] buffer, copy it to another memory buffer belonging to the user program, and reset the _tty_get_full[i] status variable to 0.
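The producer/consumer protocol described above can be sketched as two C functions sharing a one-character buffer and a full flag per terminal. This is a simplified, single-threaded model with hypothetical names (sketch_tty_produce, sketch_tty_consume): here both sides are ordinary function calls, whereas in the real system the producer runs in interrupt context. The policy applied when the buffer is already full (the new character is dropped) is an arbitrary choice made for this sketch; the behaviour of the real _isr_tty_get is the one found in irq_handler.c.

```c
#include <assert.h>

#define SKETCH_NTTY 2  /* one terminal per processor, as in this lab */

char sketch_tty_get_buf[SKETCH_NTTY];   /* one-character buffer per terminal */
int  sketch_tty_get_full[SKETCH_NTTY];  /* 1 when a character is available   */

/* Producer side (stand-in for the ISR). Returns 1 if the character was
   stored, 0 if the buffer was already full (sketch-specific policy). */
int sketch_tty_produce(unsigned tty, char c)
{
    assert(tty < SKETCH_NTTY);
    if (sketch_tty_get_full[tty]) {
        return 0;
    }
    sketch_tty_get_buf[tty] = c;
    sketch_tty_get_full[tty] = 1;  /* set by the producer */
    return 1;
}

/* Consumer side (stand-in for the system function). Non-blocking: returns 1
   and copies the character to *dst if one is available, 0 otherwise. */
int sketch_tty_consume(unsigned tty, char *dst)
{
    assert(tty < SKETCH_NTTY && dst != 0);
    if (!sketch_tty_get_full[tty]) {
        return 0;
    }
    *dst = sketch_tty_get_buf[tty];
    sketch_tty_get_full[tty] = 0;  /* reset by the consumer */
    return 1;
}
```

In both functions the flag is updated last, so that the other side never sees a set flag with a stale character, or a reset flag before the character has been copied.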

In the proposed architecture, each processor (i) has its own TTY[i] terminal, and for each TTY[i] terminal, the hardware component pibus_multi_tty has a separate interrupt line p_irq_get[i] that signals the availability of a character entered on the keyboard of the TTY[i] terminal.

Question F1: The interrupt communication mechanism is asynchronous, since the ISR writes the typed character to an intermediate buffer belonging to the operating system, rather than to the memory buffer of the user program for which the character is intended. Describe a scenario that justifies this asynchronous behaviour.

Question F2: Summarize the sequence of steps by which a user program retrieves the value of a number entered from the keyboard as a decimal string. Which functions are called?

Question F3: Analyze the ISR code _isr_tty_get in the file irq_handler.c. What happens if the _tty_get_buf buffer is full when the ISR executes?

Question F4: Analyze the code of the system function _tty_read_irq() called by the system call tty_getw_irq() in the file drivers.c. What does this function do? What are its arguments? What happens if the buffer is empty? How is the number of the relevant TTY terminal calculated?

Question F5: Analyze the code of the system call tty_getw_irq(). What are the special characters that are parsed and processed by this system call? What happens if the number of decimal characters entered on the keyboard defines a number too large to be encoded on 32 bits?

Question F6: The arrays _tty_get_buf[] and _tty_get_full[] are declared in the file drivers.c. Why must these variables be declared with the volatile attribute? In which memory segment are these variables stored? Why must this segment be declared non-cacheable?

Question F7: Complete the reset.s file to initialise the two interrupt vector entries corresponding to the two IRQs from the two terminals TTY[0] and TTY[1] with the address _isr_tty_get.

Question F8: Complete the reset.s file to modify the MASK[0] and MASK[1] registers of the ICU so that the IRQ from TTY[0] is routed to processor 0, and the IRQ from TTY[1] is routed to processor 1. Take the opportunity to disable the interrupts from TIMER[1], so as to avoid interfering with the interactive GCD calculation program. This can be done either by changing the configuration of the MASK[1] register of the ICU component or by changing the configuration of the RUNNING[1] register of the TIMER component.

Regenerate the sys.bin file in the soft directory, then run the simulation, and check that the GCD calculation program runs correctly.

G. Reporting

The answers to the above questions must be written in a text editor, and this report must be handed in at the beginning of the next lab session. Similarly, the simulator will be checked (in pairs) at the beginning of next week's lab session.

Last modified on Jan 4, 2023, 5:06:26 PM