TP1: Virtual Prototyping & PIBUS Protocol
(pirouz.bazargan-sabet@…)
A. Objectives
The aim of this first lab is twofold:
- We introduce hardware modelling techniques in the SystemC language (virtual prototyping).
- We analyse in detail the PIBUS protocol, on a very small system, not containing a programmable processor.
The SoCLib prototyping platform is used to model and simulate integrated on-chip hardware architectures "to the cycle". The SoCLib platform contains simulation models of hardware components (such as processor cores, bus controllers, embedded memory controllers, or peripheral controllers). These models are written in the SystemC language and can be interconnected to build a "virtual prototype" of the hardware architecture.
To understand how the PIBUS works, you will analyse the structure of the automata that control each of the three types of hardware components connected to the bus: an initiator, a target, and the bus arbiter.
In addition to the bus arbiter, the architecture studied in this lab thus includes one master and two targets:
- PibusSegBcu: Bus arbiter
- PibusSimpleMaster (Master): wired automaton
- PibusSimpleRam (Target): memory controller
- PibusMultiTty (Target): screen/keyboard terminal controller
The PibusSimpleMaster is only intended to facilitate the understanding of the PIBUS protocol. It is a wired automaton that can do only one thing: it executes an infinite loop in which, at each iteration, it successively performs the following 4 actions:
- Reading the string Hello World, stored in the memory. To do this, it performs a first transaction on the bus consisting of reading a burst of 4 words of 32 bits (which represents 4 * 4 = 16 characters) from the memory.
- Displaying this string, character by character, on the TTY terminal. To do this, it must perform 16 write "transactions" (one transaction per character to be displayed), writing each character to the address corresponding to the DISPLAY register of the TTY terminal.
- Waiting for a character to be entered on the terminal keyboard. To do this, the master loops on a read "transaction" of the TTY terminal's STATUS register, until it obtains a non-zero value meaning that a character is available in the KEYBUF register of the TTY terminal.
- The last step is to perform a read "transaction" to read the typed character from the KEYBUF register of the TTY terminal.
The TTY hardware component has a total of 4 addressable 32-bit registers, and therefore occupies 16 bytes in addressable space. If SEG_TTY_BASE is the base address of the segment associated with the TTY component, the addresses of these registers are as follows:
Register | Address |
DISPLAY | SEG_TTY_BASE |
STATUS | SEG_TTY_BASE+4 |
KEYBUF | SEG_TTY_BASE+8 |
CONFIG | SEG_TTY_BASE+12 |
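To make the master's loop and the register map concrete, here is a minimal C++ sketch that lists the bus transactions of one loop iteration. It is only a behavioural illustration, not the SoCLib model: the `Transaction` type, the function `one_iteration` and the parameter `status_polls` (number of STATUS reads before a key is available) are hypothetical names introduced for this example. The register offsets are those of the table above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Register offsets, taken from the TTY register map above.
constexpr uint32_t TTY_DISPLAY = 0;
constexpr uint32_t TTY_STATUS  = 4;
constexpr uint32_t TTY_KEYBUF  = 8;
constexpr uint32_t TTY_CONFIG  = 12;  // not used by the master

// Hypothetical description of one bus transaction.
struct Transaction {
    bool     read;     // true = read, false = write
    uint32_t address;  // address of the first word
    unsigned words;    // number of 32-bit words (4 for the burst)
};

// Lists the transactions of one iteration of the master's loop.
// status_polls is an assumption: it depends on when the user types a key.
std::vector<Transaction> one_iteration(uint32_t ram_base, uint32_t tty_base,
                                       unsigned status_polls) {
    assert(status_polls >= 1);
    std::vector<Transaction> t;
    t.push_back({true, ram_base, 4});                    // 1. burst read of the string
    for (unsigned i = 0; i < 16; ++i)                    // 2. one write per character
        t.push_back({false, tty_base + TTY_DISPLAY, 1});
    for (unsigned i = 0; i < status_polls; ++i)          // 3. poll STATUS
        t.push_back({true, tty_base + TTY_STATUS, 1});
    t.push_back({true, tty_base + TTY_KEYBUF, 1});       // 4. read the typed character
    return t;
}
```

With `status_polls = 3`, the function returns 1 + 16 + 3 + 1 = 21 transactions, of which only the first one is a burst.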
Regarding the PIBUS protocol, we recall that, from the master's point of view, a simple transaction (i.e. the transfer of a single 32-bit word) requires at least three cycles:
- allocation cycle: the master requests bus allocation
- command cycle: the master sends the command containing the address.
- response cycle: the target sends the response and the data is transmitted.
In the case of "burst" transactions, the transfers are pipelined: the address (i+1) is sent in the same cycle as the data (i), the address and the data travelling on two separate sets of wires.
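The pipelining can be visualised with a small C++ sketch that lists what travels on the bus at each cycle of a transaction. It is a simplified illustration under two assumptions that are not guaranteed in a real system: the bus is granted after a single allocation cycle, and the target never inserts wait cycles. The function name `burst_schedule` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns one label per cycle for a transaction of n_words words,
// assuming one allocation cycle and no wait cycles (best case).
std::vector<std::string> burst_schedule(unsigned n_words) {
    assert(n_words >= 1);
    std::vector<std::string> cycles;
    cycles.push_back("REQ");                        // allocation cycle
    for (unsigned i = 0; i <= n_words; ++i) {
        std::string label;
        if (i < n_words)                            // command cycle for word i
            label += "A" + std::to_string(i);
        if (i > 0) {                                // response cycle for word i-1
            if (!label.empty()) label += "/";
            label += "D" + std::to_string(i - 1);
        }
        cycles.push_back(label);
    }
    return cycles;
}
```

For a single word the result is REQ, A0, D0 (the three cycles listed above). For a burst of 4 words it is REQ, A0, A1/D0, A2/D1, A3/D2, D3, that is 6 cycles instead of 4 * 3 = 12.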
In any shared memory system, the address has two parts:
- The most significant bits (MSB) are used to designate a particular target.
- The low order bits (LSB) designate a particular byte in the target.
The PibusSegBcu hardware component must decode the high-order bits of the address to designate the target involved in the transaction. The segment table is defined in the file that describes the hardware architecture. This segment table allows the system designer to define the organisation (i.e. the division into segments) of the physical addressable space: the designer can associate each target instantiated in the system with one (or more) segment(s) of the addressable space. Targets can be either memory banks (component PibusSimpleRam) or addressable devices (component PibusMultiTty). The designer must also define the number of high-order bits of the address that will be decoded by the BCU to select a particular target. The "decoding ROM", contained in the BCU component, is a decoder that takes as input the N most significant bits of the address and provides as output the index of the selected target (one-hot encoded). This ROM is built (i.e. initialized) by the constructor of the PibusSegBcu component, from the information entered in the segment table.
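The construction of the decoding ROM can be sketched in plain C++ as follows. This is not the PibusSegBcu source code: the `Segment` structure and the functions `build_decoding_rom` and `select_target` are hypothetical, addresses are assumed to be 32 bits wide, and unmapped entries are arbitrarily set to 0 (no target selected), which may differ from what the real component does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical segment description (a subset of the real segment table).
struct Segment {
    std::string name;
    uint32_t    base;    // base address
    uint32_t    size;    // size in bytes
    unsigned    target;  // index of the associated target
};

// Builds a ROM with 2^msb_bits entries, indexed by the MSBs of the address.
// Each entry holds the one-hot code of the selected target (0 = unmapped).
std::vector<uint32_t> build_decoding_rom(const std::vector<Segment>& segments,
                                         unsigned msb_bits) {
    assert(msb_bits >= 1 && msb_bits <= 16);
    std::vector<uint32_t> rom(1u << msb_bits, 0);
    const unsigned shift = 32 - msb_bits;
    for (const Segment& seg : segments) {
        assert(seg.size > 0 && seg.target < 32);
        const uint32_t first = seg.base >> shift;
        const uint32_t last  = (seg.base + (seg.size - 1)) >> shift;
        for (uint32_t page = first; page <= last; ++page)
            rom[page] = 1u << seg.target;           // one-hot encoding
    }
    return rom;
}

// Decodes an address: only the msb_bits high-order bits are examined.
uint32_t select_target(const std::vector<uint32_t>& rom, unsigned msb_bits,
                       uint32_t address) {
    return rom[address >> (32 - msb_bits)];
}
```

Note that the ROM only sees the N most significant bits: every address sharing those bits with a segment selects the same target, even if it lies outside the segment. Detecting such addresses is left to the target itself (this is the role of the ADR_OK signal in the RAM controller).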
B. Getting started
The archive multi_tp1.tgz contains different files that you will need for this first practical. Create in your account a directory tp1, and unpack the archive in this directory. You should get the 2 files tp1_top.cpp and tp1.desc describing the hardware architecture, as well as the string_file containing the string.
The files containing the simulation models of the hardware components used in the different hardware architectures that will be studied in this course are stored in the directory:
/users/outil/soc/soclib-lip6/pibus
You will not need to modify these files, and it is therefore not necessary to copy them on your account.
C. Automaton of the PibusSimpleRam component
The hardware component PibusSimpleRam has the following interface:
- 4 input ports: SEL (target selection), READ (direction of exchange on PIBUS), A (PIBUS address), and OPC (not used)
- 1 output port: ACK (PIBUS acknowledgement)
- 1 bi-directional port DT (data transferred to PIBUS)
In this practical and the subsequent ones, we consider synchronous systems: all hardware components are synchronised by the same clock signal, and the processor cycle time is equal to the bus cycle time, which is itself equal to the memory controller cycle time.
This assumption is not realistic for the memory: depending on the operating frequency chosen for the processor, the memory controller does not necessarily respond in one cycle: it is sometimes necessary to wait several cycles between receiving a read command and obtaining the data. To reproduce this behaviour, it is possible to vary the latency of the memory controller by introducing a fixed number of wait cycles. This number L of wait cycles is a hardware parameter of the memory controller model, which means that it cannot be changed by software during the simulation. In the memory controller simulation model, a cycle counter (COUNT) is initialized in the IDLE state to the value L. If the value of the parameter L is non-zero, the component enters a wait state when it receives a read or write command. The counter is decremented with each cycle, and the transaction response is not actually sent until the counter reaches zero.
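The latency mechanism can be sketched with a small C++ class. It is a simplified model written for this handout, not an extract of pibus_simple_ram.cpp: the class name `LatencyCounter` and its methods are hypothetical, and the exact cycle at which the real model leaves its wait state may differ by one. The `go()` and `delay()` methods correspond to the GO and DELAY signals described just below.

```cpp
// Simplified model of the COUNT register of the memory controller.
class LatencyCounter {
public:
    explicit LatencyCounter(unsigned latency) : latency_(latency), count_(latency) {}

    void reload() { count_ = latency_; }            // done in the IDLE state
    void tick()   { if (count_ > 0) --count_; }     // done at each cycle of a wait state

    bool go() const    { return count_ == 0; }      // GO: the counter has reached 0
    bool delay() const { return latency_ != 0; }    // DELAY: the parameter L is non-zero

private:
    unsigned latency_;  // hardware parameter L, fixed at construction
    unsigned count_;    // current value of COUNT
};

// Number of cycles spent waiting before the response can be sent.
unsigned wait_cycles(unsigned latency) {
    LatencyCounter counter(latency);
    counter.reload();
    unsigned cycles = 0;
    while (!counter.go()) {
        counter.tick();
        ++cycles;
    }
    return cycles;
}
```

Because L is a constructor argument, changing the latency means rebuilding the platform with another value: nothing in the addressable space allows software to modify it.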
The Moore automaton that controls this component has 6 states. This automaton is part of the PibusSimpleRam component, and has its own interface, described below. It has 5 Boolean input signals: the SEL and READ signals of the PIBUS, the GO signal (which is 1 when the latency counter reaches the value 0), the DELAY signal (which is 1 when the value of the L parameter is not zero), and ADR_OK (which is 1 when the address belongs to the segment associated with the RAM).
It controls 4 output signals: The ACK_EN signal is a Boolean that allows the response to be sent on the PIBUS ACK bus. The ACK_VALUE signal defines the value of the response and can take 3 values: WAIT, READY, ERROR. The MEM_CMD signal is the command to the memory, and can take 3 values: NOP (no read or write), READ, WRITE. The DT_EN signal is a Boolean which authorises writing to the DT bus of the PIBUS. This automaton behaves like a Moore automaton: the values of the output signals depend only on the current state, not on the values of the input signals.
Question C1: Complete the graph representing the 6-state automaton that controls the hardware component PibusSimpleRam. You must specify the transition function by attaching to each transition a Boolean expression depending on the input signals of the automaton... without forgetting to check the completeness and orthogonality conditions.
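The completeness and orthogonality conditions can be checked mechanically. The C++ sketch below is a generic helper written for this handout (the names `Condition`, `CheckResult` and `check_state` are hypothetical): for one state, it enumerates every combination of the input signals and counts how many outgoing transitions are enabled. The state is complete if at least one transition is always enabled, and orthogonal if at most one is.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// A transition condition: Boolean expression of the inputs,
// with input signal k stored in bit k of the argument.
using Condition = std::function<bool(unsigned)>;

struct CheckResult {
    bool complete;    // at least one transition enabled for every input value
    bool orthogonal;  // never more than one transition enabled
};

// Checks the outgoing transitions of ONE state over all 2^n_inputs input values.
CheckResult check_state(const std::vector<Condition>& outgoing, unsigned n_inputs) {
    CheckResult result{true, true};
    for (unsigned in = 0; in < (1u << n_inputs); ++in) {
        std::size_t enabled = 0;
        for (const Condition& condition : outgoing)
            if (condition(in)) ++enabled;
        if (enabled == 0) result.complete = false;
        if (enabled > 1)  result.orthogonal = false;
    }
    return result;
}
```

For example, with two inputs X (bit 0) and Y (bit 1), the pair of conditions {X, not X} passes both checks, whereas the pair {X, Y} fails both: no transition is enabled when X = Y = 0, and two are enabled when X = Y = 1.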
Question C2: Specify the generating function of this automaton (i.e. the signals defining the response on the bus), by filling in the table below.
State | ACK_EN | ACK_VALUE | DT_EN | MEM_CMD |
IDLE | | | | |
R_WAIT | | | | |
R_OK | | | | |
W_WAIT | | | | |
W_OK | | | | |
ERROR | | | | |
Attention: We remind you that the ACK and DT signals of the PIBUS are "bussed" signals (i.e. multi-transmitter). Consequently, the PibusSimpleRam component must drive these signals only in the states where it has the right to do so.
To check your results, you can compare the automaton you have defined with the one encoded in the files:
/users/outil/soc/soclib-lip6/pibus/pibus_simple_ram/source/include/pibus_simple_ram.h
/users/outil/soc/soclib-lip6/pibus/pibus_simple_ram/source/src/pibus_simple_ram.cpp
D. Automaton of the PibusSimpleMaster component
The hardware component PibusSimpleMaster has the following interface:
- 2 input ports: GNT, ACK (PIBUS acknowledgement).
- 5 output ports: REQ, A (PIBUS address), READ, LOCK, OPC (PIBUS command)
- 1 bi-directional port: DT (data transferred to PIBUS)
In addition to the automaton status register (FSM_STATE), this component stores the 16 bytes read from memory in a small local memory (BUF) with a capacity of 4 words of 32 bits, and has a small internal counter to count the number of characters that have been sent to the TTY.
The Moore automaton that controls this component has 4 Boolean signals as input: GNT (the bus is allocated), READY (signal indicating that the ACK bus of the PIBUS has the value READY), LAST (signal coming from an auxiliary counter, indicating that it is the last character of the string), NUL (signal coming from an auxiliary decoder, indicating that the data read on the DT bus of the PIBUS has the value 0).
Since this component is a wired automaton (non-programmable), there can be no software addressing errors, and it is assumed that the targets never return the value ERROR on the ACK bus, which simplifies the automaton.
It controls 6 output signals: The REQ signal is the Boolean signal sent to the BCU to request bus allocation, the CMD_EN signal is a Boolean that allows transmission on the control bus. ADR_VALUE defines the address value (there are 7 possible values: RAM_BASE, RAM_BASE+4, RAM_BASE+8, RAM_BASE+12, TTY_BASE, TTY_BASE+4, TTY_BASE+8). The READ_VALUE and LOCK_VALUE signals indicate respectively the direction of the exchange on the bus and the fact that the command issued is not the last in a burst. Finally, the DT_EN signal is a Boolean which authorises writing to the DT bus of the PIBUS.
Question D1: Complete the graph representing the automaton below, by attaching to each transition a Boolean expression depending on the 4 signals GNT, READY, LAST, and NUL.
Question D2: Specify the generating function, by filling in the table defining, for each of the states of the automaton, the values of the output signals controlled by this automaton.
State | REQ | CMD_EN | ADR_VALUE | READ_VALUE | LOCK_VALUE | DT_EN |
INIT | | | | | | |
RAM_REQ | | | | | | |
RAM_A0 | | | | | | |
RAM_A1_D0 | | | | | | |
RAM_A2_D1 | | | | | | |
RAM_A3_D2 | | | | | | |
RAM_D3 | | | | | | |
W_REQ | | | | | | |
W_AD | | | | | | |
W_DT | | | | | | |
STS_REQ | | | | | | |
STS_AD | | | | | | |
STS_DT | | | | | | |
BUF_REQ | | | | | | |
BUF_AD | | | | | | |
BUF_DT | | | | | | |
We recall that this is a Moore automaton.
To check your results, you can compare the automaton you have defined with the one encoded in the files:
/users/outil/soc/soclib-lip6/pibus/pibus_simple_master/source/include/pibus_simple_master.h
/users/outil/soc/soclib-lip6/pibus/pibus_simple_master/source/src/pibus_simple_master.cpp
E. Automaton of the PibusSegBcu component
In the case where there is only one master and two targets, the PibusSegBcu bus controller has the following interface:
- 4 input signals: REQ, LOCK, ACK, A
- 3 output signals: GNT, SEL0, SEL1
Recall that the two main functions of the bus arbiter are:
- allocate the bus to a master (with a rotating priority when there are several simultaneous requests).
- select the target designated by the high-order bits of the address issued by the master which has obtained access to the bus.
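The rotating priority policy can be illustrated by the following C++ sketch. It is one common way of writing a round-robin choice and is not claimed to be the code of PibusSegBcu: the function name `rotating_priority` and its arguments are hypothetical. The master served last gets the lowest priority for the next allocation.

```cpp
#include <cassert>

// requests     : bit k is 1 when master k requests the bus
// n_masters    : number of masters connected to the bus (1 to 32)
// last_granted : index of the master that obtained the bus last time
// Returns the index of the master to serve, or -1 if nobody requests the bus.
int rotating_priority(unsigned requests, unsigned n_masters, unsigned last_granted) {
    assert(n_masters >= 1 && n_masters <= 32);
    assert(last_granted < n_masters);
    // Scan the masters starting just after the last one served,
    // so that the last one served is examined last.
    for (unsigned k = 1; k <= n_masters; ++k) {
        const unsigned candidate = (last_granted + k) % n_masters;
        if (requests & (1u << candidate))
            return static_cast<int>(candidate);
    }
    return -1;
}
```

With 4 masters, if masters 0 and 2 both request the bus (requests = 0x5) and master 0 was served last, master 2 is chosen; at the next allocation with the same requests, master 0 is chosen. In the architecture of this lab there is a single master, so the choice is trivial.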
Caution: The BCU component is not involved in the processing of addressing errors. It does not differentiate between an ACK == READY response (signaling a success) and an ACK == ERROR response (signaling a failure). It only needs to know that the selected target has provided a response (i.e. ACK != WAIT), to detect the end of the current transaction.
This component is implemented as a Mealy automaton: the values of the three output signals depend directly (i.e. combinatorially) on the input signals.
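The difference between the two kinds of automata can be shown on a toy example in C++. The two classes below are deliberately unrelated to the PIBUS components (the names `MooreToy` and `MealyToy` are hypothetical): they only show where the input signals are allowed to appear.

```cpp
// Toy Moore automaton: the output is a function of the state register only.
struct MooreToy {
    bool state = false;                        // state register
    bool output() const { return state; }      // generation function g(state)
    void clock(bool in) { state = in; }        // transition, applied on the clock edge
};

// Toy Mealy automaton: the output is a function of the state AND of the inputs.
struct MealyToy {
    bool state = false;                        // state register
    bool output(bool in) const {               // generation function g(state, in)
        return in && !state;
    }
    void clock(bool in) { state = in; }        // transition, applied on the clock edge
};
```

When the input goes from 0 to 1, the Mealy output reacts in the same cycle, whereas the Moore output can only change after the next clock edge. This combinational path is what allows the BCU to answer a request without waiting for an extra cycle.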
The 4 states of the automaton correspond to the different possible states of the pipeline, and have the following meaning:
- IDLE: no transaction is in progress, and the arbiter is waiting for at least one master to request the bus
- AD: the bus has been allocated to a master, and this is the first command
- DTAD: the bus has been allocated to a master, and we are in the middle of a burst transaction: CMD(i) / RSP(i-1)
- DT: the bus has been allocated to a master, and this is the response to the last command
Question E1: Complete the graph representing the 4-state automaton of the hardware component PibusSegBcu below, attaching to each transition a Boolean expression depending on the input signals REQ, ACK, and LOCK.
Question E2: Specify the generating function, filling in the table defining the values of the output signals GNT, SEL0 and SEL1 for each of the 4 states of the automaton.
State | GNT | SEL0 | SEL1 |
IDLE | | | |
AD | | | |
DTAD | | | |
DT | | | |
Attention: The BCU automaton behaves like a Mealy automaton, and the values contained in this table may depend on the values of the input signals REQ, ACK & LOCK, as well as on the A address.
Question E3: Explain why allocation (choosing a master) is performed not only in the IDLE state, but also in the DT state.
To check your results, you can compare the automaton you have defined with the one coded in the files:
/users/outil/soc/soclib-lip6/pibus/pibus_seg_bcu/source/include/pibus_seg_bcu.h
/users/outil/soc/soclib-lip6/pibus/pibus_seg_bcu/source/src/pibus_seg_bcu.cpp
F. Hardware architecture modelling
The file tp1_top.cpp contains a (deliberately incomplete) description of the architecture of the system: a skeleton is provided which contains only the component PibusSegBcu and the component PibusMultiTty. The description of the architecture is broken down into 5 parts:
- declaration of the signals used to interconnect the components
- definition of the segment table (see below)
- definition and parameterisation of the components (the constructor of each instantiated component is executed)
- definition of the net-list
- launch of the simulation
Question F1: Complete the file tp1_top.cpp by instantiating and connecting the 2 missing hardware components: PibusSimpleMaster and PibusSimpleRam. You have to open the files pibus_simple_ram.h and pibus_simple_master.h to understand which arguments have to be defined when instantiating these two components (the instantiation is done by calling the constructor of the component).
The segmentation of the addressable space is also defined in the tp1_top.cpp file. The segment table is an associative table, where a segment is defined by 5 characteristics:
- a segment name
- a base address
- a size (in bytes)
- the index of the target to which it is associated
- a Boolean defining whether the segment is cacheable
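As an illustration, the five characteristics can be grouped in a C++ structure such as the one below. This sketch does not reproduce the segment table class actually used in tp1_top.cpp: the names `SegmentEntry`, `contains`, `find_segment` and `overlap` are hypothetical, and the base addresses used in the usage note are arbitrary examples.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical segment descriptor holding the five characteristics listed above.
struct SegmentEntry {
    std::string name;          // segment name
    uint32_t    base;          // base address
    uint32_t    size;          // size in bytes
    unsigned    target_index;  // index of the associated target
    bool        cacheable;     // true if the segment is cacheable
};

// True if the address belongs to the segment.
bool contains(const SegmentEntry& s, uint32_t address) {
    return address >= s.base && (address - s.base) < s.size;
}

// Returns the segment containing the address, or nullptr if it is unmapped.
const SegmentEntry* find_segment(const std::vector<SegmentEntry>& table,
                                 uint32_t address) {
    for (const SegmentEntry& s : table)
        if (contains(s, address)) return &s;
    return nullptr;
}

// True if two segments share at least one address (a configuration error).
bool overlap(const SegmentEntry& a, const SegmentEntry& b) {
    const uint64_t a_end = static_cast<uint64_t>(a.base) + a.size;
    const uint64_t b_end = static_cast<uint64_t>(b.base) + b.size;
    return a.base < b_end && b.base < a_end;
}
```

For instance, a table containing {"seg_ram", 0x10000000, 0x1000, 0, true} and {"seg_tty", 0xC0000000, 16, 1, false} maps address 0xC000000C to target 1, and leaves address 0x10001000 unmapped. The 16-byte size of the TTY segment matches its 4 registers of 32 bits.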
This non-programmable architecture (everything is wired) uses only two segments: a segment associated with the RAM, and a segment associated with the TTY terminal.
Question F2: Complete the file tp1_top.cpp, by defining the segment associated with the TTY controller (The segment associated with the RAM is already defined). Mnemonics are used to define the base and length of the segments. These same mnemonics are used as arguments to the PibusSimpleMaster component constructor.
Question F3: Open the file pibus_simple_ram to determine how the character string "Hello World!" is initialized in memory, at the beginning of the seg_ram segment associated with the memory.
G. Simulation
In this last part, we will - finally - launch the simulation.
Compile the file tp1_top.cpp to create the simulation executable simul.x, using SystemC version 2.1. For this, we use the tool soclib-cc. You need to add the path to soclib-cc to your path by running the script:
source /users/outil/soc/env_soclib.sh
Then run the command:
$ soclib-cc -p tp1.desc -t systemcass -o simul.x
Run the simulation for 10000000 cycles using the command:
$ ./simul.x -NCYCLES 10000000
Question G1: What is the simulation speed (measured in number of cycles simulated per second)? To answer this question, the easiest way is to use your watch.
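The computation itself is a simple division, sketched below in C++. The function name `cycles_per_second` is hypothetical, and the 8 seconds used in the usage note are an invented value, not a measurement.

```cpp
#include <cassert>

// Simulation speed, in simulated cycles per second of wall-clock time.
double cycles_per_second(unsigned long long simulated_cycles, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return static_cast<double>(simulated_cycles) / elapsed_seconds;
}
```

For example, if a run of 10000000 cycles took 8 seconds, `cycles_per_second(10000000, 8.0)` gives 1250000 cycles per second. Keep in mind that the result depends on the machine and on the amount of trace output produced.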
To understand what is going on, re-execute the simulation while tracing the values of the bus signals, as well as the internal states of the automata of the three components PibusSimpleMaster, PibusSimpleRam and PibusSegBcu. To do this, the -DEBUG option (to activate the trace mode) and the -NCYCLES option (to limit the number of simulated cycles) must be used, and the trace must be redirected to a trace file:
$ ./simul.x -DEBUG 0 -NCYCLES 10000 > trace
Analyse this trace file to answer the following questions:
Question G2: How many wait cycles does the master component spend in the automaton states where it asks the BCU for bus allocation? Explain this behaviour.
Question G3: How many wait cycles does the master component spend in the automaton states where it is waiting for the RAM response? Explain this behaviour.
Question G4: How many cycles does the master component automaton need to display a character on the PibusMultiTty component?
Question G5: Draw the timing diagram corresponding to the first 20 execution cycles, representing the internal states of the 4 automata of the 4 components, as well as the PIBUS signals: REQ, GNT, SEL_RAM, SEL_TTY, A, LOCK, READ, D, ACK.
H. Report
The answers to the above questions must be written in a text editor, and this report must be handed in at the beginning of the next practical session. Similarly, the simulator will be checked (in pairs) at the beginning of next week's practical session.