Version 42 (modified by alain, 5 years ago)

Remote Procedure Call

To enforce locality when a single complex operation requires a large number of memory accesses to a single remote cluster, ALMOS-MKH defines Remote Procedure Calls (RPCs), following the client/server model. This section describes the RPC mechanism implemented by ALMOS-MKH. The corresponding code is defined in the rpc.c and rpc.h files. The software FIFO implementing the client/server communication channel is defined in the remote_fifo.c and remote_fifo.h files.

1) Hardware platform assumptions

The target architecture is clusterised: the physical address space is shared by all cores, but it is physically distributed, with one physical memory bank per cluster. The following assumptions are made:

  • Physical addresses, also called extended addresses, are encoded on 64 bits.
  • The maximum size of the physical address space in a single cluster is defined by the CONFIG_CLUSTER_SPAN configuration parameter, which must be a power of 2. It is 4 Gbytes for the TSAR architecture, but it can be larger for Intel-based architectures.
  • For a given architecture, the physical address is therefore split into two fixed-size fields: the LPA field (Local Physical Address) contains the LSB bits and defines the physical address inside a given cluster; the CXY field (Cluster Identifier Index) contains the MSB bits and directly identifies the cluster.
  • Each cluster can contain any number of cores (including 0), several peripherals, and a physical memory bank of any size (including 0 bytes). This is defined in the arch_info file.
  • There is one kernel instance in each cluster containing at least one core, one local interrupt controller, and one physical memory bank.
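
The CXY/LPA split can be illustrated with a minimal sketch, assuming the 4 Gbytes cluster span quoted for TSAR (so a 32-bit LPA field). The names CLUSTER_SPAN, lpa_width(), paddr_build(), paddr_cxy() and paddr_lpa() are hypothetical and are not taken from the ALMOS-MKH sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 Gbytes per cluster, the value quoted for TSAR. */
#define CLUSTER_SPAN (1ULL << 32)

/* Hypothetical helper: width of the LPA field, i.e. log2(span). */
static unsigned lpa_width(uint64_t span)
{
    assert(span != 0 && (span & (span - 1)) == 0);   /* power of 2 only */
    unsigned bits = 0;
    while ((span >> bits) != 1)
        bits++;
    return bits;
}

/* Build a physical address: CXY in the MSB bits, LPA in the LSB bits. */
static uint64_t paddr_build(uint32_t cxy, uint64_t lpa)
{
    assert(lpa < CLUSTER_SPAN);
    return ((uint64_t)cxy << lpa_width(CLUSTER_SPAN)) | lpa;
}

/* Extract the cluster identifier from a physical address. */
static uint32_t paddr_cxy(uint64_t paddr)
{
    return (uint32_t)(paddr >> lpa_width(CLUSTER_SPAN));
}

/* Extract the local physical address from a physical address. */
static uint64_t paddr_lpa(uint64_t paddr)
{
    return paddr & (CLUSTER_SPAN - 1);
}
```

With these assumptions, cluster 0x12 and local address 0x1000 give the physical address 0x1200001000. A larger CONFIG_CLUSTER_SPAN would only shift the boundary between the two fields.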

2) Inter-cluster communication

ALMOS-MKH replicates the KDATA segment (containing the kernel global variables) in all clusters, and uses the same LPA (Local Physical Address) for the KDATA base in all clusters. Therefore, a given global variable, identified by its LPA, can have different values in two different clusters. ALMOS-MKH uses this feature to allow a client thread in cluster K to access a global variable in a server cluster K', by building a physical address that concatenates the CXY identifier of the server cluster K' with the LPA of the variable.
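
The following user-space simulation sketches this idea: the same LPA designates a different copy of a variable in each cluster, and the CXY field selects the copy. Everything here is an assumption made for illustration (the bank[] array standing for the per-cluster memory banks, the COUNTER_LPA value, and the remote_store32() / remote_load32() helpers); it is not the ALMOS-MKH remote access API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NB_CLUSTERS  4        /* assumption: small simulated platform    */
#define BANK_SIZE    256      /* assumption: tiny simulated memory bank  */
#define LPA_BITS     32       /* 4 Gbytes cluster span (TSAR value)      */
#define COUNTER_LPA  0x40ULL  /* hypothetical LPA of a replicated global */

/* Simulated physical memory: one bank per cluster. */
static uint8_t bank[NB_CLUSTERS][BANK_SIZE];

/* Concatenate the target cluster identifier with the variable LPA. */
static uint64_t xaddr(uint32_t cxy, uint64_t lpa)
{
    return ((uint64_t)cxy << LPA_BITS) | lpa;
}

/* Translate a simulated extended address into a host pointer. */
static uint8_t *xaddr_to_host(uint64_t xa)
{
    uint32_t cxy = (uint32_t)(xa >> LPA_BITS);
    uint64_t lpa = xa & ((1ULL << LPA_BITS) - 1);
    assert(cxy < NB_CLUSTERS && lpa + sizeof(uint32_t) <= BANK_SIZE);
    return &bank[cxy][lpa];
}

static void remote_store32(uint64_t xa, uint32_t value)
{
    memcpy(xaddr_to_host(xa), &value, sizeof value);
}

static uint32_t remote_load32(uint64_t xa)
{
    uint32_t value;
    memcpy(&value, xaddr_to_host(xa), sizeof value);
    return value;
}
```

Writing through xaddr(1, COUNTER_LPA) and xaddr(3, COUNTER_LPA) updates two distinct copies, while the copy in cluster 0 keeps its initial value.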

Any client thread T running in any cluster K can send an RPC request to any cluster K'. Each core in server cluster K' has a private queue of RPC requests, in which the client thread must register its request. To share the workload associated with RPC handling, the client thread T running on client core [i] selects the queue of core [i] in server cluster K'. If this is not possible (when the number of cores in cluster K' is smaller than the number of cores in the client cluster), ALMOS-MKH selects core [0] in the server cluster.
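
The selection rule can be written in a few lines. This is a sketch under the assumption that "not possible" means that core [i] does not exist in the server cluster; the function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: choose the server core whose RPC queue receives
 * the request. client_lid is the local index of the client core,
 * server_cores_nr the number of cores in the server cluster. */
static uint32_t rpc_select_server_core(uint32_t client_lid,
                                       uint32_t server_cores_nr)
{
    assert(server_cores_nr > 0);   /* a server cluster runs a kernel instance */
    return (client_lid < server_cores_nr) ? client_lid : 0;
}
```

For example, a client running on core [2] targets core [2] of a four-core server cluster, but falls back to core [0] if the server cluster has only two cores.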

For each core [i] in a cluster K, ALMOS-MKH implements the RPC request queue as a software FIFO, RPC_FIFO[i,k], stored as a global variable in the KDATA segment. More precisely, each RPC_FIFO[i,k] has the type remote_fifo_t, and is a member of the "cluster_manager" structure of cluster K.

This RPC_FIFO has been designed to support a large number (N) of concurrent writers, and a small number (M) of readers:

  • N is the number of client threads (practically unbounded). A client thread can execute in any cluster, and can send an RPC request to any target cluster K. To synchronize these multiple client threads, each RPC_FIFO[i,k] implements a ticket-based policy, which enforces a first-come, first-served order when registering a new request into a given RPC_FIFO[i,k].
  • M is the number of server threads in charge of handling the RPC requests stored in a given RPC_FIFO[i,k]. M is bounded by the CONFIG_RPC_THREAD_MAX parameter. Several server threads can exist for each RPC_FIFO[i,k], because head-of-line blocking must be avoided, as explained below in section 6. To synchronize these server threads, the RPC FIFO implements a light lock, which is a non-blocking lock: only one RPC thread at a time can take the lock and become the FIFO owner. Another RPC thread T' that fails to take the lock simply returns to the IDLE state.
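
The two synchronization policies can be sketched with C11 atomics. This is not the remote_fifo_t implementation: the structure layout, the slot count, and all function names below are assumptions chosen for illustration, and the busy-wait loop stands for whatever waiting policy the kernel actually applies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SLOTS 8u   /* assumption: power of 2, so tickets wrap cleanly */

typedef struct {
    atomic_uint wr_ticket;           /* next ticket handed to a writer  */
    atomic_uint rd_index;            /* ticket of the next item to read */
    atomic_bool valid[FIFO_SLOTS];   /* slot holds an unread item       */
    uint64_t    data[FIFO_SLOTS];    /* e.g. extended pointers          */
    atomic_flag owner;               /* light lock: set = FIFO is owned */
} fifo_sketch_t;

static void fifo_init(fifo_sketch_t *f)
{
    atomic_init(&f->wr_ticket, 0);
    atomic_init(&f->rd_index, 0);
    for (unsigned i = 0; i < FIFO_SLOTS; i++)
        atomic_init(&f->valid[i], false);
    atomic_flag_clear(&f->owner);
}

/* Writer side: take a ticket, wait until its slot is free, publish. */
static void fifo_put(fifo_sketch_t *f, uint64_t item)
{
    unsigned ticket = atomic_fetch_add(&f->wr_ticket, 1);
    while (ticket - atomic_load(&f->rd_index) >= FIFO_SLOTS)
        ;   /* FIFO full for this ticket: a kernel would yield here */
    f->data[ticket % FIFO_SLOTS] = item;
    atomic_store(&f->valid[ticket % FIFO_SLOTS], true);
}

/* Light lock: never blocks, the caller gives up when it returns false. */
static bool fifo_try_own(fifo_sketch_t *f)
{
    return !atomic_flag_test_and_set(&f->owner);
}

static void fifo_release(fifo_sketch_t *f)
{
    atomic_flag_clear(&f->owner);
}

/* Reader side: must only be called by the current FIFO owner. */
static bool fifo_get(fifo_sketch_t *f, uint64_t *item)
{
    unsigned rd = atomic_load(&f->rd_index);
    if (!atomic_load(&f->valid[rd % FIFO_SLOTS]))
        return false;                /* empty, or item not yet published */
    *item = f->data[rd % FIFO_SLOTS];
    atomic_store(&f->valid[rd % FIFO_SLOTS], false);
    atomic_store(&f->rd_index, rd + 1);
    return true;
}
```

In this sketch the ticket fixes the position of each request before the data is written, so requests are consumed in arrival order even when writers publish out of order. A second RPC thread calling fifo_try_own() while the FIFO is owned gets false immediately instead of waiting.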

3) RPC descriptor format

ALMOS-MKH supports two modes for an RPC request:

  • simple RPC: the client thread sends the RPC request to one single server, and expects one single response.
  • parallel RPC: the client thread sends several RPC requests in parallel to several servers, and expects several responses.

Both RPC modes use the same RPC descriptor format. One entry in the RPC_FIFO (located on the server side) contains a remote pointer (xptr_t) to the RPC descriptor (rpc_desc_t), which is stored on the client side. This RPC descriptor contains the following information:

  • The index field defines the required service type (ALMOS-MKH defines about 30 service types).
  • The blocked field defines the RPC mode: true for a simple RPC, false for a parallel RPC.
  • The args field is an array of 10 uint64_t, containing the service arguments (both input and output).
  • The thread field is a local pointer to the client thread (used by the server thread to unblock the client thread).
  • The lid field defines the client core local index (used by the server thread to send the completion IPI).
  • The rsp field is a local pointer to the expected responses counter on the client side (the counter can be larger than one for a parallel RPC).

The semantics of the args array depend on the service type, as defined by the index field.

This format supports both simple and parallel RPCs: the client thread initializes the responses counter with the number of expected responses, and each server thread atomically decrements this counter when the RPC request has been satisfied.
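
As an illustration, the descriptor can be pictured as the C structure below. The field names follow the list above, but the exact types, the field order, and the rpc_signal_completion() helper are assumptions; the authoritative definition is rpc_desc_t in rpc.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct thread_s;                    /* opaque here: the client thread descriptor */

/* Sketch of the descriptor layout (types are assumptions). */
typedef struct {
    uint32_t         index;         /* requested service type                   */
    bool             blocked;       /* true: simple RPC, false: parallel RPC    */
    uint64_t         args[10];      /* input and output arguments               */
    struct thread_s *thread;        /* client thread to unblock                 */
    uint32_t         lid;           /* client core, target of the IPI           */
    atomic_uint     *rsp;           /* expected responses counter (client side) */
} rpc_desc_sketch_t;

/* Hypothetical server-side helper: atomically decrement the counter and
 * report whether this was the last expected response. */
static bool rpc_signal_completion(rpc_desc_sketch_t *rpc)
{
    assert(rpc->rsp != 0);
    return atomic_fetch_sub(rpc->rsp, 1) == 1;
}
```

With a counter initialized to 2, the first call returns false and the second returns true, which is the point where the client thread would be unblocked.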

4) Simple RPC scenario

Simple RPC requests are blocking for the client thread. The client thread must perform the following tasks:

  1. allocate memory in the client cluster for the RPC descriptor (can be in the client stack),
  2. allocate in the client cluster an expected response counter (can be in the client stack),
  3. initialize this RPC descriptor (this includes RPC arguments marshaling), as well as the responses counter (one expected response),
  4. register the extended pointer on the RPC descriptor in the server FIFO,
  5. send an IPI to the selected core in the server cluster,
  6. block and deschedule, waiting to be reactivated by the server thread when the server has completed the requested service.

For each RPC service type XYZ, ALMOS-MKH defines a specific rpc_xyz_client() function that performs the first three tasks, and calls the generic rpc_send() function to perform the last three tasks.
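
The six client tasks can be sketched as a single-threaded user-space simulation. All names with a sketch or sim prefix or suffix are hypothetical: the one-entry sim_fifo_slot stands for the server RPC_FIFO, sim_ipi_count stands for the IPI, and sim_server_run() plays the server thread (it doubles args[0]) at the point where a real client thread would be descheduled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t           index;     /* service type                     */
    bool               blocked;   /* true: simple RPC                 */
    uint64_t           args[10];  /* input and output arguments       */
    uint32_t           lid;       /* client core local index          */
    volatile uint32_t *rsp;       /* expected responses counter       */
} rpc_desc_sketch_t;

enum { RPC_XYZ_SKETCH = 0 };      /* hypothetical service index */

/* Simulation stubs (assumptions, not ALMOS-MKH code). */
static rpc_desc_sketch_t *sim_fifo_slot;   /* stands for the server RPC_FIFO */
static unsigned           sim_ipi_count;   /* counts the IPIs "sent"         */

static void sim_server_run(void)           /* plays the server thread */
{
    rpc_desc_sketch_t *rpc = sim_fifo_slot;
    sim_fifo_slot = NULL;
    rpc->args[1] = 2 * rpc->args[0];       /* the "service": double the input */
    (*rpc->rsp)--;                         /* signal completion               */
}

/* Tasks 4 to 6: generic part, shared by all service types. */
static void rpc_send_sketch(uint32_t server_cxy, rpc_desc_sketch_t *rpc)
{
    (void)server_cxy;                      /* unused in this simulation      */
    sim_fifo_slot = rpc;                   /* 4. register the descriptor     */
    sim_ipi_count++;                       /* 5. send an IPI                 */
    while (*rpc->rsp != 0)                 /* 6. "block": the simulated      */
        sim_server_run();                  /*    server runs in the meantime */
}

/* Tasks 1 to 3: service-specific part. */
static uint64_t rpc_xyz_client_sketch(uint32_t server_cxy, uint64_t in)
{
    uint32_t          responses = 1;       /* 2. counter on the client stack    */
    rpc_desc_sketch_t rpc = {              /* 1. descriptor on the client stack */
        .index = RPC_XYZ_SKETCH, .blocked = true, .lid = 0, .rsp = &responses
    };
    rpc.args[0] = in;                      /* 3. marshal the input argument */
    rpc_send_sketch(server_cxy, &rpc);
    return rpc.args[1];                    /* output written by the server  */
}
```

Because the client is blocked until the counter reaches zero, keeping both the descriptor and the counter on the client stack is safe in the simple mode.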

On the server side, a kernel RPC thread is activated at the next scheduling point on the selected server core, as soon as the RPC_FIFO is non-empty. This server thread executes the following tasks:

  1. extract the relevant information from the RPC descriptor stored in the client cluster,
  2. depending on the RPC index, call the specific rpc_xyz_server() function to unmarshal the RPC arguments,
  3. call the relevant kernel function to execute the requested service,
  4. atomically decrement the responses counter in the client cluster,
  5. if this response is the last expected response, unblock the client thread and send an IPI to the client core.
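
The server-side steps can be sketched as a dispatch through a table indexed by the RPC index. The kernel_add() service, the RPC_ADD index, and the client_unblocked flag (which stands for "unblock the client thread and send the IPI") are assumptions made for this example. The counter is decremented, following the descriptor format described in section 3.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t     index;             /* requested service type            */
    uint64_t     args[10];          /* input and output arguments        */
    atomic_uint *rsp;               /* expected responses counter        */
    bool         client_unblocked;  /* simulation: unblock + IPI happened */
} rpc_desc_sketch_t;

/* Hypothetical kernel service executed on the server side. */
static uint64_t kernel_add(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Service-specific server function: unmarshal the inputs, call the
 * kernel function, marshal the output. */
static void rpc_add_server(rpc_desc_sketch_t *rpc)
{
    rpc->args[2] = kernel_add(rpc->args[0], rpc->args[1]);
}

/* Table of server functions, indexed by the RPC index. */
typedef void (rpc_server_fn)(rpc_desc_sketch_t *);
enum { RPC_ADD = 0, RPC_MAX_INDEX };
static rpc_server_fn *const rpc_server_sketch[RPC_MAX_INDEX] = {
    [RPC_ADD] = rpc_add_server,
};

/* Handle one request already extracted from the FIFO (step 1). */
static void rpc_handle_one(rpc_desc_sketch_t *rpc)
{
    assert(rpc->index < RPC_MAX_INDEX);
    rpc_server_sketch[rpc->index](rpc);       /* steps 2 and 3            */
    if (atomic_fetch_sub(rpc->rsp, 1) == 1)   /* step 4                   */
        rpc->client_unblocked = true;         /* step 5: last response    */
}
```

In the real kernel the descriptor lives in the client cluster, so every access to it made by the server thread is a remote access.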

To reduce latency, ALMOS-MKH uses IPIs (Inter-Processor Interrupts). The client thread selects a core in the server cluster and sends an IPI to this core. An IPI forces the target core to reach a scheduling point. This reduces the RPC latency when no RPC thread is active on the server core, because RPC threads are kernel threads that have the highest scheduling priority: the selected core activates (or creates) a new RPC thread and executes it. When an RPC thread is already active, the IPI still forces a scheduling point on the target core, but no new RPC thread is activated (or created).

5) Parallel RPC scenario

All RPC services defined by ALMOS-MKH can be used in simple or parallel mode. Only the behavior of the client has to be modified for a parallel RPC: to send RPC requests to several servers in parallel, the client thread does not block until the last request has been registered in the last server FIFO. Therefore, to request an RPC service XYZ in parallel mode from N servers, the client function does NOT use the rpc_xyz_client() function, and must implement the following client scenario:

  1. allocate an array of RPC descriptors rpc[N] in the client cluster (one per target server),
  2. allocate a shared responses counter in the client cluster (can be in the client stack),
  3. initialize the N RPC descriptors, as well as the responses counter (N expected responses),
  4. for all servers, register an extended pointer on rpc[i] in the server[i] FIFO,
  5. for all servers, send an IPI to the selected core in the server[i] cluster,
  6. block and deschedule, waiting to be reactivated by the last server thread when it has completed the requested service.

Tasks 4 and 5 can be done, for each target server, by the generic rpc_send() function.

When the values of the RPC input arguments are the same for all target clusters, and there are no output arguments, it is possible to save memory on the client side by using a single, shared RPC descriptor.
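
The parallel scenario can be sketched in the same simulated style. NB_SERVERS, sim_fifo[], sim_server_run() and the "square the input" service are assumptions; in particular the servers run one after the other here, whereas real server threads run concurrently in different clusters, which is why the shared counter is atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NB_SERVERS 4u   /* assumption: four target clusters */

typedef struct {
    uint32_t     index;      /* requested service type               */
    bool         blocked;    /* false: parallel RPC                  */
    uint64_t     args[10];   /* input and output arguments           */
    atomic_uint *rsp;        /* counter shared by the N descriptors  */
} rpc_desc_sketch_t;

/* One-entry stand-in for the RPC_FIFO of each simulated server. */
static rpc_desc_sketch_t *sim_fifo[NB_SERVERS];

/* Plays server thread s: squares args[0], then signals completion. */
static void sim_server_run(unsigned s)
{
    rpc_desc_sketch_t *rpc = sim_fifo[s];
    rpc->args[1] = rpc->args[0] * rpc->args[0];
    atomic_fetch_sub(rpc->rsp, 1);
}

/* Client scenario: ask server i for the square of (i + 1), sum the results. */
static uint64_t parallel_sum_of_squares(void)
{
    rpc_desc_sketch_t rpc[NB_SERVERS];                  /* task 1 */
    atomic_uint       responses;                        /* task 2 */

    atomic_init(&responses, NB_SERVERS);                /* task 3 */
    for (unsigned i = 0; i < NB_SERVERS; i++) {
        rpc[i] = (rpc_desc_sketch_t){ .index = 0, .blocked = false,
                                      .rsp = &responses };
        rpc[i].args[0] = i + 1;
    }
    for (unsigned i = 0; i < NB_SERVERS; i++)           /* tasks 4 and 5 */
        sim_fifo[i] = &rpc[i];

    for (unsigned i = 0; i < NB_SERVERS; i++)           /* task 6: servers run */
        sim_server_run(i);                              /* while client waits  */
    assert(atomic_load(&responses) == 0);               /* all responses seen  */

    uint64_t sum = 0;
    for (unsigned i = 0; i < NB_SERVERS; i++)
        sum += rpc[i].args[1];
    return sum;
}
```

One descriptor per server is needed here because each request carries a different input and returns an output. With identical inputs and no outputs, the rpc[] array could shrink to a single shared descriptor.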

6) Pool of RPC servers

In order to avoid deadlocks, ALMOS-MKH defines, for each core, a private pool of RPC threads associated with one single RPC_FIFO[i,k]. If a given RPC thread has extracted request[i] from the FIFO but is blocked, waiting for a shared resource, the next request[i+1] in the FIFO can be extracted and handled by another RPC thread. In that case, the blocked RPC thread T releases the FIFO ownership before blocking and descheduling. Thread T will complete the current RPC request once the blocking condition is resolved and T is rescheduled. At any time, only one RPC thread has the FIFO ownership and can consume RPC requests from the FIFO.

The RPC threads are created dynamically, on demand, by the scheduler: when the RPC FIFO is not empty, an "idle" RPC thread is scheduled to handle the pending RPC requests. If all existing RPC threads are blocked, a new RPC thread is dynamically created by the scheduler (in the sched_rpc_activate() function).

Therefore, a variable number M of RPC threads can exist for each RPC_FIFO[i,k]: the running one is the FIFO owner, and the (M-1) others are blocked on a wait condition. This number M can temporarily exceed the CONFIG_RPC_THREAD_MAX value, but the excess server threads are destroyed by the rpc_thread_func() function itself when the temporary overload is resolved.
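
The activation and destruction rules described in this section can be summarized by two small decision functions. This is only a reading of the text above, not the scheduler code: the function names, the parameters, the RPC_THREAD_MAX_SKETCH value, and the choice of "FIFO empty" as the sign that the overload is resolved are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stands for CONFIG_RPC_THREAD_MAX; the value 4 is an assumption. */
#define RPC_THREAD_MAX_SKETCH 4u

typedef enum { RPC_DO_NOTHING, RPC_WAKE_IDLE, RPC_CREATE_THREAD } rpc_action_t;

/* Decision taken at a scheduling point on the server core. */
static rpc_action_t rpc_activate_sketch(bool     fifo_empty,
                                        bool     fifo_owned,
                                        unsigned idle_threads)
{
    if (fifo_empty || fifo_owned)
        return RPC_DO_NOTHING;        /* nothing pending, or already served */
    if (idle_threads > 0)
        return RPC_WAKE_IDLE;         /* reuse an existing idle RPC thread  */
    return RPC_CREATE_THREAD;         /* all existing threads are blocked   */
}

/* Decision taken by an RPC thread once it has no more work to do. */
static bool rpc_thread_should_exit(bool fifo_empty, unsigned total_threads)
{
    return fifo_empty && total_threads > RPC_THREAD_MAX_SKETCH;
}
```

With these rules a new thread is created only when a request is pending, nobody owns the FIFO, and no idle thread is available, which is the situation that would otherwise cause head-of-line blocking.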

7) How to define a new RPC

To introduce a new RPC service rpc_new_service in ALMOS-MKH, you need to modify the rpc.c and rpc.h files:

  • You must identify (or possibly implement if it does not exist) the kernel function kernel_service() that you want to execute remotely. This function should not use more than 10 arguments.
  • You must register the new RPC in the enum rpc_index_t (in the rpc.h file), and in the rpc_server[] array (in the rpc.c file).
  • You must implement the marshaling function rpc_kernel_service_client(), which is executed on the client side by the client thread, in the rpc.c and rpc.h files. This blocking function (1) registers the input arguments in the RPC descriptor, (2) registers the RPC request in the target cluster RPC_FIFO, (3) extracts the output arguments from the RPC descriptor.
  • You must implement the marshaling function rpc_kernel_service_server(), which is executed on the server side by the RPC thread, in the rpc.c and rpc.h files. This function (1) extracts the input arguments from the RPC descriptor, (2) calls the kernel_service() function, (3) registers the output arguments in the RPC descriptor.
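
The four steps can be sketched as a self-contained skeleton. The function names follow the text above, but the signatures, the reduced descriptor, the behavior of kernel_service() and the rpc_send_sketch() stub (which runs the server function directly instead of going through the RPC_FIFO) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced descriptor: only the fields needed for marshaling. */
typedef struct {
    uint32_t index;
    uint64_t args[10];
} rpc_desc_sketch_t;

/* Step 1: the kernel function to execute remotely (hypothetical behavior:
 * one input, one output, and an error code as return value). */
static int kernel_service(uint32_t in, uint32_t *out)
{
    *out = in + 1;
    return 0;
}

/* Step 2: register the index and the server function. */
typedef enum { RPC_KERNEL_SERVICE = 0, RPC_MAX_INDEX } rpc_index_sketch_t;
typedef void (rpc_server_fn)(rpc_desc_sketch_t *);
static void rpc_kernel_service_server(rpc_desc_sketch_t *rpc);
static rpc_server_fn *const rpc_server[RPC_MAX_INDEX] = {
    [RPC_KERNEL_SERVICE] = rpc_kernel_service_server,
};

/* Stand-in for the generic send function: calls the server directly. */
static void rpc_send_sketch(uint32_t cxy, rpc_desc_sketch_t *rpc)
{
    (void)cxy;                           /* target cluster, unused here */
    assert(rpc->index < RPC_MAX_INDEX);
    rpc_server[rpc->index](rpc);
}

/* Step 3: client-side marshaling function. */
static int rpc_kernel_service_client(uint32_t cxy, uint32_t in, uint32_t *out)
{
    rpc_desc_sketch_t rpc = { .index = RPC_KERNEL_SERVICE };
    rpc.args[0] = in;                    /* (1) register the input argument */
    rpc_send_sketch(cxy, &rpc);          /* (2) register the request        */
    *out = (uint32_t)rpc.args[1];        /* (3) extract the outputs         */
    return (int)rpc.args[2];
}

/* Step 4: server-side marshaling function. */
static void rpc_kernel_service_server(rpc_desc_sketch_t *rpc)
{
    uint32_t out = 0;
    int err = kernel_service((uint32_t)rpc->args[0], &out);  /* (1) and (2) */
    rpc->args[1] = out;                                      /* (3) outputs */
    rpc->args[2] = (uint64_t)err;
}
```

The client and server functions must agree on which args[] slot carries which argument. This convention is private to each service, which is why both functions are written together.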