DMA device API
A) General principles
This device allows the kernel to accelerate memory copies from a source buffer in a remote cluster to a destination buffer in another remote cluster, when the architecture contains dedicated DMA hardware accelerators. A DMA device can be multi-channel, supporting several parallel transfers, and it can be an internal device, replicated in all clusters. There can be one chdev descriptor per cluster and per channel.
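As an illustration of this replication scheme, the sketch below declares a hypothetical directory of extended pointers on DMA chdev descriptors, indexed by cluster and channel. The table name, the accessor and the two maximum constants are assumptions, not actual kernel identifiers.

/* Hypothetical sketch only: since there can be one DMA chdev descriptor
 * per cluster and per channel, a directory of extended pointers can be
 * indexed by (cluster, channel). All names below are assumptions.        */

#define HYP_MAX_CLUSTERS        64   /* assumed maximum number of clusters */
#define HYP_MAX_DMA_CHANNELS     4   /* assumed DMA channels per cluster   */

xptr_t  dma_chdev_dir[HYP_MAX_CLUSTERS][HYP_MAX_DMA_CHANNELS];

/* returns the extended pointer on the DMA chdev for a given cluster/channel */
static inline xptr_t dma_chdev_get( uint32_t cxy , uint32_t channel )
{
    return dma_chdev_dir[cxy][channel];
}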
The "kernel" API contains two, synchronous and asynchronous, operation types, detailed in section C below.
The asynchronous operation is not directly executed by the client thread. The requests are registered in the waiting queue rooted in the DMA chdev descriptor. These requests are actually handled by a dedicated server thread running in the cluster containing the DMA chdev descriptor. This server thread calls the blocking dma_driver_cmd() function for each registered request. The driver is supposed to deschedule the server thread after launching the DMA transfer, in order to wait for the transfer completion.
The synchronous operation does not use the waiting queue or the server thread. The client thread itself calls the blocking dma_driver_cmd() function. The driver is supposed to use a polling strategy to wait for the DMA transfer completion, without using the DMA_IRQ.
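As an illustration of the asynchronous path, the sketch below shows a possible main loop of the DMA server thread: it takes the requests registered in the chdev waiting queue and calls the blocking dma_driver_cmd() function for each one. Only dma_driver_cmd() belongs to the driver API of section D; the queue accessor and the blocking/wakeup helpers are assumptions.

/* Sketch of a DMA server thread main loop (asynchronous operations).
 * dma_queue_take(), server_wait_request() and client_resume() are
 * hypothetical helpers used only to mark the expected steps.            */
void dma_server_loop( chdev_t * chdev )
{
    while( 1 )
    {
        /* get the next registered request: an extended pointer on the
         * client thread descriptor, or XPTR_NULL if the queue is empty  */
        xptr_t client_xp = dma_queue_take( chdev );

        if( client_xp == XPTR_NULL )
        {
            server_wait_request( chdev );  /* deschedule until a request is posted */
            continue;
        }

        /* blocking driver call: launches the transfer, then deschedules
         * the server thread until the DMA_IRQ signals completion         */
        dma_driver_cmd( client_xp );

        /* transfer completed: reactivate the waiting client thread       */
        client_resume( client_xp );
    }
}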
To access the various drivers, the DMA device defines a lower-level "driver" API, which is detailed in section D below.
All DMA device structures and access functions are defined in the dev_dma.c and dev_dma.h files.
B) Initialisation
The dev_dma_init( chdev_t * chdev ) function performs the following initializations:
- it initializes the DMA specific fields of the chdev descriptor,
- it initializes the implementation specific DMA hardware device,
- it initializes the specific software data structures required by the hardware implementation,
- it links the DMA_IRQ to the core executing the server thread,
- it disables the DMA_IRQ, because most operations are supposed to be synchronous.
It must be called by a local thread.
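A possible skeleton of dev_dma_init() following the five steps listed above is sketched below. The cmd/isr function-pointer fields and the impl_dma_*() helpers are assumptions used only to mark where each step would take place.

/* Skeleton of dev_dma_init(); the chdev fields and the impl_dma_*()
 * helpers are illustrative assumptions, not actual kernel identifiers.  */
void dev_dma_init( chdev_t * chdev )
{
    /* 1. initialize the DMA specific fields of the chdev descriptor     */
    chdev->cmd = &dma_driver_cmd;            /* assumed function pointer  */
    chdev->isr = &dma_driver_isr;            /* assumed function pointer  */

    /* 2. initialize the implementation specific DMA hardware device,
       3. and the software structures required by this implementation    */
    impl_dma_init( chdev );

    /* 4. link the DMA_IRQ to the core executing the server thread       */
    impl_dma_bind_irq( chdev );

    /* 5. disable the DMA_IRQ: most operations are expected synchronous  */
    impl_dma_disable_irq( chdev );
}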
C) The "kernel" API
Both the synchronous and the asynchronous operations are blocking and return only when the transfer is completed, but the blocking policy depends on the operation type. They have the same arguments as the hal_remote_memcpy() function:
- The dev_dma_sync_memcpy( xptr_t dst_xp , xptr_t src_xp , uint32_t nbytes ) blocking function moves synchronously <nbytes> bytes from a remote source buffer identified by the <src_xp> argument to another remote destination buffer identified by the <dst_xp> argument. It uses neither a server thread nor the DMA waiting queue, and the driver is supposed to use a polling strategy on the DMA status register to wait for the transfer completion.
- The dev_dma_async_memcpy( xptr_t dst_xp , xptr_t src_xp , uint32_t nbytes ) blocking function moves asynchronously <nbytes> bytes from a remote source buffer identified by the <src_xp> argument to another remote destination buffer identified by the <dst_xp> argument. It registers the request in the DMA waiting queue, and uses a descheduling policy for both the client and the server thread to wait for the transfer completion signaled by the DMA_IRQ. A usage sketch of both functions is given after this list.
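The usage sketch below copies a buffer between two clusters with each of the two functions, building the extended pointers as a hal_remote_memcpy() client would. The XPTR() construction and the cxy_t cluster identifier are assumed here; the surrounding example function is hypothetical.

/* Usage sketch: copy <size> bytes from a buffer in a source cluster to
 * a buffer in a destination cluster, first synchronously and then
 * asynchronously. The XPTR() macro and cxy_t type are assumed to build
 * the extended pointers, as for hal_remote_memcpy().                    */
void example_remote_copy( cxy_t     src_cxy , void * src_ptr ,
                          cxy_t     dst_cxy , void * dst_ptr ,
                          uint32_t  size )
{
    xptr_t src_xp = XPTR( src_cxy , src_ptr );
    xptr_t dst_xp = XPTR( dst_cxy , dst_ptr );

    /* synchronous copy: the driver polls the DMA status register        */
    dev_dma_sync_memcpy( dst_xp , src_xp , size );

    /* asynchronous copy: the request goes through the DMA waiting queue,
     * and the client thread deschedules until the DMA_IRQ completion    */
    dev_dma_async_memcpy( dst_xp , src_xp , size );
}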
D) The "driver" API
All DMA drivers must define three functions:
- void dma_driver_init( chdev_t *chdev )
- void dma_driver_cmd( xptr_t thread_xp )
- void dma_driver_isr( chdev_t * chdev )
The dma_driver_cmd() function arguments are actually defined in the dma_command_t structure embedded in the client thread descriptor. One command contains four fields (a possible layout is sketched after this list):
- sync : operation type (synchronous if true / asynchronous if false)
- size : number of bytes to be moved.
- src_xp : extended pointer on source buffer.
- dst_xp : extended pointer on destination buffer.
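A possible layout of the dma_command_t structure matching these four fields is sketched below; the additional error field is an assumption, suggested only by the fact that the driver and the ISR report the transfer status into this structure.

/* Possible dma_command_t layout, matching the four fields listed above.
 * The <error> field is an assumption: it only reflects the fact that the
 * driver / ISR reports the transfer status into this structure.          */
typedef struct dma_command_s
{
    bool_t     sync;      /* operation type: synchronous if true          */
    uint32_t   size;      /* number of bytes to be moved                  */
    xptr_t     src_xp;    /* extended pointer on source buffer            */
    xptr_t     dst_xp;    /* extended pointer on destination buffer       */
    uint32_t   error;     /* transfer status reported by the driver / ISR */
}
dma_command_t;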
For an asynchronous transfer this function must dynamically enable the DMA_IRQ before launching the transfer, and disable the DMA_IRQ when the transfer is completed. The dma_driver_isr() function is only used for asynchronous transfers. It acknowledges the DMA_IRQ, reports the transfer status into the dma_command_t structure, and reactivates the server thread.
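The sketch below summarizes this expected driver behaviour for both paths: the command function either polls the status register (synchronous) or enables the DMA_IRQ and deschedules (asynchronous), and the ISR is only involved on the asynchronous path. The GET_PTR() access, the dma_cmd field name, the impl_dma_*() helpers and thread_deschedule() are assumptions for a hypothetical hardware implementation.

/* Hypothetical dma_driver_cmd(): the command descriptor is embedded in
 * the client thread identified by <thread_xp>. For brevity it is shown
 * as if locally accessible; a real driver would use remote accesses when
 * the client thread resides in another cluster.                          */
void dma_driver_cmd( xptr_t thread_xp )
{
    thread_t      * client = GET_PTR( thread_xp );   /* assumed local access */
    dma_command_t * cmd    = &client->dma_cmd;       /* assumed field name   */

    if( cmd->sync )                      /* synchronous: polling, no IRQ  */
    {
        impl_dma_start( cmd->dst_xp , cmd->src_xp , cmd->size );
        while( impl_dma_busy() );        /* poll the DMA status register  */
        cmd->error = 0;                  /* report the completion status  */
    }
    else                                 /* asynchronous: IRQ + deschedule */
    {
        impl_dma_enable_irq();           /* dynamically enable the DMA_IRQ */
        impl_dma_start( cmd->dst_xp , cmd->src_xp , cmd->size );
        thread_deschedule();             /* wait until the ISR resumes us  */
        impl_dma_disable_irq();          /* transfer completed             */
    }
}

/* Hypothetical dma_driver_isr(), used only for asynchronous transfers.   */
void dma_driver_isr( chdev_t * chdev )
{
    impl_dma_ack_irq();                  /* acknowledge the DMA_IRQ        */

    /* report the transfer status into the dma_command_t structure of the
     * pending request, and reactivate the server thread attached to the
     * chdev (details depend on the hardware implementation)               */
    impl_dma_report_and_resume( chdev );
}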