Version 4 (modified by alain, 5 years ago)

--

NIC device API

A) General principles

This device allows the kernel to access an external generic Gigabit Ethernet network controller. It assumes that the NIC hardware peripheral has DMA capability and can access two packet queues in kernel memory, one for sent (TX) packets and one for received (RX) packets. Packets are Ethernet/IPv4.

The NIC device handles two (infinite) streams of packets, to and from the network. It is the driver's responsibility to move the RX packets from the hardware NIC to the software RX queue, and to move the TX packets from the software TX queue to the hardware NIC.

As the RX and TX queues are independent, there is one NIC-RX chdev descriptor to handle RX packets, and another NIC-TX chdev descriptor to handle TX packets. In order to improve throughput, the hardware NIC controller can (optionally) implement multiple (N) channels. To share the load between channels, the hardware is supposed to use a hash key:

  • The RX channels are indexed by a hash key derived from the source IP address.
  • The TX channels are indexed by a hash key derived from the destination IP address.
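
The channel selection described above can be sketched as follows. This is only an illustration: the actual hash computed by the hardware is not specified on this page, so the XOR-folding function nic_hash_ipv4() and the helpers nic_rx_channel() and nic_tx_channel() are hypothetical names and choices.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hash: XOR-fold the four bytes of an IPv4 address.
 * The hash really used by the NIC hardware is not documented here. */
static inline uint32_t nic_hash_ipv4(uint32_t ip)
{
    return (ip ^ (ip >> 8) ^ (ip >> 16) ^ (ip >> 24)) & 0xFFu;
}

/* RX channel index: derived from the source IP address. */
static inline uint32_t nic_rx_channel(uint32_t src_ip, uint32_t nb_channels)
{
    assert(nb_channels > 0);
    return nic_hash_ipv4(src_ip) % nb_channels;
}

/* TX channel index: derived from the destination IP address. */
static inline uint32_t nic_tx_channel(uint32_t dst_ip, uint32_t nb_channels)
{
    assert(nb_channels > 0);
    return nic_hash_ipv4(dst_ip) % nb_channels;
}
```

With any such hash, all packets coming from (or going to) a given IP address are handled by the same channel, hence by the same server thread.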

These 2*N chdev descriptors, and the 2*N associated server threads, are distributed in clusters.

The 2*N server threads implement the protocol stack.

The RX server threads block and deschedule when the RX queue is empty. The TX server threads block and deschedule when the TX queue is full. It is the driver's responsibility to re-activate a blocked server thread when the queue state is modified: not full for TX, or not empty for RX.

The WTI mailboxes used to receive the NIC_TX_IRQ[N] and NIC_RX_IRQ[N] interrupts (one IRQ per channel and per direction), which signal an available RX packet or a free TX slot for a given channel, must be statically allocated during the kernel initialisation phase. They are routed to the cluster containing the associated chdev descriptor and the associated server thread.

The generic NIC device "kernel" API defines two functions:

  • the read() function is called by the RX server thread to get one packet from the RX queue.
  • the write() function is called by the TX server thread to put one packet into the TX queue.

This "kernel" API is detailed in section C below.

All RX or TX packets are sent or received in standard 2 Kbytes kernel buffers that are dynamically allocated by the server threads. The pkd_t structure, defining a packet descriptor, contains the buffer pointer and the actual Ethernet packet length (in bytes).
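
As an illustration, a packet descriptor could look like the sketch below. Only the two pieces of information (buffer pointer and length) come from this page; the field names, the NIC_BUFFER_SIZE constant, and the pkd_alloc() / pkd_free() helpers are assumptions, and malloc() stands in for the kernel memory allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NIC_BUFFER_SIZE 2048   /* standard 2 Kbytes kernel buffer */

/* Packet descriptor (field names are assumptions). */
typedef struct pkd_s
{
    void     * buffer;   /* pointer on the 2 Kbytes buffer           */
    uint32_t   length;   /* actual Ethernet packet length (in bytes) */
} pkd_t;

/* Hypothetical helper: allocate the buffer of an empty descriptor.
 * Returns 0 on success, -1 if no memory is available. */
static int pkd_alloc(pkd_t * pkd)
{
    assert(pkd != NULL);
    pkd->buffer = malloc(NIC_BUFFER_SIZE);
    pkd->length = 0;
    return (pkd->buffer != NULL) ? 0 : -1;
}

/* Hypothetical helper: release the buffer and reset the descriptor. */
static void pkd_free(pkd_t * pkd)
{
    assert(pkd != NULL);
    free(pkd->buffer);
    pkd->buffer = NULL;
    pkd->length = 0;
}
```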

The actual structure of the TX and RX queues depends on the hardware NIC implementation, and these queues are only accessed by the driver functions.

To access the drivers, the NIC device defines a lower-level "driver" API, containing four command types, that are detailed in section D below.

All NIC device structures and access functions are defined in the dev_nic.c and dev_nic.h files.

B) Initialisation

The dev_nic_init() function performs the following initializations for a given NIC chdev:

  • It selects a core in the cluster containing the chdev to execute the server thread.
  • It links the NIC IRQ to the core executing the server thread.
  • It initialises the NIC-specific fields of the chdev descriptor.
  • It calls the nic_driver_init() function to initialize the NIC hardware device.
  • It initializes the specific software data structures required by the hardware implementation.

It must be called by a local thread.

C) The "kernel" API

The read function is always called by the DEV server thread associated with a given NIC_RX chdev. The write function is always called by the DEV server thread associated with a given NIC_TX chdev. For both functions, the local pointer on the chdev descriptor is registered in the server thread descriptor, and the channel index is registered in the chdev descriptor.

These two functions are blocking and return only when the transfer is completed.

  • The dev_nic_read( pkd_t * pkd ) function reads one Ethernet/IPv4 packet from the NIC_RX queue associated with the NIC channel. It calls the NIC driver directly, without registering in a waiting queue, because only one dedicated NIC_RX thread can access a given NIC_RX queue.
    1. It tests the NIC_RX queue status, using the NIC_CMD_READABLE driver command. If the NIC_RX queue is empty, it unmasks the NIC_RX_IRQ, blocks and deschedules. It is re-activated by the nic_driver_isr() function (activated by the NIC_RX_IRQ) as soon as the queue becomes not empty.
    2. If the queue is not empty, it gets one packet, using the NIC_CMD_READ driver command, and returns.

Both commands are successively registered in the NIC-RX server thread descriptor to be passed to the driver.
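
The two steps of the read function can be sketched in user space as shown below. This is not the kernel code: the rx_queue_t ring, the rx_hw_push() function playing the role of the NIC hardware, and rx_block_until_irq() are hypothetical stand-ins. In particular, the "block and deschedule" step is simulated by directly delivering a packet, as the ISR wake-up would follow a packet arrival.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SLOTS     4
#define BUF_SIZE  2048

typedef struct { void * buffer; uint32_t length; } pkd_t;  /* assumed layout */

/* Toy RX queue standing in for the hardware-specific one. */
typedef struct
{
    uint8_t  data[SLOTS][BUF_SIZE];
    uint32_t len[SLOTS];
    uint32_t head, count;
    uint32_t blocked;            /* times the reader had to block */
} rx_queue_t;

/* Plays the role of NIC_CMD_READABLE. */
static bool rx_readable(const rx_queue_t * q) { return q->count > 0; }

/* Simulates the NIC hardware depositing one received packet. */
static bool rx_hw_push(rx_queue_t * q, const void * src, uint32_t len)
{
    if (q->count == SLOTS || len > BUF_SIZE) return false;
    uint32_t tail = (q->head + q->count) % SLOTS;
    memcpy(q->data[tail], src, len);
    q->len[tail] = len;
    q->count++;
    return true;
}

/* Stand-in for "unmask IRQ, block, deschedule": the wake-up by the ISR
 * is simulated by the arrival of one packet. */
static void rx_block_until_irq(rx_queue_t * q)
{
    static const char frame[] = "late packet";
    q->blocked++;
    rx_hw_push(q, frame, sizeof(frame));
}

static void sketch_nic_read(rx_queue_t * q, pkd_t * pkd)
{
    assert(q != NULL && pkd != NULL && pkd->buffer != NULL);

    /* step 1: test the queue status, block while it is empty */
    while (!rx_readable(q)) rx_block_until_irq(q);

    /* step 2: consume one packet (role of NIC_CMD_READ) */
    memcpy(pkd->buffer, q->data[q->head], q->len[q->head]);
    pkd->length = q->len[q->head];
    q->head = (q->head + 1) % SLOTS;
    q->count--;
}
```

The loop around the status test reflects the fact that the thread re-checks the queue after being re-activated.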

WARNING : for an RX packet, the initiator is the NIC hardware, and the protocol stack executed by the RX thread is traversed upward, from the point of view of function calls.

  • The dev_nic_write( pkd_t * pkd ) function writes one Ethernet/IPv4 packet to the NIC_TX queue associated with the NIC channel. It calls the NIC driver directly, without registering in a waiting queue, because only one dedicated NIC_TX thread can access this NIC_TX queue.
    1. It tests the NIC_TX queue status, using the NIC_CMD_WRITABLE driver command. If the NIC_TX queue is full, it unmasks the NIC_TX_IRQ, blocks and deschedules. It is re-activated by the nic_driver_isr() function (activated by the NIC_TX_IRQ) as soon as the queue becomes not full.
    2. If the queue is not full, it puts one packet, using the NIC_CMD_WRITE driver command, and returns.

Both commands are successively registered in the NIC-TX server thread descriptor to be passed to the driver.
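
The write path can be sketched in the same spirit. Again, this is a user-space illustration under assumptions: the tx_queue_t ring, tx_hw_pop() (the hardware sending one packet) and tx_block_until_irq() are hypothetical, and blocking is simulated by letting the hardware free one slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SLOTS     2
#define BUF_SIZE  2048

typedef struct { void * buffer; uint32_t length; } pkd_t;  /* assumed layout */

/* Toy TX queue standing in for the hardware-specific one. */
typedef struct
{
    uint8_t  data[SLOTS][BUF_SIZE];
    uint32_t len[SLOTS];
    uint32_t head, count;
    uint32_t sent;               /* packets drained by the simulated NIC */
    uint32_t blocked;            /* times the writer had to block        */
} tx_queue_t;

/* Plays the role of NIC_CMD_WRITABLE. */
static bool tx_writable(const tx_queue_t * q) { return q->count < SLOTS; }

/* Simulates the NIC hardware sending the oldest packet, freeing a slot. */
static void tx_hw_pop(tx_queue_t * q)
{
    if (q->count == 0) return;
    q->head = (q->head + 1) % SLOTS;
    q->count--;
    q->sent++;
}

/* Stand-in for "unmask IRQ, block, deschedule": the wake-up by the ISR
 * is simulated by the hardware freeing one slot. */
static void tx_block_until_irq(tx_queue_t * q)
{
    q->blocked++;
    tx_hw_pop(q);
}

static void sketch_nic_write(tx_queue_t * q, const pkd_t * pkd)
{
    assert(q != NULL && pkd != NULL && pkd->length <= BUF_SIZE);

    /* step 1: test the queue status, block while it is full */
    while (!tx_writable(q)) tx_block_until_irq(q);

    /* step 2: produce one packet (role of NIC_CMD_WRITE) */
    uint32_t tail = (q->head + q->count) % SLOTS;
    memcpy(q->data[tail], pkd->buffer, pkd->length);
    q->len[tail] = pkd->length;
    q->count++;
}
```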

WARNING : for a TX packet, the initiator is the client thread, and the protocol stack executed by the TX thread is traversed downward, from the point of view of function calls.

D) The "driver" API

All NIC drivers must define three functions :

  • void nic_driver_init( chdev_t * chdev )
  • void nic_driver_cmd( xptr_t thread_xp )
  • void nic_driver_isr( chdev_t * chdev )

The nic_driver_cmd() function arguments are actually defined in the nic_command_t structure embedded in the server thread descriptor. One command contains four fields:

  • type : operation type (defined below)
  • buffer : local pointer on kernel buffer containing one packet.
  • length : packet length (in bytes).
  • status : return value for READABLE and WRITABLE

The four command types for the NIC driver(s) are the following:

  • NIC_CMD_READABLE : returns true if at least one RX packet is available in RX queue.
  • NIC_CMD_WRITABLE : returns true if at least one empty slot is available in TX queue.
  • NIC_CMD_READ : consume one packet from the RX queue.
  • NIC_CMD_WRITE : produce one packet to the TX queue.
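
The command structure and the four command types can be illustrated by the sketch below. The four field names and the four command names come from this page, but the field types, the enum name nic_cmd_type_t, and the toy_nic_t / toy_nic_cmd() dispatcher are assumptions. Note also that the real nic_driver_cmd() receives an extended pointer on the server thread and finds the command there, whereas this sketch receives the command directly and only counts packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The four command types (enum name and values are assumptions). */
typedef enum
{
    NIC_CMD_READABLE,   /* is at least one RX packet available ?  */
    NIC_CMD_WRITABLE,   /* is at least one TX slot free ?         */
    NIC_CMD_READ,       /* consume one packet from the RX queue   */
    NIC_CMD_WRITE,      /* produce one packet to the TX queue     */
} nic_cmd_type_t;

/* The four fields of one command (types are assumptions). */
typedef struct
{
    nic_cmd_type_t type;     /* operation type                          */
    void         * buffer;   /* kernel buffer containing one packet     */
    uint32_t       length;   /* packet length (in bytes)                */
    bool           status;   /* return value for READABLE and WRITABLE  */
} nic_command_t;

/* Toy driver state: packet counters only, no payload is moved. */
typedef struct
{
    uint32_t rx_count;   /* packets waiting in the RX queue */
    uint32_t tx_count;   /* packets waiting in the TX queue */
    uint32_t tx_slots;   /* TX queue capacity               */
} toy_nic_t;

/* Hypothetical dispatcher executing one command on the toy state. */
static void toy_nic_cmd(toy_nic_t * nic, nic_command_t * cmd)
{
    assert(nic != NULL && cmd != NULL);
    switch (cmd->type)
    {
        case NIC_CMD_READABLE:
            cmd->status = (nic->rx_count > 0);
            break;
        case NIC_CMD_WRITABLE:
            cmd->status = (nic->tx_count < nic->tx_slots);
            break;
        case NIC_CMD_READ:
            assert(nic->rx_count > 0);      /* caller tested READABLE */
            nic->rx_count--;
            break;
        case NIC_CMD_WRITE:
            assert(nic->tx_count < nic->tx_slots);  /* caller tested WRITABLE */
            nic->tx_count++;
            break;
    }
}
```

The asserts in the READ and WRITE cases express the calling convention described in section C: the server thread always tests the queue status before moving a packet.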