source: trunk/kernel/kern/ksocket.h @ 690

Last change on this file since 690 was 683, checked in by alain, 4 years ago

All modifications required to support the <tcp_chat> application
including error recovery in case of packet loss.

File size: 41.7 KB
1/*
2 * ksocket.h - kernel socket definition.
3 *
4 * Authors  Alain Greiner    (2016,2017,2018,2019,2020)
5 *
6 * Copyright (c) UPMC Sorbonne Universites
7 *
8 * This file is part of ALMOS-MKH
9 *
10 * ALMOS-MKH is free software; you can redistribute it and/or modify it
11 * under the terms of the GNU General Public License as published by
12 * the Free Software Foundation; version 2.0 of the License.
13 *
14 * ALMOS-MKH is distributed in the hope that it will be useful, but
15 * WITHOUT ANY WARRANTY; without even the implied warranty of
16 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
17 * General Public License for more details.
18 *
19 * You should have received a copy of the GNU General Public License
20 * along with ALMOS-MKH; if not, write to the Free Software Foundation,
21 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
22 */
23
24#ifndef _KSOCKET_H_
25#define _KSOCKET_H_
26
27#include <kernel_config.h>
28#include <hal_kernel_types.h>
29#include <xlist.h>
30#include <remote_buf.h>
31#include <remote_busylock.h>
32
33/*****************************************************************************************
34 * This structure defines a kernel socket descriptor, used for both UDP and TCP sockets.
35 * A socket is a private resource used by at most two user threads : one TX client
36 * thread to send packets, and one RX client thread, to receive packets. The TX client
37 * thread and the RX client thread can be the same thread.
38 *
39 * When the Network Interface Controller contains several channels, the set of all
40 * existing sockets is split in as many subsets as the number of NIC channels, in order
41 * to parallelize the transfers. The distribution key defining the channel index
42 * is computed from the (remote_addr/remote_port) couple (by the NIC hardware for the
43 * RX packets; by the software for the TX packets) using a dedicated NIC driver function.
44 * All sockets that have the same key share the same channel, and each socket is
45 * therefore linked to two chdevs : NIC_TX[key] & NIC_RX[key].
46 * The socket allows the NIC_TX and NIC_RX server threads to access various buffers:
47 * - the kernel "tx_buf" buffer contains the data to be sent by the TX server thread.
48 *   It is dynamically allocated, and used as retransmission buffer when required.
49 * - the kernel "rx_buf" buffer contains the data received by the RX server thread.
50 *   It is allocated in the socket and handled as a single-writer / single-reader FIFO.
51 * - the kernel "r2t" buffer allows the RX server thread to make direct requests
52 *   to the associated TX server (mainly used to handle the TCP ACKs).
53 * - the kernel "crq" buffer stores concurrent connection requests from remote clients
54 *   to a local server socket.
55 *
56 * The synchronisation mechanism between the client threads and the server threads
57 * is different for the TX and RX directions:
58 *
59 * 1) TX direction (sent packets)
60 *
61 * - The internal API between the TX client thread and the NIC_TX server thread defines
62 *   four command types, stored in the "tx_cmd" variable of the socket descriptor:
63 *   . CMD_TX_CONNECT : request to start the connection handshake (TCP client only).
64 *   . CMD_TX_ACCEPT  : request to accept one connection request (TCP server only).
65 *   . CMD_TX_SEND    : local (UDP/TCP) request to send data to a remote (UDP/TCP).
66 *   . CMD_TX_CLOSE   : local TCP socket requests the remote TCP socket to close.
67 * - All commands are blocking for the TX client thread: to post a command, the TX client
68 *   registers the command type in the socket "tx_cmd" field, sets the "tx_valid" field,
69 *   resets the "tx_sts" field, and registers itself in the "tx_client" field.
70 *   Then, it unblocks the TX server thread from the BLOCKED_CLIENT condition, blocks itself
71 *   on the BLOCKED_IO condition, and deschedules. For a SEND, the client thread copies
72 *   the payload contained in the "u_buf" user buffer to the socket "tx_buf" kernel buffer
73 *   that is used as retransmission buffer, when required.
74 * - A command is valid for the TX server when the socket descriptor "tx_valid" is true.
75 *   For a SEND command, the "tx_valid" is reset by the NIC_TX server thread when the last
76 *   byte has been sent, but the TX client thread is unblocked by the NIC_RX server thread
77 *   only when the last byte has been acknowledged, or to report an error.
78 *   For the CONNECT, ACCEPT and CLOSE commands, the "tx_valid" is reset by the NIC_TX server
79 *   when the first segment of the handshake has been sent, but the TX client thread is
80 *   unblocked by the NIC_RX server thread only when the handshake is actually completed.
81 *   The TX server thread acts as a multiplexer. It scans the list of attached sockets,
82 *   to sequentially handle the valid commands: one UDP packet or TCP segment per iteration.
83 *   The TX server blocks and deschedules on the BLOCKED_CLIENT condition when there is
84 *   no more valid TX command or R2T request registered in any socket. It is unblocked
85 *   from BLOCKED_CLIENT by a client thread registering a TX command, or by the RX server
86 *   thread registering an R2T request. The TX server thread signals an error to the TX
87 *   client thread using the "tx_sts" field in the socket descriptor.
88 *   When "tx_valid" or "r2t_valid" is true, the TX server thread builds and sends a UDP
89 *   packet or TCP segment. A single SEND command can require a large number of TCP
90 *   segments to move a big data buffer, before unblocking the client thread.
91 *   The TX server thread blocks and deschedules on the BLOCKED_ISR condition when
92 *   the NIC_TX queue is full. It is unblocked by the hardware NIC_TX_ISR.
93 * - As multiple simultaneous TX accesses to the same socket are forbidden, the client
94 *   thread makes a double check before posting a new TX command :
95 *   the "tx_valid" field must be false, and the "tx_client" field must be XPTR_NULL.
96 *   The "tx_valid" field is reset by the TX server thread, and the "tx_client"
97 *   field is reset by the TX client thread itself, when it resumes after a TX command.
98 *   . For a SEND command on a UDP socket, the TX server thread resets "tx_valid" and
99 *     unblocks the TX client thread as soon as the last data byte has been sent.
100 *   . For a SEND command on a TCP socket, the TX server thread resets "tx_valid" when the
101 *     last data byte has been sent, but the TX client thread is unblocked by the RX server
102 *     only when the last data byte has been acknowledged by the remote socket.
103 *   . For the CONNECT or ACCEPT commands, the "tx_valid" flag is reset and the TX client
104 *     thread is unblocked by the RX server thread only when the command is completed,
105 *     and the local TCP socket is actually in the ESTAB state.
106 *   . For a CLOSE command, the "tx_valid" flag is reset, and the TX client thread is
107 *     unblocked by the RX server thread only when the remote socket is disconnected.
108 *
109 * 2) RX direction (received packets)
110 *
111 * - The internal API between the RX client thread and the RX server thread defines two
112 *   command types stored in the "rx_cmd" variable of the socket descriptor:
113 *   . CMD_RX_ACCEPT : TCP server waits for a connection request from the CRQ queue.
114 *   . CMD_RX_RECV   : local (UDP/TCP) socket expects data from a remote (UDP/TCP).
115 *   For the RECV command the communication is done through the "rx_buf" buffer,
116 *   attached to the socket, and handled as a single-writer / single-reader FIFO.
117 *   For the ACCEPT command the communication is done through the CRQ buffer, attached
118 *   to the socket, and handled as a single-writer / single-reader FIFO.
119 *   These two commands are blocking for the RX client thread as long as the buffer is
120 *   empty. The client thread sets the socket "rx_valid" field, resets the "rx_sts" field,
121 *   registers itself in the "rx_client" field, and blocks on the BLOCKED_IO condition.
122 * - The RX server thread acts as a demultiplexer: it handles one received TCP segment,
123 *   or UDP packet per iteration in the loop on the NIC_RX queue, and moves the data to
124 *   the relevant buffer of the socket matching the packet. It discards packets that don't
125 *   match a registered socket. When a client thread is registered in the socket descriptor,
126 *   the RX server thread resets the "rx_valid" field and unblocks the RX client thread from
127 *   the BLOCKED_IO condition as soon as there is data available in the "rx_buf".
128 *   The RX server thread blocks and deschedules on the BLOCKED_ISR condition when there
129 *   are no more packets in the NIC_RX queue. It is unblocked by the hardware NIC_RX_ISR.
130 * - In order to detect and report errors for multiple simultaneous RX accesses to the same
131 *   socket, the RX client thread makes a double check before posting a new RX command :
132 *   the "rx_valid" field must be false, and the "rx_client" field must be XPTR_NULL.
133 *   The "rx_valid" field is reset by the RX server thread, and the "rx_client"
134 *   field is reset by the RX client thread itself, when it resumes after an RX command.
135 *
136 * 3) R2T queue
137 *
138 * The RX server thread can directly request the associated TX server thread to send
139 * control packets in the TX stream, using a dedicated R2T (RX to TX) queue embedded in
140 * the socket descriptor, and implemented as a remote_buf_t FIFO.
141 * It is used for TCP acknowledgements and for the TCP three-way handshake.
142 * Each R2T request occupies exactly one byte defining the TCP flags to be set.
143 *
144 * 4) CRQ queue
145 *
146 * The remote CONNECT requests received by a TCP socket (SYN segments) are stored in a
147 * dedicated CRQ queue, and consumed by the local client thread executing an ACCEPT.
148 * This CRQ queue is embedded in the local socket descriptor, and implemented as a
149 * remote_buf_t FIFO. Each request occupies sizeof(connect_request_t) bytes in the queue.
150 * The connect_request_t structure containing the request arguments is defined below.
151 *
152 * Note : the socket domains and types are defined in the "shared_socket.h" file.
153 ****************************************************************************************/
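
The TX posting rules described above (double check on "tx_valid" and "tx_client", "tx_valid" reset by the server, "tx_client" reset by the resuming client) can be sketched as follows. This is a minimal single-threaded model under stated assumptions: all sk_* names are hypothetical, a plain integer stands in for xptr_t, and locking, thread blocking, and the real NIC_TX server loop are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types (not the real ALMOS-MKH definitions). */
typedef uint64_t sk_xptr_t;
#define SK_XPTR_NULL  ((sk_xptr_t)0)

typedef enum
{
    SK_TX_CONNECT = 20,
    SK_TX_ACCEPT  = 21,
    SK_TX_CLOSE   = 22,
    SK_TX_SEND    = 23,
}
sk_tx_cmd_t;

typedef struct
{
    sk_xptr_t    tx_client;   /* registered TX client thread, or SK_XPTR_NULL */
    bool         tx_valid;    /* set by the client, reset by the TX server    */
    sk_tx_cmd_t  tx_cmd;      /* posted command type                          */
    uint32_t     tx_sts;      /* command status, 0 means success              */
}
sk_socket_t;

/* Client side: post a command. Returns 0 if posted / -1 if the socket is busy. */
int sk_tx_post( sk_socket_t * s, sk_xptr_t client, sk_tx_cmd_t cmd )
{
    assert( s != NULL && client != SK_XPTR_NULL );

    /* double check: no pending command and no registered client */
    if( s->tx_valid || (s->tx_client != SK_XPTR_NULL) ) return -1;

    s->tx_cmd    = cmd;
    s->tx_sts    = 0;
    s->tx_client = client;
    s->tx_valid  = true;      /* set last in this sketch */
    return 0;
}

/* Server side: report a status and release the command slot. */
void sk_tx_server_done( sk_socket_t * s, uint32_t sts )
{
    s->tx_sts   = sts;
    s->tx_valid = false;
}

/* Client side: executed when the client thread resumes after the command. */
uint32_t sk_tx_client_resume( sk_socket_t * s )
{
    s->tx_client = SK_XPTR_NULL;
    return s->tx_sts;
}
```

In this model a second sk_tx_post() keeps failing after sk_tx_server_done() until the first client has called sk_tx_client_resume(), which is the point of checking both fields.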
154
155/*****************************************************************************************
156 * This enum defines the set of commands that can be registered in the socket
157 * by the TX & RX client threads to be executed by the NIC_TX & NIC_RX server threads.
158 ****************************************************************************************/
159typedef enum socket_cmd_type_e
160{
161    CMD_TX_CONNECT      = 20,         /*! request a SYN segment     (TCP only)          */
162    CMD_TX_ACCEPT       = 21,         /*! request a SYN-ACK segment (TCP only)          */
163    CMD_TX_CLOSE        = 22,         /*! request a RST segment     (TCP only)          */
164    CMD_TX_SEND         = 23,         /*! request to send data      (TCP or UDP)        */
165
166    CMD_RX_ACCEPT       = 30,         /*! wait request from CRQ     (TCP only)          */ 
167    CMD_RX_RECV         = 31,         /*! wait DATA from rx_buf     (TCP or UDP)        */
168}
169socket_cmd_type_t;
170
171/*****************************************************************************************
172 * This enum defines the set of command statuses that can be returned by the NIC_RX and
173 * NIC_TX server threads to the TX & RX client threads.
174 ****************************************************************************************/
175typedef enum socket_cmd_sts_e
176{
177    CMD_STS_SUCCESS     =  0,
178    CMD_STS_EOF         =  1,
179    CMD_STS_RST         =  2,
180    CMD_STS_BADACK      =  3,
181    CMD_STS_BADSTATE    =  4,
182    CMD_STS_BADCMD      =  5,
183}
184socket_cmd_sts_t;
185
186/*****************************************************************************************
187 * This enum defines the set of states for a UDP socket.
188 ****************************************************************************************/
189typedef enum udp_socket_state_e
190{
191    UDP_STATE_UNBOUND    = 0x00,
192    UDP_STATE_BOUND      = 0x01,
193    UDP_STATE_ESTAB      = 0x02,
194}
195udp_socket_state_t;
196
197/*****************************************************************************************
198 * This enum defines the set of states for a TCP socket.
199 ****************************************************************************************/
200typedef enum tcp_socket_state_e
201{
202    TCP_STATE_UNBOUND    = 0x10,
203    TCP_STATE_BOUND      = 0x11,
204    TCP_STATE_LISTEN     = 0x12,
205    TCP_STATE_SYN_SENT   = 0x13,
206    TCP_STATE_SYN_RCVD   = 0x14,
207    TCP_STATE_ESTAB      = 0x15,
208    TCP_STATE_FIN_WAIT1  = 0x16,
209    TCP_STATE_FIN_WAIT2  = 0x17,
210    TCP_STATE_CLOSING    = 0x18,
211    TCP_STATE_TIME_WAIT  = 0x19,
212    TCP_STATE_CLOSE_WAIT = 0x1A,
213    TCP_STATE_LAST_ACK   = 0x1B,
214    TCP_STATE_CLOSED     = 0x1C,
215}
216tcp_socket_state_t;
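
The two enums above use disjoint value ranges (0x00 to 0x02 for UDP, 0x10 to 0x1C for TCP), and the socket descriptor stores the state in a single uint32_t field. A minimal sketch of how a caller could tell the two families apart is given below. The sk_* helper names are hypothetical, and relying on the ranges in this way is an assumption, not something the header states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range bounds copied from the udp_socket_state_t and tcp_socket_state_t values. */
enum
{
    SK_UDP_STATE_FIRST = 0x00,   /* UDP_STATE_UNBOUND */
    SK_UDP_STATE_LAST  = 0x02,   /* UDP_STATE_ESTAB   */
    SK_TCP_STATE_FIRST = 0x10,   /* TCP_STATE_UNBOUND */
    SK_TCP_STATE_LAST  = 0x1C,   /* TCP_STATE_CLOSED  */
};

/* true if <state> is one of the three UDP states */
bool sk_state_is_udp( uint32_t state )
{
    return state <= SK_UDP_STATE_LAST;
}

/* true if <state> is one of the thirteen TCP states */
bool sk_state_is_tcp( uint32_t state )
{
    return (state >= SK_TCP_STATE_FIRST) && (state <= SK_TCP_STATE_LAST);
}

/* usage: the two predicates never both hold for the same value */
void sk_state_demo( void )
{
    assert(  sk_state_is_udp( SK_UDP_STATE_LAST ) );
    assert( !sk_state_is_tcp( SK_UDP_STATE_LAST ) );
    assert(  sk_state_is_tcp( SK_TCP_STATE_LAST ) );
    assert( !sk_state_is_udp( SK_TCP_STATE_FIRST ) );
}
```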
217
218/****************************************************************************************
219 * This structure defines one connection request, registered in the CRQ queue.
220 ***************************************************************************************/
221typedef struct connect_request_s
222{
223    uint32_t          addr;          /* requesting socket IP address                   */
224    uint32_t          port;          /* requesting socket port number                  */
225    uint32_t          iss;           /* requesting socket initial sequence number      */
226    uint32_t          window;        /* requesting socket receive window               */
227}         
228connect_request_t;
229
230/****************************************************************************************
231 * This structure defines the socket descriptor.
232 ***************************************************************************************/
233typedef struct socket_s
234{
235    remote_queuelock_t lock;         /*! lock protecting socket state                  */
236    pid_t              pid;          /*! owner process identifier                      */
237    uint32_t           fdid;         /*! associated file descriptor index              */
238    uint32_t           domain;       /*! domain : AF_LOCAL / AF_INET                   */
239    uint32_t           type;         /*! type : SOCK_DGRAM / SOCK_STREAM               */
240    uint32_t           state;        /*! socket state (see above)                      */
241    uint32_t           local_addr;   /*! local  socket IP address                      */
242    uint32_t           remote_addr;  /*! remote socket IP address                      */
243    uint32_t           local_port;   /*! local  socket port number                     */
244    uint32_t           remote_port;  /*! remote socket port number                     */
245    uint32_t           nic_channel;  /*! derived from (remote_addr,remote_port)        */
246
247    xlist_entry_t      tx_list;      /*! all sockets attached to same NIC_TX channel   */
248    xptr_t             tx_client;    /*! extended pointer on current TX client thread  */
249    bool_t             tx_valid;     /*! TX command valid                              */
250    socket_cmd_type_t  tx_cmd;       /*! TX command (CONNECT / ACCEPT / SEND / CLOSE)  */
251    uint32_t           tx_sts;       /*! signal a TX command success / failure         */
252    uint8_t         *  tx_buf;       /*! pointer on TX data buffer in kernel space     */
253    uint32_t           tx_len;       /*! number of data bytes for a SEND command       */
254    uint32_t           tx_todo;      /*! number of bytes not yet sent in tx_buf        */
255    uint32_t           tx_ack;       /*! number of bytes acknowledged in tx_buf        */
256
257    xlist_entry_t      rx_list;      /*! all sockets attached to same NIC_RX channel   */
258    xptr_t             rx_client;    /*! extended pointer on current RX client thread  */
259    bool_t             rx_valid;     /*! RX command valid                              */
260    socket_cmd_type_t  rx_cmd;       /*! RX command ( ACCEPT / RECV )                  */
261    uint32_t           rx_sts;       /*! signal a RX command success / failure         */
262    remote_buf_t       rx_buf;       /*! embedded receive buffer descriptor            */
263
264    remote_buf_t       r2tq;         /*! RX_to_TX requests queue descriptor            */
265    remote_buf_t       crqq;         /*! connection requests queue descriptor          */
266
267    /* the following fields define the TCB (only used for a TCP connection)            */
268
269    uint32_t           tx_nxt;       /*! next byte to send in TX_data stream           */
270    uint32_t           tx_wnd;       /*! number of acceptable bytes in TX_data stream  */
271    uint32_t           tx_una;       /*! first unack byte in TX_data stream            */
272
273    uint32_t           rx_nxt;       /*! next expected byte in RX_data stream          */
274    uint32_t           rx_wnd;       /*! number of acceptable bytes in RX_data stream  */
275    uint32_t           rx_irs;       /*! initial sequence number in RX_data stream     */
276}
277socket_t;
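
The TCB fields "tx_una", "tx_nxt" and "tx_wnd" support the usual TCP sender arithmetic. The sketch below assumes they behave like SND.UNA, SND.NXT and SND.WND in the TCP specification, that is, "tx_wnd" counts acceptable bytes starting at "tx_una". The header does not confirm this, and the sk_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the TCB fields of socket_t. */
typedef struct
{
    uint32_t  tx_nxt;   /* next byte to send                   */
    uint32_t  tx_wnd;   /* number of acceptable bytes (window) */
    uint32_t  tx_una;   /* first unacknowledged byte           */
}
sk_tcb_t;

/* bytes sent but not yet acknowledged (unsigned subtraction is modulo 2^32) */
uint32_t sk_tx_in_flight( const sk_tcb_t * t )
{
    assert( t != NULL );
    return t->tx_nxt - t->tx_una;
}

/* bytes that can still be sent before the window is exhausted */
uint32_t sk_tx_usable( const sk_tcb_t * t )
{
    uint32_t in_flight = sk_tx_in_flight( t );
    return (in_flight >= t->tx_wnd) ? 0 : (t->tx_wnd - in_flight);
}

/* returns 1 if <ack> acknowledges new data: tx_una < ack <= tx_nxt (modulo 2^32) */
int sk_ack_is_new( const sk_tcb_t * t, uint32_t ack )
{
    uint32_t d = ack - t->tx_una;
    return (d != 0) && (d <= sk_tx_in_flight( t ));
}
```

Working with differences rather than comparing sequence numbers directly keeps the checks valid when the 32-bit sequence space wraps around.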
278
279/****************************************************************************************
280 * This function returns a printable string for a socket domain.
281 ****************************************************************************************
282 * domain   :  AF_INET / AF_LOCAL
283 ***************************************************************************************/
284char * socket_domain_str( uint32_t domain );
285
286/****************************************************************************************
287 * This function returns a printable string for a socket type.
288 ****************************************************************************************
289 * type   :  SOCK_DGRAM / SOCK_STREAM
290 ***************************************************************************************/
291char * socket_type_str( uint32_t type );
292
293/****************************************************************************************
294 * This function returns a printable string for an UDP or TCP socket state.
295 ****************************************************************************************
296 * state  :  UDP_STATE_*** / TCP_STATE_***
297 ***************************************************************************************/
298char * socket_state_str( uint32_t state );
299
300/****************************************************************************************
301 * This function returns a printable string for a command type.
302 ****************************************************************************************
303 * type  :  command type
304 ***************************************************************************************/
305char * socket_cmd_type_str( uint32_t type );
306
307/****************************************************************************************
308 * This function returns a printable string for a command status.
309 ****************************************************************************************
310 * sts   : command status.
311 ***************************************************************************************/
312char * socket_cmd_sts_str( uint32_t sts );
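
One possible shape for such a status-to-string helper is a bounds-checked table lookup indexed by the socket_cmd_sts_t value. The sketch below is only an illustration: the returned strings and the sk_* names are hypothetical, and the strings produced by the real socket_cmd_sts_str() are not known from this header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table, indexed by the CMD_STS_* values 0 to 5. */
static const char * const sk_sts_names[] =
{
    "SUCCESS", "EOF", "RST", "BADACK", "BADSTATE", "BADCMD",
};

/* returns a printable string, or "undefined" for an out-of-range status */
const char * sk_cmd_sts_str( uint32_t sts )
{
    if( sts >= sizeof(sk_sts_names) / sizeof(sk_sts_names[0]) ) return "undefined";
    return sk_sts_names[sts];
}

/* usage: status 2 is CMD_STS_RST in the enum above */
void sk_cmd_sts_demo( void )
{
    assert( sk_cmd_sts_str( 2 )[0] == 'R' );
    assert( sk_cmd_sts_str( 99 )[0] == 'u' );
}
```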
313
314
315
316/****************************************************************************************
317 *      Functions used by the NIC_TX and NIC_RX server threads.
318 ***************************************************************************************/
319
320/****************************************************************************************
321 * This blocking function is called by the dev_nic_rx_handle_tcp() function, executed by
322 * the NIC_RX[channel] server thread, to register a R2T request defined by the <flags>
323 * argument in the socket R2T queue, specified by the <queue_xp> argument.
324 * This function unblocks the NIC_TX[channel] server thread, identified by the <channel>
325 * argument, from the THREAD_BLOCKED_CLIENT condition.
326 *
327 * WARNING : It contains a waiting loop and returns only when an empty slot has been
328 * found in the R2T queue.
329 ****************************************************************************************
330 * @ queue_xp   : [in] extended pointer on the R2T queue descriptor.
331 * @ flags      : [in] flags to be set in the TCP segment.
332 * @ channel    : [in] NIC channel (both TX & RX).
333 ***************************************************************************************/
334void socket_put_r2t_request( xptr_t    queue_xp,
335                             uint8_t   flags,
336                             uint32_t  channel );
337
338/****************************************************************************************
339 * This function is called by the nic_tx_server thread to extract an R2T request
340 * (one byte) from a R2T queue, specified by the <queue_xp> argument, to the buffer
341 * defined by the <flags> argument.
342 *****************************************************************************************
343 * @ queue_xp      : [in]  extended pointer on the R2T queue descriptor.
344 * @ flags         : [out] buffer for TCP flags to be set.
345 * @ return 0 if success / return -1 if queue empty.
346 ***************************************************************************************/
347error_t socket_get_r2t_request( xptr_t    queue_xp,
348                                uint8_t * flags );
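
The R2T queue holds one byte of TCP flags per request. A minimal local ring-buffer model of the put/get pair is sketched below. The real queue is a remote_buf_t accessed through extended pointers, and the real put blocks in a waiting loop. Here the put is non-blocking and reports a full queue so that a caller could retry. The sk_* names and the depth are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_R2T_DEPTH  4      /* hypothetical depth, chosen for the example */

typedef struct
{
    uint8_t   data[SK_R2T_DEPTH];   /* one byte of TCP flags per request */
    uint32_t  head;                 /* read index                        */
    uint32_t  tail;                 /* write index                       */
    uint32_t  count;                /* number of occupied slots          */
}
sk_r2t_t;

/* non-blocking put: returns 0 if success / -1 if the queue is full */
int sk_r2t_try_put( sk_r2t_t * q, uint8_t flags )
{
    assert( q != NULL );
    if( q->count == SK_R2T_DEPTH ) return -1;
    q->data[q->tail] = flags;
    q->tail = (q->tail + 1) % SK_R2T_DEPTH;
    q->count++;
    return 0;
}

/* get: returns 0 if success / -1 if the queue is empty */
int sk_r2t_get( sk_r2t_t * q, uint8_t * flags )
{
    assert( q != NULL && flags != NULL );
    if( q->count == 0 ) return -1;
    *flags  = q->data[q->head];
    q->head = (q->head + 1) % SK_R2T_DEPTH;
    q->count--;
    return 0;
}
```

With the standard TCP header flag bits (SYN = 0x02, ACK = 0x10), a SYN-ACK request would be stored as the single byte 0x12.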
349 
350/****************************************************************************************
351 * This function is called by the dev_nic_rx_handle_tcp() function to register
352 * a client connection request, defined by the <remote_addr>, <remote_port>,
353 * <remote_iss>, and <remote_window> arguments, in the CRQ queue, specified
354 * by the <queue_xp> argument.
355 ****************************************************************************************
356 * @ queue_xp      : [in] extended pointer on the CRQ queue descriptor.
357 * @ remote_addr   : [in] remote socket IP address.
358 * @ remote_port   : [in] remote socket port.
359 * @ remote_iss    : [in] remote socket initial sequence number.
360 * @ remote_window : [in] remote socket receive window
361 * @ return 0 if success / return -1 if queue full.
362 ***************************************************************************************/
363error_t socket_put_crq_request( xptr_t    queue_xp,
364                                uint32_t  remote_addr,
365                                uint32_t  remote_port,
366                                uint32_t  remote_iss,
367                                uint32_t  remote_window );
368
369/****************************************************************************************
370 * This function is called by the socket_accept() function to extract a connection
371 * request from a CRQ queue, specified by the <queue_xp> argument, to the buffers
372 * defined by <remote_addr>, <remote_port>, <remote_iss>, and <remote_window>.
373 *****************************************************************************************
374 * @ queue_xp      : [in]  extended pointer on the CRQ queue descriptor.
375 * @ remote_addr   : [out] buffer for remote socket IP address.
376 * @ remote_port   : [out] buffer for remote socket port.
377 * @ remote_iss    : [out] buffer for remote socket initial sequence number.
378 * @ remote_window : [out] buffer for remote socket receive window
379 * @ return 0 if success / return -1 if queue empty.
380 ***************************************************************************************/
381error_t socket_get_crq_request( xptr_t     queue_xp,
382                                uint32_t * remote_addr,
383                                uint32_t * remote_port,
384                                uint32_t * remote_iss,
385                                uint32_t * remote_window );
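
Since each CRQ entry occupies sizeof(connect_request_t) bytes in a byte FIFO, the put/get pair can be modeled as fixed-size record copies into a circular byte buffer. The sketch below is a local, single-threaded approximation under stated assumptions: the sk_* names and the depth are hypothetical, and the real queue is a remote_buf_t accessed through an extended pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same four fields as connect_request_t. */
typedef struct { uint32_t addr, port, iss, window; } sk_connect_request_t;

#define SK_CRQ_DEPTH  2      /* hypothetical depth, as passed to listen() */
#define SK_CRQ_BYTES  ((uint32_t)(SK_CRQ_DEPTH * sizeof(sk_connect_request_t)))

typedef struct
{
    uint8_t   bytes[SK_CRQ_DEPTH * sizeof(sk_connect_request_t)];
    uint32_t  rd;     /* read offset in bytes    */
    uint32_t  wr;     /* write offset in bytes   */
    uint32_t  used;   /* number of bytes stored  */
}
sk_crq_t;

/* returns 0 if success / -1 if the queue is full */
int sk_crq_put( sk_crq_t * q, uint32_t addr, uint32_t port,
                uint32_t iss, uint32_t window )
{
    sk_connect_request_t req = { addr, port, iss, window };
    uint32_t size = (uint32_t)sizeof(req);

    assert( q != NULL );
    if( q->used + size > SK_CRQ_BYTES ) return -1;
    memcpy( &q->bytes[q->wr], &req, size );   /* offsets are multiples of size */
    q->wr    = (q->wr + size) % SK_CRQ_BYTES;
    q->used += size;
    return 0;
}

/* returns 0 if success / -1 if the queue is empty */
int sk_crq_get( sk_crq_t * q, uint32_t * addr, uint32_t * port,
                uint32_t * iss, uint32_t * window )
{
    sk_connect_request_t req;
    uint32_t size = (uint32_t)sizeof(req);

    assert( q != NULL );
    if( q->used < size ) return -1;
    memcpy( &req, &q->bytes[q->rd], size );
    q->rd    = (q->rd + size) % SK_CRQ_BYTES;
    q->used -= size;
    *addr = req.addr;  *port = req.port;  *iss = req.iss;  *window = req.window;
    return 0;
}
```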
386
387/****************************************************************************************
388 * This blocking function displays the socket state (including the TCB).
389 ****************************************************************************************
390 * @ socket_xp     : [in] extended pointer on socket descriptor.
391 * @ func_str      : [in] name of calling function.
392 * @ string        : [in] string defining the calling context (can be NULL)
393 ***************************************************************************************/
394void socket_display( xptr_t         socket_xp,
395                     const char   * func_str,
396                     const char   * string );
397
398
399
400/****************************************************************************************
401 *      Functions implementing the socket related system calls
402 ***************************************************************************************/
403
404/****************************************************************************************
405 * This function implements the socket() syscall.
406 * This function allocates and initializes in the calling thread cluster:
407 * - a new socket descriptor, defined by the <domain> and <type> arguments,
408 * - a new file descriptor, associated with this socket.
409 * It registers the file descriptor in the reference process fd_array[],
410 * sets the socket state to UNBOUND, and returns the <fdid> value.
411 ****************************************************************************************
412 * @ domain  : [in] socket protocol family (AF_LOCAL / AF_INET)
413 * @ type    : [in] socket type (SOCK_DGRAM / SOCK_STREAM).
414 * @ return a file descriptor <fdid> if success / return -1 if failure.
415 ***************************************************************************************/
416int socket_build( uint32_t   domain,
417                  uint32_t   type );
418
419/****************************************************************************************
420 * This function implements the bind() syscall.
421 * It assigns an IP address, defined by the <addr> argument, and a port number,
422 * defined by the <port> argument, to an unnamed local socket, identified by the
423 * <fdid> argument, and sets the socket state to BOUND. It applies to UDP or TCP sockets.
424 * It does not require any service from the NIC_TX and NIC_RX server threads.
425 * It can be called by a thread running in any cluster.
426 ****************************************************************************************
427 * @ fdid         : [in] file descriptor index identifying the socket.
428 * @ addr         : [in] local IP address.
429 * @ port         : [in] local port.
430 * @ return 0 if success / return -1 if failure.
431 ***************************************************************************************/
432int socket_bind( uint32_t  fdid,
433                 uint32_t  addr,
434                 uint16_t  port );
435
436/****************************************************************************************
437 * This function implements the listen() syscall.
438 * It is called by a (local) server process to specify the max size of the CRQ queue
439 * for a socket identified by the <fdid> argument, that expects connection requests
440 * from one or several (remote) client processes. The selected socket CRQ is supposed
441 * to register all connection requests, whatever the client IP address and port values.
442 *
443 * This function applies only to a TCP socket, that must be in the BOUND state.
444 * The socket is set to the LISTEN state.
445 * It does not require any service from the NIC_TX and NIC_RX server threads.
446 * It can be called by a thread running in any cluster.
447 ****************************************************************************************
448 * Implementation notes :
449 * The number N of channels available in the NIC controller can be larger than 1.
450 * Depending on the remote client IP address and port, the connection request can be
451 * received by any NIC_RX[k] server thread. To find the relevant listening socket, each
452 * NIC_RX[k] server thread must be able to scan the set of all listening sockets.
453 * Therefore a list of listening sockets is implemented as a dedicated xlist, rooted in
454 * the NIC_RX[0] chdev extension, and using the listening socket <rx_list> field,
455 * because a listening socket is never used to move data. 
456 ****************************************************************************************
457 * @ fdid      : [in] file descriptor index identifying the local server socket.
458 * @ crq_depth : [in] depth of CRQ queue of pending connection requests.
459 * @ return 0 if success / return -1 if failure
460 ***************************************************************************************/
461int socket_listen( uint32_t fdid,
462                   uint32_t crq_depth );
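
The state preconditions stated for bind() and listen() can be summarized as a small transition check. The sketch below is a simplified model under stated assumptions: the sk_* names are hypothetical, the state values are copied from the enums above, and the requirement that bind() starts from the UNBOUND state is inferred from the "unnamed local socket" wording rather than stated explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State values copied from udp_socket_state_t and tcp_socket_state_t. */
enum
{
    SK_UDP_UNBOUND = 0x00,  SK_UDP_BOUND = 0x01,
    SK_TCP_UNBOUND = 0x10,  SK_TCP_BOUND = 0x11,  SK_TCP_LISTEN = 0x12,
};

/* Hypothetical, reduced socket descriptor. */
typedef struct
{
    bool      is_tcp;       /* true for SOCK_STREAM, false for SOCK_DGRAM */
    uint32_t  state;
    uint32_t  local_addr;
    uint32_t  local_port;
    uint32_t  crq_depth;
}
sk_sock_t;

/* returns 0 if success / -1 if the socket is not UNBOUND (assumed precondition) */
int sk_bind( sk_sock_t * s, uint32_t addr, uint16_t port )
{
    assert( s != NULL );
    if( s->state != (uint32_t)(s->is_tcp ? SK_TCP_UNBOUND : SK_UDP_UNBOUND) ) return -1;
    s->local_addr = addr;
    s->local_port = port;
    s->state      = s->is_tcp ? SK_TCP_BOUND : SK_UDP_BOUND;
    return 0;
}

/* returns 0 if success / -1 unless the socket is a TCP socket in the BOUND state */
int sk_listen( sk_sock_t * s, uint32_t crq_depth )
{
    assert( s != NULL );
    if( !s->is_tcp || (s->state != SK_TCP_BOUND) ) return -1;
    s->crq_depth = crq_depth;
    s->state     = SK_TCP_LISTEN;
    return 0;
}
```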
463
464/****************************************************************************************
465 * This blocking function implements the accept() syscall.
466 * It applies only to TCP sockets in the LISTEN state.
467 * It is executed by a server process, waiting for one (or several) client process(es)
468 * requesting a connection on a listening socket identified by the <fdid> argument.
469 * This socket must have been previously created with socket(), bound to a local address
470 * with bind(), and listening for connections after a listen(). It blocks on the <IO>
471 * condition if the CRQ is empty. Otherwise, it gets a pending connection request from
472 * the listening socket CRQ queue, and creates a new socket with the same properties
473 * as the listening socket, allocating a new file descriptor for this new socket.
474 * It computes the nic_channel index [k] from <remote_addr> and <remote_port> values,
475 * and initializes "remote_addr","remote_port", "nic_channel" in local socket.
476 * It returns the new socket fdid as well as the remote IP address
477 * and port, but only when the new socket is set to the ESTAB state. The new socket
478 * cannot accept connections, but the listening socket remains open for new connections.
479 ****************************************************************************************
480 * Implementation Note:
481 * This blocking function contains two blocking conditions because it requests services
482 * from both the NIC_RX server thread and the NIC_TX server thread.
483 * It is structured in five steps:
484 * 1) It checks the listening socket domain, type, and state.
485 * 2) If the socket CRQ queue is empty, the function posts a CMD_RX_ACCEPT command
486 *    to the NIC_RX server thread, waiting for the registration of a connection request
487 *    in the CRQ queue. Then it blocks on the <IO> condition and deschedules. It is
488 *    unblocked by the NIC_RX server thread receiving a valid TCP SYN segment.
489 * 3) When it finds a pending request, it creates a new socket with the same properties
490 *    as the listening socket, and a new file descriptor for this socket. It initializes
491 *    the new socket descriptor using the values in the registered connect_request_t
492 *    structure, and sets this new socket to the SYN_RCVD state.
493 * 4) Then it posts a CMD_TX_ACCEPT command to the NIC_TX thread, requesting a TCP SYN_ACK
494 *    segment to the remote socket. Then, it blocks on the <IO> condition and deschedules.
495 *    It is unblocked by the NIC_RX server thread when this SYN_ACK is acknowledged,
496 *    and the new socket is set in ESTAB state (by the NIC_RX server).
497 * 5) Finally, it returns the new socket fdid, and registers, in the <address> and
498 *    <port> arguments, the remote client IP address & port.
499 ****************************************************************************************
500 * @ fdid         : [in] file descriptor index identifying the listening socket.
501 * @ address      : [out] remote client IP address.
502 * @ port         : [out] remote client port.
503 * @ return the new socket <fdid> if success / return -1 if failure
504 ***************************************************************************************/
505int socket_accept( uint32_t   fdid,
506                   uint32_t * address,
507                   uint16_t * port );
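
/****************************************************************************************
 * Illustrative sketch, not part of ALMOS-MKH: a single-threaded model of steps 2) to 4)
 * of socket_accept(). All names (demo_crq_t, demo_socket_t, demo_crq_put(),
 * demo_accept_begin(), demo_ack_received()) and the CRQ depth are hypothetical.
 * Where the real function blocks on the <IO> condition, this model returns false.
 ***************************************************************************************/

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CRQ_SIZE 4   /* hypothetical CRQ depth, not the kernel value */

typedef enum { DEMO_LISTEN, DEMO_SYN_RECV, DEMO_ESTAB } demo_state_t;

typedef struct { uint32_t addr; uint16_t port; } demo_request_t;

/* Hypothetical connection request queue: a small circular buffer. */
typedef struct {
    demo_request_t slots[DEMO_CRQ_SIZE];
    uint32_t       head;    /* index of the oldest pending request */
    uint32_t       count;   /* number of pending requests */
} demo_crq_t;

typedef struct {
    demo_state_t state;
    uint32_t     remote_addr;
    uint16_t     remote_port;
} demo_socket_t;

/* RX server side: register a request when a valid SYN has been received. */
static bool demo_crq_put(demo_crq_t *q, uint32_t addr, uint16_t port)
{
    if (q->count == DEMO_CRQ_SIZE) return false;          /* queue full */
    uint32_t tail = (q->head + q->count) % DEMO_CRQ_SIZE;
    q->slots[tail].addr = addr;
    q->slots[tail].port = port;
    q->count++;
    return true;
}

/* Client side, step 3: take the oldest request and initialize the new
 * socket in the SYN_RECV state. Returns false when the CRQ is empty,
 * which is where the real function would block. */
static bool demo_accept_begin(demo_crq_t *q, demo_socket_t *s)
{
    if (q->count == 0) return false;
    demo_request_t r = q->slots[q->head];
    q->head = (q->head + 1) % DEMO_CRQ_SIZE;
    q->count--;
    s->state       = DEMO_SYN_RECV;
    s->remote_addr = r.addr;
    s->remote_port = r.port;
    return true;
}

/* RX server side, step 4: the SYN_ACK has been acknowledged by the peer. */
static void demo_ack_received(demo_socket_t *s)
{
    assert(s->state == DEMO_SYN_RECV);
    s->state = DEMO_ESTAB;
}
```

/* In this model the new socket reaches DEMO_ESTAB only after demo_ack_received(),
 * which mirrors the rule that the fdid is returned only in the ESTAB state. */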
508
509/****************************************************************************************
510 * This blocking function implements the connect() syscall.
511 * It is used by a client process to connect a local socket identified by
512 * the <fdid> argument, to a remote socket identified by the <remote_addr> and
513 * <remote_port> arguments. It can be used for both UDP and TCP sockets.
514 * It computes the nic_channel index [k] from <remote_addr> and <remote_port> values,
515 * and initializes "remote_addr","remote_port", "nic_channel" in local socket.
516 * It registers the socket in the lists of sockets rooted in the NIC_RX[k] & NIC_TX[k]
517 * chdevs. It can be called by a thread running in any cluster.
518 * It returns only when the local socket is in the ESTAB state, or to report an error.
519 ****************************************************************************************
520 * Implementation Note:
521 * - For a TCP socket, it updates the "remote_addr", "remote_port", "nic_channel" fields
522 *   in the socket descriptor defined by the <fdid> argument, and registers this socket
523 *   in the lists of sockets attached to the NIC_TX[k] and NIC_RX[k] chdevs.
524 *   Then, it builds a TX_CONNECT command to the NIC_TX server thread to send a SYN to
525 *   the remote socket, unblocks the NIC_TX server thread from the <CLIENT> condition,
526 *   blocks itself on <IO> condition and deschedules. It is unblocked by the NIC_RX
527 *   server thread when this thread receives the expected SYN-ACK, and the local socket
528 *   has been set to the ESTAB state, or when an error is reported in "tx_error" field.
529 * - For a UDP socket, it simply updates "remote_addr", "remote_port", "nic_channel"
530 *   in the socket descriptor defined by the <fdid> argument, and registers this socket
531 *   in the lists of sockets attached to the NIC_TX[k] and NIC_RX[k] chdevs.
532 *   Then, it sets the socket to the ESTAB state, or returns an error without blocking.
533 ****************************************************************************************
534 * @ fdid          : [in] file descriptor index identifying the socket.
535 * @ remote_addr   : [in] remote IP address.
536 * @ remote_port   : [in] remote port.
537 * @ return 0 if success / return -1 if failure.
538 ***************************************************************************************/
539int socket_connect( uint32_t  fdid,
540                    uint32_t  remote_addr,
541                    uint16_t  remote_port );
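
/****************************************************************************************
 * Illustrative sketch, not part of ALMOS-MKH: one possible way to derive a channel
 * index [k] from <remote_addr> and <remote_port>. The key below (XOR folding, then
 * modulo) is an assumption made for this example; the actual distribution key is
 * defined elsewhere in the kernel and may differ. demo_nic_channel() is hypothetical.
 ***************************************************************************************/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key: fold the two halves of the IPv4 address with the port,
 * then reduce modulo the number of NIC channels. The only property that
 * matters here is that a given (addr, port) pair always maps to the same k. */
static uint32_t demo_nic_channel(uint32_t remote_addr,
                                 uint16_t remote_port,
                                 uint32_t nb_channels)
{
    assert(nb_channels > 0);
    uint32_t key = (remote_addr >> 16) ^ (remote_addr & 0xFFFFu) ^ remote_port;
    return key % nb_channels;
}
```

/* Because the index depends only on the remote address and port, the same socket is
 * always served by the same NIC_RX[k] / NIC_TX[k] pair in this sketch. */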
542
543/****************************************************************************************
544 * This blocking function implements the send() syscall.
545 * It is used to send data stored in the user buffer, identified by <u_buf> and <length>
546 * arguments, to a connected (TCP or UDP) socket, identified by the <fdid> argument.
547 * The work is actually done by the NIC_TX server thread, and the synchronisation
548 * between the client and the server threads uses the "tx_valid" set/reset flip-flop:
549 * The client thread registers itself in the socket descriptor, registers in the queue
550 * rooted in the NIC_TX[index] chdev, sets "tx_valid", unblocks the server thread, and
551 * finally blocks on THREAD_BLOCKED_IO, and deschedules.
552 * When the TX server thread completes the command (all data has been sent for a UDP
553 * socket, or acknowledged for a TCP socket), the server thread resets "tx_valid" and
554 * unblocks the client thread.
555 * This function can be called by a thread running in any cluster.
556 * WARNING : This implementation does not support several concurrent SEND commands
557 * on the same socket, as only one TX thread can register in a given socket.
558 ****************************************************************************************
559 * @ fdid      : [in] file descriptor index identifying the socket.
560 * @ u_buf     : [in] pointer on buffer containing packet in user space.
561 * @ length    : [in] packet size in bytes.
562 * @ return number of sent bytes if success / return -1 if failure.
563 ***************************************************************************************/
564int socket_send( uint32_t    fdid,
565                 uint8_t   * u_buf,
566                 uint32_t    length );
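
/****************************************************************************************
 * Illustrative sketch, not part of ALMOS-MKH: a single-threaded model of the "tx_valid"
 * set/reset flip-flop. The client sets the flag when it posts a command, and the server
 * resets it once all bytes are handled. All names are hypothetical. Rejecting a second
 * command with -1 is an assumption of this sketch: the source only states that
 * concurrent SEND commands on one socket are not supported.
 ***************************************************************************************/

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     tx_valid;   /* set by the client, reset by the TX server */
    uint32_t tx_len;     /* number of bytes requested by the client */
    uint32_t tx_done;    /* number of bytes already handled by the server */
} demo_tx_socket_t;

/* Client side: post a SEND command. The real function would then block. */
static int demo_send_post(demo_tx_socket_t *s, uint32_t length)
{
    if (s->tx_valid) return -1;          /* a command is already registered */
    s->tx_len   = length;
    s->tx_done  = 0;
    s->tx_valid = true;
    return 0;
}

/* Server side: handle at most max_payload bytes per call, and reset
 * "tx_valid" when the whole command is completed. */
static void demo_tx_server_step(demo_tx_socket_t *s, uint32_t max_payload)
{
    if (!s->tx_valid) return;
    uint32_t todo = s->tx_len - s->tx_done;
    s->tx_done += (todo < max_payload) ? todo : max_payload;
    if (s->tx_done == s->tx_len) s->tx_valid = false;
}
```

/* With a 2500-byte command and a 1000-byte payload limit, this model needs three
 * server steps before "tx_valid" is reset and the client could be unblocked. */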
567
568/****************************************************************************************
569 * This blocking function implements the recv() syscall.
570 * It is used to receive data that has been stored by the NIC_RX server thread in the
571 * rx_buf of a connected socket, identified by the <fdid> argument. 
572 * The synchronisation between the client and the server threads uses the "rx_valid"
573 * set/reset flip-flop: If "rx_valid" is set, the client simply moves the available
574 * data from the "rx_buf" to the user buffer identified by the <u_buf> and <length>
575 * arguments, and resets the "rx_valid" flip-flop. If "rx_valid" is not set, the client
576 * thread registers itself in the socket descriptor, registers in the clients queue rooted
577 * in the NIC_RX[index] chdev, and finally blocks on THREAD_BLOCKED_IO, and deschedules.
578 * The client thread is re-activated by the RX server, which sets the "rx_valid" flip-flop
579 * as soon as data is available in the "rx_buf". The number of bytes actually transferred
580 * can be less than the user buffer size.
581 * This function can be called by a thread running in any cluster.
582 * WARNING : This implementation does not support several concurrent RECV
583 * commands on the same socket, as only one RX thread can register in a given socket.
584 ****************************************************************************************
585 * @ fdid        : [in] file descriptor index identifying the local socket.
586 * @ u_buf       : [in] pointer on buffer in user space.
587 * @ length      : [in] buffer size in bytes.
588 * @ return number of received bytes if success / return -1 if failure.
589 ***************************************************************************************/
590int socket_recv( uint32_t    fdid,
591                 uint8_t   * u_buf,
592                 uint32_t    length );
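
/****************************************************************************************
 * Illustrative sketch, not part of ALMOS-MKH: a single-threaded model of the "rx_valid"
 * set/reset flip-flop, showing why fewer bytes than <length> may be returned.
 * All names and the buffer size are hypothetical. Two assumptions of this sketch:
 * demo_recv() returns -1 where the real function would block, and "rx_valid" is reset
 * only when the buffer has been fully drained.
 ***************************************************************************************/

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RX_BUF_SIZE 64   /* hypothetical size, not the kernel value */

typedef struct {
    uint8_t  rx_buf[DEMO_RX_BUF_SIZE];
    uint32_t rx_count;   /* number of bytes currently stored in rx_buf */
    bool     rx_valid;   /* set by the RX server, reset by the client */
} demo_rx_socket_t;

/* Server side: store received bytes and set "rx_valid".
 * Returns the number of bytes actually stored. */
static uint32_t demo_rx_server_put(demo_rx_socket_t *s,
                                   const uint8_t *data, uint32_t n)
{
    uint32_t room = DEMO_RX_BUF_SIZE - s->rx_count;
    if (n > room) n = room;
    memcpy(s->rx_buf + s->rx_count, data, n);
    s->rx_count += n;
    if (s->rx_count > 0) s->rx_valid = true;
    return n;
}

/* Client side: move at most <length> available bytes to the user buffer. */
static int demo_recv(demo_rx_socket_t *s, uint8_t *u_buf, uint32_t length)
{
    if (!s->rx_valid) return -1;                  /* real code blocks here */
    uint32_t n = (s->rx_count < length) ? s->rx_count : length;
    memcpy(u_buf, s->rx_buf, n);
    memmove(s->rx_buf, s->rx_buf + n, s->rx_count - n);
    s->rx_count -= n;
    if (s->rx_count == 0) s->rx_valid = false;
    return (int)n;
}
```

/* A 16-byte user buffer receives only 5 bytes when the server stored 5 bytes:
 * the return value, not <length>, tells the caller how much data is valid. */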
593
594/****************************************************************************************
595 * This blocking function implements the sendto() syscall.
596 * It is used to send data stored in the user buffer, identified by <u_buf> and <length>,
597 * to a remote process identified by the <remote_ip> and <remote_port> arguments,
598 * through a local, unconnected (UDP) socket, identified by the <fdid> argument.
599 * The work is actually done by the NIC_TX server thread, and the synchronisation
600 * between the client and the server threads uses the "tx_valid" set/reset flip-flop:
601 * The client thread registers itself in the socket descriptor, registers in the queue
602 * rooted in the NIC_TX[index] chdev, sets "tx_valid", unblocks the server thread, and
603 * finally blocks on THREAD_BLOCKED_IO, and deschedules.
604 * When the TX server thread completes the command (all data has been sent for a UDP
605 * socket, or acknowledged for a TCP socket), the server thread resets "tx_valid" and
606 * unblocks the client thread.
607 * This function can be called by a thread running in any cluster.
608 * WARNING : This implementation does not support several concurrent SEND/SENDTO commands
609 * on the same socket, as only one TX thread can register in a given socket.
610 * TODO : this function is not implemented yet.
611 ****************************************************************************************
612 * @ fdid        : [in] file descriptor index identifying the local socket.
613 * @ u_buf       : [in] pointer on buffer containing packet in user space.
614 * @ length      : [in] packet size in bytes.
615 * @ remote_ip   : [in] remote socket IP address.
616 * @ remote_port : [in] remote socket port.
617 * @ return number of sent bytes if success / return -1 if failure.
618 ***************************************************************************************/
619int socket_sendto( uint32_t   fdid,
620                   uint8_t  * u_buf,
621                   uint32_t   length,
622                   uint32_t   remote_ip,
623                   uint16_t   remote_port );
624
625/****************************************************************************************
626 * This blocking function implements the recvfrom() syscall.
627 * It is used to receive data that has been stored by the NIC_RX server thread in the
628 * rx_buf of an unconnected socket, identified by the <fdid> argument, from a
629 * remote process identified by the <remote_ip> and <remote_port> arguments.
630 * The synchronisation between the client and the server threads uses the "rx_valid"
631 * set/reset flip-flop: If "rx_valid" is set, the client simply moves the available
632 * data from the "rx_buf" to the user buffer identified by the <u_buf> and <length>
633 * arguments, and resets the "rx_valid" flip-flop. If "rx_valid" is not set, the client
634 * thread registers itself in the socket descriptor, registers in the clients queue rooted
635 * in the NIC_RX[index] chdev, and finally blocks on THREAD_BLOCKED_IO, and deschedules.
636 * The client thread is re-activated by the RX server, which sets the "rx_valid" flip-flop
637 * as soon as data is available in the "rx_buf". The number of bytes actually transferred
638 * can be less than the user buffer size.
639 * This function can be called by a thread running in any cluster.
640 * WARNING : This implementation does not support several concurrent RECV/RECVFROM
641 * commands on the same socket, as only one RX thread can register in a given socket.
642 * TODO : this function is not implemented yet.
643 ****************************************************************************************
644 * @ fdid        : [in] file descriptor index identifying the local socket.
645 * @ u_buf       : [in] pointer on buffer in user space.
646 * @ length      : [in] buffer size in bytes.
647 * @ remote_ip   : [in] remote socket IP address.
648 * @ remote_port : [in] remote socket port.
649 * @ return number of received bytes if success / return -1 if failure.
650 ***************************************************************************************/
651int socket_recvfrom( uint32_t    fdid,
652                     uint8_t   * u_buf,
653                     uint32_t    length,
654                     uint32_t    remote_ip,
655                     uint16_t    remote_port );
656
657/****************************************************************************************
658 * This blocking function implements the close() syscall for a socket.
659 * - For a UDP socket, it simply calls the static socket_destroy() function to release
660 *   all structures associated with the local socket, including the file descriptor.
661 * - For a TCP socket, it makes a CLOSE command to NIC_TX, and blocks on the <IO>
662 *   condition. The TCP close handshake is done by the NIC_TX and NIC_RX threads.
663 *   It is unblocked when the socket is in CLOSED state, or when an error is reported.
664 *   Finally, it calls the static socket_destroy() function to release all structures
665 *   associated with the local socket, including the file descriptor.
666 ****************************************************************************************
667 * @ file_xp     : [in] extended pointer on file descriptor.
668 * @ fdid        : [in] file descriptor index identifying the socket.
669 * @ return 0 if success / return -1 if failure.
670 ***************************************************************************************/
671int socket_close( xptr_t     file_xp,
672                  uint32_t   fdid );
673
674
675#endif  /* _KSOCKET_H_ */
676
677
678