Ignore:
Timestamp:
Jul 18, 2019, 2:06:55 PM (5 years ago)
Author:
alain
Message:

Introduce the non-standard pthread_parallel_create() system call
and re-write the <fft> and <sort> applications to improve the
intrinsic paralelism in applications.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/kernel/kern/dqdt.h

    r632 r637  
    22 * kern/dqdt.h - Distributed Quad Decision Tree
    33 *
    4  * Author : Alain Greiner (2016,2017,2018)
     4 * Author : Alain Greiner (2016,2017,2018,2019)
    55 *
    66 * Copyright (c)  UPMC Sorbonne Universites
     
    3131/****************************************************************************************
    3232 * This DQDT infrastructure maintains a topological description of ressources usage
    33  * in each cluster: number of threads, and number of physical pages allocated.
     33 * in each cluster: number of threads per core, and number of physical pages allocated.
    3434 *
    35  * - If X_SIZE or Y_SIZE are equal to 1, it makes the assumption that the cluster
    36  *   topology is a one dimensionnal vector, an build the smallest one-dimensionnal
    37  *   quad-tree covering this one-dimensionnal vector. If the number of clusters
    38  *   is not a power of 4, the tree is truncated as required.
    39  *
    40  *   TODO : the mapping for the one dimensionnal topology is not implemented yet [AG].
    41  *
    42  * - If both Y_SIZE and Y_SIZE are larger than 1, it makes the assumption that
    43  *   the clusters topology is a 2D mesh. The [X,Y] coordinates of a cluster are
    44  *   obtained from the CXY identifier using the Rrelevant macros.
    45  *      X = CXY >> Y_WIDTH   /  Y = CXY & ((1<<Y_WIDTH)-1)
    46  * - If the mesh X_SIZE and Y_SIZE dimensions are not equal, or are not power of 2,
    47  *   or the mesh contains "holes" reported in the cluster_info[x][y] array,
    48  *   we build the smallest two dimensionnal quad-tree covering all clusters,
    49  *   and this tree is truncated as required.
    50  * - The mesh size is supposed to contain at most 32 * 32 clusters.
    51  *   Therefore, it can exist at most 6 DQDT nodes in a given cluster:
    52  *   . Level 0 nodes exist on all clusters and have no children.
    53  *   . Level 1 nodes exist when both X and Y coordinates are multiple of 2
    54  *   . Level 2 nodes exist when both X and Y coordinates are multiple of 4
    55  *   . Level 3 nodes exist when both X and Y coordinates are multiple of 8
    56  *   . Level 4 nodes exist when both X and Y coordinates are multiple of 16
    57  *   . Level 5 nodes exist when both X and Y coordinates are multiple of 32
    58  * - For nodes other than level 0, the placement is defined as follow:
    59  *   . The root node is placed in the cluster containing the core executing
    60  *     the dqdt_init() function.
    61  *   . An intermediate node (representing a given sub-tree) is placed in one
    62  *     cluster covered by the subtree, pseudo-randomly selected.
     35 * It is organized as a quad-tree, where the leaf cells are the clusters, organised
     36 * as a 2D mesh. Each node in the quad-tree (including the root and the leaf cells,
     37 * covers a "macro-cluster", that is a square array of clusters where the number
     38 * in the macro-cluster is a power of 4, and the macro-cluster side is a power of two.
     39 * Each node contains informations on ressources usage (physical memory and cores)
     40 * in the covered macro-cluster.
     41 * This quad-tree can be truncated, if the physical mesh X_SIZE and Y_SIZE dimensions
     42 * are not equal, or are not power of 2, or if the physical mesh contains "holes".
     43 * The mesh size is supposed to contain at most 32*32 clusters in this implementation.
     44 *   . Level 0 nodes exist in all clusters and have no children.
     45 *   . Level 1 nodes can be placed in any cluster of the covered  2*2  macro-cluster.
     46 *   . Level 2 nodes can be placed in any cluster of the covered  4*4  macro-cluster.
     47 *   . Level 3 nodes can be placed in any cluster of the covered  8*8  macro-cluster.
     48 *   . Level 4 nodes can be placed in any cluster of the covered 16*16 macro-cluster.
     49 *   . Level 5 nodes can be placed in any cluster of the covered 32*32 macro-cluster.
     50 * The root node is placed in the cluster containing the core executing the dqdt_init()
     51 * function. Other (non level 0) nodes are placed pseudo-randomly.
    6352 ***************************************************************************************/
    6453
     
    6655 * This structure describes a node of the DQDT.
    6756 * The max number of children is 4, but it can be smaller for some nodes.
    68  * Level 0 nodes are the clusters, and have no children.
    69  * The root node has no parent.
     57 * Level 0 nodes have no children. The root node has no parent.
    7058 ***************************************************************************************/
    7159
     
    7462        uint32_t      level;            /*! node level                                     */
    7563        uint32_t      arity;            /*! actual children number in this node            */
    76     uint32_t      threads;          /*! current number of threads in macro-cluster     */
    77     uint32_t      pages;            /*! current number of pages in macro-cluster       */
     64    uint32_t      threads;          /*! number of threads in macro-cluster             */
     65    uint32_t      pages;            /*! number of allocated pages in macro-cluster     */
    7866    uint32_t      cores;            /*! number of active cores in macro cluster        */
    79     uint32_t      clusters;         /*! number of active cluster in macro cluster      */
     67    uint32_t      clusters;         /*! number of active clusters in macro cluster     */
    8068        xptr_t        parent;           /*! extended pointer on parent node                */
    8169        xptr_t        children[2][2];   /*! extended pointers on children nodes            */
     
    8775 * This function recursively initializes the DQDT structure from informations
    8876 * stored in cluster manager (x_size, y_size and cluster_info[x][y].
    89  * It is executed in all clusters by the local CP0, to compute level_max and register
     77 * It is called in all clusters by the local CP0, to compute level_max and register
    9078 * the DQDT root node in each cluster manager, but only CPO in cluster 0 build actually
    9179 * the quad-tree covering all active clusters.
     
    10290 ***************************************************************************************/
    10391void dqdt_increment_threads( void );
     92
    10493void dqdt_decrement_threads( void );
    10594
     
    121110
    122111/****************************************************************************************
    123  * This function can be called in any cluster. It traverses the DQDT tree
    124  * from the root to the bottom, to analyse the computing load and select the cluster
    125  * with the lowest number ot threads to place a new process.
     112 * This function returns an extended pointer on the dqdt node that is the root of
     113 * the sub-tree covering the macro-cluster defined by the <level> argument and
     114 * containing the cluster defined by the <cxy> argument. It returns XPTR_NULL if
     115 * this macro-cluster is undefined (when the cxy cluster contains no core).
    126116 ****************************************************************************************
     117 * @ cxy   : cluster identifier.
     118 * @ level   : level of the sub-tree.
     119 * @ returns  root_xp if success / return XPTR_NULL if no active core in macro_cluster.
     120 ***************************************************************************************/
     121xptr_t dqdt_get_root( cxy_t    cxy,
     122                      uint32_t level );
     123
     124/****************************************************************************************
     125 * This function can be called in any cluster. It traverses the DQDT tree from the
     126 * local root of a macro-cluster, defined by the <root_xp> argument, to the bottom.
     127 * It analyses the computing load & select the cluster containing the lowest number
     128 * ot threads.
     129 ****************************************************************************************
     130 * @ root_xp  : extended pointer on DQDT node root.
    127131 * @ returns the cluster identifier with the lowest computing load.
    128132 ***************************************************************************************/
    129 cxy_t dqdt_get_cluster_for_process( void );
     133cxy_t dqdt_get_cluster_for_thread( xptr_t root_xp );
    130134
    131135/****************************************************************************************
    132  * This function can be called in any cluster. It traverses the DQDT tree
    133  * from the root to the bottom, to analyse the memory load and select the cluster
    134  * with the lowest memory load for dynamic memory allocation with no locality constraint.
     136 * This function can be called in any cluster. It traverses the DQDT tree from the
     137 * local root of a macro-cluster, defined by the <root_xp> argument, to the bottom.
     138 * It analyses the memory load & select the cluster with the lowest number of allocated
     139 * physical pages.
    135140 ****************************************************************************************
     141 * @ root_xp  : extended pointer on DQDT node root.
    136142 * @ returns the cluster identifier with the lowest memory load.
    137143 ***************************************************************************************/
    138 cxy_t dqdt_get_cluster_for_memory( void );
     144cxy_t dqdt_get_cluster_for_memory( xptr_t root_xp );
    139145
    140146/****************************************************************************************
    141147 * This function displays on kernel TXT0 the DQDT state for all nodes in the quad-tree.
    142  * It traverses the quadtree from root to bottom, and can be called by a thread
    143  * running in any cluster
     148 * It traverses the quadtree from the global root to bottom.
     149 * It can be called by a thread running in any cluster
    144150 ***************************************************************************************/
    145151void dqdt_display( void );
Note: See TracChangeset for help on using the changeset viewer.