GIET-VM / Barriers access functions
The kernel_barriers.c and kernel_barriers.h files define the functions used by the kernel to access synchronisation barriers shared by several concurrent tasks.
The GIET-VM kernel defines two types of barriers:
- The simple_barrier_t implements a non-distributed toggle barrier. The number of expected tasks is defined by the software when the barrier is initialised. The same barrier can safely be used several times.
- The sqt_barrier_t is physically distributed on all clusters, and is intended to avoid contention on a single cluster when a barrier is shared by a large number of tasks. It is implemented as a Synchronisation Quad Tree (SQT). For now, the number of expected tasks is defined by the number of processors specified in the mapping (NB_TOTAL_PROCS).
All access functions are prefixed by "_" as a reminder that they can only be executed by a processor in kernel mode.
The simple_barrier_t and sqt_barrier_t structures are implemented so that each barrier node occupies exactly one 64-byte cache line. They should be aligned on a cache line boundary.
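The cache line constraint above can be illustrated with a minimal sketch. The structure name, the field names, and the CACHE_LINE_SIZE macro are hypothetical and are not taken from kernel_barriers.h; only the 64-byte line size comes from the text above.

```c
#include <assert.h>

#define CACHE_LINE_SIZE 64   /* assumption: the 64-byte line size stated above */

/* Hypothetical layout: the real fields of simple_barrier_t may differ. */
typedef struct example_barrier_s
{
    /* _Alignas on the first member forces the whole structure onto a cache
       line boundary and makes the compiler pad its size up to one line. */
    _Alignas(CACHE_LINE_SIZE) unsigned int sense;   /* toggle flag */
    unsigned int ntasks;                            /* expected tasks */
    unsigned int count;                             /* arrived tasks */
} example_barrier_t;

/* Compile-time checks: one barrier node per cache line, correctly aligned. */
static_assert(sizeof(example_barrier_t) == CACHE_LINE_SIZE,
              "barrier node must fill exactly one cache line");
static_assert(_Alignof(example_barrier_t) == CACHE_LINE_SIZE,
              "barrier node must be aligned on a cache line boundary");
```

Keeping one node per line means that two barriers never share a line, so tasks polling one barrier are not disturbed by writes to another.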
Simple barrier access functions
void _simple_barrier_init( simple_barrier_t * barrier, unsigned int ntasks )
This function initialises the barrier.
- barrier: pointer to the barrier.
- ntasks: number of expected tasks.
void _simple_barrier_wait( simple_barrier_t * barrier )
This function blocks until all expected tasks have reached the barrier. It uses a toggle condition to avoid race conditions when the same barrier is used several times. It uses the _atomic_increment() kernel function to count the tasks that have arrived.
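The toggle mechanism described above can be sketched as follows. This is not the GIET-VM implementation: the sketch_ names and the structure fields are hypothetical, and the C11 atomic_fetch_add() call stands in for the _atomic_increment() kernel function.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for simple_barrier_t; field names are assumptions. */
typedef struct
{
    atomic_uint  sense;    /* current toggle value (0 or 1) */
    atomic_uint  count;    /* tasks arrived in the current round */
    unsigned int ntasks;   /* number of expected tasks */
} sketch_barrier_t;

void sketch_barrier_init(sketch_barrier_t *barrier, unsigned int ntasks)
{
    assert(ntasks > 0);
    atomic_init(&barrier->sense, 0u);
    atomic_init(&barrier->count, 0u);
    barrier->ntasks = ntasks;
}

void sketch_barrier_wait(sketch_barrier_t *barrier)
{
    /* Value the toggle will take once the current round completes. */
    unsigned int expected = atomic_load(&barrier->sense) ^ 1u;

    /* atomic_fetch_add() returns the previous value, hence the + 1. */
    unsigned int arrived = atomic_fetch_add(&barrier->count, 1u) + 1u;

    if (arrived == barrier->ntasks)
    {
        /* Last task: reset the counter first, then flip the toggle,
           so that released tasks find a clean counter for the next round. */
        atomic_store(&barrier->count, 0u);
        atomic_store(&barrier->sense, expected);
    }
    else
    {
        /* Other tasks poll until the toggle takes the expected value. */
        while (atomic_load(&barrier->sense) != expected) { }
    }
}
```

Because each round waits for the opposite toggle value, a fast task that re-enters the barrier cannot be confused with a task still leaving the previous round.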
Distributed barrier access functions
void _sqt_barrier_init( sqt_barrier_t* barrier )
This function allocates and initialises the distributed SQT barrier nodes in the clusters. The number of expected tasks is given by the NB_TOTAL_PROCS parameter defined in the hard_config.h file. The SQT footprint is computed to cover all clusters containing processors in the 2D mesh (X_SIZE / Y_SIZE). The SQT can be incomplete, as SQT barrier nodes are only built in clusters containing processors. The actual number of SQT barrier nodes in cluster[x][y] depends on (x,y). There is at least 1 node and at most 5 nodes per cluster:
- barrier node arbitrating between all processors of 1 cluster has level 0,
- barrier node arbitrating between all processors of 4 clusters has level 1,
- barrier node arbitrating between all processors of 16 clusters has level 2,
- barrier node arbitrating between all processors of 64 clusters has level 3,
- barrier node arbitrating between all processors of 256 clusters has level 4.
This function uses the _remote_malloc() function, and the distributed kernel heap[x][y] segments.
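The dependence of the node count on (x,y) can be made concrete with a small sketch. It rests on an assumption that the text above does not state: the level-l node covering a block of 2^l x 2^l clusters is placed in the cluster of that block whose x and y coordinates are both multiples of 2^l. The function names are hypothetical, and the incomplete case (clusters without processors) is ignored.

```c
#include <assert.h>

/* Number of SQT levels needed to cover an x_size * y_size mesh.
   Returns 1 for a single cluster and 5 for a 16 x 16 mesh. */
unsigned int sketch_sqt_levels(unsigned int x_size, unsigned int y_size)
{
    unsigned int side   = (x_size > y_size) ? x_size : y_size;
    unsigned int levels = 1;

    /* A tree with n levels covers a square of side 2^(n-1) clusters. */
    while ((1u << (levels - 1)) < side) levels++;
    return levels;
}

/* Number of SQT nodes hosted by cluster (x,y), under the placement
   assumption stated in the lead-in (not confirmed by the source). */
unsigned int sketch_sqt_nodes_in_cluster(unsigned int x,
                                         unsigned int y,
                                         unsigned int levels)
{
    unsigned int nodes = 0;

    for (unsigned int level = 0; level < levels; level++)
    {
        unsigned int mask = (1u << level) - 1u;

        /* The cluster hosts the level node if both coordinates
           are multiples of 2^level. Level 0 always matches. */
        if (((x & mask) == 0u) && ((y & mask) == 0u)) nodes++;
    }
    return nodes;
}
```

Under this assumption, cluster (0,0) of a 16 x 16 mesh hosts 5 nodes (levels 0 to 4), while a cluster with an odd coordinate hosts only its level 0 node.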
void _sqt_barrier_wait( sqt_barrier_t* barrier )
This function blocks until all expected tasks have reached the barrier. It uses a toggle condition to avoid race conditions when the same barrier is used several times. It uses the _atomic_increment() kernel function to count the tasks that have arrived.
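One plausible way to combine the toggle condition with a tree of nodes is sketched below. It is an illustration, not the GIET-VM code: the sketch_ names and fields are hypothetical, atomic_fetch_add() stands in for _atomic_increment(), and the climbing scheme (the last task to arrive at a node continues to the parent node) is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical SQT node; the real sqt_barrier_t node layout may differ. */
typedef struct sketch_sqt_node_s
{
    atomic_uint  sense;                 /* toggle value of this node */
    atomic_uint  count;                 /* arrivals in the current round */
    unsigned int arity;                 /* expected arrivals at this node */
    struct sketch_sqt_node_s *parent;   /* NULL for the root node */
} sketch_sqt_node_t;

void sketch_sqt_node_init(sketch_sqt_node_t *node,
                          unsigned int arity,
                          sketch_sqt_node_t *parent)
{
    assert(arity > 0);
    atomic_init(&node->sense, 0u);
    atomic_init(&node->count, 0u);
    node->arity  = arity;
    node->parent = parent;
}

/* Each task calls this on the level 0 node of its own cluster. */
void sketch_sqt_node_wait(sketch_sqt_node_t *node)
{
    unsigned int expected = atomic_load(&node->sense) ^ 1u;
    unsigned int arrived  = atomic_fetch_add(&node->count, 1u) + 1u;

    if (arrived == node->arity)
    {
        /* Last arrival: represent this node at the upper level. The call
           returns only when the whole tree has been reached. */
        if (node->parent != NULL) sketch_sqt_node_wait(node->parent);

        /* Reset the counter, then flip the toggle to release local waiters. */
        atomic_store(&node->count, 0u);
        atomic_store(&node->sense, expected);
    }
    else
    {
        /* Poll the local node only, which keeps the traffic in the cluster. */
        while (atomic_load(&node->sense) != expected) { }
    }
}
```

In this scheme most tasks poll a node located in their own cluster, and only one task per node touches the upper level, which is how a tree limits contention on any single cluster.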