Version 1 (modified by alain, 10 years ago)

GIET-VM / Barriers access functions

The kernel_barriers.c and kernel_barriers.h files define the functions used by the kernel to access synchronisation barriers between several concurrent tasks.

The GIET-VM kernel defines two types of barriers:

  1. The simple_barrier_t implements a non-distributed toggle barrier. The number of expected tasks can be defined by the software. It can be safely used several times.
  2. The sqt_barrier_t is physically distributed on all clusters, and is intended to avoid contention on a single cluster when a barrier is shared by a large number of tasks. It is implemented as a Synchronisation Quad Tree (SQT). For now, the number of expected tasks is defined by the number of processors specified in the mapping, and we use the smallest SQT covering all processors.

All access functions are prefixed by "_" as a reminder that they can only be executed by a processor in kernel mode.

The simple_barrier_t and sqt_barrier_t structures are implemented to have one single barrier in a 64-byte cache line, and should be aligned on a cache line boundary.
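
As an illustration of this alignment constraint, the following C11 sketch pads a barrier to exactly one 64-byte line and checks the result at compile time. The type name example_barrier_t, its fields and the CACHE_LINE_SIZE macro are assumptions made for this example, not the actual definitions found in kernel_barriers.h. The sketch also assumes a 4-byte unsigned int.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas, alignof */

#define CACHE_LINE_SIZE 64   /* line size in bytes, as stated in the text */

/* Hypothetical layout: the real field names in kernel_barriers.h may differ. */
typedef struct example_barrier_s
{
    alignas(CACHE_LINE_SIZE) unsigned int sense;   /* toggle flag              */
    unsigned int count;                            /* number of arrived tasks  */
    unsigned int ntasks;                           /* number of expected tasks */
    unsigned int padding[13];                      /* fill the 64-byte line    */
} example_barrier_t;

/* Compile-time checks: one barrier per line, aligned on a line boundary. */
static_assert( sizeof(example_barrier_t) == CACHE_LINE_SIZE,
               "barrier must occupy exactly one cache line" );
static_assert( alignof(example_barrier_t) == CACHE_LINE_SIZE,
               "barrier must be aligned on a cache line boundary" );
```

Keeping each barrier alone in its line prevents two unrelated barriers from sharing a line, which would otherwise create extra coherence traffic between the processors polling them.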

Simple barrier access functions

void _simple_barrier_init( simple_barrier_t * barrier, unsigned int ntasks )

This function initialises the barrier.

  • barrier: pointer to the barrier
  • ntasks: number of expected tasks

void _simple_barrier_wait( simple_barrier_t * barrier )

This function blocks until all expected tasks have reached the barrier. It uses a toggle condition to avoid race conditions when the same barrier is used several times. It uses the _atomic_increment() kernel function to compute the number of arrived tasks.
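
The toggle mechanism can be sketched in portable C11 as follows. This is not the GIET-VM code: the type toggle_barrier_t and the functions toggle_barrier_init() and toggle_barrier_wait() are hypothetical names, and atomic_fetch_add() from <stdatomic.h> stands in for the _atomic_increment() kernel function.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for simple_barrier_t; real field names may differ. */
typedef struct toggle_barrier_s
{
    atomic_uint  sense;    /* toggles each time the barrier completes */
    atomic_uint  count;    /* tasks arrived in the current round      */
    unsigned int ntasks;   /* number of expected tasks                */
} toggle_barrier_t;

void toggle_barrier_init( toggle_barrier_t* barrier, unsigned int ntasks )
{
    assert( ntasks > 0 );
    atomic_init( &barrier->sense, 0 );
    atomic_init( &barrier->count, 0 );
    barrier->ntasks = ntasks;
}

void toggle_barrier_wait( toggle_barrier_t* barrier )
{
    /* sense value that will signal the end of the current round */
    unsigned int expected = !atomic_load( &barrier->sense );

    /* atomic increment: returns the previous value, so add 1 */
    unsigned int arrived = atomic_fetch_add( &barrier->count, 1 ) + 1;

    if ( arrived == barrier->ntasks )
    {
        /* last task: reset the counter first, then release the others */
        atomic_store( &barrier->count, 0 );
        atomic_store( &barrier->sense, expected );
    }
    else
    {
        /* other tasks: poll until the last task toggles the sense */
        while ( atomic_load( &barrier->sense ) != expected ) { }
    }
}
```

Because each task samples the sense before it increments the counter, a fast task that re-enters the barrier waits for the opposite sense value and cannot be released by the toggle of the previous round. This is what makes the barrier reusable.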

Distributed barrier access functions

void _sqt_barrier_init( sqt_barrier_t* barrier )

This function allocates and initialises the distributed SQT barrier nodes on clusters. The number of expected tasks is defined by the NB_TOTAL_PROCS parameter defined in the hard_config.h file. The SQT footprint is computed to cover all clusters containing processors in the 2D mesh (X_SIZE / Y_SIZE). The SQT can be incomplete, as SQT barrier nodes are only built in clusters containing processors. The actual number of SQT barrier nodes in a cluster[x][y] depends on (x,y). There is at least 1 node and at most 5 nodes per cluster:

  • barrier node arbitrating between all processors of 1 cluster has level 0,
  • barrier node arbitrating between all processors of 4 clusters has level 1,
  • barrier node arbitrating between all processors of 16 clusters has level 2,
  • barrier node arbitrating between all processors of 64 clusters has level 3,
  • barrier node arbitrating between all processors of 256 clusters has level 4.

This function uses the _remote_malloc() function, and the distributed kernel heap segments.
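
The dependency between (x,y) and the number of nodes can be illustrated with a small sketch. It assumes that the level n node of a block of 2^n x 2^n clusters is placed in the cluster whose two coordinates are multiples of 2^n, and it ignores the incomplete case where some clusters contain no processor. The placement rule and the function names sqt_top_level() and sqt_nodes_in_cluster() are assumptions made for this example, not taken from the GIET-VM sources.

```c
#include <assert.h>

/* Index of the root level of the smallest SQT covering an
   x_size * y_size mesh (level n covers 2^n * 2^n clusters). */
unsigned int sqt_top_level( unsigned int x_size, unsigned int y_size )
{
    assert( x_size > 0 && y_size > 0 );

    unsigned int side  = ( x_size > y_size ) ? x_size : y_size;
    unsigned int level = 0;

    while ( (1u << level) < side ) level++;
    return level;
}

/* Number of SQT nodes hosted by cluster[x][y], ASSUMING the level n node
   sits in the cluster whose coordinates are both multiples of 2^n. */
unsigned int sqt_nodes_in_cluster( unsigned int x,
                                   unsigned int y,
                                   unsigned int top_level )
{
    unsigned int nodes = 1;   /* every cluster hosts its level 0 node */

    for ( unsigned int level = 1 ; level <= top_level ; level++ )
    {
        unsigned int mask = (1u << level) - 1;
        if ( (x & mask) || (y & mask) ) break;   /* not a block corner */
        nodes++;
    }
    return nodes;
}
```

Under these assumptions, a 16 x 16 mesh has its root at level 4, cluster[0][0] hosts the 5 nodes of levels 0 to 4, and a cluster with an odd coordinate hosts only its level 0 node.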

void _sqt_barrier_wait( sqt_barrier_t* barrier )

This function blocks until all expected tasks have reached the barrier. It uses a toggle condition to avoid race conditions when the same barrier is used several times. It uses the _atomic_increment() kernel function to compute the number of arrived tasks.
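
One plausible way to combine the toggle condition with a tree of nodes is sketched below in C11. The last arrival at a node climbs to the parent node, and releases its local waiters only after the upper levels have completed. The sqt_node_t type, its fields and the sqt_node_wait() function are hypothetical, and atomic_fetch_add() again stands in for _atomic_increment(). The actual GIET-VM traversal may differ.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical SQT node; the real node structure may differ. */
typedef struct sqt_node_s
{
    atomic_uint        sense;    /* toggles when this node is released       */
    atomic_uint        count;    /* arrivals in the current round            */
    unsigned int       arity;    /* expected arrivals: local processors at   */
                                 /* level 0, child nodes at upper levels     */
    struct sqt_node_s* parent;   /* NULL for the root node                   */
} sqt_node_t;

void sqt_node_wait( sqt_node_t* node )
{
    /* sense value that will signal the end of the current round */
    unsigned int expected = !atomic_load( &node->sense );

    /* atomic increment: returns the previous value, so add 1 */
    unsigned int arrived = atomic_fetch_add( &node->count, 1 ) + 1;

    if ( arrived == node->arity )
    {
        /* last arrival: propagate to the upper level, if any */
        if ( node->parent != NULL ) sqt_node_wait( node->parent );

        /* upper levels are complete: reset and release local waiters */
        atomic_store( &node->count, 0 );
        atomic_store( &node->sense, expected );
    }
    else
    {
        /* other arrivals: poll the local node only */
        while ( atomic_load( &node->sense ) != expected ) { }
    }
}
```

In this sketch each processor calls sqt_node_wait() on the level 0 node of its own cluster, so most of the polling stays local to that cluster and only one task per node climbs towards the root.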