Changes between Version 23 and Version 24 of user_applications


Ignore:
Timestamp:
Mar 31, 2016, 10:43:00 AM (9 years ago)
Author:
alain
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • user_applications

    v23 v24  
    3939== __sort__ ==
    4040
    41 This first multi-threaded application is a very simple parallel sort. The input is an array of randomly generated integers. The size of this array is a parameter, that must be a multiple of the number of threads.
     41This multi-threaded application is a very simple parallel sort. The input is an array of randomly generated integers. The size of this array is a parameter, that must be a multiple of the number of threads.
    4242
    4343It can run on a multi-processors, multi-clusters architecture, with one thread per processor core.
     
    202202
    203203The source code can be found [source:soft/giet_vm/applications/router/router.c here], and the mapping is defined [source:soft/giet_vm/applications/router/router.py here].
     204
     205== __fft__ ==
     206
     207This multi-threaded parallel application is the port of the SPLASH FFT benchmark on the GIET_VM OS.
     208
     209It performs the in-place 1D Fast Fourier Transfom for an array of N complex points, using the Cooley-Tuckey method.
     210The N data points are handled as a 2D array (rootN rows * rootN columns). This data array is implemented as
     211distributed buffers (one buffer per cluster).
     212 * The number of points N is defined by the M parameter  (N = 1<< M). It must be a power of 4.
     213 * The number of threads is equal to the number of processors in the hardware architecture. It must be a power of 2.
     214 * The number of threads must be larger or equal to the number of rows (nthreads >= rootN).     
     215Each thread is handling a subset of the  k of the rows (k = rootN / nthreads), and transpose the data array to benefit of cache locality.
     216After the sequential initialization, the parallel processing is done in five steps separated by synchronisation barriers:
     217 * first data array transpose
     218 * FFT on the columns
     219 * second data array transpose
     220 * FFT on the rows
     221 * third data array transpose
     222The N input data points be initialised in three different modes:
     223 * CONSTANT : all data points have the same [1,0] value
     224 * COSIN    : data point n has [cos(n/N) , sin(n/N)] value
     225 * RANDOM   : data points have pseudo random values
     226The instrumentation results (computation times) are stored in a file.
     227
     228It requires one TTY terminal shared by all threads.
     229
     230The source code can be found [source:soft/giet_vm/applications/fft/fft.c here], and the mapping is defined [source:soft/giet_vm/applications/fft/fft.py here].
     231