| 204 | |
| 205 | == __fft__ == |
| 206 | |
| 207 | This multi-threaded parallel application is the port of the SPLASH FFT benchmark on the GIET_VM OS. |
| 208 | |
| 209 | It performs the in-place 1D Fast Fourier Transfom for an array of N complex points, using the Cooley-Tuckey method. |
| 210 | The N data points are handled as a 2D array (rootN rows * rootN columns). This data array is implemented as |
| 211 | distributed buffers (one buffer per cluster). |
| 212 | * The number of points N is defined by the M parameter (N = 1<< M). It must be a power of 4. |
| 213 | * The number of threads is equal to the number of processors in the hardware architecture. It must be a power of 2. |
| 214 | * The number of threads must be larger or equal to the number of rows (nthreads >= rootN). |
| 215 | Each thread is handling a subset of the k of the rows (k = rootN / nthreads), and transpose the data array to benefit of cache locality. |
| 216 | After the sequential initialization, the parallel processing is done in five steps separated by synchronisation barriers: |
| 217 | * first data array transpose |
| 218 | * FFT on the columns |
| 219 | * second data array transpose |
| 220 | * FFT on the rows |
| 221 | * third data array transpose |
| 222 | The N input data points be initialised in three different modes: |
| 223 | * CONSTANT : all data points have the same [1,0] value |
| 224 | * COSIN : data point n has [cos(n/N) , sin(n/N)] value |
| 225 | * RANDOM : data points have pseudo random values |
| 226 | The instrumentation results (computation times) are stored in a file. |
| 227 | |
| 228 | It requires one TTY terminal shared by all threads. |
| 229 | |
| 230 | The source code can be found [source:soft/giet_vm/applications/fft/fft.c here], and the mapping is defined [source:soft/giet_vm/applications/fft/fft.py here]. |
| 231 | |