56 | | The total number of threads depends on the hardware architecture, and is computed as ( x_size * y_size * nprocs ) . The main() function is executed by the thread running on P[0,0,0]. It makes several initializations, launches all other threads (using the pthread_create() function), and calls the execute() function. When the main() function returns from the execute(), it uses the |
| 56 | The total number of threads depends on the hardware architecture, and is computed as ( x_size * y_size * nprocs ). The main() function is executed by the thread running on P[0,0,0]. It performs several initializations, launches all other threads (using the pthread_create() function), and then calls the execute() function itself. When the main() function returns from execute(), it uses the |
60 | | In each cluster[x,y], the thread running on processor P[x,y,0] uses the giet_fat_mmap() function to map the buf_in[x,y] and buf_out[x,y] buffers containing a set of lines. |
61 | | Then, all threads in cluster[x,y] read pixels from the local buf_in[x,y] buffer, and write the pixels to the remote buf_out[x,y] buffers. Finally, each thread display |
62 | | a part of the transposed image to the frame buffer. There is (image size / clusters) lines per cluster. Therefore, the data read are local, but the data write are mostly remote. |
| 60 | In each cluster[x,y], the thread running on processor P[x,y,0] uses the giet_fat_mmap() function to map the buf_in[x,y] and buf_out[x,y] buffers, which contain the set of lines that must be handled in cluster[x,y]: (image_size / nclusters) lines per cluster. |
| 61 | Then, all threads in cluster[x,y] read pixels from the local buf_in[x,y] buffer, and write them to the remote buf_out[x,y] buffers. Finally, each thread displays a part of the transposed image to the frame buffer. Therefore, the reads are local, but the writes are mostly remote. |
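
The access pattern described above (local reads, mostly remote writes) can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the application's actual code: the sizes (2x2 clusters, 2 processors per cluster, an 8x8 image) are hypothetical, the per-cluster buffers are plain static arrays rather than buffers mapped with giet_fat_mmap(), the threads are replaced by ordinary function calls instead of pthread_create(), and the even split of a cluster's lines among its threads is an assumption.

```c
#include <assert.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { X_SIZE = 2, Y_SIZE = 2, NPROCS = 2, IMAGE_SIZE = 8 };
enum {
    NCLUSTERS         = X_SIZE * Y_SIZE,
    NTHREADS          = X_SIZE * Y_SIZE * NPROCS,  /* one thread per processor */
    LINES_PER_CLUSTER = IMAGE_SIZE / NCLUSTERS,    /* image_size / nclusters   */
    LINES_PER_THREAD  = LINES_PER_CLUSTER / NPROCS /* assumed even split       */
};

/* One input and one output buffer per cluster (stand-ins for buf_in[x,y], buf_out[x,y]). */
static unsigned char buf_in[NCLUSTERS][LINES_PER_CLUSTER][IMAGE_SIZE];
static unsigned char buf_out[NCLUSTERS][LINES_PER_CLUSTER][IMAGE_SIZE];
static int remote_writes; /* writes that land in another cluster's buf_out */

/* Fill the distributed input image with distinct pixel values. */
void fill_input(void)
{
    for (int c = 0; c < NCLUSTERS; c++)
        for (int l = 0; l < LINES_PER_CLUSTER; l++)
            for (int p = 0; p < IMAGE_SIZE; p++)
                buf_in[c][l][p] =
                    (unsigned char)((c * LINES_PER_CLUSTER + l) * IMAGE_SIZE + p);
}

/* Work of one thread: read its local lines, scatter them as columns. */
void transpose_part(int cluster, int lpid)
{
    int first = lpid * LINES_PER_THREAD;
    for (int l = first; l < first + LINES_PER_THREAD; l++) {
        int src_line = cluster * LINES_PER_CLUSTER + l; /* global line index */
        for (int p = 0; p < IMAGE_SIZE; p++) {
            /* Pixel (src_line, p) becomes pixel (p, src_line). */
            int dst_cluster = p / LINES_PER_CLUSTER;
            int dst_line    = p % LINES_PER_CLUSTER;
            buf_out[dst_cluster][dst_line][src_line] = buf_in[cluster][l][p];
            if (dst_cluster != cluster)
                remote_writes++;
        }
    }
}

/* Sequential stand-in for running all NTHREADS threads. */
void run_all_threads(void)
{
    assert(IMAGE_SIZE % NCLUSTERS == 0 && LINES_PER_CLUSTER % NPROCS == 0);
    remote_writes = 0;
    for (int c = 0; c < NCLUSTERS; c++)
        for (int lpid = 0; lpid < NPROCS; lpid++)
            transpose_part(c, lpid);
}

/* Returns 1 if the output image is the transpose of the input image. */
int is_transposed(void)
{
    for (int r = 0; r < IMAGE_SIZE; r++)
        for (int c = 0; c < IMAGE_SIZE; c++)
            if (buf_out[r / LINES_PER_CLUSTER][r % LINES_PER_CLUSTER][c] !=
                buf_in[c / LINES_PER_CLUSTER][c % LINES_PER_CLUSTER][r])
                return 0;
    return 1;
}
```

With these hypothetical sizes, each source line has IMAGE_SIZE pixels, of which only LINES_PER_CLUSTER are written back into the same cluster. The fraction of remote writes is therefore (nclusters - 1) / nclusters, which approaches 1 as the number of clusters grows.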