59 | | The input and output buffers containing the source and transposed images are allocated from the user heap distributed in all clusters. There is (image size / clusters) lines per cluster. Therefore, the data read are mostly local, but the data write are mostly remote. |
60 | | |
61 | | The number of clusters must be a power of 2 no larger than 256. |
62 | | The number of processors per cluster must be a power of 2 no larger than 4. |
| 59 | The buf_in[x,y] and buf_out[x,y] buffers containing the direct and transposed images are distributed in clusters:
| 60 | In each cluster[x,y], the thread running on processor P[x,y,0] uses the giet_fat_mmap() function to map the buf_in[x,y] and buf_out[x,y] buffers containing a set of lines. |
| 61 | Then, all threads in cluster[x,y] read pixels from the local buf_in[x,y] buffer and write them to the remote buf_out[x,y] buffers. Finally, each thread displays
| 62 | a part of the transposed image on the frame buffer. There are (image size / clusters) lines per cluster. Therefore, data reads are local, but data writes are mostly remote.
| 63 | |
| 64 | * The image size must fit the frame buffer width and height, which must be powers of 2.
| 65 | * The number of clusters must be a power of 2 no larger than 256. |
| 66 | * The number of processors per cluster must be a power of 2 no larger than 4. |
| 67 | * The number of clusters cannot be larger than (image_size * image_size) / 4096, because the size of buf_in[x,y] and buf_out[x,y] must be a multiple of 4096.
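
The constraints above, and the reason why writes are mostly remote, can be illustrated with a minimal C sketch. It is not taken from the application source: the helper names (`is_pow2`, `transpose_config_ok`, `lines_per_cluster`, `owner_cluster`) are hypothetical, clusters are identified by a flat index rather than by [x,y] coordinates, and the image is assumed to be square, with one byte per pixel and a side equal to the frame buffer width and height.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when n is a non-zero power of 2. */
static bool is_pow2(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical check of the constraints listed above.
 * Assumptions: square image of image_size x image_size pixels, one byte
 * per pixel, image_size equal to the frame buffer width and height. */
static bool transpose_config_ok(unsigned image_size,
                                unsigned clusters,
                                unsigned procs_per_cluster)
{
    /* 64-bit product so that large image sizes do not overflow. */
    unsigned long long pixels = (unsigned long long)image_size * image_size;

    return is_pow2(image_size)
        && is_pow2(clusters) && clusters <= 256
        && is_pow2(procs_per_cluster) && procs_per_cluster <= 4
        && clusters <= pixels / 4096;
}

/* Number of image lines stored in each cluster. */
static unsigned lines_per_cluster(unsigned image_size, unsigned clusters)
{
    assert(clusters != 0 && clusters <= image_size);
    return image_size / clusters;
}

/* Flat index of the cluster whose buffers hold line `line`.
 * Input pixel (l, p) becomes output pixel (p, l), so a thread reading a
 * local line l writes to owner_cluster(p, ...) for every column p. */
static unsigned owner_cluster(unsigned line,
                              unsigned image_size,
                              unsigned clusters)
{
    assert(line < image_size);
    return line / lines_per_cluster(image_size, clusters);
}
```

Under these assumptions, a 256 x 256 image on 16 clusters gives 16 lines per cluster, and only 16 of the 256 pixels of a local input line land in the local output buffer. The other 240 are written to remote clusters.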