Changeset 652 for trunk/user/transpose
- Timestamp: Nov 14, 2019, 3:56:51 PM
- Location: trunk/user/transpose
- Files: 2 edited
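The core of this changeset is the new execute() in transpose.c. The image is split into nclusters bands of nlc = IMAGE_SIZE / nclusters lines, and band cid is copied into buf_in_ptr[cid]. Each thread reads pixel (l,p) from buf_in_ptr[l / nlc] at index (l % nlc) * IMAGE_SIZE + p and writes it as pixel (p,l) into buf_out_ptr[p / nlc] at index (p % nlc) * IMAGE_SIZE + l. The following is a minimal sequential sketch of that index arithmetic. It assumes ordinary malloc() buffers and a plain loop over tid in place of the working threads and barriers. The names transpose_lines(), distributed_transpose() and SKETCH_CLUSTERS_MAX are hypothetical and do not come from the source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CLUSTERS_MAX 256  /* hypothetical; matches the 256-cluster limit in the header comment */

/* One working thread: transpose source lines [first, last) from the
 * per-cluster input bands in_ptr[] into the per-cluster output bands out_ptr[]. */
static void transpose_lines(unsigned char **in_ptr, unsigned char **out_ptr,
                            unsigned int size, unsigned int nclusters,
                            unsigned int first, unsigned int last)
{
    unsigned int nlc = size / nclusters;   /* lines per cluster band */

    for (unsigned int l = first; l < last; l++) {
        for (unsigned int p = 0; p < size; p++) {
            /* source pixel (l,p) is stored in band l / nlc */
            unsigned int src_cid   = l / nlc;
            unsigned int src_index = (l % nlc) * size + p;
            /* it becomes destination pixel (p,l), stored in band p / nlc */
            unsigned int dst_cid   = p / nlc;
            unsigned int dst_index = (p % nlc) * size + l;
            out_ptr[dst_cid][dst_index] = in_ptr[src_cid][src_index];
        }
    }
}

/* Sequential driver: load nclusters bands, run the nclusters * ncores
 * "threads" one after the other, then gather the output bands. */
static void distributed_transpose(const unsigned char *image_in, unsigned char *image_out,
                                  unsigned int size, unsigned int nclusters,
                                  unsigned int ncores)
{
    unsigned char *in_ptr[SKETCH_CLUSTERS_MAX];
    unsigned char *out_ptr[SKETCH_CLUSTERS_MAX];
    unsigned int nthreads = nclusters * ncores;

    assert(nclusters >= 1 && nclusters <= SKETCH_CLUSTERS_MAX);
    assert(ncores >= 1 && size % nthreads == 0);

    unsigned int buf_size = (size / nclusters) * size;  /* bytes per band */
    unsigned int nlt = size / nthreads;                 /* lines per thread */

    /* load phase: every band is filled before any transposition starts,
     * which is what the first barrier guarantees in the threaded code */
    for (unsigned int cid = 0; cid < nclusters; cid++) {
        in_ptr[cid]  = malloc(buf_size);
        out_ptr[cid] = malloc(buf_size);
        if (in_ptr[cid] == NULL || out_ptr[cid] == NULL)
            abort();
        memcpy(in_ptr[cid], image_in + (size_t)cid * buf_size, buf_size);
    }

    /* transpose phase: thread tid handles lines [tid * nlt, (tid + 1) * nlt) */
    for (unsigned int tid = 0; tid < nthreads; tid++)
        transpose_lines(in_ptr, out_ptr, size, nclusters, tid * nlt, (tid + 1) * nlt);

    /* gather phase: band cid holds the destination lines that start at cid * (size / nclusters) */
    for (unsigned int cid = 0; cid < nclusters; cid++) {
        memcpy(image_out + (size_t)cid * buf_size, out_ptr[cid], buf_size);
        free(in_ptr[cid]);
        free(out_ptr[cid]);
    }
}
```

Calling distributed_transpose(in, out, 32, 4, 2) on a 32 x 32 image gives out[r * 32 + c] == in[c * 32 + r] for every pixel. Because the load loop finishes before any band is transposed, the sequential order plays the role of the first pthread_barrier_wait() in execute().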
trunk/user/transpose/transpose.c
r646 r652
   5    5   //////////////////////////////////////////////////////////////////////////////////////////
   6    6   // This multi-threaded aplication read a raw image (one byte per pixel)
   7      - // stored on disk, transpose it, display the result on the frame buffer,
   8      - // and store the transposed image on disk.
   9      - // It can run on a multi-cores, multi-clusters architecture, with one thread
        7 + // stored on disk, transposes it, displays the result on the frame buffer,
        8 + // and stores the transposed image on disk.
  10    9   //
  11      - // per core, and uses the POSIX threads API.
  12      - // It uses the mmap() syscall to directly access the input and output files
  13      - // and the fbf_write() syscall to display the images.
       10 + // The image size and the pixel encoding type are defined by the IMAGE_SIZE and
       11 + // IMAGE_TYPE global parameters.
  14   12   //
  15      - // The main() function can be launched on any core[cxy,l].
  16      - // It makes the initialisations, launch (N-1) threads to run the execute() function
  17      - // on the (N-1) other cores, calls himself the execute() function, and finally calls
  18      - // the instrument() function to display instrumentation results when the parallel
  19      - // execution is completed. The placement of threads on the cores can be done
  20      - // automatically by the operating system, or can be done explicitely by the main thread
  21      - // (when the EXPLICIT_PLACEMENT global parameter is set).
       13 + // It can run on a multi-cores, multi-clusters architecture, where (X_SIZE * Y_SIZE)
       14 + // is the number of clusters and NCORES the number of cores per cluster.
       15 + // A core is identified by two indexes [cxy,lid] : cxy is the cluster identifier,
       16 + // (that is NOT required to be a continuous index), and lid is the local core index,
       17 + // (that must be in the [Ø,NCORES-1] range).
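The header comment of r652 indexes each working thread by a continuous tid, decomposed as tid == cid * ncores + lid. The r646 execute() and the EXPLICIT_PLACEMENT loop of main() further use cid == x * y_size + y, and each thread handles nlt = IMAGE_SIZE / nthreads consecutive lines starting at tid * nlt. The following sketch writes that mapping out in C. The struct and function names are hypothetical, and the physical cluster identifier cxy (built by the platform macro HAL_CXY_FROM_XY in the source) is deliberately not modelled. As the header comment warns, under NO_PLACEMENT the resulting (cid, lid) are only abstract identifiers.

```c
#include <assert.h>

/* Hypothetical record of where a working thread sits and which lines it owns. */
typedef struct {
    unsigned int cid;    /* continuous cluster index, in [0, x_size * y_size - 1] */
    unsigned int lid;    /* local index, in [0, ncores - 1] */
    unsigned int x;      /* cluster X coordinate, from cid == x * y_size + y */
    unsigned int y;      /* cluster Y coordinate */
    unsigned int first;  /* first line transposed by this thread */
    unsigned int last;   /* one past the last line transposed by this thread */
} thread_place_t;

/* Decompose tid == cid * ncores + lid, then cid == x * y_size + y. */
static thread_place_t place_thread(unsigned int tid, unsigned int x_size,
                                   unsigned int y_size, unsigned int ncores,
                                   unsigned int image_size)
{
    unsigned int nthreads = x_size * y_size * ncores;
    thread_place_t t;

    /* the header comment requires NLINES == k * NTHREADS */
    assert(tid < nthreads && image_size % nthreads == 0);

    unsigned int nlt = image_size / nthreads;   /* lines per thread */
    t.cid   = tid / ncores;
    t.lid   = tid % ncores;
    t.x     = t.cid / y_size;
    t.y     = t.cid % y_size;
    t.first = tid * nlt;
    t.last  = t.first + nlt;
    return t;
}

/* Inverse mapping, as computed by the EXPLICIT_PLACEMENT loop in main(). */
static unsigned int tid_from_xyl(unsigned int x, unsigned int y, unsigned int l,
                                 unsigned int y_size, unsigned int ncores)
{
    return (((x * y_size) + y) * ncores) + l;
}
```

With a 2 x 2 mesh, 4 cores per cluster and IMAGE_SIZE 512, tid 13 maps to cid 3, lid 1, cluster (1,1) and lines 416 to 447, and tid_from_xyl() recovers 13 from those coordinates.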
  22   18   //
  23      - // The buf_in[x,y] and buf_out[put buffers containing the direct ans transposed images
  24      - // are distributed in clusters: In each cluster[cxy], the thread running on core[cxy,0]
  25      - // map the buf_in[cxy] and // buf_out[cxy] buffers containing a subset of lines.
  26      - // Then, all threads in cluster[xy] read pixels from the local buf_in[cxy] buffer, and
  27      - // write the pixels to all remote buf_out[cxy] buffers. Finally, each thread display
  28      - // a part of the transposed image to the frame buffer.
       19 + // The main() function can run on any core in any cluster. This main thread
       20 + // makes the initialisations, uses the pthread_create() syscall to launch (NTHREADS-1)
       21 + // other threads in "attached" mode running in parallel the execute() function, calls
       22 + // himself the execute() function, wait completion of the (NTHREADS-1) other threads
       23 + // with a pthread_join(), and finally calls the instrument() function to display
       24 + // and register the instrumentation results when execution is completed.
       25 + // All threads run the execute() function, but each thread transposes only
       26 + // (NLINES / NTHREADS) lines. This requires that NLINES == k * NTHREADS.
       27 + //
       28 + // The number N of working threads is always defined by the number of cores availables
       29 + // in the architecture, but this application supports three placement modes.
       30 + // In all modes, the working threads are identified by the [tid] continuous index
       31 + // in range [0, NTHREADS-1], and defines how the lines are shared amongst the threads.
       32 + // This continuous index can always be decomposed in two continuous sub-indexes:
       33 + // tid == cid * ncores + lid, where cid is in [0,NCLUSTERS-1] and lid in [0,NCORES-1].
       34 + //
       35 + // - NO_PLACEMENT: the main thread is itsef a working thread. The (N_1) other working
       36 + // threads are created by the main thread, but the placement is done by the OS, using
       37 + // the DQDT for load balancing, and two working threads can be placed on the same core.
       38 + // The [cid,lid] are only abstract identifiers, and cannot be associated to a physical
       39 + // cluster or a physical core. In this mode, the main thread run on any cluster,
       40 + // but has tid = 0 (i.e. cid = 0 & tid = 0).
       41 + //
       42 + // - EXPLICIT_PLACEMENT: the main thread is again a working thread, but the placement of
       43 + // of the threads on the cores is explicitely controled by the main thread to have
       44 + // exactly one working thread per core, and the [cxy][lpid] core coordinates for a given
       45 + // thread[tid] can be directly derived from the [tid] value: [cid] is an alias for the
       46 + // physical cluster identifier, and [lid] is the local core index.
       47 + //
       48 + // - PARALLEL_PLACEMENT: the main thread is not anymore a working thread, and uses the
       49 + // non standard pthread_parallel_create() function to avoid the costly sequencial
       50 + // loops for pthread_create() and pthread_join(). It garanty one working thread
       51 + // per core, and the same relation between the thread[tid] and the core[cxy][lpid].
       52 + //
       53 + // The buf_in[x,y] and buf_out[put buffers containing the direct and transposed images
       54 + // are distributed in clusters: each thread[cid][0] allocate a local input buffer
       55 + // and load in this buffer all lines that must be handled by the threads sharing the
       56 + // same cid, from the mapper of the input image file.
       57 + // In the execute function, all threads in the group defined by the cid index read pixels
       58 + // from the local buf_in[cid] buffer, and write pixels to all remote buf_out[cid] buffers.
       59 + // Finally, each thread displays a part of the transposed image to the frame buffer.
  29   60   //
  30   61   // - The image must fit the frame buffer size, that must be power of 2.
  31   62   // - The number of clusters must be a power of 2 no larger than 256.
  32   63   // - The number of cores per cluster must be a power of 2 no larger than 4.
  33      - // - The number of clusters cannot be larger than (IMAGE_SIZE * IMAGE_SIZE) / 4096,
  34      - // because the size of buf_in[x,y] and buf_out[x,y] must be multiple of 4096.
       64 + // - The number of threads cannot be larger than IMAGE_SIZE.
  35   65   //
  36   66   //////////////////////////////////////////////////////////////////////////////////////////
   …    …
  50   80   #define CORES_MAX 4 // max number of cores per cluster
  51   81   #define CLUSTERS_MAX (X_MAX * Y_MAX) // max number of clusters
  52      -
  53      - #define IMAGE_SIZE 256 // image size
       82 + #define THREADS_MAX (X_MAX * Y_MAX * CORES_MAX) // max number of threads
       83 +
       84 + #define IMAGE_SIZE 512 // image size
  54   85   #define IMAGE_TYPE 420 // pixel encoding type
  55      - #define INPUT_FILE_PATH "/misc/lena_256.raw" // input file pathname
  56      - #define OUTPUT_FILE_PATH "/home/trsp_256.raw" // output file pathname
  57      -
       86 + #define INPUT_FILE_PATH "/misc/couple_512.raw" // input file pathname
       87 + #define OUTPUT_FILE_PATH "/misc/transposed_512.raw" // output file pathname
       88 +
       89 + #define SAVE_RESULT_FILE 0 // save result image on disk
  58   90   #define USE_DQT_BARRIER 1 // quad-tree barrier if non zero
  59      - #define EXPLICIT_PLACEMENT 1 // explicit thread placement
  60      - #define VERBOSE 1 // print comments on TTY
       91 +
       92 + #define NO_PLACEMENT 0 // uncontrolefdthread placement
       93 + #define EXPLICIT_PLACEMENT 0 // explicit threads placement
       94 + #define PARALLEL_PLACEMENT 1 // parallel threads placement
       95 +
       96 + #define VERBOSE_MAIN 0 // main function print comments
       97 + #define VERBOSE_EXEC 0 // exec function print comments
       98 + #define VERBOSE_INSTRU 0 // instru function print comments
  61   99
  62  100
   …    …
  65  103   ///////////////////////////////////////////////////////
  66  104
  67      - // instrumentation counters for each processor in each cluster
  68      - unsigned int MMAP_START[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
  69      - unsigned int MMAP_END [CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
      105 + // global instrumentation counters for the main thread
      106 + unsigned int SEQUENCIAL_TIME = 0;
      107 + unsigned int PARALLEL_TIME = 0;
      108 +
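r652 replaces the single EXPLICIT_PLACEMENT switch with three mutually exclusive flags, and main() rejects any configuration where (NO_PLACEMENT + EXPLICIT_PLACEMENT + PARALLEL_PLACEMENT) != 1. It also corrects the core-count test, whose last term read (ncores == 4) in r646, and adds a check that there are no more threads than image lines. For PARALLEL_PLACEMENT, it derives the root level of the covering DQT from the larger side of the cluster mesh. The following is a sketch of those checks as standalone C functions. It assumes the flags and sizes are passed as parameters rather than read from the macros and from get_config(). check_config() and dqt_root_level() are hypothetical names.

```c
#include <assert.h>

/* Start-up checks modelled on main() in r652.
 * Returns 0 if the configuration is accepted, -1 otherwise. */
static int check_config(int no_placement, int explicit_placement,
                        int parallel_placement, unsigned int ncores,
                        unsigned int nthreads, unsigned int image_size)
{
    /* exactly one placement mode must be selected */
    if (no_placement + explicit_placement + parallel_placement != 1)
        return -1;
    /* 1, 2 or 4 cores per cluster (r646 tested ncores == 4 in the last term) */
    if ((ncores != 1) && (ncores != 2) && (ncores != 4))
        return -1;
    /* each thread needs at least one image line */
    if (nthreads > image_size)
        return -1;
    return 0;
}

/* Root level of the DQT covering an x_size * y_size cluster mesh, using the
 * same mapping as the PARALLEL_PLACEMENT branch: a side of 1, 2, 4 or 8
 * clusters gives level 0, 1, 2 or 3; any other side gives level 4. */
static unsigned int dqt_root_level(unsigned int x_size, unsigned int y_size)
{
    assert(x_size >= 1 && y_size >= 1);
    unsigned int z = (x_size > y_size) ? x_size : y_size;
    return (z == 1) ? 0 : (z == 2) ? 1 : (z == 4) ? 2 : (z == 8) ? 3 : 4;
}
```

A 4 x 2 mesh, for example, gives root level 2, which the source then passes to pthread_parallel_create() together with &execute.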
109 // instrumentation counters for each thread in each cluster 110 // indexed by [cid][lid] : cluster continuous index / thread local index 111 unsigned int LOAD_START[CLUSTERS_MAX][CORES_MAX] = {{ 0 }}; 112 unsigned int LOAD_END [CLUSTERS_MAX][CORES_MAX] = {{ 0 }}; 70 113 unsigned int TRSP_START[CLUSTERS_MAX][CORES_MAX] = {{ 0 }}; 71 114 unsigned int TRSP_END [CLUSTERS_MAX][CORES_MAX] = {{ 0 }}; … … 73 116 unsigned int DISP_END [CLUSTERS_MAX][CORES_MAX] = {{ 0 }}; 74 117 75 // arrays of pointers on distributed buffers 76 // one input buffer & one output buffer per cluster 77 unsigned char * buf_in [CLUSTERS_MAX]; 78 unsigned char * buf_out[CLUSTERS_MAX]; 79 80 // synchronisation barrier (all threads) 118 // pointer on buffer containing the input image, maped by the main to the input file 119 unsigned char * image_in; 120 121 // pointer on buffer containing the output image, maped by the main to the output file 122 unsigned char * image_out; 123 124 // arrays of pointers on distributed buffers indexed by [cid] : cluster continuous index 125 unsigned char * buf_in_ptr [CLUSTERS_MAX]; 126 unsigned char * buf_out_ptr[CLUSTERS_MAX]; 127 128 // synchronisation barrier (all working threads) 81 129 pthread_barrier_t barrier; 82 130 83 131 // platform parameters 84 unsigned int x_size; // number of clusters in a row 85 unsigned int y_size; // number of clusters in a column 86 unsigned int ncores; // number of processors per cluster 87 88 // cluster identifier & local index of core running the main thread 89 unsigned int cxy_main; 90 unsigned int lid_main; 91 92 // input & output file descriptors 93 int fd_in; 94 int fd_out; 95 96 #if EXPLICIT_PLACEMENT 97 98 // thread index allocated by the kernel 99 pthread_t trdid[CLUSTERS_MAX][CORES_MAX]; 100 101 // user defined continuous thread index 102 unsigned int tid[CLUSTERS_MAX][CORES_MAX]; 103 104 // thread attributes only used if explicit placement 105 pthread_attr_t attr[CLUSTERS_MAX][CORES_MAX]; 106 107 #else 108 109 // 
thread index allocated by the kernel 110 pthread_t trdid[CLUSTERS_MAX * CORES_MAX]; 111 112 // user defined continuous thread index 113 unsigned int tid[CLUSTERS_MAX * CORES_MAX]; 114 115 #endif 132 unsigned int x_size; // number of clusters in a row 133 unsigned int y_size; // number of clusters in a column 134 unsigned int ncores; // number of cores per cluster 135 136 // main thread continuous index 137 unsigned int tid_main; 116 138 117 139 //return values at thread exit … … 119 141 unsigned int THREAD_EXIT_FAILURE = 1; 120 142 143 // array of kernel thread identifiers / indexed by [tid] 144 pthread_t exec_trdid[THREADS_MAX]; 145 146 // array of execute function arguments / indexed by [tid] 147 pthread_parallel_work_args_t exec_args[THREADS_MAX]; 148 149 // array of thread attributes / indexed by [tid] 150 pthread_attr_t exec_attr[THREADS_MAX]; 151 121 152 //////////////////////////////////////////////////////////////// 122 153 // functions declaration 123 154 //////////////////////////////////////////////////////////////// 124 155 125 void execute( unsigned int * ptid);126 127 void instrument( void);128 129 /////////// 130 void main( )156 void execute( pthread_parallel_work_args_t * args ); 157 158 void instrument( FILE * f , char * filename ); 159 160 ///////////////// 161 void main( void ) 131 162 { 132 unsigned long long date; 163 unsigned long long start_cycle; 164 unsigned long long end_sequencial_cycle; 165 unsigned long long end_parallel_cycle; 166 167 char filename[32]; // instrumentation file name 168 char pathname[64]; // instrumentation file pathname 133 169 134 170 int error; 135 171 136 printf("\n bloup 0\n"); 137 138 // get identifiers for core executing main 139 get_core_id( &cxy_main , &lid_main ); 140 141 printf("\n bloup 1\n"); 172 ///////////////////////////////////////////////////////////////////////////////// 173 get_cycle( &start_cycle ); 174 ///////////////////////////////////////////////////////////////////////////////// 175 176 if( 
(NO_PLACEMENT + EXPLICIT_PLACEMENT + PARALLEL_PLACEMENT) != 1 ) 177 { 178 printf("\n[transpose error] illegal placement\n"); 179 exit( 0 ); 180 } 142 181 143 182 // get & check plat-form parameters 144 get_config( &x_size , &y_size , &ncores );145 146 printf("\n bloup 2\n");147 148 if((ncores != 1) && (ncores != 2) && (ncores == 4))183 get_config( &x_size, 184 &y_size, 185 &ncores ); 186 187 if((ncores != 1) && (ncores != 2) && (ncores != 4)) 149 188 { 150 189 printf("\n[transpose error] number of cores per cluster must be 1/2/4\n"); … … 166 205 } 167 206 168 printf("\n bloup 3\n"); 207 // main thread get identifiers for core executing main 208 unsigned int cxy_main; 209 unsigned int lid_main; 210 get_core_id( &cxy_main , &lid_main ); 169 211 170 212 // compute number of threads … … 172 214 unsigned int nthreads = nclusters * ncores; 173 215 174 printf("\n bloup 4\n"); 175 176 // get FBF ownership and FBF size 216 // main thread get FBF size and type 177 217 unsigned int fbf_width; 178 218 unsigned int fbf_height; … … 180 220 fbf_get_config( &fbf_width , &fbf_height , &fbf_type ); 181 221 182 printf("\n bloup 5\n");183 184 222 if( (fbf_width != IMAGE_SIZE) || (fbf_height != IMAGE_SIZE) || (fbf_type != IMAGE_TYPE) ) 185 223 { … … 188 226 } 189 227 190 get_cycle( &date ); 191 printf("\n[transpose] starts at cycle %d on %d cores / FBF = %d * %d pixels\n", 192 (unsigned int)date , nthreads , fbf_width , fbf_height ); 193 194 // open input file 195 fd_in = open( INPUT_FILE_PATH , O_RDONLY , 0 ); // read-only 196 if ( fd_in < 0 ) 228 if( nthreads > IMAGE_SIZE ) 229 { 230 printf("\n[transpose error] number of threads larger than number of lines\n"); 231 exit( 0 ); 232 } 233 234 unsigned int npixels = IMAGE_SIZE * IMAGE_SIZE; 235 236 // define instrumentation file name 237 if( NO_PLACEMENT ) 238 { 239 printf("\n[transpose] %d cluster(s) / %d core(s) / FBF[%d*%d] / PID %x / NO_PLACE\n", 240 nclusters, ncores, fbf_width, fbf_height, getpid() ); 241 242 // build 
instrumentation file name 243 if( USE_DQT_BARRIER ) 244 snprintf( filename , 32 , "trsp_dqt_no_place_%d_%d_%d", 245 IMAGE_SIZE , x_size * y_size , ncores ); 246 else 247 snprintf( filename , 32 , "trsp_smp_no_place_%d_%d_%d", 248 IMAGE_SIZE , x_size * y_size , ncores ); 249 } 250 251 if( EXPLICIT_PLACEMENT ) 252 { 253 printf("\n[transpose] %d cluster(s) / %d core(s) / FBF[%d*%d] / PID %x / EXPLICIT\n", 254 nclusters, ncores, fbf_width, fbf_height, getpid() ); 255 256 // build instrumentation file name 257 if( USE_DQT_BARRIER ) 258 snprintf( filename , 32 , "trsp_dqt_explicit_%d_%d_%d", 259 IMAGE_SIZE , x_size * y_size , ncores ); 260 else 261 snprintf( filename , 32 , "trsp_smp_explicit_%d_%d_%d", 262 IMAGE_SIZE , x_size * y_size , ncores ); 263 } 264 265 if( PARALLEL_PLACEMENT ) 266 { 267 printf("\n[transpose] %d cluster(s) / %d core(s) / FBF[%d*%d] / PID %x / PARALLEL\n", 268 nclusters, ncores, fbf_width, fbf_height, getpid() ); 269 270 // build instrumentation file name 271 if( USE_DQT_BARRIER ) 272 snprintf( filename , 32 , "trsp_dqt_parallel_%d_%d_%d", 273 IMAGE_SIZE , x_size * y_size , ncores ); 274 else 275 snprintf( filename , 32 , "trsp_smp_parallel_%d_%d_%d", 276 IMAGE_SIZE , x_size * y_size , ncores ); 277 } 278 279 // open instrumentation file 280 snprintf( pathname , 64 , "/home/%s", filename ); 281 FILE * f = fopen( pathname , NULL ); 282 if ( f == NULL ) 197 283 { 198 printf("\n[transpose error] main cannot open file %s\n", INPUT_FILE_PATH ); 199 exit( 0 ); 200 } 201 202 #if VERBOSE 203 printf("\n[transpose] main open file %s / fd = %d\n", INPUT_FILE_PATH , fd_in ); 204 #endif 205 206 // open output file 207 fd_out = open( OUTPUT_FILE_PATH , O_CREAT , 0 ); // create if required 208 if ( fd_out < 0 ) 209 { 210 printf("\n[transpose error] main cannot open file %s\n", OUTPUT_FILE_PATH ); 211 exit( 0 ); 212 } 213 214 #if VERBOSE 215 printf("\n[transpose] main open file %s / fd = %d\n", OUTPUT_FILE_PATH , fd_out ); 216 #endif 217 218 // initialise barrier 
284 printf("\n[transpose error] cannot open instrumentation file %s\n", pathname ); 285 exit( 0 ); 286 } 287 288 #if VERBOSE_MAIN 289 printf("\n[transpose] main on core[%x,%d] open instrumentation file %s\n", 290 cxy_main, lid_main, pathname ); 291 #endif 292 293 // main thread initializes barrier 219 294 if( USE_DQT_BARRIER ) 220 295 { … … 236 311 } 237 312 238 get_cycle( &date ); 239 printf("\n[transpose] main on core[%x,%d] completes initialisation at cycle %d\n" 240 "- CLUSTERS = %d\n" 241 "- PROCS = %d\n" 242 "- THREADS = %d\n", 243 cxy_main, lid_main, (unsigned int)date, nclusters, ncores, nthreads ); 244 245 ////////////////////// 246 #if EXPLICIT_PLACEMENT 247 248 // main thread launch other threads 249 unsigned int x; 250 unsigned int y; 251 unsigned int l; 252 unsigned int cxy; 253 for( x = 0 ; x < x_size ; x++ ) 254 { 255 for( y = 0 ; y < y_size ; y++ ) 313 #if VERBOSE_MAIN 314 printf("\n[transpose] main on core[%x,%d] completes barrier initialisation\n", 315 cxy_main, lid_main ); 316 #endif 317 318 // main thread open input file 319 int fd_in = open( INPUT_FILE_PATH , O_RDONLY , 0 ); 320 321 if ( fd_in < 0 ) 322 { 323 printf("\n[transpose error] main cannot open file %s\n", INPUT_FILE_PATH ); 324 exit( 0 ); 325 } 326 327 #if VERBOSE_MAIN 328 printf("\n[transpose] main open file <%s> / fd = %d\n", INPUT_FILE_PATH , fd_in ); 329 #endif 330 331 // main thread map image_in buffer to input image file 332 image_in = (unsigned char *)mmap( NULL, 333 npixels, 334 PROT_READ, 335 MAP_FILE | MAP_SHARED, 336 fd_in, 337 0 ); // offset 338 if ( image_in == NULL ) 339 { 340 printf("\n[transpose error] main cannot map buffer to file %s\n", INPUT_FILE_PATH ); 341 exit( 0 ); 342 } 343 344 #if VERBOSE_MAIN 345 printf("\n[transpose] main map buffer to file <%s>\n", INPUT_FILE_PATH ); 346 #endif 347 348 // main thread display input image on FBF 349 if( fbf_write( image_in, 350 npixels, 351 0 ) ) 352 { 353 printf("\n[transpose error] main cannot access FBF\n"); 354 exit( 0 
); 355 } 356 357 #if SAVE_RESULT_IMAGE 358 359 // main thread open output file 360 int fd_out = open( OUTPUT_FILE_PATH , O_CREAT , 0 ); 361 362 if ( fd_out < 0 ) 363 { 364 printf("\n[transpose error] main cannot open file %s\n", OUTPUT_FILE_PATH ); 365 exit( 0 ); 366 } 367 368 #if VERBOSE_MAIN 369 printf("\n[transpose] main open file <%s> / fd = %d\n", OUTPUT_FILE_PATH , fd_out ); 370 #endif 371 372 // main thread map image_out buffer to output image file 373 image_out = (unsigned char *)mmap( NULL, 374 npixels, 375 PROT_WRITE, 376 MAP_FILE | MAP_SHARED, 377 fd_out, 378 0 ); // offset 379 if ( image_out == NULL ) 380 { 381 printf("\n[transpose error] main cannot map buf_out to file %s\n", OUTPUT_FILE_PATH ); 382 exit( 0 ); 383 } 384 385 #if VERBOSE_MAIN 386 printf("\n[transpose] main map buffer to file <%s>\n", OUTPUT_FILE_PATH ); 387 #endif 388 389 #endif // SAVE_RESULT_IMAGE 390 391 ///////////////////////////////////////////////////////////////////////////////////// 392 get_cycle( &end_sequencial_cycle ); 393 SEQUENCIAL_TIME = (unsigned int)(end_sequencial_cycle - start_cycle); 394 ///////////////////////////////////////////////////////////////////////////////////// 395 396 ////////////////// 397 if( NO_PLACEMENT ) 398 { 399 // the tid value for the main thread is always 0 400 // main thread creates new threads with tid in [1,nthreads-1] 401 unsigned int tid; 402 for ( tid = 0 ; tid < nthreads ; tid++ ) 256 403 { 257 cxy = HAL_CXY_FROM_XY( x , y ); 258 for( l = 0 ; l < ncores ; l++ ) 404 // register tid value in exec_args[tid] array 405 exec_args[tid].tid = tid; 406 407 // create other threads 408 if( tid > 0 ) 259 409 { 260 // no other thread on the core running the main 261 if( (cxy != cxy_main) || (l != lid_main) ) 410 if ( pthread_create( &exec_trdid[tid], 411 NULL, // no attribute 412 &execute, 413 &exec_args[tid] ) ) 262 414 { 263 // define thread attributes 264 attr[cxy][l].attributes = PT_ATTR_CLUSTER_DEFINED | PT_ATTR_CORE_DEFINED; 265 attr[cxy][l].cxy 
= cxy; 266 attr[cxy][l].lid = l; 267 268 tid[cxy][l] = (((x * y_size) + y) * ncores) + l; 415 printf("\n[transpose error] cannot create thread %d\n", tid ); 416 exit( 0 ); 417 } 418 419 #if VERBOSE_MAIN 420 printf("\n[transpose] main created thread %d\n", tid ); 421 #endif 422 423 } 424 else 425 { 426 tid_main = 0; 427 } 428 } // end for tid 429 430 // main thread calls itself the execute() function 431 execute( &exec_args[0] ); 432 433 // main thread wait other threads completion 434 for ( tid = 1 ; tid < nthreads ; tid++ ) 435 { 436 unsigned int * status; 437 438 // main wait thread[tid] status 439 if ( pthread_join( exec_trdid[tid], (void*)(&status)) ) 440 { 441 printf("\n[transpose error] main cannot join thread %d\n", tid ); 442 exit( 0 ); 443 } 444 445 // check status 446 if( *status != THREAD_EXIT_SUCCESS ) 447 { 448 printf("\n[transpose error] thread %x returned failure\n", tid ); 449 exit( 0 ); 450 } 451 452 #if VERBOSE_MAIN 453 printf("\n[transpose] main successfully joined thread %x\n", tid ); 454 #endif 455 456 } // end for tid 457 458 } // end if no_placement 459 460 //////////////////////// 461 if( EXPLICIT_PLACEMENT ) 462 { 463 // main thread places each other threads on a specific core[cxy][lid] 464 // but the actual thread creation is sequencial 465 unsigned int x; 466 unsigned int y; 467 unsigned int l; 468 unsigned int cxy; // cluster identifier 469 unsigned int tid; // thread continuous index 470 471 for( x = 0 ; x < x_size ; x++ ) 472 { 473 for( y = 0 ; y < y_size ; y++ ) 474 { 475 cxy = HAL_CXY_FROM_XY( x , y ); 476 for( l = 0 ; l < ncores ; l++ ) 477 { 478 // compute thread continuous index 479 tid = (((x * y_size) + y) * ncores) + l; 480 481 // register tid value in exec_args[tid] array 482 exec_args[tid].tid = tid; 483 484 // no thread created on the core running the main 485 if( (cxy != cxy_main) || (l != lid_main) ) 486 { 487 // define thread attributes 488 exec_attr[tid].attributes = PT_ATTR_CLUSTER_DEFINED | 489 PT_ATTR_CORE_DEFINED; 
490 exec_attr[tid].cxy = cxy; 491 exec_attr[tid].lid = l; 269 492 270 // create thread on core[cxy,l] 271 if (pthread_create( &trdid[cxy][l], 272 &attr[cxy][l], 273 &execute, 274 &tid[cxy][l] ) ) 493 // create thread[tid] on core[cxy][l] 494 if ( pthread_create( &exec_trdid[tid], 495 &exec_attr[tid], 496 &execute, 497 &exec_args[tid] ) ) 498 { 499 printf("\n[transpose error] cannot create thread %d\n", tid ); 500 exit( 0 ); 501 } 502 #if VERBOSE_MAIN 503 printf("\n[transpose] main created thread[%d] on core[%x,%d]\n", tid, cxy, l ); 504 #endif 505 } 506 else 275 507 { 276 printf("\n[convol error] created thread %x on core[%x][%d]\n", 277 trdid[cxy][l] , cxy , l ); 278 exit( 0 ); 508 tid_main = tid; 279 509 } 280 #if VERBOSE281 printf("\n[transpose] main created thread[%x,%d]\n", cxy, l );282 #endif283 510 } 284 511 } 285 512 } 286 } 287 288 // main thread calls itself the execute() function 289 execute( &tid[cxy_main][lid_main] ); 290 291 // main thread wait other threads completion 292 for( x = 0 ; x < x_size ; x++ ) 293 { 294 for( y = 0 ; y < y_size ; y++ ) 513 514 // main thread calls itself the execute() function 515 execute( &exec_args[tid_main] ); 516 517 // main thread wait other threads completion 518 for( tid = 0 ; tid < nthreads ; tid++ ) 295 519 { 296 cxy = HAL_CXY_FROM_XY( x , y );297 for( l = 0 ; l < ncores ; l++)520 // no other thread on the core running the main 521 if( tid != tid_main ) 298 522 { 299 // no other thread on the core running the main 300 if( (cxy != cxy_main) || (l != lid_main) ) 523 unsigned int * status; 524 525 // wait thread[tid] 526 if( pthread_join( exec_trdid[tid] , (void*)(&status) ) ) 301 527 { 302 unsigned int * status; 303 304 // wait thread[cxy][l] 305 if( pthread_join( trdid[cxy][l] , (void*)(&status) ) ) 306 { 307 printf("\n[transpose error] main cannot join thread[%x,%d]\n", cxy, l ); 308 exit( 0 ); 309 } 528 printf("\n[transpose error] main cannot join thread %d\n", tid ); 529 exit( 0 ); 530 } 310 531 311 // check 
status 312 if( *status != THREAD_EXIT_SUCCESS ) 313 { 314 printf("\n[transpose error] thread[%x,%d] returned failure\n", cxy, l ); 315 exit( 0 ); 316 } 317 #if VERBOSE 318 printf("\n[transpose] main joined thread[%x,%d]\n", cxy, l ); 319 #endif 532 // check status 533 if( *status != THREAD_EXIT_SUCCESS ) 534 { 535 printf("\n[transpose error] thread %d returned failure\n", tid ); 536 exit( 0 ); 320 537 } 538 #if VERBOSE_MAIN 539 printf("\n[transpose] main joined thread %d on core[%x,%d]\n", tid , cxy , l ); 540 #endif 321 541 } 322 542 } 323 } 324 325 ///////////////////////////////326 #else // no explicit placement 327 328 // main thread launch other threads329 unsigned int n;330 for ( n = 1 ; n < nthreads ; n++ )331 {332 tid[n] = n;333 if ( pthread_create( &trdid[n],334 NULL, // no attribute 335 &execute,336 &tid[n] ) )543 } // end if explicit_placement 544 545 //////////////////////// 546 if( PARALLEL_PLACEMENT ) 547 { 548 // compute covering DQT size an level 549 unsigned int z = (x_size > y_size) ? x_size : y_size; 550 unsigned int root_level = ((z == 1) ? 0 : 551 ((z == 2) ? 1 : 552 ((z == 4) ? 2 : 553 ((z == 8) ? 
3 : 4)))); 554 555 // create & execute the working threads 556 if( pthread_parallel_create( root_level , &execute ) ) 337 557 { 338 printf("\n[transpose error] cannot create thread %d\n", n);558 printf("\n[transpose error] in %s\n", __FUNCTION__ ); 339 559 exit( 0 ); 340 560 } 341 342 #if VERBOSE 343 printf("\n[transpose] main created thread %d\n", tid[n] ); 344 #endif 345 346 } 347 348 // main thread calls itself the execute() function 349 execute( &tid[0] ); 350 351 // main thread wait other threads completion 352 for ( n = 1 ; n < nthreads ; n++ ) 353 { 354 unsigned int * status; 355 356 // main wait thread[n] status 357 if ( pthread_join( trdid[n], (void*)(&status)) ) 358 { 359 printf("\n[transpose error] main cannot join thread %d\n", n ); 360 exit( 0 ); 361 } 362 363 // check status 364 if( *status != THREAD_EXIT_SUCCESS ) 365 { 366 printf("\n[transpose error] thread %x returned failure\n", n ); 367 exit( 0 ); 368 } 369 370 #if VERBOSE 371 printf("\n[transpose] main successfully joined thread %x\n", tid[n] ); 372 #endif 373 374 } 375 376 #endif 377 378 // instrumentation 379 instrument(); 380 381 // close input and output files 561 } // end if parallel_placement 562 563 564 ///////////////////////////////////////////////////////////////////////////// 565 get_cycle( &end_parallel_cycle ); 566 PARALLEL_TIME = (unsigned int)(end_parallel_cycle - end_sequencial_cycle); 567 ///////////////////////////////////////////////////////////////////////////// 568 569 // main thread register instrumentation results 570 instrument( f , filename ); 571 572 // main thread close input file 382 573 close( fd_in ); 574 575 #if SAVE_RESULT_IMAGE 576 577 // main thread close output file 383 578 close( fd_out ); 384 579 385 // suicide 580 #endif 581 582 // main close instrumentation file 583 fclose( f ); 584 585 // main thread suicide 386 586 exit( 0 ); 387 587 … … 390 590 391 591 392 /////////////////////////////////// 393 void execute( unsigned int * ptid ) 592 593 
/////////////////////////////////////////////////// 594 void execute( pthread_parallel_work_args_t * args ) 394 595 { 395 596 unsigned long long date; 396 597 397 unsigned int l; // line index for loops 398 unsigned int p; // pixel index for loops 399 400 // get thread continuous index 401 unsigned int my_tid = *ptid; 598 unsigned int l; // line index for loop 599 unsigned int p; // pixel index for loop 600 601 // WARNING 602 //A thread is identified by the tid index, defined in the "args" structure. 603 // This index being in range [0,nclusters*ncores-1] we can always write 604 // tid == cid * ncores + lid 605 // with cid in [0,nclusters-1] and lid in [0,ncores-1]. 606 // if NO_PLACEMENT, there is no relation between these 607 // thread [cid][lid] indexes, and the core coordinates [cxy][lpid] 608 609 // get thread abstract identifiers 610 unsigned int tid = args->tid; 611 unsigned int cid = tid / ncores; 612 unsigned int lid = tid % ncores; 613 614 #if VERBOSE_EXEC 615 unsigned int cxy; 616 unsigned int lpid; 617 get_core_id( &cxy , &lpid ); // get core physical identifiers 618 printf("\n[transpose] exec[%d] on core[%x,%d] enters parallel exec\n", 619 tid , cxy , lpid ); 620 #endif 621 622 get_cycle( &date ); 623 LOAD_START[cid][lid] = (unsigned int)date; 402 624 403 625 // build total number of pixels per image 404 626 unsigned int npixels = IMAGE_SIZE * IMAGE_SIZE; 405 627 406 // nuild total number of threads and clusters 407 unsigned int nthreads = x_size * y_size * ncores; 628 // build total number of threads and clusters 408 629 unsigned int nclusters = x_size * y_size; 409 410 // get cluster continuous index and core index from tid 411 // we use (tid == cid * ncores + lid) 412 unsigned int cid = my_tid / ncores; // continuous index 413 unsigned int lid = my_tid % ncores; // core local index 414 415 // get cluster identifier from cid 416 // we use (cid == x * y_size + y) 417 unsigned int x = cid / y_size; // X cluster coordinate 418 unsigned int y = cid % 
y_size; // Y cluster coordinate 419 unsigned int cxy = HAL_CXY_FROM_XY(x,y); 420 421 #if VERBOSE 422 printf("\n[transpose] thread[%d] start on core[%x,%d]\n", my_tid , cxy , lid ); 423 #endif 424 425 // In each cluster cxy, thread[cxy,0] map input file 426 // to buf_in[cxy] and map output file to buf_in[cxy] 427 428 get_cycle( &date ); 429 MMAP_START[cxy][lid] = (unsigned int)date; 430 431 if ( lid == 0 ) 432 { 433 unsigned int length = npixels / nclusters; 434 unsigned int offset = length * cid; 435 436 // map buf_in 437 buf_in[cid] = mmap( NULL, 438 length, 439 PROT_READ, 440 MAP_SHARED, 441 fd_in, 442 offset ); 443 444 if ( buf_in[cid] == NULL ) 630 unsigned int nthreads = nclusters * ncores; 631 632 unsigned int buf_size = npixels / nclusters; // number of bytes in buf_in & buf_out 633 unsigned int offset = cid * buf_size; // offset in file (bytes) 634 635 unsigned char * buf_in = NULL; // private pointer on local input buffer 636 unsigned char * buf_out = NULL; // private pointer on local output buffer 637 638 // Each thread[cid,0] allocate a local buffer buf_in, and register 639 // the base adress in the global variable buf_in_ptr[cid] 640 // this local buffer is shared by all threads with the same cid 641 if( lid == 0 ) 642 { 643 // allocate buf_in 644 buf_in = (unsigned char *)malloc( buf_size ); 645 646 if( buf_in == NULL ) 445 647 { 446 printf("\n[transpose error] thread[% x,%d] cannot map input file\n", cxy, lid);648 printf("\n[transpose error] thread[%d] cannot allocate buf_in\n", tid ); 447 649 pthread_exit( &THREAD_EXIT_FAILURE ); 448 650 } 449 450 #if VERBOSE 451 printf("\n[transpose] thread[%x,%d] map input file / length %x / offset %x / buf_in %x\n", 452 cxy, lid, length, offset, buf_in[cid] ); 453 #endif 454 455 // map buf_out 456 buf_out[cid] = mmap( NULL, 457 length, 458 PROT_WRITE, 459 MAP_SHARED, 460 fd_out, 461 offset ); 462 463 if ( buf_out[cid] == NULL ) 651 652 // register buf_in buffer in global array of pointers 653 buf_in_ptr[cid] = 
buf_in;

```diff
+
+#if VERBOSE_EXEC
+printf("\n[transpose] exec[%d] on core[%x,%d] allocated buf_in = %x\n",
+       tid , cxy , lpid , buf_in );
+#endif
+
+    }
+
+    // Each thread[cid,0] copy relevant part of the image_in to buf_in
+    if( lid == 0 )
+    {
+        memcpy( buf_in,
+                image_in + offset,
+                buf_size );
+    }
+
+#if VERBOSE_EXEC
+printf("\n[transpose] exec[%d] on core[%x,%d] loaded buf_in[%d]\n",
+       tid , cxy , lpid , cid );
+#endif
+
+    // Each thread[cid,0] allocate a local buffer buf_out, and register
+    // the base adress in the global variable buf_out_ptr[cid]
+    if( lid == 0 )
+    {
+        // allocate buf_out
+        buf_out = (unsigned char *)malloc( buf_size );
+
+        if( buf_out == NULL )
         {
-            printf("\n[transpose error] thread[%x,%d] cannot map output file\n", cxy, lid);
+            printf("\n[transpose error] thread[%d] cannot allocate buf_in\n", tid );
             pthread_exit( &THREAD_EXIT_FAILURE );
         }
-
-#if VERBOSE
-printf("\n[transpose] thread[%x,%d] map output file / length %x / offset %x / buf_out %x\n",
-       cxy, lid, length, offset, buf_out[cid] );
-#endif
-
-    }
 
+        // register buf_in buffer in global array of pointers
+        buf_out_ptr[cid] = buf_out;
+
+#if VERBOSE_EXEC
+printf("\n[transpose] exec[%d] on core[%x,%d] allocated buf_out = %x\n",
+       tid , cxy , lpid , buf_out );
+#endif
+
+    }
+
     get_cycle( &date );
-    MMAP_END[cxy][lid] = (unsigned int)date;
+    LOAD_END[cid][lid] = (unsigned int)date;
 
     /////////////////////////////////
     pthread_barrier_wait( &barrier );
 
-    // parallel transpose from buf_in to buf_out
-    // each thread makes the transposition for nlt lines (nlt = IMAGE_SIZE/nthreads)
+    get_cycle( &date );
+    TRSP_START[cid][lid] = (unsigned int)date;
+
+    // All threads contribute to parallel transpose from buf_in to buf_out
+    // each thread makes the transposition for nlt lines (nlt = npixels/nthreads)
     // from line [tid*nlt] to line [(tid + 1)*nlt - 1]
     // (p,l) are the absolute pixel coordinates in the source image
+    // (l,p) are the absolute pixel coordinates in the source image
+    // (p,l) are the absolute pixel coordinates in the dest image
 
     get_cycle( &date );
-    TRSP_START[cxy][lid] = (unsigned int)date;
+    TRSP_START[cid][lid] = (unsigned int)date;
 
     unsigned int nlt = IMAGE_SIZE / nthreads; // number of lines per thread
     unsigned int nlc = IMAGE_SIZE / nclusters; // number of lines per cluster
 
-    unsigned int src_cluster;
+    unsigned int src_cid;
     unsigned int src_index;
-    unsigned int dst_cluster;
+    unsigned int dst_cid;
     unsigned int dst_index;
 
     unsigned char byte;
 
-    unsigned int first = my_tid * nlt; // first line index for a given thread
+    unsigned int first = tid * nlt; // first line index for a given thread
     unsigned int last = first + nlt; // last line index for a given thread
 
+    // loop on lines handled by this thread
     for ( l = first ; l < last ; l++ )
     {
-        // in each iteration we transfer one byte
+        // loop on pixels in one line (one pixel per iteration)
         for ( p = 0 ; p < IMAGE_SIZE ; p++ )
         {
             // read one byte from local buf_in
-            src_cluster = l / nlc;
-            src_index = (l % nlc) * IMAGE_SIZE + p;
-            byte = buf_in[src_cluster][src_index];
+            src_cid = l / nlc;
+            src_index = (l % nlc) * IMAGE_SIZE + p;
+
+            byte = buf_in_ptr[src_cid][src_index];
 
             // write one byte to remote buf_out
-            dst_cluster = p / nlc;
-            dst_index
-
-            buf_out[dst_cluster][dst_index] = byte;
+            dst_cid = p / nlc;
+            dst_index = (p % nlc) * IMAGE_SIZE + l;
+
+            buf_out_ptr[dst_cid][dst_index] = byte;
         }
     }
 
-#if VERBOSE
-printf("\n[transpose] thread[%x,%d] completes transposed\n", cxy, lid );
+#if VERBOSE_EXEC
+printf("\n[transpose] exec[%d] on core[%x,%d] completes transpose\n",
+       tid , cxy , lpid );
 #endif
 
     get_cycle( &date );
-    TRSP_END[cxy][lid] = (unsigned int)date;
+    TRSP_END[cid][lid] = (unsigned int)date;
 
     /////////////////////////////////
     pthread_barrier_wait( &barrier );
 
-    // parallel display from local buf_out to frame buffer
-    // all threads contribute to display
-
     get_cycle( &date );
-    DISP_START[cxy][lid] = (unsigned int)date;
-
+    DISP_START[cid][lid] = (unsigned int)date;
+
+    // All threads contribute to parallel display
+    // from local buf_out to frame buffer
     unsigned int npt = npixels / nthreads; // number of pixels per thread
 
-    if( fbf_write( &buf_out[cid][lid * npt],
+    if( fbf_write( &buf_out_ptr[cid][lid * npt],
                    npt,
-                   npt * my_tid ) )
-    {
-        printf("\n[transpose error] thread[%x,%d] cannot access FBF\n", cxy, lid );
+                   npt * tid ) )
+    {
+        printf("\n[transpose error] thread[%d] cannot access FBF\n", tid );
         pthread_exit( &THREAD_EXIT_FAILURE );
     }
 
-#if VERBOSE
-printf("\n[transpose] thread[%x,%d] completes display\n", cxy, lid );
+#if VERBOSE_EXEC
+printf("\n[transpose] exec[%d] on core [%x,%d] completes display\n",
+       tid, cxy , lpid );
 #endif
 
     get_cycle( &date );
-    DISP_END[cxy][lid] = (unsigned int)date;
+    DISP_END[cid][lid] = (unsigned int)date;
 
     /////////////////////////////////
     pthread_barrier_wait( &barrier );
 
-    // all threads, but thread[0,0,0], suicide
-    if ( (cxy != cxy_main) || (lid != lid_main) )
-    {
+#if SAVE_RESULT_IMAGE
+
+    // Each thread[cid,0] copy buf_out to relevant part of image_out
+    if( lid == 0 )
+    {
+        memcpy( image_out + offset,
+                buf_out,
+                buf_size );
+    }
+
+#if VERBOSE_EXEC
+printf("\n[transpose] exec[%d] on core[%x,%d] saved buf_out[%d]\n",
+       tid , cxy , lpid , cid );
+#endif
+
+#endif
+
+    // Each thread[cid,0] releases local buffer buf_out
+    if( lid == 0 )
+    {
+        // release buf_out
+        free( buf_in );
+        free( buf_out );
+    }
+
+    // thread termination depends on the placement policy
+    if( PARALLEL_PLACEMENT )
+    {
+        // <work> threads are runing in detached mode
+        // each thread must signal completion by calling barrier
+        // passed in arguments before exit
+
+        pthread_barrier_wait( args->barrier );
+
         pthread_exit( &THREAD_EXIT_SUCCESS );
     }
+    else
+    {
+        // <work> threads are running in attached mode
+        // each thread, but de main, simply exit
+        if ( tid != tid_main ) pthread_exit( &THREAD_EXIT_SUCCESS );
+    }
 
 } // end execute()
…
 
 
-///////////////////////
-void instrument( void )
+///////////////////////////
+void instrument( FILE * f,
+                 char * filename )
 {
     unsigned int x, y, l;
+
+#if VERBOSE_EXEC
+printf("\n[transpose] main enters instrument\n" );
+#endif
 
     unsigned int min_load_start = 0xFFFFFFFF;
…
     unsigned int max_disp_ended = 0;
 
-    char string[64];
-
-    snprintf( string , 64 , "/home/transpose_%d_%d_%d" , x_size , y_size , ncores );
-
-    // open instrumentation file
-    FILE * f = fopen( string , NULL );
-    if ( f == NULL )
-    {
-        printf("\n[transpose error] cannot open instrumentation file %s\n", string );
-        exit( 0 );
-    }
-
     for (x = 0; x < x_size; x++)
     {
         for (y = 0; y < y_size; y++)
         {
-            unsigned int cxy = HAL_CXY_FROM_XY( x , y );
+            unsigned int cid = y_size * x + y;
 
             for ( l = 0 ; l < ncores ; l++ )
             {
-                if (MMAP_START[cxy][l] < min_load_start) min_load_start = MMAP_START[cxy][l];
-                if (MMAP_START[cxy][l] > max_load_start) max_load_start = MMAP_START[cxy][l];
-                if (MMAP_END[cxy][l] < min_load_ended) min_load_ended = MMAP_END[cxy][l];
-                if (MMAP_END[cxy][l] > max_load_ended) max_load_ended = MMAP_END[cxy][l];
-                if (TRSP_START[cxy][l] < min_trsp_start) min_trsp_start = TRSP_START[cxy][l];
-                if (TRSP_START[cxy][l] > max_trsp_start) max_trsp_start = TRSP_START[cxy][l];
-                if (TRSP_END[cxy][l] < min_trsp_ended) min_trsp_ended = TRSP_END[cxy][l];
-                if (TRSP_END[cxy][l] > max_trsp_ended) max_trsp_ended = TRSP_END[cxy][l];
-                if (DISP_START[cxy][l] < min_disp_start) min_disp_start = DISP_START[cxy][l];
-                if (DISP_START[cxy][l] > max_disp_start) max_disp_start = DISP_START[cxy][l];
-                if (DISP_END[cxy][l] < min_disp_ended) min_disp_ended = DISP_END[cxy][l];
-                if (DISP_END[cxy][l] > max_disp_ended) max_disp_ended = DISP_END[cxy][l];
+                if (LOAD_START[cid][l] < min_load_start) min_load_start = LOAD_START[cid][l];
+                if (LOAD_START[cid][l] > max_load_start) max_load_start = LOAD_START[cid][l];
+                if (LOAD_END[cid][l] < min_load_ended) min_load_ended = LOAD_END[cid][l];
+                if (LOAD_END[cid][l] > max_load_ended) max_load_ended = LOAD_END[cid][l];
+                if (TRSP_START[cid][l] < min_trsp_start) min_trsp_start = TRSP_START[cid][l];
+                if (TRSP_START[cid][l] > max_trsp_start) max_trsp_start = TRSP_START[cid][l];
+                if (TRSP_END[cid][l] < min_trsp_ended) min_trsp_ended = TRSP_END[cid][l];
+                if (TRSP_END[cid][l] > max_trsp_ended) max_trsp_ended = TRSP_END[cid][l];
+                if (DISP_START[cid][l] < min_disp_start) min_disp_start = DISP_START[cid][l];
+                if (DISP_START[cid][l] > max_disp_start) max_disp_start = DISP_START[cid][l];
+                if (DISP_END[cid][l] < min_disp_ended) min_disp_ended = DISP_END[cid][l];
+                if (DISP_END[cid][l] > max_disp_ended) max_disp_ended = DISP_END[cid][l];
             }
         }
     }
 
-    printf( "\n ------ %s ------\n" , string );
-    fprintf( f , "\n ------ %s ------\n" , string );
-
-    printf( " - MMAP_START : min = %d / max = %d / med = %d / delta = %d\n",
-            min_load_start, max_load_start, (min_load_start+max_load_start)/2,
-            max_load_start-min_load_start );
-
-    fprintf( f , " - MMAP_START : min = %d / max = %d / med = %d / delta = %d\n",
-             min_load_start, max_load_start, (min_load_start+max_load_start)/2,
-             max_load_start-min_load_start );
-
-    printf( " - MMAP_END : min = %d / max = %d / med = %d / delta = %d\n",
-            min_load_ended, max_load_ended, (min_load_ended+max_load_ended)/2,
-            max_load_ended-min_load_ended );
-
-    fprintf( f , " - MMAP_END : min = %d / max = %d / med = %d / delta = %d\n",
-             min_load_ended, max_load_ended, (min_load_ended+max_load_ended)/2,
-             max_load_ended-min_load_ended );
-
-    printf( " - TRSP_START : min = %d / max = %d / med = %d / delta = %d\n",
-            min_trsp_start, max_trsp_start, (min_trsp_start+max_trsp_start)/2,
-            max_trsp_start-min_trsp_start );
-
-    fprintf( f , " - TRSP_START : min = %d / max = %d / med = %d / delta = %d\n",
-             min_trsp_start, max_trsp_start, (min_trsp_start+max_trsp_start)/2,
-             max_trsp_start-min_trsp_start );
-
-    printf( " - TRSP_END : min = %d / max = %d / med = %d / delta = %d\n",
-            min_trsp_ended, max_trsp_ended, (min_trsp_ended+max_trsp_ended)/2,
-            max_trsp_ended-min_trsp_ended );
-
-    fprintf( f , " - TRSP_END : min = %d / max = %d / med = %d / delta = %d\n",
-             min_trsp_ended, max_trsp_ended, (min_trsp_ended+max_trsp_ended)/2,
-             max_trsp_ended-min_trsp_ended );
-
-    printf( " - DISP_START : min = %d / max = %d / med = %d / delta = %d\n",
-            min_disp_start, max_disp_start, (min_disp_start+max_disp_start)/2,
-            max_disp_start-min_disp_start );
-
-    fprintf( f , " - DISP_START : min = %d / max = %d / med = %d / delta = %d\n",
-             min_disp_start, max_disp_start, (min_disp_start+max_disp_start)/2,
-             max_disp_start-min_disp_start );
-
-    printf( " - DISP_END : min = %d / max = %d / med = %d / delta = %d\n",
-            min_disp_ended, max_disp_ended, (min_disp_ended+max_disp_ended)/2,
-            max_disp_ended-min_disp_ended );
-
-    fprintf( f , " - DISP_END : min = %d / max = %d / med = %d / delta = %d\n",
-             min_disp_ended, max_disp_ended, (min_disp_ended+max_disp_ended)/2,
-             max_disp_ended-min_disp_ended );
-
-    fclose( f );
+    printf( "\n ------ %s ------\n" , filename );
+    fprintf( f , "\n ------ %s ------\n" , filename );
+
+    printf( " - LOAD_START : min = %d / max = %d / delta = %d\n",
+            min_load_start, max_load_start, max_load_start-min_load_start );
+    fprintf( f , " - LOAD_START : min = %d / max = %d / delta = %d\n",
+             min_load_start, max_load_start, max_load_start-min_load_start );
+
+    printf( " - LOAD_END : min = %d / max = %d / delta = %d\n",
+            min_load_ended, max_load_ended, max_load_ended-min_load_ended );
+    fprintf( f , " - LOAD_END : min = %d / max = %d / delta = %d\n",
+             min_load_ended, max_load_ended, max_load_ended-min_load_ended );
+
+    printf( " - TRSP_START : min = %d / max = %d / delta = %d\n",
+            min_trsp_start, max_trsp_start, max_trsp_start-min_trsp_start );
+    fprintf( f , " - TRSP_START : min = %d / max = %d / delta = %d\n",
+             min_trsp_start, max_trsp_start, max_trsp_start-min_trsp_start );
+
+    printf( " - TRSP_END : min = %d / max = %d / delta = %d\n",
+            min_trsp_ended, max_trsp_ended, max_trsp_ended-min_trsp_ended );
+    fprintf( f , " - TRSP_END : min = %d / max = %d / delta = %d\n",
+             min_trsp_ended, max_trsp_ended, max_trsp_ended-min_trsp_ended );
+
+    printf( " - DISP_START : min = %d / max = %d / delta = %d\n",
+            min_disp_start, max_disp_start, max_disp_start-min_disp_start );
+    fprintf( f , " - DISP_START : min = %d / max = %d / delta = %d\n",
+             min_disp_start, max_disp_start, max_disp_start-min_disp_start );
+
+    printf( " - DISP_END : min = %d / max = %d / delta = %d\n",
+            min_disp_ended, max_disp_ended, max_disp_ended-min_disp_ended );
+    fprintf( f , " - DISP_END : min = %d / max = %d / delta = %d\n",
+             min_disp_ended, max_disp_ended, max_disp_ended-min_disp_ended );
+
+    printf( "\n Sequencial = %d / Parallel = %d\n", SEQUENCIAL_TIME, PARALLEL_TIME );
+    fprintf( f , "\n Sequencial = %d / Parallel = %d\n", SEQUENCIAL_TIME, PARALLEL_TIME );
 
 } // end instrument()
```
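The index arithmetic of the new transpose loop, and the way each thread's fbf_write() slice maps back onto the frame buffer, can be checked on a host machine. The sketch below assumes a hypothetical 8×8 image on 2 clusters of 2 cores, replaces the malloc()'ed buf_in_ptr[cid] / buf_out_ptr[cid] buffers with static arrays, and runs the per-thread slices sequentially instead of in parallel; transpose_slice() and distributed_transpose() are illustrative names, not functions of transpose.c. It computes nlt = IMAGE_SIZE / nthreads as the code does (the adjacent comment's npixels/nthreads would be a pixel count rather than a line count).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes, chosen so that every division below is exact. */
#define IMAGE_SIZE 8                         /* lines per image, pixels per line */
#define NCLUSTERS  2
#define NCORES     2
#define NTHREADS   (NCLUSTERS * NCORES)
#define NPIXELS    (IMAGE_SIZE * IMAGE_SIZE)
#define NLC        (IMAGE_SIZE / NCLUSTERS)  /* lines per cluster */
#define BUF_SIZE   (NLC * IMAGE_SIZE)        /* bytes per cluster buffer */

/* Static stand-ins for the per-cluster buf_in_ptr[cid] / buf_out_ptr[cid]. */
static unsigned char buf_in[NCLUSTERS][BUF_SIZE];
static unsigned char buf_out[NCLUSTERS][BUF_SIZE];

/* Work of thread tid: lines [tid * nlt, (tid + 1) * nlt), same indexing as execute(). */
static void transpose_slice(unsigned int tid)
{
    unsigned int nlt = IMAGE_SIZE / NTHREADS;   /* lines per thread */
    unsigned int first = tid * nlt;
    unsigned int last = first + nlt;

    assert(IMAGE_SIZE % NTHREADS == 0);         /* NLINES must be k * NTHREADS */

    for (unsigned int l = first; l < last; l++) {
        for (unsigned int p = 0; p < IMAGE_SIZE; p++) {
            unsigned int src_cid   = l / NLC;
            unsigned int src_index = (l % NLC) * IMAGE_SIZE + p;
            unsigned int dst_cid   = p / NLC;
            unsigned int dst_index = (p % NLC) * IMAGE_SIZE + l;
            buf_out[dst_cid][dst_index] = buf_in[src_cid][src_index];
        }
    }
}

/* Load phase, transpose phase (run sequentially here) and display phase. */
void distributed_transpose(const unsigned char *image_in, unsigned char *image_out)
{
    unsigned int npt = NPIXELS / NTHREADS;      /* pixels per thread */

    /* cluster cid receives source lines [cid * NLC, (cid + 1) * NLC) */
    for (unsigned int cid = 0; cid < NCLUSTERS; cid++)
        memcpy(buf_in[cid], image_in + cid * BUF_SIZE, BUF_SIZE);

    for (unsigned int tid = 0; tid < NTHREADS; tid++)
        transpose_slice(tid);

    /* thread tid = cid * NCORES + lid copies npt pixels taken at
       &buf_out[cid][lid * npt] to offset npt * tid, like the fbf_write() call */
    for (unsigned int cid = 0; cid < NCLUSTERS; cid++) {
        for (unsigned int lid = 0; lid < NCORES; lid++) {
            unsigned int tid = cid * NCORES + lid;
            memcpy(image_out + npt * tid, &buf_out[cid][lid * npt], npt);
        }
    }
}
```

With these sizes each cluster buffer holds 32 bytes and each thread displays 16 pixels, so npt * tid equals cid * BUF_SIZE + lid * npt: a thread's slice of its local buf_out lands exactly at its place in the frame buffer. That equality only holds because BUF_SIZE == NCORES * npt, which is why the line and pixel counts must divide evenly by the number of clusters and threads.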
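The new instrument() indexes the timing arrays by the continuous cluster index cid = y_size * x + y, reduces each phase to min / max / delta (the med column is gone), and still writes every line twice, once with printf() and once with fprintf(). A possible factoring is sketched below under hypothetical X_SIZE, Y_SIZE and NCORES values; compute_stats(), cluster_index() and print_phase() are illustrative helpers, not code from the changeset. It prints with %u because the timestamps are unsigned int.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mesh: X_SIZE * Y_SIZE clusters, NCORES cores per cluster. */
#define X_SIZE    2
#define Y_SIZE    2
#define NCORES    2
#define NCLUSTERS (X_SIZE * Y_SIZE)

struct phase_stats {
    unsigned int min;
    unsigned int max;
    unsigned int delta;   /* max - min: spread of the phase across cores */
};

/* Continuous cluster index, as computed in the new instrument(). */
unsigned int cluster_index(unsigned int x, unsigned int y)
{
    assert(x < X_SIZE && y < Y_SIZE);
    return Y_SIZE * x + y;
}

/* Min / max / delta of one timestamp array such as TRSP_START[cid][lid]. */
struct phase_stats compute_stats(unsigned int t[NCLUSTERS][NCORES])
{
    struct phase_stats s = { 0xFFFFFFFF, 0, 0 };

    for (unsigned int x = 0; x < X_SIZE; x++) {
        for (unsigned int y = 0; y < Y_SIZE; y++) {
            unsigned int cid = cluster_index(x, y);
            for (unsigned int l = 0; l < NCORES; l++) {
                if (t[cid][l] < s.min) s.min = t[cid][l];
                if (t[cid][l] > s.max) s.max = t[cid][l];
            }
        }
    }
    s.delta = s.max - s.min;
    return s;
}

/* One report line; returns the number of characters written. */
int print_phase(FILE *out, const char *name, struct phase_stats s)
{
    return fprintf(out, " - %s : min = %u / max = %u / delta = %u\n",
                   name, s.min, s.max, s.delta);
}
```

Calling print_phase( stdout, ... ) and then print_phase( f, ... ) for each phase keeps the console and file reports identical by construction, instead of relying on twelve hand-copied printf()/fprintf() pairs staying in sync.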
trunk/user/transpose/transpose.ld
r646 r652

```diff
-/****************************************************************************
+/***************************************************************************
  * Definition of the base address for all virtual segments
-****************************************************************************/
+***************************************************************************/
 
 seg_code_base = 0x400000;
+
+/***************************************************************************
+ * Define code entry point (e_entry field in .elf file)
+ ***************************************************************************/
+
+ENTRY( main )
 
 /***************************************************************************
```
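The new ENTRY( main ) directive asks the linker to record the address of main in the e_entry field of the ELF header (binutils' readelf -h reports it as "Entry point address"). To check a build programmatically, that field can be read back and compared with the address of main from a symbol listing. The reader below is a minimal sketch written for this note (elf_entry() is a hypothetical helper, not part of the application or its OS); it relies only on the fixed ELF header layout and handles 32- and 64-bit files of either byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads e_entry from the first bytes of an ELF file.
   Returns 0 and stores the entry address on success, -1 otherwise. */
int elf_entry(const unsigned char *buf, size_t len, uint64_t *entry)
{
    assert(buf != NULL && entry != NULL);

    /* e_ident[0..3] must hold the magic number 0x7f 'E' 'L' 'F' */
    if (len < 16 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;

    unsigned char ei_class = buf[4];   /* 1 = ELFCLASS32, 2 = ELFCLASS64 */
    unsigned char ei_data = buf[5];    /* 1 = little-endian, 2 = big-endian */
    if ((ei_class != 1 && ei_class != 2) || (ei_data != 1 && ei_data != 2))
        return -1;

    /* e_entry follows e_ident (16), e_type (2), e_machine (2), e_version (4) */
    size_t off = 24;
    size_t width = (ei_class == 2) ? 8 : 4;
    if (len < off + width)
        return -1;

    uint64_t v = 0;
    for (size_t i = 0; i < width; i++) {
        /* accumulate bytes from most to least significant */
        size_t idx = (ei_data == 1) ? off + width - 1 - i : off + i;
        v = (v << 8) | buf[idx];
    }
    *entry = v;
    return 0;
}
```

Reading the first 64 bytes of the linked binary into a buffer is enough for either class, since the whole ELF64 header fits in 64 bytes.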