Changeset 676 for trunk/user/convol
- Timestamp: Nov 20, 2020, 12:11:35 AM
- File: 1 edited
trunk/user/convol/convol.c
r659 r676
   8    8   // per core, and uses the POSIX threads API.
   9    9   //
  10      - // The main() function can be launched on any processor P[x,y,l].
  11      - // It makes the initialisations, launch (N-1) threads to run the execute() function
  12      - // on the (N-1) other processors than P[x,y,l], call himself the execute() function,
  13      - // and finally call the instrument() function to display instrumentation results
  14      - // when the parallel execution is completed.
       10 + // The input image is read from a file and the output image is saved to another file.
       11 + //
       12 + // - number of clusters containing processors must be power of 2 no larger than 256.
       13 + // - number of processors per cluster must be power of 2 no larger than 4.
       14 + // - number of working threads is the number of cores availables in the hardware
       15 + //   architecture : nthreads = nclusters * ncores.
  15   16   //
  16   17   // The convolution kernel is defined in the execute() function.
  17   18   // It can be factored in two independant line and column convolution products.
  18      - // The five buffers containing the image are distributed in clusters.
  19      - // For the philips image, it is a [201]*[35] pixels rectangle, and the.
  20      - //
  21      - // The (1024 * 1024) pixels image is read from a file (2 bytes per pixel).
  22   19   //
  23      - // - number of clusters containing processors must be power of 2 no larger than 256.
  24      - // - number of processors per cluster must be power of 2 no larger than 4.
       20 + // The main() function can be launched on any processor.
       21 + // - It checks software requirements versus the hardware resources.
       22 + // - It open & maps the input file to a global <image_in> buffer.
       23 + // - it open & maps the output file to another global <image_out> buffer.
       24 + // - it open the instrumentation file.
       25 + // - it creates & activates two FBF windows to display input & output images.
       26 + // - it launches other threads to run in parallel the execute() function.
       27 + // - it saves the instrumentation results on disk.
       28 + // - it closes the input, output, & instrumentation files.
       29 + // - it deletes the FBF input & output windows.
  25   30   //
  26      - // The number N of working threads is always defined by the number of cores availables
  27      - // in the architecture, but this application supports three placement modes.
       31 + // The execute() function is executed in parallel by all threads. These threads are
       32 + // working on 5 arrays of distributed buffers, indexed by the cluster index [cid].
       33 + // - A[cid]: contain the distributed initial image (NL/NCLUSTERS lines per cluster).
       34 + // - B[cid]: is the result of horizontal filter, then transpose B <= Trsp(HF(A)
       35 + // - C[cid]: is the result of vertical image, then transpose : c <= Trsp(VF(B)
       36 + // - D[cid]: is the the difference between A and FH(A) : D <= A - FH(A)
       37 + // - Z[cid]: contain the distributed final image Z <= C + D
       38 + //
       39 + // It can be split in four phases separated by synchronisation barriers:
       40 + // 1. Initialisation:
       41 + //    Allocates the 5 A[cid],B[cid],C[cid],D[cid],Z[cid] buffers, initialise A[cid]
       42 + //    from the <image_in> buffer, and display the initial image on FBF if rquired.
       43 + // 2. Horizontal Filter:
       44 + //    Set B[cid] and D[cid] from A[cid]. Read data accesses are local, write data
       45 + //    accesses are remote, to implement the transpose.
       46 + // 3. Vertical Filter:
       47 + //    Set C[cid] from B[cid]. Read data accesses are local, write data accesses
       48 + //    are remote, to implement the transpose.
       49 + // 4. Save results:
       50 + //    Set the Z[cid] from C[cid] and D[cid]. All read and write access are local.
       51 + //    Move the final image (Z[cid] buffer) to the <image_out> buffer.
       52 + //
       53 + // This application supports three placement modes, implemented in the main() function.
  28   54   // In all modes, the working threads are identified by the [tid] continuous index
  29   55   // in range [0, NTHREADS-1], and defines how the lines are shared amongst the threads.
  30   56   // This continuous index can always be decomposed in two continuous sub-indexes:
  31      - // tid == cid * ncores + lid, where cid is in [0,NCLUSTERS-1] and lid in [0,NCORES-1].
       57 + // tid == cid * NCORES + lid, where cid is in [0,NCLUSTERS-1] and lid in [0,NCORES-1].
  32   58   //
  33   59   // - NO_PLACEMENT: the main thread is itsef a working thread. The (N_1) other working
   …    …
  38   64   // but has tid = 0 (i.e. cid = 0 & tid = 0).
  39   65   //
  40      - // - EXPLICIT_PLACEMENT: the main thread is again a working thread, but the placement of
       66 + // - EXPLICIT_PLACEMENT: the main thread is again a working thread, but the placement
  41   67   // of the threads on the cores is explicitely controled by the main thread to have
  42   68   // exactly one working thread per core, and the [cxy][lpid] core coordinates for a given
   …    …
  46   72   // - PARALLEL_PLACEMENT: the main thread is not anymore a working thread, and uses the
  47   73   // non standard pthread_parallel_create() function to avoid the costly sequencial
  48      - // loops for pthread_create() and pthread_join(). It garanty one working thread
       74 + // loops for pthread_create() and pthread_join(). It garanties one working thread
  49   75   // per core, and the same relation between the thread[tid] and the core[cxy][lpid].
  50   76   //
   …    …
  65   91  
  66   92   #define VERBOSE_MAIN 1
  67      - #define VERBOSE_EXEC 0
       93 + #define VERBOSE_EXEC 1
  68   94   #define SUPER_VERBOSE 0
  69   95  
   …    …
  74  100   #define THREADS_MAX (X_MAX * Y_MAX * CORES_MAX)
  75  101  
  76      - #define IMAGE_IN_PATH "misc/philips_1024_2.raw"
  77      - #define IMAGE_IN_PIXEL_SIZE 2 // 2 bytes per pixel
  78      -
  79      - #define IMAGE_OUT_PATH "misc/philips_after_1O24.raw"
  80      - #define IMAGE_OUT_PIXEL_SIZE 1 // 1 bytes per pixel
  81      -
  82      - #define FBF_TYPE 420
  83      - #define NL 1024
  84      - #define NP 1024
  85      - #define NB_PIXELS (NP * NL)
      102 + #define IMAGE_TYPE 420 // pixel encoding type
      103 + #define INPUT_IMAGE_PATH "misc/couple_512.raw" // default image_in
      104 + #define OUTPUT_IMAGE_PATH "misc/couple_conv_512.raw" // default image_out
      105 + #define NL 512 // default nlines
      106 + #define NP 512 // default npixels
  86  107  
  87  108   #define NO_PLACEMENT 0
   …    …
  89  110   #define PARALLEL_PLACEMENT 1
  90  111  
      112 + #define INTERACTIVE_MODE 0
  91  113   #define USE_DQT_BARRIER 1
  92  114   #define INITIAL_DISPLAY_ENABLE 1
   …    …
 116  138   unsigned int V_BEG[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
 117  139   unsigned int V_END[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
 118      - unsigned int D_BEG[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
 119      - unsigned int D_END[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
      140 + unsigned int F_BEG[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
      141 + unsigned int F_END[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
 120  142  
 121  143   // pointer on buffer containing the input image, maped by the main to the input file
   …    …
 128  150   unsigned int THREAD_EXIT_SUCCESS = 0;
 129  151   unsigned int THREAD_EXIT_FAILURE = 1;
      152 +
      153 + // pointer and identifier for FBF windows
      154 + void * in_win_buf;
      155 + int in_wid;
      156 + void * out_win_buf;
      157 + int out_wid;
 130  158  
 131  159   // synchronization barrier
   …    …
 137  165   unsigned int ncores; // number of processors per cluster
 138  166  
      167 + // main thread continuous index
      168 + unsigned int tid_main;
      169 +
 139  170   // arrays of pointers on distributed buffers in all clusters
 140      - unsigned short * GA[CLUSTERS_MAX];
      171 + unsigned char * GA[CLUSTERS_MAX];
 141  172   int * GB[CLUSTERS_MAX];
 142  173   int * GC[CLUSTERS_MAX];
   …    …
 153  184   pthread_parallel_work_args_t exec_args[THREADS_MAX];
 154  185  
 155      - // main thread continuous index
 156      - unsigned int tid_main;
      186 + // image features
      187 + unsigned int image_nl;
      188 + unsigned int image_np;
      189 + char input_image_path[128];
      190 + char output_image_path[128];
 157  191  
 158  192   /////////////////////////////////////////////////////////////////////////////////////
   …    …
 166  200   /////////////////
 167  201   void main( void )
      202 + /////////////////
 168  203   {
 169  204       unsigned long long start_cycle;
   …    …
 222  257       unsigned int nthreads = nclusters * ncores;
 223  258  
      259 +     // get input and output images pathnames and size
      260 +     if( INTERACTIVE_MODE )
      261 +     {
      262 +         // get image size
      263 +         printf("\n[convol] image nlines : ");
      264 +         get_uint32( &image_nl );
      265 +
      266 +         printf("\n[convol] image npixels : ");
      267 +         get_uint32( &image_np );
      268 +
      269 +         printf("\n[convol] input image path : ");
      270 +         get_string( input_image_path , 128 );
      271 +
      272 +         printf("[convol] output image path : ");
      273 +         get_string( output_image_path , 128 );
      274 +     }
      275 +     else
      276 +     {
      277 +         image_nl = NL;
      278 +         image_np = NP;
      279 +         strcpy( input_image_path , INPUT_IMAGE_PATH );
      280 +         strcpy( output_image_path , OUTPUT_IMAGE_PATH );
      281 +     }
      282 +
 224  283       // main thread get FBF size and type
 225      -     unsigned int fbf_width;
 226      -     unsigned int fbf_height;
 227      -     unsigned int fbf_type;
      284 +     int fbf_width;
      285 +     int fbf_height;
      286 +     int fbf_type;
 228  287       fbf_get_config( &fbf_width , &fbf_height , &fbf_type );
 229  288  
 230      -     if( (fbf_width != NP) || (fbf_height != NL) || (fbf_type != FBF_TYPE) )
 231      -     {
 232      -         printf("\n[convol error] image does not fit FBF size or type\n");
 233      -         exit( 0 );
 234      -     }
 235      -
 236      -     if( nthreads > NL )
 237      -     {
 238      -         printf("\n[convol error] number of threads larger than number of lines\n");
      289 +     if( ((unsigned int)fbf_width < image_np) ||
      290 +         ((unsigned int)fbf_height < image_nl) ||
      291 +         (fbf_type != IMAGE_TYPE) )
      292 +     {
      293 +         printf("\n[convol error] image not acceptable\n"
      294 +                "FBF width = %d / npixels = %d\n"
      295 +                "FBF height = %d / nlines = %d\n"
      296 +                "FBF type = %d / expected = %d\n",
      297 +                fbf_width, image_np, fbf_height, image_nl, fbf_type, IMAGE_TYPE );
      298 +         exit( 0 );
      299 +     }
      300 +
      301 +     if( nthreads > image_nl )
      302 +     {
      303 +         printf("\n[convol error] nthreads (%d] larger than nlines (%d)\n",
      304 +                nthreads , image_nl );
 239  305           exit( 0 );
 240  306       }
   …    …
 248  314           // build instrumentation file name
 249  315           if( USE_DQT_BARRIER )
 250      -         snprintf( instru_name , 32 , "conv_dqt_no_place_%d_%d", x_size * y_size , ncores );
      316 +         snprintf( instru_name , 32 , "dqt_no_place_%d_%d", x_size * y_size , ncores );
 251  317           else
 252      -         snprintf( instru_name , 32 , "conv_smp_no_place_%d_%d", x_size * y_size , ncores );
      318 +         snprintf( instru_name , 32 , "smp_no_place_%d_%d", x_size * y_size , ncores );
 253  319       }
 254  320  
   …    …
 260  326           // build instrumentation file name
 261  327           if( USE_DQT_BARRIER )
 262      -         snprintf( instru_name , 32 , "conv_dqt_explicit_%d_%d", x_size * y_size , ncores );
      328 +         snprintf( instru_name , 32 , "dqt_explicit_%d_%d", x_size * y_size , ncores );
 263  329           else
 264      -         snprintf( instru_name , 32 , "conv_smp_explicit_%d_%d", x_size * y_size , ncores );
      330 +         snprintf( instru_name , 32 , "smp_explicit_%d_%d", x_size * y_size , ncores );
 265  331       }
 266  332  
   …    …
 272  338           // build instrumentation file name
 273  339           if( USE_DQT_BARRIER )
 274      -         snprintf( instru_name , 32 , "conv_dqt_parallel_%d_%d", x_size * y_size , ncores );
      340 +         snprintf( instru_name , 32 , "dqt_parallel_%d_%d", x_size * y_size , ncores );
 275  341           else
 276      -         snprintf( instru_name , 32 , "conv_smp_parallel_%d_%d", x_size * y_size , ncores );
      342 +         snprintf( instru_name , 32 , "smp_parallel_%d_%d", x_size * y_size , ncores );
 277  343       }
 278  344  
 279  345       // open instrumentation file
 280      -     snprintf( instru_path , 64 , "/home/%s", instru_name );
      346 +     snprintf( instru_path , 64 , "/home/convol/%s", instru_name );
 281  347       FILE * f_instru = fopen( instru_path , NULL );
 282  348       if ( f_instru == NULL )
   …    …
 289  355       printf("\n[convol] main on core[%x,%d] open instrumentation file %s\n",
 290  356       cxy_main, lid_main, instru_path );
      357 + #endif
      358 +
      359 +     // main create an FBF window for input image
      360 +     in_wid = fbf_create_window( 0, // l_zero
      361 +                                 0, // p_zero
      362 +                                 image_nl, // lines
      363 +                                 image_np, // pixels
      364 +                                 &in_win_buf );
      365 +     if( in_wid < 0 )
      366 +     {
      367 +         printf("\n[transpose error] cannot open FBF window for %s\n",
      368 +         input_image_path);
      369 +         exit( 0 );
      370 +     }
      371 +
      372 +     // activate window
      373 +     error = fbf_active_window( in_wid , 1 );
      374 +
      375 +     if( error )
      376 +     {
      377 +         printf("\n[transpose error] cannot activate window for %s\n",
      378 +         input_image_path );
      379 +         exit( 0 );
      380 +     }
      381 +
      382 + #if VERBOSE_MAIN
      383 +     printf("\n[convol] main on core[%x,%d] created FBF window (wid %d) for <%s>\n",
      384 +     cxy_main, lid_main, in_wid, input_image_path );
      385 + #endif
      386 +
      387 +     // main create an FBF window for output image
      388 +     out_wid = fbf_create_window( 0, // l_zero
      389 +                                  image_np, // p_zero
      390 +                                  image_nl, // lines
      391 +                                  image_np, // pixels
      392 +                                  &out_win_buf );
      393 +     if( out_wid < 0 )
      394 +     {
      395 +         printf("\n[transpose error] cannot create FBF window for %s\n",
      396 +         output_image_path);
      397 +         exit( 0 );
      398 +     }
      399 +
      400 +     // activate window
      401 +     error = fbf_active_window( out_wid , 1 );
      402 +
      403 +     if( error )
      404 +     {
      405 +         printf("\n[transpose error] cannot activate window for %s\n",
      406 +         output_image_path );
      407 +         exit( 0 );
      408 +     }
      409 +
      410 + #if VERBOSE_MAIN
      411 +     printf("\n[convol] main on core[%x,%d] created FBF window (wid %d) for <%s>\n",
      412 +     cxy_main, lid_main, out_wid, output_image_path );
 291  413   #endif
 292  414  
   …    …
 312  434  
 313  435   #if VERBOSE_MAIN
 314      -     printf("\n[convol] main on core[%x,%d] completes barrier init\n",
      436 +     printf("\n[convol] main on core[%x,%d] completed barrier init\n",
 315  437       cxy_main, lid_main );
 316  438   #endif
 317  439  
 318  440       // main open input file
 319      -     int fd_in = open( IMAGE_IN_PATH , O_RDONLY , 0 );
      441 +     int fd_in = open( input_image_path , O_RDONLY , 0 );
 320  442  
 321  443       if ( fd_in < 0 )
 322  444       {
 323      -         printf("\n[convol error] cannot open input file <%s>\n", IMAGE_IN_PATH );
 324      -         exit( 0 );
 325      -     }
 326      -
 327      - #if VERBOSE_MAIN
 328      -     printf("\n[convol] main on core[%x,%d] open file <%s>\n",
 329      -     cxy_main, lid_main, IMAGE_IN_PATH );
 330      - #endif
 331      -
 332      -     // main thread map image_in buffer to input file
      445 +         printf("\n[convol error] cannot open input file <%s>\n", input_image_path );
      446 +         exit( 0 );
      447 +     }
      448 +
      449 +     // main thread map input file to image_in buffer
 333  450       image_in = (unsigned char *)mmap( NULL,
 334      -                                       NB_PIXELS * IMAGE_IN_PIXEL_SIZE,
      451 +                                       image_np * image_nl,
 335  452                                         PROT_READ,
 336  453                                         MAP_FILE | MAP_SHARED,
   …    …
 339  456       if ( image_in == NULL )
 340  457       {
 341      -         printf("\n[convol error] main cannot map buffer to file %s\n", IMAGE_IN_PATH );
      458 +         printf("\n[convol error] main cannot map buffer to file %s\n", input_image_path );
 342  459           exit( 0 );
 343  460       }
 344  461  
 345  462   #if VERBOSE_MAIN
 346      -     printf("\n[convol] main on core[%x,%x] map buffer to file <%s>\n",
 347      -     cxy_main, lid_main, IMAGE_IN_PATH );
      463 +     printf("\n[convol] main on core[%x,%x] map <image_in> buffer to file <%s>\n",
      464 +     cxy_main, lid_main, input_image_path );
 348  465   #endif
 349  466  
 350  467       // main thread open output file
 351      -     int fd_out = open( IMAGE_OUT_PATH , O_CREAT , 0 );
      468 +     int fd_out = open( output_image_path , O_CREAT , 0 );
 352  469  
 353  470       if ( fd_out < 0 )
 354  471       {
 355      -         printf("\n[convol error] main cannot open file %s\n", IMAGE_OUT_PATH );
 356      -         exit( 0 );
 357      -     }
 358      -
 359      - #if VERBOSE_MAIN
 360      -     printf("\n[convol] main on core[%x,%d] open file <%s>\n",
 361      -     cxy_main, lid_main, IMAGE_OUT_PATH );
 362      - #endif
      472 +         printf("\n[convol error] main cannot open file %s\n", output_image_path );
      473 +         exit( 0 );
      474 +     }
 363  475  
 364  476       // main thread map image_out buffer to output file
 365  477       image_out = (unsigned char *)mmap( NULL,
 366      -                                        NB_PIXELS + IMAGE_OUT_PIXEL_SIZE,
      478 +                                        image_np * image_nl,
 367  479                                          PROT_WRITE,
 368  480                                          MAP_FILE | MAP_SHARED,
   …    …
 371  483       if ( image_out == NULL )
 372  484       {
 373      -         printf("\n[convol error] main cannot map buffer to file %s\n", IMAGE_OUT_PATH );
      485 +         printf("\n[convol error] main cannot map buffer to file %s\n", output_image_path );
 374  486           exit( 0 );
 375  487       }
 376  488  
 377  489   #if VERBOSE_MAIN
 378      -     printf("\n[convol] main on core[%x,%x] map buffer to file <%s>\n",
 379      -     cxy_main, lid_main, IMAGE_OUT_PATH );
      490 +     printf("\n[convol] main on core[%x,%x] map <image_out> buffer to file <%s>\n",
      491 +     cxy_main, lid_main, output_image_path );
 380  492   #endif
 381  493  
   …    …
 389  501       {
 390  502           // the tid value for the main thread is always 0
 391      -         // main thread creates new threads with tid in [1,nthreads-1]
      503 +         // main thread creates other threads with tid in [1,nthreads-1]
 392  504           unsigned int tid;
 393  505           for ( tid = 0 ; tid < nthreads ; tid++ )
   …    …
 587  699   #endif
 588  700  
      701 +     // ask confirm for exit
      702 +     if( INTERACTIVE_MODE )
      703 +     {
      704 +         char byte;
      705 +         printf("\n[convol] press any key to to delete FBF windows and exit\n");
      706 +         getc( &byte );
      707 +     }
      708 +
      709 +     // main thread delete FBF windows
      710 +     fbf_delete_window( in_wid );
      711 +     fbf_delete_window( out_wid );
      712 +
      713 + #if VERBOSE_MAIN
      714 +     printf("\n[convol] main deleted FBF windows\n" );
      715 + #endif
      716 +
 589  717       // main thread suicide
 590  718       exit( 0 );
   …    …
 597  725  
 598  726  
      727 +
      728 +
      729 +
      730 +
 599  731   //////////////////////////////////
 600  732   void * execute( void * arguments )
 601  733   //////////////////////////////////
 602  734   {
 603  735       unsigned long long date;
   …    …
 628  760       // thread [cid][lid] indexes, and the core coordinates [cxy][lpid]
 629  761  
 630      -     // get thread abstract identifiers
      762 +     // get thread abstract identifiers[cid,lid] from tid
 631  763       unsigned int tid = args->tid;
 632  764       unsigned int cid = tid / ncores;
   …    …
 642  774   #endif
 643  775  
 644      -     // build total number of threads and clusters from global variables
      776 +     // compute nthreads and nclusters from global variables
 645  777       unsigned int nclusters = x_size * y_size;
 646  778       unsigned int nthreads = nclusters * ncores;
   …    …
 652  784       unsigned int z; // vertical filter index
 653  785  
 654      -     unsigned int lines_per_thread = NL / nthreads;
 655      -     unsigned int lines_per_cluster = NL / nclusters;
 656      -     unsigned int pixels_per_thread = NP / nthreads;
 657      -     unsigned int pixels_per_cluster = NP / nclusters;
 658      -
 659      -     // compute number of pixels stored in one abstract cluster cid
 660      -     unsigned int local_pixels = NL * NP / nclusters;
 661      -
 662      -     unsigned int first, last;
      786 +     unsigned int lines_per_thread = image_nl / nthreads;
      787 +     unsigned int lines_per_cluster = image_nl / nclusters;
      788 +     unsigned int pixels_per_thread = image_np / nthreads;
      789 +     unsigned int pixels_per_cluster = image_np / nclusters;
      790 +
      791 +     // compute number of pixels stored in one cluster
      792 +     unsigned int local_pixels = image_nl * image_np / nclusters;
 663  793  
 664  794       get_cycle( &date );
 665  795       START[cid][lid] = (unsigned int)date;
 666  796  
 667      -     // Each thread[cid][0] allocates 5 local buffers,
      797 +     // Each thread[cid][0] allocates 5 buffers local cluster cid
 668  798       // and registers these 5 pointers in the global arrays
 669  799       if ( lid == 0 )
 670  800       {
 671      -         GA[cid] = malloc( local_pixels * sizeof( unsigned short ) );
      801 +         GA[cid] = malloc( local_pixels * sizeof( unsigned char ) );
 672  802           GB[cid] = malloc( local_pixels * sizeof( int ) );
 673  803           GC[cid] = malloc( local_pixels * sizeof( int ) );
   …    …
 675  805           GZ[cid] = malloc( local_pixels * sizeof( unsigned char ) );
 676  806  
 677      -         if( (GA[cid] == NULL) || (GB[cid] == NULL) || (GC[cid] == NULL) ||
 678      -             (GD[cid] == NULL) || (GZ[cid] == NULL) )
      807 +         if( (GA[cid] == NULL) ||
      808 +             (GB[cid] == NULL) ||
      809 +             (GC[cid] == NULL) ||
      810 +             (GD[cid] == NULL) ||
      811 +             (GZ[cid] == NULL) )
 679  812           {
 680  813               printf("\n[convol error] thread[%d] cannot allocate buf_in\n", tid );
   …    …
 684  817   #if VERBOSE_EXEC
 685  818           get_cycle( &date );
 686      -         printf(
      819 +         printf("\n[convol] exec[%d] on core[%x,%d] allocated shared buffers / cycle %d\n"
 687  820           " GA %x / GB %x / GC %x / GD %x / GZ %x\n",
 688  821           tid, cxy , lpid, (unsigned int)date, GA[cid], GB[cid], GC[cid], GD[cid], GZ[cid] );
   …    …
 694  827       pthread_barrier_wait( &barrier );
 695  828  
 696      -     // Each thread[cid,lid] allocate and initialise in its private stack
      829 +     // Each thread[tid] allocates and initialises in its private stack
 697  830       // a copy of the arrays of pointers on the distributed buffers.
 698      -     unsigned short * A[CLUSTERS_MAX];
      831 +     unsigned char * A[CLUSTERS_MAX];
 699  832       int * B[CLUSTERS_MAX];
 700  833       int * C[CLUSTERS_MAX];
   …    …
 711  844       }
 712  845  
 713      -     // Each thread[cid,0] access the file containing the input image, to load
 714      -     // the local A[cid] buffer. Other threads are waiting on the barrier.
 715      -     if ( lid==0 )
 716      -     {
 717      -         unsigned int size = local_pixels * sizeof( unsigned short );
 718      -         unsigned int offset = size * cid;
 719      -
 720      -         memcpy( A[cid],
 721      -                 image_in + offset,
 722      -                 size );
      846 +     unsigned int npixels = image_np * lines_per_thread; // pixels moved by any thread
      847 +     unsigned int g_offset = npixels * tid; // offset in global buffer for tid
      848 +     unsigned int l_offset = npixels * lid; // offset in local buffer for tid
      849 +
      850 +     // min and max line indexes handled by thread[tid] for a global buffer
      851 +     unsigned int global_lmin = tid * lines_per_thread;
      852 +     unsigned int global_lmax = global_lmin + lines_per_thread;
      853 +
      854 +     // min and max line indexes handled by thread[tid] for a local buffer
      855 +     unsigned int local_lmin = lid * lines_per_thread;
      856 +     unsigned int local_lmax = local_lmin + lines_per_thread;
      857 +
      858 +     // pmin and pmax pixel indexes handled by thread[tid] in a column
      859 +     unsigned int column_pmin = tid * pixels_per_thread;
      860 +     unsigned int column_pmax = column_pmin + pixels_per_thread;
      861 +
      862 +     // Each thread[tid] copy npixels from image_in buffer to local A[cid] buffer
      863 +     memcpy( A[cid] + l_offset,
      864 +             image_in + g_offset,
      865 +             npixels );
 723  866  
 724  867   #if VERBOSE_EXEC
   …    …
 728  871   #endif
 729  872  
 730      -     }
 731      -
 732      -     // Optionnal parallel display of the initial image stored in A[c] buffers.
 733      -     // Eah thread[cid,lid] displays (NL/nthreads) lines.
 734      -
      873 +     // Optionnal parallel display for the initial image
 735  874       if ( INITIAL_DISPLAY_ENABLE )
 736  875       {
 737      -         unsigned int line;
 738      -         unsigned int offset = lines_per_thread * lid;
 739      -
 740      -         for ( l = 0 ; l < lines_per_thread ; l++ )
 741      -         {
 742      -             line = offset + l;
 743      -
 744      -             // copy TA[cid] to TZ[cid]
 745      -             for ( p = 0 ; p < NP ; p++ )
 746      -             {
 747      -                 TZ(cid, line, p) = (unsigned char)(TA(cid, line, p) >> 8);
 748      -             }
 749      -
 750      -             // display one line to frame buffer
 751      -             if (fbf_write( &TZ(cid, line, 0), // first pixel in TZ
 752      -                            NP, // number of bytes
 753      -                            NP*(l + (tid * lines_per_thread)))) // offset in FBF
 754      -             {
 755      -                 printf("\n[convol error] in %s : thread[%d] cannot access FBF\n",
 756      -                 __FUNCTION__ , tid );
 757      -                 pthread_exit( &THREAD_EXIT_FAILURE );
 758      -             }
      876 +         // each thread[tid] copy npixels from A[cid] to in_win_buf buffer
      877 +         memcpy( in_win_buf + g_offset,
      878 +                 A[cid] + l_offset,
      879 +                 npixels );
      880 +
      881 +         // refresh the FBF window
      882 +         if( fbf_refresh_window( in_wid , global_lmin , global_lmax ) )
      883 +         {
      884 +             printf("\n[convol error] in %s : thread[%d] cannot access FBF\n",
      885 +             __FUNCTION__ , tid );
      886 +             pthread_exit( &THREAD_EXIT_FAILURE );
 759  887           }
 760  888  
   …    …
 771  899       ////////////////////////////////////////////////////////////
 772  900       // parallel horizontal filter :
 773      -     // B <= convol(FH(A))
      901 +     // B <= Transpose(FH(A))
 774  902       // D <= A - FH(A)
 775      -     // Each thread computes (NL/nthreads) lines.
      903 +     // Each thread computes (image_nl/nthreads) lines.
 776  904       // The image must be extended :
 777  905       // if (z<0) TA(cid,l,z) == TA(cid,l,0)
 778      -     // if (z>NP-1) TA(cid,l,z) == TA(cid,l,NP-1)
      906 +     // if (z>image_np-1) TA(cid,l,z) == TA(cid,l,image_np-1)
 779  907       ////////////////////////////////////////////////////////////
 780  908  
   …    …
 782  910       H_BEG[cid][lid] = (unsigned int)date;
 783  911  
 784      -     // l = absolute line index / p = absolute pixel index
 785      -     // first & last define which lines are handled by a given thread
 786      -
 787      -     first = tid * lines_per_thread;
 788      -     last = first + lines_per_thread;
 789      -
 790      -     for (l = first; l < last; l++)
      912 +     // l = global line index / p = absolute pixel index
      913 +
      914 +     for (l = global_lmin; l < global_lmax; l++)
 791  915       {
 792  916           // src_c and src_l are the cluster index and the line index for A & D
   …    …
 814  938               TD(src_c, src_l, p) = (int) TA(src_c, src_l, p) - sum_p / hnorm;
 815  939           }
 816      -         // second domain : from (hrange+1) to (NP-hrange-1)
 817      -         for (p = hrange + 1; p < NP - hrange; p++)
      940 +         // second domain : from (hrange+1) to (image_np-hrange-1)
      941 +         for (p = hrange + 1; p < image_np - hrange; p++)
 818  942           {
 819  943               // dst_c and dst_p are the cluster index and the pixel index for B
   …    …
 825  949               TD(src_c, src_l, p) = (int) TA(src_c, src_l, p) - sum_p / hnorm;
 826  950           }
 827      -         // third domain : from (NP-hrange) to (NP-1)
 828      -         for (p = NP - hrange; p < NP; p++)
      951 +         // third domain : from (image_np-hrange) to (image_np-1)
      952 +         for (p = image_np - hrange; p < image_np; p++)
 829  953           {
 830  954               // dst_c and dst_p are the cluster index and the pixel index for B
 831  955               int dst_c = p / pixels_per_cluster;
 832  956               int dst_p = p % pixels_per_cluster;
 833      -             sum_p = sum_p + (int) TA(src_c, src_l, NP - 1)
      957 +             sum_p = sum_p + (int) TA(src_c, src_l, image_np - 1)
 834  958                             - (int) TA(src_c, src_l, p - hrange - 1);
 835  959               TB(dst_c, dst_p, l) = sum_p / hnorm;
   …    …
 858  982       ///////////////////////////////////////////////////////////////
 859  983       // parallel vertical filter :
 860      -     // C <= transpose(FV(B))
 861      -     // Each thread computes (NP/nthreads) columns
      984 +     // C <= Transpose(FV(B))
      985 +     // Each thread computes (image_np/nthreads) columns
 862  986       // The image must be extended :
 863  987       // if (l<0) TB(cid,p,l) == TB(cid,p,0)
 864      -     // if (l>NL-1) TB(cid,p,l) == TB(cid,p,NL-1)
      988 +     // if (l>image_nl-1) TB(cid,p,l) == TB(cid,p,image_nl-1)
 865  989       ///////////////////////////////////////////////////////////////
 866  990  
   …    …
 868  992       V_BEG[cid][lid] = (unsigned int)date;
 869  993  
 870      -     // l = absolute line index / p = absolute pixel index
 871      -     // first & last define which pixels are handled by a given thread
 872      -
 873      -     first = tid * pixels_per_thread;
 874      -     last = first + pixels_per_thread;
 875      -
 876      -     for (p = first; p < last; p++)
      994 +     // l = global line index / p = pixel index in column
      995 +
      996 +     for (p = column_pmin; p < column_pmax ; p++)
 877  997       {
 878  998           // src_c and src_p are the cluster index and the pixel index for B
   …    …
 883 1003  
 884 1004           // We use the specific values of the vertical ep-filter
 885      -         // To minimize the number of tests, the NL lines are split in three domains
     1005 +         // To minimize the number of tests, the image_nl lines are split in three domains
 886 1006  
 887 1007           // first domain : explicit computation for the first 18 values
   …    …
 899 1019           }
 900 1020           // second domain
 901      -         for (l = 18; l < NL - 17; l++)
     1021 +         for (l = 18; l < image_nl - 17; l++)
 902 1022           {
 903 1023               // dst_c and dst_l are the cluster index and the line index for C
   …    …
 919 1039           }
 920 1040           // third domain
 921      -         for (l = NL - 17; l < NL; l++)
     1041 +         for (l = image_nl - 17; l < image_nl; l++)
 922 1042           {
 923 1043               // dst_c and dst_l are the cluster index and the line index for C
   …    …
 925 1045               int dst_l = l % lines_per_cluster;
 926 1046  
 927      -             sum_l = sum_l + TB(src_c, src_p, min(l + 4, NL - 1))
 928      -                           + TB(src_c, src_p, min(l + 8, NL - 1))
 929      -                           + TB(src_c, src_p, min(l + 11, NL - 1))
 930      -                           + TB(src_c, src_p, min(l + 15, NL - 1))
 931      -                           + TB(src_c, src_p, min(l + 17, NL - 1))
     1047 +             sum_l = sum_l + TB(src_c, src_p, min(l + 4, image_nl - 1))
     1048 +                           + TB(src_c, src_p, min(l + 8, image_nl - 1))
     1049 +                           + TB(src_c, src_p, min(l + 11, image_nl - 1))
     1050 +                           + TB(src_c, src_p, min(l + 15, image_nl - 1))
     1051 +                           + TB(src_c, src_p, min(l + 17, image_nl - 1))
 932 1052                             - TB(src_c, src_p, l - 5)
 933 1053                             - TB(src_c, src_p, l - 9)
   …    …
 958 1078       pthread_barrier_wait( &barrier );
 959 1079  
 960      -     // Optional parallel display of the final image Z <= D + C
 961      -     // Eah thread[x,y,p] displays (NL/nthreads) lines.
 962      -
     1080 +     ///////////////////////////////////////////////////////////////
     1081 +     // build final image in local Z buffer from C & D local buffers
     1082 +     // store it in output image file, and display it on FBF.
     1083 +     // Z <= C + D
     1084 +     ///////////////////////////////////////////////////////////////
     1085 +
     1086 +     get_cycle( &date );
     1087 +     F_BEG[cid][lid] = (unsigned int)date;
     1088 +
     1089 +     // Each thread[tid] set local buffer Z[cid] from local buffers C[cid] & D[cid]
     1090 +
     1091 +     for( l = local_lmin ; l < local_lmax ; l++ )
     1092 +     {
     1093 +         for( p = 0 ; p < image_np ; p++ )
     1094 +         {
     1095 +             TZ(cid,l,p) = TC(cid,l,p) + TD(cid,l,p);
     1096 +         }
     1097 +     }
     1098 +
     1099 +     // Each thread[tid] copy npixels from Z[cid] buffer to image_out buffer
     1100 +     memcpy( image_out + g_offset,
     1101 +             Z[cid] + l_offset,
     1102 +             npixels );
     1103 +
     1104 +     // Optional parallel display of the final image
 963 1105       if ( FINAL_DISPLAY_ENABLE )
 964 1106       {
 965      -         get_cycle( &date );
 966      -         D_BEG[cid][lid] = (unsigned int)date;
 967      -
 968      -         unsigned int line;
 969      -         unsigned int offset = lines_per_thread * lid;
 970      -
 971      -         for ( l = 0 ; l < lines_per_thread ; l++ )
 972      -         {
 973      -             line = offset + l;
 974      -
 975      -             for ( p = 0 ; p < NP ; p++ )
 976      -             {
 977      -                 TZ(cid, line, p) =
 978      -                 (unsigned char)( (TD(cid, line, p) +
 979      -                                   TC(cid, line, p) ) >> 8 );
 980      -             }
 981      -
 982      -             if (fbf_write( &TZ(cid, line, 0), // first pixel in TZ
 983      -                            NP, // number of bytes
 984      -                            NP*(l + (tid * lines_per_thread)))) // offset in FBF
 985      -             {
 986      -                 printf("\n[convol error] thread[%d] cannot access FBF\n", tid );
 987      -                 pthread_exit( &THREAD_EXIT_FAILURE );
 988      -             }
 989      -         }
 990      -
 991      -         get_cycle( &date );
 992      -         D_END[cid][lid] = (unsigned int)date;
 993      -
 994      - #if VERBOSE_EXEC
     1107 +         // each thread[tid] copy npixels from Z[cid] to out_win_buf buffer
     1108 +         memcpy( out_win_buf + g_offset,
     1109 +                 Z[cid] + l_offset,
     1110 +                 npixels );
     1111 +
     1112 +         // refresh the FBF window
     1113 +         if( fbf_refresh_window( out_wid , global_lmin , global_lmax ) )
     1114 +         {
     1115 +             printf("\n[convol error] in %s : thread[%d] cannot access FBF\n",
     1116 +             __FUNCTION__ , tid );
     1117 +             pthread_exit( &THREAD_EXIT_FAILURE );
     1118 +         }
     1119 +
     1120 + #if VERBOSE_EXEC
 995 1121           get_cycle( &date );
 996 1122           printf( "\n[convol] exec[%d] on core[%x,%d] completed final display / cycle %d\n",
 997      -         tid , cxy , lid , (unsigned int)date );
     1123 +         tid , cxy , lpid , (unsigned int)date );
 998 1124   #endif
 999 1125  
   …    …
1010 1136       }
1011 1137  
     1138 +     get_cycle( &date );
     1139 +     F_END[cid][lid] = (unsigned int)date;
     1140 +
1012 1141       // thread termination depends on the placement policy
1013 1142       if( PARALLEL_PLACEMENT )
   …    …
1031 1160  
1032 1161   } // end execute()
     1162 +
     1163 +
     1164 +
1033 1165  
1034 1166  
   …    …
1057 1189       unsigned int max_v_end = 0;
1058 1190  
1059      -     unsigned int min_d_beg = 0xFFFFFFFF;
1060      -     unsigned int max_d_beg = 0;
1061      -
1062      -     unsigned int min_d_end = 0xFFFFFFFF;
1063      -     unsigned int max_d_end = 0;
     1191 +     unsigned int min_f_beg = 0xFFFFFFFF;
     1192 +     unsigned int max_f_beg = 0;
     1193 +
     1194 +     unsigned int min_f_end = 0xFFFFFFFF;
     1195 +     unsigned int max_f_end = 0;
1064 1196  
1065 1197       for (cc = 0; cc < nclusters; cc++)
   …    …
1082 1214               if (V_END[cc][pp] > max_v_end) max_v_end = V_END[cc][pp];
1083 1215  
1084      -             if (D_BEG[cc][pp] < min_d_beg) min_d_beg = D_BEG[cc][pp];
1085      -             if (D_BEG[cc][pp] > max_d_beg) max_d_beg = D_BEG[cc][pp];
1086      -
1087      -             if (D_END[cc][pp] < min_d_end) min_d_end = D_END[cc][pp];
1088      -             if (D_END[cc][pp] > max_d_end) max_d_end = D_END[cc][pp];
     1216 +             if (F_BEG[cc][pp] < min_f_beg) min_f_beg = F_BEG[cc][pp];
     1217 +             if (F_BEG[cc][pp] > max_f_beg) max_f_beg = F_BEG[cc][pp];
     1218 +
     1219 +             if (F_END[cc][pp] < min_f_end) min_f_end = F_END[cc][pp];
     1220 +             if (F_END[cc][pp] > max_f_end) max_f_end = F_END[cc][pp];
1089 1221           }
1090 1222       }
   …    …
1109 1241  
1110 1242       printf(" - D_BEG : min = %d / max = %d / med = %d / delta = %d\n",
1111      -     min_d_beg, max_d_beg, (min_d_beg+max_d_beg)/2, max_d_beg-min_d_beg);
     1243 +     min_f_beg, max_f_beg, (min_f_beg+max_f_beg)/2, max_f_beg-min_f_beg);
1112 1244  
1113 1245       printf(" - D_END : min = %d / max = %d / med = %d / delta = %d\n",
1114      -     min_d_end, max_d_end, (min_d_end+max_d_end)/2, max_d_end-min_d_end);
     1246 +     min_f_end, max_f_end, (min_f_end+max_f_end)/2, max_f_end-min_f_end);
1115 1247  
1116 1248       printf( "\n General Scenario (Kcycles)\n" );
   …    …
1119 1251       printf( " - BARRIER HORI/VERT = %d\n", (min_v_beg - max_h_end)/1000 );
1120 1252       printf( " - V_FILTER = %d\n", (max_v_end - min_v_beg)/1000 );
1121      -     printf( " - BARRIER VERT/DISP = %d\n", (min_d_beg - max_v_end)/1000 );
1122      -     printf( " - DISPLAY = %d\n", (max_d_end - min_d_beg)/1000 );
     1253 +     printf( " - BARRIER VERT/DISP = %d\n", (min_f_beg - max_v_end)/1000 );
     1254 +     printf( " - DISPLAY = %d\n", (max_f_end - min_f_beg)/1000 );
1123 1255       printf( " \nSEQUENCIAL = %d / PARALLEL = %d\n",
1124 1256       SEQUENCIAL_TIME/1000, PARALLEL_TIME/1000 );
   …    …
1143 1275  
1144 1276       fprintf( f , " - D_BEG : min = %d / max = %d / med = %d / delta = %d\n",
1145      -     min_d_beg, max_d_beg, (min_d_beg+max_d_beg)/2, max_d_beg-min_d_beg);
     1277 +     min_f_beg, max_f_beg, (min_f_beg+max_f_beg)/2, max_f_beg-min_f_beg);
1146 1278  
1147 1279       fprintf( f , " - D_END : min = %d / max = %d / med = %d / delta = %d\n",
1148      -     min_d_end, max_d_end, (min_d_end+max_d_end)/2, max_d_end-min_d_end);
     1280 +     min_f_end, max_f_end, (min_f_end+max_f_end)/2, max_f_end-min_f_end);
1149 1281  
1150 1282       fprintf( f , "\n General Scenario (Kcycles)\n" );
   …    …
1153 1285       fprintf( f , " - BARRIER HORI/VERT = %d\n", (min_v_beg - max_h_end)/1000 );
1154 1286       fprintf( f , " - V_FILTER = %d\n", (max_v_end - min_v_beg)/1000 );
1155      -     fprintf( f , " - BARRIER VERT/DISP = %d\n", (min_d_beg - max_v_end)/1000 );
1156      -     fprintf( f , " - DISPLAY = %d\n", (max_d_end - min_d_beg)/1000 );
     1287 +     fprintf( f , " - BARRIER VERT/DISP = %d\n", (min_f_beg - max_v_end)/1000 );
     1288 +     fprintf( f , " - SAVE = %d\n", (max_f_end - min_f_beg)/1000 );
1157 1289       fprintf( f , " \nSEQUENCIAL = %d / PARALLEL = %d\n",
1158 1290       SEQUENCIAL_TIME/1000, PARALLEL_TIME/1000 );
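
The per-thread index arithmetic that r676 adds to execute() (tid == cid * NCORES + lid, then g_offset, l_offset and the global and local line ranges) can be sketched in isolation as below. This is a minimal illustration, not code from the changeset: the `slice_t` type and the `slice_for()` helper are hypothetical names, `lid = tid % ncores` is inferred from the decomposition stated in the header comment, and the sketch assumes that image_nl is a multiple of nthreads, which the committed code appears to rely on without checking.

```c
#include <assert.h>

/* Hypothetical container for the values execute() derives from tid. */
typedef struct
{
    unsigned int cid;          /* cluster index                            */
    unsigned int lid;          /* local core index inside the cluster      */
    unsigned int npixels;      /* pixels moved by one thread               */
    unsigned int g_offset;     /* offset in a global buffer (image_in/out) */
    unsigned int l_offset;     /* offset in the cluster-local buffer       */
    unsigned int global_lmin;  /* first global line handled (inclusive)    */
    unsigned int global_lmax;  /* last global line handled (exclusive)     */
    unsigned int local_lmin;   /* first local line handled (inclusive)     */
    unsigned int local_lmax;   /* last local line handled (exclusive)      */
} slice_t;

/* Hypothetical helper mirroring the arithmetic of lines 846-856 in r676.
 * Assumes one byte per pixel and image_nl divisible by nthreads.          */
static slice_t slice_for( unsigned int tid,
                          unsigned int nclusters,
                          unsigned int ncores,
                          unsigned int image_nl,
                          unsigned int image_np )
{
    unsigned int nthreads         = nclusters * ncores;
    unsigned int lines_per_thread = image_nl / nthreads;

    slice_t s;
    s.cid         = tid / ncores;                 /* tid == cid * ncores + lid */
    s.lid         = tid % ncores;
    s.npixels     = image_np * lines_per_thread;
    s.g_offset    = s.npixels * tid;
    s.l_offset    = s.npixels * s.lid;
    s.global_lmin = tid * lines_per_thread;
    s.global_lmax = s.global_lmin + lines_per_thread;
    s.local_lmin  = s.lid * lines_per_thread;
    s.local_lmax  = s.local_lmin + lines_per_thread;
    return s;
}
```

With the default 512 x 512 image on a hypothetical platform of 4 clusters of 2 cores, `slice_for( 5, 4, 2, 512, 512 )` places thread 5 in cluster 2 as local core 1: it handles global lines 320 to 383, which are local lines 64 to 127 of that cluster's buffers.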