GENERAL INFORMATION:

The OCEAN program simulates large-scale ocean movements based on eddy and
boundary currents, and is an enhanced version of the SPLASH Ocean code.
A description of the functionality of this code can be found in the
original SPLASH report.  The implementations contained in SPLASH-2
differ from the original SPLASH implementation in the following ways:

  (1) The SPLASH-2 implementations are written in C rather than
      FORTRAN.
  (2) Grids are partitioned into square-like subgrids rather than
      groups of columns to improve the communication to computation
      ratio (a square subgrid has a smaller perimeter, and hence less
      boundary communication, per interior point than a strip of
      columns of the same area).
  (3) The SOR solver in the SPLASH Ocean code has been replaced with a
      restricted Red-Black Gauss-Seidel Multigrid solver based on that
      presented in:

      Brandt, A. Multi-Level Adaptive Solutions to Boundary-Value Problems.
           Mathematics of Computation, 31(138):333-390, April 1977.

      The solver is restricted so that each processor has at least two
      grid points in each dimension in each grid subpartition.  (A rough
      sketch of one red-black relaxation sweep appears after this list.)
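
The following sketch is not taken from the OCEAN sources; it is only meant
to illustrate what one red-black Gauss-Seidel relaxation sweep over a
processor's square subgrid looks like.  The array names, subgrid bounds,
and five-point stencil are assumptions for illustration, and the real
solver applies such sweeps inside a full multigrid cycle.

  /* Illustrative red-black Gauss-Seidel sweep for a 2-D Poisson-like
   * problem.  q holds the unknowns, rhs the right-hand side, h is the
   * mesh spacing, and [ilo..ihi] x [jlo..jhi] is the interior of one
   * processor's subgrid.  All names are hypothetical. */
  #define N 130                       /* (a power of 2) + 2 points per side */

  static double q[N][N];              /* unknowns; boundary points fixed    */
  static double rhs[N][N];            /* right-hand side                    */

  static void relax_color(int ilo, int ihi, int jlo, int jhi,
                          double h, int color)
  {
     int i, j;

     for (i = ilo; i <= ihi; i++) {
        /* visit only the points with (i + j) % 2 == color */
        for (j = jlo + ((i + jlo + color) & 1); j <= jhi; j += 2) {
           q[i][j] = 0.25 * (q[i-1][j] + q[i+1][j] +
                             q[i][j-1] + q[i][j+1] - h * h * rhs[i][j]);
        }
     }
  }

  /* One full sweep: all "red" points, then all "black" points.  Between
   * the two half-sweeps the processors must synchronize, because black
   * points read red points owned by neighboring partitions. */
  void red_black_sweep(int ilo, int ihi, int jlo, int jhi, double h)
  {
     relax_color(ilo, ihi, jlo, jhi, h, 0);    /* red   */
     /* a barrier among all processors goes here */
     relax_color(ilo, ihi, jlo, jhi, h, 1);    /* black */
  }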

Two implementations are provided in the SPLASH-2 distribution:

  (1) Non-contiguous partition allocation

      This implementation (contained in the non_contiguous_partitions
      subdirectory) represents the grids to be operated on as
      two-dimensional arrays.  This data structure prevents partitions
      from being allocated contiguously, but leads to a conceptually
      simple implementation.

  (2) Contiguous partition allocation

      This implementation (contained in the contiguous_partitions
      subdirectory) represents the grids to be operated on as
      three-dimensional arrays.  The first dimension specifies the
      processor that owns the partition, and the second and third
      dimensions specify the x and y offsets within the partition.
      This data structure allows partitions to be allocated contiguously
      and entirely in the local memory of the processors that "own"
      them, thus enhancing data locality properties.  (A sketch
      contrasting the two layouts follows this list.)
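
The difference between the two layouts can be pictured roughly as follows.
The declarations below are a hypothetical sketch, not the actual OCEAN
declarations; the sizes and names are made up, and the plain malloc calls
stand in for the shared-memory allocation done with the SPLASH-2 macros
(e.g. G_MALLOC).

  #include <stdlib.h>

  #define NPROC    16      /* number of processors, a power of 2 (4x4 grid) */
  #define IMAX    258      /* (a power of 2) + 2 grid points per side       */
  #define SUBSIZE  66      /* illustrative: 256/4 interior points per side
                              plus one border row/column on each side       */

  /* (1) Non-contiguous version: one global 2-D array.  A processor's
   *     square subgrid is a set of row fragments scattered through the
   *     array, so a partition cannot occupy a single contiguous block. */
  double (*psi_global)[IMAX];

  /* (2) Contiguous version: the first index names the owning processor,
   *     and the next two index the point within that partition, so each
   *     psi_local[p] is one contiguous block that can be placed in the
   *     local memory of processor p. */
  double (*psi_local)[SUBSIZE][SUBSIZE];

  void allocate_grids(void)
  {
     psi_global = malloc((size_t)IMAX * sizeof *psi_global);
     psi_local  = malloc((size_t)NPROC * sizeof *psi_local);
  }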

The contiguous partition allocation implementation is described in:

Woo, S. C., Singh, J. P., and Hennessy, J. L.  The Performance Advantages
     of Integrating Message Passing in Cache-Coherent Multiprocessors.
     Technical Report CSL-TR-93-593, Stanford University, December 1993.

A detailed description of both versions will appear in the SPLASH-2 report.
The non-contiguous partition allocation implementation is conceptually
similar, except for the use of statically allocated 2-dimensional arrays.

These programs work under both the Unix FORK and SPROC models.

RUNNING THE PROGRAM:

To see how to run the program, please see the comment at the top of the
file main.C, or run the application with the "-h" command line option.
Five command line parameters can be specified, of which the ones that
would normally be changed are the number of grid points in each dimension
and the number of processors.  The number of grid points in each dimension
must be (a power of 2) + 2 (e.g. 130, 258, etc.), and the number of
processors must be a power of 2.  Timing information is printed at the
end of the program.  The first timestep is considered part of the
initialization phase of the program, and hence is not included in the
"Total time without initialization."
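
As a sketch of how those two constraints can be checked (the option
handling below is illustrative only; the actual parsing is in main.C):

  #include <stdio.h>
  #include <stdlib.h>

  /* A value is a power of 2 iff it is positive and has exactly one bit set. */
  static int is_pow2(long x)
  {
     return x > 0 && (x & (x - 1)) == 0;
  }

  int main(int argc, char *argv[])
  {
     long n     = (argc > 1) ? atol(argv[1]) : 258;  /* grid points per side */
     long nproc = (argc > 2) ? atol(argv[2]) : 1;    /* number of processors */

     if (!is_pow2(n - 2)) {
        fprintf(stderr, "grid size %ld is not (a power of 2) + 2 "
                        "(e.g. 130, 258)\n", n);
        exit(1);
     }
     if (!is_pow2(nproc)) {
        fprintf(stderr, "number of processors %ld is not a power of 2\n",
                nproc);
        exit(1);
     }
     printf("ok: %ldx%ld grid on %ld processors\n", n, n, nproc);
     return 0;
  }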

BASE PROBLEM SIZE:

The base problem size for a machine with up to 64 processors is a 258x258
grid.  The default values should be used for the other parameters (except
the number of processors, which can be varied).  In addition, sample output
files for the default parameters for each version of the code are contained
in the file correct.out in each subdirectory.

DATA DISTRIBUTION:

Our "POSSIBLE ENHANCEMENT" comments in the source code tell where one
might want to distribute data and how.  Data distribution has an impact
on performance on the Stanford DASH multiprocessor.
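
One common way to act on such a comment, on systems that place a page in
the memory of the first processor that touches it, is to have each
processor initialize its own partition before the main computation begins.
The sketch below is only an illustration of that idea (the names are
hypothetical, not from the OCEAN sources) and relies on the contiguous
layout, in which partition p is a single contiguous block.

  #define NPROC   16
  #define SUBSIZE 66

  extern double (*psi_local)[SUBSIZE][SUBSIZE];  /* see the layout sketch */

  /* Called by processor pid: writing its own partition first means that,
   * under first-touch page placement, the partition's pages end up in
   * pid's local memory. */
  void touch_my_partition(int pid)
  {
     int i, j;

     for (i = 0; i < SUBSIZE; i++)
        for (j = 0; j < SUBSIZE; j++)
           psi_local[pid][i][j] = 0.0;
  }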