GENERAL INFORMATION:

The OCEAN program simulates large-scale ocean movements based on eddy and
boundary currents, and is an enhanced version of the SPLASH Ocean code.
A description of the functionality of this code can be found in the
original SPLASH report.  The implementations contained in SPLASH-2
differ from the original SPLASH implementation in the following ways:

(1) The SPLASH-2 implementations are written in C rather than
    FORTRAN.
(2) Grids are partitioned into square-like subgrids rather than
    groups of columns to improve the communication-to-computation
    ratio.
(3) The SOR solver in the SPLASH Ocean code has been replaced with a
    restricted Red-Black Gauss-Seidel Multigrid solver based on the one
    presented in:

    Brandt, A.  Multi-Level Adaptive Solutions to Boundary-Value Problems.
    Mathematics of Computation, 31(138):333-390, April 1977.

    The solver is restricted so that each processor has at least two
    grid points in each dimension in each grid subpartition.

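For readers unfamiliar with red-black ordering, the sketch below shows one
red-black Gauss-Seidel sweep, the kind of smoother applied at each level of
such a multigrid solver.  This is an illustration only, not code from the
OCEAN source; the function name `relax_redblack` and the 5-point stencil
are assumptions for the example.

```c
/* One red-black Gauss-Seidel sweep over the interior of an n x n grid
   for the 5-point discretization of -Laplace(u) = rhs with mesh
   spacing h (h2 = h*h).  All "red" points ((i+j) even) are updated
   first, then all "black" points; within one color no point depends
   on another, so each color phase parallelizes cleanly. */
static void relax_redblack(double *u, const double *rhs, int n, double h2)
{
    for (int color = 0; color < 2; color++) {
        for (int i = 1; i < n - 1; i++) {
            for (int j = 1; j < n - 1; j++) {
                if (((i + j) & 1) != color)
                    continue;
                u[i * n + j] = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j]
                                     + u[i * n + (j - 1)] + u[i * n + (j + 1)]
                                     + h2 * rhs[i * n + j]);
            }
        }
    }
}
```

Because all points of one color depend only on points of the other color,
processors can update each color phase concurrently, synchronizing at a
barrier between the two phases.
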
Two implementations are provided in the SPLASH-2 distribution:

(1) Non-contiguous partition allocation

    This implementation (contained in the non_contiguous_partitions
    subdirectory) represents the grids to be operated on as
    two-dimensional arrays.  This data structure prevents partitions
    from being allocated contiguously, but leads to a conceptually
    simple implementation.

(2) Contiguous partition allocation

    This implementation (contained in the contiguous_partitions
    subdirectory) represents the grids to be operated on as
    three-dimensional arrays.  The first dimension specifies the
    processor that owns the partition, and the second and third
    dimensions specify the x and y offsets within a partition.  This
    data structure allows partitions to be allocated contiguously and
    entirely in the local memory of the processors that "own" them,
    thus enhancing data locality.

The contiguous partition allocation implementation is described in:

Woo, S. C., Singh, J. P., and Hennessy, J. L.  The Performance Advantages
of Integrating Message Passing in Cache-Coherent Multiprocessors.
Technical Report CSL-TR-93-593, Stanford University, December 1993.

A detailed description of both versions will appear in the SPLASH-2 report.
The non-contiguous partition allocation implementation is conceptually
similar, except for its use of statically allocated two-dimensional arrays.
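
The layout idea behind the contiguous version can be sketched as follows.
This is a hypothetical illustration, not the SPLASH-2 data structure;
`grid_t`, `grid_alloc`, and `grid_at` are invented names for the example.

```c
#include <stdlib.h>

/* A grid viewed as a 3-D array grid[proc][y][x]: each processor's
   partition occupies one contiguous block of memory, so it can be
   placed entirely in that processor's local memory. */
typedef struct {
    int nprocs;        /* number of processors (partitions)        */
    int sub_x, sub_y;  /* subgrid extent owned by each processor   */
    double *data;      /* nprocs * sub_y * sub_x contiguous values */
} grid_t;

static grid_t *grid_alloc(int nprocs, int sub_x, int sub_y)
{
    grid_t *g = malloc(sizeof *g);
    g->nprocs = nprocs;
    g->sub_x = sub_x;
    g->sub_y = sub_y;
    g->data = calloc((size_t)nprocs * sub_y * sub_x, sizeof(double));
    return g;
}

/* Address of point (x, y) inside the partition owned by 'proc'. */
static double *grid_at(grid_t *g, int proc, int y, int x)
{
    return &g->data[((size_t)proc * g->sub_y + y) * g->sub_x + x];
}
```

Because each partition is one contiguous run of memory, it can be allocated
in (or migrated to) the local memory of its owning processor, which is the
data-locality advantage the contiguous version exploits; a plain 2-D array
interleaves rows from different partitions and cannot offer this.
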

These programs work under both the Unix FORK and SPROC models.

RUNNING THE PROGRAM:

To see how to run the program, please see the comment at the top of the
file main.C, or run the application with the "-h" command line option.
Five command line parameters can be specified, of which the ones that
would normally be changed are the number of grid points in each dimension
and the number of processors.  The number of grid points in each dimension
must be a power of two plus two (e.g. 130, 258, etc.).  The number of
processors must be a power of two.  Timing information is printed at
the end of the program.  The first timestep is considered part of the
initialization phase of the program, and hence is not included in the
"Total time without initialization."
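
The two size constraints above can be expressed as a small check.  The
helper below is hypothetical (it is not part of main.C); `valid_params`
and `is_pow2` are invented names for the example.

```c
/* A value v is a power of two iff it is positive and has exactly one
   bit set, i.e. v & (v - 1) == 0. */
static int is_pow2(int v)
{
    return v > 0 && (v & (v - 1)) == 0;
}

/* Documented constraints: the grid edge must be a power of two plus
   two (130, 258, ...), and the processor count a power of two. */
static int valid_params(int grid_edge, int nprocs)
{
    return is_pow2(grid_edge - 2) && is_pow2(nprocs);
}
```
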

BASE PROBLEM SIZE:

The base problem size for a machine with up to 64 processors is a 258x258
grid.  The default values should be used for the other parameters (except
the number of processors, which can be varied).  In addition, a sample
output file for the default parameters of each version of the code is
provided as correct.out in that version's subdirectory.

DATA DISTRIBUTION:

Our "POSSIBLE ENHANCEMENT" comments in the source code indicate where
one might want to distribute data and how.  Data distribution has an
impact on performance on the Stanford DASH multiprocessor.