GENERAL INFORMATION:

The OCEAN program simulates large-scale ocean movements based on eddy and
boundary currents, and is an enhanced version of the SPLASH Ocean code.
A description of the functionality of this code can be found in the
original SPLASH report. The implementations contained in SPLASH-2
differ from the original SPLASH implementation in the following ways:

  (1) The SPLASH-2 implementations are written in C rather than
      FORTRAN.
  (2) Grids are partitioned into square-like subgrids rather than
      groups of columns to improve the communication-to-computation
      ratio.
  (3) The SOR solver in the SPLASH Ocean code has been replaced with a
      restricted Red-Black Gauss-Seidel Multigrid solver based on that
      presented in:

      Brandt, A. Multi-Level Adaptive Solutions to Boundary-Value Problems.
      Mathematics of Computation, 31(138):333-390, April 1977.

      The solver is restricted so that each processor has at least two
      grid points in each dimension in each grid subpartition.

Two implementations are provided in the SPLASH-2 distribution:

  (1) Non-contiguous partition allocation

      This implementation (contained in the non_contiguous_partitions
      subdirectory) stores the grids to be operated on in
      two-dimensional arrays. This data structure prevents partitions
      from being allocated contiguously, but leads to a conceptually
      simple programming implementation.

  (2) Contiguous partition allocation

      This implementation (contained in the contiguous_partitions
      subdirectory) stores the grids to be operated on in
      three-dimensional arrays. The first dimension specifies the
      processor that owns the partition, and the second and third
      dimensions specify the x and y offsets within the partition. This
      data structure allows partitions to be allocated contiguously and
      entirely in the local memory of the processors that "own" them,
      thus enhancing data locality.

The contiguous partition allocation implementation is described in:

  Woo, S. C., Singh, J. P., and Hennessy, J. L. The Performance Advantages
  of Integrating Message Passing in Cache-Coherent Multiprocessors.
  Technical Report CSL-TR-93-593, Stanford University, December 1993.

A detailed description of both versions will appear in the SPLASH-2
report. The non-contiguous partition allocation implementation is
conceptually similar, except for its use of statically allocated
two-dimensional arrays.

These programs work under both the Unix FORK and SPROC models.

RUNNING THE PROGRAM:

To see how to run the program, please see the comment at the top of the
file main.C, or run the application with the "-h" command line option.
Five command line parameters can be specified; the ones that would
normally be changed are the number of grid points in each dimension and
the number of processors. The number of grid points in each dimension
must be a power of 2 plus 2 (e.g. 130 or 258), and the number of
processors must be a power of 2. Timing information is printed at the
end of the program. The first timestep is considered part of the
initialization phase of the program, and hence is not included in the
"Total time without initialization."

BASE PROBLEM SIZE:

The base problem size for machines with up to 64 processors is a 258x258
grid. The default values should be used for the other parameters (except
the number of processors, which can be varied). In addition, sample
output for the default parameters for each version of the code is
contained in the file correct.out in that version's subdirectory.

DATA DISTRIBUTION:

Our "POSSIBLE ENHANCEMENT" comments in the source code indicate where,
and how, one might want to distribute data. Data distribution affects
performance on the Stanford DASH multiprocessor.