GENERAL INFORMATION:

The OCEAN program simulates large-scale ocean movements based on eddy and
boundary currents, and is an enhanced version of the SPLASH Ocean code.
A description of the functionality of this code can be found in the
original SPLASH report.  The implementations contained in SPLASH-2
differ from the original SPLASH implementation in the following ways:

  (1) The SPLASH-2 implementations are written in C rather than
      FORTRAN.
  (2) Grids are partitioned into square-like subgrids rather than
      groups of columns to improve the communication-to-computation
      ratio.
  (3) The SOR solver in the SPLASH Ocean code has been replaced with a
      restricted Red-Black Gauss-Seidel Multigrid solver based on that
      presented in:

      Brandt, A. Multi-Level Adaptive Solutions to Boundary-Value Problems.
      Mathematics of Computation, 31(138):333-390, April 1977.

      The solver is restricted so that each processor has at least two
      grid points in each dimension in each grid subpartition.

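As a rough illustration of point (2), the sketch below compares how many
boundary points an interior processor would exchange under column-strip
and square-like partitioning.  The function names and the simple perimeter
model are assumptions made for this example; they are not taken from the
OCEAN sources.

```c
/* Illustrative sketch only: names and the perimeter model are
   assumptions, not code from the OCEAN distribution. */

/* Split nprocs (assumed to be a power of 2) into px * py with px <= py
   and the two factors as close together as possible. */
void proc_grid(int nprocs, int *px, int *py)
{
    int x = 1;
    while (x * x * 4 <= nprocs)   /* doubling x must keep x*x <= nprocs */
        x *= 2;
    *px = x;
    *py = nprocs / x;
}

/* Boundary points of an interior column strip of an n x n grid:
   two full-height edges, independent of the number of processors. */
int strip_boundary(int n)
{
    return 2 * n;
}

/* Boundary points of an interior (n/px) x (n/py) subgrid: four edges. */
int block_boundary(int n, int nprocs)
{
    int px, py;
    proc_grid(nprocs, &px, &py);
    return 2 * (n / px + n / py);
}
```

Under this model, with n = 256 and 16 processors, a column strip has 512
boundary points and a 64x64 subgrid has 256, while both hold the same
4096 points of computation per processor.
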
Two implementations are provided in the SPLASH-2 distribution:

  (1) Non-contiguous partition allocation

      This implementation (contained in the non_contiguous_partitions
      subdirectory) represents the grids to be operated on as
      two-dimensional arrays.  This data structure prevents partitions
      from being allocated contiguously, but leads to a conceptually
      simple implementation.

  (2) Contiguous partition allocation

      This implementation (contained in the contiguous_partitions
      subdirectory) represents the grids to be operated on as
      three-dimensional arrays.  The first dimension specifies the
      processor that owns the partition, and the second and third
      dimensions specify the x and y offsets within a partition.  This
      data structure allows partitions to be allocated contiguously and
      entirely in the local memory of the processors that "own" them,
      thus enhancing data locality.

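The indexing scheme of the contiguous version can be sketched as follows.
This is a minimal illustration, not the allocation code from the
distribution: alloc_partitions and free_partitions are hypothetical
helpers, and plain malloc/calloc does not by itself place a block in a
particular processor's local memory.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate nprocs partitions of rows x cols doubles
   (rows and cols assumed positive).  grid[p][i][j] addresses offset
   (i, j) within the partition owned by processor p, and each partition's
   data lives in a single contiguous block. */
double ***alloc_partitions(int nprocs, int rows, int cols)
{
    double ***grid = malloc((size_t)nprocs * sizeof *grid);
    if (grid == NULL)
        return NULL;

    for (int p = 0; p < nprocs; p++) {
        double *block = calloc((size_t)rows * cols, sizeof *block);
        double **rowptr = malloc((size_t)rows * sizeof *rowptr);
        if (block == NULL || rowptr == NULL) {
            /* Undo everything allocated so far. */
            free(block);
            free(rowptr);
            while (p-- > 0) {
                free(grid[p][0]);
                free(grid[p]);
            }
            free(grid);
            return NULL;
        }
        for (int i = 0; i < rows; i++)
            rowptr[i] = block + (size_t)i * cols;  /* rows share one block */
        grid[p] = rowptr;
    }
    return grid;
}

/* Release everything obtained from alloc_partitions. */
void free_partitions(double ***grid, int nprocs)
{
    if (grid == NULL)
        return;
    for (int p = 0; p < nprocs; p++) {
        free(grid[p][0]);   /* the contiguous data block */
        free(grid[p]);      /* the row pointers */
    }
    free(grid);
}
```

A real shared-memory port would replace calloc with whatever mechanism the
target machine offers for placing each block near its owning processor.
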
The contiguous partition allocation implementation is described in:

Woo, S. C., Singh, J. P., and Hennessy, J. L.  The Performance Advantages
     of Integrating Message Passing in Cache-Coherent Multiprocessors.
     Technical Report CSL-TR-93-593, Stanford University, December 1993.

A detailed description of both versions will appear in the SPLASH-2 report.
The non-contiguous partition allocation implementation is conceptually
similar, except for the use of statically allocated two-dimensional arrays.

These programs work under both the Unix FORK and SPROC models.

RUNNING THE PROGRAM:

For instructions on running the program, see the comment at the top of the
file main.C, or run the application with the "-h" command line option.
Five command line parameters can be specified, of which the ones that
would normally be changed are the number of grid points in each dimension
and the number of processors.  The number of grid points in each dimension
must be a power of 2 plus 2 (e.g. 130, 258, etc.).  The number of
processors must be a power of 2.  Timing information is printed out at
the end of the program.  The first timestep is considered part of the
initialization phase of the program, and hence is not included in the
"Total time without initialization."

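The two size constraints above can be checked with a few lines of C.  This
is a sketch for illustration only; the function names are assumptions and
do not correspond to routines in the OCEAN sources.

```c
#include <stdbool.h>

/* True when v is a positive power of 2 (1, 2, 4, 8, ...). */
bool is_power_of_two(unsigned long v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* True when n has the form (power of 2) + 2, e.g. 130 or 258. */
bool valid_grid_dim(unsigned long n)
{
    return n > 2 && is_power_of_two(n - 2);
}

/* True when the processor count is a power of 2. */
bool valid_proc_count(unsigned long nprocs)
{
    return is_power_of_two(nprocs);
}
```

For example, valid_grid_dim(258) and valid_proc_count(64) both hold, while
valid_grid_dim(256) and valid_proc_count(12) do not.
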
BASE PROBLEM SIZE:

The base problem size for a machine with up to 64 processors is a 258x258
grid.  The default values should be used for the other parameters (except
the number of processors, which can be varied).  In addition, sample output
for the default parameters for each version of the code is contained in
the file correct.out in each subdirectory.

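To relate the base problem size to the solver restriction of at least two
grid points per processor in each dimension, the sketch below estimates
the per-processor subgrid size and how many times it can be coarsened.  It
assumes that the two extra points in each dimension are boundary points
and that each multigrid level halves the resolution; both are assumptions
of this example rather than statements about the OCEAN sources, and the
function names are hypothetical.

```c
/* Interior points per processor along one dimension, assuming the grid
   dimension n includes two boundary points and that procs_per_dim
   divides n - 2 evenly. */
int points_per_proc(int n, int procs_per_dim)
{
    return (n - 2) / procs_per_dim;
}

/* Number of grid levels available if each level halves the resolution
   and every level must keep at least two points per processor in the
   dimension considered. */
int max_levels(int pts)
{
    int levels = 0;
    while (pts >= 2) {
        levels++;
        pts /= 2;
    }
    return levels;
}
```

Under these assumptions, a 258x258 grid on 64 processors arranged 8x8
gives points_per_proc(258, 8) = 32 points per processor in each dimension,
and max_levels(32) = 5.
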
DATA DISTRIBUTION:

The "POSSIBLE ENHANCEMENT" comments in the source code indicate where and
how one might want to distribute data.  Data distribution has an impact
on performance on the Stanford DASH multiprocessor.