Research Group of Prof. Dr. M. Griebel
Institute for Numerical Simulation
Parnass2

 
 
 

Performance

Baseline measurements 
Applications 

 
 
 
 

Performance of Parnass2:

processors 128
peak performance 51.2 GFlop/s
Mips 51212 BogoMips
Linpack MP 29.60 GFlop/s (Nmax=53000, N1/2=6600)
SPEC fp_rate95 13760
SPEC int_rate95 19240
Network bisection bandwidth 82 Gbit/s
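
The aggregate peak figure is consistent with the single-processor peak of 400 MFlop/s listed further down this page. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and the Linpack efficiency it prints is derived here from the two listed figures rather than quoted from the measurements.

```python
# Figures taken from this page
processors = 128
peak_per_processor_mflops = 400.0   # single-processor peak performance
linpack_mp_gflops = 29.60           # measured Linpack MP result

# Aggregate peak performance in GFlop/s
peak_gflops = processors * peak_per_processor_mflops / 1000.0

# Fraction of peak reached by Linpack MP (derived value, not a published figure)
linpack_efficiency = linpack_mp_gflops / peak_gflops

print(f"peak performance: {peak_gflops:.1f} GFlop/s")
print(f"Linpack MP / peak: {linpack_efficiency:.1%}")
```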

Network Bandwidth

Peak bandwidth of ping-pong test with MPI

 
Software                  node to node  process to process in a node  MPICH device
TCP/IP on Fast Ethernet   70 Mbit/s     245 Mbit/s                    chameleon
Myricom TCP driver        80 Mbit/s     245 Mbit/s                    chameleon
TCP on GM 0.2             not ready     not ready                     chameleon
BIP-IP 0.95b              180 Mbit/s    245 Mbit/s                    chameleon
BullDog/BDM               300 Mbit/s    n/a                           myri
BIP-MPI 0.95b             350 Mbit/s    n/a                           gmpi
GM-MPICH on BIP 0.95b     ?             n/a                           GM
GM-MPICH on GM 0.2        not ready     not ready                     GM
HPVM/Fast Messages 1.0a   530 Mbit/s    260 Mbit/s                    FM
Score 2.2/MPICH-PM        500 Mbit/s    n/a                           score
Score 2.4/MPICH-PM        850 Mbit/s    1150 Mbit/s                   score + pmvm
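
This page does not state the message sizes or timing details used in the ping-pong test. As a rough sketch of how such a bandwidth figure is commonly derived, assume a message is sent to the partner, echoed back, and the round trip is timed. The function name and the sample numbers below are hypothetical.

```python
def pingpong_bandwidth_mbit(message_bytes: int, round_trip_seconds: float) -> float:
    """One-way bandwidth in Mbit/s from a timed ping-pong round trip.

    Assumes the message travels once in each direction, so the one-way
    time is half of the round trip. 1 Mbit is taken as 10**6 bits.
    """
    one_way_seconds = round_trip_seconds / 2.0
    return message_bytes * 8 / one_way_seconds / 1e6


# Hypothetical sample: a 1 MB message with a 20 ms round trip
print(f"{pingpong_bandwidth_mbit(1_000_000, 0.020):.0f} Mbit/s")
```

Peak values like those in the table are typically reached only for large messages, where per-message latency is small compared with the transfer time.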

 
 

Performance of a single processor:

peak performance 400 MFlop/s
Linpack 280 MFlop/s (Linpack1000: 225 MFlop/s, double precision)
Mips 400.59 BogoMips
SPEC fp95 12.4  (base 11.4)
SPEC int95 15.8  (base 15.8)

Performance of a processing node:

peak performance 800 MFlop/s
Linpack MP 440 MFlop/s
Mips 801.18 BogoMips
SPEC fp_rate95 215  (base 202)
SPEC int_rate95 296  (base 295)

 

Navier-Stokes Simulation

Computer                   Processors  channel flow                     backward facing step
                                       192*96*192 = 3.5e6 cells         96*80*59 = 4.53e5 cells
Parnass2                   1           -                                65 sec
                           2           23-30 minutes (due to swapping)  37.75 sec
                           4           80 sec                           18.5 sec
                           8           42 sec                           10.45 sec
                           16          21 sec                           5.25 sec
                           32          -                                2.25 sec
                           56          -                                1 sec
                           64          5.5 sec                          1.5 sec
Parnass (Origin 200 node)  1           -                                81 sec
                           2           -                                46 sec
                           3           -                                32 sec
                           4           -                                23 sec

 

Molecular Dynamics Simulation

Simulation with 250047 particles, Execution times (secs)

 
Processors  Parnass2  one SGI Origin 200 node (Parnass)  network of SGI O2 nodes (Parnass)
2           4.402     13.369                             59.664
4           2.419     7.5077                             29.939
8           1.282     -                                  15.192
16          0.835     -                                  8.934

 

Simulation on Parnass2, Execution times (secs)

 
Processors 3375 particles  29791 particles  250047 particles  2000376 particles 
2 0.0424 0.485 4.402 37.118
4 0.176 0.256 2.419 18.67
8 0.019 0.147 1.282 10.32
16 0.012 0.100 0.835 6.009
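
One way to read the table above is through parallel efficiency relative to the smallest processor count (2 processors here, since no single-processor times are listed). The sketch below uses the three larger problem sizes from the table; the helper name `efficiency` is illustrative, and the resulting values are derived here, not published measurements.

```python
# Execution times (secs) on Parnass2, copied from the table above
processors = [2, 4, 8, 16]
times = {
    29791: [0.485, 0.256, 0.147, 0.100],
    250047: [4.402, 2.419, 1.282, 0.835],
    2000376: [37.118, 18.67, 10.32, 6.009],
}


def efficiency(procs, secs):
    """Parallel efficiency relative to the smallest processor count."""
    p0, t0 = procs[0], secs[0]
    return [(t0 * p0) / (t * p) for p, t in zip(procs, secs)]


for particles, secs in times.items():
    effs = ", ".join(f"{e:.2f}" for e in efficiency(processors, secs))
    print(f"{particles:>8} particles: {effs}")
```

An efficiency of 1.00 means the time halves each time the processor count doubles; lower values indicate growing communication or load-imbalance overhead.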

 
 
 

Parallel Algebraic Multigrid

All AMG numbers presented here are preliminary and have been contributed by A. Krechel, K. Stüben (SCAI, GMD).
Execution time (secs) vs. number of processors (number of iterations).

 

Coal furnace (326,000 unknowns)

                     1 (18)    2 (22)    4 (21)    8 (20)   16 (21)
Parnass2    total     46.40     27.24     14.63      8.57      5.11
            cycle      2.11      1.07      0.59      0.34      0.20
            setup      8.47      3.78      2.34      1.77      0.93
IBM SP2     total     70.38     39.15     21.64     13.98         -
            cycle      2.92      1.40      0.80      0.52         -
            setup     17.79      8.29      4.83      3.55         -

 

Speedup (relative to the smallest processor configuration) vs. number of processors (number of iterations).


 
                     1 (18)    2 (22)    4 (21)    8 (20)   16 (21)
Parnass2    total      1.00      1.70      3.17      5.41      9.08
            cycle      1.00      1.98      3.60      6.20     10.59
            setup      1.00      2.24      3.62      4.79      9.11
IBM SP2     total      1.00      1.80      3.25      5.04         -
            cycle      1.00      2.08      3.65      5.60         -
            setup      1.00      2.15      3.68      5.01         -
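
The speedup rows are the time on the smallest processor configuration divided by the time on each larger one. A minimal sketch, using the Parnass2 "total" times for the coal furnace case (the function name `speedup` is illustrative):

```python
def speedup(times):
    """Speedup relative to the first (smallest) processor configuration."""
    base = times[0]
    return [round(base / t, 2) for t in times]


# Parnass2 "total" times (secs) on 1, 2, 4, 8, 16 processors
parnass2_total = [46.40, 27.24, 14.63, 8.57, 5.11]
print(speedup(parnass2_total))  # [1.0, 1.7, 3.17, 5.41, 9.08]
```

Recomputing the "cycle" and "setup" rows from the listed times can differ from the listed speedups in the last digit (for example 2.11 / 1.07 gives 1.97, while the table lists 1.98), presumably because the listed times are themselves rounded.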

Mercedes-Benz underhood flow (910,000 unknowns)

Execution time (secs) vs. number of processors (number of iterations).

 
                     1 (18)    2 (18)    4 (18)    8 (17)   16 (20)   32 (22)
Parnass2    total    129.70     62.84     37.00     19.95     14.60         -
            cycle      5.72      2.83      1.67      0.93      0.52         -
            setup     26.65     11.95      7.02      4.21      4.23         -
NEC Cenju4  total    245.44    125.87     66.97     35.82     25.17     18.16
            cycle     10.00      5.17      2.77      1.54      0.97      0.65
            setup     65.52     32.77     16.78      9.66      5.82      3.89

Speedup (relative to the smallest processor configuration) vs. number of processors (number of iterations).

                     1 (18)    2 (18)    4 (18)    8 (17)   16 (20)   32 (22)
Parnass2    total      1.00      2.06      3.50      6.50      8.88         -
            cycle      1.00      2.02      3.44      6.18     11.04         -
            setup      1.00      2.23      3.80      6.33      6.30         -
NEC Cenju4  total      1.00      1.95      3.68      6.85      9.75     13.52
            cycle      1.00      1.93      3.61      6.50     10.33     15.41
            setup      1.00      2.00      3.90      6.78     11.26     16.84

 
 

Mercedes-Benz external flow (2,230,000 unknowns)

Execution time (secs) vs. number of processors (number of iterations).

 
 
 
                     4 (26)    8 (21)   16 (24)   32 (27)   64 (27)
Parnass2    total    114.68     52.34     32.48         -         -
            cycle      3.77      2.02      1.09         -         -
            setup     16.56      9.88      6.24         -         -
IBM SP2     total    159.41     69.50     47.25     36.00         -
            cycle      4.50      2.46      1.54      1.06         -
            setup     42.45     17.75     10.25      7.50         -
NEC Cenju4  total    227.92     96.31     54.71     36.66     24.32
            cycle      7.05      3.44      1.80      1.12      0.74
            setup     44.61     24.11     11.60      6.50      4.24

Speedup (relative to the smallest processor configuration) vs. number of processors (number of iterations).

                     4 (26)    8 (21)   16 (24)   32 (27)   64 (27)
Parnass2    total      1.00      2.19      3.53         -         -
            cycle      1.00      1.87      3.45         -         -
            setup      1.00      1.68      2.65         -         -
IBM SP2     total      1.00      2.29      3.37      4.43         -
            cycle      1.00      1.83      2.92      4.26         -
            setup      1.00      2.39      4.14      5.66         -
NEC Cenju4  total      1.00      2.37      4.17      6.22      9.37
            cycle      1.00      2.05      3.93      6.31      9.48
            setup      1.00      1.85      3.85      6.86     10.52




Adaptive parallel Multigrid

Poisson problem 2D

Sequential execution times (secs)

Computer               Processor     Clock (MHz)  Time with optimization  Time with debug option
SGI Challenge          R8k           75           855.34                  -
SGI O2                 R5k           180          738.82                  -
SGI Indigo2            R4400         200          614.74                  -
SGI O2 (new revision)  R5k           180          555.64                  1195.89
Laptop                 Pentium       75           528.64                  -
SGI Indigo2            R4400         250          526.91                  892.56
SGI O2                 R10k          150          378.38                  656.85
SGI O2 (new revision)  R10k          196          289.63                  -
SGI Origin 200         R10k          180          291.54                  512.42
SGI Origin 2000        R10k          195          266.85                  -
PC                     Pentium II    350          101.97                  506.77
PC                     Alpha AXP164  533          88.617                  -

Poisson problem 2D, adaptive

Simulation on Parnass2, Execution times (secs)

         processors
nodes          1      2      4      8     16     32     64
134         0.37   0.26   0.24   0.24   0.24   0.27   0.27
224         0.69   0.49   0.36   0.34   0.31   0.30   0.32
384         1.27   0.85   0.69   0.51   0.42   0.37   0.35
682         2.38   1.48   1.04   0.75   0.57   0.45   0.41
1243        4.54   2.81   1.81   1.21   0.83   0.60   0.51
2320        8.75   4.92   3.13   1.95   1.25   0.86   0.62
4391        17.0   9.30   5.19   3.26   1.89   1.25   0.85
8460        33.5   17.8   10.1   5.57   3.27   1.92   1.26
16469       66.9   34.4   18.1   10.1   5.50   3.21   1.99
32291        133   67.7   35.4   19.3   10.3   5.50   3.27
63736        263    134   68.5   36.6   19.2   10.5   5.66
126271       529    272    139   70.4   36.7   19.1   10.3
250911         -    560    278    143   71.8   36.8   19.1
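
As an illustrative reading of the Parnass2 timings (an analysis added here, not a claim made with the measurements), the growth of run time with problem size can be estimated from the slope of log(time) against log(nodes). The sketch below uses two entries from the one-processor column; the function name `scaling_exponent` is hypothetical.

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Slope of log(time) against log(nodes) between two measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# One-processor timings on Parnass2: 16469 nodes in 66.9 s, 126271 nodes in 529 s
exponent = scaling_exponent(16469, 66.9, 126271, 529)
print(f"estimated exponent: {exponent:.2f}")
```

An exponent close to 1 would indicate that the run time grows roughly linearly with the number of nodes over this range.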

Simulation on a Cray T3E-1200 (600 MHz), Execution times (secs)

         processors
nodes          1      4     16     64    128    256
1089        5.08   1.27   0.72   0.64   0.84   1.30
1662        5.85   2.01   0.97   0.72   0.86   1.33
2745        10.7   3.26   1.37   0.85   0.94   1.38
4834        20.3   5.84   2.01   1.08   1.08   1.46
8915        39.8   10.9   3.38   1.42   1.26   1.56
16948       78.5   39.7   5.68   2.08   1.66   1.78
32788        157   77.7   10.7   3.34   2.30   2.14
64251          -      -   20.7   5.97   3.62   2.80
126810         -      -      -   10.9   6.14   4.12
251468         -      -      -      -   11.2   6.64
500135         -      -      -      -   21.2   11.7
996531         -      -      -      -   41.0   21.8
1988043        -      -      -      -   80.6   41.4

Poisson problem 3D, adaptive

Simulation on Parnass2, Execution times (secs)

         processors
nodes          1      2      4      8     16     32     64
1191        5.82   3.64   2.53   1.75   1.35   1.21   1.22
2178        12.3   7.94   5.07   3.82   3.02   2.20   1.97
4454        28.5   17.0   11.0   7.00   4.74   3.57   3.00
10061       71.9   43.9   26.5   16.2   10.2   7.04   5.16
24215        190    108   60.8   36.3   21.0   14.0   9.07
61361        510    280    157   87.7   49.0   29.5   17.7
160384      1418    772    404    217    125   70.5   40.8
429613         -      -      -    602    318    175   95.8

Simulation on a Cray T3E-1200 (600 MHz), Execution times (secs)

         processors
nodes          1      4     16     64    128    256
35937        291   85.6   29.6   11.2   7.61   5.94
50904        423    129   41.0   14.8   10.1   7.17
89076        405    236   71.2   24.6   14.6   9.98
189581         -      -    154   49.7   29.2   17.2
460421         -      -      -    109   61.1   35.6
1201650        -      -      -      -    142   77.2
3251102        -      -      -      -    345    188

Lamé equation 3D, adaptive

Simulation on Parnass2, Execution times (secs)

                  processors
nodes    dof            1      2      4      8     16     32     64
125      375         0.10   0.12   0.11   1.50   2.91      -      -
450      1350        1.44   0.99   0.80   1.35   0.50   0.39   1.05
1155     3465        4.14   2.48   1.71   1.32   1.00   0.70   2.74
4412     13236       19.0   10.3   6.09   5.23   3.07   1.89   1.21
18890    56670       98.6   50.3   28.1   20.6   11.6   6.35   3.70
93021    279063       582    294    157    102   54.8   28.2   15.1
506620   1519860        -      -      -    556    306    155   78.1
3178218  9534654        -      -      -      -      -      -    494

Simulation on a Cray T3E-1200 (600 MHz), Execution times (secs)

                  processors
nodes    dof            1      4     16     64    128    256    512    768   1024
35937    107811       162   34.1   9.11   2.23   1.23   0.75   0.55   0.56   0.52
109873   329619       435    108   29.6   7.20   3.57   1.87   1.13   0.91   0.80
410546   1231638        -      -    114   28.6   14.2   7.02   3.51   2.48   1.94
1857030  5571090        -      -      -    133   67.1   33.3   16.5   11.0      -
9619175  28857525       -      -      -      -    351      -      -      -      -

 
 
 

Adaptive Sparse Grids


Convection-Diffusion problem, 3D

Simulation on Parnass2, Execution times (secs)

               processors
nodes    1/h         1      2      4      8     16     32     64
81       4        0.03   0.03   0.07   0.08   0.11      -      -
225      8        0.12   0.09   0.09   0.11   0.16   0.20   0.20
593      16       0.63   0.41   0.33   0.32   0.38   0.44   0.53
1505     32       3.78   2.29   1.60   1.34   1.26   1.33   1.53
3713     64       22.1   13.3   8.79   6.39   5.17   4.47   4.47
8961     128      68.1   40.7   24.8   16.2   11.9   8.89   7.56
21249    256       201    119   66.1   40.1   28.0   18.6   13.5
49665    512       575    379    169    106   71.6      -   28.0
114689   1024     1630      -      -    275    179      -   62.6

Simulation on a Cray T3E-600 (300 MHz), Execution times (secs)

               processors
nodes    1/h         1      4     16     32     64    128    256    512
81       4        0.04   0.06   0.10   0.14   0.16      -      -      -
225      8        0.25   0.20   0.38   0.54   0.57   0.59      -      -
593      16       1.31   0.71   0.89   1.24   1.53   1.87   2.34   3.26
1505     32       10.0   3.66   3.03   3.67   5.75   5.28   7.17   9.97
3713     64       52.5   20.1   12.8   11.9   13.1   15.4   20.5   29.9
8961     128       158   58.2   29.6   22.6   20.9   23.0   27.9   40.6
21249    256       473    192   66.6   44.9   67.6   32.2   36.7   94.0
49665    512      1426    396    156   95.4   62.0   48.6   48.2  102.2
114689   1024     4112      -      -      -    125   85.9   68.5      -

Convection-Diffusion problem, 3D, adaptive sparse grid

Simulation on Parnass2, Execution times (secs)

               processors
nodes    1/h         1      2      4      8     16     32
81       4        0.03   0.04   0.05   0.07   0.11      -
201      8        0.07   0.05   0.05   0.07   0.08   0.09
411      16       0.21   0.13   0.12   0.13   0.17   0.20
711      32       0.78   0.48   0.38   0.36   0.41   0.51
1143     64       2.60   1.49   1.06   0.93   0.92   1.14
1921     128      8.69   5.99   3.70   2.88   2.70   2.83
3299     256      39.3   20.7   13.8   9.62   7.79   7.32
6041     512       177   91.0   56.8   39.5   28.6   22.0
11787    1024      949    525    271    177    138   88.2
22911    2048        -      -   1280    761    660    358

 
 

Computing time on the T3E-1200/T3E-600 systems was donated by Cray Research.