
Beowulf Cluster - Cost-effective Supercomputing

Beowulf clusters are the cost-effective approach to supercomputing. By combining "commodity off the shelf" hardware with open source software, Beowulf clusters offer massive parallel-computing power that rivals traditional supercomputers at a fraction of the cost.

At Chaogic Systems, we custom design and manufacture Beowulf clusters according to specific application needs. Whether the workload is scientific, engineering, or financial, we carefully analyze the nature of the intended applications and optimize critical hardware and software configurations to deliver supercomputing solutions that minimize computational bottlenecks and maximize peak application performance.

One of our first clusters was a 64-processor cluster of dual-Athlon nodes built for a University of Houston physics research group headed by Professor Kevin Bassler. Prof. Bassler's research interests include complex dynamic systems, such as lattice structures of superconducting materials and emergent behaviors of simple agent networks. His work often involves mathematical modeling and simulations that require massive computing power.

The cluster reached 51.96 Gflops on the Linpack benchmark with the problem size N set to 60000. The Linpack benchmark solves a dense linear system of order N using Gaussian elimination. It is the industry standard for benchmarking supercomputers. In comparison, a typical Intel P4 2 GHz desktop PC has roughly 1 Gflops of computing power.

Hardware/Software Configurations

Dual-CPU Computing Nodes (x 32):
Dual-CPU Master Node:
RAID Array Storage:
Interconnect:
Software:

Linpack Benchmark Results
============================================================================
HPLinpack 1.0  --  High-Performance Linpack benchmark  --  September 27, 2000
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   60000
NB     :      60
P      :       8
Q      :       8
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Right
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be          1.110223e-16
- Computational tests pass if scaled residuals are less than           16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
W11R2R4        60000    60     8     8            2771.39          5.196e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0522793 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0127078 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0024242 ...... PASSED
============================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.

End of Tests.
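
The figures in the report above can be cross-checked by hand. The following Python sketch is an illustration only, not part of the benchmark or of our cluster software. It assumes the operation count commonly quoted for this benchmark, (2/3)N^3 + (3/2)N^2 floating point operations, to turn N and the wall time into a Gflops rate. It also shows, on a toy scale, what "solving a dense linear system by Gaussian elimination" and the first scaled residual check mean. The function names hpl_gflops, solve_dense and scaled_residual are hypothetical names chosen for this example.

```python
def hpl_gflops(n, seconds):
    """Rate in Gflops, ASSUMING an operation count of 2/3*n^3 + 3/2*n^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def solve_dense(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy [A | b] so the caller's data is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        # Pivot: move the largest remaining entry of column k into row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the resulting upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def scaled_residual(a, x, b, eps=1.110223e-16):
    """Check 1 from the report: ||Ax-b||_oo / (eps * ||A||_1 * N)."""
    n = len(a)
    # Infinity norm of the residual: largest absolute component of Ax - b.
    r_inf = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i])
                for i in range(n))
    # One norm of A: largest absolute column sum.
    a_one = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    return r_inf / (eps * a_one * n)


if __name__ == "__main__":
    # Values taken from the result line of the report above.
    print("Gflops for N=60000, 2771.39 s: %.2f" % hpl_gflops(60000, 2771.39))

    # A toy 2x2 system standing in for the 60000x60000 benchmark matrix.
    a = [[2.0, 1.0], [1.0, 3.0]]
    b = [3.0, 5.0]
    x = solve_dense(a, b)
    print("solution:", x)
    print("passes residual check:", scaled_residual(a, x, b) < 16.0)
```

Under the assumed operation count, N = 60000 and a wall time of 2771.39 seconds give a rate that agrees with the reported 5.196e+01 Gflops. The real benchmark distributes the matrix over the P x Q = 8 x 8 process grid, one process per processor of the 64-processor cluster, and factors it in blocks of NB columns. The toy solver above runs on a single processor and is only meant to show the arithmetic being timed.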


