Beowulf Cluster - Cost-effective Supercomputing

Beowulf clusters are a cost-effective approach to supercomputing. By making full use of "commodity off-the-shelf" (COTS) hardware and high-performance open-source software, Beowulf clusters offer massive parallel computing power that rivals traditional supercomputers at a fraction of the cost.

At Chaogic Systems, we custom-design and manufacture Beowulf clusters to meet specific application needs. Whether the workload is scientific, engineering, or financial, we carefully analyze the nature of the intended applications and optimize critical hardware and software configurations to deliver supercomputing solutions that minimize computational bottlenecks and maximize peak application performance.

One of our first clusters was a 64-processor, dual-Athlon cluster built for a University of Houston physics research group headed by Professor Kevin Bassler. Prof. Bassler's research interests include complex dynamical systems, such as the lattice structures of superconducting materials and the emergent behavior of simple agent networks. His work often involves mathematical modeling and simulations that require massive computing power.

The cluster achieved 51.96 Gflops on the Linpack benchmark with the problem size N set to 60000. Linpack solves a dense linear system of order N using Gaussian elimination with partial pivoting, and it is the industry-standard benchmark for ranking supercomputers. For comparison, a typical 2 GHz Intel P4 desktop PC delivers roughly 1 Gflops.
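
HPL does not count operations as it runs; it reports its rate as a fixed operation count for the LU solve, (2/3)N^3 + (3/2)N^2, divided by the wall time. The Python sketch below recomputes the 51.96 Gflops figure from N = 60000 and the 2771.39 s wall time in the results on this page, then compares it with a theoretical peak. That peak rests on two assumptions not stated on this page: each Athlon MP 2200+ runs at 1.8 GHz, and it retires two double-precision flops per cycle.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Rate as HPL reports it: (2/3)n^3 + (3/2)n^2 flops divided by wall time."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# Values from the benchmark run reported on this page.
N = 60000
WALL_TIME_S = 2771.39

# ASSUMPTIONS (not stated on this page): each Athlon MP 2200+ runs at 1.8 GHz
# and retires 2 double-precision flops per cycle (one add plus one multiply).
CPUS = 64
CLOCK_GHZ = 1.8
FLOPS_PER_CYCLE = 2

rmax = hpl_gflops(N, WALL_TIME_S)            # measured rate
rpeak = CPUS * CLOCK_GHZ * FLOPS_PER_CYCLE   # theoretical peak under the assumptions
print(f"Rmax       = {rmax:.2f} Gflops")
print(f"Rpeak      = {rpeak:.1f} Gflops (assumed)")
print(f"efficiency = {rmax / rpeak:.1%}")
```

Replace the clock and per-cycle figures with verified values to get a firmer peak estimate.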

Hardware/Software Configurations

Dual-CPU Computing Nodes (x 32):
  • AMD Athlon MP 2200+ CPU (x 2)
  • Corsair 1GB PC2100 Registered ECC DDR RAM
  • Tyan Tiger MPX S2466N Motherboard
  • Seagate 40GB 7200RPM IDE Hard Drive (Local Caching)
  • Built-in 3Com 3C920 100Mbit NIC

Dual-CPU Master Node:
  • AMD Athlon MP 2200+ CPU (x 2)
  • Corsair 2GB PC2100 Registered ECC DDR RAM
  • Tyan Thunder K7X S2468UGN Motherboard
  • Seagate 36.7GB 10000RPM U160 SCSI Hard Drive (x 2)
  • Built-in 3Com 3C920 100Mbit NIC (x 2)
  • 3Com 3C996B-T Gigabit Copper Server NIC
  • 3Com 3C996-SX Gigabit Fiber-SX Server NIC

RAID Array Storage:
  • Raidstor 8 Bay IDE Array w/ 120GB HD, 840GB @ RAID5
  • Raidstor 12 Bay IDE Array w/ 200GB HD, 2.20TB @ RAID5
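
The usable capacities listed for the two arrays follow the RAID 5 rule: parity equal to one drive's capacity is spread across all members, so usable space is (drives - 1) x drive size. A quick Python check of both figures:

```python
def raid5_usable_gb(drives: int, drive_gb: float) -> float:
    """RAID 5 stripes data plus one drive's worth of parity across all members."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drives - 1) * drive_gb


print(raid5_usable_gb(8, 120))   # 8-bay array of 120GB drives
print(raid5_usable_gb(12, 200))  # 12-bay array of 200GB drives
```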

Interconnect:
  • KTI 24 port 100Mbit Switch w/ 800Mbit Trunking (x 3)

Software:
  • Operating System: RedHat Linux 7.3 / Kernel 2.4.18
  • Message Passing Library: MPICH 1.2.3, LAM 6.5.6
  • Math Library: ATLAS 3.4.1, LAPACK 3.0
  • Scheduler: OpenPBS 2.3.16
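
Whether N = 60000 suits this hardware follows from memory: the coefficient matrix holds N^2 double-precision values, and HPL deals it out to the P x Q process grid in NB x NB blocks, round-robin in both dimensions (a 2D block-cyclic layout). The Python sketch below estimates how much of each compute node's 1GB of RAM the matrix alone occupies, using N, NB, P and Q from the benchmark results on this page. It assumes one MPI process per CPU on the 32 compute nodes and reads "1GB" as 1 GiB; it ignores HPL's workspace and right-hand-side column, so real usage is somewhat higher. The `numroc` helper is a hypothetical stand-in that follows the counting rule of ScaLAPACK's NUMROC with the source process fixed at 0.

```python
BYTES_PER_DOUBLE = 8


def numroc(n: int, nb: int, iproc: int, nprocs: int) -> int:
    """Rows (or columns) of an n-long dimension owned by process `iproc`
    when nb-sized blocks are dealt round-robin over `nprocs` processes."""
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extra = nblocks % nprocs
    if iproc < extra:
        count += nb          # one extra full block
    elif iproc == extra:
        count += n % nb      # the trailing partial block, if any
    return count


# From this page: problem size and process grid of the benchmark run,
# 32 dual-CPU compute nodes with 1GB of RAM each.
N, NB, P, Q = 60000, 60, 8, 8
NODES, CPUS_PER_NODE = 32, 2
RAM_PER_NODE = 1024**3  # ASSUMPTION: "1GB" read as 1 GiB

# ASSUMPTION: one MPI process per CPU, compute nodes only.
assert P * Q == NODES * CPUS_PER_NODE

local_rows = max(numroc(N, NB, p, P) for p in range(P))
local_cols = max(numroc(N, NB, q, Q) for q in range(Q))
per_process = local_rows * local_cols * BYTES_PER_DOUBLE
per_node = per_process * CPUS_PER_NODE

print(f"whole matrix : {N * N * BYTES_PER_DOUBLE / 1e9:.1f} GB")
print(f"per process  : {local_rows} x {local_cols} doubles = {per_process / 1e6:.0f} MB")
print(f"per node     : {per_node / RAM_PER_NODE:.0%} of RAM (matrix only)")
```

Because NB = 60 divides N into exactly 1000 blocks, and 1000 blocks split evenly over 8 process rows and 8 process columns, every process ends up holding an identical 7500 x 7500 tile.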

Linpack Benchmark Results

    ============================================================================
    HPLinpack 1.0  --  High-Performance Linpack benchmark  --  September 27, 2000
    Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
    ============================================================================
    
    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.
    
    The following parameter values will be used:
    
    N      :   60000
    NB     :      60
    P      :       8
    Q      :       8
    PFACT  :   Right
    NBMIN  :       4
    NDIV   :       2
    RFACT  :   Right
    BCAST  :  1ringM
    DEPTH  :       1
    SWAP   : Mix (threshold = 64)
    L1     : transposed form
    U      : transposed form
    EQUIL  : yes
    ALIGN  : 8 double precision words
    
    ----------------------------------------------------------------------------
    
    - The matrix A is randomly generated for each test.
    - The following scaled residual checks will be computed:
       1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
       2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
       3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
    - The relative machine precision (eps) is taken to be          1.110223e-16
    - Computational tests pass if scaled residuals are less than           16.0
    
    ============================================================================
    T/V                N    NB     P     Q               Time             Gflops
    ----------------------------------------------------------------------------
    W11R2R4        60000    60     8     8            2771.39          5.196e+01
    ----------------------------------------------------------------------------
    ||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0522793 ...... PASSED
    ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0127078 ...... PASSED
    ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0024242 ...... PASSED
    ============================================================================
    
    Finished      1 tests with the following results:
                  1 tests completed and passed residual checks,
                  0 tests completed and failed residual checks,
                  0 tests skipped because of illegal input values.
    ----------------------------------------------------------------------------
    
    End of Tests.
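
The three scaled residual checks in the output measure how well the computed x satisfies Ax = b relative to the sizes of A and x. The Python sketch below performs the same computation on a small scale: it solves a random system by Gaussian elimination with partial pivoting, then evaluates the three residuals with eps = 2^-53 (the 1.110223e-16 printed in the output) and the same pass threshold of 16.0. The matrix size (n = 120) and the uniform [-0.5, 0.5] entries are illustrative choices, and a pure-Python triple loop is nothing like HPL's distributed, blocked implementation; it only shows what is being checked.

```python
import random
import sys


def solve_gepp(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented copy [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # largest pivot in column k
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def hpl_residuals(A, x, b, eps=sys.float_info.epsilon / 2):
    """The three scaled residuals in HPL's output, in the same order."""
    n = len(A)
    r = max(abs(sum(a * xj for a, xj in zip(row, x)) - bi) for row, bi in zip(A, b))
    norm_a1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))  # max column sum
    norm_ainf = max(sum(abs(v) for v in row) for row in A)                # max row sum
    x1 = sum(abs(v) for v in x)
    xinf = max(abs(v) for v in x)
    return (r / (eps * norm_a1 * n),
            r / (eps * norm_a1 * x1),
            r / (eps * norm_ainf * xinf))


random.seed(0)
n = 120  # tiny stand-in for N = 60000
A = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
b = [random.uniform(-0.5, 0.5) for _ in range(n)]
x = solve_gepp(A, b)
for label, value in zip(("N", "||x||_1", "||x||_oo"), hpl_residuals(A, x, b)):
    print(f"scaled by {label:8s} = {value:.7f} ...... {'PASSED' if value < 16.0 else 'FAILED'}")
```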
    

Copyright © chaogic.com, Chaogic Systems, LLC