Help

How to use the Beowulf Supercomputer

Note: This tutorial assumes some unix and programming experience.

While logged into the cluster, you will have the same home directory shared across all nodes of the supercomputer via NFS. Keep in mind that file accesses to your home directory will be slower when made from computers other than head.ohm.calvin.edu. If you are writing a high-performance program and need to write to a temporary file, consider using /tmp, which is local to each machine.
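As a rough sketch of the /tmp advice above, a job script could keep its intermediate files on the local disk and copy only the final result back to the NFS home directory. The function name run_with_scratch and the use of sort as the "work" are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: do scratch I/O in node-local /tmp, then copy only the result
# back to the NFS-mounted home directory.
# run_with_scratch and its arguments are hypothetical names for illustration.

run_with_scratch() {
    input=$1     # file to process
    output=$2    # final destination, e.g. somewhere under $HOME

    # mktemp -d creates a private directory; /tmp is local to each machine.
    scratch=$(mktemp -d /tmp/scratch.XXXXXX) || return 1

    # Stand-in for real work: intermediate writes stay on the local disk.
    sort "$input" > "$scratch/result" || { rm -rf "$scratch"; return 1; }

    # One copy over NFS at the end instead of many small writes.
    cp "$scratch/result" "$output"
    status=$?

    rm -rf "$scratch"
    return $status
}
```

For example, 'run_with_scratch data.txt $HOME/sorted.txt' would sort data.txt using local scratch space and leave the result in the home directory.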

There are two ways of taking advantage of the Beowulf Supercomputer:

  1. by running specific programs on specific slave nodes (using rsh) and
  2. by running a single program on all of the nodes (MPI).

  1. rsh

    The easiest way to take advantage of the supercomputer is to run specific programs on specific nodes. This is excellent for quick, easy jobs, but is not as useful for large, extensive programs.

    The slave nodes on the supercomputer are named s0 through s15, the masters are named m1 and m2, and the fileserver is named head. To run a command on a specific computer, launch a remote shell on that node using rsh. For example, to check the memory usage on slave node 0:

    [dvos12@m1 dvos12]$ rsh s0 free
                 total       used       free     shared    buffers     cached
    Mem:       1035524     505804     529720      58480     327600      17324
    -/+ buffers/cache:     160880     874644
    Swap:      1132540          0    1132540
      

    The usage for rsh is

    rsh nodename command

    where nodename is the name of the slave node you want to run the program on (slave node 0 in the example above) and command is any unix command.

    Another useful command is rcp. rcp allows you to copy files from one computer to another.
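    A minimal sketch of how rcp might be used to stage a file onto a node's local /tmp and fetch a result back, using the standard 'host:path' form. The function names push_to_node and pull_from_node and the REMOTE_COPY variable are hypothetical.

```shell
#!/bin/sh
# Sketch: stage an input file onto a node's local /tmp and fetch a result back.
# push_to_node, pull_from_node and REMOTE_COPY are hypothetical names.

# Copy program to use; scp accepts the same host:path form.
REMOTE_COPY=${REMOTE_COPY:-rcp}

push_to_node() {
    # usage: push_to_node local_file node
    $REMOTE_COPY "$1" "$2:/tmp/"
}

pull_from_node() {
    # usage: pull_from_node node remote_path local_dir
    $REMOTE_COPY "$1:$2" "$3"
}
```

    For example, 'push_to_node input.dat s0' copies input.dat to /tmp on s0, and 'pull_from_node s0 /tmp/output.dat .' copies a result back to the current directory.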

    There are also secure versions of rsh and rcp called ssh and scp. They behave like the former commands, except that all information sent between computers using these commands is encrypted. That is vital for sending things like passwords across the Internet, but is not as important for a closed environment such as this. Note that ssh and scp are a little slower because of the encryption, so rsh and rcp should be used when performance is key.

    There are also some commands built specifically for this computer. To run a single command on all the nodes at once, use ohmsh. Type 'ohmsh -h' to see the syntax. For example, 'ohmsh -s date' gets the date from each of the slave nodes, in order:

      [dvos12@m1 dvos12]$ ohmsh -s date
      rsh s0 date
      Fri Oct 12 16:43:33 EDT 2001
      rsh s1 date
      Fri Oct 12 16:43:33 EDT 2001
      rsh s2 date
      Fri Oct 12 16:43:33 EDT 2001
      rsh s3 date
      Fri Oct 12 16:43:34 EDT 2001
      rsh s4 date
      Fri Oct 12 16:43:33 EDT 2001
      rsh s5 date
      Fri Oct 12 16:43:33 EDT 2001
      rsh s6 date
      Fri Oct 12 16:43:33 EDT 2001
      rsh s7 date
      Fri Oct 12 16:43:34 EDT 2001
      rsh s8 date
      Fri Oct 12 16:43:34 EDT 2001
      rsh s9 date
      Fri Oct 12 16:43:34 EDT 2001
      rsh s10 date
      Fri Oct 12 16:43:34 EDT 2001
      rsh s11 date
      Fri Oct 12 16:43:33 EDT 2001
      rsh s12 date
      Fri Oct 12 16:43:34 EDT 2001
      rsh s13 date
      Fri Oct 12 16:43:34 EDT 2001
      rsh s14 date
      Fri Oct 12 16:43:34 EDT 2001
      rsh s15 date
      Fri Oct 12 16:43:34 EDT 2001
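
    The same kind of loop can be approximated with plain rsh. The sketch below is not the actual ohmsh script, only an illustration of the idea; the names run_on_slaves and REMOTE_SHELL are hypothetical.

```shell
#!/bin/sh
# Sketch of an ohmsh-like loop using plain rsh. This is NOT the real ohmsh
# script; run_on_slaves and REMOTE_SHELL are hypothetical names.

# Remote shell to use; set REMOTE_SHELL=ssh to use ssh instead.
REMOTE_SHELL=${REMOTE_SHELL:-rsh}

run_on_slaves() {
    # Slave nodes are named s0 through s15.
    i=0
    while [ "$i" -le 15 ]; do
        # Print the command first, mirroring the output shown above.
        echo "$REMOTE_SHELL s$i $*"
        # Run the command on the node; its output follows the echoed line.
        $REMOTE_SHELL "s$i" "$@"
        i=$((i + 1))
    done
}
```

    Calling 'run_on_slaves date' visits the nodes one at a time, so a slow node delays the ones after it.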
    

    ohmcp can be used to copy files from one computer to multiple computers.

    ohmps is a very useful command that allows you to check on all user processes running on the cluster. This is handy for finding out if you still have a program running on one of the nodes.

    ohmsh, ohmcp, and ohmps are all found in /ohm/scripts. If you use these commands often, it is recommended that you add that path to your $PATH variable.
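
    One way to do this, assuming a Bourne-style login shell such as bash, is to put a few lines like the following in a startup file such as ~/.bashrc. The guard against adding the directory twice is an optional nicety, not something the cluster requires.

```shell
# Sketch for a shell startup file (assumes a Bourne-style shell such as bash).
# Append /ohm/scripts to PATH only if it is not already present.
case ":$PATH:" in
    *:/ohm/scripts:*) ;;                 # already on the path, nothing to do
    *) PATH="$PATH:/ohm/scripts" ;;      # append to the end of the search path
esac
export PATH
```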

  2. MPI

    The more powerful way to take advantage of the supercomputer is to write the program itself so that it uses multiple nodes. This is more flexible and is the recommended approach. One way of doing this is to use a library to handle the communication between all of the nodes. The most common such library is MPI -- Message Passing Interface.

    MPI allows programmers to write a single program that runs on all nodes simultaneously. MPI can be used with either Fortran or C.

    MPI is too extensive to cover in one web page, but there are several good references:

    Both of these can be found in the cluster room (SB 381). Stop by if you have any questions.

    To run an MPI application, execute 'mpirun -np num_nodes executable' where num_nodes is the number of nodes the program will run on and executable is the compiled program you wish to run. To run a program on all available nodes, use 'mpirun --all-nodes executable'.

    Links

    1. MPI Guide in pdf and postscript format.
    2. MPI Tutorial with C and Fortran examples MPI Collective Communication
    3. MPI Standard
    4. MPI C and Fortran Reference
    5. Examples

    To compile an MPI program, use 'mpicc'. For example,

    mpicc greetings.c -o greetings

    To run an MPI program, use 'mpirun'. For example,

    mpirun -np 17 ./greetings

    will run 'greetings' on 17 machines (16 nodes + 1 master).

    The commands are located in /usr/local/mpich/mpich/bin/, so you may want to add that directory to your $PATH variable.
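
    The greetings.c source is not shown on this page. As a stand-in, the sketch below defines a shell function that writes a small MPI program of the same name, in which every process except process 0 sends a greeting to process 0, which prints them. The function name write_greetings and the program text are hypothetical and only meant to give you something to compile and run.

```shell
#!/bin/sh
# Sketch: write a small stand-in greetings.c to the given path.
# write_greetings and the C program below are hypothetical examples,
# not the greetings.c referred to on this page.

write_greetings() {
    # usage: write_greetings path/to/greetings.c
    # The quoted 'EOF' keeps the shell from expanding anything in the C text.
    cat > "$1" <<'EOF'
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, src;
    char msg[100];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's number */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    if (rank != 0) {
        /* Every other process sends one greeting to process 0. */
        sprintf(msg, "Greetings from process %d of %d", rank, size);
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        /* Process 0 receives and prints the greetings in rank order. */
        for (src = 1; src < size; src++) {
            MPI_Recv(msg, 100, MPI_CHAR, src, 0, MPI_COMM_WORLD, &status);
            printf("%s\n", msg);
        }
    }

    MPI_Finalize();
    return 0;
}
EOF
}
```

    After 'write_greetings greetings.c', the file can be compiled and run with the mpicc and mpirun commands described above. Process 0 prints one line for each of the other processes.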