Note: This tutorial assumes some unix and programming experience.
While logged into the cluster, you have the same home directory on every node of the supercomputer, shared via NFS. Keep in mind that file accesses to your home directory are slower from any computer other than head.ohm.calvin.edu. If you are writing a high-performance program that needs a temporary file, consider using /tmp, which is local to each machine.
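As a minimal sketch of the /tmp advice, the script below creates a scratch file on the local disk of whichever node it runs on, instead of in the NFS-mounted home directory. The prefix "myjob" and the data written are placeholders for illustration only.

```shell
# Create a uniquely named scratch file on the node-local disk.
# "myjob" is a hypothetical prefix; pick one that identifies your program.
scratch=$(mktemp /tmp/myjob.XXXXXX) || exit 1

# Remove the scratch file automatically when the script exits.
trap 'rm -f "$scratch"' EXIT

# Write intermediate data locally rather than to the NFS home directory.
echo "intermediate data" > "$scratch"

# Read it back, for example to count the lines written so far.
wc -l < "$scratch"
```

Because /tmp is separate on every machine, a file written this way is visible only on the node that created it. Copy anything you want to keep back to your home directory before the script ends.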
There are two ways of taking advantage of the Beowulf Supercomputer:
The easiest way to take advantage of the supercomputer is to run specific programs on specific nodes. This is excellent for quick, easy jobs, but is not as useful for large, extensive programs.
The slave nodes on the supercomputer are named s0 through s15, the masters are named m1 and m2, and the fileserver is named head. To run a command on a specific computer, launch a remote shell on that node using rsh. For example, to check the memory usage on slave node 0, we can do:
[dvos12@m1 dvos12]$ rsh s0 free
             total       used       free     shared    buffers     cached
Mem:       1035524     505804     529720      58480     327600      17324
-/+ buffers/cache:     160880     874644
Swap:      1132540          0    1132540
The usage for rsh is
rsh nodename command
where nodename is the name of the slave node you want to run the program on (we ran on slave node #0 in the example above) and command is any Unix command.
Another useful command is rcp. rcp allows you to copy files from one computer to another.
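As a rough illustration, rcp takes a source and a destination, and either one can be prefixed with a node name and a colon. The file names and destination below are made up for the example. The commands are printed rather than executed, so the snippet is safe to try anywhere.

```shell
# Hypothetical file name, node, and destination; adjust to your own.
src="results.dat"
node="s0"
dest="/tmp/results.dat"

# Copy a local file to a node. Drop the leading 'echo' to run it for real.
echo rcp "$src" "$node:$dest"

# Copying back works the same way with the arguments reversed.
echo rcp "$node:$dest" "./results.copy.dat"
```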
There are also secure versions of rsh and rcp called ssh and scp. They behave like the former commands, except that all information sent between computers is encrypted. That is vital for sending things like passwords across the Internet, but is less important in a closed environment such as this. Note that ssh and scp are a little slower because of the encryption, so rsh and rcp should be used when performance is key.
There are also some commands built specifically for this computer. Say, for example, that you want to run a single command on all the nodes at once. You can use ohmsh. Type 'ohmsh -h' to find out the syntax. Example: 'ohmsh -s date' will get the date from all of the slave nodes, in order.
[dvos12@m1 dvos12]$ ohmsh -s date rsh s0 date Fri Oct 12 16:43:33 EDT 2001 rsh s1 date Fri Oct 12 16:43:33 EDT 2001 rsh s2 date Fri Oct 12 16:43:33 EDT 2001 rsh s3 date Fri Oct 12 16:43:34 EDT 2001 rsh s4 date Fri Oct 12 16:43:33 EDT 2001 rsh s5 date Fri Oct 12 16:43:33 EDT 2001 rsh s6 date Fri Oct 12 16:43:33 EDT 2001 rsh s7 date Fri Oct 12 16:43:34 EDT 2001 rsh s8 date Fri Oct 12 16:43:34 EDT 2001 rsh s9 date Fri Oct 12 16:43:34 EDT 2001 rsh s10 date Fri Oct 12 16:43:34 EDT 2001 rsh s11 date Fri Oct 12 16:43:33 EDT 2001 rsh s12 date Fri Oct 12 16:43:34 EDT 2001 rsh s13 date Fri Oct 12 16:43:34 EDT 2001 rsh s14 date Fri Oct 12 16:43:34 EDT 2001 rsh s15 date Fri Oct 12 16:43:34 EDT 2001
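Judging from the output above, ohmsh -s appears to run the given command with rsh on each slave node in turn. The loop below is only a sketch of that idea, not the actual ohmsh script. It is shown as a dry run that prints each command instead of executing it.

```shell
# Build the list s0 .. s15 (node names as given in this tutorial).
nodes=""
i=0
while [ "$i" -le 15 ]; do
    nodes="$nodes s$i"
    i=$((i + 1))
done

# Visit the nodes in order, one at a time.
for node in $nodes; do
    # Dry run: print the command. Remove 'echo' to execute it on the cluster.
    echo rsh "$node" date
done
```

A hand-written loop like this is handy when you need only a subset of nodes or want to add per-node logic. For the common case, ohmsh is simpler.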
ohmcp can be used to copy files from one computer to multiple computers.
ohmps is a very useful command that allows you to check on all user processes running on the cluster. This is handy for finding out if you still have a program running on one of the nodes.
ohmsh, ohmcp, and ohmps are all found in /ohm/scripts. It is recommended that you add that path to your $PATH variable if you use these commands often.
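One way to do this, assuming a Bourne-style login shell such as bash, is to put a few lines like the following in your shell startup file (for bash, typically ~/.bash_profile or ~/.bashrc). The check avoids adding the directory twice if the file is read more than once.

```shell
# Append /ohm/scripts to PATH only if it is not already there.
case ":$PATH:" in
    *:/ohm/scripts:*) ;;                  # already present, nothing to do
    *) PATH="$PATH:/ohm/scripts" ;;       # otherwise add it at the end
esac
export PATH
```

If your account uses csh or tcsh instead, the syntax differs (setenv or set path), so treat this as a starting point rather than a drop-in line.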
The more powerful way to take advantage of the supercomputer is to write the program itself so that it spreads its work across the nodes. This is more flexible and is the recommended approach. One way of doing this is to use a library to handle the communication between all of the nodes. The most common such library is MPI (Message Passing Interface).
MPI allows programmers to write a single program that runs on all nodes simultaneously. MPI can be used with either Fortran or C.
MPI is too extensive to cover in one web page, but there are several good references:
- Pacheco, Peter S. Parallel Programming with MPI. San Francisco: Morgan Kaufmann.
- MPI Tutorial
Both of these can be found in the cluster room (SB 381). Stop by if you have any questions.
To run an MPI application, execute 'mpirun -np num_nodes executable' where num_nodes is the number of nodes the program will run on and executable is the compiled program you wish to run. To run a program on all available nodes, use 'mpirun --all-nodes executable'.
To compile an MPI program, use 'mpicc'. For example,
mpicc greetings.c -o greetings
To run an MPI program, use 'mpirun'. For example,
mpirun -np 17 ./greetings
will run 'greetings' on 17 machines (16 nodes + 1 master).