What you need to know to run your project
Developing projects on the red/blue cluster is a bit different
from developing and testing a project on the CSIC or Detective
cluster. Unlike those clusters, the red/blue cluster manages
its resources with a scheduler, and access is more restricted.
The cluster is composed of a frontend (redleader) and 27
processing nodes. The frontend is to be used for editing
your code, doing light compiles and such. To run any processing
or testing of your code, you must submit it through the
scheduler.
The scheduler takes care of assigning processing nodes
to jobs. Basically, when you are assigned a node, you will
be the only person on it for the duration of your job. After
your time limit is up or your process ends, the node will
be cleaned and locked down for the next submission.
- Logging in
To gain access to any of the nodes you will first need
to log into redleader.umiacs.umd.edu using
ssh. This machine acts as a gateway to the rest of the
cluster. No intensive processing is to be run
on redleader. This machine is shared with every
other person in the class and in various research projects
throughout the institute. If you run an intensive process
on redleader, it will be killed so other research will
not be affected.
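For example, the login step looks like this ('toaster' is a placeholder; substitute your own username):

```shell
# Log into the cluster gateway; everything else goes through this machine
ssh toaster@redleader.umiacs.umd.edu
```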
- Changing your password
The UMIACS cluster is part of a larger DCE/Kerberos
installation. Unfortunately, it is not currently possible to
change your password on any of the cluster's Linux machines.
You will have to ssh into odin.cfar.umd.edu,
mellum.cfar.umd.edu, or fenris.cfar.umd.edu
and run the 'passwd' command to change
your password.
- Setting up your environment
After you are logged in, you will have to set up your
account to allow PBS to access it from any of the
processing nodes. This is required because PBS writes
the stdout and stderr to files in your account. Use
ssh-keygen with an empty passphrase to create keypairs that can be used
to grant access for your jobs. These can be generated by running the following:
cd $HOME
ssh-keygen -t rsa1 -N "" -f $HOME/.ssh/identity
ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
cd .ssh
touch authorized_keys authorized_keys2
cat identity.pub >> authorized_keys
cat id_rsa.pub id_dsa.pub >> authorized_keys2
chmod 640 authorized_keys authorized_keys2
To test your keys, you should be able to run 'ssh redleader' and be returned to a prompt without being asked for a password.
- Requesting interactive usage
Sometimes you will want to test an intensive program without
preparing a submission script and going through the hassle of the scheduler.
You can run '/opt/UMtorque/bin/qsub -I' to request interactive use of a node. After running
'qsub -I', your shell will block until a node can be allocated to you. When
the node has been allocated, a new shell will open on the
allocated node. You can also ssh into that node for the duration of the
allocated shell. When you log out of the initial shell, or your time limit
is up, the node will again be locked down and you will have to ask the
scheduler for access again.
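As a sketch, an interactive request can also carry resource flags; the values below are illustrative, not required:

```shell
# Request one node interactively for 30 minutes (flag values are examples)
/opt/UMtorque/bin/qsub -I -l nodes=1,walltime=30:00
```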
- Running your first job
We know that your project is likely to require MPI,
PVM, or parallel libraries, but walking through a simple
'hello world' submission script will help you understand
how submitting jobs works a bit better.
- Create a submission file
In your home directory on redleader, create a file
called test.sh that contains the following:
#!/bin/bash
#PBS -lwalltime=10:00
#PBS -lnodes=3
echo hello world
hostname
echo finding each node I have access to
for node in `cat ${PBS_NODEFILE}` ; do
echo ----------
/usr/bin/ssh $node hostname
echo ----------
done
The script is a normal shell script except that
it includes extra #PBS directives. These directives
control how you request resources on the cluster.
In this case we are requesting 10 minutes of total
node time split across 3 nodes, so each node will be
yours for about 3 minutes 20 seconds. People often
forget to specify walltime for jobs over 2 nodes.
The default walltime is 48 hrs/node, so requesting
3 nodes will try to schedule 144 hours of cluster
time, which exceeds the maximum allowed.
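Putting that advice into practice, a multi-node submission file should always start with an explicit walltime directive; the 10-minute figure here is just an example:

```shell
#!/bin/bash
# Always cap walltime explicitly: without it, 3 nodes would default
# to 48 hrs each (144 hrs total) and exceed the scheduler's maximum
#PBS -lwalltime=10:00
#PBS -lnodes=3
```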
- submit the job to the scheduler using /opt/UMtorque/bin/qsub
[toaster@redleader ~]$ /opt/UMtorque/bin/qsub test.sh
20483.rogueleader.umiacs.umd.edu
You can check the status of your job by running
/opt/UMtorque/bin/qstat
[toaster@redleader ~]$ /opt/UMtorque/bin/qstat -n
rogueleader.umiacs.umd.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
20483.roguelead toaster  dque     test.sh      8210   3  --     -- 48:00 R    --
    red03/0+blue12/0+blue11/0
This shows us that the job is running ('R') and is
using nodes red03, blue12, and blue11. A 'Q' for
status means that your job is waiting in line for
resources to free up. If you requested more
resources than the cluster can provide, your job
will sit in the queue until the end of time.
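The last line of the qstat output packs the assigned hosts into one '+'-separated string. A quick way to split it back into one hostname per line, shown here on the sample string from above:

```shell
# Split qstat's node-assignment string into individual hostnames
nodes='red03/0+blue12/0+blue11/0'
echo "$nodes" | tr '+' '\n' | cut -d/ -f1
```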
- check output
When your job is finished, you will have two files
in the directory you submitted the job from. They
contain stdout (.oJOBID) and stderr (.eJOBID).
The job we submitted above generated an empty error
file test.sh.e20483 and the following
stdout file:
[toaster@redleader ~]$ cat test.sh.o20483
echo toaster hard maxlogins 18 >> /etc/security/limits.conf
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
hello world
red03.umiacs.umd.edu
finding each node I have access to
----------
red03.umiacs.umd.edu
----------
----------
blue12.umiacs.umd.edu
----------
----------
blue11.umiacs.umd.edu
----------
The first three lines in your output are a standard
part of how we have our cluster configured and do
not affect how your program runs.
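The per-node hostnames in that output come from the loop over $PBS_NODEFILE in test.sh. Off the cluster you can mimic the file's format (one hostname per line, as far as this guide shows) with a stand-in file to see how the loop walks it:

```shell
# Fake a PBS_NODEFILE to trace how the loop in test.sh iterates over it
PBS_NODEFILE=$(mktemp)
printf 'red03.umiacs.umd.edu\nblue12.umiacs.umd.edu\nblue11.umiacs.umd.edu\n' > "$PBS_NODEFILE"
for node in `cat ${PBS_NODEFILE}` ; do
    echo "$node"
done
rm -f "$PBS_NODEFILE"
```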
- Running an MPI program
Now down to the part you actually care about: running an MPI program. We have two different MPI installations at UMIACS, LAM and MPICH. LAM is the default in /usr/local/; several versions of MPICH are available in /usr/local/stow/mpich-version. First, you need to have an MPI-based program written. Here's a simple one:
alltoall.c
- To compile this program and execute it under LAM, do the following:
It can be compiled by doing: mpicc alltoall.c -o alltoall
The submission file will need to initialize the MPI environment before your program runs:
#PBS -l nodes=4
#PBS -l walltime=5:00
cd ~/
/usr/local/bin/lamboot $PBS_NODEFILE
/usr/local/bin/mpirun C alltoall
/usr/local/bin/lamhalt
Output files for this job: STDOUT
and STDERR
- To compile and run this program under MPICH you need to change your environment a little:
The following script will set the appropriate environment.
setenv MPI_ROOT /usr/local/stow/mpich-version
setenv MPI_LIB $MPI_ROOT/lib
setenv MPI_INC $MPI_ROOT/include
setenv MPI_BIN $MPI_ROOT/bin
# add MPICH commands to your path (includes mpirun and mpicc)
set path=($MPI_BIN $path)
# add MPICH man pages to your manpath
if ( $?MANPATH ) then
setenv MANPATH $MPI_ROOT/man:$MANPATH
else
setenv MANPATH $MPI_ROOT/man
endif
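The script above uses csh syntax (setenv/set). If your login shell is bash or another Bourne-style shell, the equivalent setup (same assumed stow path) would be:

```shell
# bash equivalent of the csh environment setup above
export MPI_ROOT=/usr/local/stow/mpich-version
export MPI_LIB=$MPI_ROOT/lib
export MPI_INC=$MPI_ROOT/include
export MPI_BIN=$MPI_ROOT/bin
# add MPICH commands to your path (includes mpirun and mpicc)
export PATH=$MPI_BIN:$PATH
# add MPICH man pages to your manpath (appends only if MANPATH is set)
export MANPATH=$MPI_ROOT/man${MANPATH:+:$MANPATH}
```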
It can be compiled by doing: mpicc alltoall.c -o alltoall (remember we changed our environment to point to MPICH's mpicc)
The submission file is almost the same, except that it calls the pbsmpich wrapper instead of LAM's startup commands.
#PBS -l nodes=10
#PBS -l walltime=40:00
cd ~/mpitest/mpich
exec pbsmpich -vCD ./alltoall
Please note that if you compile your program with either LAM or MPICH, you MUST execute it in the same environment. A common mistake is to compile using /usr/local/bin/mpicc (LAM's) and then attempt to run using pbsmpich. This will fail, and you will get an error message similar to the following:
It seems that there is no lamd running on the host blue12.umiacs.umd.edu.
This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for MPI programs to run
(the MPI program tried to invoke the "MPI_Init" function).
Please run the "lamboot" command to start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
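A quick sanity check (our suggestion, not from the manual) before submitting is to confirm which mpicc your shell resolves first, since LAM's lives in /usr/local/bin and MPICH's in the stow directory:

```shell
# Show which mpicc is first in PATH; prints a fallback line if none is found
command -v mpicc || echo "mpicc not in PATH"
```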
This introduction is just barely enough to get you started.
You'll understand the cluster much better if you read the
user manual at:
cluster-manual.html