
Maxwell: Running Jobs

When logging in to Maxwell you are actually connected to the front-end machine, maxwell-a. The front end can be used for short interactive jobs and for compilation; it has 2.2 GHz Opteron processors and 4 GB of memory. Batch jobs are submitted to the queuing system and processed on the computation nodes. There should be no need to log into the computation nodes directly.


Interactive Usage

The front-end machine has only 2 CPUs and is used for login and file serving; as a result, interactive jobs are strongly discouraged. Please use the queuing system.


Batch Jobs

Maxwell uses the Sun Grid Engine batch queuing system. There are queues s<n>, t<n> and u<n>, where <n> is the number of processors and may be 2, 4, 8, 16, 32 or 64. The 's' queues have a 24-hour real-time limit, the 't' queues a 2-hour limit and the 'u' queues a 10-minute limit. (The 's' queues are for production runs; 't' and 'u' are for testing and development.) There are also queues s<n>-l, t<n>-l and u<n>-l, which select the larger-memory nodes to run on; for these, <n> may be 2, 4, 8, 16 or 32.

Generally, the s32 logical queue is the best one for production work.
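As an illustration of the naming scheme (using the qsub -Q syntax described below; the particular queue choices here are purely illustrative), a 2-hour test run on 16 processors would be sent to the t16 queue, and the same run on the larger-memory nodes to t16-l:

maxwell-a> qsub -Q t16 test.job
maxwell-a> qsub -Q t16-l test.job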

To submit a job, type qsub -Q queue test.job on the front-end machine, where test.job may be a script or an executable. The job is scheduled and run in the directory (and with most of the environment) from which it was submitted, NOT as if it were a fresh login. Note that the -Q option is a local feature. To run a program using MPI on 16 processors, a suitable job submission script is

maxwell-a [1] cat test.job
#!/bin/sh
date
mpirun -np 16 myprog.mpi

and this can be submitted to the 24hr queue with

maxwell-a [2] qsub -Q s16 test.job
your job 2352 ("test.job") has been submitted

The command qstat can be used to monitor the status of the job:

maxwell-a [3] qstat

job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
199 0.75000 test.job pas r 08/17/2004 17:42:32 s64@maxwell-a01-grid 16

The state of a job is usually one of (r)unning, (q)ueued, (d)eleted or (t)ransferring (i.e. about to be executed).

The command qstat -j job-ID gives more information about the state of a particular job.
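For example, for the job submitted above one would type

qstat -j 2352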

To delete a job, use the qdel command:

maxwell-a [4] qdel 2352
spqr1 has deleted job 2352

Both stdout and stderr are returned in files named after the job script and job number, such as test.job.o2352 and test.job.e2352, in the directory from which the job was submitted.
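For example, once job 2352 above has finished, its output could be examined with something like

cat test.job.o2352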


Job Scheduling and Priority Access

Maxwell runs Sun's Grid Engine queuing system. Grid Engine schedules jobs in an attempt to share the available resources fairly between different users and projects. The command qstats provides information on job scheduling and on each project's share of the resources.

bash-2.05b$ qstats
priority is the job priority=ntckts/2+nurg/2
ntckts is the normalised number of sharetree tickets for the job
nurg is the normalised urgency based on how long the job has been queued for
The pts is the percentage share of the resources the project should get
The pas is the percentage share of the resources the project has had
JID user project state priority ntckts nurg sub/start at q pts pas
2586 pjb40 damtp2 r 0.49852 0.00527 1.00000 4/1 18:52:31 s32* 0.08 9.68
2598 pjb40 damtp2 r 0.30566 0.00574 0.67946 4/1 21:14:32 s32* 0.08 9.68
2608 smh36 damtp2 r 0.00087 0.00175 0.00000 5/1 08:57:30 s16* 0.08 9.68
2607 ndd21 phys1 qw 0.68360 1.00000 0.36720 5/1 03:46:04 s32* 69.39 11.96
2614 ndd21 phys1 qw 0.37636 0.50010 0.25261 5/1 10:22:31 u32* 69.39 11.96
2601 zl222 chem1 qw 0.33731 0.00681 0.66781 4/1 10:25:56 s32* 0.07 11.42
2611 pjb40 damtp2 qw 0.13378 0.00549 0.26207 5/1 09:49:48 s32* 0.08 9.68
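As a check of the priority formula above, job 2607 has ntckts = 1.00000 and nurg = 0.36720, giving priority = 1.00000/2 + 0.36720/2 = 0.68360, which is the value shown in its priority column.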


Compilation

We give a quick example of Fortran programming with MPI. Further details can be found on the CCHPCF website.

On the CCHPCF, the relevant compiler commands are surrounded by wrapper scripts, which by default provide a reasonable set of options for the CCHPCF systems and some detection of potential pitfalls. This behaviour is optional and can be disabled entirely, but you are advised not to do so unless you know what you are doing.

By default the environment variable HPCF_MODE is set to "yes". This will set a good level of optimisation and link to the Sun Performance Libraries, which contain optimised versions of BLAS, LAPACK, FFT routines, etc.

To compile an MPI program you should set the environment variable HPCF_MPI to "yes". For (ba)sh:

maxwell-a> export HPCF_MPI=yes
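For csh-style shells the equivalent command would be

maxwell-a> setenv HPCF_MPI yes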

This will set up the correct paths and libraries. A complete example of compiling and running a Fortran MPI program is given below:

maxwell-a> cat hello.f90
program hello
  include 'mpif.h'            ! MPI parameter and interface definitions
  integer npe,mype,ierr

  call mpi_init(ierr)         ! start up MPI
  if (ierr.ne.0) stop 'MPI initialisation error'

  call mpi_comm_rank(mpi_comm_world, mype, ierr)   ! this process's rank
  call mpi_comm_size(mpi_comm_world, npe, ierr)    ! total number of processes

  write(*,101) mype,npe
101 format(' Hello parallel world, I am process ',I3,' out of ',I3)

  call mpi_finalize(ierr)     ! shut down MPI
end program hello

maxwell-a> export HPCF_MPI=yes
maxwell-a> mppathf90 hello.f90
maxwell-a>
maxwell-a> mpirun -np 2 ./a.out
Hello parallel world, I am process 0 out of 2
Hello parallel world, I am process 1 out of 2
maxwell-a>
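The compiled program can also be run through the batch system in the same way as before. For instance (the file name hello.job and the choice of the 10-minute u2 queue are purely illustrative), a job script containing

#!/bin/sh
# run the example MPI program on 2 processors
mpirun -np 2 ./a.out

could be saved as hello.job and submitted with

maxwell-a> qsub -Q u2 hello.job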