History of High Performance Computing in Cambridge

Hodgkin: Running Jobs

Hodgkin appears to the user to be very similar to a single UNIX workstation.


Interactive Usage

Short serial jobs may be run interactively on hodgkin. MPI jobs must be submitted to the batch queue.


Batch Jobs

On the Origin 2000, jobs are submitted from the computer itself, not from a front-end workstation. The LSF queuing system is used. There are queues sN, tN and uN, where N is the number of processors and may be 2, 4, 8, 16 or 32. The 's' queues have a 24-hour real-time limit, the 't' queues a 2-hour limit, and the 'u' queues a 10-minute limit. Queue s32 may be used only by special request. In all cases, a job with N processors may use up to 2x GB of memory.
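
As an illustration of the queue naming scheme (a class letter followed by a processor count, as in s8), the sketch below builds a queue name and rejects combinations that the text above does not list. The function name queue_name is hypothetical and not part of LSF; it only assembles a string.

```shell
#!/bin/sh
# queue_name CLASS NPROC
# Print the queue name for a time class (s, t or u) and a processor count.
# Hypothetical helper: it checks the values against those listed in the
# text and does not contact LSF.
queue_name() {
    qn_class=$1
    qn_nproc=$2
    case "$qn_class" in
        s|t|u) ;;
        *) echo "unknown queue class: $qn_class" >&2; return 1 ;;
    esac
    case "$qn_nproc" in
        2|4|8|16|32) ;;
        *) echo "unsupported processor count: $qn_nproc" >&2; return 1 ;;
    esac
    echo "${qn_class}${qn_nproc}"
}

queue_name s 8     # prints s8  (24-hour limit, 8 processors)
queue_name u 16    # prints u16 (10-minute limit, 16 processors)
```

Note that the helper accepts s32 as a valid name even though that queue is available only by special request.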

To submit a job, type bsub -q queue program on hodgkin, where program may be a script or an executable. The job is scheduled and run in the directory it was submitted from, with most of the submitting environment, and NOT as if it were a fresh login. To run a program using MPI on 8 processors, a suitable job submission script is:

hodgkin [6] cat test.job
#!/bin/sh
date
mpirun -np 8 myprog.mpi

This can be submitted to the 24-hour queue with:

hodgkin [7] bsub -q s8 test.job

By default, LSF returns stdout and stderr by mail, and there is a 1 MB limit on the size of mail messages. If the output is likely to be large, you must redirect it, either inside the script or with bsub -q s8 -o out_file -e err_file test.job. The -o and -e options must come before the script name; anything after the script name is passed to the script as an argument. Mail messages can be read using mail or pine.
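
Redirecting output inside the script can be sketched as below. The helper run_logged and the file names out_file and err_file are illustrative assumptions, not part of LSF or of the site's setup; the helper only applies ordinary shell redirection to whatever command it is given.

```shell
#!/bin/sh
# run_logged OUT ERR COMMAND [ARGS...]
# Run COMMAND with stdout written to OUT and stderr written to ERR,
# so that nothing large is left for LSF to return by mail.
# Hypothetical helper, shown only to illustrate the redirection.
run_logged() {
    rl_out=$1
    rl_err=$2
    shift 2
    "$@" > "$rl_out" 2> "$rl_err"
}

# In a job script such as test.job, the mpirun line would become:
#   run_logged out_file err_file mpirun -np 8 myprog.mpi
```

Writing the redirection directly on the mpirun line (mpirun -np 8 myprog.mpi > out_file 2> err_file) has the same effect; the function form just keeps the file names in one place.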

To track the progress of a job, check the queue status, or kill a job, use the bjobs, bqueues and bkill commands respectively; see the man pages for details. The command top can also be used to track CPU and memory usage.

Please be very careful not to use more CPUs or memory than the queue supports. The system's handling of such overruns is very unstable and, if you do exceed the limits, your job might suddenly stop working or (worse) interfere with other users' jobs. For example, the value given to -np on the mpirun command must match the number of processors in the queue name.
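
One way to keep -np and the queue in step is to derive the processor count from the queue name rather than typing it twice. The sketch below does this; queue_procs and make_job are hypothetical helper names, and the generated text follows the three-line test.job script shown earlier.

```shell
#!/bin/sh
# queue_procs QUEUE
# Print the processor count encoded in a queue name such as s8 or t16
# (a leading s, t or u followed by the count). Hypothetical helper.
queue_procs() {
    case "$1" in
        [stu][0-9]*) echo "${1#?}" ;;   # strip the leading class letter
        *) echo "not a queue name: $1" >&2; return 1 ;;
    esac
}

# make_job QUEUE PROGRAM
# Print a job script whose -np value is taken from QUEUE.
make_job() {
    mj_np=$(queue_procs "$1") || return 1
    printf '#!/bin/sh\ndate\nmpirun -np %s %s\n' "$mj_np" "$2"
}

make_job s8 myprog.mpi
```

The output could be saved as test.job and submitted with bsub -q s8 test.job, so that the 8 appears in only one place.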

