History of High Performance Computing in Cambridge

Cambridge-Cranfield HPCF User Guide, Part 5: Hartree

Hartree: Running Jobs

When you log in to Hartree you are actually connected to one of the I/O nodes, hartree_a. The I/O nodes can be used for short interactive jobs and for compilation. Batch jobs are submitted to the queuing system and processed on the computation nodes, so there should be no need to log in to the computation nodes directly.


Interactive Usage

Short serial jobs can be run on the I/O nodes. To run an MPI executable for testing or debugging, use the poe command, e.g.

hartree_a [1] poe prog.x -procs 2

Your current directory must contain a host.list file for node allocation, with one line per processor requested, e.g.

hartree_a [2] cat host.list
hartree_a
hartree_a
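
When test runs use different processor counts, host.list can be generated rather than edited by hand. The following is a minimal sketch, assuming all processes run on the I/O node hartree_a as in the example above; make_hostlist is a hypothetical helper, not a command provided on Hartree.

```shell
#!/bin/sh
# make_hostlist: hypothetical helper, not part of the Hartree software.
# Writes host.list in the current directory with one line per MPI
# process, as poe requires for interactive runs on the I/O node.
#   $1  number of processes
#   $2  node name (optional; defaults to hartree_a, as in the example)
make_hostlist() {
    nprocs=$1
    host=${2:-hartree_a}
    : > host.list           # create or truncate host.list
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        echo "$host" >> host.list
        i=$((i + 1))
    done
}
```

For example, running make_hostlist 2 before poe prog.x -procs 2 produces the two-line host.list shown above.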

Note that executables compiled with certain Power4 optimisations may not run interactively on the I/O nodes. See the web pages on compilation for more details.


Batch Jobs

On Hartree the LoadLeveler scheduler is used. The queues are named [u|t|s][2|4|8|16|32]: the letter sets the time limit and the number is the number of processors.

  u   10 minutes   testing and development
  t   2 hours      testing and development
  s   24 hours     production runs

All queues of 8 or more processors have exclusive use of the node(s) they run on. A single node contains eight 1.1 GHz Power4 processors and 16 GB of shared memory.
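
The naming scheme can be expressed as a small helper that picks a class from a run's expected wall-clock time and processor count. This is a sketch under stated assumptions: pick_class is a hypothetical name, it chooses the shortest queue whose time limit covers the run, and it rejects processor counts other than 2, 4, 8, 16 or 32 rather than rounding up. Since the 's' queues are intended for production runs, the shortest fitting queue is not always the appropriate one.

```shell
#!/bin/sh
# pick_class: hypothetical helper. Prints a LoadLeveler class name,
# choosing the shortest queue whose time limit covers the run.
# Limits from the guide: u = 10 min, t = 2 h, s = 24 h.
#   $1  expected run time in whole minutes
#   $2  number of processors (2, 4, 8, 16 or 32)
pick_class() {
    minutes=$1
    nprocs=$2
    case $nprocs in
        2|4|8|16|32) ;;
        *) echo "pick_class: no queue for $nprocs processors" >&2; return 1 ;;
    esac
    if [ "$minutes" -le 10 ]; then
        prefix=u
    elif [ "$minutes" -le 120 ]; then
        prefix=t
    elif [ "$minutes" -le 1440 ]; then
        prefix=s
    else
        echo "pick_class: $minutes minutes exceeds the 24 hour limit" >&2
        return 1
    fi
    echo "$prefix$nprocs"
}
```

For instance, pick_class 90 8 prints t8, the class used in the job script below.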

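For communication-intensive codes (e.g. any FFT-based code), the 8-processor queues are recommended: the job then has a single node to itself, so all processes run on one node and no inter-node communication is needed. Jobs requiring larger resources should use Franklin.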

To run an MPI program on 8 processors in the 2-hour queue (class t8), a suitable job submission script is:

hartree_a [6] cat test.job
#!/bin/sh
#@ class = t8
#@queue

date
./myprog.mpi

Note that the mpirun command is not used; the executable is started directly. The first three lines are required by the LoadLeveler scheduler, and only the class should be altered, to select the queue you wish to submit to.
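
Because only the class line changes between submissions, the script itself can be generated. A minimal sketch: write_job and its three arguments are hypothetical, and it emits exactly the lines of test.job above, with no additional LoadLeveler directives.

```shell
#!/bin/sh
# write_job: hypothetical helper that writes a job script in the form
# shown above. Only the class line varies; the MPI program is started
# directly, with no mpirun.
#   $1  class, e.g. t8
#   $2  command to run, e.g. ./myprog.mpi
#   $3  script file to create, e.g. test.job
write_job() {
    class=$1
    prog=$2
    job=$3
    cat > "$job" <<EOF
#!/bin/sh
#@ class = $class
#@queue

date
$prog
EOF
}
```

For example, write_job s16 ./myprog.mpi prod.job creates a script for the 24-hour, 16-processor queue, which is then submitted with llsubmit prod.job.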

This script can be submitted with llsubmit:

hartree_a [7] llsubmit test.job 
stderr will be sent to LoadL.err.$(jobid).$(stepid)
stdout will be sent to LoadL.out.$(jobid).$(stepid)
llsubmit: Processed .. Submit Filter: "/var/loadl/home/prefilter".
llsubmit: The job "hartree_a.823" has been submitted.
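
In a script it can be useful to capture the job name that llsubmit reports. The filter below is a sketch that assumes the message format shown in the example above; submitted_job_name is a hypothetical helper, and the format may differ between LoadLeveler versions. Both output streams are merged in the usage line because it is not certain which one llsubmit writes this message to.

```shell
#!/bin/sh
# submitted_job_name: hypothetical filter. Reads llsubmit's messages on
# stdin and prints the job name from a line of the form
#   llsubmit: The job "hartree_a.823" has been submitted.
# The format is copied from the example above (an assumption).
submitted_job_name() {
    sed -n 's/^llsubmit: The job "\([^"]*\)" has been submitted\.$/\1/p'
}
```

Usage: job=$(llsubmit test.job 2>&1 | submitted_job_name)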

The llq command is used to monitor the status of the job.

hartree_a [8] llq
Id                   Owner   Submitted   ST PRI Class  Running On 
-------------------- ------- ----------- -- --- ------ -----------
hartree_e.14120.0    spqr1    2/23 10:43 ST 50  t8    hartree_b
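
The ST column holds the job's state, using LoadLeveler's abbreviations such as I (idle), ST (starting) and R (running). To check one job from a script, its state can be picked out of the llq listing. A sketch, assuming the column layout shown above, where the Submitted date and time occupy two whitespace-separated fields so that the state is field 5; job_state is a hypothetical helper.

```shell
#!/bin/sh
# job_state: hypothetical filter. Reads llq output on stdin and prints
# the ST (state) column for the job whose Id is given as $1.
# Assumed layout (from the example): Id, Owner, Submitted date,
# Submitted time, ST, ..., so the state is field 5.
# Prints nothing if the job is not listed.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}
```

Usage: llq | job_state hartree_e.14120.0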

The llcancel command is used to cancel a job, given its Id as shown by llq.

hartree_a [9] llcancel hartree_e.14120.0
llcancel: Cancel command has been sent to the central manager.
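
To cancel several of your own jobs at once, their Ids can first be listed from the llq output and checked before anything is cancelled. A sketch, assuming the column layout shown above (Owner is field 2); owned_job_ids is a hypothetical helper, and the final pipeline assumes llcancel accepts several job Ids in one call.

```shell
#!/bin/sh
# owned_job_ids: hypothetical filter. Reads llq output on stdin and
# prints the Id of every job whose Owner column matches $1, one per
# line. The first two lines (column titles and dashes) are skipped.
owned_job_ids() {
    awk -v owner="$1" 'NR > 2 && $2 == owner { print $1 }'
}
```

Review the list with llq | owned_job_ids spqr1, and only then cancel with llq | owned_job_ids spqr1 | xargs llcancel.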
