Turing was decommissioned at the end of February 1999. The pages on Turing are not being maintained, but are being left accessible for historical interest. They may well have bad links and similar errors.
Assume the executable file run_job contains the following:
#!/bin/sh
set -e
if [ "$1" = '' ]
then
echo Queue name not set
exit 1
fi
if [ "$2" = '' ]
then
echo Job script not set
exit 1
fi
qsub -q $1 <<input
#!/bin/sh
#@\$-me
$2 -$1
input
Assume the executable file job_script contains the following:
#!/bin/sh
set -e
f77 -i,E,U -W0,'hap,opt(o(s),uinline(1))' -o crunch crunch.f
# or: f90 -i,E -W0,'hap,opt(o(s),uinline(1))' -o crunch crunch.f
crunch
rm crunch
Typing 'run_job v10 job_script' on the 3050 workstation will submit an
NQS job to execute the script job_script in queue v10 on
turing.hpcf. When it has finished, it will send a mail message to you
(on the workstation you submitted it from) and leave the standard output and
standard error in files with names like STDIN.o1234 and
STDIN.e1234. See the NQS User's Guide for more information.
The above scripts are only examples, and you can modify them or use different ones to taste. For example, the f77 command could be replaced by make, if you work that way, or by a call to your own F77 command which sets up your preferred options. Similarly, f90 or even cc could be used instead of f77. But the general method of using scripts rather than retyping complex commands is recommended.
Note that the uinline(1) has no effect unless inlining directives are included in your Fortran source. A better method for portable code is to keep them in a separate file, and there is an example of how to do this.
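The here-document in run_job can be inspected without submitting anything. The following is a minimal sketch, assuming a hypothetical helper called show_job (it is not one of the original scripts) that uses cat where run_job uses qsub, so the job text is printed rather than queued. It shows that the shell expands $1 and $2 before the text is passed on, and that \$ becomes a literal $ in what appears to be the embedded NQS option line.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts): print the job
# text that run_job would pass to qsub, without submitting anything.
show_job() {
    if [ "$1" = '' ] || [ "$2" = '' ]
    then
        echo 'usage: show_job queue script' >&2
        return 1
    fi
    # Same here-document as in run_job. The delimiter is unquoted, so
    # $1 and $2 are expanded here and \$ yields a literal dollar sign.
    cat <<input
#!/bin/sh
#@\$-me
$2 -$1
input
}

show_job v10 job_script
```

For the arguments shown, this prints the three lines #!/bin/sh, #@$-me and job_script -v10, which is the text that qsub would receive on its standard input.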
Assume that the files compile_script and run_script contain the following:
#!/bin/sh
set -e
f77 -i,E,U -W0,'hap,opt(o(s),uinline(1,EXT(incon)))' -c IO.f
f77 -i,E,U -W0,'hap,opt(o(s),uinline(1,EXT(incon)))' -c init.f
f77 -i,E,U -W0,'hap,opt(o(s),uinline(1,EXT(incon)))' -c restart.f
f77 -i,E,U -W0,'hap,opt(o(s),uinline(1,EXT(incon)))' -c compute.f
f77 -i,E,U -W0,'hap,opt(o(s),uinline(1,EXT(incon)))' -c main.f
f77 -i,E,U -o crunch IO.o init.o restart.o compute.o main.o
# or the Fortran 90 equivalent commands.
and:
#!/bin/sh
set -e
crunch
Then typing 'run_job s05 compile_script' will
compile and link the program crunch. You would then execute the
program in queue v10 by typing 'run_job v10 run_script'.
Note that the uinline(1,EXT(incon)) takes inlining directives from file incon, and there are examples of this.
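The repeated compilation lines in compile_script can be generated by a loop, in the spirit of the suggestion to call your own command that sets up your preferred options. The following is only a sketch: the function name compile_all and the variables F77 and W0OPTS are assumptions made for illustration. By default it prints the commands rather than running them, so it can be tried on any system.

```shell
#!/bin/sh
# Hypothetical wrapper around the compile step of compile_script.
# F77 defaults to "echo f77", so the commands are only printed;
# set F77=f77 in the environment to run the real compiler.
F77=${F77:-echo f77}

# Options copied from compile_script. The quotes there only protect the
# parentheses from the shell, so one quoted variable does the same job.
W0OPTS='-W0,hap,opt(o(s),uinline(1,EXT(incon)))'

compile_all() {
    for src in "$@"
    do
        # $F77 is deliberately unquoted so that "echo f77" is split
        # into a command and its first argument.
        $F77 -i,E,U "$W0OPTS" -c "$src"
    done
}

compile_all IO.f init.f restart.f compute.f main.f
```

The link step is left as it is in compile_script; only the five near-identical compile lines are folded into the loop.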
open(10,access='direct',form='unformatted',recl=4096,
* maxrec=16384,type='es')
Other than the above, the file can be used like any other Fortran
direct-access file, except that it will disappear when the command finishes,
and that access will be much faster than to disk. The OPEN statement will fail
if there is not enough space available. It is also possible to specify the
size in a configuration file, as well as initialise and save the contents - for
details, see the Fortran 77 or 90 User's Guide.
f77 -i,E,U -W0,'hap,opt(o(s))' crunch.f -lnag
If your program uses the level 0 or level 1 BLAS, you should request that they
be inlined; a set of inlining directives is available in the file
/usr/include/NAG_BLAS. For example:
f77 -i,E,U -c \
-W0,'hap,opt(o(s),uinline(1,EXT(/usr/include/NAG_BLAS)))' \
IO.f init.f restart.f compute.f main.f
f77 -i,E,U -o \
crunch IO.o init.o restart.o compute.o main.o -lnag
There is also a file /usr/include/NAG_functions that contains
inlining directives for selected NAG functions (mostly from the S and X
chapters).
To call the Matrix/HAP library, you use the option -lmathe80, which selects the version that is tuned for the S-3600 model 180. The manuals describe other possibilities, but it is unlikely that you will want to use them. For example:
f77 -i,E,U -W0,'hap,opt(o(s))' crunch.f -lmathe80
The -i,E,U is not critical in this case, but it is recommended. To
call the MSL2 library, just use -lmsl2 instead of or in addition to
-lnag or -lmathe80.
Assume the files program.f, epsilon.c and job_script contain the following:
...
double precision x, epsilon
x = epsilon()
...
and
#include <float.h>
double EPSILON (void) {return DBL_EPSILON;}
and
#!/bin/sh
set -e
cc -c -O epsilon.c
f77 -i,E,U -W0,'hap,opt(o(s),uinline(1,EXT(incon)))' -c program.f
f77 -i,E,U -o program program.o epsilon.o
# or: f90 -i,E -W0,'hap,opt(o(s))' -c program.f
# and: f90 -i,E -o program program.o epsilon.o
program
rm program
then typing 'run_job v10 job_script' will
compile and run the mixed language program. You can merge the different
compilation steps (i.e. compile and link in one command) if you prefer. For
larger programs, you should compile and link separately in queue s05
before running the program.
Note that the Fortran function name must be upper-cased in the C code. If you need to call C functions with names in lower or mixed case (including most standard C and UNIX functions), you should use the interface routines documented in the Fortran 77 and Fortran 90 User's Guides (i.e. the other method referred to above). You are strongly advised not to use different -i,... options, because of the difficulty in mixing code with different casing conventions.
All these directives are of the form *VOPTION suboption and must immediately precede the DO loop that they are intended to affect; as with most other vector systems, loops caused by GOTO statements are not recognised. The following are examples of important directives; in all cases, a list of arrays (e.g. (A,B,C)) can be specified where the example shows just (A).
When tuning the NAG library (mainly the BLAS), the directives used were mostly INDEP, some OVLAP(S,) and a couple of OVLAP(ES,) and OVLAP(L,).
The INDEP option states that the named array (A and C respectively in the examples below) is `independent' of the loop index I, in the sense that the elements of the array can be operated on in any order without affecting the result. For example:
*voption indep(a)
DO 10 I = 1,N,K
10 A(I) = A(I)+B(I)
*voption indep(c)
DO 20 I = 1,N
C L is assumed to be non-zero
J = L*I
20 C(J) = C(J)+B(I)
In the case of array A (but not C), the compiler could have
deduced this because K is forbidden to be zero by the Fortran
standard, but the current version uses the same logic for the two cases and so
needs to be told that the subscript is `safe'.
The S-3600 handles indirect vector addressing efficiently, but the compiler needs some help, especially when an indirect reference occurs on the left hand side of an assignment. If the elements of the index vector are distinct, then the INDEP(A) option should be used, as for unknown step sizes. For example:
*voption indep(a)
DO 10 I = 1,N
10 A(L(I)) = A(L(I))+B(I)
If the elements of L are NOT distinct, then using
the INDEP(A) option could give wrong answers. The
OVLAP(S,(A)) option states that fetches from A may overlap a
previous store, but that all references to A in the same loop iteration are
to the same element. For example:
*voption ovlap(s,(a))
DO 10 I = 1,N
A(L(I)) = B(I)
10 B(I) = A(L(I))+1.0
The OVLAP(ES,(A)) option is similar, but states that the previous
store always uses an index corresponding to elements that will be fetched at a
later iteration (i.e. L(I) >= M(I)).
OVLAP(L,(A)) is the converse (i.e. with L(I) <
M(I)).
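The warning about using INDEP(A) with repeated index values can be illustrated with a small model. The sketch below is an assumption-laden simplification, not a description of what the S-3600 actually does: "vector" execution is modelled as fetching every A(L(I)) before storing any of them, and "scalar" execution as running one iteration at a time. The function name update and the data values are hypothetical.

```shell
#!/bin/sh
# update MODE INDICES: apply A(L(I)) = A(L(I)) + B(I) to a four-element
# array A that starts at zero, with B = 10, 20, 30, ...
# MODE "scalar" runs the iterations one at a time; any other MODE uses a
# simplified vector model in which all fetches precede all stores.
update() {
    awk -v mode="$1" -v idx="$2" 'BEGIN {
        n = split(idx, l, " ")
        for (i = 1; i <= 4; i++) a[i] = 0
        for (i = 1; i <= n; i++) b[i] = i * 10
        if (mode == "scalar") {
            for (i = 1; i <= n; i++) a[l[i]] = a[l[i]] + b[i]
        } else {
            for (i = 1; i <= n; i++) t[i] = a[l[i]] + b[i]   # fetch and add
            for (i = 1; i <= n; i++) a[l[i]] = t[i]          # store
        }
        print a[1], a[2], a[3], a[4]
    }'
}

update scalar '3 1 4'   # distinct indices
update vector '3 1 4'   # same result as the scalar loop
update scalar '3 1 3'   # index 3 is repeated
update vector '3 1 3'   # one of the two updates to A(3) is lost
```

With distinct indices both modes print 20 0 10 30. With the repeated index the scalar loop prints 20 0 40 0, but the vector model prints 20 0 30 0, because the second fetch of A(3) does not see the first store.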
The NOVEC option unconditionally prevents vectorisation, for the rare cases in which vectorising a loop is a bad idea or would produce incorrect results; it is unlikely to give much performance gain. The VEC option forces vectorisation under most circumstances, but is definitely a sledgehammer approach. You should avoid it if possible, because of the risk of generating incorrect code.
*uinline utility.f(flip,flop,flap)
*uinline muckle.f(flugga)
The standard form of inlining directive will inline only the simplest and
smallest routines (up to 30 lines), but is fairly safe. You can inline
routines with some forms of COMMON block, DATA statement etc., but you should
read the warnings in the User's Guide before doing this. And you should
check that inlining larger routines will not 'bloat' your code beyond all
reason. To relax the restrictions, use the following form of inlining
directives:
*uinline utility.f(flip,flop,flap),extended
*uinline muckle.f(flugga),extended
These directives will insert the code of the routines FLIP,
FLOP and FLAP from file utility.f and
FLUGGA from file muckle.f (note that the filename must be in
the correct case) everywhere there is a call to them. If these directives are
at the start of a source file (e.g. crunch.f), then they will apply to
the whole of that source file. Unfortunately, there is no shorthand for
specifying a routine from the current source file.
It is much better to use a separate file of inlining directives than to modify your main source, and this is the recommended method, especially when inlining library code such as the BLAS. But do remember that the search path is interpreted when the compiler is run, and not relative to the file of inlining directives. Assume that the file inline.control contains the following directives:
*uinline /users/hd1/fred/includes/inlining(ddot,dxscal)
*uinline my_lib/utility.f(flip,flop,flap)
Then using the following command in your compilation script will cause the
routines DDOT, DXSCAL, FLIP, FLOP and
FLAP to be taken from the specified files and inlined:
f77 -i,E,U -W0,'hap,opt(o(s),uinline(1,ext(inline.control)))' \
-c crinkle.f
Note that you may get confused if you have a routine in a source file and you
specify it in a *uinline directive with a different file name; the
inlining directive will USUALLY take precedence. Watch out
for this trap!
You must also compile and link the inlined routines, because the compiler produces an external reference even for routines that are used only in inlined form. And remember that this can cause confusion when you put diagnostics into a routine that is eligible for inlining.
There is also an example of how to inline the BLAS from the NAG library.
#!/bin/sh
set -e
f77 -i,E,U -W0,'hap(diag(2)),opt(o(s),uinline(2)),list(e(0))' \
-c compute.f
# or: f90 -i,E -W0,'hap(diag(2)),opt(o(s)),list(e(0))' \
# -c compute.f
Then typing 'run_job s05 compile_script' will
produce a large number of compilation messages in the output file with a name
like STDIN.e1234. These will be somewhat cryptic, but will say
whether a particular loop was vectorised or a routine call inlined. If these
actions are not happening, and you think that they should, check the User's
Guides for possible explanations and appropriate directives. There are
examples of using vectorisation directives.
If you are using a separate file for inlining directives (which is recommended), you should replace uinline(2) by uinline(2,EXT(inline.control)), where inline.control is the file containing inlining directives.
#!/bin/sh
set -e
f77 -i,E,U -W0,'analyze(c),hap,opt(o(s),uinline(1))' -c compute.f
# or: f90 -i,E -W0,'analyze(c),hap,opt(o(s))' -c compute.f
f77 -i,E,U -o crunch IO.o init.o restart.o compute.o main.o
crunch
rm crunch
Then typing 'run_job v10 job_script' will
submit a job to compile module compute.f for profiling, link it with
other modules and run it. When it runs to normal completion (i.e. it does not
crash), it will produce a file ft.count. This will contain execution
counts down to the statement block level (i.e. consecutive statements without a
branch or loop).
The commands f77ts and f77tv seem to be just front ends to f77 with the analyze option. There is also an analyze option to estimate relative CPU times. These variations are not generally recommended.
As a possibly-extreme, but certainly real-world, example of this distortion, a test code gave the following VPU and CPU times with different profiling / analysis:
| Option | CPU | VPU |
|---|---|---|
| None | 2.837 | 1.467 |
| prof -p | 2.837 | 1.467 |
| prof -g | 3.015 | 1.460 |
| analyze(c) | 3.494 | 1.470 |
| analyze(r) | 10.085 | 1.468 |
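The size of the distortion is easier to see as a ratio of each CPU time to the unprofiled figure. The following sketch does that arithmetic for the table above; the helper name cpu_ratio is hypothetical, and the figures are copied from the table.

```shell
#!/bin/sh
# cpu_ratio BASE VALUE: print VALUE divided by BASE to two decimal places.
cpu_ratio() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", t / base }'
}

cpu_ratio 2.837 3.015    # prof -g
cpu_ratio 2.837 3.494    # analyze(c)
cpu_ratio 2.837 10.085   # analyze(r)
```

This prints 1.06, 1.23 and 3.55 respectively, so the analyze(r) run used about three and a half times the CPU time of the unprofiled run, while the VPU column is almost unchanged.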
Incidentally, this can also be used as a debugging tool. When you are testing code, you can use profiling to check that code that you think is being executed has actually been executed! This is especially useful when checking error and exceptional case handling.
DOUBLE PRECISION X1, X2, V1, V2
CALL XCLOCK
CALL VCLOCK
...
CALL XCLOCK(X1,5)
CALL VCLOCK(V1,5)
CALL CALC
CALL XCLOCK(X2,5)
CALL VCLOCK(V2,5)
WRITE (*,'(A,F8.2,A)') ' Scalar time for CALC:', X2-X1, ' seconds'
WRITE (*,'(A,F8.2,A)') ' Vector time for CALC:', V2-V1, ' seconds'
Because of their overheads, timing functions should not be inserted between
individual statements. Their results may be misleading for sections of code
that take less than about 100 microseconds to run, and better results will be
obtained if the interval is more like 0.01 seconds.
For production work, you may find it more convenient to look at the accounting figures in subdirectories of /usr/local/acct/turing on either of the workstations or turing.hpcf itself. Subdirectory usage contains monthly summaries of major users and commands, giving not just CPU time but vectorisation efficiencies and much else. Subdirectory recent contains the last 3 months' daily summaries, including statistics on the largest jobs that finished during that day.
To insert these checks, you need to compile all relevant routines with a different set of options. Assume job_script contains the following:
#!/bin/sh
set -e
f77 -i,E,U -W0,'testmode(e(1),g,a(2),s)' -o crunch crunch.f
# or: f90 -i,E -W0,'testmode(e(1),g,a(2),s)' -o crunch crunch.f
crunch
rm crunch
Then typing 'run_job s05 job_script' will
compile, link and run the program crunch with all run-time checks
enabled.
Note that Fortran 77 on the Hitachi 3050s, NAG Fortran 90 on all systems that have it and some other compilers have options to include some or all of these checks. Please use these on workstations in preference to turing.hpcf if you need more than occasional debugging.
Currently, the interactive debugger does not support Fortran 90 programs. Experienced hackers may be able to debug Fortran 90 using a C-level debugger, but most users are advised not to bother. You should add diagnostic checking code and WRITE statements instead.
If your program has crashed and created a core file, but did not print a traceback, you can usually use the debugger to get one. You need to log in to turing.hpcf by typing the following command from one of lovelace.hpcf or hooke.hpcf:
rlogin turing
After you have logged in, you need to type:
sdb crunch core
t
q
If this is enough information, you should then delete the core file
and log out again. For more detailed interactive debugging, you usually need to
recompile all relevant routines (to create a symbol table), but you do not need
to recompile your whole program. Assume the file compile_script contains
the following:
#!/bin/sh
set -e
f77 -i,E,U -g -c compute.f
f77 -i,E,U -o crunch IO.o init.o restart.o compute.o main.o
Then typing 'run_job s05 compile_script' will
compile and link program crunch with symbol tables for module
compute.f. Note that it is impossible to use the interactive debugger
on programs that have been vectorised (i.e. compiled with the
-W0,hap... option), because the two options are incompatible, so you
have to use automatic run-time checks if the problem occurs
only in vectorised code. You should then log in to turing.hpcf
as described above and type:
sdb crunch
The use of sdb is documented in the HI-OSF/1-MJ OSCBASE Application
Programmer's Guide. It is a C debugger adapted for use with Fortran 77, and so
is not as user friendly as it might be. Note that the program file
crunch and any core file must not be changed before running
the sdb command. You can debug code that has no symbol table, but
it falls into the category of advanced hacking and is best avoided.
Interactive debugging is a very inefficient way of using turing.hpcf, so please use a workstation in preference whenever possible. If too much interactive debugging causes performance problems, it may have to be locked out.
For example, to write SR2201 format data to units 10, 11 and 16, you could run it using a command like the following:
crunch -F'runst(cvout(3050r(10,11,16)))'
To read SR2201 format data, use cvin rather than cvout. At
present, it is not possible to read and write SR2201 format data in the same
program.
The format used on the SR2201 is big-endian IEEE, which is the same as used on the Motorola 68K (i.e. old Sun, old HP etc.), Sun SPARC, HP PA-RISC, IBM RS/6000 (and SP-2) and PowerPC running AIX (in all cases, except possibly for extended precision floating-point). However, SR2201 Fortran unformatted files cannot be transferred directly to or from those systems, because of differences in the implementation of Fortran unformatted records. A conversion utility should be fairly easy to write - please ask Nick Maclaren for details.
The format is NOT the same as used on the DEC VAX, DEC Alpha (including the Cray T3D), MIPS (i.e. SGI and old DEC), Intel, Cray (i.e. YMP) or PowerPC running Windows NT; data must be converted to human-readable format (i.e. using Fortran formatted I/O statements) for import from and export to these systems. A general conversion utility is impossible.