History of High Performance Computing in Cambridge

Meeting on Clusters

The Cambridge-Cranfield HPCF (CCHPCF) hosted a brief meeting on the use of clusters for medium to large scale scientific computation at 2 pm on Monday 6 December in Meeting Room 3 at the Centre for Mathematical Sciences. The original announcement and expanded agenda for the meeting follow, together with links to copies of some of the presentations.


This event is intended to be a general technical meeting to describe and discuss the issues surrounding the purchase and use of clusters. In this, it is closer to a Techlink meeting than anything else, but the timing was not convenient to arrange it as one. The meeting will therefore not focus on future plans or on describing how the CCHPCF, clusters and the Grid will work together.

Nick Maclaren of the CCHPCF will give an overview of the choices and their consequences, concentrating on describing the issues that should be considered when planning the purchase of a cluster or starting a project to use one for high performance computation. Users who are thinking of using clusters are invited to explain what they want to do and either ask questions or make suggestions. The topics will include:
  • What is a Cluster? (and why you might want one)
  • Programming Models (SMP, MPI, farms etc.)
  • Scalability (how many nodes can you make effective use of?)
  • General Design (how most people build clusters)
  • Physical Issues (will it fit, overheat etc.?)
Presentation in PostScript
Tim Cutts from the Sanger Centre, who run the largest clusters in the area, will give a description of some of their experiences.
Presentation in PDF
Gabor Csanyi from Physics will describe his experience with using and managing clusters.
Presentation in PDF
We shall probably break for tea at this point.
Paul Smith of the CCHPCF will describe how and why he set up maxwell, the CCHPCF-run Opteron cluster, and what the CCHPCF is currently investigating for the future. Comments and suggestions would be welcomed.
Presentation in PDF
Nick Maclaren will continue with more technical aspects. The purpose of this is to describe what questions need to be answered, what risks there are, and how to maximise the benefit and minimise the costs, effort and likelihood of failure.
This will contain a lot more questions than answers, and feedback from people who use or manage clusters would be appreciated, particularly in several areas (such as the performance and usability of the Apple G5 or IBM 970 systems and software). The current list of topics includes:
  • Choice of CPU (which architecture/chip?)
  • Choice of Hardware System (everything around it)
  • Interconnect Issues (not just for MPI programs)
  • I/O and File Serving (a common cause of problems)
  • Wider Area I/O and Networking (security and other things)
  • Operating Systems and Software (including compilers etc.)
  • Job Scheduling etc. (a very brief summary, kept simple)
  • System Support Effort (staying sane and keeping it running)
  • User Support/Advice/Debugging/Tuning etc. (including your own)
  • Nightmare Scenarios (you don't want to end up here)
Presentation in PostScript