Thursday, 17 February 2011

Leaving and Joining

Most NGS users should now be aware of the major changes due at the end of March 2011 - when some of the machines providing free-to-use CPU time will be retired.

The NGS cluster at Leeds - ngs.leeds.ac.uk - is among those that have reached the end of their useful life. It will be removed from service on 31 March 2011.

This does not mean that Leeds as a site is dropping off the Grid forever. We will be back...

The NGS cluster is one of a number of systems managed by the Research Computing group within the Central IT Service at Leeds. It is also - by modern standards - one of the smallest. In compute terms, it is dwarfed by the one in the room next door - which boasts 4000 CPUs, 7 TB of memory and 100 TB of fast disk - and goes by the name of ARC1.

ARC1 is so big because it is two computer clusters rolled into one.

Around half the cluster was funded by the University for use by local researchers - for whom applications such as DL_POLY, AMBER and CASTEP have been installed.

The rest comes from a UK-wide consortium of Solar Physicists - so people from outside Leeds need to be able to use the service safely and securely. Cross-site access is why the National Grid Service exists. We can do that.

While ARC1 is primarily aimed at the Sun spotters, Leeds has kindly offered some CPU time on it to the NGS. If, that is, the NGS can get ARC1 onto the Grid.

If we can... applications installed locally will be made available to external users - where licences permit. The users we are expecting are those who currently access resources via the UI/WMS.

So what do we need...?
  • A standard way of presenting the applications to the world. We can do that... it is why the Uniform Execution Environment was invented.
  • A way of limiting access to licensed applications to the right people. We can do that too.
  • A means of accepting requests from the UI/WMS. Taking our lead from the particle physics community - we are looking at CREAM as deployed by gLite.
  • An information service that lets people outside see the state of the system and the applications available - the obvious choice here is the BDII, widely deployed and also available from gLite. There is a query sketch just after this list.
  • A way of accounting for use in APEL. We may send data directly, or indirectly via the NGS accounting service and RUS records.
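
To give a flavour of what the BDII buys us: here is a minimal sketch of how an outside user (or script) might ask a resource-level BDII which compute elements a site publishes and whether they are in production. It assumes the python-ldap package; the hostname is a placeholder, though port 2170 and the GLUE attribute names are the ones gLite BDIIs conventionally publish.

    # Sketch: ask a resource-level BDII which compute elements a site
    # publishes and whether they are in production.  Assumes the
    # python-ldap package; the hostname is a placeholder.
    import ldap

    BDII_URL = "ldap://bdii.example.ac.uk:2170"   # 2170 is the usual BDII port
    BASE_DN = "mds-vo-name=resource,o=grid"       # resource-level BDII suffix

    def ce_states():
        """Yield (CE id, status) pairs from the GLUE schema entries."""
        conn = ldap.initialize(BDII_URL)
        results = conn.search_s(
            BASE_DN,
            ldap.SCOPE_SUBTREE,
            "(objectClass=GlueCE)",
            ["GlueCEUniqueID", "GlueCEStateStatus"],
        )
        for dn, attrs in results:
            yield (attrs.get("GlueCEUniqueID", ["?"])[0],
                   attrs.get("GlueCEStateStatus", ["?"])[0])

    if __name__ == "__main__":
        for ce, status in ce_states():
            print(ce, status)
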
Whatever we build will need to work with what is now called Oracle Grid Engine - the local batch management system - and within a highly specialised and customised Linux environment that is significantly different from that used by the Worldwide LHC Computing Grid.
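
As an illustration of the batch-system end of that glue - and it is only an illustration, since CREAM really drives the batch system through its BLAH layer - the moving parts look something like this. The job script name is a stand-in; the qsub and qstat output formats are Grid Engine's own.

    # Sketch: handing a job to Grid Engine and checking on it.
    # Illustrative only - CREAM talks to the batch system via BLAH.
    import re
    import subprocess

    def submit(job_script):
        """Run qsub and return the numeric Grid Engine job id."""
        out = subprocess.check_output(["qsub", job_script])
        # Grid Engine reports: Your job 123456 ("name") has been submitted
        match = re.search(r"Your job (\d+)", out.decode("ascii", "replace"))
        if not match:
            raise RuntimeError("unexpected qsub output: %r" % out)
        return int(match.group(1))

    def still_known(job_id):
        """True while qstat -j still recognises the job."""
        return subprocess.call(["qstat", "-j", str(job_id)]) == 0

    if __name__ == "__main__":
        job = submit("hello.sh")    # hello.sh is a stand-in job script
        print(job, still_known(job))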

As a first stage, we are going to deploy a separate (virtual) machine - running Scientific Linux and using packages from gLite - to act as the link between ARC1 and the Grid.

We want to produce - and make available to others - a kickstart configuration and associated scripts that automate as much of the installation as possible.
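
Nothing about those scripts is decided yet, but the flavour is simple: keep one kickstart template and stamp out host-specific copies. A minimal sketch, with hypothetical file names, illustrative directives and placeholder values:

    # Sketch: stamp out host-specific kickstart files from one template.
    # The directives and file names are illustrative, not our final config.
    from string import Template

    KS_TEMPLATE = Template("""\
    install
    lang en_GB.UTF-8
    network --bootproto=static --ip=$ip --hostname=$hostname
    rootpw --iscrypted $rootpw_hash
    %packages
    @base
    """)

    def render(path, **values):
        """Write a host-specific ks.cfg from the shared template."""
        with open(path, "w") as out:
            out.write(KS_TEMPLATE.substitute(values))

    render("ks-gateway.cfg",
           hostname="gateway.example.ac.uk",   # placeholder, not a real host
           ip="192.0.2.10",                    # documentation address range
           rootpw_hash="PUT-CRYPTED-HASH-HERE")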

It will certainly produce a blog post or two.

Leeds has been involved in the NGS for a long time: we have hosted two generations of NGS cluster; been involved in the NGS Outreach, Research and Development, and Monitoring activities; and created automatic installation systems for deploying and configuring Grid software.

We've had a lot of practice. It certainly hasn't made us perfect but - if you will excuse what sounds like Yorkshire pride coming from someone who is technically an Essex Boy - if anyone can do it, Leeds can.
