Tuesday 29 March 2011

Getting ready for AHM 2011

In case you aren't aware, the dates for this year's UK e-Science All Hands Meeting 2011 have been announced as the 26th - 29th of September. It will be held at the University of York and a Call for Papers has now been issued.

The main themes will be shared infrastructures, using the cloud in research, end-user engagement and applications (e-science, e-social science, research in the arts and humanities) ensuring an all encompassing meeting with something for everyone.

There are a number of themes to which papers can be submitted:

* Theme 1: Cloud computing for e-Research
* Theme 2: Shared Infrastructures, Systems and Tools for e-Research
* Theme 3: Applications and end-user engagement in e-Research
* Theme 4: Data-intensive Research
* Theme 5: Organisation, Trust, Security and Validation
* General Paper

As in previous years, authors of selected abstracts will be invited to submit full papers after AHM 2011 to be considered for inclusion in a special edition of a journal. The deadline for papers is the 23rd of May 2011.

There is also a call for workshops and tutorials as there is scheduled time for a limited number of half day workshops. The deadline for workshop proposals is 3rd April 2011 - please see the AHM website for further details.

The Problems of Pilots and Pools

The Grid relies on users having a unique identity, represented by their certificates.

Which is all very well until you actually have to run something. At this point, the certificate must be mapped to a set of local credentials - on a Unix system this will be a username and a set of groups.

There is no reason why your local credentials will be the same on different hosts or even that they will be the same on different worker nodes within the same compute cluster - especially where pool accounts meet pilot jobs.

A compute service cannot support a large virtual organisation by giving every single member his or her own account - especially if the bulk of the members will never come near the service. On practical grounds, it is more common to set aside pools of accounts and hand them out on a first-come, first-served basis.
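As a rough illustration of the pool idea, here is a minimal Python sketch. It is not how LCAS/LCMAPS or any real mapping service is implemented: the class name `PoolAccountMapper`, the account names and the certificate subjects are all made up for the example.

```python
class PoolAccountMapper:
    """Lease accounts from a fixed pool, first come, first served.

    Hypothetical illustration only; not the LCAS/LCMAPS implementation.
    """

    def __init__(self, accounts):
        self._free = list(accounts)  # accounts not yet leased, in order
        self._leases = {}            # certificate subject -> local account

    def map(self, subject):
        # A subject that already holds a lease keeps the same account.
        if subject in self._leases:
            return self._leases[subject]
        if not self._free:
            raise RuntimeError("pool exhausted")
        account = self._free.pop(0)
        self._leases[subject] = account
        return account


# Made-up pool accounts and certificate subjects.
mapper = PoolAccountMapper(["ngs001", "ngs002"])
alice = mapper.map("/C=UK/O=eScience/OU=Leeds/CN=alice example")
bob = mapper.map("/C=UK/O=eScience/OU=Oxford/CN=bob example")
alice_again = mapper.map("/C=UK/O=eScience/OU=Leeds/CN=alice example")
```

Two independent mappers, say one per host, would each hand out accounts in their own order, which is why the same certificate can end up with different local credentials in different places.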

Pilot jobs are widely used in the particle physics world. The sole purpose of a pilot job is to find a big enough chunk of compute power and then and only then find something useful to do with it.

Users submit tasks which are kept on a central queue. When a pilot job runs, it will pick a task from the queue, magically become the task's owner and perform the task.

The magic is provided by a program called glExec which itself depends on the LCAS/LCMAPS framework.
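The pilot pattern can be sketched as a toy in Python. Everything here is an assumption made for illustration: the task layout, the function names and, above all, `record_only`, which merely stands in for the identity switch and does not call glExec or change any credentials.

```python
from collections import deque


def run_pilot(task_queue, run_as):
    """Take one task from the central queue and run it as its owner.

    Returns the task's result, or None if the queue is empty.
    """
    if not task_queue:
        return None
    task = task_queue.popleft()
    # The pilot does not run the task as itself: it hands the owner's
    # identity to run_as, which is where the real identity switch would go.
    return run_as(task["owner"], task["command"])


def record_only(owner, command):
    # Stand-in for the identity switch; it only reports who would run what.
    return f"{owner} would run {command}"


# Made-up tasks submitted by two users.
tasks = deque([
    {"owner": "/C=UK/O=eScience/CN=alice example", "command": "simulate.sh"},
    {"owner": "/C=UK/O=eScience/CN=bob example", "command": "analyse.sh"},
])

first = run_pilot(tasks, record_only)
```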

In some of its early incarnations, LCAS/LCMAPS was configured so that every worker node had a separate pool of accounts - and the 'real' user of a pilot job usually ended up with one of these per-worker pool accounts.

These days, it is more common for glExec to pass on requests to a central authorization service.

The ARGUS service, currently being tested by Southgrid at Oxford, is the latest generation of central authorization service. Its behaviour and quirks were covered at a presentation by Kashif Mohammed at a recent NGS Surgery.

Kashif's slides describe how the components of the authorization framework can control access for pilot jobs. The same service can also centrally manage the mapping of certificates to local credentials through an Obligation Handler (OH).

This isn't just relevant to pilot-and-pool-pushing particle physicists. glExec also provides authorization to the CREAM compute element and we plan to use glExec and ARGUS to centrally manage the mapping of credentials when CREAM is deployed in front of Leeds' ARC1 cluster.

Wednesday 23 March 2011

Read all about it!

A new edition of the NGS Quarterly newsletter is out now on our website. Browse whilst drinking your cuppa or download and read on the train!

This edition includes a range of articles on topics such as the imminent changes to the NGS (make sure you read this before the 31st of March if you are a user!), using NX rather than GSISSH to securely access GUIs, tips and hints for using our ever-popular UI/WMS and much more.

All editions of the newsletter are available from the newsletter section of the NGS website.

In addition to the newsletter, we also have a new user case study up on the NGS website. This time we have a case study on how NGS resources were used to model the climate impact of aircraft emissions. This research was carried out by Laura Wilcox at the University of Reading who ran FORTRAN code on several NGS sites to complete her models. All the details of Laura's research can be found on the case study page.

Friday 18 March 2011

Are we there yet?

I apologise in advance for this week's post. I'm afraid that the R+D part of the NGS blog never misses an opportunity for a really lame joke and isn't going to change its ways now.

There are many variants on this one...

An unfortunate traveller took the wrong turn and found himself in one of those parts of the world where road-signs - and, indeed, roads - are regarded as a modern fad which will never catch on.

After many hours of driving, he finally saw another person, a genuine Local - possibly leaning over a farm gate and chewing a piece of corn - and asked the way back to the big city. Let's say, for this version, that city was London....

`London?', the Local said, raising an eyebrow.

The Traveller nodded.

`London, you say?'

The Traveller nodded again.

`Well,' continued the Local, scratching his chin, `well.... if I were going to London.... I wouldn't start from here.'

Which takes us nicely to the Unified Middleware Distribution (UMD) Roadmap discussed in a presentation to an NGS Surgery meeting on 16 March.

It is fair to say that - if they had the choice - the UMD developers at the European Middleware Initiative wouldn't start from here either.

They have been given the task of unifying four distinct families of grid software - ARC, gLite, UNICORE and dCache - and making the finished product work with other packages such as Globus.

The roadmap is a high level overview: it doesn't go into detail and describes the components and their dependencies using what seem to be UML Deployment Diagrams.

Yet behind the high level plans - there is concrete and practical work being done. There are regular automated builds being performed using CERN's ETICS system and work is underway on the first proper release - dubbed EMI-1.

The long term aim is for the work of the EMI to be adopted and deployed by the European Grid Initiative.

And - maybe - one of these days we will stop talking about the Middleware and start talking about what we can do with it.

Wednesday 16 March 2011

NGS at the JISC conference

On Monday several of the NGS staff headed over to Liverpool for the annual JISC conference which was held in the rather impressive BT Convention Centre on the river front.
I was there early to get our stand set up and, once our furniture turned up and sticky stuff was obtained for the posters (even though we had been told the poster boards were velcro compatible!), we were ready to go.

It was good to see so many familiar faces at the event even though for us the JISC conference is a bit of an "oddity" as it doesn't contain a huge number of our target audience. We had a chance to catch up with some collaborators from different parts of the country including Northern Ireland and Scotland and of course our JISC funders.

Tuesday morning saw our parallel session take place, entitled "Increasing research efficiency through the NGS", with two guest speakers. Our technical director, David Wallom, opened the session with a brief presentation on what the NGS is / does and also covered our Cloud resources which are definitely a hot topic at the moment!

Our guest speakers were Ian Dunlop, University of Manchester, who presented "e-Infrastructure for Social Science data: Obesity e-Lab & MethodBox" and Susana Sansone, University of Oxford, who presented "ISA Infrastructure - Standards and Software for Annotating, Managing and Sharing Life Science Investigations". Copies of both their presentations are available from the "goodie bag" page for the NGS session.

Full details of the JISC annual conference including all the presentations from the event can be found on the JISC conference event page.

Monday 14 March 2011

Loaded

The most popular page on the NGS web site is the load monitor - which shows the current number of running jobs on selected NGS Partner sites as a set of moving, coloured bars.

We don't think this is because the red, yellow and green graphics look pretty. Many of our users have adopted a simple, effective - and low-tech - approach to scheduling jobs: they have a quick look at the load monitor page and do their work on the least loaded machine.

This is the human-powered counterpart of what the WMS bit of the UI/WMS does.

The load monitor is a nice example of how to present the information that is routinely published by a site on a Grid and defined by a GLUE Schema.

GLUE - in an egregious example of acronym abuse - is meant to stand for Grid Laboratory Uniform Environment. The reality is that it is called GLUE because it is what sticks the Grid together.

As any good Grid standard should, GLUE has its own Working Group, GLUE-WG, and proper published formal specifications. The current version is GLUE 2.0 but its predecessor, GLUE 1.3, is more widely deployed.

The load monitor is a visualisation of two pieces of (GLUE 1.3 style) information, presented to the world as
  • GlueCEStateFreeCPUs
  • GlueCEStateTotalCPUs
which, if you ignore the CamelCase-naming scheme and the GlueCEState prefix, are fairly self-explanatory. They are published for every compute element.
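As a sketch of how a bar height might be derived from these two values, here is a short Python example. The formula (busy CPUs as a percentage of the total) is an assumption about what the load monitor does, not a description of its actual code, and the numbers are invented.

```python
def load_percent(free_cpus, total_cpus):
    """Return the percentage of CPUs in use, or None if the total is unusable."""
    if total_cpus <= 0:
        return None
    busy = total_cpus - free_cpus
    # Clamp in case a site publishes more free CPUs than its total.
    busy = max(0, min(busy, total_cpus))
    return 100.0 * busy / total_cpus


# Invented values for one compute element.
published = {"GlueCEStateFreeCPUs": 24, "GlueCEStateTotalCPUs": 96}
load = load_percent(
    published["GlueCEStateFreeCPUs"],
    published["GlueCEStateTotalCPUs"],
)
```

With 24 of 96 CPUs free, `load` comes out as 75.0.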

The pretty dancing bars are generated using jsProgressBar.

This is not new: the load monitor has been running for as long as the current version of the NGS web site and - before then - researchers used and abused a central Ganglia service for very much the same purpose.

It is back as a Research and Development activity because the NGS is changing.

The hard bit isn't the calculation of the system load or the pretty graphics: it is deciding which sites and compute elements should appear.

The current version is intimately entangled with the INCA monitoring service - the list of hosts is extracted from a configuration file built for INCA.

INCA - as anyone with a high-enough tolerance of tedium to read the NGS R+D blog regularly will know - is being decommissioned as soon as the Nagios service is ready to replace it and we decided - late last year - to stop updating INCA.

This list is becoming out of date: it includes a number of machines that will disappear from the NGS soon and misses many others which should be there.

We are rewriting the load monitor to:
  • Select compute elements from information sucked from our Single Point of Truth - the GOCDB.
  • Keep only those which support the ngs.ac.uk Virtual Organisation using the snappily-named `GlueCEAccessControlBaseRule' attribute defined by GLUE, and published by the sites.
and use it to generate a list of active Compute Elements in sites.
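The two steps above can be sketched in Python. This is only an outline under stated assumptions: the compute elements are plain dictionaries rather than the results of real GOCDB or information-system queries, the hostnames and identifiers are invented, and only rules of the form `VO:name` are matched.

```python
def supports_vo(ce, vo):
    """True if the compute element publishes an access rule for the VO."""
    rules = ce.get("GlueCEAccessControlBaseRule", [])
    # Assumption: the rule is published as "VO:<name>"; other rule
    # forms are ignored in this sketch.
    return f"VO:{vo}" in rules


def active_compute_elements(gocdb_hosts, published_ces, vo):
    """List the IDs of CEs that are known to GOCDB and support the VO."""
    known = set(gocdb_hosts)
    return sorted(
        ce["GlueCEUniqueID"]
        for ce in published_ces
        if ce["GlueCEInfoHostName"] in known and supports_vo(ce, vo)
    )


# Invented data standing in for GOCDB and the published GLUE records.
hosts = ["ce1.example.ac.uk", "ce2.example.ac.uk"]
ces = [
    {"GlueCEUniqueID": "ce1.example.ac.uk:8443/cream-pbs-ngs",
     "GlueCEInfoHostName": "ce1.example.ac.uk",
     "GlueCEAccessControlBaseRule": ["VO:ngs.ac.uk"]},
    {"GlueCEUniqueID": "ce2.example.ac.uk:8443/cream-pbs-other",
     "GlueCEInfoHostName": "ce2.example.ac.uk",
     "GlueCEAccessControlBaseRule": ["VO:other.example"]},
    {"GlueCEUniqueID": "ce3.example.ac.uk:8443/cream-pbs-ngs",
     "GlueCEInfoHostName": "ce3.example.ac.uk",
     "GlueCEAccessControlBaseRule": ["VO:ngs.ac.uk"]},
]

active = active_compute_elements(hosts, ces, "ngs.ac.uk")
```

Here only the first compute element survives: the second does not support the VO and the third is not listed among the GOCDB hosts.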

We can calculate the load on each of these compute elements at regular intervals and present it to the world as pretty, colourful, wobbling bars. It is the Web 2.0-way.

Thursday 10 March 2011

More castles, more discussion but less lightning

Following on from my previous post, day two of the CW11 event kicked off with pastries and caffeine to get us ready for the day ahead and more discussion. Once again the agenda was debated and break out session topics were discussed before a presentation by one of the sponsors of the event, DevCSI which has nothing to do with the infamous TV series set in Las Vegas…

I attended the breakout session on the dissemination of “academic” research software results and ended up scribing for the session which was chaired by the NGS Technical Director, David Wallom, who is a hard person to keep up with!

The basis for this session was discussing where software developers could publish their efforts in a peer reviewed, “reference-able” journal. Nature Methods was held up as a great example, but in a different field. Could we do something similar for software developers, so that their work could contribute to the RAE, improve their standing in the university and so on? It was a very interesting discussion with a lot of bridges to cross before anything concrete can happen but watch this space. Hopefully the slides from this session will be available soon.

After coffee we had a lively reporting back session from all the breakout sessions which also included “Research funding 2011: what's available and how to get it?” and “We have data-management plans, should we have software management plans too?”. The slides from these sessions are available now.

After lunch the audience was divided into 3 groups to discuss the same topic – “Ideas for improving Collaborations” and we were asked to report back with our findings after an hour. The discussion group I was in concentrated more on the whys and why-nots of collaborating rather than the practical means of doing so that I am usually involved in, e.g. Access Grid meetings, collaborative technology such as Skype, blogs etc. However it was still a lively and engaging discussion!

This was the last discussion session of the day and all that remained was for Neil Chue Hong to sum up the past 2 days, for many people to be thanked for their great organisational skills (Simon Hettrick!) and for us all to make our way home by plane, train and automobile.

A great two days yet again and I’m already looking forward to the next one. The SSI is certainly going to be busy following up on all the actions from the meeting and I look forward to seeing their outcomes throughout the next 12 months.

Tuesday 8 March 2011

Castles, case studies and lightning

Last week I attended the Software Sustainability Institute's Collaboration Workshop in Edinburgh. As with previous meetings this event was very enjoyable with a great deal of discussion, actions and collaboration taking place.

The organisers had revamped the programme for this year’s event and one of the changes was the inclusion of “lightning talks”. The set up was interesting with each presenter having 5 minutes and one slide which shared the screen with a rather large countdown clock. It certainly kept the presenters on their toes and the attention of the audience! Unfortunately given the nature of the presentations you really had to be there to appreciate the talks but some presenters did manage to get a lot of info on their slides. In particular the slides from Taverna, STFC, software preservation, soundsoftware.ac.uk, e-science central, SPRINT, Hartree Institute and Youshare.ac.uk are worth a look. I have linked to the websites for each project where they exist but for the slides, visit the materials section.

After a lovely lunch it was time to start what I think is the most enjoyable part of the workshops – the breakout sessions. Before the event, delegates have the opportunity to submit topics that they would like to discuss at the breakout sessions and then at the event people vote on what topics they would like to discuss. The delegates break into groups and disappear off to various parts of the building to discuss the topic. Each group has a chair, a scribe and someone who “volunteers” to report back but quite often people take on two of these roles.

I attended the session on preparing case studies to support cases for funding which was rather apt as I prepare the NGS user case studies. There were about 7 of us from a variety of backgrounds including funding council reps and the discussion covered areas such as different types of case studies, ways that case studies could contribute to REF, how to format case studies and much more. It was a very interesting hour and we did come up with several suggestions for further work for the SSI.

The second breakout session of the day took place after some caffeine had been consumed and this time I attended the session on “measuring the research impact of software”. Unfortunately the notes from this aren’t up on the SSI website yet but when they are I will make sure I update the link!

So that was day 1 at the CW11. I’m sure you don’t want to hear about the conference dinner so I’ll leave that out and write up day 2 very soon!

Monday 7 March 2011

Portals, Proxies and making things simpler

The NGS provides the services needed centrally to run a Grid. We provide instructions for institutions that wish to contribute to a Grid and resources for researchers who want access.

We have to leave the interesting bit - connecting the services, institutions and researchers in new ways - to others.

One of the people doing the interesting bit is Mark Hewitt of the Department of Computer Science at the University of York - who was involved in a project to allow non-technical users to link tasks on the grid together in workflows.

In this guest post, Mark describes how they did it...

The P-Grade portal is a web-based portal system developed by SZTAKI - the Computer and Automation Research Institute of the Hungarian Academy of Sciences - in collaboration with the Centre for Parallel Computing at the University of Westminster.

P-Grade provides a generic job submission system using workflows and can submit to any cluster with the standard NGS Globus stack installation.

The White Rose Grid e-Science Centre deployed P-Grade at York for the use of researchers at York, Leeds and Sheffield. Many users find managing certificates complicated, so we decided to use the NGS SARoNGS system as an alternative.

Integration of SARoNGS with P-Grade was quite straightforward. P-Grade must download a proxy certificate from a proxy server before it can start work. SARoNGS creates proxy certificates, generates a username and password and provides a mechanism by which these can be passed to a web page. We were able to modify the P-Grade code to allow a user to download a SARoNGS certificate, instead of having to use the myproxy tool to upload their e-Science digital certificate.

The addition of SARoNGS support to the portal was vital as it completely removed the need for users to go through the process of signing up for a digital certificate and meant that they could authenticate simply through the portal interface within seconds.

Tuesday 1 March 2011

Teacher's pet

As well as serving the needs of the UK academic research community, the NGS is also involved in helping to train the upcoming generation of new researchers.

NGS resources are used in a number of university courses including those at Cranfield University and the University of Edinburgh. In a new article on the NGS website, David Fergusson who leads training at the NGS, explains how he has used NGS resources in his teaching of the MSc in Distributed Scientific Computing for the last 6 years.

If you would like further information on how grid and cloud computing can be incorporated into your teaching, then contact the NGS helpdesk.