Tuesday 30 November 2010

And breathe - it's over for another year

So the NGS Innovation Forum is over for another year and, although it seemed to consume most of my time over the last month, I’ll miss it!

This year’s event was well received by all those who attended, according to the feedback I was given both at the event and afterwards by email. Always nice to know we’re doing the right thing!

The event kicked off on Tuesday with a day focused primarily on our users and I’m glad to say there were some present in the audience. Steven Young gave a brief summary of the Campus Champions who are our eyes and ears in institutions – ready to help users and to feed back comments and suggestions to us. We then moved on to a series of talks about user tools. The aim of this session was to talk through some of the tools that we offer so that users could head home from the event and actually apply them in their research. The tools covered were –

The day also featured three presentations from users who make a great deal of use of the NGS resources. We had presentations from a variety of research areas to demonstrate just how widely used our resources are. Luke Rendell from St. Andrews University talked about simulating learning strategies, Zhongwei Guan talked about modelling composite structures and Narcis Fernandes-Fuentes talked about using the NGS for early stage drug discovery. A bit of a range of uses!

Day 1 was really good with lots of questions and discussion which continued right the way from the last session through the drinks reception and poster viewing until the end of the event dinner!

Wednesday was aimed primarily at IT staff, sys admins etc so there were a few new faces on this day. In order to bring everyone up to speed, David Wallom recapped the discussion and findings from the day before. We then kicked off with a presentation from the University of Westminster, who have been an NGS member for some time, before moving on to a discussion session about how the NGS can help to facilitate collaboration between researchers and institutions.

Presentations on two NGS projects followed – accessing the NGS with Shibboleth and updates to the NGS accounting provision. The last session was dedicated to the EU with an update from the EGI Director, Steven Newhouse, followed by presentations from two ESFRI projects – CLARIN and ELIXIR.

An exhausting couple of days but well worth it!

From an outreach point of view I’m now busy organising a couple of new roadshow events that people requested during the IF. I’m also gathering the presentations from the event to go on the NGS website (watch this space!) and announcing the winner of the best poster at the event.

Congratulations to Jarmila Husby from the School of Pharmacy, University of London whose poster “Molecular Modelling Studies of the STAT3β homodimer:DNA complex” was voted the winner by the delegates. Jarmila won an Amazon voucher which is very handy with Christmas coming up! All the posters from the event will also be on the website soon.

If you missed the event there are a number of ways to catch up – the Twitter posts, a blog post from Catherine Gater of EGI, an article on Cloud computing from Simon Hettrick at SSI and photos from the event on the NGS Flickr account are all available.

Thank you once again to all those who attended and hopefully we’ll see you all next year!

Monday 29 November 2010

Innovation writeup part 1

Another successful NGS Innovation Forum. It was good to hear Real Users™ stand up and say they love the NGS (no, really!) and to tell us about all the interesting work they are doing. (Slides should appear on the agenda page shortly, Gillian is busy chasing people.)

Highlights will always be a personal choice - Jason already mentioned CVMFS. There are many interesting bits one could mention, so let's focus on one in this post: authentication.

We demonstrated the CertWizard on behalf of the dev team, and despite being a live demo, it was 100% successful. This tool will make it much easier to manage certificates: browsers were built for a lot of different things, including e-commerce, so managing certificates with browsers can be challenging. Managing credentials with this tool will be much easier, and even fun.

Speaking of easy credentials, Mike talked about Shibboleth access to the NGS (aka SARoNGS). SARoNGS is not new, but it is still changing access to services. For example, we have demonstrated login to Jason's nodes in Leeds using SARoNGS credentials.

Finally, it is worth mentioning that we are collaborating with JANET on demonstrating Project Moonshot. This project is again about federated access but at a "lower" layer than Shibboleth - Shib is very web (or HTTP) oriented which is very useful, but Moonshot aims at other services like ssh (or at the Moon.) Expect more blog posts as we make progress.

Ultimately all this authentication stuff should benefit all end users who will have a choice of how to access their services.

Sunday 28 November 2010

Software distribution by squid

Last week saw the NGS Innovation Forum. Many of the people who do the NGS's Research and Development work were involved in the forum which, ironically, left us very little time to do any actual innovating in the last week or so.

So... this post will be about something our colleagues in GridPP are working on - and which was discussed at a gathering of UK High Energy Physics System Managers early last week.

The NGS had been invited to the gathering to talk Nagios and monitoring. Other presentations covered the use of software from CERN called CVMFS.

CVMFS is an interesting approach to delivering software efficiently - by combining the idea of Content Addressable Storage with the World Wide Web's capacity to bring data close to where it is needed. There is a very detailed technical report available from CERN and a twitter feed but little of what could be thought of as public documentation.

To understand why CVMFS is so appealing to GridPP, you need to understand their users.

The use of GridPP systems is very different from that of systems elsewhere in the NGS. They provide a lot of compute power, handle a mind-blowingly-huge amount of data - but deploy a comparatively small range of applications software, albeit on a large number of machines.

It is vital that the software used to analyse data from the major experiments at CERN be available everywhere where that data will be analysed. In the past, special deployment jobs were run for this purpose.

CVMFS is an alternative approach. It sprang from a CERN project to deploy virtual images and the need to keep the images small.

In CVMFS, files are deployed from a single central source. When a file is needed, it is copied to a local disk and read from there. No file is copied more than once and copies are stashed 'nearby' in case another nearby machine needs them.

The caching and stashing is made possible by referring to a file by the SHA1 hash of its contents - hence content addressable storage - and putting it on a web server under a name derived from the hash.

The server provides a catalogue, translating from filenames to hashes. If the same file appears more than once - within an application or within different releases of the same application - it will be represented by the same hash-related-filename on the server.
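
The idea of a catalogue sitting in front of hash-named files can be sketched in a few lines of Python. This is only a toy illustration of content addressable storage, not the way CVMFS itself is written: the ContentStore class, its method names and the example file names are all made up for this post.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store (hypothetical, not CVMFS code)."""

    def __init__(self):
        self.blobs = {}      # hash-derived name -> file contents
        self.catalogue = {}  # filename -> hash-derived name

    def publish(self, filename, data):
        """Store data under the SHA1 of its contents and record the filename."""
        digest = hashlib.sha1(data).hexdigest()
        # Identical contents always give the same digest, so a repeated
        # file is stored once however many filenames point at it.
        self.blobs.setdefault(digest, data)
        self.catalogue[filename] = digest
        return digest

    def read(self, filename):
        """Translate the filename to a hash, then fetch the contents by hash."""
        return self.blobs[self.catalogue[filename]]


store = ContentStore()
# The same library shipped in two releases of a made-up application.
first = store.publish("app-1.0/lib/libfoo.so", b"same bytes")
second = store.publish("app-1.1/lib/libfoo.so", b"same bytes")
other = store.publish("app-1.1/bin/run", b"different bytes")
```

Three filenames have been published but only two blobs are stored, because the two copies of the library share a hash.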

CVMFS uses this with the Filesystem in Userspace feature of Linux - aka FUSE - to present a user with something that looks like any other directory.

Behind the scenes, requests are made to the central server via a local Squid web proxy cache. Squid is designed to collect files from the web on behalf of clients, store copies as it does so and deliver the copy wherever possible. It is very, very good at this.

There are quirks: the first time a file is needed by a site, access will be slow although all subsequent attempts to use it will be much faster.
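
The slow-first-time, fast-thereafter behaviour is what you would expect from any read-through cache. The sketch below is a deliberately simplified model, assuming a cache that never expires or evicts anything. Real Squid does a great deal more. The ReadThroughCache class and the dictionary standing in for the central server are invented for illustration.

```python
class ReadThroughCache:
    """Toy read-through cache (hypothetical; real Squid is far more capable)."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable that asks the central server
        self._cache = {}                 # name -> locally stored copy
        self.origin_requests = 0         # how often we had to go upstream

    def get(self, name):
        if name not in self._cache:
            # First request at this site: the slow trip to the origin.
            self.origin_requests += 1
            self._cache[name] = self._fetch(name)
        # Every later request is answered from the local copy.
        return self._cache[name]


# A dictionary stands in for the central web server in this sketch.
origin = {"abc123": b"payload"}
proxy = ReadThroughCache(origin.__getitem__)

first = proxy.get("abc123")   # fetched from the origin
second = proxy.get("abc123")  # served from the cache
```

Because files are named after the hash of their contents, a cached copy can never be out of date, which is part of what makes this style of caching a good fit.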

As long as a site has enough local disk space and a nice big squid, CVMFS can deliver software to where it is needed, when it is needed.

Sunday 21 November 2010

Failing more successfully - getting past Maradona and Condor

It has been nearly a month since the last progress report on Nagios. Which is a shame, because in that time we have made something that looks rather like progress.

The NGS's development Nagios server was at the point where it was throwing tests at NGS partner sites.

The simpler tests - for things such as service certificates reaching their expiry date - are working.

We have had less success with the more sophisticated tests - like those that poke every nook and cranny of a Compute Element.

A few sites - notably those in Scotgrid - are accepting the tests and running them to completion but we only see part of the results. For other sites we get the infamous
Standard output does not contain useful data.Cannot read JobWrapper output, both from Condor and from Maradona.
error message.

In both cases, the same test - the CE-probe - is involved. This is thrown at all sites that advertise Compute Elements in the GOCDB database of all things griddy.

This test makes use of the Nagios concepts of active and passive tests. In an active test, the Nagios service runs some bit of code and expects that bit of code to provide a result. In a passive test, there is no explicit test code and results are fed in by whatever means necessary.
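
One standard way of feeding a passive result into Nagios is to write a PROCESS_SERVICE_CHECK_RESULT line to its external command file. The sketch below only formats such a line. The host and service names are made up, and it is an assumption on my part that this illustrates the general mechanism rather than the exact route the WLCG probes take.

```python
import time

# Standard Nagios plugin return codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result_line(host, service, state, output, timestamp=None):
    """Format one passive service check result as an external command."""
    if timestamp is None:
        timestamp = int(time.time())
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}".format(
        timestamp, host, service, STATES[state], output
    )


# Hypothetical host and service names, with a fixed timestamp for clarity.
line = passive_result_line(
    "ce.example.org", "hypothetical-job-state", "OK", "job finished",
    timestamp=1290297600,
)
```

Whatever produces the line, Nagios treats the result exactly as if an active check had returned it.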

The CE-probe appears within Nagios as one active test and a whole raft of passive ones. The active test delivers a bundle of tests to the site - via a Workload Management service (WMS) - and checks on its progress. At various stages in the life of the bundle, the passive test results are updated.

Some passive test results are generated from the Nagios server itself; others are sent directly from the system under test via the next available Message Bus.

When the bundle of tests runs successfully, we see the results generated from within the Nagios server but not those coming from the message bus. This is because the development service uses a message broker that sits outside the core set of brokers used by WLCG. A workaround for this is coming any day now.

The Maradona message appears when the bundle of tests doesn't run at all.

It is a by-product of the script generated within the WMS and sent on to the site and, in particular, how this script handles 'Shallow' resubmission.

A shallow failure is one where the job is rejected and can be tried elsewhere. The WMS touts the job around the grid until it finds a system prepared to accept it. Acceptance is signified by the deletion of a marker file using GridFTP.

Which is all very well, as long as the machine on which the script is running has software that is able to delete a file using GridFTP.

gLite-based systems usually have something; those using the NGS VDT-based installer do not. If this step fails, the script gives up early and prints the Maradona message.

A VDT based system can be persuaded to run the WMS-generated script by installing the UberFTP tool using
  pacman -get http://vdt.cs.wisc.edu/vdt_181_cache:UberFTP

(Pick a different cache if you are using something other than the elderly version 1.8.1 of VDT.)

UberFTP provides enough GridFTP support to allow the bundle of tests to run - though we have yet to persuade them to run to completion. I would call that a more successful failure.

Anyone attending the HEPSYSMAN meeting in Birmingham on 22 November will have the opportunity to hear, and ask questions, about what we needed to do to persuade WLCG nagios to work on the weirder bits of the NGS.

[Edit 2010-11-24 fixing typos]

Tuesday 16 November 2010

Unable to attend the NGS IF10?

If you are unable to attend the NGS Innovation Forum next week, we hope to keep you up to date with interesting points, discussions etc through the medium of Twitter and this blog!

I will be encouraging all delegates to tweet with the tag #ngsif10 and of course the regular bloggers will hopefully be in action on here.

The presentations from the event will be available on the NGS website after the event along with pdfs of the posters from the Tuesday evening poster session.

If there is anything else you would like to see us do to keep you up to speed with developments at the event then please let us know!

Saturday 13 November 2010

Escape from NeSCForge

NeSCForge - home to the NGS's collection of software, documentation and training material - is officially doomed.

On 20 December 2010, the service will be turned off: there simply isn't the money available to keep it running.

NeSCForge has long provided our version control repository. The code we developed to simplify the deployment of grid software lives there, as does the Myproxy enabled GSISSH and the accounting clients.

We needed to find a new home for our software sharpish - and we didn't want to break anything when we did. In particular, we wanted to retain the code and the history of changes in our CVS repository.

Which made the decision about where to go very easy.

The only public software hosting service which provides CVS support is SourceForge. On 7 November, the UKNGI project joined SourceForge.

Why UKNGI? Partially because we are becoming part of the UK National Grid Initiative, but mostly because the NGS name was already taken.

The next stage is moving all our data and - with perfect timing - the Software Sustainability Institute has come galloping to the rescue. They have recently extended their collection of guides for developers to include:
That covers what we need to do nicely.

Earlier today - following those last two guides - we copied the CVS repository from NeSCForge to its new home on SourceForge - with all branches and tags and other version control stuff intact. We have also added Subversion and Git based repositories which we expect to use for future development.

The software releases and other files will be moving soon and we can allow NeSCForge to retire gracefully.

Friday 12 November 2010

Notes on validating XML signatures

Technical brain dump - left on the NGS blog in case it is useful to anyone. There will be a proper R+D posting along shortly.

We have been investigating a problem with SARoNGS and Shibboleth that is similar, but not identical, to the XML signature problems covered in an earlier posting.

As in that earlier case, we are being sent a lump of XML within which is:
  • some data,
  • a certificate
  • a digital signature for the data generated from the key matching the certificate.
Unlike the earlier case, there is no known bug in the code that generates the XML - yet something deep within SARoNGS was refusing to accept the data.

We suspected that the XML had been mangled - but needed to prove it.

After much fiddling and searching, we dug up a useful one-line command to check the signature without needing the whole of Shibboleth.

First, catch your assertion. This is left as an exercise for the reader.

Next, verify the signature by running:
  xmlsec1 verify --id-attr:AssertionID Assertion shibdata.xml
where shibdata.xml is a file containing the assertion.

An explanation... ignoring namespaces, the digital signature consists of a block within which there is a <Reference> element along the lines of
 <Reference URI="#_39e459384b39f1ddce64e11c58155abc">
The URI is meant to point you at the bit of XML that has actually been signed. The code expects to find an attribute
   ID="_39e459384b39f1ddce64e11c58155abc"
attached to that element.

In this case there is no such attribute. There is, however, an AssertionID attribute with exactly that value.

Which is why we need that odd looking --id-attr option. It explicitly tells the program to treat the AssertionID attribute of an Assertion element as the ID when searching for the signed element.
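
The lookup that goes wrong can be mimicked in a few lines of Python. This is a rough sketch of the idea only: it ignores namespaces, does no cryptography at all, and the find_signed_element function and the cut-down document are invented for illustration rather than taken from xmlsec1 or Shibboleth.

```python
import xml.etree.ElementTree as ET


def find_signed_element(root, reference_uri, id_attr="ID", node_name=None):
    """Return the element a same-document Reference URI points at, or None.

    id_attr is the attribute treated as the identifier; node_name, if given,
    restricts the search to elements with that name.
    """
    if not reference_uri.startswith("#"):
        raise ValueError("only same-document references are handled here")
    wanted = reference_uri[1:]
    for element in root.iter(node_name):
        if element.get(id_attr) == wanted:
            return element
    return None


# A heavily cut-down, made-up stand-in for the real assertion.
doc = ET.fromstring(
    '<Response><Assertion AssertionID="_39e459384b39f1ddce64e11c58155abc">'
    "<Data>hello</Data></Assertion></Response>"
)
uri = "#_39e459384b39f1ddce64e11c58155abc"

by_default_id = find_signed_element(doc, uri)
by_assertion_id = find_signed_element(
    doc, uri, id_attr="AssertionID", node_name="Assertion"
)
```

Searching on the default ID attribute finds nothing, while telling the search to use AssertionID on Assertion elements finds the signed element.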

Thursday 11 November 2010

The NGS at the centre of the universe?

I've recently been working on a number of case studies with several NGS users to highlight the different ways that the NGS is used in many different research areas.

The first case study from this batch has recently been released and is now available on the NGS website case study section. This case study highlights the work of Cristiano Sabiu from the University of Portsmouth who used the NGS to analyse the distribution of galaxies in the universe.

Cristiano made use of the freely available Gadget2 code which is installed on the NGS STFC RAL site and ran 20 full scale simulations which required approximately 100,000 cpu hours.

To find out more about Cristiano's research see our user case study page. More user case studies are in the pipeline so watch this space!

Tuesday 9 November 2010

NGS Cloud at the NGS Innovation Forum

We're on the home straight for the NGS Innovation Forum. Registration closes this Friday (12th Nov) at 4.30pm sharp so if you want to attend and haven't yet registered, you had better do so soon!

All our previous NGS innovation forums have featured breakout sessions with groups reporting back to the meeting as a whole. This year we will have our breakout sessions as usual but we would like to flag one up early!

We will be having a break out session on the new NGS Cloud prototype so this would be an ideal opportunity for current users of our cloud service to feed back their experiences directly to NGS staff. It would also be an excellent opportunity for anyone interested in the NGS cloud prototype to find out more.

Remember you don't have to attend both days of the event; delegates are welcome to attend either day as a single day.

Friday 5 November 2010

Who do we think you are?

This posting is going deep into the innards of Grid software.

Think of it as a computer programmer's version of Inside Nature's Giants - a wonderful example of TV science but not necessarily suitable for watching over dinner. So before we get out the (metaphorical) scalpels, I want to explain why we need to do this.

The NGS provides the SARoNGS service, which provides certificates to people using their institutional credentials and stores these in a MyProxy server.

We have developed the Myproxy enabled GSISSH to give users command line access to a grid compute service from any SSH client - this reads credentials from a MyProxy server.

By linking SARoNGS and Myproxy-enabled GSISSH, using the ability to create accounts on demand and opening the service to anyone in the UK Access Management Federation, it would be possible to provide such a service to anyone in the UK academic community who needed it.

The big practical problem with this plan - and the one most likely to give your IT security people nightmares - is stopping this service being abused.

The missing link is the ability to provide very restricted access to users who are being nosy - enough to prove that it can be done, not enough to do anything - and full access to ones who have signed up to a suitable acceptable use policy.

Non-technical people can look away now...

If you offer a service that runs actual real programs on behalf of actual real grid users, then at some point you are going to be handed a blob of data that contains:

  • A user proxy certificate - with possible added Virtual Organisation membership - that gives your service rights to pretend to be that user.
  • A description of what it is that the user wants to run.

For services such as Globus GSI-OpenSSH and GRAM you need to associate the proxy certificate with an account on a compute service. The account will be used when running anything on behalf of the user.

This sounds simple. Lots of things about Grid computing sound simple.

This particular problem fails to be simple because there are many, many different ways by which the user's proxy certificate can be delivered.

For GSI-OpenSSH, delivery is left to the Generic Security Service (GSS). Technical details can be found on Globus development webpages.

The code that provides GSS authentication plays a complicated game of network ping-pong as client and server bounce messages at one another until they come to a mutual agreement or give up trying. The people behind the Heimdal project have bravely attempted to explain how it works on their blog.

At the end of the game, the credentials are delivered to Globus in the form of a 'context' stored in a variable of type gss_ctx_id_t.

There is a function within the Globus libraries called globus_gss_assist_map_and_authorize that uses this context, feeds it to whatever authorization mechanism is used locally and returns a local user account.

globus_gss_assist_map_and_authorize is used in both the Globus GRAM gatekeeper and GSI-OpenSSH but does not seem to be part of the official application programming interface.

It will either look up the user in the Globus gridmap file or call out to an external authorization service such as LCAS/LCMAPS. The exact behaviour depends on environment variables and configuration files.
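
The gridmap half of that choice is the easier one to picture. A gridmap file is a list of quoted certificate distinguished names, each followed by a local account name. The sketch below parses that format and looks a user up. It is not what globus_gss_assist_map_and_authorize does internally, and the parse_gridmap and map_user functions, the distinguished names and the account names are all made up for illustration.

```python
import shlex


def parse_gridmap(text):
    """Build a {distinguished name: local account} table from gridmap text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # handles the quoted DN, spaces and all
        if len(parts) != 2:
            continue  # skip malformed lines rather than guess
        dn, accounts = parts
        # If several accounts are listed, take the first one.
        mapping[dn] = accounts.split(",")[0]
    return mapping


def map_user(mapping, dn):
    """Return the local account for a DN, or None if the user is unknown."""
    return mapping.get(dn)


# Made-up entries in the usual gridmap layout.
GRIDMAP = """
# distinguished name                          local account
"/C=UK/O=eScience/OU=Example/CN=jane doe" jdoe
"/C=UK/O=eScience/OU=Example/CN=john smith" ngs0042,jsmith
"""

mapping = parse_gridmap(GRIDMAP)
jane = map_user(mapping, "/C=UK/O=eScience/OU=Example/CN=jane doe")
john = map_user(mapping, "/C=UK/O=eScience/OU=Example/CN=john smith")
stranger = map_user(mapping, "/C=UK/O=eScience/OU=Example/CN=nobody")
```

A user who is not in the table maps to nothing at all, which is the behaviour a restricted-access scheme would need to build on.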

MyProxy-enabled GSISSH does this mapping by running the Unix id command as the user via the proper gsissh command. This is not going to work if the user is not allowed to run the id command.

We would like to be able to replace the gsissh step by a stand alone program that does the mapping in the same way as gsissh when presented with the same environment as gsissh.

Luckily, we have the basis of this program buried in another NGS project - integrating LCAS/LCMAPS with Globus webservices - which was put on hold several years ago. The developer left his work in the source code repository at NeSCForge.

http://forge.nesc.ac.uk/cgi-bin/cvsweb.cgi/lcas-lcmaps/gt4ws_lcas_lcmaps_callout/src/c/?cvsroot=ngs#dirlist

The idea that code and code history is valuable in itself has been mentioned before in this blog and in much more prestigious publications - and this applies even if the code was never finished.

We have one more problem to overcome. NeSCForge will be closing down on 20 December and we are not going to lose our source code when it does. The details of exactly how we will save our code will have to wait for another day and another posting.

[With thanks to Robert Frank at Manchester]

Tuesday 2 November 2010

NGS Innovation Forum incoming!

We really are now in the run up to the third Innovation Forum which will take place on the 23rd - 24th November.

The final speaker for the event was announced a few weeks ago and we are pleased to welcome Andrew Lyall from the European Bioinformatics Institute who will be speaking about the European project ELIXIR.

We will also be providing a "Roaming RA" service at the Innovation Forum so if you would like to use the NGS but do not have an RA at your local institution from whom to obtain a grid certificate, come along to the IF! Not only will you obtain a grid certificate but you will also be able to meet NGS staff and hear about tools which will be of great use to you in running jobs. For more information on how to obtain a certificate at the event, see the instructions on our website.

If you are unable to attend the event you can hopefully follow us on Twitter. We will have a special tag for the event #ngsif10 so look out for this on your Twitter feeds. If you happen to Twitter about the NGS in general please use our tag #ukngs. Help us spread the word!

Remember registration closes on the 12th of November!