Friday, 25 June 2010

HARC back

It is tempting, when writing a 'Research and Development' blog posting, to focus on tasks that have been completed. It is even more tempting to focus on those that have been completed successfully.

But, it is often said that...
If we knew what it was we were doing, it wouldn't be called research, would it?
(Pages scattered across The Internet attribute this quote to Einstein but are somewhat vague about when he said it and to whom.)

So this week's posting will focus on a service that we are trying to get working. Or to be more accurate, trying to get working again.

The service is HARC, the Highly Available Resource Co-allocator, and it was first deployed more than three years ago.

HARC provides a way to co-ordinate reservations of time on many separate computers and networks so they can all be used together. The technology was used by the Grid Enabled Neurosurgical Imaging Using Simulation project (aka GENIUS). You can read more about their work in the Real-time visualisation of blood flow through the brain case study on the NGS web site.

HARC uses the idea of 'Paxos consensus', which I am not going to embarrass myself by trying to explain. The distinguished computer scientist Leslie Lamport, who created it, first tried to explain it by analogy to a part-time parliament on an ancient Greek island. Apparently this confused more people than it helped, so his second attempt was called Paxos Made Simple.

As far as HARC is concerned, the important feature is that you pass on your request for time to a set of acceptors and these communicate with the resources on your behalf to make a reservation.

Through the power of Paxos, the acceptors can reliably respond to any request even if some of them disappear off the network while the request is being processed.
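The arithmetic behind that resilience can be sketched in a few lines of Python. This is only an illustration and assumes the textbook Paxos rule that a strict majority of acceptors must respond; how HARC itself is configured is not described here, and the function names are invented for the example.

```python
def majority(n_acceptors: int) -> int:
    """Smallest number of acceptors that forms a strict majority."""
    return n_acceptors // 2 + 1


def tolerated_failures(n_acceptors: int) -> int:
    """How many acceptors can vanish while a majority still remains."""
    return n_acceptors - majority(n_acceptors)


def can_decide(n_acceptors: int, n_alive: int) -> bool:
    """True if the acceptors still reachable can form a majority."""
    return n_alive >= majority(n_acceptors)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "acceptors tolerate", tolerated_failures(n), "failure(s)")
```

Under that assumption one or two acceptors tolerate no failures at all, three tolerate one and five tolerate two, which is why a dwindling pool of acceptors eventually stops being 'highly available'.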

When the NGS first deployed HARC, we used acceptors maintained and run by Louisiana State University's Centre for Computation and Technology.

These acceptors served us very well for many years but - as LSU staff moved on and the service was less heavily used - they slowly disappeared. Paxos allowed us to survive until the last acceptors started to fail. At this point, we went from a Highly Available Resource Co-allocator to a Hardly Available Resource Co-allocator.

We have to thank all those at LSU who provided the service and kept it going as long as it did but it was clear that if the NGS wanted to provide a HARC service, we needed to have our own network of acceptors.

We were fortunate that one of the NGS staff at Manchester was heavily involved in the initial HARC development. He was able to deploy a single NGS acceptor - enough to keep a service ticking over but not enough to provide all the Paxos goodness.

Oxford eResearch centre stepped in and offered to host a second acceptor. With the help of Manchester, the software was deployed to Oxford.

Both acceptors worked in isolation but, when we tried to get them to cooperate, both disappeared off the network. That does not really fit the definition of 'highly available'.

And this is why we are researching and developing.

As a first step, Manchester are deploying a pair of acceptors locally to investigate whatever weird combination of factors made it all go so horribly wrong.

While they do, Oxford are hosting a standalone acceptor - available for anyone who needs HARC. Anyone with a copy of the HARC client software can point it at the NGS acceptor by putting the following in the file:

# The global V2.0 acceptor set, good for co-allocating UK NGS,
### harc.client.acceptor.global2.vidar=
(Note that Manchester's acceptor is commented out in that fragment.)

When - and if - we solve the problem, we'll let you know.

Wednesday, 23 June 2010

Grid certificates drive you mad?

We know that certificates can be a bone of contention with our users but we have a host of ways to help you manage them.

The latest of these is the release of our new series of certificate leaflets. There are three in total, entitled:
  • applying, retrieving and renewing certificates
  • looking after your certificate
  • certificate obligations
They can all be downloaded from our poster and leaflet section of the NGS website.

As well as the leaflets we also have our Certificate Wizard which you can read more about here. We hope you find the leaflets useful and please let us know what you think of them!

Monday, 21 June 2010

Keine Hexerei

It was good to see demos at the recent Open Grid Forum (OGF) in Chicago.

There was an OpenNebula deployed platform with associated cloud storage resources. The former used OGF Open Cloud Computing Interface (OCCI), and the latter used the Storage Networking Industry Association's (SNIA) Cloud Data Management Interface (CDMI). If I understood it right. It certainly looked nice. SNIA now have a reference implementation of CDMI - open source, too.

The grid filesystem demo worked fine - until they tried to show the resilience features. But they had a recording of a successful one. Seeing is believing.

In today's griddy and cloudy world, your work is somewhere else. To demonstrate your work to your colleagues, you need to access it. But conference networks may be flaky. Your laptop may suddenly decide to upgrade or virus check itself. You may expect to have a power socket nearby but you don't, and then you run out of battery. Of course you test it the day before but when it really matters, it doesn't work.

Jens' third law of computing: "A successful demo is indistinguishable from a rigged one." This is well known, of course: Arthur C. Clarke said "any sufficiently advanced technology is indistinguishable from magic."

Of course I am not accusing the successful demonstrators of cheating. They were probably just lucky.

Back in the old days, prestidigitators had to say "Keine Hexerei, nur Behändigkeit" (no witchcraft, only dexterity), to avoid accusations of witchcraft. These days in computing demos it's the other way around: no amount of dexterity alone will make it work, you need some magic or luck as well.

All the news on the forthcoming NGS IF'10

In an extraordinary show of organisation, I have just put up the agenda for this year's NGS Innovation Forum even though it’s not until November – a whole 5 months away!

The reason for this organisation is mainly due to a wonderful programme committee for the event consisting of users, IT sys admins, Campus Champions and NGS staff members. We have put together what will hopefully be an interactive, engaging and thought provoking event where everyone will come away with some new found knowledge.

On the first day, which is primarily aimed at users, we will be showcasing some of the tools from the NGS such as the UI/WMS and Hermes. This will not be “death by PowerPoint”; instead NGS staff will demo the tools and walk you through using them from a user's point of view. We will also be having a session where we can gather feedback from delegates about the NGS and respond to your comments.

If you’d like to come away from the event with more than new found knowledge, be sure to submit a poster abstract for our poster session on the Tuesday evening. We are inviting all users to present a poster showcasing their research using the NGS and there will be a prize for the best poster as voted for by the delegates.

The second day will be aimed more at IT staff, sys admins, research computing staff etc. There will be presentations from the “coal face”, including one from an NGS member site detailing their experience of working with us. We will also be focusing on the bigger picture by showing how the NGS can facilitate collaboration between research groups both nationally and internationally. This will be followed by presentations by large European multi-institution projects. We know that allowing external people to use your resources is a big responsibility so we have dedicated a session to “keeping track” which will also allow for discussion regarding accounting in institutions.

All in all we hope you will find the event both stimulating and informative. Keep an eye on here for registration opening and hopefully we’ll see you in November!

Thursday, 17 June 2010

Versions of version control

Earlier this month, the developers at Manchester released a new version of the NGS accounting clients - with Condor and limited support for Sun Oracle Grid Engine.

Those readers who stumbled across the earlier R+D posting `Who, where and how much?' may recall that the accounting clients translate the records from local batch management systems into a format suitable for accounting on a Grid.

The code - like other software developed within the NGS - is being distributed via the NGS project at NeSCForge.

NeSCForge is a version of the GForge Collaboration Toolkit hosted at the National eScience Centre. In addition to using it to release packages, we also make some use of the revision control service.

A revision control service allows all changes to the files making up a package to be recorded in a repository - with associated log messages - so that the history of a piece of software can be tracked. When bugs bite, the ability to identify when a change was made is invaluable.

The revision control service is based around the venerable Concurrent Versions System (CVS). When CVS development started, back in 1993 according to CVS's own ChangeLog, CVS was the bees-knees if you needed to keep track of large numbers of files.

The bees-knees aren't what they used to be. Many developers have moved on from CVS to Subversion or to distributed revision control systems like GIT. We use Subversion at Leeds; my colleagues at Manchester prefer GIT.

Some NGS software development, such as that for the VDT installer, is done via the NeSCForge CVS service. A lot is done via private repositories and other revision control software.

When code is developed in private, people outside lose the ability to spot when and where the bugs appeared. If you are developing security-sensitive software which can be used by anyone, it could be considered good manners to publish the changes in a public repository.

If you use NeSCForge for that public repository, you need to feed everything to CVS.

This is easier than it sounds...

GIT comes with a git-cvsexportcommit command - that will translate GIT changes into CVS-ish commits. Subversion users can download and install svn2cvs.

We use svn2cvs at Leeds to mirror changes to X509runsetgid - mentioned in last week's 'Licensed to Grid' posting - to NeSCForge. For this to work, you need to give NeSCForge an SSH authorized key and use a somewhat verbose command line...

env CVS_RSH=ssh \
$SVNROOT/x509-setgid/trunk \ x509runsetgid

While it can be a little embarrassing to leave a record of every bug you created in the public domain - you can console yourself with the knowledge that you are doing other developers a favour. While it is good to learn from your own mistakes - it is much better to learn from somebody else's.

Tuesday, 15 June 2010

Would you like a free software trial?

If you would then it's your lucky day!

For the next 3 months, the NGS partner resource at Leeds and Schrödinger are offering NGS users a free trial of the Virtual Screening workflow from Schrödinger's structure based design tools.

Schrödinger software has been used on national and regional high performance computing centers as a way of providing access to greater computing power, particularly for applications such as high throughput virtual screening. As part of this initiative they are offering a three-month evaluation of Schrödinger’s structure based design tools to all academic users accessing the NGS resources.

You don't need to be an existing user of the software - just an NGS user. Through the trial you'll gain access to an interface for full protein and ligand preparation and high throughput access to Schrödinger’s Glide docking application for screening of large datasets via your existing NGS access.

If you are interested in taking advantage of this offer, please see the news item on the NGS website.

Friday, 11 June 2010

Licensed to grid

Software licensing: where legalese and technical gobbledygook meet.

When you hit the 'I agree' button - after carefully reading the license terms, naturally - you are promising the software vendor that you will look after their little bundle of binary and not let it fall into the wrong hands.

It is not always easy to identify whose hands are the wrong ones. Licenses can cover individuals or research groups, users of a particular host or whole institutions and, sometimes, national and international collaborations. They do not usually cover whole grids.

So in addition to the technical issues, you need to keep licensing in mind if you make a resource available to people outside your research group or institution. This posting will describe some of the techniques developed by the NGS partners to cope with the complicated business of licensing on a grid.

Let us start with the simplest problem. Free software is comparatively painless as long as 'free' means open-source licensed. Examples of this kind of package from the list of applications available on the NGS partner sites include GROMACS and AUTODOCK 4.

Commercial software will typically enforce license rules using something like Flexera Software's FlexNet Publisher - what many system administrators still refer to as FlexLM. FlexNet is frequently used to provide floating licenses: where a license service can be configured to hand out licenses from a pool.

And sitting awkwardly in the middle are those applications that use what could be called honesty box licensing. There are no technical barriers preventing the code from running but users must agree to license terms before they are allowed to run it.

Honesty box licensing covers packages such as Amber, Castep, DL_POLY, GAMESS (US) and PC-GAMESS/Firefly. Academics can typically obtain the rights to use these packages for a comparatively small fee - or even for nothing - as long as they are used for academic research. Licensing is seen as a means of tracking users or protecting intellectual property rather than as a way of making money.

NGS partner sites have ways of providing access both to FlexNet licensed software and to software that relies on the honesty box approach.

Where FlexNet is used, sites have installed the software but do not provide a valid local license.

These packages are aimed at users with access to floating licenses and their own FlexLM license servers. The users need to arrange their local firewalls so that this license server is accessible from the NGS partner site. FlexNet generally allows the location of the license to be specified via an environment variable such as LM_LICENSE_FILE.
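As a rough illustration of the environment variable approach, the Python sketch below builds an environment that points an application at a remote license server. It assumes the common FlexNet 'port@host' form for LM_LICENSE_FILE; the host name, the port and the application name are made up for the example, and your vendor's documentation has the final say.

```python
import os


def flexnet_environment(host, port, base_env=None):
    """Return a copy of the environment with LM_LICENSE_FILE set.

    Uses the 'port@host' form; host and port are whatever your own
    license server uses (the values below are placeholders).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LM_LICENSE_FILE"] = f"{port}@{host}"
    return env


if __name__ == "__main__":
    env = flexnet_environment("licenses.example.ac.uk", 27000)
    print(env["LM_LICENSE_FILE"])
    # The licensed program would then be started with this environment, e.g.
    # subprocess.run(["some_licensed_app", "input.dat"], env=env)
    # where "some_licensed_app" is a hypothetical command name.
```

In a batch job the same effect is usually achieved by exporting the variable in the job script before the application starts.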

NOTE: If you are thinking of using this approach, you need to confirm that the legal bits of the license allow it.

Honesty box licenses are usually managed by restricting access to the software to a particular group of users. Only accounts belonging to users who are known to have signed the license are added to the group.

This is where The Grid adds an extra layer of complexity. Many sites automatically allocate an account only on the first time that user's certificate is seen. You cannot be in the group if you do not have an account.

Hopefully, any nastiness is hidden from the license holder. If you are a license holder, simply contact the helpdesk after your first use of a resource and ask to be granted access to the application.

Whoever answers the request will need to confirm that you are a legitimate license holder. This will usually be nothing more than a short email exchange but can be more drawn out for more commercially sensitive packages.

Neither the NGS staff nor the user wants to go through this process more than once so we need to record who the known licensees are.

As we are virtually organised, we add the licensees to a special virtual organisation and assign them to groups representing particular applications. Site admins can download the group membership from the VO and use this to control local group membership.
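The bookkeeping a site has to do can be sketched as follows. This is a minimal Python illustration, not any site's actual tooling: it assumes the VO hands back a list of certificate DNs for an application group and that the site already has a mapping from DNs to local accounts. All names and DNs are hypothetical.

```python
def local_group_members(vo_group_dns, dn_to_account):
    """Split VO group members into local accounts and DNs still waiting.

    vo_group_dns  - certificate DNs listed in the VO group for an application
    dn_to_account - site mapping from DN to local account name
    Returns (accounts to put in the Unix group, DNs with no account yet).
    """
    accounts = sorted({dn_to_account[dn] for dn in vo_group_dns
                       if dn in dn_to_account})
    pending = sorted(dn for dn in vo_group_dns if dn not in dn_to_account)
    return accounts, pending


if __name__ == "__main__":
    vo_group = ["/C=UK/O=eScience/OU=Leeds/CN=a user",
                "/C=UK/O=eScience/OU=Oxford/CN=another user"]
    site_map = {"/C=UK/O=eScience/OU=Leeds/CN=a user": "ngs0042"}
    print(local_group_members(vo_group, site_map))
```

The 'pending' list captures the wrinkle mentioned above: a licensee whose certificate has not yet been seen at the site has no account, so cannot be added to the group until after their first use.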

Sites can update the groups by whatever mechanism best fits their systems. At Leeds, we use a locally written tool called x509runsetgid.

X509runsetgid is available from the NGS area at NeSCForge and uses the Unix set-group-id or setgid mechanism. The tool will launch a program as if owned by a particular group only if the user presents a certificate that is recognised as part of that group. The list of users is usually downloaded from the VO.

The set-group-id approach is not without problems. The major one is that proxy certificates typically last 12 hours, so if the queues are long and the job takes a while to start running, the proxy is no longer valid by the time it is checked.
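The timing problem is simple arithmetic, sketched here in Python. The 12 hour lifetime comes from the paragraph above; the function names and the example times are invented for illustration and this is not how x509runsetgid itself performs the check.

```python
from datetime import datetime, timedelta

PROXY_LIFETIME = timedelta(hours=12)  # typical lifetime quoted above


def proxy_valid_at(check_time, proxy_created, lifetime=PROXY_LIFETIME):
    """True if a proxy created at proxy_created is still valid at check_time."""
    return proxy_created <= check_time < proxy_created + lifetime


def survives_queue(proxy_created, submitted, queue_wait,
                   lifetime=PROXY_LIFETIME):
    """True if the proxy is still valid when the job leaves the queue."""
    return proxy_valid_at(submitted + queue_wait, proxy_created, lifetime)


if __name__ == "__main__":
    created = datetime(2010, 6, 11, 9, 0)
    submitted = datetime(2010, 6, 11, 9, 30)
    print(survives_queue(created, submitted, timedelta(hours=2)))
    print(survives_queue(created, submitted, timedelta(hours=14)))
```

A job that waits two hours is checked while the proxy is still good; one that waits fourteen hours is checked after the proxy has expired and would be refused.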

We are improving the way we support the VO as part of the NGS's Research and Development work.

The VO has been maintained manually: a rather slow and painstaking business.

Over the last few weeks, the database developers at Manchester have added tagging to the NGS User Accounts Service. A tag is simply a label that can be associated with a group of users, a virtual organisation and a role or group within that virtual organisation.

Development work is underway to automatically update the membership of selected virtual organisations, including from the information in the tags.
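One way to picture the tagging idea is the Python sketch below. It is purely illustrative: the real NGS User Accounts Service schema is not described here, so the Tag fields, the function name and the sample data are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A label tied to a virtual organisation and a group within it."""
    label: str
    vo: str
    group: str


def vo_memberships(tags, tagged_users):
    """Map (vo, group) to the sorted DNs of users carrying a matching tag.

    tagged_users maps a user's certificate DN to the labels on that user.
    Labels with no known Tag are ignored.
    """
    by_label = {tag.label: tag for tag in tags}
    members = defaultdict(set)
    for dn, labels in tagged_users.items():
        for label in labels:
            tag = by_label.get(label)
            if tag is not None:
                members[(tag.vo, tag.group)].add(dn)
    return {key: sorted(dns) for key, dns in members.items()}


if __name__ == "__main__":
    tags = [Tag("amber-licensee", "licensed.example", "amber")]
    users = {"/CN=a user": ["amber-licensee"], "/CN=another user": []}
    print(vo_memberships(tags, users))
```

Once membership can be derived from tags like this, keeping the VO up to date becomes a matter of rerunning the derivation rather than editing lists by hand.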

We will, of course, be making as much of the software as we can available.... under an open-source license.

Thursday, 10 June 2010

New on the website!

I've been updating the NGS website today with a couple of new things. Well, I've actually updated it several times this week - there seems to be a lot going on! I thought I would point the new additions out on here just in case you miss them as there is so much being added at the moment.

We've got a new poll up on the website asking if you would like the NGS to provide an academic cloud service. Now we've all heard a lot about clouds over the last few months as it seems to be the buzzword of the moment but do you really want one? Yes, no, not sure? All answers are welcome and it will take less than 5 secs to record your opinion. See the home page of the NGS website to vote!

I also finished off the 11th edition (yes really...) of the NGS quarterly newsletter this week and it's now up on the website. There is a wide range of articles in the newsletter covering topics such as the EGI and the role the NGS is playing in this, how the NGS is demonstrating global interoperability, a user case study showing how researchers actually use our resources, an introduction to our University of Sheffield member site and much more. It never fails to surprise me just how much is going on at the NGS and how much we have to report back to our users through our newsletter each quarter.

I'm already planning the next edition of the newsletter which will be out in time for AHM so if you have any ideas for articles then please let me know!

Tuesday, 8 June 2010

Gearing up for AHM

Yes, I know it may seem like a long way away but the preparation for this year's All Hands Meeting has already started at the NGS. The call for papers has been out for a while and eventually (after the 2nd extension) closed on Monday. Registration was opened last Wednesday so if you feel like being organised you can register now! I imagine that a lot of people will be waiting to see if their papers are accepted before registering.

There will be a good turn out from the NGS as always with several papers submitted from our team and an exhibition stand publicising the NGS and the NGI in conjunction with GridPP. I'm not sure what we'll be demo-ing on our stand yet but I'm sure it will be something exciting.

As many people know the AHM unfortunately clashes with the inaugural EGI meeting in Amsterdam but all is not lost as there are direct flights several times a day from Cardiff to Amsterdam so it may be possible to attend both meetings!

I'm just glad that the AHM has moved back to September as it means that I have the quieter months of July and August to get everything ready for the event!

Friday, 4 June 2010

Too much of an adequate thing

There are two quotes that sum up much of the technology behind grid computing.

The first...

All problems in computer science can be solved by another level of indirection... Except for the problem of too many layers of indirection.

has been variously attributed to the computer scientists Butler Lampson and David Wheeler. In the grid world, indirection is spelled M-I-D-D-L-E-W-A-R-E.

The other?

XML is like violence – if it doesn’t solve your problems, you are not using enough of it.

Variations on this can be found scattered around the internet because XML - the eXtensible Markup Language - is everywhere. Sometimes because it is the right tool for the job, frequently because it is an adequate tool for the job.

It is the right tool for the job when that job is marking up - superimposing a structure on what would otherwise be a stream of text.

Scholars in digital humanities use XML schema from the Text Encoding Initiative - and elsewhere - with material as diverse as Chaucer's Canterbury Tales, the Concise Icelandic-English Dictionary and the writings of Mark Twain.

XML parsers such as Apache Xerces can read this data and store it in XML-aware databases such as eXist. These databases can be queried on both their content and their structure using technology such as XQuery and XPath. The tools are complicated to learn but we have the excuse that they are doing a complicated task.

But, as anyone who has experience of modern software will tell you, XML is used for far more than marking up text. It is used as the basis for the SOAP protocol and the Web Services Resource Framework used by the Globus Toolkit and UNICORE.

Pragmatism helped spread XML. It was good enough. It could be understood by looking at it. There are libraries available to generate and process it. Modern development environments such as Eclipse and NetBeans can edit it without tripping over the angle-brackets.

XML is designed to be flexible enough to cope with ambiguous human text. Computers don't cope well with ambiguity and - as NGS staff discovered during the roll out of the SARoNGS service - XML can be ambiguous in very inventive ways.

SARoNGS, for those who have yet to meet it, is a way of generating restricted certificates for users who can present institutional usernames and passwords. It was intended as a lightweight alternative to applying for a full e-Science certificate.

SARoNGS uses the Shibboleth service managed by the UK Access Management Federation to authenticate users. Shibboleth is designed to authenticate access to web sites and is widely used as a way of accessing academic journals. It is built on SOAP. SOAP is built on XML.

Shibboleth is designed so that the user only ever enters his or her username into a web-page provided by his or her home institution. This is called the Identity Provider or idP. If you try to connect to a Shibboleth-protected service, you will eventually be sent to your local idP.

The idP confirms that you are indeed a fine, upstanding member of the University of Whatever in two distinct messages - known as assertions - one delivered via your web browser, one delivered directly to the Shibboleth-protected service. Think of it as a way of linking your browser, your home institution and the service together.

These messages are heavily disguised XML.

When users from Leeds and some other large Universities tried to use the service - they failed. SARoNGS would see nothing.

Further investigation showed that both messages were being sent but one was being rejected on arrival.

At the time SARoNGS was the only service within the Access Management Federation that insisted on assertions being digitally signed.

Because there are many ways of formatting the same information in XML, signing a lump of XML involves first converting it to a canonical form by - for example - replacing certain character sequences by their canonical counterparts. This is further complicated by XML Namespaces, which are designed to allow different XML dialects to be used in the same document.

The SARoNGS developers eventually tracked down a small bug - known as JXT-55 - in an XML toolkit used by some versions of Shibboleth. JXT-55 had inadvertently squeezed two different versions of an XML dialect called the Security Assertion Markup Language into the same namespace.

Some time between the XML being canonicalised for signing and being sent, an XML 'cleanup' routine stripped out what looked to it like duplicate data. This invalidated the signature and the message was rejected.
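The effect can be reproduced in miniature with Python's standard library. This is a toy sketch, not the Shibboleth code path: a SHA-256 digest stands in for the real digital signature, the element and attribute names are invented, and canonicalize() needs Python 3.8 or later.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def digest(xml_text):
    """Canonicalise the XML, then hash it (a stand-in for signing)."""
    canonical = canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same information written two different ways...
as_signed = '<assertion issuer="idp" id="42"><subject/></assertion>'
reformatted = "<assertion id='42' issuer='idp'><subject></subject></assertion>"
# ...and a version where a 'cleanup' has quietly dropped something.
cleaned_up = '<assertion id="42"><subject/></assertion>'

if __name__ == "__main__":
    print(digest(as_signed) == digest(reformatted))
    print(digest(as_signed) == digest(cleaned_up))
```

Reformatting alone leaves the digest unchanged because both versions share a canonical form. Removing content after the signature has been calculated does not, and the receiver has no choice but to reject the message.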

Users in institutions running older Shibboleth services - which included the buggy code - could not use SARoNGS; those running newer ones could.

Discovering this took several weeks of effort and frustration by half-a-dozen people at 4 universities.

XML is like violence in one sense - it can be a source of a great deal of pain.

[Edit: 6-Jun to provide more details of The Little Bug that broke SARoNGS]

Thursday, 3 June 2010

Free places at the TransferSummit

Thanks to colleagues at OSS Watch there are a couple of free places for academics to attend this year's TransferSummit.

If you haven't been before (or heard of it!), it's an opportunity for academics, the research community and business executives to talk about the requirements, challenges, and opportunities in the use, development, licensing, and future of Open Source technology.

This isn't your usual "death by PowerPoint" conference either. There will be three tracks, a Gala Dinner and a BarCamp over the 2 days ensuring that everyone has a good chance to contribute to the discussion.

If you'd like to attend then please email OSS Watch to obtain a discount code.

Tuesday, 1 June 2010

NGS Innovation Forum 2010 announced!

We have just announced the dates of this year's NGS Innovation Forum which will be held at STFC RAL on the 23rd - 24th of November.

We have had two very successful events in Manchester and London so far with over 250 people attending. We hope to make the third event even better still with new opportunities for participants to be directly involved.

Again the event will be aimed at users, IT staff, researchers and others who are interested in the NGS and what it does. Participants have the option of attending for one or two days and accommodation is available on-site.

Watch this space for more details to be announced shortly regarding the programme but in the meantime make sure you put the date in your diary!