Friday, 4 June 2010

Too much of an adequate thing

There are two quotes that sum up much of the technology behind grid computing.

The first...

All problems in computer science can be solved by another level of indirection... Except for the problem of too many layers of indirection.

has been variously attributed to the computer scientists Butler Lampson and David Wheeler. In the grid world, indirection is spelled M-I-D-D-L-E-W-A-R-E.

The other?

XML is like violence – if it doesn’t solve your problems, you are not using enough of it.

Variations on this can be found scattered around the internet because XML - the eXtensible Markup Language - is everywhere. Sometimes because it is the right tool for the job, frequently because it is an adequate tool for the job.

It is the right tool for the job when that job is marking up - superimposing a structure on what would otherwise be a stream of text.

Scholars in digital humanities use XML schemas from the Text Encoding Initiative - and elsewhere - with material as diverse as Chaucer's Canterbury Tales, the Concise Icelandic-English Dictionary and the writings of Mark Twain.

XML parsers such as Apache Xerces can read this data and store it in XML-aware databases such as eXist. These databases can be queried on both their content and their structure using technology such as XQuery and XPath. The tools are complicated to learn but we have the excuse that they are doing a complicated task.
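To give a flavour of what querying on both content and structure means, here is a minimal sketch in Python. It uses the standard library's ElementTree, which supports only a small subset of XPath, rather than Xerces, eXist or XQuery. The markup is a made-up, TEI-flavoured fragment, so the element and attribute names should be read as illustrative assumptions, not as real TEI.

```python
import xml.etree.ElementTree as ET

# A made-up, TEI-flavoured fragment; element names are illustrative only.
DOC = """
<text>
  <lg type="couplet">
    <l n="1">Whan that Aprill with his shoures soote</l>
    <l n="2">The droghte of March hath perced to the roote</l>
  </lg>
  <p>A prose paragraph that also mentions March.</p>
</text>
"""

root = ET.fromstring(DOC)

# Query on structure: every verse line inside a line group.
verse_lines = root.findall("./lg/l")

# Query on structure plus an attribute: the verse line numbered 2.
second = root.find("./lg/l[@n='2']")

# Query on content: any element, verse or prose, whose own text mentions March.
mentions_march = [el.tag for el in root.iter() if el.text and "March" in el.text]

print(len(verse_lines), second.text, mentions_march)
```

The structural queries only work because the markup says which bits of text are verse lines and which are prose. Without the markup there is nothing to ask about except the words.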

But, as anyone who has experience of modern software will tell you, XML is used for far more than marking up text. It is used as the basis for the SOAP protocol and the Web Services Resource Framework used by the Globus Toolkit and UNICORE.

Pragmatism helped spread XML. It was good enough. It could be understood by looking at it. There are libraries available to generate and process it. Modern development environments such as Eclipse and NetBeans can edit it without tripping over the angle-brackets.

XML is designed to be flexible enough to cope with ambiguous human text. Computers don't cope well with ambiguity and - as NGS staff discovered during the roll out of the SARoNGS service - XML can be ambiguous in very inventive ways.

SARoNGS, for those who have yet to meet it, is a way of generating restricted certificates for users who can present institutional usernames and passwords. It was intended as a lightweight alternative to applying for a full e-Science certificate.

SARoNGS uses the Shibboleth service managed by the UK Access Management Federation to authenticate users. Shibboleth is designed to authenticate access to web sites and is widely used as a way of accessing academic journals. It is built on SOAP. SOAP is built on XML.

Shibboleth is designed so that the user only ever enters his or her username into a web-page provided by his or her home institution. This is called the Identity Provider or IdP. If you try to connect to a Shibboleth-protected service, you will eventually be sent to your local IdP.

The IdP confirms that you are indeed a fine, upstanding member of the University of Whatever in two distinct messages - known as assertions - one delivered via your web browser, one delivered directly to the Shibboleth-protected service. Think of it as a way of linking your browser, your home institution and the service together.

These messages are heavily disguised XML.

When users from Leeds and some other large universities tried to use the service, they failed. SARoNGS would see nothing.

Further investigation showed that both messages were being sent but one was being rejected on arrival.

At the time SARoNGS was the only service within the Access Management Federation that insisted on assertions being digitally signed.

Because there are many ways of formatting the same information in XML, signing a lump of XML involves first converting it to a canonical form by - for example - replacing certain character sequences by their canonical counterparts. This is further complicated by XML namespaces, which are designed to allow different XML dialects to be used in the same document.
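A rough illustration of why canonicalisation has to come before signing, again in Python. Two caveats: `xml.etree.ElementTree.canonicalize` (Python 3.8 and later) implements the C14N 2.0 flavour of canonical XML, which is not necessarily the one a given Shibboleth deployment uses, and a bare SHA-256 digest stands in here for a real digital signature. The `<user>` element is invented for the example.

```python
import hashlib
import xml.etree.ElementTree as ET

def digest(text: str) -> str:
    """SHA-256 of the text; a stand-in for the digest step of a real signature."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The same information written two ways: attribute order, quote style,
# spacing inside the tag and the form of the empty element all differ.
a = "<user name=\"alice\"   org='leeds'><role/></user>"
b = "<user org=\"leeds\" name=\"alice\"><role></role></user>"

# Hashing the raw text gives two different answers...
raw_digests_match = digest(a) == digest(b)

# ...whereas hashing the canonical form gives the same answer for both.
canonical_digests_match = digest(ET.canonicalize(a)) == digest(ET.canonicalize(b))

print(raw_digests_match, canonical_digests_match)
```

The signer and the verifier both have to arrive at exactly the same canonical bytes. Anything that changes those bytes in transit breaks the match.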

The SARoNGS developers eventually tracked down a small bug - known as JXT-55 - in an XML toolkit used by some versions of Shibboleth. JXT-55 had inadvertently squeezed two different versions of an XML dialect called the Security Assertion Markup Language into the same namespace.
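This is not a reproduction of JXT-55, whose internals are not described here. It is only a sketch of why namespaces matter: an XML parser tells two same-named elements apart purely by the namespace each belongs to. The namespace URIs below are the published ones for SAML 1.x and SAML 2.0 assertions; the `<msg>` wrapper, the prefixes and the helper function are made up for the example.

```python
import xml.etree.ElementTree as ET

SAML1 = "urn:oasis:names:tc:SAML:1.0:assertion"
SAML2 = "urn:oasis:names:tc:SAML:2.0:assertion"

def child_tags(xml_text: str) -> list:
    """Return the namespace-qualified tag of each child of the root element."""
    return [child.tag for child in ET.fromstring(xml_text)]

# Two dialects kept apart: each prefix is bound to its own namespace.
separate = f'<msg xmlns:a="{SAML1}" xmlns:b="{SAML2}"><a:Assertion/><b:Assertion/></msg>'

# Two dialects squeezed together: both prefixes are bound to the same namespace.
collapsed = f'<msg xmlns:a="{SAML1}" xmlns:b="{SAML1}"><a:Assertion/><b:Assertion/></msg>'

separate_tags = child_tags(separate)
collapsed_tags = child_tags(collapsed)

print(separate_tags)
print(collapsed_tags)
```

In the second document the parser has no way of knowing that the two elements were ever meant to be different things.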

Some time between the XML being canonicalised for signing and being sent, an XML 'cleanup' routine stripped out what looked to it like duplicate data. This invalidated the signature and the message was rejected.
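The failure mode can be imitated with a toy. In the sketch below everything is an assumption made for illustration: the `cleanup` routine is hypothetical, the `<assertion>` markup is invented, and an HMAC with a shared key stands in for the public-key signature a real IdP would use. The sequence of events is the one described above: canonicalise and sign, tidy up, send, fail to verify.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

KEY = b"hypothetical-shared-key"  # stand-in for a real signing key

def sign(xml_text: str) -> str:
    """Canonicalise the XML, then compute an HMAC over the canonical bytes."""
    canonical = ET.canonicalize(xml_text)
    return hmac.new(KEY, canonical.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(xml_text: str, signature: str) -> bool:
    """Re-canonicalise what arrived and compare against the signature."""
    return hmac.compare_digest(sign(xml_text), signature)

def cleanup(xml_text: str) -> str:
    """Hypothetical tidy-up pass: drop any child that repeats its predecessor."""
    root = ET.fromstring(xml_text)
    previous = None
    for child in list(root):
        current = ET.tostring(child)
        if current == previous:
            root.remove(child)
        previous = current
    return ET.tostring(root, encoding="unicode")

assertion = (
    '<assertion><attr name="role">staff</attr>'
    '<attr name="role">staff</attr></assertion>'
)

signature = sign(assertion)     # signed at the sending end
delivered = cleanup(assertion)  # "duplicate" removed after signing

ok_before = verify(assertion, signature)
ok_after = verify(delivered, signature)

print(ok_before, ok_after)
```

The cleanup looks harmless if you only care about meaning. The verifier cares about bytes, and the bytes are no longer the ones that were signed.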

Users in institutions running older Shibboleth services - which included the buggy code - could not use SARoNGS; those running newer ones could.

Discovering this took several weeks of effort and frustration by half-a-dozen people at four universities.

XML is like violence in one sense - it can be a source of a great deal of pain.

[Edit: 6-Jun to provide more details of The Little Bug that broke SARoNGS]
