Thursday, 27 January 2011

Fixing the mangled XML in the SARoNGS service

A little technical note following on from last week's posting on SARoNGS.

If you are averse to Perl, XML and regular expressions, look away now. There will be a proper R+D blog post along shortly.

Someone at a recent NGS Surgery asked for the gory technical details of how we turned the corrupted XML that was breaking the SARoNGS service into something that once again matched its cryptographic signature.

The SARoNGS web front-end at https://cts.ngs.ac.uk is written in Perl. It relies on Shibboleth to obtain user attributes from identity providers, encode them in Base64 and deliver them via a custom HTTP header called 'Shib-Attributes'.

The Apache web server eventually presents this header to our Perl code in an environment variable called HTTP_SHIB_ATTRIBUTES.
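
As a small aside, the variable name follows the usual CGI convention for request headers: upper-case the header name, turn hyphens into underscores and prefix the result with HTTP_. The sketch below only illustrates that naming rule; header_to_env_name is a hypothetical helper and is not part of the SARoNGS code.

```perl
use strict;
use warnings;

# Hypothetical helper: map an HTTP request header name to the name of the
# CGI-style environment variable that carries its value.
sub header_to_env_name {
    my ($header) = @_;
    (my $name = uc $header) =~ tr/-/_/;   # upper-case, then '-' becomes '_'
    return "HTTP_$name";
}

print header_to_env_name('Shib-Attributes'), "\n";   # prints HTTP_SHIB_ATTRIBUTES
```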

We realised that, under some circumstances, an additional set of xmlns:xs and xmlns:xsi namespace declarations was being added to the <samlp:response .. > XML tag generated by newer versions of the Shibboleth IdP.

These were always inserted at the end of the tag, just after the responseid attribute and before the final '>'. Removing them meant...
  • turning the Base64-encoded data back into XML,
  • using a Perl regular expression to remove the cruft and restore the XML to canonical form,
  • turning the correctly canonicalised XML back into Base64.
or, in Perl...
use MIME::Base64;
my $encodedData = $ENV{HTTP_SHIB_ATTRIBUTES};

...

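# Decode the Base64 header value back into the XML that was sent.
my $shibAttrXML = MIME::Base64::decode_base64($encodedData);

# Keep everything up to and including the responseid attribute ($1) and the
# closing '>' ($3); drop whatever was inserted between them ($2).
$shibAttrXML =~ s{(<saml1p:response.*?responseid="_[0-9a-f]+")(.*?)(>)}{$1$3}m;

# The empty second argument stops encode_base64 from adding line breaks.
my $encodedDataCanonical = MIME::Base64::encode_base64($shibAttrXML, '');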
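
To see the substitution at work without a live identity provider, here is a minimal, self-contained sketch. The sample tag is a made-up, heavily shortened stand-in for a real response: the issueinstant attribute, the responseid value and the <payload/> element are invented for illustration, and tag and attribute names are written in lower case as they appear in this post. strip_extra_namespaces is a hypothetical wrapper around the same regular expression.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);

# Made-up sample data (assumption): a shortened response tag, plus the extra
# namespace declarations appended after the responseid attribute.
my $head  = '<saml1p:response issueinstant="2011-01-27T00:00:00Z" responseid="_0123abcd"';
my $extra = ' xmlns:xs="http://www.w3.org/2001/XMLSchema"'
          . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"';
my $tail  = '><payload/></saml1p:response>';

my $original = $head . $tail;            # what the signature was computed over
my $mangled  = $head . $extra . $tail;   # what actually arrives

# Hypothetical wrapper: drop anything between responseid="..." and the '>'.
sub strip_extra_namespaces {
    my ($xml) = @_;
    $xml =~ s{(<saml1p:response.*?responseid="_[0-9a-f]+")(.*?)(>)}{$1$3}m;
    return $xml;
}

# Simulate the header value, then decode, clean and re-encode it.
my $header    = encode_base64($mangled, '');
my $repaired  = strip_extra_namespaces(decode_base64($header));
my $reencoded = encode_base64($repaired, '');

print $repaired eq $original ? "restored\n" : "still mangled\n";   # prints "restored"
```

Running the same function over input that is already clean leaves it unchanged, because the middle capture group simply matches nothing.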


It's a workaround, not a fix, but it is a workaround that works.
