I can't say that I wasn't warned.
- accepting work requests from a Workload Management Server.
- passing them on to HPC systems, which may be running Sun/Oracle Grid Engine, Torque/PBS or SLURM batch systems. (Leeds is using Grid Engine but, if we are successful, it could be rolled out to other institutions.)
The EMI are in an uncomfortable position. Their job is to take pieces of software from different places - similar in intent but very different in design - and persuade them to work together. Sometimes, inevitably, things fall through the gaps.
This is how it is meant to work...
- Information is kept in a BDII-friendly database and made available to the world via the LDAP protocol through an OpenLDAP 'slapd' service.
- On any given system, this information is generated as a set of LDAP 'LDIF' format records by programs called providers and plugins.
- A program called bdii-update takes the locally generated LDIF, processes it and passes it on to slapd.
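To make the intended flow concrete, here is a minimal sketch in Python of the collate step: LDIF text from several providers is parsed into entries and merged into one set. This is not the real bdii-update code. The function names `parse_ldif` and `collate` are made up for illustration, the sample DN is only an example, and the parser deliberately ignores LDIF features such as base64 values and folded lines.

```python
def parse_ldif(text):
    """Parse simple LDIF text into {dn: {attribute: [values]}}.

    Simplified on purpose: comment lines are skipped, and base64 ("::")
    values and folded continuation lines are not handled.
    """
    entries = {}
    for record in text.strip().split("\n\n"):
        dn = None
        attrs = {}
        for line in record.splitlines():
            if not line.strip() or line.startswith("#"):
                continue
            name, _, value = line.partition(":")
            name, value = name.strip(), value.strip()
            if name.lower() == "dn":
                dn = value
            else:
                attrs.setdefault(name, []).append(value)
        if dn is not None:
            entries[dn] = attrs
    return entries


def collate(provider_outputs):
    """Merge the LDIF produced by several providers (hypothetical helper).

    Later providers overwrite earlier ones when they emit the same DN.
    """
    merged = {}
    for text in provider_outputs:
        merged.update(parse_ldif(text))
    return merged


# Illustrative provider output; the DN and attribute values are examples only.
sample = """dn: Mds-Vo-name=local,o=grid
objectClass: Mds
objectClass: MdsVo
Mds-Vo-name: local
"""

collated = collate([sample])
```

In the real system the collated result would then be handed to slapd, which checks every entry against its loaded schema before accepting it.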
What was actually happening...
- The ARC information system was generating lots of LDIF.
- The bdii-update process was collating it and passing it on to slapd.
- slapd was refusing to accept it - complaining of an 'Object class violation'.
After digging into the inner workings of both the BDII and ARC, we've identified the cause. It is all down to a subtle difference between what Nordugrid expect and what gLite expect from their information services.
From this point on, this is going to be technical. Readers of a less geeky disposition can look away now, happy in the knowledge that we know what broke and how to fix it.
Geeks, grab your acronyms. Here we go...
Slapd relies on schema files to define what is acceptable: Nordugrid have their own Scandinavian-style nordugrid.schema; gLite use the GLUE schema, including one called Glue-MDS.
Glue-MDS and nordugrid.schema both define an objectClass called 'Mds'. Both agree that it represents a collection of information but in GLUE, an Mds is defined as a STRUCTURAL class whereas Nordugrid defines it as an ABSTRACT class.
So what... as anyone who managed to make it this far down the page might cry.
Well, in the LDAP world, STRUCTURAL objects can exist in their own right, whereas ABSTRACT classes can only be used as a basis upon which other objects can be defined. It's all very Object-Oriented-Programming.
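For anyone who prefers their analogies executable, the same distinction can be sketched with Python's abstract base classes. This is only an analogy, not how slapd works internally: the class names are borrowed from the schemas, while the `vo_name` method and the `try_create` helper are invented for the example.

```python
from abc import ABC, abstractmethod


class Mds(ABC):
    """Plays the part of an ABSTRACT object class: a template only."""

    @abstractmethod
    def vo_name(self):
        ...


class MdsVo(Mds):
    """Plays the part of a STRUCTURAL class derived from Mds."""

    def __init__(self, name):
        self.name = name

    def vo_name(self):
        return self.name


def try_create(cls, *args):
    """Return an instance, or None if the class cannot be instantiated."""
    try:
        return cls(*args)
    except TypeError:
        return None


plain_mds = try_create(Mds)        # refused: abstract classes cannot exist alone
vo = try_create(MdsVo, "local")    # fine: a concrete class built on Mds
```

Python refuses to build a bare `Mds` in much the same spirit that a directory server refuses an entry whose only object class is ABSTRACT.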
ARC's information service generates 'MdsVo' objects, based on Mds objects, but properly STRUCTURAL. This is fine according to the nordugrid schema.
But bdii-update contained code that takes any object that is based on an Mds object and turns it into a plain, simple self-contained Mds object. This is closer to what GLUE expects.
Slapd gets very confused.
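The clash can be illustrated with a toy check, assuming slapd has loaded the Nordugrid definitions, in which Mds is ABSTRACT and MdsVo is STRUCTURAL. Nothing here is the real bdii-update or slapd code: `flatten_to_mds` is a hypothetical stand-in for the rewriting step, and `is_acceptable` reduces schema checking to the single rule that matters for this story.

```python
# Toy view of the Nordugrid definitions described in the text (kinds only).
NORDUGRID_KINDS = {"Mds": "ABSTRACT", "MdsVo": "STRUCTURAL"}


def flatten_to_mds(object_classes):
    """Hypothetical stand-in for the rewriting step, not the real code.

    Any entry based on Mds is reduced to a plain Mds entry.
    """
    if "Mds" in object_classes:
        return ["Mds"]
    return list(object_classes)


def is_acceptable(object_classes, kinds=NORDUGRID_KINDS):
    """Simplified rule: an entry needs at least one STRUCTURAL class."""
    return any(kinds.get(c) == "STRUCTURAL" for c in object_classes)


original = ["Mds", "MdsVo"]            # what ARC generates
rewritten = flatten_to_mds(original)   # what would reach slapd after rewriting

original_ok = is_acceptable(original)
rewritten_ok = is_acceptable(rewritten)
```

Under these assumptions the original entry passes and the rewritten one does not, which matches the 'Object class violation' that slapd reported.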
A bug report has been raised and, after a bit of bug ping pong between the BDII and ARC developers, it has been decided that bdii-update should, in future, leave Mds objects alone. For the moment, all that is needed is to remove the line in bdii-update that reads
new_ldif = fix(new_dns, new_ldif)