We hope to fill the gap left when the last of our NGS-funded clusters was turned off back in April. Our main requirement was that the grid front end should be completely separate from the HPC service. In addition, we wanted...
- ... to use software provided by the EMI-1 software release from the European Middleware Initiative.
- ... to publish information that can be read into the NGS's central information service.
- ... to accept requests from the NGS's workload management system.
Unfortunately, EMI-1 was missing the components needed to make CREAM work with the SGE batch system used locally. The only software within EMI-1 that was SGE-friendly was Nordugrid's Advanced Resource Connector - ARC.
After a few months of work, and in the great tradition of the grid, it is sort-of-kind-of-working-after-a-fashion. At the moment:
- ARC's compute service - A-REX - is accepting jobs, but only for a very limited set of users and not from the workload management system.
- ARC's information provider - ARIS - is publishing information about the system and this information is making its way to the NGS's BDII.
A bit of background. The NGS information service is a Berkeley Database Information Index service or BDII. BDIIs are built to collate information, some of which comes from other BDIIs. The NGS's central BDII, for example, collates information published by a BDII, or something that looks like a BDII, at each of the partner sites.
ARIS can do an impression of a BDII. Whether it is a convincing impression depends on what it is talking to.
ARIS produces information in its own Nordic-accented schema, designed to feed the ARC tools. This needs to be translated into GLUE format before a BDII will give it a second glance.
Based on documentation on linking ARC and EGI from Nordugrid, this can all be done via a single ARC configuration file called /etc/arc.conf. arc.conf consists of blocks, each introduced by a [name in square brackets] and containing a set of name=value definitions.
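To make the block structure concrete, here is a minimal sketch of a reader for that layout. It is not ARC's own parser: the function name parse_arc_conf is ours, and we are assuming that '#' starts a comment, that values may be wrapped in double quotes, and that a repeated name simply overwrites the earlier one (the real format may treat repeats differently).

```python
def parse_arc_conf(text):
    """Parse arc.conf-style text into {block_name: {name: value}}.

    Assumptions (ours, not taken from the ARC documentation):
    '#' starts a comment line, and a repeated name overwrites the earlier one.
    """
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # a new block: everything between the square brackets is its name
            current = blocks.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            # drop optional surrounding double quotes from the value
            current[name.strip()] = value.strip().strip('"')
    return blocks


# A cut-down sample; a real arc.conf has many more blocks and settings.
SAMPLE = """
[infosys]
infosys_glue12=enable

[infosys/glue12]
glue_site_unique_id="NGS-LEEDS"
"""

conf = parse_arc_conf(SAMPLE)
print(conf["infosys"]["infosys_glue12"])               # enable
print(conf["infosys/glue12"]["glue_site_unique_id"])   # NGS-LEEDS
```

Something like this is handy for a quick sanity check that a setting has ended up in the block you meant it to be in.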
arc.conf needs to be tweaked in three places.
Turn on publishing of Glue 1.2 format information - which is close enough to the current common Glue version 1.3 - by adding the following to the '[infosys]' block:
[infosys]
...
infosys_compat=disable
infosys_nordugrid=enable
infosys_glue12=enable
Add in anything that Glue needs and ARIS does not via the '[infosys/glue12]' block:
[infosys/glue12]
glue_site_unique_id="NGS-LEEDS"
...
provide_glue_site_info=true
And finally, arrange for ARIS to collect its own output and present it as if it were a site BDII by adding a block called '[infosys/site/NGS-LEEDS]':
[infosys/site/NGS-LEEDS]
unique_id=NGS-LEEDS
url=ldap://ngs.arc1.leeds.ac.uk:2135/mds-vo-name=resource,o=grid
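One way to see what is actually being published at that URL is to point a standard LDAP client at it. The sketch below only builds the command line. It assumes the OpenLDAP ldapsearch tool is installed and that the server accepts anonymous simple binds, neither of which we have confirmed here. The helper name ldapsearch_command is ours.

```python
import shlex
from urllib.parse import urlparse


def ldapsearch_command(url, ldap_filter="(objectClass=*)"):
    """Build an ldapsearch argument list for an ldap://host:port/base-dn URL."""
    parsed = urlparse(url)
    base_dn = parsed.path.lstrip("/")  # everything after the first '/' is the search base
    server = f"ldap://{parsed.hostname}:{parsed.port}"
    # -x: simple (anonymous) bind, -LLL: plain LDIF output,
    # -H: server URI, -b: search base
    return ["ldapsearch", "-x", "-LLL", "-H", server, "-b", base_dn, ldap_filter]


URL = "ldap://ngs.arc1.leeds.ac.uk:2135/mds-vo-name=resource,o=grid"
cmd = ldapsearch_command(URL)
print(shlex.join(cmd))  # a shell-quoted line that can be pasted into a terminal
```

Passing the list to subprocess.run would run the query directly; printing it first makes it easier to see exactly what is being asked of the server.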
Our initial experiments suggest that the information produced by ARIS is good enough to be accepted by the NGS's central BDII but not good enough to fool our Nagios monitoring.
WLCG Nagios includes a number of BDII-specific tests, including one called org.bdii.Entries. org.bdii.Entries only looks for 'services' - or more accurately objects of the 'GlueService' type. While ARIS generates a lot of information, none of it describes a GlueService.
What we don't yet know is whether Nagios is being picky, or whether the existence of a GlueService is vital for some bit of grid wizardry.
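A rough way to reproduce that observation by hand is to save the LDIF output of a query and count the entries that declare a given objectClass. This is only our approximation of the idea, not the logic of the actual org.bdii.Entries probe. The function name and the sample entries are made up for illustration and are not real output from our server.

```python
def count_object_class(ldif_text, object_class="GlueService"):
    """Count LDIF entries that declare the given objectClass (case-insensitive)."""
    wanted = object_class.lower()
    count = 0
    # LDIF entries are separated by blank lines
    for entry in ldif_text.strip().split("\n\n"):
        for line in entry.splitlines():
            name, sep, value = line.partition(":")
            if sep and name.strip().lower() == "objectclass" and value.strip().lower() == wanted:
                count += 1
                break  # count each entry at most once
    return count


# Hypothetical LDIF, invented for this example: a site and a cluster, but no service.
SAMPLE_LDIF = """
dn: GlueSiteUniqueID=NGS-LEEDS,mds-vo-name=NGS-LEEDS,o=grid
objectClass: GlueSite
GlueSiteUniqueID: NGS-LEEDS

dn: GlueClusterUniqueID=example-cluster,mds-vo-name=NGS-LEEDS,o=grid
objectClass: GlueCluster
GlueClusterUniqueID: example-cluster
"""

print(count_object_class(SAMPLE_LDIF))              # 0
print(count_object_class(SAMPLE_LDIF, "GlueSite"))  # 1
```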