Monday 14 March 2011

Loaded

The most popular page on the NGS web site is the load monitor - which shows the current number of running jobs on selected NGS Partner sites as a set of moving, coloured bars.

We don't think this is because the red, yellow and green graphics look pretty. Many of our users have adopted a simple, effective - and low-tech - approach to scheduling jobs: they have a quick look at the load monitor page and do their work on the least loaded machine.

This is the human-powered counterpart of what the WMS bit of the UI/WMS does.

The load monitor is a nice example of how to present the information that is routinely published by a site on a Grid and defined by a GLUE Schema.

GLUE - in an egregious example of acronym abuse - is meant to stand for Grid Laboratory Uniform Environment. The reality is that it is called GLUE because it is what sticks the Grid together.

As any good Grid standard should, GLUE has its own Working Group, GLUE-WG, and proper published formal specifications. The current version is GLUE 2.0 but its predecessor, GLUE 1.3, is more widely deployed.

The load monitor is a visualisation of two pieces of (GLUE 1.3 style) information, presented to the world as
  • GlueCEStateFreeCPUs
  • GlueCEStateTotalCPUs
which, if you ignore the CamelCase-naming scheme and the GlueCEState prefix, are fairly self-explanatory. They are published for every compute element.
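As a rough illustration of how two such numbers could become a bar on a page, here is a minimal Python sketch. The formula (busy = total - free, shown as a percentage) is an assumption rather than a description of the real load monitor, and the function names and host names are made up for the example.

```python
from typing import Dict, Optional, Tuple


def load_percentage(free_cpus: int, total_cpus: int) -> Optional[float]:
    """Percentage of CPUs in use, given the values a compute element
    publishes as GlueCEStateFreeCPUs and GlueCEStateTotalCPUs.

    Returns None when no sensible figure can be worked out
    (a total of zero or less). This is an assumed convention.
    """
    if total_cpus <= 0:
        return None
    # Clamp so that odd published values (free > total, negative free)
    # never push the bar outside the 0-100 range.
    busy = min(max(total_cpus - free_cpus, 0), total_cpus)
    return 100.0 * busy / total_cpus


def least_loaded(ces: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Name of the compute element with the lowest load.

    `ces` maps a (hypothetical) host name to a (free, total) pair.
    Elements whose load cannot be calculated are skipped.
    """
    loads = {name: load_percentage(free, total)
             for name, (free, total) in ces.items()}
    known = {name: load for name, load in loads.items() if load is not None}
    return min(known, key=known.get) if known else None
```

Calling `least_loaded` on a handful of (free, total) pairs does in code what our users do by eye when they glance at the page.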

The pretty dancing bars are generated using jsProgressBar.

This is not new: the load monitor has been running for as long as the current version of the NGS web site and - before then - researchers used and abused a central Ganglia service for very much the same purpose.

It is back as a Research and Development activity because the NGS is changing.

The hard bit isn't the calculation of the system load or the pretty graphics: it is deciding which sites and compute elements should appear.

The current version is intimately entangled with the INCA monitoring service - the list of hosts is extracted from a configuration file built for INCA.

INCA - as anyone with a high-enough tolerance of tedium to read the NGS R+D blog regularly will know - is being decommissioned as soon as the Nagios service is ready to replace it and we decided - late last year - to stop updating INCA.

This list is becoming out of date: it includes a number of machines that will disappear from the NGS soon and misses many others which should be there.

We are rewriting the load monitor to:
  • Select compute elements from information sucked from our Single Point of Truth - the GOCDB.
  • Keep only those which support the ngs.ac.uk Virtual Organisation, using the snappily-named `GlueCEAccessControlBaseRule' attribute defined by GLUE and published by the sites.
and use it to generate a list of active Compute Elements in sites.
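The two steps above can be sketched in a few lines of Python. This is not the rewritten load monitor: it assumes the GOCDB host names and the published GLUE 1.3 records have already been fetched and parsed into plain Python structures, and the record layout, function names and host names are hypothetical. It also assumes VO-level access is published as a rule of the form `VO:ngs.ac.uk`, and it ignores `VOMS:`-style rules for simplicity.

```python
from typing import Dict, Iterable, List

# Assumed form of the rule a site publishes when it supports the VO.
VO_RULE = "VO:ngs.ac.uk"


def supports_vo(ce: Dict, rule: str = VO_RULE) -> bool:
    """True if the record lists `rule` among its
    GlueCEAccessControlBaseRule values (assumed to be a list of strings)."""
    return rule in ce.get("GlueCEAccessControlBaseRule", [])


def active_compute_elements(gocdb_hosts: Iterable[str],
                            published_ces: Iterable[Dict]) -> List[Dict]:
    """Keep the published compute elements that are both known to the
    GOCDB and open to the ngs.ac.uk Virtual Organisation.

    Each record is assumed to carry its host name under the
    GlueCEInfoHostName key; host names are compared case-insensitively.
    """
    wanted = {host.lower() for host in gocdb_hosts}
    return [ce for ce in published_ces
            if ce.get("GlueCEInfoHostName", "").lower() in wanted
            and supports_vo(ce)]
```

Keeping the selection separate from the fetching means the list of hosts can come from anywhere, which is rather the point of no longer depending on an INCA configuration file.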

We can calculate the load on each of these compute elements at regular intervals and present it to the world as pretty, colourful, wobbling bars. It is the Web 2.0 way.

1 comment:

Ahmed said...

Awesome stuff.