Sunday, 23 May 2010

Who, where and how much?

Accounting: if there is a word to gladden the hearts of those of us who run the machines that make up the grid, that word is not accounting.

Unfortunately, we are not here to have our hearts gladdened. We are here to ensure that people can run tasks and that we can keep track of how much computer time those tasks use.

Accounting is not an optional extra and it is more complicated than it appears because of the way tasks bounce around the grid.

A task run on a grid arrives as a little bundle of information. This bundle describes what should be done, in the form of a command and arguments, and who requested it, in the form of a certificate.
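For illustration, here is that bundle as a minimal Python sketch. The structure and field names are invented for this post, not the real Globus job description format:

    from dataclasses import dataclass

    # A hypothetical sketch of the bundle: the 'what' (a command and its
    # arguments) and the 'who' (the subject DN from the submitter's
    # X.509 certificate).
    @dataclass
    class GridTask:
        executable: str    # what to run
        arguments: list    # with what arguments
        subject_dn: str    # who asked for it

    task = GridTask(
        executable="/bin/echo",
        arguments=["hello, grid"],
        subject_dn="/C=UK/O=eScience/OU=Manchester/CN=alice example",
    )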

When this bundle reaches a computer on the grid, a service called the jobmanager will translate the 'who' into a local user and group, and the 'what' into something that the local batch management service can process.
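A toy version of that translation, assuming a grid-mapfile-style table from certificate DNs to local accounts (the table contents here are made up), might look like this:

    # Hypothetical grid-mapfile-style table: certificate DN -> local account.
    GRID_MAP = {
        "/C=UK/O=eScience/OU=Manchester/CN=alice example": ("alice", "ngsusers"),
    }

    def translate(subject_dn, executable, arguments):
        """Turn the grid 'who' into a local user and group, and the
        grid 'what' into a command line for the local batch system."""
        user, group = GRID_MAP[subject_dn]
        command = " ".join([executable] + list(arguments))
        return user, group, command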

For a computer cluster, the local batch management service will be open-source software like Torque or SGE, or a commercial system such as PBSPro or LSF. Its role is to put the command into a queue until it can find the computer power to do the work.

From that point on, the jobmanager hangs around, repeatedly asking the batch management system 'Are we there yet?' like the computational counterpart of a bored teenager on a long car journey.
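In Python, and assuming a Torque-style qstat that exits non-zero once a job has left the queue (check your own batch system's behaviour before relying on this), the loop amounts to:

    import subprocess
    import time

    def wait_for(job_id, poll_seconds=60):
        """Poll the batch system until the job is no longer queued or running."""
        while subprocess.run(["qstat", job_id],
                             capture_output=True).returncode == 0:
            time.sleep(poll_seconds)  # not there yet; ask again later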

The information about who submitted the task is recorded by the job manager. The information about how much computer time was crunched is kept by the batch system. Before any accounting is done, the information has to be bundled together.

To explain how this is done, we will concentrate on the Resource Usage Service Client, developed at Manchester and described at http://www.ngs.ac.uk/site-level-services/rus-and-ur. This packages all the accounting information into 'RUS' records - blobs of XML described in a 59-page specification document from the Open Grid Forum - and hands them over to our accounting service.
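To give a taste of what such a record holds, here is a sketch that builds a usage-record-like blob of XML. It is heavily simplified: the real format is namespaced and far richer, and the element names here are illustrative rather than lifted from the specification:

    import xml.etree.ElementTree as ET

    # A hypothetical, much-simplified usage record.
    record = ET.Element("UsageRecord")
    ET.SubElement(record, "LocalJobId").text = "12345.master"
    ET.SubElement(record, "LocalUserId").text = "alice"
    ET.SubElement(record, "GlobalUserName").text = (
        "/C=UK/O=eScience/OU=Manchester/CN=alice example")
    ET.SubElement(record, "WallDuration").text = "PT10M"   # ISO 8601 duration
    ET.SubElement(record, "CpuDuration").text = "PT5M12S"

    print(ET.tostring(record, encoding="unicode"))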

The RUS client's main role is to decode the local batch system's accounting logs.
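For example, Torque-style accounting logs record each job event as a semicolon-separated line ending in key=value pairs. A sketch of a decoder (the sample line is invented, and real logs carry many more fields):

    def parse_accounting_line(line):
        """Split one accounting-log line, assuming the format
        'timestamp;record_type;job_id;key=value key=value ...'."""
        timestamp, record_type, job_id, message = line.split(";", 3)
        fields = dict(pair.split("=", 1)
                      for pair in message.split() if "=" in pair)
        return job_id, record_type, fields

    # An invented example of an 'E' (job ended) record:
    sample = ("05/23/2010 14:02:11;E;12345.master;user=alice group=ngsusers "
              "resources_used.cput=00:05:12 resources_used.walltime=00:10:00")
    print(parse_accounting_line(sample))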

The RUS client also needs to augment these records with the Distinguished Name from the certificate associated with the task, and with the Virtual Organisation of which the owner of that certificate claimed membership. The distinguished name can be extracted from the Globus accounting logs in:
$GLOBUS_LOCATION/var/accounting.log
Virtual Organisation information is recorded by an accounting plugin built as part of gLite LCAS/LCMAPS by the NGS's installer scripts. The plugin stashes its accounting information in
$GLITE_LOCATION/var/voms_accounting.log
If you want to know the gory details of LCAS/LCMAPS, look at the earlier posting: The M-Word.

Two scripts called createjbmdb and createlcasdb - which are provided with the RUS client - read these logs and build databases mapping distinguished names and virtual organisations to usernames and other information.
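We won't reproduce those scripts here, but the idea is easy to sketch. Assuming, purely for illustration, that each Globus accounting log line pairs a local job id with a distinguished name, the job database is little more than:

    import shelve

    def build_jobmap(log_path, db_path):
        """Build a tiny job-id -> DN database from an accounting log.
        The 'job_id DN' line format is an assumption for illustration."""
        with shelve.open(db_path) as db, open(log_path) as log:
            for line in log:
                job_id, dn = line.rstrip("\n").split(" ", 1)
                db[job_id] = dn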

These databases are used to fill in the gaps in the RUS records produced from the local batch system before those records are uploaded to the NGS's RUS service.
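Filling the gaps amounts to a join on the job id. Something like the following, with a plain dictionary standing in for the real XML record:

    import shelve

    def fill_gaps(record, jobmap_path, vomap_path):
        """Add the 'who' to a batch-system record before upload."""
        with shelve.open(jobmap_path) as jobmap, \
             shelve.open(vomap_path) as vomap:
            dn = jobmap.get(record["job_id"])
            if dn is not None:
                record["global_user_name"] = dn
                record["virtual_organisation"] = vomap.get(dn)
        return record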

We use the information to ensure that NGS users keep within their CPU quotas, and to allow the owners of Virtual Organisations we support to track usage by their members.
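Quota checking then reduces to summing usage per distinguished name; a trivial sketch, with invented quota values in CPU-seconds:

    QUOTAS = {"/C=UK/O=eScience/OU=Manchester/CN=alice example": 360000}

    def over_quota(records):
        """Sum CPU time per DN and report anyone over their allocation."""
        used = {}
        for record in records:
            dn = record["global_user_name"]
            used[dn] = used.get(dn, 0) + record["cpu_seconds"]
        return [dn for dn, total in used.items()
                if total > QUOTAS.get(dn, float("inf"))]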

Within NGS Research and Development, we are investigating additional ways of getting RUS data into the accounting database: from equivalent Grid accounting services such as GridPP's APEL, or from High Performance Computing clusters using GridSAFE.

It may not be exciting but - as academic institutions increasingly share resources - accounting will be vital for the future of the grid.
