Wednesday 7 September 2011

It is not easy being parallel

As has been said before - there are differences between Grid and traditional High Performance Computing. Some of the differences are due less to the technology and more to the problems being solved.

The more successful grid users are task farmers: they scatter comparatively small compute tasks and data and wait for them to grow into results. The grid - metaphorically speaking - is there to plough the land, spread the fertilizer and muck out the system administrators.

Traditional HPC concerns itself with big applications and - in particular - applications that are too big to fit on a single computer. HPC systems are built with parallel computing in mind.

The Grid does not do parallel computing well.

Consider the two steps in running any parallel task:
  • Asking for more than one CPU core on the same system.
  • Setting those CPU cores to work.
For each step, there is definitely more than one way to do it...

Take 4...

So, there you are, sitting by your favourite grid client, a freshly minted X509 proxy ready. All you need to answer one of the great problems of modern science is 4 CPUs.

All you need to do is ask.

How you ask depends on who you are asking and what grid dialect they understand.

Globus GRAM5 and ARC accept tasks defined in the Globus Resource Specification Language (RSL), possibly with some Nordic extensions. In RSL, you can ask for more than one CPU with an additional:

  (count=4)

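To put that in context, a complete (if hypothetical) RSL request for four cores might look something like the sketch below - the executable and arguments are invented for illustration:

  &(executable="/home/fred/my_analysis")
   (arguments="input.dat")
   (count=4)
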
The web-service-y Globus job submission systems (WS-GRAM) used a similar approach, but expressed in XML.

In the Job Description Language (JDL) understood by the gLite CREAM-CE and WMS, you need:

  CPUNumber=4;

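A stripped-down JDL file asking for the same four cores might read roughly as follows - again, the executable and arguments are purely illustrative:

  [
    Executable = "my_analysis";
    Arguments = "input.dat";
    CPUNumber = 4;
  ]
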
And in the Open Grid Forum-approved, XML-based Job Submission Description Language (JSDL), you have the instantly memorable and easily readable:

    <jsdl:TotalCPUCount>
       <jsdl:Exact>4.0</jsdl:Exact>
    </jsdl:TotalCPUCount>

(which you will find buried somewhere under 3 levels of XML tags). 
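To give a flavour of that nesting, here is roughly where the element sits in a cut-down JSDL document (namespace declarations and everything else omitted):

    <jsdl:JobDefinition>
      <jsdl:JobDescription>
        <jsdl:Resources>
          <jsdl:TotalCPUCount>
            <jsdl:Exact>4.0</jsdl:Exact>
          </jsdl:TotalCPUCount>
        </jsdl:Resources>
      </jsdl:JobDescription>
    </jsdl:JobDefinition>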

Yes - I know JSDL isn't really there for humans to read, but that doesn't stop some humans from trying.

4 go to work...

That was the easy part.

Now it gets complicated.

And, on this occasion, you can't blame the Grid for the complexity.

Large-scale parallel programs are typically written around libraries implementing the Message Passing Interface (MPI). There is more than one version of the MPI standard and more than one library implementing them.

To add to the confusion, for some MPI variants you need to build a version for each FORTRAN compiler installed.

Launching a parallel job depends on both the job management software and the underlying mechanisms used for communication. MPI installations typically provide either an mpirun or mpiexec command that ensures that the right processes are started in the right way on the right computers.

It is very likely that each version of each MPI implementation will have its own variant of mpirun or mpiexec. It is equally likely that - at least for mpirun - they will expect different arguments.
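To illustrate - with my_parallel_app standing in for a real program - the same four-process launch can look quite different depending on the implementation:

  # Open MPI style
  mpirun -np 4 ./my_parallel_app
  # MPICH2-style mpiexec
  mpiexec -n 4 ./my_parallel_app
  # older MPICH mpirun, which also wanted to be told which machines to use
  mpirun -np 4 -machinefile hosts.txt ./my_parallel_app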

In the first and second phases of the NGS, we were funded to provide exemplar Grid clusters at RAL, Oxford, Leeds and Manchester. The grid software we deployed - Pre-WS GRAM from Globus 4 - could launch MPI jobs if

  (jobtype="mpi")

was included in the RSL.
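Putting that together with the core count from earlier, a bare-bones MPI request looked something like this (executable path invented for illustration):

  &(executable="/home/fred/my_mpi_app")
   (count=4)
   (jobtype="mpi")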

It could only launch one of the many possible mpirun commands. To work around this, devious system administrators cooked up a sort of super-mpirun that would locate the correct version for an application.
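The details varied from site to site, but the idea was roughly the sketch below - the environment variable, paths and flavour names are invented for illustration, not the actual NGS script:

  #!/bin/sh
  # Pick the mpirun/mpiexec matching the MPI library the application
  # was built against, then pass all arguments straight through.
  case "$MPI_FLAVOUR" in
    openmpi)  exec /usr/local/openmpi/bin/mpirun "$@" ;;
    mpich2)   exec /usr/local/mpich2/bin/mpiexec "$@" ;;
    *)        echo "Unknown MPI flavour: $MPI_FLAVOUR" >&2; exit 1 ;;
  esac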

Researchers in Ireland found ways of launching MPI jobs from within JDL jobs - but they could not hide all the complexity.

ARC supports parallel jobs via its Runtime Environments extension - which can tune the environment for an application so that the right number of CPUs are assigned and the right mpirun is run. Again, this needs the system administrator to do something devious if it is to work.
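In ARC's xRSL dialect, the job asks for a named runtime environment alongside the core count - something like the sketch below, where the runtime environment name is entirely site-dependent and shown only as an example:

  &(executable="my_mpi_app")
   (count=4)
   (runTimeEnvironment="ENV/MPI/OPENMPI")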

We haven't even begun to cover parallel programs written outside MPI - such as those using the Java sort-of-MPI library MPJ-Express.

So... what am I trying to say?

It would be nice to have a conclusion, or at least a lame joke, to end this blog post - but I can't think of one.

All I can say is that parallel computing is complicated, distributed computing is complicated and that any attempt to combine the two - either using existing Grid solutions, or something newer, shinier and probably invoking the word Cloud - cannot make either kind of complicated vanish completely.
