Wednesday, 15 September 2010

Bigger data and Faster Horses

Alex Szalay of Johns Hopkins University and the Sloan Digital Sky Survey spoke at All Hands this morning.

His talk was about the science that can be done when you simply have too much data to store or process and your first task is working out which bits you need to throw away.

Among the many interesting points he made was that, by Amdahl's Law, modern computers are unbalanced if they are used for data-driven research.

CPUs are fast. Modern multi-core CPUs can crunch numbers at extraordinary rates. But we gain very little from this if we can't feed them the numbers as fast as they can crunch them.

At best, the numbers have to be loaded from memory and, on the timescales at which a computer works, memory access is slow. At worst, they come from disk and disk access is much, much slower.

Modern CPUs hide the general sluggishness of memory by keeping data that has been used, or may soon be used, in small but very fast caches. As a block of data is transferred from main memory to cache, nearby blocks are copied too.

Disks and operating systems use a similar approach - whenever a user requests a block of data to be transferred from disk to memory, the blocks that follow it are transferred too.

You only get the full advantage of the memory caches and disk read-ahead if you are reading the data in one long stream.
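
To make that concrete, here is a minimal sketch in Python of a toy cache that loads a whole block on every miss. It is not from the talk: the block size, the number of blocks the cache holds and the least-recently-used eviction rule are all assumptions chosen to keep the arithmetic simple, and the function name count_misses is made up for illustration. It visits the same 64 addresses twice, once in order and once jumping around with a stride of 8.

```python
from collections import OrderedDict


def count_misses(addresses, block_size=8, cache_blocks=4):
    """Count misses in a toy LRU cache that loads one whole block per miss.

    block_size and cache_blocks are illustrative assumptions, not real
    hardware figures.
    """
    cache = OrderedDict()  # block number -> None, least recently used first
    misses = 0
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            cache.move_to_end(block)  # hit: mark block as most recently used
        else:
            misses += 1  # miss: the whole block is fetched
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


n = 64
# One long stream: 0, 1, 2, ..., 63
sequential = list(range(n))
# The same 64 addresses with a stride of 8: 0, 8, 16, ..., 56, 1, 9, 17, ...
strided = [start + step for start in range(8) for step in range(0, n, 8)]

seq_misses = count_misses(sequential)
strided_misses = count_misses(strided)
print(seq_misses, strided_misses)  # prints "8 64"
```

In this toy model the stream misses once per block, 8 times in all, while the strided walk misses on every one of its 64 accesses because each block is evicted before it is needed again. Real caches and disks are far more elaborate, but the shape of the result is the same.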

Prof. Szalay likened this to watching the results of a laboratory experiment as it runs. He described computer systems - which he called Data Scopes - designed so that the speed at which data can be accessed is as near as possible to the speed at which it can be processed. You carefully lay out your data in the 'scope and just let its comparatively low-powered CPUs crunch away.

It is very different from the current approach to High Performance Computing. He ended his talk with a quote attributed to Henry Ford - an example of why 'more of the same' is not always an option:
If I had asked people what they wanted, they would have said faster horses.
