Monday, 27 September 2010
Overall there was a theme of "bridging the gap" - or is it a chasm? - in taking software from its initial deployment, with limited uptake, to a broader user base: the place where prototypes either fail or become successful. Prof Dan Atkins from UMich spoke about this chasm and how to bridge it, and about the UK's e-science and its impact.
Prof Alex Szalay from Johns Hopkins spoke about coping with data volumes in the context of Amdahl's law, i.e. the advantages - or (eventually) lack of advantages - of parallelisation, and about the importance of simulation. Today we can simulate things made of 1,000,000,000 interacting pieces, which was seen as impossible just a short time ago, and we expect soon to be able to process 1000 times that, so simulations and the data they generate will have an increasingly large impact on research and on society as a whole.
Our very own prof, Carole Goble, spoke about the long tail scientists: the many who do research but are not part of big groups. She covered how this tail is getting "fatter" (or how to make it fatter), and even how to get "normal" people involved, as Galaxy Zoo did. To do this, you (i.e. us: infrastructure and software people, e-scientists) need to "walk in their shoes", and to reward people for engaging and sharing. There are no prizes for second place or for developing standards, so people feel possessive about their work, and they may not realise the benefits of sharing and collaborating until they've tried.
I can only encourage you to browse the programme for presentations. Dan Atkins's plenary even contains a transcription! Make yourself a cuppa and go read them now.
Friday, 24 September 2010
A bit like sending a letter to Santa (you have no idea where it is going and you can be fairly sure you won't hear anything back). I suppose that makes us Santa's little helpers: less of a National Grid Service, more of a National Elf Service.
Dr. Asquith is not impressed by the cryptic errors that we - the grid world - inflict on users. These are even less amusing than that last attempt at a joke.
Her example was "Lost Heartbeat" but there are many more. Those of us who answer NGS helpdesk tickets are familiar with people asking why the WMS said...
Standard output does not contain useful data. Cannot read JobWrapper output, both from Condor and from Maradona.
and people running grid software quickly get used to seeing the classic
GSS failed Major:01090000 Minor:00000000 Token:00000003
These are nothing to do with large birds, Argentinian footballers or unsuccessful members of the military. Roughly translated, these mean 'Sorry... your job went missing' and 'Oops... invalid certificate' respectively.
In their favour, at least these error messages are obscure enough to give sensible answers when fed to Google.
So why are we so bad at telling people what has gone wrong?
In part, this is because so many things have to go right for a job to run: data has to be delivered to the right place, the right software needs to be available and the machine doing the processing needs to be behaving itself. The end result of any failure of any part is the same - the job failed.
The same applies to the letter to Santa. All you know is that you didn't get what you asked for. You will never know if this was because your letter was eaten by a reindeer, or if it was dropped down a chimney or simply that you happen to be on the Naughty List this year.
The situation is not helped by the body of very general purpose code that is buried deep within grid software. That GSS message about the failed major, for example, comes from an implementation of the 'Generic Security Service' Application Programming Interface.
The GSS-API is meant to be able to handle any mechanism for securing network traffic - it handles Kerberos in the same way that it handles certificates. JANET(UK)'s 'Project Moonshot' plans to use it in conjunction with some of the technology behind Shibboleth.
The thing about generic interfaces is that they tend to return generic errors: basically saying that something built around the interface went wrong - go look there instead. That is great for developers but confusing for users.
The grid can be a scary thing to use. It is complicated. It will go wrong. If we want it to be less scary, we need to learn how to go wrong, better.
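As a small illustration of "going wrong, better", here is a sketch of a wrapper that keeps the original message but puts a plain-English hint in front of it. It is only a sketch: the function name, the table and the patterns are made up for this example, and the two hints are simply the rough translations given earlier in this post, not an official catalogue of grid errors.

```python
import re

# Hints taken from the rough translations in this post.
# The patterns are illustrative assumptions, not an official list.
FRIENDLY_MESSAGES = [
    (re.compile(r"Cannot read JobWrapper output"),
     "Sorry... your job went missing."),
    (re.compile(r"GSS failed Major:\s*[0-9A-Fa-f]+"),
     "Oops... invalid certificate."),
]


def explain(raw_error: str) -> str:
    """Return the raw error with a plain-English hint when one is known."""
    for pattern, hint in FRIENDLY_MESSAGES:
        if pattern.search(raw_error):
            return f"{hint} (original message: {raw_error})"
    # Unknown errors are passed through unchanged so nothing is hidden.
    return raw_error
```

The original text is kept alongside the hint on purpose: developers still need it, and it is the part that gives sensible answers when fed to Google.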
Tuesday, 21 September 2010
An extra special reminder that the Call for Poster Abstracts for the NGS Innovation Forum closes this Friday. If you or anyone you know would like to submit a poster abstract to this event please make sure that you submit your 200 word contribution to the NGS website by 5pm on Friday (24th).
Remember that there will be a prize for the best poster as voted for by the delegates and all abstracts will be peer reviewed by the Innovation Forum Programme Committee. We would like to encourage all users to submit a poster and to attend the event in order to hear about the latest developments and tools from the NGS and also to leave with the knowledge of how to apply these tools in their research. Delegates are welcome to attend for one or both days of the event.
Registration for the event is also open now and further details are available on the event page on the NGS website.
Monday, 20 September 2010
However before I had to leave AHM I did attend some interesting sessions on "Sharing, Collaboration and Interfaces for e-Research" featuring "BlogMyData" which Jason has already mentioned.
I have to thank Andrew Richards, Director of the NGS, for giving my presentation in the "Enhancing Community Intelligence for e-Science" workshop which I unfortunately couldn't attend due to being on a plane to Amsterdam! The organiser Alex Voss had asked me to report on some of the statistics that the NGS collects in its usual day-to-day running. This includes the data from the user application forms, the NGS member sites and much more. The invitation to present on these stats was very timely as we have recently released a new service to users on the NGS homepage.
We have made a selection of statistics publicly available on the NGS website including statistics by research area, institution, NGS usage over time, funding sources and information sources and more. The link can be found on the right hand side of the home page underneath the latest poll.
Meanwhile at the EGI conference in Amsterdam I met up with the EGI dissemination team for the first time, as well as other dissemination people from several other NGIs. It's amazing: no matter how far apart the countries, we all have the same problems and challenges in getting the word out there about grid computing and hunting down those user success stories! Watch out for more user case studies from all over Europe!
Wednesday, 15 September 2010
His talk was about the science that can be done when you simply have too much data to store or process and your first task is working out which bits you need to throw away.
Among the many interesting points he made was that, by Amdahl's Law, modern computers are unbalanced if they are used for data-driven research.
CPUs are fast. Modern multi-core CPUs can crunch numbers at extra-ordinary rates. But we gain very little from this if we can't feed them the numbers as fast as they can crunch them.
At best, the numbers have to be loaded from memory and, on the timescales at which a computer works, memory access is slow. At worst, they come from disk and disk access is much, much slower.
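To put a rough number on this, the classic form of Amdahl's law fits in a few lines of Python. This is a sketch and not something shown in the talk: the function name and the example fraction are made up, and the "serial" part of the job is used here as a stand-in for time spent waiting on memory or disk.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speed-up predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical job: half the run time is spent waiting for data.
    for cores in (1, 2, 8, 64, 1024):
        print(cores, round(amdahl_speedup(0.5, cores), 2))
```

With half the time spent waiting for data, the speed-up can never reach 2 however many cores are added, which is one way of seeing why faster CPUs alone do not help data-driven work.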
Modern CPUs hide the general sluggishness of memory by keeping data that has been used or may soon be used within small but very fast caches. As a block of data is transferred from the main memory to cache, nearby blocks are copied too.
Disks and operating systems use a similar approach - whenever a user requests a block of data to be transferred from disk to memory, the blocks that follow it are transferred too.
You only get the advantage of the memory caches and of disk read-ahead if you are reading the data in one big long stream.
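In code, "one big long stream" usually means reading a file front to back in large chunks rather than hopping around inside it. The sketch below is an illustration only: the function names, the 1 MiB chunk size and the toy byte-summing "analysis" are all assumptions made for this example.

```python
from typing import Iterator


def read_in_chunks(path: str, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield the file's contents front to back in fixed-size chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def byte_sum(path: str) -> int:
    """Toy 'analysis': add up every byte while streaming through the file."""
    total = 0
    for chunk in read_in_chunks(path):
        total += sum(chunk)
    return total
```

Because each read follows on from the last, the operating system's habit of fetching the following blocks works in the program's favour rather than being wasted.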
Prof. Szalay likened this to watching the results of a laboratory experiment as it runs. He described computer systems - which he called Data Scopes - designed so that the speed at which data can be accessed is as near as possible to the speed at which it can be processed. You carefully lay out your data in the 'scope and just let its comparatively low powered CPUs crunch away.
It is very different from the current approach to High Performance Computing. He ended his talk with a quote attributed to Henry Ford - an example of why 'more of the same' is not always an option:
If I had asked people what they wanted, they would have said faster horses.
Tuesday, 14 September 2010
As usual it is necessary to clone oneself for the parallel sessions, keep track of all the discussions, keep the todo list up to date, sniff around for new things, while simultaneously keeping up with email back home, other projects ticking along at home, documents, proposals, reports, arrangements. The mental equivalent of an octopus. All in a day's work. Hey ho.
One project that particularly caught my eye was BlogMyData.
Much academic research takes place in corridors, pubs and even - occasionally - in the loo. Researchers will discuss their latest discoveries with colleagues when they bump into one another on the way to somewhere else - and get a new perspective or a new idea as they do so. Call it serendipity at work - or possibly serendipity in the bar of the Dog and Duck.
This is the kind of material that occasionally appears as 'Bloggs, Fred (Personal Communication)' in papers.
BlogMyData extends this chatter about the work to researchers who are in different institutions and so - unless they happen to be at All Hands - are very unlikely to be in the same coffee room, or pub, at the same time.
It allows researchers to post visualisations of the data they are working on to blogs which can be read - and commented on - by collaborators. It combines two projects: the Godiva 2 visualisation package from Reading and the LabBlog blogging tool from platform.
This has the big advantage that the data and the conversation will be recorded for future reference unlike, say, a chat in the bar or an unexpected encounter in the gents...
The stand was put up last night and our first demo was held this morning. A big thanks to Jonathan Churchill whose demo of the UI/WMS managed to pull in a substantial crowd despite a quiet start to the event as people continue to arrive during the morning.
Jonathan will be doing a follow up demo this lunchtime on the "gLite WMS Enabled NGS Applications Portal" so pop by our stand with your lunch if you are in Cardiff!
The conference proper kicks off this afternoon with the first themes and workshops. I'll be going to the "Sharing, collaboration and interfaces for e-Research" which looks as though it will have some interesting presentations about user tools.
Saturday, 11 September 2010
In the case is a contraption built from wicker-work, string and cogs and lights and bits of old gramophone. Every so often, it springs to life and whirls around and plays a tune.
It is not a Yorkshire-based competitor for the iPod but a sculpture by Rowland Emett: "The Featherstone-Kite Openwork Basketweave Mark Two Gentleman’s Flying Machine". Leeds shoppers passing by look at it and think...
What on earth is THAT meant to do?
Which is a rather tortuous way of introducing the latest bit of R+D work.
We are investigating how to move the important features of the existing INCA monitoring service to a new monitoring service based on WLCG Nagios.
But - as has been said many times - grids are complicated. Which means that the software needed to monitor grids is complicated. Which means that when you start to look at the software, you spend a lot of time staring at a screen and thinking...
What on earth is THAT meant to do?
So after a week of staring and thinking, here is what we think the bits and pieces of the service are meant to do:
At the core sits Nagios: an open-source monitoring system familiar to many system administrators. It consists of a set of programs called 'plugins' and a scheduler that arranges for these plugins to be run.
A plugin tests if a particular service on a given host is working as expected. Plugins typically return a short message and a status code that means one of: 'OK', 'WARNING', 'CRITICAL' or - if the plugin broke - 'UNKNOWN'. They can also track performance data such as disk usage.
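To make that concrete, here is a minimal sketch of a disk-usage plugin in Python. It follows the usual Nagios plugin convention (status codes 0 to 3, one line of output, optional performance data after a '|'), but it is not one of the plugins shipped with Nagios or WLCG Nagios: the function names and the 80%/90% thresholds are assumptions made for illustration.

```python
import shutil

# Status codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_percent: float, warn: float, crit: float) -> int:
    """Map a disk-usage percentage onto a Nagios status code."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0):
    """Return (status code, one-line message) for the filesystem holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as error:
        # The plugin itself broke, so report UNKNOWN rather than guess.
        return UNKNOWN, f"DISK UNKNOWN - {error}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    used_percent = 100.0 * usage.used / usage.total
    status = classify(used_percent, warn, crit)
    # Text after the '|' is performance data: label=value;warn;crit
    message = (
        f"DISK {STATUS_NAMES[status]} - {used_percent:.1f}% used on {path}"
        f" | used_pct={used_percent:.1f}%;{warn:.0f};{crit:.0f}"
    )
    return status, message


def main() -> int:
    """Print the one-line result and return the status code."""
    code, text = check_disk()
    print(text)
    return code
```

A real plugin would finish by exiting with the value returned by `main()`, since the scheduler reads the status from the process exit code rather than from the printed text.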
Nagios comes with a set of basic plugins. WLCG Nagios adds a whole raft of Grid specific ones.
In the WLCG documentation, plugins are referred to as probes.
Next up, a 'configuration generator' called NCG takes data published about a site or set of sites and generates a configuration for Nagios that monitors them.
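The idea of a configuration generator can be sketched in a few lines of Python. This is not how NCG itself works: the dictionary layout, the function name and the example host are all hypothetical, and a real Nagios configuration needs templates and further directives that are left out here. Only the `define host` and `define service` object syntax is standard Nagios.

```python
def render_nagios_config(site: dict) -> str:
    """Render Nagios host and service definitions for one site description."""
    blocks = []
    for host in site["hosts"]:
        blocks.append(
            "define host {\n"
            f"    host_name  {host['name']}\n"
            f"    address    {host['address']}\n"
            "}\n"
        )
        for service in host["services"]:
            blocks.append(
                "define service {\n"
                f"    host_name            {host['name']}\n"
                f"    service_description  {service}\n"
                f"    check_command        check_{service}\n"
                "}\n"
            )
    return "\n".join(blocks)


# Hypothetical published data for a single site (illustrative values only).
EXAMPLE_SITE = {
    "name": "EXAMPLE-SITE",
    "hosts": [
        {"name": "ce.example.org", "address": "192.0.2.10", "services": ["ping", "ssh"]},
    ],
}

if __name__ == "__main__":
    print(render_nagios_config(EXAMPLE_SITE))
```

The point of generating the file is that nobody has to edit Nagios configuration by hand when the published information about a site changes.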
Statistics and performance metrics generated by the plugins/probes are collected and are delivered via a message bus to a service that stuffs them into a database. A tool called MyEGEE is used to visualise the contents of this database.
If you want to know more...
Staff from STFC and Oxford gave an NGS surgery on WLCG Nagios in late July this year. Their slides describing how WLCG Nagios can be configured and how it has been deployed can be found on the NGS web site.
There is more technical information on twiki.cern.ch in the GridMonitoringNcgOverview and GridMonitoringNcgYaim pages. More information about the plugins/probes can be found on SAMProbesMetrics.
If you want to know more about the NGS R+D activity, we will be on hand at All Hands next week.
Tuesday, 7 September 2010
Martin's presentation will look at what is necessary at the national level to support CLARIN, a European infrastructure providing services relating to language resources and tools to researchers in the Humanities and Social Sciences.
Continuing with the European theme, we are pleased to announce that Steven Newhouse, Director of EGI, will also be presenting at the event. Steven will be reflecting on the experiences of the first 6 months and providing an overview of the plans that have now been established for future years.
Registration for the Innovation Forum is now open with full details available on the event page on the NGS website.
Friday, 3 September 2010
We are proudly showing off our cloud; our HARC acceptors are now accepting to an acceptable level; the new User Interfaces are on display in the Innovation section of the NGS web site; we have come up with a saner way to manage access to licensed software and the ngs-vo-tool to manage virtual organisation configuration has been released.
There will be more on HARC in a future posting. The rest have already been well and truly blogged.
This posting is about what we are going to do with all our newly available free time in the final 7 months of the third phase of the NGS.
We are here to do the dull-but-useful bits and the dull-but-useful things we are concentrating on are those needed as the NGS and GridPP join forces to form the UK National Grid Initiative (UK NGI) within the European Grid Initiative...
- Incorporating the important features of the NGS's current 'INCA' monitoring service into an NGI one based on WLCG Nagios.
- Allowing the data collected by the NGS accounting service to be interchanged with that collected by the APEL (Accounting Processor for Event Logs) accounting service.
Yes... you wait ages for a bus and then three turn up at once.
A message bus is simply a way of getting a lump of data from A to B. The important bit is that your application does not need to know how to get from A to B - it just needs to hop on the bus at A and hop off again at B.
Finding a reliable route from A to B becomes somebody else's problem.
WLCG Nagios uses its message bus to pass the results of tests to a central monitoring service. Technical details can be found on https://twiki.cern.ch/twiki/bin/view/EGEE/UseLocalActiveMQForMessaging.
The latest APEL client sends records wrapped in messages rather than attempting immediate database updates. Again there are more technical details available from http://goc.grid.sinica.edu.tw/gocwiki/ApelHome.
In both cases, they are using the message bus to avoid information being lost when there is a backlog of data to be processed. As users of the UK public transport service are all too aware, buses are very good at waiting in traffic.
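The pattern can be sketched with nothing more than Python's standard library. This is an in-process stand-in, not the message broker that WLCG Nagios or APEL actually use: the function names are made up, and a real bus would also carry messages between machines and keep them safe across restarts.

```python
import queue
import threading


def publish(bus: queue.Queue, records: list) -> None:
    """Producer: hop on the bus at A. It never talks to the consumer directly."""
    for record in records:
        bus.put(record)
    bus.put(None)  # sentinel: no more records


def consume(bus: queue.Queue, store: list) -> None:
    """Consumer: hop off at B, working through any backlog in arrival order."""
    while True:
        record = bus.get()
        if record is None:
            break
        store.append(record)


def run_demo(records: list) -> list:
    """Publish everything first, then let a slow-starting consumer catch up."""
    bus: queue.Queue = queue.Queue()  # unbounded, so a backlog simply waits
    store: list = []
    consumer = threading.Thread(target=consume, args=(bus, store))
    # Publishing before the consumer starts deliberately creates a backlog.
    publish(bus, records)
    consumer.start()
    consumer.join()
    return store
```

Even though the consumer only starts after everything has been published, every record arrives and arrives in order: the backlog waits on the bus instead of being dropped.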
Thursday, 2 September 2010
NGS tool demos
1. Transcriptome Analysis using the NGS User Interface/Workload Management System (UI/WMS) – Jonathan Churchill, NGS, STFC RAL
The UI/WMS is a tool which allows users to easily submit jobs to the whole of the NGS, relying on the WMS to choose which NGS resources to use for their jobs. Use of the UI/WMS will be demonstrated with a user case study in which analysis time of mRNA was decreased from a month to less than 12 hours.
2. Accessing the NGS using the Application Hosting Environment (AHE) – Stefan Zasada, UCL
An overview of how access to the NGS can be simplified using the Application Hosting Environment, a lightweight application portal system.
3. Using the HERMES data management tool – David Wallom, NGS, University of Oxford
Here we will show how easy it is to install and connect into various NGS resources to move data between them, your home institution and your desktop.
4. The NGS from the CCP4 desktop – Matteo Turilli, NGS, University of Oxford
The NGS R&D theme has been working to build access to the NGS into the desktop tools that researchers use on a day-to-day basis. In this presentation we look at the example of CCP4: Software for Macromolecular X-Ray Crystallography.
We also have a presentation from the Director of the NGS -
The future of the NGS – Neil Geddes, NGS Director, STFC RAL
This presentation will look at the focus of activities for the NGS for the coming 2-3 years and possible longer term opportunities.
Remember that registration for the event is now open and that the call for poster abstracts closes on the 10th of September!