Tuesday, 18 August 2009

NGS Central Service Update

Following the recent air conditioning problems in the main machine room at RAL, the majority of services have now been returned to full service. Full details of which services are available can be found in the News section of the main NGS website. The main services now back online are the NGS database services at RAL, which restores not only a number of user-hosted databases but also a number of key central services such as user account management. In many ways, this incident has shown the benefits of having a distributed e-infrastructure in the UK, as many users have not been affected at all by the recent RAL outage. However, it does highlight that even with many services distributed and replicated, certain key services remain in one place. NGS Operations is now undertaking a full review of what we class as 'critical' and of how those services should or can be made more resilient.

To compound the issues, a recent Linux security vulnerability means that RAL is choosing to keep its current compute services, i.e. services that users can access directly, offline. Tests across a number of systems show a mixed picture of which kernels and operating systems are affected. Until patched kernels or a reliable workaround are available, STFC RAL is choosing to stay in downtime, though other sites are making their own local security decisions, based on advice from EGEE, as to what to do.

