How do you build and install a supercomputer?

Posted by Mark Parsons on 13 October 2020

Professor Mark Parsons, Director of EPCC at the University of Edinburgh, explains the complexities of launching the UK’s next national research supercomputer.

Over the past 12 months, ever since I learnt that Cray had been selected as the hardware provider for the new ARCHER2 supercomputer, I’ve been eagerly anticipating the day when we would open the service to users. A lot has happened over this time: Cray has been bought by Hewlett Packard Enterprise (HPE), and we’ve all had to cope with the complexities of the COVID-19 pandemic. However, the good news for UK scientists is that the first part of the new national HPC service will shortly be opened to users.

In his blog post in July 2020, Professor Simon McIntosh-Smith explained what a supercomputer is, what the current national supercomputer, ARCHER, has achieved and what we expect ARCHER2 to be capable of. With the imminent start-up of the ARCHER2 service I wanted to use this blog post to explain where we are and where we hope to get to over the next few months.

A long, complicated but successful summer

 
Fitting of the cooling infrastructure between the Mountain cabinets and the CDU (Credit: Mark Parsons: EPCC)

Building and installing a supercomputer is never an easy task – particularly a system as large and capable as ARCHER2. The move to a new underlying operating system for ARCHER2 and the COVID-19 pandemic have delayed its installation, but in mid-July the first four of the final 23 cabinets arrived at our data centre, the ACF, in Edinburgh. Because of the installation delays, UKRI’s EPSRC asked us to keep the ARCHER service running throughout this year. Over the past six weeks we’ve been installing and configuring the initial ARCHER2 system in a separate room from the current ARCHER service, ready to finally begin the retirement of the current service.

An autumn transition to ARCHER2

The initial 4-cabinet ARCHER2 system is now up and running, with HPE and EPCC staff working hard to configure it for the first users in the next few weeks. Although this initial system has only 131,072 cores (the current ARCHER system has 118,080), each of these cores is at least 1.5 times more powerful. So, although ARCHER2 will open with around the same number of cores as ARCHER, I hope that users immediately see a difference in the quantity of scientific results the system is able to produce.
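As a rough illustration of that comparison, the short sketch below combines the two core counts and the "at least 1.5 times" per-core figure quoted above. It assumes that aggregate capacity scales linearly with core count and per-core speed, which real applications will not follow exactly, and the helper name `relative_capacity` is purely illustrative.

```python
# Back-of-envelope comparison using only the figures quoted in this post.
ARCHER_CORES = 118_080            # current ARCHER system
ARCHER2_INITIAL_CORES = 131_072   # initial 4-cabinet ARCHER2 system
PER_CORE_SPEEDUP = 1.5            # "at least 1.5 times", treated as a lower bound


def relative_capacity(new_cores: int, old_cores: int, per_core_speedup: float) -> float:
    """Hypothetical helper: capacity of the new system relative to the old one.

    Assumes throughput scales linearly with core count and per-core speed,
    which is a simplification made for illustration only.
    """
    return (new_cores * per_core_speedup) / old_cores


ratio = relative_capacity(ARCHER2_INITIAL_CORES, ARCHER_CORES, PER_CORE_SPEEDUP)
print(f"Initial ARCHER2 relative to ARCHER: at least {ratio:.2f}x")
```

Under those assumptions, the initial system would offer somewhat more than one and a half times the capacity of ARCHER, even though the core counts are similar.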

 
The 4-cabinet Shasta Mountain system, the first phase of the 23-cabinet system (Credit: Mark Parsons: EPCC)

Once the 4-cabinet system is providing a service to users, we’ll move into a very busy period. ARCHER will be turned off for the last time and dismantled. The final power and cooling preparations for the full system will be completed. For example, we’ve had to lay 107 new power cables! The full system will then be installed by HPE and gradually brought to life. We hope to do this in a way that doesn’t involve any long period of downtime, although there will of course be some interruptions to service.

The future is almost here

I always find this period of building a new service really exciting. These are very large, complex computing systems and not everything will work first time. I think of our supercomputers like Formula 1 cars: they need a pit crew just to get them started, but once they’re going, they’re phenomenally powerful.

Over the winter, we hope that the full ARCHER2 system will be brought into service and start delivering all the scientific benefits we’ve been looking forward to.

Author

Name: Mark Parsons
Job title: Director of EPCC at the University of Edinburgh and Director of Research Computing at EPSRC
Organisation: EPSRC