Extended HPC and storage systems downtime May 6-11 for NWSC electrical work

April 11, 2019

The following has been superseded by an update published on April 11.

Major electrical repair work at the NCAR-Wyoming Supercomputing Center will require an extended downtime for the Cheyenne, Casper, Campaign Storage, GLADE, and HPSS systems. The work, scheduled for Monday, May 6, through Saturday, May 11, will follow several weeks of facilities work that can be done without powering down those systems.

The May work includes replacing one of the 24,900-volt switches that supply power to the NWSC facility. The original switch suffered a catastrophic failure in December 2017, and a spare switch that was on-site has been in service since then while the root cause of the explosion was identified and plans were made to prevent similar failures. Preventive maintenance will also be performed on three additional switches. All systems will be brought down in the final days of the facilities work to prevent damage or data loss as the new switch is integrated into the infrastructure.

The repairs will require contributions from many outside contractors and have been coordinated by CISL’s on-site engineering staff to minimize the duration of the work.

A major operating system update to the Cheyenne system is also being planned and will require an extended downtime, most likely in late June or early July. Details will be announced in the Daily Bulletin when the dates are set.

Note that the May 6-11 outage will be followed by several more weeks of facilities maintenance that can be performed without powering down the systems, so no user impact is anticipated during that period. The routine maintenance downtime that was scheduled for April 2 has been canceled. Information on scheduled outages is available on the CISL HPC calendar.