Urgent Cheyenne outage today, Thursday, Sept 13

September 13, 2018

Updated 9/13/18 - A reminder to users that Cheyenne will be unavailable for most of today, Thursday, September 13. The outage began shortly after 7:00 a.m. MDT and is expected to last approximately 12 hours, but every effort will be made to restore the system as soon as possible. This outage is necessary to replace two InfiniBand switches in the system’s hypercube fabric that were identified as a major contributing cause of Cheyenne’s worsening job failure rate.

The Geyser and Caldera clusters and the GLADE file system are not expected to be directly affected by the switch replacement work. Jobs already running on Geyser and Caldera will continue without interruption, but new job submissions and logins will not be possible while Cheyenne’s login nodes are unavailable. Every effort will be made to restore Cheyenne’s login nodes to users as early as possible.

CISL apologizes for the disruption this outage will cause for many users. Users should also be aware that the October 2 maintenance outage will still take place as scheduled. More information on that outage will be published in the Daily Bulletin beginning early next week.