Network hardware failure caused last night’s system issues

January 23, 2019

CISL has determined that a hardware failure in the UCAR enterprise Ethernet network was the root cause of last night’s problems on Cheyenne. The failure caused Cheyenne to lose communications with GLADE, resulting in a significant number of failed jobs and poor system performance.

CISL system and storage administrators implemented a workaround to restore Cheyenne-GLADE communications, and no further unscheduled interruptions are expected. It may be necessary to schedule a brief outage in the near future to implement a more permanent repair. Users will be notified well in advance if such an outage is scheduled.