Daily Bulletin Archive

July 2, 2019

CISL's HPC system administrators, Consulting Services Group, and HPE engineers resolved several significant issues on Monday toward completing Cheyenne's operating system upgrade. A full suite of system tests was executed overnight, and the results are being analyzed this morning. If the tests were successful, the system will be rebooted later this morning. Following the reboot, the system tests will be repeated for added confidence in the system's health.

A firm ETA is not yet available, but if all goes well Cheyenne could be returned to service by midday. Users will be apprised of any significant updates through CISL’s Notifier service, which was restored Monday afternoon.

July 1, 2019

Problems encountered on Sunday while rebooting some of Cheyenne’s compute nodes have delayed the system’s return to service following the operating system upgrade. Cheyenne will not be returned to service this morning.

CISL HPC system administrators and HPE engineers are working to resolve the issue as soon as possible and have escalated it to the highest severity level with HPE. We do not have an ETA for returning the system at this point but will notify users when more information is available.

Unrelated issues with the CISL Notifier service prevented updates from being issued over the weekend. Thank you for your patience and understanding.

June 28, 2019

Work continued Thursday on Cheyenne’s operating system upgrade, and several login nodes were made available to CISL’s Consulting Services Group to begin rebuilding the system’s software stack. Today, CISL HPC system administrators and HPE engineers will focus on rebuilding the system’s internal communications network.

The system is expected to be returned to service on Monday, July 1. Users will be advised through the Notifier service of any significant changes to the schedule.

While Cheyenne is unavailable, users can log in directly to Casper to run data analysis and visualization jobs on that cluster or to access the GLADE file system and HPSS.

June 26, 2019

Cheyenne’s operating system upgrade efforts continued through Tuesday. CISL’s HPC system administrators and HPE engineers addressed several hardware, firmware, and software issues that were discovered when power was restored to the system. Work resumed this morning and the system is still expected to be returned to service on Monday, July 1. 

While Cheyenne is unavailable, users can log in directly to Casper to run data analysis and visualization jobs on that cluster or to access the GLADE file system and HPSS.

 
June 24, 2019

CORRECTION: Casper will be available this week.

Cheyenne is scheduled to be down and unavailable from Monday, June 24, at 6:00 a.m. until July 1 for an operating system upgrade.

HPSS will be down on Tuesday morning from 8:00 a.m. until 10:00 a.m. for routine database maintenance.

No downtime: GLADE, Casper, and Campaign Storage.
 
June 20, 2019

A major Cheyenne operating system (OS) update is scheduled to begin Monday, June 24, and is expected to be completed by Monday, July 1. The Cheyenne cluster will be unavailable during the update, including the system’s login nodes and all cron services.

Users will still be able to log in directly to Casper to run jobs on that cluster or access the GLADE file system and HPSS.

As announced previously:

  • The Cheyenne OS will be updated from SUSE Linux Enterprise Server (SLES) Service Pack 1 to Service Pack 4. This is to bring the system up to current security and support levels and is expected to be the last operating system upgrade in Cheyenne’s lifetime.

  • Some changes also will be made to the Cheyenne module environment to better support multiple compiler and MPI configurations while providing a more robust and easier-to-maintain user environment. Details here.

  • Most users’ programs and executables will need to be rebuilt following the update, as many system libraries will change.

  • Users should test their job scripts thoroughly after Cheyenne is returned to service.

The routine monthly maintenance times that were scheduled for July 2, August 6, and September 3 have been canceled.

June 19, 2019

CISL will begin phasing in enforcement of the purge policy for files in the /glade/p file space later this year. The project space was released to users on July 1, 2018, with a stated purge policy of 12 months from the file creation date. With that in place, files would have been purged as early as July 1 of this year, but users now will have more time to adjust.

Once it is fully implemented, the project space purge policy will remove files 12 months after their last access date. On October 1, 2019, files will be purged if they have not been accessed in 18 months rather than 12 months. On the first Tuesday of each subsequent month, the retention period will be shortened by one month until the 12-month limit is fully implemented in April 2020.

The detailed schedule is provided below. Users will be notified well in advance of any changes to this schedule.

Retention period implementation dates

  • 18 months – October 1, 2019

  • 17 months – November 5, 2019

  • 16 months – December 3, 2019

  • 15 months – January 7, 2020

  • 14 months – February 4, 2020

  • 13 months – March 3, 2020

  • 12 months – April 7, 2020
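The schedule above follows a simple rule: start at 18 months on October 1, 2019, then shorten the period by one month on the first Tuesday of each following month. The short Python sketch below reproduces the dates from that rule and looks up the retention period in force on a given day. It is an illustration only, not a CISL tool; the function names are hypothetical.

```python
from datetime import date, timedelta


def first_tuesday(year, month):
    """Return the date of the first Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1
    return first + timedelta(days=(1 - first.weekday()) % 7)


def retention_schedule():
    """Return [(effective_date, retention_months)], from 18 down to 12 months."""
    schedule = []
    year, month = 2019, 10  # phase-in starts October 2019
    for months in range(18, 11, -1):
        schedule.append((first_tuesday(year, month), months))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return schedule


def retention_months(on):
    """Retention period in force on a date, or None before the phase-in begins."""
    current = None
    for effective, months in retention_schedule():
        if on >= effective:
            current = months
    return current


if __name__ == "__main__":
    for effective, months in retention_schedule():
        print(f"{months} months: {effective:%B %d, %Y}")
```

For example, `retention_months(date(2019, 12, 25))` falls between the December 3 and January 7 steps, so the 16-month period applies.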

CISL will deploy data management tools this summer that will include utilities to help users identify files that are nearing the purge limits.
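Until those tools are available, a user could approximate the check with a short script. The sketch below is not the CISL utility and makes several assumptions: the function name and parameters are hypothetical, the retention period is passed in days (18 months is roughly 548 days), and the file system's recorded access time is assumed to be what the purge policy evaluates.

```python
import os
import time

SECONDS_PER_DAY = 86400


def files_near_purge(root, retention_days, warn_days=30, now=None):
    """List (path, atime) for files within warn_days of the retention limit.

    A file is reported when its last access time is older than
    (retention_days - warn_days) days. Results are sorted oldest first.
    """
    now = time.time() if now is None else now
    cutoff = now - (retention_days - warn_days) * SECONDS_PER_DAY
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reads metadata only and does not follow symlinks
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                hits.append((path, atime))
    return sorted(hits, key=lambda item: item[1])
```

A call such as `files_near_purge("/glade/p/my_project", retention_days=548)` would list files last accessed more than about 518 days ago; the project path here is a placeholder.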

June 19, 2019

Users are reminded that they can now log in directly to casper.ucar.edu. The new Casper login nodes are permanently available, subject to normal maintenance and outage activities, as an alternative way to access the Casper system, GLADE, and HPSS.

Cheyenne’s login nodes will be unavailable throughout the system’s operating system upgrade outage, June 24 - July 1. Logging in to casper.ucar.edu will be the only way to access Casper, GLADE, and HPSS throughout the extended outage.

June 18, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE and HPSS.

June 14, 2019

The Cheyenne module environment will undergo a number of significant changes following the system’s operating system update the week of June 24 - July 1. The planned changes are designed to better support multiple compiler and MPI configurations while providing a more robust and easier-to-maintain user environment.

The supported compiler and MPI versions will change when Cheyenne is restored to service. All versions of the Intel 2016 compiler will be removed. Versions 2017, 2018 (which will be the new default), and 2019 will remain available. The latest GNU 7, 8, and 9 compilers and PGI 19 will be installed as well.

MPT 2.19 will remain Cheyenne’s default MPI and will be available for all of the supported compilers listed above. Intel MPI (module name impi) will be available for each of the supported Intel compilers, and OpenMPI 3.1.4 will be available for the supported PGI and GNU compilers.
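The compiler and MPI pairings described above can be summarized as a small lookup table. The Python sketch below only restates this announcement. The labels "mpt/2.19" and "openmpi/3.1.4" are illustrative strings rather than confirmed module names (only impi is named in the text), and the function is hypothetical.

```python
# Supported compiler families and major versions, per the announcement above.
SUPPORTED_COMPILERS = {
    "intel": ["2017", "2018", "2019"],  # 2018 becomes the default; 2016 is removed
    "gnu": ["7", "8", "9"],
    "pgi": ["19"],
}

DEFAULT_INTEL_VERSION = "2018"


def available_mpis(compiler_family):
    """Return the MPI libraries announced for a supported compiler family."""
    family = compiler_family.lower()
    if family not in SUPPORTED_COMPILERS:
        raise ValueError(f"unsupported compiler family: {compiler_family}")
    mpis = ["mpt/2.19"]  # default MPI, available for all supported compilers
    if family == "intel":
        mpis.append("impi")  # Intel MPI, for the supported Intel compilers
    else:
        mpis.append("openmpi/3.1.4")  # for the supported PGI and GNU compilers
    return mpis


if __name__ == "__main__":
    for family, versions in SUPPORTED_COMPILERS.items():
        print(family, ", ".join(versions), "->", ", ".join(available_mpis(family)))
```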

Further details on the module changes will be provided during the week of June 24.
