Daily Bulletin Archive

July 3, 2019

The Cheyenne system was returned to production late Tuesday evening following completion of the operating system update and system verification. As noted in previous communications, users are advised to rebuild their executables and thoroughly test all scripts. Many system libraries changed in the new version of the OS, which is SUSE Linux Enterprise Server (SLES) Service Pack 4. Executables built before the upgrade are likely to fail.

Other significant changes were made to the module environment during the upgrade, and users can now manage their Campaign Storage data holdings with POSIX commands. See the following Daily Bulletin items for details.

Please report any suspected issues with the new user environment as soon as possible to cislhelp@ucar.edu. Thank you all for your patience and cooperation throughout this extended outage.

July 3, 2019

The entire collection of environment modules has been reconstructed as part of Cheyenne’s operating system upgrade. Recent versions of commonly used software libraries are available for most compiler and MPI combinations. Additionally, recent releases of popular analysis software like Python, MATLAB, IDL, and R have been installed.

The default set of modules is now ncarenv, intel/18.0.5, ncarcompilers, mpt/2.19, and netcdf/4.6.3. Multiple versions of the Intel and GCC compilers are available, as is a PGI offering. Two MPI libraries are installed for each compiler.

Old modules that were built with system libraries from the previous OS have been archived and are no longer loadable. We apologize for any inconvenience this may cause, but it was necessary to prevent unexpected and/or broken behavior under the new OS version.

July 3, 2019

Users can now execute familiar POSIX commands to manage their data holdings in the Campaign Storage file system by logging in to CISL’s data-access nodes. Previously, Campaign Storage files could be accessed only using Globus. 

The Campaign Storage file system is mounted on the data-access nodes as /glade/campaign to enable users to manage file and directory permissions and to facilitate transfers of small files to and from GLADE spaces such as /glade/scratch and /glade/work. CISL still recommends using Globus for all other data transfers for its reliability, robustness, performance, and ability to validate the correctness of transfers.

As part of the new capability, CISL has removed world read, write, and execute permissions on all project-level directories (that is, the directories directly beneath the NCAR Lab and university level) to help protect them from unintended access. If you have authority over a directory and want its permissions re-opened, contact cislhelp@ucar.edu.
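The permission change described above can be illustrated with a short Python sketch. It is not CISL's tooling; the helper names `has_world_access` and `remove_world_access` are hypothetical, and the demonstration runs on a temporary directory rather than on a real /glade/campaign path. It only shows how the world ("other") permission bits can be inspected and cleared with standard POSIX calls.

```python
import os
import stat
import tempfile


def has_world_access(path):
    """Return True if any world (other) read, write, or execute bit is set."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRWXO)


def remove_world_access(path):
    """Clear the world read, write, and execute bits; leave all other bits as-is."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~stat.S_IRWXO)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory, not on a real project directory.
    with tempfile.TemporaryDirectory() as demo_dir:
        os.chmod(demo_dir, 0o755)
        print("world access before:", has_world_access(demo_dir))
        remove_world_access(demo_dir)
        print("world access after: ", has_world_access(demo_dir))
```

Owner and group permissions are left untouched, so members of the project group keep whatever access they already had.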

The data-access nodes are intended for data transfers and lightweight tasks such as editing files. Tasks deemed to be consuming excessive resources on the nodes will be killed at the discretion of CISL system administrators.

July 2, 2019

CISL's HPC system administrators, Consulting Services Group, and HPE engineers resolved several significant issues on Monday toward completing Cheyenne's operating system upgrade. A full suite of system tests was executed overnight, and the results are being analyzed this morning. If the tests were successful, the system will be rebooted later this morning. Following the reboot, the system tests will be repeated for added confidence in the system's health.

A firm ETA is not yet available, but if all goes well Cheyenne could be returned to service by midday. Users will be apprised of any significant updates through CISL’s Notifier service, which was restored Monday afternoon.

July 1, 2019

Problems encountered on Sunday while rebooting some of the Cheyenne system’s compute nodes have delayed the system's return to users following the operating system upgrade. Cheyenne will not be returned to service this morning as originally planned.

CISL HPC system administrators and HPE engineers are working to resolve the issue as soon as possible and have escalated it to the highest severity level with HPE. We do not have an ETA for returning the system at this point but will notify users when more information is available.

Unrelated issues with the CISL Notifier service prevented updates from being issued over the weekend. Thank you for your patience and understanding.

July 1, 2019

Work continued Thursday on Cheyenne’s operating system upgrade, and several login nodes were made available to CISL’s Consulting Services Group to begin rebuilding the system’s software stack. Today, CISL HPC system administrators and HPE engineers will focus on rebuilding the system’s internal communications network.

The system is expected to be returned to service on Monday, July 1. Users will be advised through the Notifier service of any significant changes to the schedule.

While Cheyenne is unavailable, users can log in directly to Casper to run data analysis and visualization jobs on that cluster or to access the GLADE file system and HPSS.

June 26, 2019

Cheyenne’s operating system upgrade efforts continued through Tuesday. CISL’s HPC system administrators and HPE engineers addressed several hardware, firmware, and software issues that were discovered when power was restored to the system. Work resumed this morning and the system is still expected to be returned to service on Monday, July 1. 

While Cheyenne is unavailable, users can log in directly to Casper to run data analysis and visualization jobs on that cluster or to access the GLADE file system and HPSS.

June 24, 2019

CORRECTION: Casper will be available this week.

Cheyenne is scheduled to be down and unavailable from Monday, June 24, at 6:00 a.m. until July 1 for an operating system upgrade.

HPSS will be down on Tuesday morning from 8:00 a.m. until 10:00 a.m. for routine database maintenance.

There will be no downtime for GLADE, Casper, or Campaign Storage.

June 20, 2019

A major Cheyenne operating system (OS) update is scheduled to begin Monday, June 24, and is expected to be completed by Monday, July 1. The Cheyenne cluster will be unavailable during the update, including the system’s login nodes and all cron services.

Users will still be able to log in directly to Casper to run jobs on that cluster or access the GLADE file system and HPSS.

As announced previously:

  • The Cheyenne OS will be updated from SUSE Linux Enterprise Server (SLES) Service Pack 1 to Service Pack 4. This is to bring the system up to current security and support levels and is expected to be the last operating system upgrade in Cheyenne’s lifetime.

  • Some changes also will be made to the Cheyenne module environment to better support multiple compiler and MPI configurations while providing a more robust, easier-to-maintain user environment.

  • Most users’ programs and executables will need to be rebuilt following the update, as many system libraries will change.

  • Users should test their job scripts thoroughly after Cheyenne is returned to service.
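As a rough aid for the rebuild advice above, the following Python sketch lists executable files that were last modified before the outage began. It is only a heuristic, not a CISL utility: it assumes a file's modification time approximates its build time, it uses the announced outage start (June 24 at 6:00 AM, interpreted in the machine's local time) as the cutoff, and it will also report scripts that carry an execute bit. The function name `stale_executables` and the `"."` starting directory are placeholders.

```python
import os
import stat
from datetime import datetime

# Assumption: anything last modified before the outage began on June 24, 2019,
# at 6:00 AM local time was built against the old OS libraries.
UPGRADE_START = datetime(2019, 6, 24, 6, 0)


def stale_executables(root, cutoff=UPGRADE_START):
    """Yield regular, executable files under root last modified before cutoff."""
    cutoff_ts = cutoff.timestamp()
    any_exec_bit = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or cannot be inspected; skip it
            if not stat.S_ISREG(info.st_mode):
                continue
            if info.st_mode & any_exec_bit and info.st_mtime < cutoff_ts:
                yield path


if __name__ == "__main__":
    # "." is a placeholder; point this at your own build or bin directory.
    for path in stale_executables("."):
        print(path)
```

A file that appears in the output is a candidate for rebuilding; a file that does not appear may still need attention if it was copied or touched after the cutoff.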

The routine monthly maintenance times that were scheduled for July 2, August 6, and September 3 have been canceled.

June 19, 2019

CISL will begin phasing in enforcement of the purge policy for files in the /glade/p file space later this year. The project space was released to users on July 1, 2018, with a stated purge policy of 12 months from file creation date. With that in place, files would have been purged as early as July 1 of this year, but users now will have more time to adjust.

Once it is fully implemented, the project space purge policy will remove files 12 months after their last access date. On October 1, 2019, files will be purged if they have not been accessed in 18 months rather than 12 months. On the first Tuesday of each subsequent month, the retention period will be shortened by one month until the 12-month limit is fully implemented in April 2020.

The detailed schedule is provided below. Users will be notified well in advance of any changes to this schedule.

Retention period implementation dates

  • 18 months – October 1, 2019

  • 17 months – November 5, 2019

  • 16 months – December 3, 2019

  • 15 months – January 7, 2020

  • 14 months – February 4, 2020

  • 13 months – March 3, 2020

  • 12 months – April 7, 2020

CISL will deploy data management tools this summer that will include utilities to help users identify files that are nearing the purge limits.
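The phased schedule above can be expressed as a small lookup, sketched here in Python. This is an illustration only and not one of the CISL tools mentioned above. The function names are hypothetical, and the way months are subtracted (calendar months, with the day of the month clamped to 28) is an assumption; the bulletin does not say how CISL counts a month.

```python
from datetime import date

# Schedule copied from the bulletin: (effective date, retention period in months).
RETENTION_SCHEDULE = [
    (date(2019, 10, 1), 18),
    (date(2019, 11, 5), 17),
    (date(2019, 12, 3), 16),
    (date(2020, 1, 7), 15),
    (date(2020, 2, 4), 14),
    (date(2020, 3, 3), 13),
    (date(2020, 4, 7), 12),
]


def retention_months(on_date):
    """Return the retention period in effect on on_date, or None before Oct. 1, 2019."""
    months = None
    for effective, period in RETENTION_SCHEDULE:
        if on_date >= effective:
            months = period
    return months


def months_back(d, months):
    """Return the date `months` calendar months before d (day clamped to 28; an assumption)."""
    total = d.year * 12 + (d.month - 1) - months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def is_purge_candidate(last_access, on_date):
    """Return True if a file last accessed on last_access exceeds the limit on on_date."""
    months = retention_months(on_date)
    if months is None:
        return False  # enforcement has not started yet
    return last_access < months_back(on_date, months)


if __name__ == "__main__":
    print(retention_months(date(2019, 12, 25)))
    print(is_purge_candidate(date(2018, 3, 15), date(2019, 10, 1)))
```

For example, on October 1, 2019, the 18-month limit corresponds to a last-access cutoff of April 1, 2018, under the calendar-month assumption stated above.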