Daily Bulletin Archive

April 30, 2019

CISL’s system administrators have re-engineered a pair of Casper nodes so users can log in directly to casper.ucar.edu. These new Casper login nodes will remain available indefinitely, subject to normal maintenance and outages. They provide redundant access to the rest of the Casper system, GLADE, and HPSS in case the Cheyenne login nodes become unavailable. Users are encouraged to try this method of accessing Casper and to send feedback to cislhelp@ucar.edu.

April 29, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE and HPSS

April 26, 2019

The default versions of MATLAB, R, NCAR Command Language (NCL), and the NCAR Python Package Library (NPL) on Cheyenne and Casper will be updated to their latest versions on Monday, May 6.

  • The MATLAB default version will change from R2016b to R2019a

  • The R default version will change from 3.4.0 to 3.5.2

  • The NCL default version will change from 6.5.0 to 6.6.2

  • The NPL default version will change from version 20190118 to 20190326. This introduces JupyterLab support and, on Casper, adds machine learning packages including TensorFlow and PyTorch.

April 24, 2019

The HPSS and HPSS disaster recovery systems will be down from 8 a.m. to 2 p.m. MDT on Thursday, April 25, in support of the major facilities work that is under way at the NCAR-Wyoming Supercomputing Center in Cheyenne. We apologize for the late notice.

April 24, 2019

The GLADE scratch file space is a temporary space for data that will be analyzed and removed within a short amount of time. It is also the recommended space for temporary files that would otherwise reside in small /tmp or /var/tmp directories that many users share. See Storing temporary files with TMPDIR for more information.

See this CISL page for more recommended best practices.
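Many tools decide where to write temporary files by reading the TMPDIR environment variable. Python's standard `tempfile` module is one of them, so a Python workflow can be pointed at scratch space instead of a small shared /tmp. The sketch below is illustrative only: the directory you pass in is a hypothetical placeholder for a directory in your GLADE scratch space, and the function name is not part of any CISL tool.

```python
import os
import tempfile


def use_scratch_tmpdir(scratch_tmp: str) -> str:
    """Point Python's temporary-file functions at a scratch directory.

    scratch_tmp is a hypothetical path; on GLADE it would be a
    directory in your scratch space rather than /tmp or /var/tmp.
    """
    # Make sure the directory exists before anything tries to write there.
    os.makedirs(scratch_tmp, exist_ok=True)
    # Child processes started from this one inherit TMPDIR as well.
    os.environ["TMPDIR"] = scratch_tmp
    # tempfile caches the directory it chose on first use; clearing the
    # cache makes the next call re-read TMPDIR.
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

After calling `use_scratch_tmpdir(...)`, calls such as `tempfile.NamedTemporaryFile()` or `tempfile.mkdtemp()` create their files in that directory. Setting TMPDIR in your shell startup file or job script before Python starts has the same effect without any code changes.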

April 22, 2019

CISL system administrators will update each node in the Casper cluster beginning Tuesday, April 23, to install the latest version of the NVIDIA drivers and CUDA 10.1. To minimize the impact on users, several nodes will be updated each day, leaving most nodes available throughout the week.

The updates are expected to take up to two hours each day. Nodes will be unavailable during the update process according to the following schedule:

  • Tuesday – casper08-09, casper23-25, casper27-28

  • Wednesday – casper02-07

  • Thursday – casper10-15

  • Friday – casper16-22

April 22, 2019

30-minute outage for the Globus, Data Access, and Slurm HPSS queue services on April 23 at 12 p.m. to reboot the nodes and clear some hung processes.

Rolling maintenance on Casper for NVIDIA driver updates. No user impact expected.

No downtime for Cheyenne or GLADE

April 18, 2019

Cheyenne users should examine their job scripts and startup files for instances in which the environment variable MPI_SHEPHERD is set to the value “1” or “true.” That variable should be set in only two situations: when running MPT peak_memusage jobs and command file jobs.

Setting the variable to “1” or “true” in other situations can interfere with the job's process binding, causing it to slow considerably or hang. While the following error message refers to MPI_SHEPHERD, it almost always results from other, unrelated issues:

MPT ERROR: could not run executable. If this is a non-MPT application, you may need to set MPI_SHEPHERD=true.

Please contact CISL’s Consulting Services Group or cislhelp@ucar.edu for help resolving the problem if you receive that message.
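Checking many job scripts and startup files by hand is tedious. The short Python sketch below flags lines that set MPI_SHEPHERD to "1" or "true" so you can review each one. It is a sketch under stated assumptions: the regular expressions cover only the common bash/sh (`export MPI_SHEPHERD=true`, `MPI_SHEPHERD=1 ...`) and csh/tcsh (`setenv MPI_SHEPHERD true`) forms, the value match is case-insensitive by assumption, and the list of startup files is a hypothetical default. A flagged line is not necessarily wrong; peak_memusage and command file jobs are the two legitimate uses.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed patterns for bash/sh and csh/tcsh; commented-out lines
# (starting with "#") do not match. Adjust for other shells or styles.
_PATTERNS = [
    re.compile(r"""^\s*(?:export\s+)?MPI_SHEPHERD\s*=\s*["']?(?i:1|true)(?!\w)"""),
    re.compile(r"""^\s*setenv\s+MPI_SHEPHERD\s+["']?(?i:1|true)(?!\w)"""),
]


def find_shepherd_settings(paths: Iterable[Path]) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, line) for each line setting MPI_SHEPHERD to 1/true."""
    hits = []
    for path in paths:
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # missing, unreadable, or a directory: skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in _PATTERNS):
                hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical defaults: common shell startup files, plus any job
    # scripts named on the command line.
    home = Path.home()
    candidates = [home / ".bashrc", home / ".bash_profile", home / ".tcshrc", home / ".cshrc"]
    candidates += [Path(arg) for arg in sys.argv[1:]]
    for path, lineno, line in find_shepherd_settings(candidates):
        print(f"{path}:{lineno}: {line}")
```

Run it with your job scripts as arguments; each reported line is a candidate to remove unless the job is one of the two cases described above.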

April 15, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, HPSS, and GLADE

April 15, 2019

The CISL website, the Systems Accounting Manager, Notifier service, ExtraView helpdesk ticketing system, and some other support services may be unavailable intermittently. Thank you for your patience as we work to resolve some network issues.
