The Daily Bulletin

April 25, 2019

The CISL User Services Section will present an in-person and online tutorial at 9:30 a.m. MDT on Friday, May 24, for new users of NCAR’s Cheyenne high-performance computing (HPC) system and the Casper data analysis and visualization cluster. The 90-minute tutorial is intended for individuals who are either new to HPC or unfamiliar with the Cheyenne user environment.

Topics will include:

  • Overview of compute and storage resources

  • Using software and building applications

  • Scheduling jobs on the batch resources

  • Workflow recommendations and best practices

Register at one of these links to attend in person or online:

April 24, 2019

The HPSS and HPSS disaster recovery systems will be down from 8 a.m. to 2 p.m. MDT on Thursday, April 25, in support of the major facilities work that is under way at the NCAR-Wyoming Supercomputing Center in Cheyenne. We apologize for the late notice.

April 24, 2019

The GLADE scratch file space is a temporary space for data that will be analyzed and removed within a short time. It is also the recommended location for temporary files that would otherwise go in the small /tmp or /var/tmp directories that many users share. See "Storing temporary files with TMPDIR" for more information.
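As a rough illustration of the TMPDIR recommendation, the following Python sketch points a script's temporary files at a directory under scratch space instead of /tmp. The function name use_scratch_tmpdir, the "tmp" subdirectory, and the example path in the usage note are assumptions made for illustration, not CISL-documented settings.

```python
import os
import tempfile


def use_scratch_tmpdir(scratch_root):
    """Point TMPDIR at a 'tmp' directory under scratch_root and return its path.

    scratch_root is whatever scratch directory the caller owns; the 'tmp'
    subdirectory name is an arbitrary choice for this sketch.
    """
    tmpdir = os.path.join(scratch_root, "tmp")
    os.makedirs(tmpdir, exist_ok=True)

    # Child processes launched from this script inherit the new TMPDIR.
    os.environ["TMPDIR"] = tmpdir

    # The tempfile module caches its default directory, so clear the cache
    # to make it re-read TMPDIR on the next call.
    tempfile.tempdir = None
    return tmpdir
```

A script might call use_scratch_tmpdir(os.path.join("/glade/scratch", os.environ["USER"])) before creating any temporary files. That path is an assumed layout, so check the CISL documentation for the actual scratch location.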

See this CISL page for more recommended best practices.

April 22, 2019

CISL system administrators will update each node in the Casper cluster beginning Tuesday, April 23, to install the latest version of the NVIDIA drivers and CUDA 10.1. To minimize the impact on users, several nodes will be updated each day, leaving most nodes available throughout the week.

The updates are expected to take up to two hours each day. Nodes will be unavailable during the update process according to the following schedule:

  • Tuesday – casper08-09, casper23-25, casper27-28

  • Wednesday – casper02-07

  • Thursday – casper10-15

  • Friday – casper16-22
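For users who script their Casper work, the schedule above can be turned into a simple lookup. This is only a sketch: the names SCHEDULE, expand_nodes, and update_day are hypothetical, and the node ranges are copied directly from the list above.

```python
import re

# Update schedule as announced; ranges are inclusive.
SCHEDULE = {
    "Tuesday": "casper08-09, casper23-25, casper27-28",
    "Wednesday": "casper02-07",
    "Thursday": "casper10-15",
    "Friday": "casper16-22",
}


def expand_nodes(spec):
    """Expand a spec such as 'casper08-09, casper23' into individual node names."""
    nodes = []
    for part in spec.split(","):
        match = re.fullmatch(r"\s*([a-z]+)(\d+)(?:-(\d+))?\s*", part)
        if match is None:
            raise ValueError(f"unrecognized node range: {part!r}")
        prefix, start, end = match.group(1), match.group(2), match.group(3)
        end = end or start
        width = len(start)  # keep zero padding, e.g. casper08
        for number in range(int(start), int(end) + 1):
            nodes.append(f"{prefix}{number:0{width}d}")
    return nodes


def update_day(node):
    """Return the day a node is scheduled for updating, or None if it is not listed."""
    for day, spec in SCHEDULE.items():
        if node in expand_nodes(spec):
            return day
    return None
```

For example, update_day("casper24") returns "Tuesday", and a node that does not appear in the schedule returns None.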

April 11, 2019

CISL is pleased to announce a significant change to previously announced plans for the May 6-11 HPC systems downtime. CISL system administrators and NWSC engineers have determined it will be possible to maintain UPS power to all of Cheyenne’s login nodes, the Casper cluster, GLADE, and the HPSS system throughout the electrical repair efforts, so those will remain in service. However, Cheyenne’s compute nodes will be powered down and unavailable for use.

The May repairs will follow several weeks of facilities work that will be carried out without powering down any of the HPC systems.

A major operating system update to the Cheyenne system also is being planned and will require an extended downtime, most likely in late June or early July. Details will be announced in the Daily Bulletin when the dates are set.

The May 6-11 outage will be followed by several more weeks of facilities maintenance that can be performed without powering down the systems, so no user impact is anticipated. Information on scheduled outages is available on the CISL HPC calendar.