Daily Bulletin Archive

May 10, 2019

NCAR HPC system users are reminded of the scheduled downtime for Cheyenne’s compute nodes Monday, May 6, through Saturday, May 11, while extensive electrical repairs take place at the NCAR-Wyoming Supercomputing Center. Cheyenne’s login nodes, the Casper cluster, and GLADE will remain available on UPS power. HPSS is scheduled to be down briefly for electrical recabling on Monday, May 6, from 7 a.m. to 1 p.m. MDT but otherwise is expected to be available during the week.

A major Cheyenne operating system update is also being planned and will require an extended downtime, most likely in late June or early July. Details will be announced in the Daily Bulletin when the dates are set.

May 9, 2019

Intel software engineers will conduct a half-day training session titled “Intel Developer Tools” on Wednesday, May 22, from 1 to 4:30 p.m. MDT. The training will be held at the VisLab (ML4), NCAR Mesa Lab, in Boulder and is open to all UCAR and NCAR employees and external collaborators. To attend, please register at one of the links below.

Topics to be covered include:

  • Intel Compilers

  • Intel Distribution for Python (MKL-optimized packages NumPy, SciPy, and scikit-learn; the Intel Data Analytics Acceleration Library, DAAL; MKL-DNN-optimized deep learning frameworks)

  • Intel Performance Libraries

  • Intel VTune

  • Intel Advisor (Flow Graph Analyzer, Roofline Analysis, Platform Profiler)

  • Intel Inspector

  • Intel MPI

  • Intel Trace Analyzer

Register to attend in person or attend online by selecting one of these links:

May 7, 2019

The CISL User Services Section will present an in-person and online tutorial at 9:30 a.m. MDT on Friday, May 24, for new users of NCAR’s Cheyenne high-performance computing (HPC) system and the Casper data analysis and visualization cluster. The 90-minute tutorial is intended for individuals who are either new to HPC or unfamiliar with the Cheyenne user environment.

Topics will include:

  • Overview of compute and storage resources

  • Using software and building applications

  • Scheduling jobs on the batch resources

  • Workflow recommendations and best practices

Register at one of these links to attend in person or online:
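As a taste of the batch-scheduling topic above, a minimal job script might look like the following. This is a sketch only, assuming Cheyenne's PBS Pro scheduler; the project code, queue name, and resource values are placeholders to adapt to your own allocation:

```shell
#!/bin/bash
#PBS -N hello_job
#PBS -A PROJECT0001         # placeholder project/account code
#PBS -q regular             # assumed queue name
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:05:00
#PBS -j oe                  # merge stdout and stderr

# The script body runs on the allocated compute node.
msg="Hello from $(hostname)"
echo "$msg"
```

A script like this would be submitted with `qsub`; because the `#PBS` directives are ordinary comments to the shell, the script can also be run directly for quick local testing.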

April 30, 2019

CISL’s system administrators have re-engineered a pair of Casper nodes so users can log in directly to casper.ucar.edu. These new Casper login nodes will remain available indefinitely, subject to normal maintenance and outage activities, and provide redundant access to the rest of the Casper system, GLADE, and HPSS if the Cheyenne login nodes become unavailable. Users are encouraged to try this method of accessing Casper and to send feedback to cislhelp@ucar.edu.
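Connecting to the new login nodes works like any other SSH login. A minimal sketch, replacing "username" with your own NCAR/UCAR username (the usual authentication process applies):

```shell
# Log in directly to the Casper cluster via the new login nodes.
ssh username@casper.ucar.edu
```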

April 29, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE and HPSS

April 26, 2019

Cheyenne and Casper default libraries for MATLAB, R, NCAR Command Language (NCL), and the NCAR Python Package Library (NPL) will be updated to their latest versions on Monday, May 6.

  • The MATLAB default version will change from R2016b to R2019a

  • The R default version will change from 3.4.0 to 3.5.2

  • The NCL default version will change from 6.5.0 to 6.6.2

  • The NPL default version will change from version 20190118 to 20190326. This introduces JupyterLab support and, on Casper, adds machine learning packages including TensorFlow and PyTorch.
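Users who need to stay on an earlier version after the defaults change can load it explicitly. A sketch only, assuming the Lmod-style `module` environment used on Cheyenne and Casper and that the older versions remain installed; the module names below are inferred from the version strings above:

```shell
# Request the previous versions explicitly instead of the new defaults.
# These module names are assumptions; run "module avail" to confirm.
module load matlab/R2016b
module load R/3.4.0
module load ncl/6.5.0
```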

April 24, 2019

The HPSS and HPSS disaster recovery systems will be down from 8 a.m. to 2 p.m. MDT on Thursday, April 25, in support of the major facilities work that is under way at the NCAR-Wyoming Supercomputing Center in Cheyenne. We apologize for the late notice.

April 24, 2019

The GLADE scratch file space is a temporary space for data that will be analyzed and removed within a short amount of time. It is also the recommended space for temporary files that would otherwise reside in small /tmp or /var/tmp directories that many users share. See Storing temporary files with TMPDIR for more information.
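Following that recommendation, a job script or shell startup file can point TMPDIR at scratch rather than the shared /tmp. A minimal sketch, assuming the conventional /glade/scratch/$USER layout for your scratch space:

```shell
# Redirect temporary files to GLADE scratch instead of the small,
# shared /tmp and /var/tmp directories.
export TMPDIR=/glade/scratch/$USER/temp
mkdir -p "$TMPDIR"
```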

See this CISL page for more recommended best practices.

April 22, 2019

CISL system administrators will update each node in the Casper cluster beginning Tuesday, April 23, to install the latest version of the NVIDIA drivers and CUDA 10.1. To minimize the impact on users, several nodes will be updated each day, leaving most nodes available throughout the week.

The updates are expected to take up to two hours each day. Nodes will be unavailable during the update process according to the following schedule:

  • Tuesday – casper08-09, casper23-25, casper27-28

  • Wednesday – casper02-07

  • Thursday – casper10-15

  • Friday – casper16-22

April 22, 2019

The Globus, Data Access, and Slurm HPSS queue services will be down for 30 minutes beginning at 12 p.m. MDT on Tuesday, April 23, while the nodes are rebooted to clear some hung processes.

Rolling maintenance on Casper for NVIDIA driver updates. No user impact is expected.

No downtime for Cheyenne or GLADE
