The Daily Bulletin

March 21, 2019

Major electrical repair work at the NCAR-Wyoming Supercomputing Center will require an extended downtime for the Cheyenne, Casper, Campaign Storage, GLADE, and HPSS systems. The work scheduled for Monday, May 6, through Saturday, May 11, will follow several weeks of facilities work that can be done without powering down those systems.

The May work includes replacing one of the 24,900-volt switches that supply power to the NWSC facility; that switch suffered a catastrophic failure in December 2017. A spare switch that was on site has been in service since then while the root cause of the explosion was identified and plans were made to prevent similar failures in the future. Preventive maintenance will also be performed on three additional switches. All systems will be brought down during the final days of the facilities work to prevent damage or data loss while the new switch is integrated into the infrastructure.

The repairs will require contributions from many outside contractors and have been coordinated by CISL’s on-site engineering staff to minimize the duration of the work.

A major operating system update for the Cheyenne system is also being planned and will require a separate extended downtime, most likely in late June or early July. Details will be announced in the Daily Bulletin when the dates are set.

Note that the May 6-11 outage will be followed by several more weeks of facilities maintenance that can be performed without powering down the systems, so no user impact is anticipated from that work. The routine maintenance downtime that had been scheduled for April 2 has been canceled. Information on scheduled outages is available on the CISL HPC calendar.

March 18, 2019

GLADE users occasionally need to share files with others who have GLADE access but who aren’t in the same UNIX group. Rather than asking CISL to create a special group in such a case, consider using access control lists (ACLs) to provide the necessary permissions.

ACLs are tools for controlling access to files and directories outside of traditional UNIX permissions. The UNIX permissions remain in effect, but users can create ACLs to facilitate short-term file sharing as needed. In the Cheyenne/GLADE environment, the most common use cases are:

  • Sharing files among users in different NCAR labs or universities.

  • Sharing files with short-term visitors, interns, students, or others during a short project period.

See Using access control lists for examples of how to create ACLs to allow other individuals and groups to work with your files, how to propagate permissions to new files and directories, and how to remove ACLs when they are no longer needed.
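The linked CISL page is the authoritative guide. As a rough illustration only, the Python sketch below assumes the standard Linux `setfacl` tool (with its `-m`, `-x`, `-b`, `-d`, and `-R` options) is available and that GLADE honors ACLs set that way; the helper names, directory, and username are hypothetical. It only builds the command lines, so they can be reviewed before anything is run.

```python
import shlex
from typing import List

# Characters setfacl accepts in a permission field: r, w, x, X, or "-".
# "X" grants execute only on directories (and on files that are already
# executable by someone), which is usually what you want when sharing a tree.
_ALLOWED = set("rwxX-")


def acl_entry(kind: str, name: str, perms: str) -> str:
    """Build an entry such as 'u:jdoe:rwX' (user) or 'g:proj:rX' (group)."""
    if kind not in ("u", "g"):
        raise ValueError("kind must be 'u' (user) or 'g' (group)")
    if not name or not perms or not set(perms) <= _ALLOWED:
        raise ValueError(f"bad name or permissions: {name!r}, {perms!r}")
    return f"{kind}:{name}:{perms}"


def grant(path: str, entry: str, recursive: bool = False) -> List[str]:
    """Command that adds or modifies one ACL entry on existing files."""
    return ["setfacl"] + (["-R"] if recursive else []) + ["-m", entry, path]


def grant_default(directory: str, entry: str) -> List[str]:
    """Command that sets a default ACL entry on a directory, so files and
    subdirectories created in it later inherit the entry."""
    return ["setfacl", "-d", "-m", entry, directory]


def revoke(path: str, kind: str, name: str, recursive: bool = False) -> List[str]:
    """Command that removes one user's or group's access entry from path."""
    return ["setfacl"] + (["-R"] if recursive else []) + ["-x", f"{kind}:{name}", path]


def revoke_all(path: str, recursive: bool = False) -> List[str]:
    """Command that strips all extended ACL entries from path."""
    return ["setfacl"] + (["-R"] if recursive else []) + ["-b", path]


if __name__ == "__main__":
    shared = "/glade/scratch/jdoe/shared"      # hypothetical directory
    entry = acl_entry("u", "visitor1", "rwX")  # hypothetical username
    for cmd in (grant(shared, entry, recursive=True),
                grant_default(shared, entry),
                revoke(shared, "u", "visitor1", recursive=True)):
        print(shlex.join(cmd))
```

Keeping command construction separate from execution makes the commands easy to inspect; once they match the CISL documentation, each list could be passed to `subprocess.run(cmd, check=True)`, and `getfacl PATH` shows the resulting entries.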


March 15, 2019

Cheyenne’s default MPI library is now MPT 2.19, the version that HPE recommends and supports. Versions 2.15 and 2.16 are no longer compatible with the system firmware and have been removed from the system. To keep existing scripts and job workflows from failing, the mpt/2.15 and mpt/2.16 modules still exist, but they now point to the MPT 2.19 library and issue a message prompting users to upgrade. Those two modules will be deleted later this year. MPT 2.18 is still available on Cheyenne but is no longer supported by HPE.

Builds of the parallel libraries netcdf-mpi and pnetcdf that use MPT 2.19 are available for the following supported versions of the Intel compiler: 16.0.3, 17.0.1 (the default), 18.0.5, and 19.0.2. The libraries have also been built for GCC versions 6.3.0, 7.3.0, and 8.1.0, and for PGI 17.9.

Users should update their scripts and recompile executables to use MPT 2.19 as soon as possible.
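For users with many job scripts, a small helper can find lines that still request the retired modules. The Python sketch below is hypothetical (it is not a CISL tool) and assumes scripts select MPT with lines such as `module load mpt/2.16`; it also flags `mpt/2.18`, since that version is no longer supported by HPE. Rewriting the module name does not rebuild anything, so executables still need to be recompiled against MPT 2.19.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for MPT module names that should move to 2.19:
# 2.15 and 2.16 (removed; now aliased to 2.19) and 2.18 (unsupported).
# The trailing \b keeps it from matching longer strings like "mpt/2.150".
OLD_MPT = re.compile(r"\bmpt/2\.1[568]\b")
NEW_MPT = "mpt/2.19"


def find_old_mpt(script: str) -> List[Tuple[int, str]]:
    """Return (line number, line text) for every line naming an old MPT module."""
    return [(n, line) for n, line in enumerate(script.splitlines(), start=1)
            if OLD_MPT.search(line)]


def upgrade_mpt(script: str) -> str:
    """Return the script with each old MPT module name replaced by mpt/2.19."""
    return OLD_MPT.sub(NEW_MPT, script)


if __name__ == "__main__":
    job = (
        "#!/bin/bash\n"
        "module load intel/17.0.1\n"
        "module load mpt/2.16\n"
    )
    for lineno, text in find_old_mpt(job):
        print(f"line {lineno}: {text}")
    print(upgrade_mpt(job), end="")
```

Writing the result back to disk is left out on purpose so the proposed changes can be reviewed first.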

March 12, 2019

Registration is open for a MATLAB class that CISL is hosting at 9 a.m. on Thursday, March 28, in Boulder. A MathWorks application engineer will present Build and Execute Parallel Applications in MATLAB in the Small Seminar Room, Foothills Lab 2 (FL2-1001).

Class description

In this session we show how to program parallel applications in MATLAB. We introduce high-level programming constructs that make it easy to create parallel applications without low-level programming, and we show how to offload processor-intensive tasks to a computing resource of your choice: multicore computers, GPUs, or larger resources such as HPC clusters and cloud computing services.

Learning objectives:

  • Program parallel applications in MATLAB

  • Analyze big data sets and solve large-scale problems

  • Run parallel applications interactively and as batch jobs

  • Employ multicore processors and GPUs to speed up your computations

  • Offload processor-intensive tasks to clusters and cloud computing services

Use this link to register and attend in person. The class will not be recorded or available online.