The Daily Bulletin

April 18, 2019

Cheyenne users should examine their job scripts and startup files for instances in which the environment variable MPI_SHEPHERD is set to the value “1” or “true.” That variable should be set in only two situations: when running MPT peak_memusage jobs and command file jobs.
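One way to carry out that check is with a short script. The following is a minimal sketch rather than a CISL-provided tool: the function name find_shepherd_settings and the matching pattern are illustrative assumptions. It looks only for lines that begin with a bash-style or csh-style assignment of the variable, treats the value "true" as case-insensitive, and does not catch variables passed through scheduler options.

```python
import re
import sys
from pathlib import Path

# Assumed pattern: matches bash/sh style (export MPI_SHEPHERD=true) and
# csh/tcsh style (setenv MPI_SHEPHERD 1), with optional quotes around the
# value. Lines that start with "#" (comments) are not matched.
PATTERN = re.compile(
    r"""^\s*(?:export\s+|setenv\s+)?MPI_SHEPHERD(?:\s*=\s*|\s+)["']?(?:1|(?i:true))["']?\s*(?:[;#].*)?$"""
)


def find_shepherd_settings(text):
    """Return (line_number, line) pairs where MPI_SHEPHERD is set to 1 or true."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Each command-line argument is a job script or startup file to scan.
    for name in sys.argv[1:]:
        path = Path(name).expanduser()
        try:
            text = path.read_text(errors="replace")
        except OSError as exc:
            print(f"{path}: skipped ({exc})")
            continue
        for number, line in find_shepherd_settings(text):
            print(f"{path}:{number}: {line}")
```

If the script were saved under a hypothetical name such as check_shepherd.py, it could be run as "python3 check_shepherd.py ~/.bashrc ~/.profile myjob.pbs", where the file names are examples only. Any line it reports should be reviewed by hand before it is changed.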

Setting the variable to “1” or “true” in other situations can interfere with the job's process binding, causing it to slow considerably or hang. While the following error message refers to MPI_SHEPHERD, it almost always results from other, unrelated issues:

MPT ERROR: could not run executable. If this is a non-MPT application, you may need to set MPI_SHEPHERD=true.

If you receive that message, please contact CISL’s Consulting Services Group for help resolving the problem.

April 11, 2019

CISL is pleased to announce a significant change to previously announced plans for the May 6-11 HPC systems downtime. CISL system administrators and NWSC engineers have determined it will be possible to maintain UPS power to all of Cheyenne’s login nodes, the Casper cluster, GLADE, and the HPSS system throughout the electrical repair efforts, so those will remain in service. However, Cheyenne’s compute nodes will be powered down and unavailable for use.

The May repairs will follow several weeks of facilities work that will be carried out without powering down any of the HPC systems.

A major operating system update to the Cheyenne system is also being planned and will require an extended downtime, most likely in late June or early July. Details will be announced in the Daily Bulletin when the dates are set.

The May 6-11 outage will be followed by several more weeks of facilities maintenance that can be performed without powering down the systems, so no user impact is anticipated. Information on scheduled outages is available on the CISL HPC calendar.