Daily Bulletin Archive

May 8, 2018

The Cheyenne cluster will be unavailable from 8 a.m. to 6 p.m. MDT today, May 8, to allow CISL staff and HPE engineers to perform hardware maintenance and address several known issues with the PBS job scheduler.

Users will be unable to log in or submit new jobs during the maintenance period. Jobs that are still running when maintenance begins will continue executing to completion. Jobs that have been submitted and are queued for execution will be dispatched by PBS as they normally would be.

Users will be informed via the CISL Notifier service when Cheyenne is returned to service.

May 8, 2018

HPSS downtime: Tuesday, May 8th, 7:30 a.m. to 3:00 p.m. for a library management system upgrade

DAV maintenance: Monday, May 7th, 12:00 p.m. to 1:00 p.m.

Cheyenne planned maintenance: Tuesday, May 8th, 8:00 a.m. to 6:00 p.m.

No downtime: GLADE

May 7, 2018

The Geyser and Caldera clusters will be unavailable from noon to 1 p.m. MDT today, May 7, to allow CISL system administrators to perform maintenance on the Slurm job scheduler.

Users will be unable to log in to either cluster during the maintenance period, and new job submissions will not be possible. No interruptions are expected to existing login sessions or to batch jobs that are already running or queued for execution.

We apologize for any inconvenience this might cause. Users will be informed via the CISL Notifier service when the systems are returned to service.

May 1, 2018

Batch jobs running on the Cheyenne system sometimes fail when they create large stdout or stderr files that overflow the spool directory on the first compute node used by the job. Failures from this condition are more likely with MPI jobs. To avoid the problem, CISL recommends redirecting job output to a file as described in the newly updated CISL documentation on job scripts.
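A minimal sketch of such a job script follows, assuming a two-node MPI run launched with mpiexec_mpt. The job name, project code (PROJECT_CODE), queue, resource request, and program name (./my_program) are hypothetical placeholders; CISL's job script documentation remains the authoritative reference.

```shell
#!/bin/bash
# Hypothetical values: replace the job name, project code, queue,
# resource request, and program with your own.
#PBS -N my_mpi_job
#PBS -A PROJECT_CODE
#PBS -q regular
#PBS -l walltime=01:00:00
#PBS -l select=2:ncpus=36:mpiprocs=36
#PBS -j oe

# Run from the directory where qsub was invoked.
cd "$PBS_O_WORKDIR"

# Send the program's stdout and stderr to a file in the working directory
# instead of letting them accumulate in the PBS spool directory on the
# job's first compute node.
mpiexec_mpt ./my_program > my_program.out 2>&1
```

With the redirect in place, the output file that PBS copies back at the end of the job holds only what the script itself prints, so the spool directory stays small.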

May 1, 2018

The recommended way to set up your Cheyenne user environment (or, in some cases, environments) is to load the desired modules after logging in or to create customized environments as described in the CISL documentation.

Some users recently reported problems that resulted from loading environment modules in their personalized start files (.bashrc, .cshrc, .kshrc, .tcshrc, .login, .profile, and so on) instead of following the recommended procedures.

As advised in our Personalizing start files documentation, Cheyenne users should not load environment modules in their start files with commands such as:

  • module load nco

  • module load netcdf
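As a sketch of the recommended alternative, the same modules can be loaded at the command line after logging in. The last two commands assume Cheyenne's module system is Lmod, whose module save and module restore commands store and reload a named set of modules; whether these collections are the "customized environments" the CISL documentation describes is an assumption, and the collection name my_analysis_env is hypothetical.

```shell
# Run these interactively after logging in, not from .bashrc, .cshrc, etc.
module load nco
module load netcdf
module list                      # confirm which modules are now loaded

# Assumes Lmod: save the current set under a hypothetical name ...
module save my_analysis_env
# ... and reload it in a later session.
module restore my_analysis_env
```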

Please contact the CISL Consulting Services Group if you have questions.

Apr. 30, 2018

No downtime: Cheyenne, GLADE, Geyser_Caldera and HPSS

Apr. 24, 2018

Cheyenne system administrators will perform preventative maintenance procedures on the PBS job scheduler today, Tuesday, April 24, beginning at noon MDT. The maintenance is expected to take approximately one hour to complete.

Most PBS commands, including qstat and qsub, will not work during the maintenance outage, and new job submissions will not be possible.

No batch jobs are expected to be lost as a result of the maintenance. Jobs that are executing when the maintenance begins will continue to run without interruption. Jobs that are queued for execution or in a hold state will remain in those states until PBS is returned to service. Access to Cheyenne’s login nodes will not be interrupted.

Users will be notified when PBS is returned to service.

Apr. 23, 2018

Cheyenne system administrators will perform preventative maintenance procedures on the PBS job scheduler tomorrow, Tuesday, April 24, beginning at noon MDT. The maintenance is expected to take approximately one hour to complete.

Most PBS commands, including qstat and qsub, will not work during the maintenance outage, and new job submissions will not be possible.

No batch jobs are expected to be lost as a result of the maintenance. Jobs that are executing when the maintenance begins will continue to run without interruption. Jobs that are queued for execution or in a hold state will remain in those states until PBS is returned to service. Access to Cheyenne’s login nodes will not be interrupted.

Users will be notified when PBS is returned to service.

Apr. 17, 2018

CISL will reactivate the purge policy for the GLADE scratch file space today, April 17. The purge policy was suspended during last month's problems with the /glade/p file space so that users would have an alternative storage option.

The purge policy's data-retention limit is currently 60 days, and the policy considers two dates: a file's creation date and its last access date.

Files that were created more than 60 days ago and have not been accessed for more than 60 days will be deleted. CISL monitors scratch space usage carefully and reserves the right to decrease the 60-day limit as usage increases. Users will be informed of any change to the purge policy.
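The rule above requires both conditions to hold before a file is purged. A minimal sketch of that decision follows; the function name is hypothetical, and it takes the two ages in whole days rather than reading them from the file system.

```shell
#!/bin/bash
# Hypothetical helper illustrating the scratch purge rule: a file is a purge
# candidate only when BOTH its age since creation AND its time since last
# access exceed the retention limit (60 days unless a third argument is given).
is_purge_candidate() {
  local created_days_ago=$1
  local accessed_days_ago=$2
  local limit_days=${3:-60}   # current limit; CISL may lower it
  [ "$created_days_ago" -gt "$limit_days" ] && [ "$accessed_days_ago" -gt "$limit_days" ]
}

# Examples with hypothetical file ages (in days):
is_purge_candidate 90 75 || echo "unexpected"
is_purge_candidate 90 75 && echo "created 90, last read 75: purged"
is_purge_candidate 90 10 || echo "created 90, last read 10: kept"
is_purge_candidate 30 30 || echo "created 30, last read 30: kept"
```

To get a rough list of your own files that have not been read recently, a command such as `find /glade/scratch/$USER -type f -atime +60` can help (the scratch path is an assumption). Note that find's `-ctime` test checks inode change time, not creation time, so find alone only approximates the policy.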


GLADE scratch space is intended for temporary, short-term use, not for long-term storage.

Apr. 17, 2018

HPSS will be down on Tuesday, April 17, from 7:00 a.m. to 3:00 p.m. for system testing.

No downtime: Cheyenne, GLADE, Geyser_Caldera
