Daily Bulletin Archive

May 7, 2018

The Geyser and Caldera clusters will be unavailable from noon to 1 p.m. MDT today, May 7, to allow CISL system administrators to perform maintenance on the Slurm job scheduler.

Users will be unable to log in to either cluster during the maintenance period and new job submissions will not be possible. No interruptions are expected to existing login sessions or batch jobs that are already running or queued for execution.

We apologize for any inconvenience this might cause. Users will be informed via the CISL Notifier service when the systems are returned to service.

May 1, 2018

The recommended way to set up your Cheyenne user environment (or, in some cases, environments) is to load the desired modules after logging in or to create customized environments as described here.

Some users recently reported problems that resulted from loading environment modules in their personalized start files (.bashrc, .cshrc, .kshrc, .tcshrc, .login, .profile, and so on) instead of following the recommended procedures.

As advised in our Personalizing start files documentation, Cheyenne users should not set environment modules in their start files by using commands such as:

  • module load nco

  • module load netcdf
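One way to check an account for this pattern is sketched below. It is a minimal example, not a CISL-provided tool: the function name find_module_loads is hypothetical, and the list of start files is simply the one named above. It prints any uncommented "module load" line it finds, with the file name and line number, so the line can be removed and the module loaded after login instead.

```shell
#!/bin/sh
# Hypothetical helper: report uncommented "module load" lines in common start files.
# Usage: find_module_loads [directory]   (defaults to $HOME)
find_module_loads() {
    dir="${1:-$HOME}"
    for f in .bashrc .cshrc .kshrc .tcshrc .login .profile; do
        if [ -f "$dir/$f" ]; then
            # Passing /dev/null as a second file makes grep print the file name;
            # -n adds the line number. Lines that start with "#" do not match.
            grep -n -E '^[[:space:]]*module[[:space:]]+load' "$dir/$f" /dev/null
        fi
    done
    # Finding nothing is not an error.
    return 0
}

find_module_loads "$HOME"
```

No output means none of the listed start files contains a matching line. The check is deliberately simple and would miss modules loaded indirectly, for example through a file that a start file sources.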

Please contact the CISL Consulting Services Group if you have questions.

May 1, 2018

Batch jobs running on the Cheyenne systems sometimes fail when they create large stdout or stderr files that overflow the spool directory on the first compute node used. Failures from this condition are more likely with MPI jobs. To avoid the problem, CISL recommends redirecting job output to a file as described in this newly updated documentation regarding job scripts.
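The redirection idea can be sketched as follows. This is an illustration under stated assumptions rather than the text of the CISL documentation: the job name, the helper function run_redirected, the log file name, and the program ./my_program are all hypothetical, and the account, queue, and resource directives that a real job needs are omitted.

```shell
#!/bin/sh
#PBS -N redirect_demo
#PBS -l walltime=00:10:00
# Assumption: a real job script would also carry account, queue, and
# resource-selection directives, which are left out of this sketch.

# Run a command with stdout and stderr written to a file the user chooses,
# so the output does not accumulate in the scheduler's spool directory.
# Usage: run_redirected LOGFILE COMMAND [ARGS...]
run_redirected() {
    logfile="$1"
    shift
    "$@" > "$logfile" 2>&1
}

# Hypothetical use inside the job script (PBS sets PBS_O_WORKDIR to the
# directory from which the job was submitted):
# run_redirected "$PBS_O_WORKDIR/my_program.log" ./my_program
```

The same effect is obtained without a helper by appending "> logfile 2>&1" to the command line that launches the program.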

Apr. 30, 2018

No downtime: Cheyenne, GLADE, Geyser_Caldera and HPSS

Apr. 24, 2018

Cheyenne system administrators will perform preventative maintenance procedures on the PBS job scheduler today, Tuesday, April 24, beginning at noon MDT. The maintenance is expected to take approximately one hour to complete.

Most PBS commands will not work during the maintenance outage, including qstat and qsub, and new job submissions will not be possible.

No batch jobs are expected to be lost as a result of the maintenance. Jobs that are executing when the maintenance begins will continue to run without interruption. Jobs that are queued for execution or in a hold state will remain in those states until PBS is returned to service. Access to Cheyenne’s login nodes will not be interrupted.

Users will be notified when PBS is returned to service.

Apr. 23, 2018

Cheyenne system administrators will perform preventative maintenance procedures on the PBS job scheduler tomorrow, Tuesday, April 24, beginning at noon MDT. The maintenance is expected to take approximately one hour to complete.

Most PBS commands will not work during the maintenance outage, including qstat and qsub, and new job submissions will not be possible.

No batch jobs are expected to be lost as a result of the maintenance. Jobs that are executing when the maintenance begins will continue to run without interruption. Jobs that are queued for execution or in a hold state will remain in those states until PBS is returned to service. Access to Cheyenne’s login nodes will not be interrupted.

Users will be notified when PBS is returned to service.

Apr. 17, 2018

HPSS will be down on Tuesday, April 17, from 7 a.m. to 3 p.m. for system testing.

No downtime: Cheyenne, GLADE, Geyser_Caldera

Apr. 17, 2018

CISL will reactivate the purge policy for the GLADE scratch file space today, April 17. The purge policy was turned off during last month’s problems with the /glade/p file space to provide users an alternative storage option.

The purge policy’s data-retention limit is currently set at 60 days and is based on two timestamps: a file’s creation date and its last access date.

Files that were created more than 60 days ago and have not been accessed for more than 60 days will be deleted. CISL monitors scratch space usage carefully and reserves the right to decrease the 60-day limit as usage increases. Users will be informed of any change to the purge policy.

GLADE scratch space is for temporary, short-term use and not intended for long-term storage needs.
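Users who want a rough idea of which of their files are at risk could list files by access time, as in the sketch below. It is an approximation and not the purge mechanism itself: the function name list_purge_candidates is hypothetical, it tests only the last access time (a file that has gone unread for more than 60 days was normally also created more than 60 days ago), and it assumes the file system records access times.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory whose last access
# is more than N days old (default 60, the limit stated in the bulletin).
# Usage: list_purge_candidates DIRECTORY [DAYS]
list_purge_candidates() {
    dir="$1"
    days="${2:-60}"
    # find counts age in whole 24-hour periods, so "+60" selects files last
    # accessed at least 61 days ago.
    find "$dir" -type f -atime +"$days" -print
}

# Example call (replace the placeholder with your own scratch directory):
# list_purge_candidates /path/to/your/scratch/directory
```

If CISL lowers the limit, the second argument can be changed to match, for example "list_purge_candidates DIRECTORY 45".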

Apr. 16, 2018

The location of the Research Data Archive (RDA) on NCAR’s GLADE file system has changed. NCAR users are advised to access RDA data from the new production location: /glade2/collections/rda/data.

Note that the data at the previous location, /glade/p/rda/data, have been moved to /glade/p/rda/data_old. That copy is no longer being maintained and will be purged later this year.
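Scripts that still point at the previous location can be found with a recursive search, sketched here. The function name find_old_rda_refs is hypothetical, and the directory to search is whatever holds your own scripts. Because the search string is a prefix of /glade/p/rda/data_old, references to the unmaintained copy are reported as well.

```shell
#!/bin/sh
# Paths taken from the bulletin.
OLD_RDA='/glade/p/rda/data'
NEW_RDA='/glade2/collections/rda/data'

# Hypothetical helper: print every line under a directory that mentions the
# old RDA path, with file name and line number.
# Usage: find_old_rda_refs DIRECTORY
find_old_rda_refs() {
    # -F treats the path as a fixed string, -r searches recursively,
    # -n prints line numbers.
    grep -rn -F "$OLD_RDA" "$1"
    # Finding no matches is not an error.
    return 0
}

# Example call (placeholder directory):
# find_old_rda_refs /path/to/your/scripts
```

Each reported line can then be edited to use the new location stored in NEW_RDA.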

Please contact rdahelp@ucar.edu with any questions or concerns.

Apr. 13, 2018

A semi-annual Mesa Lab building maintenance power-down scheduled for Saturday, April 14, should have little impact on university users of CISL’s high-end resources. Some Boulder-based UCAR/NCAR staff will be unable to log in to the Cheyenne system or other services with their authentication tokens, but sessions that start before the power-down will not be affected.

The power-down should otherwise not affect the Cheyenne, Geyser, and Caldera clusters, the GLADE system, or HPSS, which will remain in service at the NCAR-Wyoming Supercomputing Center (NWSC) in Cheyenne. The maintenance work is scheduled to begin at 6 a.m. and conclude by 6 p.m.

Some HPC support services and the HPSS disaster recovery resources that are housed at the Mesa Lab will be unavailable during the power-down. The affected services include the license servers for Mathematica and the PGI compilers, the CISL website, the ExtraView help desk ticketing system, and the SAM accounting system. The license server supporting MATLAB users on Cheyenne will not be affected.

Users who have urgent help requests during this time should call 303-497-2400 or 307-996-4300 to reach the NWSC operations center.
