Daily Bulletin Archive

August 31, 2018

9/4/2018 - HPSS downtime: Tuesday, Sept. 4th, from 09:30 to 12:30 MDT

No downtime: Cheyenne, GLADE, Geyser_Caldera

August 31, 2018

8/30/18 - OpenMPI 3.1.2 is now available on the Geyser and Caldera clusters and will become the default version of OpenMPI on those systems on Monday, September 10. Until then, users can access the new version by loading its module explicitly:

module load openmpi/3.1.2

OpenMPI 3.1.2 addresses a number of important bugs and has been built to support CUDA on all data analysis and visualization nodes, including the new Casper cluster being prepared for release late this summer.
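Users who want to confirm which version is active and whether CUDA support was built in can query ompi_info. The following is a quick sketch using standard OpenMPI tools; the exact output format can vary between OpenMPI versions:

module load openmpi/3.1.2
ompi_info --version
# standard OpenMPI build flag indicating CUDA support
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value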

August 28, 2018

Correction 8/29/18 - Following the recent changes to the GLADE scratch spaces, users who have not already copied the files they need from /glade/scratch_old to the new, larger /glade/scratch space have until October 2 to do so.

CISL recommends using rsync -av (or cp -rp) rather than Globus for copying data between GLADE spaces. Globus does not preserve the symbolic links that are common in working directories, does not create symbolic links on destination endpoints, and does not preserve a file’s executable status.

To create an exact copy of /glade/scratch_old/$USER in the new /glade/scratch using rsync, execute the following commands:

cd /glade/scratch_old/$USER
rsync -av . /glade/scratch/$USER

This web page shows how to run these commands in a batch script.
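For reference, a minimal PBS batch script for running the copy on Cheyenne might look like the sketch below. The job name, project code, queue, and walltime are illustrative placeholders to adapt, not CISL-prescribed values:

#!/bin/bash
#PBS -N scratch_copy
#PBS -A <project_code>
#PBS -q share
#PBS -l select=1:ncpus=1
#PBS -l walltime=06:00:00
#PBS -j oe

# copy everything from the old scratch space to the new one
cd /glade/scratch_old/$USER
rsync -av . /glade/scratch/$USER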

The /glade/scratch_old space is read-only, so users cannot delete files. The space is scheduled to be removed from the system on October 2.

The previous version of this item included a syntax error.

August 28, 2018

8/23/18 - Reminder: Changes to the GLADE scratch file system became effective during last week’s maintenance outage, as announced previously in the Daily Bulletin.

The file space that was named /glade/scratch before August 21 was moved to /glade/scratch_old and is now read-only. All files that were in /glade/scratch before August 21 can still be accessed in /glade/scratch_old.  No user files were deleted when the directory was renamed. The purge policy for files in /glade/scratch_old is 30 days and the space will be removed from the system on October 2.

The new and larger scratch file space that was named /glade/scratch_new before August 21 was renamed /glade/scratch. Users’ files were not copied from the old scratch space to the new scratch space. Therefore, active files that still remain in users’ old scratch spaces will need to be copied to their new scratch space for ongoing and longer-term access. Use the familiar Linux “cp” command for this or, alternatively, the more versatile “rsync” utility.
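For example, to copy a single directory (the directory name my_run is illustrative):

cp -rp /glade/scratch_old/$USER/my_run /glade/scratch/$USER/

or, with rsync:

rsync -av /glade/scratch_old/$USER/my_run /glade/scratch/$USER/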

Please see File system status and data storage archives for a quick but comprehensive overview of the status of the GLADE and archive resources.

August 28, 2018

8/27/18 - HPSS downtime: Tuesday, August 28th, from 07:30 to 09:30 MDT

No downtime: Cheyenne, GLADE, Geyser_Caldera

August 23, 2018

7/31/2018 - Some of the changes to the GLADE project and work spaces that were announced in July will take place on Tuesday, October 2, as part of the migration to CISL’s new storage architecture and user environment.

The /glade/p_old/ space will be made read-only. This means it will continue to be read-write two months longer than previously planned. It will be decommissioned December 31. (These and other scheduled updates to storage systems have been published in table format here.)

Users are asked to:

  • Migrate any files they still have on /glade/p_old/ or /glade/p_old/work to one of the new storage systems as soon as possible. CISL recommends moving active project data to /glade/p/<entity>/<project_code> where entity can be univ, uwyo, cesm, mmm, nsc, or other designated NCAR lab or special program.

  • Move project data that is not active but needs to be preserved to the Campaign Storage archive. Users access and manage their Campaign Storage files with Globus services; a rough command-line sketch follows this list.

  • Move files they need from their individual /glade/p_old/work/ directories to the new /glade/work.

  • Delete files from /glade/p_old/ and /glade/p_old/work once their transfers are complete and validated.   
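As a rough sketch of the Globus command-line workflow (the endpoint names and paths below are illustrative; use globus endpoint search to find the actual GLADE and Campaign Storage endpoints):

# look up endpoint IDs by display name (names shown are assumptions)
globus endpoint search "NCAR GLADE"
globus endpoint search "NCAR Campaign Storage"

# transfer a directory recursively from GLADE to Campaign Storage
globus transfer --recursive \
  <glade_endpoint_id>:/glade/p_old/<entity>/<project_code>/data \
  <campaign_endpoint_id>:/<entity>/<project_code>/data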

Contact cislhelp@ucar.edu with questions or for help moving files.

August 21, 2018

08/21/18 - As part of its regular monitoring of Cheyenne's health and performance, CISL has become aware of a significant recent increase in calls to the PBS qstat command. As reported in the Daily Bulletin earlier this year, excessive qstat calls create unnecessary demands on the PBS server that contribute to poor overall system performance and job failures.

Some users have unknowingly created excessive qstat queries by using “watch qstat.” The Linux watch command’s default update interval is two seconds, creating 30 qstat calls every minute. Since the PBS scheduling cycle updates about once a minute, this creates many unnecessary qstat calls that provide no useful information and adversely affect overall system performance. Please refer to the watch command’s man pages for information on how to set its update interval. CISL recommends a frequency of no more than once every 60 seconds.
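For example, to check on a single job once per minute instead of once every two seconds (replace <jobid> with an actual job ID):

watch -n 60 qstat <jobid>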

Users can help reduce demands on the system by adopting the following changes wherever possible:

  • Use “qstat <jobid>” instead of just “qstat”

  • Avoid using “qstat -f -x”

  • Limit the number and frequency of qstat commands.

CISL thanks all users for their cooperation. Please contact cislhelp@ucar.edu if you have any questions or would like help or advice on this matter.

August 21, 2018

8/21/2018 - The Cheyenne, Geyser, and Caldera clusters and the GLADE file system will be unavailable today starting at 6 a.m. MDT to allow CISL staff to update key system software components. The downtime is expected to last until approximately 6 p.m. but every effort will be made to return the systems to service as soon as possible. The updates will include the changes to GLADE’s scratch file spaces described in this earlier Daily Bulletin item.

System reservations will prevent batch jobs from executing after 6 a.m. All batch queues will be suspended and the clusters’ login nodes will be unavailable throughout the outage period. All interactive processes that are still executing when the outage begins will be terminated.

CISL will inform users through the Notifier service when all of the systems are restored.

August 21, 2018

08/15/18 - The Cheyenne, Geyser, and Caldera clusters and the GLADE file system will be unavailable on Tuesday, August 21, starting at 6 a.m. MDT to allow CISL staff to update key system software components. The downtime is expected to last until approximately 6 p.m. but every effort will be made to return the systems to service as soon as possible. The updates will include the changes to GLADE’s scratch file spaces described in today’s Daily Bulletin.

A system reservation will prevent batch jobs from executing after 6 a.m. All batch queues will be suspended and the clusters’ login nodes will be unavailable throughout the maintenance period. All batch jobs and interactive processes that are still executing when the outage begins will be killed.

CISL will inform users through the Notifier service when all of the systems are restored.

August 20, 2018

08/20/18 - Some users have reported an increase in the number of emails received from the PBS scheduler after their Cheyenne jobs run. Often the jobs ran successfully, but the emails have the following form:

          PBS Job Id: <JobID>.chadmin1

          Job Name: job_name

          Post job file processing error; job <JobID>.chadmin1 on host rXiYnZ

CISL has identified the primary cause of the increase in emails. Recent changes to the GLADE file system created several high-level symbolic links, such as /glade/p -> /gpfs/fs1/p. PBS was not configured to handle those links correctly, which triggered many of the false errors. System administrators have made the necessary adjustments to PBS, and they will be activated during the system maintenance downtime on Tuesday.
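Users curious about these links can inspect them with ls -ld; the output shown below illustrates the pattern described above rather than reproducing actual system output:

ls -ld /glade/p
# lrwxrwxrwx 1 root root ... /glade/p -> /gpfs/fs1/p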
