Daily Bulletin Archive

August 21, 2018

8/21/2018 - The Cheyenne, Geyser, and Caldera clusters and the GLADE file system will be unavailable today starting at 6 a.m. MDT to allow CISL staff to update key system software components. The downtime is expected to last until approximately 6 p.m., but every effort will be made to return the systems to service as soon as possible. The updates will include the changes to GLADE’s scratch file spaces described in this earlier Daily Bulletin item.

System reservations will prevent batch jobs from executing after 6 a.m. All batch queues will be suspended and the clusters’ login nodes will be unavailable throughout the outage period. All interactive processes that are still executing when the outage begins will be terminated.

CISL will inform users through the Notifier service when all of the systems are restored.

August 21, 2018

08/15/18 - The Cheyenne, Geyser, and Caldera clusters and the GLADE file system will be unavailable on Tuesday, August 21, starting at 6 a.m. MDT to allow CISL staff to update key system software components. The downtime is expected to last until approximately 6 p.m., but every effort will be made to return the systems to service as soon as possible. The updates will include the changes to GLADE’s scratch file spaces described in today’s Daily Bulletin.

A system reservation will prevent batch jobs from executing after 6 a.m. All batch queues will be suspended and the clusters’ login nodes will be unavailable throughout the maintenance period. All batch jobs and interactive processes that are still executing when the outage begins will be killed.

CISL will inform users through the Notifier service when all of the systems are restored.

August 20, 2018

08/20/18 - Some users have reported an increase in the number of emails received from the PBS scheduler after their Cheyenne jobs run. Often the jobs ran successfully, but the body of each email has the form:

          PBS Job Id: <JobID>.chadmin1

          Job Name: job_name

          Post job file processing error; job <JobID>.chadmin1 on host rXiYnZ

CISL has identified the primary cause of the increase in emails. Recent changes to the GLADE file system created several high-level symbolic links such as /glade/p -> /gpfs/fs1/p. PBS was not configured to handle those links correctly, which triggered many of the false errors. System administrators have made the necessary adjustments to PBS, and the changes will be activated during the system maintenance downtime on Tuesday.
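The bulletin does not say exactly how PBS mishandled the links, so the following is only a generic illustration of why symbolic links can confuse software that compares path strings. It builds a temporary directory and a link to it (the names `gpfs_fs1_p` and `glade_p` are hypothetical stand-ins for /gpfs/fs1/p and /glade/p) and shows that the same file appears under two different path strings until the links are resolved.

```python
import os
import tempfile


def same_location(path_a: str, path_b: str) -> bool:
    """Compare two paths after resolving any symbolic links in them."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


with tempfile.TemporaryDirectory() as root:
    # Resolve the temporary directory itself, since some systems
    # place it behind a symbolic link.
    root = os.path.realpath(root)

    target = os.path.join(root, "gpfs_fs1_p")  # hypothetical stand-in for /gpfs/fs1/p
    link = os.path.join(root, "glade_p")       # hypothetical stand-in for /glade/p
    os.mkdir(target)
    os.symlink(target, link)

    via_link = os.path.join(link, "job.out")
    via_target = os.path.join(target, "job.out")

    # A plain string comparison treats these as different files.
    naive_match = via_link == via_target
    # Resolving the link first shows they are the same location.
    resolved_match = same_location(via_link, via_target)

    print("string comparison:", naive_match)
    print("after resolving links:", resolved_match)
```

This is a sketch of the general pitfall, not a description of the PBS configuration change that CISL made.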

August 20, 2018

08/20/18 - Scheduled downtime: Tuesday, August 21 6:00 a.m. - 6:00 p.m. Cheyenne, GLADE, Geyser_Caldera, HPSS

August 17, 2018

8/20/18 - Registration is now open for an NCAR/CISL series of four one-day workshops on Modern Fortran beginning Tuesday, September 11. Dan Nagle, CISL Consulting Services Group software engineer and a member of the U.S. Fortran Standards Technical Committee, will provide the training at the NCAR Mesa Lab’s Fleischmann Building (Walter Orr Roberts room) in Boulder.

Participants are encouraged to bring their own laptop computers with recent releases of gfortran, mpich, and opencoarrays. Each workshop will begin at 9 a.m. and end at 4 p.m. with an hour break at noon.

  • Scalar Fortran - Tuesday, Sept. 11: Scope, definition, scalar declarations and usage, and interacting with the processor.

  • Vector Fortran - Tuesday, Sept. 18: Arrays, storage order, elemental operations, and array intrinsics.

  • Object-Oriented Fortran - Wednesday, Sept. 26: Derived types, defined operations, defined assignment, and inheritance.

  • Parallel Fortran - Tuesday, Oct. 2: Coarray concepts, declarations, and usage; synchronization and treating failed images.

Use this form to register to attend one or more workshops. The workshops will not be webcast or recorded.

August 15, 2018

Updated 8/15/2018 - A number of the previously announced changes to the GLADE file system are scheduled for Tuesday, August 21. First, /glade/scratch will be moved to /glade/scratch_old and become read-only. The purge policy for files in that space will be 30 days. The /glade/scratch_old space will be removed from the system on October 2.

Also, the new and larger /glade/scratch_new (implemented on July 10) will be renamed to /glade/scratch. The 60-day purge policy for this space will remain in place.

To prepare for the renaming of the file spaces, users are encouraged to begin using the new scratch space as soon as possible wherever it is practical to do so. For example, files remaining on the old scratch space should be moved to the new scratch space.
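Scripts that hard-code scratch paths will also need attention once the spaces are renamed. As a minimal sketch, the helper below maps a path valid before August 21 to where the same file would be found afterward, based only on the two renames described in this item. The function name `path_after_rename` and the user directory `jdoe` are hypothetical and used for illustration.

```python
from pathlib import PurePosixPath

# Renames scheduled for Aug 21, as described in this bulletin item.
RENAMES = {
    PurePosixPath("/glade/scratch"): PurePosixPath("/glade/scratch_old"),
    PurePosixPath("/glade/scratch_new"): PurePosixPath("/glade/scratch"),
}


def path_after_rename(path: str) -> str:
    """Return where a pre-Aug-21 path would be found after the renames."""
    p = PurePosixPath(path)
    for old, new in RENAMES.items():
        # Match whole path components so that /glade/scratch does not
        # accidentally match /glade/scratch_new.
        if p == old or old in p.parents:
            return str(new / p.relative_to(old))
    return str(p)  # paths outside the scratch spaces are unchanged


print(path_after_rename("/glade/scratch_new/jdoe/run1/out.nc"))
print(path_after_rename("/glade/scratch/jdoe/run0/out.nc"))
```

Matching on path components rather than on string prefixes matters here because one directory name is a prefix of the other.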

Date     Change                                  Notes
Aug 21   /glade/scratch -> /glade/scratch_old    Becomes read-only; 30-day purge policy
Aug 21   /glade/scratch_new -> /glade/scratch    Retains 60-day purge policy
Oct 2    /glade/scratch_old will be removed

August 14, 2018

8/13/2018 - HPSS downtime: Tuesday, August 14th 7:00 a.m. - 11:00 a.m.

No downtime: Cheyenne, GLADE, Geyser_Caldera

August 14, 2018

8/2/2018 - CISL is now accepting large-scale allocation requests from university-based researchers for the 5.34-petaflops Cheyenne cluster. Submissions are due September 11.

The fall opportunity will include allocations on some new supporting resources. The Casper analysis and visualization cluster will soon enter production to replace the Geyser/Caldera clusters. University projects should request long-term space on the Campaign Storage resource instead of HPSS. Unlike HPSS, Campaign Storage has no default minimum amount; users are asked to justify the amount requested. Scrutiny of the justification will increase with the size of the request.

Researchers are encouraged to review the allocation instructions before preparing their requests. The CISL HPC Allocations Panel (CHAP) is applying increased scrutiny to data management plans and justifications for storage requests. For instructions and information regarding available resources, see the CHAP page: https://www2.cisl.ucar.edu/chap

At the fall meeting, CISL will allocate 180 million core-hours on Cheyenne, up to 2 PB of Campaign Storage space, and up to 500 TB of GLADE project space. Large allocations on Cheyenne are those requesting more than 400,000 core-hours. CISL accepts requests from university researchers for these large-scale allocations every six months.

Please contact cislhelp@ucar.edu if you have any questions.

August 13, 2018

8/9/2018 - The Cheyenne system’s share queue is operating with far fewer nodes available to it than normal. CISL is exploring a number of solutions with a priority of restoring the queue as soon as possible and minimizing disruptions to users. The time frame for resolving the issue is not yet known.

Until the issue is resolved, users will experience a significant backlog of jobs submitted to the share queue. If job turnaround time in the share queue becomes untenable, users are advised to submit their jobs to one of Cheyenne’s other queues, such as the regular queue, as an interim workaround. Note that jobs that run in the non-shared queues are charged for full use of the nodes and therefore use more core-hours, but those jobs will likely execute sooner than in the share queue in its present state.
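To get a rough sense of the cost difference, the sketch below compares the two charging approaches. It assumes 36 cores per Cheyenne node (a figure not stated in this bulletin), assumes that shared jobs are charged only for the cores they use, and ignores any queue-specific charging factors, so treat the numbers as illustrative rather than as the official charging formula.

```python
CORES_PER_NODE = 36  # assumption: not stated in this bulletin


def shared_core_hours(cores_used: int, wall_hours: float) -> float:
    """Charge only for the cores the job uses (assumed share-queue model)."""
    return cores_used * wall_hours


def exclusive_core_hours(nodes: int, wall_hours: float) -> float:
    """Charge for every core on each node, whether or not the job uses it."""
    return nodes * CORES_PER_NODE * wall_hours


# Hypothetical example: a 4-core job that runs for 2 hours.
print(shared_core_hours(4, 2.0))
print(exclusive_core_hours(1, 2.0))
```

Under these assumptions, a small job moved from the share queue to a non-shared queue is charged for the whole node, which is the trade-off against shorter wait times noted above.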

August 3, 2018

08/06/2018 - No downtime: Cheyenne, GLADE, Geyser_Caldera and HPSS
