Daily Bulletin Archive

August 12, 2019

Cheyenne continues to experience system stability issues that emerged after the recent software upgrade, and vendors’ Level-3 support and CISL staff continue to work aggressively on the problems. This afternoon, following the CESM tutorial, CISL staff will test some vendor-recommended changes on Cheyenne. Following those tests, we plan to make Cheyenne available to users over the weekend and into next week.

During a CESM polar tutorial next week with approximately 20 onsite attendees, we will repeat the procedure of halting all standard queues; running jobs will finish and other submissions will remain queued until the tutorial hands-on session is finished for the day. This process will be invoked for roughly two hours on Monday morning, Tuesday afternoon, and Wednesday afternoon. Casper will remain fully available to all users throughout the week.

 

August 12, 2019

Scheduled downtime for HPSS from 3 p.m. August 13 to 8 a.m. August 14.

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE.

August 6, 2019

Cheyenne continues to experience system stability issues, and CISL staff are engaging vendors to troubleshoot the problems. Meanwhile, to ensure that this week's CESM tutorial succeeds for the 80 on-site attendees, CISL will limit access to the Cheyenne compute nodes to tutorial attendees only during the hands-on sessions on Tuesday, Thursday, and Friday afternoons. All standard queues will be halted; running jobs will finish and other submissions will remain queued until the tutorial is finished for the day. Casper will remain fully available to all users throughout the week.

CISL is escalating support from HPE and Mellanox to resolve the system issues after the CESM tutorial ends on Friday. More details on the vendors' plans will be provided as soon as they are finalized.

 

August 6, 2019

Registration is open for Optimized Modern Fortran, an August 16 workshop led by Alessandro Fanfarillo, NCAR Research Applications Laboratory, to help participants make their Fortran codes run more efficiently through vectorization and other techniques.

When: 9 a.m. to noon and 1 to 3 p.m., Friday, August 16

Where: Room 3131, Center Green campus (CG1), Boulder

Participants will get a detailed, practical explanation of how to obtain high performance from modern Fortran codes, with a particular focus on how to exploit the hardware instructions provided by modern processors. Prerequisite: Basic knowledge of Fortran 90 constructs, such as array syntax and allocation, recursion, modules, and intrinsic, elemental and pure functions.

Participants are encouraged to bring their own codes and laptop computers. Lunch will be provided. Some travel funding is available. See the Optimized Modern Fortran Workshop web page for details and registration.

 

August 5, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE and HPSS.

August 2, 2019

Cheyenne users continue to report frequent batch job failures with error messages containing “MPT: Launcher network accept (MPI_LAUNCH_TIMEOUT) timed out.” Determining the root cause and resolving the issue as soon as possible is the highest priority for CISL’s HPC engineers, HPE, and Mellanox. Watch for status updates in the CISL Daily Bulletin and through the Notifier service.

 

August 1, 2019

CISL will begin enforcing the purge policy for files in the /glade/p file space on October 1 as described in this Daily Bulletin article. Starting on that date, files will be purged if they have not been accessed in 18 months. On the first Tuesday of each subsequent month, the retention period will be shortened by one month until the 12-month limit is fully implemented in April 2020. 

The detailed schedule is provided below. Users will be notified well in advance of any changes to this schedule. 

Retention period implementation dates

  • 18 months – October 1, 2019

  • 17 months – November 5, 2019

  • 16 months – December 3, 2019

  • 15 months – January 7, 2020

  • 14 months – February 4, 2020

  • 13 months – March 3, 2020

  • 12 months – April 7, 2020
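The dates in the list above follow the stated rule: enforcement starts at 18 months on October 1, 2019 (itself the first Tuesday of that month), and the period drops by one month on the first Tuesday of each following month. A minimal Python sketch that reproduces the schedule is shown here; the function names are hypothetical and this is an illustration of the rule, not a CISL tool.

```python
from datetime import date, timedelta


def first_tuesday(year: int, month: int) -> date:
    """Return the first Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    offset = (1 - first.weekday()) % 7
    return first + timedelta(days=offset)


def retention_schedule(start_year=2019, start_month=10,
                       start_months=18, final_months=12):
    """List (effective_date, retention_months) pairs, one per month.

    Defaults mirror the bulletin: 18 months starting October 2019,
    shortened by one month each month until 12 months is reached.
    """
    schedule = []
    year, month = start_year, start_month
    for months in range(start_months, final_months - 1, -1):
        schedule.append((first_tuesday(year, month), months))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return schedule


def retention_on(day: date, schedule):
    """Return the retention period (in months) in effect on `day`,
    or None if `day` is before enforcement begins."""
    current = None
    for effective, months in schedule:
        if day >= effective:
            current = months
    return current


if __name__ == "__main__":
    for effective, months in retention_schedule():
        print(f"{months} months from {effective.isoformat()}")
```

Calling `retention_on(date(2019, 12, 25), retention_schedule())` looks up the period in effect on a given day under the published schedule; any later change to the schedule would need to be reflected in the defaults.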

CISL will deploy data management tools this summer that will include utilities to help users identify files that are nearing the purge limits.
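Until those utilities are available, a user could approximate the check with a short script. The sketch below is not a CISL tool: it assumes that "accessed" corresponds to the file's access time (`st_atime`) as reported by the file system, which may not match how the purge is actually evaluated, and the function names and the example path are hypothetical.

```python
import calendar
import os
from datetime import datetime


def months_before(moment: datetime, months: int) -> datetime:
    """Return `moment` shifted back by whole calendar months,
    clamping the day to the end of a shorter month."""
    index = moment.year * 12 + (moment.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def files_not_accessed_since(root: str, cutoff: datetime):
    """Walk `root` and return files whose access time is older than `cutoff`."""
    limit = cutoff.timestamp()
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat avoids following symbolic links.
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < limit:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical project directory; substitute a real path.
    cutoff = months_before(datetime.now(), 18)
    for path in files_not_accessed_since("/glade/p/your_project", cutoff):
        print(path)
```

To find files that are approaching the limit rather than already past it, pass a smaller number of months, for example 17 when the limit is 18.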

 

July 30, 2019

The Campaign Storage file system will be unavailable today from 10 a.m. to approximately 1 p.m. MDT to allow CISL storage engineers to perform required hardware maintenance. Active Globus tasks will be paused and then resumed when Campaign Storage is back online.

July 29, 2019

Scheduled downtime for Campaign Storage on July 30 from 10 a.m. to 1 p.m.

No downtime for Cheyenne, Casper, GLADE or HPSS.

July 26, 2019

More than 22 million abandoned files will be deleted from the High Performance Storage System (HPSS) on Tuesday, October 1. HPSS files are considered abandoned when the project and/or the file owner's user ID has been inactive for at least 12 months and CISL has received no response from either the files’ owners or the project leads after multiple notifications. More than 460 users have been notified of the pending deletions.

Information on the abandoned data holdings, by project and user, is available here.
