Daily Bulletin Archive

February 5, 2019

Cheyenne: Tuesday, noon to 1 p.m. (details)

No scheduled downtime: Casper, Campaign Storage, GLADE and HPSS

February 1, 2019

The Casper cluster will be expanded soon with the addition of two nodes. The new nodes are similar to the two existing Supermicro nodes with eight NVIDIA Tesla V100 GPUs. They will support ongoing and future machine learning and deep learning efforts.

The new nodes have been received and are being installed by CISL staff at the NCAR-Wyoming Supercomputing Center. When the installation is complete, CISL system administrators and software engineers will begin acceptance testing, which is expected to take several weeks. More details will be published when the new nodes are ready for users.

 

January 30, 2019

CISL system administrators will update the PBS workload management server at noon MST on Tuesday, February 5. The update is expected to take less than 60 minutes to complete. The new version of PBS, 18.2.3, provides performance and stability improvements and a number of important bug fixes.

Most PBS commands, including qstat, will not work during the update, and new Cheyenne job submissions will not be possible. Jobs that are executing when the maintenance begins will continue to run without interruption. Jobs that are queued for execution or in a hold state will remain in those states until PBS is returned to service. Access to Cheyenne’s login nodes will not be interrupted.
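For users whose automated workflows submit jobs on a schedule, the following is a minimal sketch of one way to pause submissions until PBS responds again. It assumes that qstat exits with a nonzero status while the server is unavailable; the function names, retry count, and delay are illustrative choices, not CISL recommendations.

```python
import subprocess
import time


def pbs_is_responding(command=("qstat",), timeout=30):
    """Return True if the given PBS command runs and exits with status 0.

    Assumption: qstat returns a nonzero exit status while the PBS
    server is down for maintenance.
    """
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command not found, not executable, or it hung past the timeout.
        return False
    return result.returncode == 0


def wait_for_pbs(check=pbs_is_responding, attempts=12, delay_seconds=300,
                 sleep=time.sleep):
    """Poll check() up to `attempts` times, sleeping between tries.

    Returns True as soon as check() succeeds, or False if every
    attempt fails.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A workflow script could call wait_for_pbs() and run its qsub command only when the function returns True. With the default values shown here, the script polls every five minutes for up to about an hour.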

January 29, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE and HPSS

January 28, 2019

CISL plans to roll out a new Jira Service Desk system with an integrated Confluence Knowledge Base to help HPC users, CISL staff, and others quickly find the solutions or assistance they need. The new system is expected to be ready in February and will replace the ExtraView ticketing system that has been in place for most of the past decade.

Service Desk features a friendlier user interface, simplified request forms, and a knowledge base of articles to answer common questions. Users will also be able to log in to track the status of their in-progress tickets.

UCAR/NCAR personnel already have the CIT passwords that are required to log in to Jira Service Desk, as do users who have Duo two-factor authentication rather than YubiKey tokens. To get a CIT password, call 303-497-2400 for assistance.

More information on implementation of the new service desk will be available soon.

 

January 25, 2019

The maintenance operations on NCAR’s HPC systems that were scheduled for Tuesday, February 5, have been canceled. To minimize inconvenience to users, the work that was scheduled for that day will be combined with other system maintenance on Tuesday, March 5. More details on the March 5 outage will be published in the Daily Bulletin next month.

 

January 23, 2019

HPSS: Thursday, from 07:30 to 11:00 a.m.

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE

 

January 23, 2019

CISL has determined that a hardware failure in the UCAR enterprise Ethernet network was the root cause of last night’s problems on Cheyenne. The network problem caused Cheyenne to lose communications with GLADE, which led to a significant number of failed jobs and poor system performance.

CISL system and storage administrators implemented a workaround to restore Cheyenne-GLADE communications, and no further unscheduled interruptions are expected. It may be necessary to schedule a brief outage in the near future to implement a more permanent repair. Users will be notified well in advance if such an outage is scheduled.

January 22, 2019

CISL is now accepting large-scale allocation requests from university-based researchers for the 5.34-petaflops Cheyenne supercomputer and the Casper data analysis and visualization cluster. Submissions are due March 5. Researchers are encouraged to review these allocation instructions before preparing their requests.

In addition to requesting computing allocations, university projects should request long-term space on the NCAR Campaign Storage resource instead of HPSS. Unlike HPSS, Campaign Storage has no default minimum amount; users are asked to justify the amount requested. The CISL HPC Allocations Panel (CHAP) is applying increased scrutiny to data management plans and storage requests.  

At the spring meeting, CISL will allocate up to 275 million core-hours on Cheyenne, up to 2 PB of Campaign Storage space, and up to 200 TB of GLADE project space. Large allocations on Cheyenne are those requesting more than 400,000 core-hours. CISL accepts requests from university researchers for these large-scale allocations every six months. Please contact cislhelp@ucar.edu if you have any questions.

 

January 17, 2019

A video recording and slides from the January 14 NCAR/CISL tutorial for new Cheyenne supercomputer users have been added to the CISL Course Library. The 50-minute Introduction to Cheyenne tutorial covers basic usage and typical user workflows. Topics discussed include:

  • The Cheyenne computing environment

  • Accessing software, including compilers and MPI libraries

  • Submitting batch jobs using the PBS scheduler
