Daily Bulletin Archive

January 30, 2019

CISL system administrators will update the PBS workload management server at noon MST on Tuesday, February 5. The update is expected to take less than 60 minutes to complete. The new version of PBS, 18.2.3, provides performance and stability improvements and a number of important bug fixes.

Most PBS commands, including qstat, will not work during the update, and new Cheyenne job submissions will not be possible. Jobs that are executing when the maintenance begins will continue to run without interruption. Jobs that are queued for execution or in a hold state will remain in those states until PBS is returned to service. Access to Cheyenne’s login nodes will not be interrupted.

January 30, 2019

Does your institution have lots of researchers and educators who want to use advanced computing but need some help learning how? Consider attending the free ACI-REF Virtual Residency Summer Workshop on Research Computing Facilitation to learn to be more effective at helping researchers and educators use Research Cyberinfrastructure (CI).

This intermediate-level workshop is June 2-7 on the University of Oklahoma Norman campus. Participants can attend either in person or remotely by videoconference. You do not need to have completed an introductory workshop to attend. Use this link to register or contact Henry Neeman for more information.

Virtual Residency workshops have served 250 institutions in all 50 U.S. states, two U.S. territories, and seven other countries.

January 29, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE and HPSS

January 28, 2019

CISL plans to roll out a new Jira Service Desk system with an integrated Confluence Knowledge Base to help HPC users, CISL staff, and others quickly find the solutions or assistance they need. The new system is expected to be ready in February and will replace the ExtraView ticketing system that has been in place for most of the past decade.

Service Desk features a friendlier user interface, simplified request forms, and a knowledge base of articles to answer common questions. Users will also be able to log in to track the status of their in-progress tickets.

UCAR/NCAR personnel already have the CIT passwords that are required to log in to Jira Service Desk, as do users who have Duo two-factor authentication rather than YubiKey tokens. To get a CIT password, call 303-497-2400 for assistance.

More information on implementation of the new service desk will be available soon.

 

January 25, 2019

The maintenance operations on NCAR’s HPC systems that were scheduled for Tuesday, February 5, have been canceled. To minimize inconvenience to users, the work that was scheduled for that day will be combined with other system maintenance on Tuesday, March 5. More details on the March 5 outage will be published in the Daily Bulletin next month.

 

January 23, 2019

HPSS: Thursday, from 07:30 to 11:00 a.m.

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE

 

January 23, 2019

CISL has determined that a UCAR enterprise ethernet network hardware failure was the root cause of last night’s problems on Cheyenne. The network problem caused Cheyenne to lose communications with GLADE and caused a significant number of failed jobs and poor system performance.

CISL system and storage administrators implemented a workaround to restore Cheyenne-GLADE communications, and no further unscheduled interruptions are expected. It may be necessary to schedule a brief outage in the near future to implement a more permanent repair. Users will be notified well in advance if such an outage is scheduled.

January 22, 2019

CISL is now accepting large-scale allocation requests from university-based researchers for the 5.34-petaflops Cheyenne supercomputer and the Casper data analysis and visualization cluster. Submissions are due March 5. Researchers are encouraged to review these allocation instructions before preparing their requests.

In addition to requesting computing allocations, university projects should request long-term space on the NCAR Campaign Storage resource instead of HPSS. Unlike HPSS, Campaign Storage has no default minimum amount; users are asked to justify the amount requested. The CISL HPC Allocations Panel (CHAP) is applying increased scrutiny to data management plans and storage requests.  

At the spring meeting, CISL will allocate up to 275 million core-hours on Cheyenne, up to 2 PB of Campaign Storage space, and up to 200 TB of GLADE project space. Large allocations on Cheyenne are those requesting more than 400,000 core-hours. CISL accepts requests from university researchers for these large-scale allocations every six months. Please contact cislhelp@ucar.edu if you have any questions.

 

January 17, 2019

A video recording and slides from the January 14 NCAR/CISL tutorial for new Cheyenne supercomputer users have been added to the CISL Course Library. The 50-minute Introduction to Cheyenne tutorial covers basic usage and typical user workflows. Topics discussed include:

  • The Cheyenne computing environment

  • Accessing software, including compilers and MPI libraries

  • Submitting batch jobs using the PBS scheduler

January 15, 2019

The next regular maintenance operations on NCAR’s HPC systems are scheduled for Tuesday, February 5. The Cheyenne and Casper clusters and the GLADE file system are expected to be unavailable from 7 a.m. until 6 p.m. MST, but every effort will be made to restore the systems to users earlier if possible. More details on the outage will be published in the Daily Bulletin later this month.
