Daily Bulletin Archive

February 6, 2019

NCAR’s Computational and Information Systems Laboratory (CISL) is seeking users’ input as part of the process of procuring a system to follow the Cheyenne cluster, which is currently in production at the NCAR-Wyoming Supercomputing Center. Users’ input will help ensure procurement of a system that best meets the community’s future needs.

Users are asked to provide their input by completing this survey by March 1: User Survey for CISL's Next HPC Procurement. The survey includes questions about users’ experience with the Cheyenne environment and priorities for the next-generation environment, which is expected to be in production by mid-2021.

Two of the three sections of the survey can be completed in about 10 minutes. Users are welcome and encouraged to respond to some open-ended questions in the third section if they can take the time.


February 5, 2019

Additional CMIP6 data are now available on GLADE through the CMIP Analysis Platform, including monthly averages from the historical, 1pctCO2, and piControl experiments. Data sets are available from the BCC, CNRM, IPSL, NASA, and NOAA modeling centers. Users can request the addition of other data sets as modeling centers publish them. NCAR's initial CMIP6 data products are expected to be available in the first quarter of this calendar year.

More information: CMIP Analysis Platform.


February 5, 2019

Cheyenne: Tuesday, noon to 1 p.m. (details)

No scheduled downtime: Casper, Campaign Storage, GLADE and HPSS

February 1, 2019

The Casper cluster will be expanded soon with the addition of two nodes. The new nodes are similar to the two existing Supermicro nodes with eight NVIDIA Tesla V100 GPUs. They will support ongoing and future machine learning and deep learning efforts.

The new nodes have been received and are being installed by CISL staff at the NCAR-Wyoming Supercomputing Center. When the installation is complete, CISL system administrators and software engineers will begin acceptance testing, which is expected to take several weeks. More details will be published when the new nodes are ready for users.


January 30, 2019

CISL system administrators will update the PBS workload management server at noon MST on Tuesday, February 5. The update is expected to take less than 60 minutes to complete. The new version of PBS, 18.2.3, provides performance and stability improvements and a number of important bug fixes.

Most PBS commands, including qstat, will not work during the update, and new Cheyenne job submissions will not be possible. Jobs that are executing when the maintenance begins will continue to run without interruption. Jobs that are queued for execution or in a hold state will remain in those states until PBS is returned to service. Access to Cheyenne’s login nodes will not be interrupted.
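Because qstat and job submission are unavailable during the update, a workflow script may need to wait until PBS responds again before submitting new jobs. The following is a minimal sketch, not a CISL-provided tool: the helper name wait_for_command, the polling interval, and the attempt limit are all illustrative assumptions, and the helper only checks whether a given command exits successfully.

```python
import subprocess
import time


def wait_for_command(cmd, interval_s=60.0, max_attempts=90, sleep=time.sleep):
    """Poll `cmd` until it exits with status 0.

    Hypothetical helper for illustration. Returns True as soon as the
    command succeeds, or False if it still fails after `max_attempts` tries.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=30,  # do not hang if the server never answers
            )
            if result.returncode == 0:
                return True
        except (OSError, subprocess.TimeoutExpired):
            # Command missing or unresponsive: treat as "not ready yet".
            pass
        if attempt < max_attempts:
            sleep(interval_s)
    return False


# Example use (assumes the PBS client commands are on PATH):
#   if wait_for_command(["qstat", "-B"]):
#       subprocess.run(["qsub", "my_job.pbs"], check=True)  # hypothetical script name
```

Polling once a minute keeps the load on the login nodes negligible. Jobs that were already queued or held before the update do not need to be resubmitted, since they remain in those states until PBS returns to service.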

January 29, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE and HPSS

January 28, 2019

CISL plans to roll out a new Jira Service Desk system with an integrated Confluence Knowledge Base to help HPC users, CISL staff, and others quickly find the solutions or assistance they need. The new system is expected to be ready in February and will replace the ExtraView ticketing system that has been in place for most of the past decade.

Service Desk features a friendlier user interface, simplified request forms, and a knowledge base of articles to answer common questions. Users will also be able to log in to track the status of their in-progress tickets.

UCAR/NCAR personnel already have the CIT passwords that are required to log in to Jira Service Desk, as do users who have Duo two-factor authentication rather than YubiKey tokens. To get a CIT password, call 303-497-2400 for assistance.

More information on implementation of the new service desk will be available soon.


January 25, 2019

The maintenance operations on NCAR’s HPC systems that were scheduled for Tuesday, February 5, have been canceled. To minimize inconvenience to users, the work that was scheduled for that day will be combined with other system maintenance on Tuesday, March 5. More details on the March 5 outage will be published in the Daily Bulletin next month.


January 23, 2019

HPSS: Thursday, 7:30 a.m. to 11 a.m.

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE


January 23, 2019

CISL has determined that a hardware failure in the UCAR enterprise Ethernet network was the root cause of last night’s problems on Cheyenne. The failure cut off Cheyenne’s communications with GLADE, which caused a significant number of jobs to fail and degraded system performance.

CISL system and storage administrators implemented a workaround to restore Cheyenne-GLADE communications, and no further unscheduled interruptions are expected. It may be necessary to schedule a brief outage in the near future to implement a more permanent repair. Users will be notified well in advance if such an outage is scheduled.