Daily Bulletin Archive

March 9, 2020

The Cheyenne and Casper login nodes are shared by everyone in the user community, so it’s important to keep in mind their intended purposes.

Cheyenne – You can run short, non-memory-intensive processes on the Cheyenne login nodes. These include tasks such as text editing and running small serial scripts or programs. Memory-intensive processes that slow login node performance for all users are killed automatically and the responsible parties are notified by email.

Casper – The Casper login nodes serve a single purpose: logging in to start jobs on the cluster’s compute nodes. Any compute processes found running on the login nodes will be killed.

Learn more about using shared resources and other best practices right here.
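
To illustrate the intended workflow, the brief Python sketch below hands compute work to the batch scheduler rather than running it on a shared login node. It is only a sketch: it assumes PBS Pro’s qsub command is available on the login node, as it is on Cheyenne, and the job script name my_job.pbs is purely illustrative.

    # Minimal sketch: submit compute work to the batch scheduler rather than
    # running it on a shared login node. Assumes PBS Pro's qsub is on PATH;
    # "my_job.pbs" is a hypothetical job script you have already written.
    import subprocess

    result = subprocess.run(
        ["qsub", "my_job.pbs"],            # heavy work goes to compute nodes
        capture_output=True, text=True, check=True,
    )
    print("Submitted job:", result.stdout.strip())  # qsub prints the new job ID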

March 9, 2020

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE, HPSS

March 4, 2020

The Women in IT Networking at SC (WINS) program is now accepting applications for the 2020 program. Since 2015, the WINS program has provided an immersive, hands-on mentorship opportunity for early- to mid-career women in the IT field who are selected to participate in the ground-up construction of SCinet – one of the fastest and most advanced computer networks in the world – which is built annually for the Supercomputing Conference (SC). 

SC20, to be held in Atlanta, GA, is expected to attract more than 13,000 attendees who are leaders in high-performance computing and networking. WINS offers travel funding for awardees; collaborates with SCinet committee leadership to match each awardee with a SCinet team and a mentor; and provides ongoing support and career development opportunities for the awardees before, during, and after the conference. Interested and qualified women are encouraged to apply. The application deadline is 11:59 p.m. AoE on April 1. Award notifications will be sent by mid- to late May 2020. You can find more information and a link to the Apply to WINS form here.

WINS is a joint effort between the Department of Energy’s Energy Sciences Network, the Keystone Initiative for Network Based Education and Research, and the University Corporation for Atmospheric Research.

March 2, 2020

Cheyenne was returned to service shortly after 5 p.m. Friday, a full day ahead of schedule. Maintenance work last week addressed the system’s cooling infrastructure, the InfiniBand hypercube fabric’s performance and reliability, and system security. System tests that were run after the system was restored showed significant performance improvements across a wide spectrum of applications.

If you notice any significant changes in your applications’ performance, we would like to hear from you: send an email. Thank you to everyone for your patience and cooperation throughout last week's extended downtime.

February 28, 2020

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE, HPSS

February 28, 2020

Cheyenne repair and update efforts continued on Thursday, again successfully and without incident. CISL facility engineers at the NCAR-Wyoming Supercomputing Center restored electrical power to Cheyenne at 8:30 a.m. Nortek engineers completed their final inspections of the system’s cooling units. Late Thursday night, CISL HPC engineers updated the firmware on all of Cheyenne’s approximately 1,000 power supply units and validated the system’s InfiniBand hypercube fabric. 

While Cheyenne remains down today, users can access the Casper cluster, GLADE and Campaign Storage file systems, data-access nodes, and HPSS by logging directly in to Casper at casper.ucar.edu.

Cheyenne is expected to return to service no later than Saturday evening. Users will be notified of any significant changes over the weekend through the Notifier service.

February 27, 2020

All work that was scheduled for Wednesday at the NCAR-Wyoming Supercomputing Center (NWSC) was completed successfully and without incident. HPE engineers completed repairs on all 56 targeted InfiniBand fabric switches, Nortek engineers finished their final inspections of the Cheyenne system’s cooling units, and CISL facility engineers completed their repair and maintenance on the NWSC mechanical and electrical infrastructure. CISL HPC engineers updated the system’s Linux kernel and the InfiniBand network’s operating system and firmware.

While Cheyenne remains down, users can access the Casper cluster, the GLADE and Campaign Storage file systems, data-access nodes, and HPSS by logging directly in to Casper at casper.ucar.edu.

Cheyenne is expected to be returned to service no later than Saturday evening. Watch for regular updates in CISL’s Daily Bulletin and Notifier service.

February 26, 2020

All work scheduled for Tuesday at the NCAR-Wyoming Supercomputing Center (NWSC) was completed on time and without incident. HPE engineers repaired 26 InfiniBand fabric switches, Nortek engineers completed maintenance on the system’s cooling units, and CISL HPC specialists updated several system software stack components. CISL network engineers successfully rebooted key network switches and installed important security updates. These repairs and updates are expected to significantly improve Cheyenne’s performance and reliability.

The Casper cluster, the GLADE and Campaign Storage file systems, data-access nodes, and HPSS were returned to service before 10:30 p.m. MST, more than an hour ahead of schedule. Users can access these systems while Cheyenne is unavailable by logging in at casper.ucar.edu.

Cheyenne is expected to be returned to service no later than Saturday evening. Watch for regular updates in CISL’s Daily Bulletin and Notifier service.

February 25, 2020

The Cheyenne cluster was powered down this morning, on schedule, at 6 a.m. MST. The system is expected to be returned to service no later than Saturday evening, February 29. 

During this outage CISL staff and HPE engineers will perform critical infrastructure maintenance, repairs, and system software updates. The work is expected to significantly improve system performance, reliability, and security. Users will be kept apprised of the effort throughout the week in the CISL Daily Bulletin and the Notifier service.

The Casper cluster, GLADE and Campaign Storage file systems, data-access nodes, and HPSS will remain available throughout the week except for approximately four hours beginning at 8 p.m. MST tonight, February 25. Users can access these systems by logging in directly to Casper at casper.ucar.edu.

February 25, 2020

Registration is now open for the 2020 Rocky Mountain Advanced Computing Consortium (RMACC) HPC Symposium, May 19-21 at the University of Colorado Wolf Law School in Boulder. RMACC is a collaboration among academic and research institutions, with NCAR among its partners. Its mission is to facilitate widespread, effective use of high-performance computing throughout the Rocky Mountain region.

RMACC announced two keynote speakers for the event: Dr. Nick Bronn from IBM’s Experimental Quantum Computing Group will give a talk on “Benchmarking and Enabling Noisy Near-Term Quantum Hardware,” and Dr. Jason Dexter from CU Boulder will give a talk on “High-performance computing and the first black hole image.” The symposium schedule includes sessions on topics such as quantum computing, machine learning, cloud computing, and professional skills for students and practitioners.

Register here to attend.
