Daily Bulletin Archive

March 18, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE and HPSS

March 15, 2019

Cheyenne’s default MPI library is now MPT 2.19, which is the version that HPE recommends and supports. Versions 2.15 and 2.16 are no longer compatible with system firmware and have been removed from the system. To mitigate failures from existing scripts and job workflows, the mpt/2.15 and mpt/2.16 modules still exist, but they now point to the MPT 2.19 library and issue a message prompting users to upgrade. The mpt/2.15 and mpt/2.16 modules will be deleted later this year. MPT 2.18 is still available on Cheyenne but is no longer supported by HPE.

The parallel libraries netcdf-mpi and pnetcdf using MPT 2.19 are available for the following supported versions of the Intel compiler: 16.0.3, 17.0.1 (the default), 18.0.5, and 19.0.2. The libraries have also been built for GCC versions 6.3.0, 7.3.0, and 8.1.0, and for PGI 17.9.

Users should update their scripts and recompile executables to use MPT 2.19 as soon as possible.
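One way to find scripts that still reference the redirected modules is to search them for the mpt/2.15 and mpt/2.16 module names. The following is a minimal sketch, not a CISL-provided tool: the function names, the "*.sh" file pattern, and the directory you pass in are all assumptions made for illustration, and the search only catches module names written literally in the file.

```python
import re
from pathlib import Path

# Module versions the bulletin says were removed and now redirect to MPT 2.19.
OUTDATED = ("2.15", "2.16")

# Matches literal references such as "module load mpt/2.15".
PATTERN = re.compile(
    r"\bmpt/(" + "|".join(re.escape(v) for v in OUTDATED) + r")\b"
)


def find_outdated_mpt(text):
    """Return (line_number, line) pairs that reference an outdated mpt module."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


def scan_directory(root, glob="*.sh"):
    """Scan files under root (hypothetical layout) and map each path to its hits."""
    hits = {}
    for path in Path(root).rglob(glob):
        if path.is_file():
            found = find_outdated_mpt(path.read_text(errors="replace"))
            if found:
                hits[path] = found
    return hits
```

Calling scan_directory on a directory of job scripts returns a dictionary keyed by file path, so each reported line can be changed to mpt/2.19 by hand. Executables built against the older libraries are not detected this way and still need to be recompiled.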

March 14, 2019

Reminder: The file retention period for the GLADE scratch space was increased recently from 60 days to 90 days. Individual files will be removed from scratch automatically when they have not been accessed – read, copied or modified – in more than 90 days. To check a file's last access time, run the command ls -ul <filename>.

The updated retention policy is expected to make it easier for users to manage their data holdings and to improve overall file system utilization.
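To check access times for many files at once rather than running ls -ul on each one, a short script can walk a directory and compare each file's last access time with the 90-day limit. This is a sketch under stated assumptions: the function name and the directory argument are hypothetical, and it only approximates the policy by reading the same access timestamp that ls -ul displays. The actual purge process may apply criteria not described here.

```python
import os
import time

RETENTION_DAYS = 90  # retention period stated in the bulletin
SECONDS_PER_DAY = 86400


def stale_files(root, days=RETENTION_DAYS, now=None):
    """Yield (path, days_since_access) for files not accessed in more than `days` days."""
    now = time.time() if now is None else now
    cutoff = days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # st_atime is the last access time, the value shown by `ls -ul`.
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file disappeared or cannot be inspected
            age = now - atime
            if age > cutoff:
                yield path, age / SECONDS_PER_DAY
```

Passing a smaller value for days, for example 75, lists files that are approaching the limit so they can be copied elsewhere before they are removed.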

March 11, 2019

Do you have some experience as an HPC system administrator and want to expand your skills? Consider attending Intermediate HPC System Administration, a Linux Clusters Institute workshop scheduled for May 13 to 17 at the University of Oklahoma. The workshop will:

  • Strengthen participants’ overall knowledge of HPC system administration.

  • Focus in-depth on file systems and storage, HPC networks, job schedulers, and Ceph.

  • Provide hands-on training and real-life stories from experienced HPC administrators.

See the workshop page for more information and registration. Early bird registration ends April 15.

March 11, 2019

No scheduled downtime: Cheyenne, Casper, Campaign Storage, HPSS and GLADE

March 8, 2019

As a result of this week’s upgrade to the InfiniBand switch firmware, MPT version 2.15 is no longer available and the default MPI on Cheyenne is now MPT 2.16. Scripts that point to MPT 2.15 and executables that were compiled against MPT 2.15 will likely fail, so users should move them to a more recent version of MPT as soon as possible.

The MPT versions currently available on Cheyenne are MPT 2.16, MPT 2.18, and MPT 2.19. CISL recommends that users move to MPT 2.18 or MPT 2.19 because HPE no longer supports MPT 2.16, which will likely be removed from Cheyenne later this year. MPT 2.19 was installed yesterday, and the full system software stack will be filled out within the next couple of weeks.

If you need assistance with updating your scripts or executables with a newer MPI, please contact CISL help at cislhelp@ucar.edu or call 303-497-2400.

March 6, 2019

The Cheyenne system's compute nodes remain down as of 9 a.m. today and are unavailable for running batch jobs due to unresolved problems following this week’s scheduled maintenance. HPE has been notified and is working with CISL to return the system to users as soon as possible.

The Cheyenne login nodes, Casper cluster, GLADE file system, NCAR's Campaign Storage, Globus data transfer services, and the High Performance Storage System (HPSS) have been restored to service. Users will be informed by Notifier when the Cheyenne compute nodes are back in service.

March 6, 2019

Do you perform data analysis, post-processing, visualization, GPU computing, or machine learning? If you use the Casper cluster, or wish to use future NCAR/CISL resources for such work, please give us your input by completing this brief CISL Data Analysis and Visualization User Survey by March 15.

CISL has begun the planning process for the system that will follow the Cheyenne and Casper clusters currently in production in our NWSC data center. Your input will help inform the procurement of resources to support your work in the future.

March 5, 2019

The Women in IT Networking at SC (WINS) program is now accepting applications for the 2019 program. The application deadline is 11:59 p.m. AoE, April 1, 2019. The application form and details are available here.

Since 2015, the WINS program has provided an immersive “hands-on” mentorship opportunity for early- to mid-career women in the IT field who are selected to participate in the ground-up construction of SCinet, one of the fastest and most advanced computer networks in the world. SCinet is built annually for the Supercomputing Conference (SC). SC19, to be held in Denver, Colorado, is expected to attract more than 13,000 attendees who are leaders in high-performance computing and networking.

WINS is a joint effort between the Department of Energy’s Energy Sciences Network (ESnet), the Keystone Initiative for Network Based Education and Research (KINBER), and the University Corporation for Atmospheric Research (UCAR), and works collaboratively with the SC program committee.

The program offers travel funding for awardees through an NSF grant and ESnet funding; collaborates with SCinet committee leadership to match each awardee with a SCinet team and a mentor; and provides ongoing support and career development opportunities for the awardees before, during, and after the conference.

March 4, 2019

Scheduled downtime: Cheyenne, Casper, Campaign Storage, HPSS and GLADE (details)