Daily Bulletin Archive

January 29, 2015

CISL has scheduled an extended High Performance Storage System (HPSS) downtime for Wednesday, Jan. 28, for preventive maintenance. HPSS will be unavailable from 9 a.m. to 4 p.m. MST for the installation of 20 new tape drives and for quarterly deep cleaning of both tape libraries.

January 26, 2015

A recording of the January 20 “Introduction to Yellowstone” webcast is now available on the CISL web site. See the course page to review the presentation. Topics covered include logging in, working with environment modules, compiling code, running batch jobs and interactive jobs, and archiving and transferring files.

New users and others may also find the Yellowstone Quick start documentation helpful.

If you have suggestions for other training topics that you would find useful, please let us know via our feedback form.

January 21, 2015

After experiencing unanticipated system hangs, CISL has ended the rolling reboot of Yellowstone nodes that began on January 5. CISL staff will reboot the rest of the compute nodes during a scheduled system maintenance downtime at a future date.

As announced previously, rebooting the nodes is intended to address higher-than-expected occurrences of out-of-memory conditions and other node issues. Those appear to have resulted from the accumulation of memory leaks in kernel and system processes during a roughly 180-day period of uptime for most of the Yellowstone compute nodes. The recent system hangs were due to a small number of nodes that were not shutting down gracefully.

January 20, 2015

The CISL Consulting Services Group will present a 40-minute workshop at 10 a.m. MST on Tuesday, January 20, to introduce participants to using the Yellowstone, Geyser, Caldera, and HPSS systems.

Topics to be covered include logging in, working with environment modules, compiling code, running batch jobs and interactive jobs, and archiving and transferring files.

You can register to attend in person—at the VisLab in NCAR’s Mesa Lab in Boulder—or via webcast by selecting a link below:

January 15, 2015

The NCL team is offering a series of webinars on an introduction to NCL processing.

The first three webinars in this series have already been held and recorded. The next three webinars will be:

  • January 8, 2015 at 9:15 AM MST
  • January 9, 2015 at 10:15 AM MST
  • January 14, 2015 at 10:15 AM MST

 

These webinars assume you have basic knowledge of NCL syntax and NCL file I/O. It is helpful, but not required, to have watched the other webinars in this series.

You may attend the webinars remotely via a web link that will be emailed to you after registration, or in person in the VisLab at NCAR's Mesa Lab campus in Boulder.

Please click on the appropriate link to register for each webinar that you'd like to attend.

If you are completely new to NCL, you should watch the "Introduction to NCL" webinar series, available at the link below.

For more information, contact Mary Haley at ext. 1254 or haley@ucar.edu, or visit the NCL Webinars web page.

January 14, 2015

The University of Colorado Computational Science and Engineering Meetup Group will learn about debugging and profiling parallel codes using Allinea DDT and MAP in a meeting at 11:30 a.m. Thursday, January 15. See the Meetup Group for details and to RSVP.

The Allinea tools are available to use on the Yellowstone system as described here.

January 13, 2015
 
Yellowstone lost some InfiniBand switch blades last Friday, and we had to switch off some of the GPFS servers due to connectivity issues. This in turn resulted in lower bandwidth for GLADE file system traffic. Users have observed, and will continue to observe, slower GLADE access when the system is busy until we are able to restore the failed switch modules. We hope to get parts on site and fix the blade issues and file systems by the end of the day tomorrow, or earlier.
 
We apologize for the inconvenience created by these failures.

David L Hart  -  dhart@ucar.edu  -  303-497-1234
NCAR/CISL User Services Manager

January 9, 2015

CISL has installed the latest version of MATLAB—R2014b—on the Yellowstone system. See Essential module commands to learn how to get a list of all of the available versions and how to load the one that you want to use.

January 6, 2015

Most Yellowstone compute nodes have been up for approximately 180 days, and we suspect that the recent higher-than-expected occurrences of out-of-memory conditions and other node issues are caused by memory leaks that have accumulated in kernel and system processes over this unusually long period.
 
CISL staff are therefore planning to reboot select Yellowstone racks starting Monday, January 5, 2015. To avoid disruption to users, CISL will reboot only a limited number of nodes per day, so the rolling reboot process will go on for several days. We do not anticipate any outage or user-visible downtime.
 
David L Hart  -  dhart@ucar.edu  -  303-497-1234
NCAR/CISL User Services Manager

January 5, 2015

Yellowstone users who need to debug code at scales beyond 1024 cores, even up to full-machine runs, will have that opportunity soon. As part of CISL’s purchase of DDT, MAP, and Performance Reports, Allinea, developer of the DDT debugger, will facilitate the larger-than-usual runs by providing CISL with a modified license for one week.

To indicate your interest and needs, submit a large-scale debugging request using the CISL Special request form. State how many cores you will need, how many runs you anticipate, and when you are available to run the jobs. CISL staff will schedule the license modification once user preferences and availability are known.

The agreement with Allinea allows CISL to offer this “burst mode” for large-scale debugging only a few times per year, so the opportunities will be scheduled by CISL consultants throughout the year as user needs are identified.
