Daily Bulletin Archive

May 17, 2012

The CISL User Services Section has published new “quick start” documentation for NCAR users who are working with the Janus cluster managed by the University of Colorado's Research Computing group. It includes details about logging in, which node to use for compiling, a sample job script, and other information that NCAR users need to know. See Quick start for NCAR users.

May 16, 2012

No Scheduled Downtime: Bluefire, HPSS, Lynx, DAV, GLADE

May 16, 2012

Bluefire login nodes outage for May 15 has been cancelled.

Bluefire login nodes may not be accessible between 8 a.m. and 12 p.m. this coming Tuesday, May 15 while system administrators try to tune some network parameters as part of solving the recent GPFS performance problems. We are working closely with IBM on the exact settings for the network configuration, and we may have to postpone this change.

We apologize for the inconvenience and uncertainty created due to this change.

We will make every effort to minimize both. Please note that Bluefire batch nodes will continue running jobs throughout the outage.

May 9, 2012

Over the past few weeks, many users have experienced, and submitted tickets reporting, intermittent periods of slow response on the Bluefire login nodes, the Mirage DAV nodes, and other systems, as well as slow file system performance from the Bluefire batch nodes.

CISL staff from the high-end services section, the networking section, and the consulting group have been continuously pursuing every possible cause of this performance degradation since the April 14 data center power-down. Networking and computing vendors have also been engaged. The problem became more pronounced after the power-down, but it appears to have first surfaced on or about April 9.

All recent downtimes and changes for Bluefire and other systems have been aimed at tracking down and eliminating the root cause of the problem. Please monitor email from Notifier to stay apprised of changes and downtimes. (Subscribing to the "CISL Status" service at http://notifier.ucar.edu/ will get you all key notices.)

We have narrowed down the possible problem locations to the networking connections and interfaces between the compute systems and the GLADE servers, but have not yet isolated the cause. Our current monitoring shows that the changes made so far have mitigated the problem: incidents have been less frequent and shorter, but the issue has not been eliminated.

We apologize for the inconvenience that this problem has caused. Rest assured that we are continuing to give the problem our full attention.

May 9, 2012

To help you prepare for the transition from Bluefire to the new Yellowstone environment, we've put together some information about the most notable differences between these systems: Transition from Bluefire. We hope it will answer some questions for you.

When more details become available, we will expand that page and also let you know when other new documentation is published.

As announced recently, Bluefire will be shut down on or shortly after September 30, 2012. Review that CISL Daily Bulletin item for more information.

May 7, 2012

The CISL Consulting Services Group will offer its four-day High Performance Computing (HPC) workshops on May 22 to 25, 2012, to help our users gain the essential knowledge and skills needed to work with supercomputers. Topics include CISL Facilities and Support Overview, UNIX, Fortran, Programmer’s Tools, Parallel Programming with MPI/OpenMP, NCL/IDL/Matlab, and others.

You can enroll in the workshop either on-site or online. Please visit the following link for more details and registration.

May 2, 2012

While CISL is preparing for the arrival of the first Yellowstone hardware shipments, we wanted to provide users with some information about what you can expect from Bluefire in the coming months.

Most important, CISL has extended the maintenance agreement for Bluefire through September 30, 2012, and we will continue to operate Bluefire as a production resource until that date. If you encounter any issues, please contact cislhelp@ucar.edu as you do now.

Keep the September 30 deadline in mind, however. On or shortly after that date, Bluefire will be shut down for good. We will not be able to keep Bluefire around for "one last run." All users and projects should plan to complete critical work or reach meaningful stopping points before that date.

Furthermore, keep in mind that Bluefire is likely to become busier, not quieter, as the deadline approaches and users try to complete their projects. All users are asked to plan accordingly and avoid the last-minute rush.

For planning purposes, assume that the September 30, 2012, date is fixed. However, CISL is closely watching the Yellowstone deployment schedule for any changes that would warrant extending Bluefire's operation. We will inform users as quickly as possible if we change the date.

May 2, 2012

Bluefire: Tuesday, May 1 from 6:00 am - 1:00 pm

HPSS: Tuesday, May 1 from 7:00 am - 9:00 am

DAV: Tuesday, May 1 from 6:00 am - 1:00 pm

GLADE: Tuesday, May 1 from 6:00 am - 1:00 pm

Lynx: Tuesday, May 1 from 6:00 am - 1:00 pm

No Scheduled Downtime:

May 1, 2012

Date and Time: 1 May 2012, 9:00 AM - 4:00 PM
Location: ML-Vis Lab
Speaker: Thomas Clune, Ph.D.

This workshop provides an introduction to unit testing in a parallel, numerical Fortran environment using the pFUnit software. In the session, participants will learn how to set realistic expectations, test numerical algorithms, test legacy code, and perform similar tasks. The morning will be dedicated to presentations and discussion, with hands-on tutorials in the afternoon. Experience with Fortran is assumed; no prior experience with pFUnit or unit testing is required.

The course is limited to 24 participants. Registration is through the EOD Course Catalog. To register:

  1. Go to the EOD Training Catalog via Connect: https://www.fin.ucar.edu/hrisConnect/employee

  2. Log in with your UCAS (timecard) login and password

  3. Click - Training & Education tab

  4. Click - Training Catalog

  5. Search by Course Title: FORTRAN TESTING USING pFUnit

  6. Click - Enroll

Speaker Description: 

Thomas Clune, Ph.D., Chief, Software Systems Support Office, NASA Goddard Space Flight Center, and a principal developer of pFUnit.

April 26, 2012

Making sure that we can reach you when necessary can help you avoid being locked out of the system. Say a job that you’re running on Bluefire is causing problems and we can’t contact you because your phone number or email address has changed. As explained on our User responsibilities page, that job may be killed and you will find yourself locked out. If your name, phone number, email address, or other information changes, please promptly notify the CISL Help Desk by email.