Daily Bulletin Archive

Jul. 23, 2013

NCAR HPC users can now submit requests for subsets of select Research Data Archive (RDA) gridded data sets using the "rdams" utility on the Yellowstone system’s login nodes.

See our Research Data Archive documentation for information on how to access and use rdams. Please contact Doug Schuster (schuster@ucar.edu) if you have any questions.

Jul. 12, 2013

No Scheduled Downtime: Yellowstone, Geyser, Caldera, HPSS, GLADE, Lynx

Jul. 10, 2013

During early testing of Yellowstone, including this line in LSF batch scripts was beneficial, but users are now asked to remove it from their scripts:

#BSUB -R "select[scratch_ok > 0]"

The functionality it provided has been superseded by other LSF features that are applied behind the scenes and are not visible to users. Supporting the scratch_ok feature also consumes batch node resources that could otherwise be used for computation, so we plan to remove it in the near future. Once the feature is removed, jobs that include the line shown above will remain pending in the queue indefinitely, so please remove the line from your job scripts.

Beginning Monday, June 24, LSF will reject jobs including this line with an error message asking you to remove it.
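If you have many job scripts, the cleanup can be automated. The Python sketch below is a hypothetical helper, not a CISL-provided tool: `strip_scratch_ok` deletes lines that match the directive quoted above (tolerating differences in spacing), and `main` keeps a `.bak` copy of each file it changes. It does not handle `scratch_ok` combined with other terms in a single `select[...]` string; review any such lines by hand.

```python
import re
import sys
from pathlib import Path
from typing import List, Tuple

# The directive from the bulletin, tolerating extra spaces and an
# optional space around ">" (e.g. "select[scratch_ok>0]").
OBSOLETE = re.compile(r'^\s*#BSUB\s+-R\s+"select\[\s*scratch_ok\s*>\s*0\s*\]"\s*$')


def strip_scratch_ok(text: str) -> Tuple[str, int]:
    """Return (text without the obsolete directive, number of lines removed)."""
    kept: List[str] = []
    removed = 0
    for line in text.splitlines(keepends=True):
        if OBSOLETE.match(line):
            removed += 1  # drop the whole directive line
        else:
            kept.append(line)
    return "".join(kept), removed


def main(paths: List[str]) -> None:
    for name in paths:
        path = Path(name)
        original = path.read_text()
        cleaned, removed = strip_scratch_ok(original)
        if removed:
            # Keep a backup next to the script before overwriting it.
            path.with_name(path.name + ".bak").write_text(original)
            path.write_text(cleaned)
            print(f"{name}: removed {removed} line(s)")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it as, for example, `python strip_scratch_ok.py job1.csh job2.csh` (a hypothetical file name) reports each script it changed; lines other than the obsolete directive are left untouched.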

Jul. 5, 2013

Users logging in to Yellowstone after Tuesday’s outage may see a notice or warning message related to an upgrade to our environment modules software.

The notice says, “Loading system default modules.”

Users who have saved customized module environments, as described in our Environment modules documentation, will also see warnings similar to the following when loading them:

"Lmod Warning: The following modules have changed: pgi"
"Lmod Warning: Please re-create this collection"

To get rid of the warning message, resave your customized environment default(s) using the module "sd" command. If you have questions about this or any other module-related problem, please contact CISL Consulting by email (cislhelp@ucar.edu), phone (303-497-2400) or ExtraView ticket.

Jul. 5, 2013

CISL is working hard to resolve the intermittent GPFS hangs that users have been experiencing with the Yellowstone system.

We are preparing to upgrade the GPFS software to version 3.5, which we expect will alleviate some of these problems. We are also working with IBM and Mellanox to address FDR InfiniBand interconnect issues that may be contributing to the hangs.

Other hangs appear to be tied to extreme metadata load, which can be caused by any user-initiated task that accesses many files in a short time. Users can help reduce one contributing source of metadata load, and speed up their own work, by running shell scripts with the “fast” option when the script does not execute module commands. For csh, for example, make the first line of the script "#!/bin/csh -f", which skips reading the ~/.cshrc startup file. Without the fast option, the user's modules are initialized each time the script runs.
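To apply the fast option across several csh scripts, a check like the following may help. This is a hypothetical Python sketch, not a CISL tool: `add_fast_option` adds `-f` to a `#!/bin/csh` or `#!/bin/tcsh` first line only when that line carries no other options and the script does not appear to run `module` commands. The module check is a simple text heuristic, and because `-f` skips everything in `~/.cshrc` (aliases and path settings as well as modules), scripts that rely on those settings still need the normal startup.

```python
import re

# First line naming csh or tcsh directly, e.g. "#!/bin/csh" or "#!/bin/tcsh -e".
CSH_SHEBANG = re.compile(r"^#!\s*(\S*/t?csh)\b(.*)$")

# Heuristic: "module" used as a command at the start of a line or after
# ";", "&" or "|". Commented-out lines ("# module load ...") do not match.
MODULE_CMD = re.compile(r"(?:^|[;&|])\s*module\s", re.MULTILINE)


def add_fast_option(script: str) -> str:
    """Return the script with "-f" added to its csh/tcsh first line, when safe.

    The script is returned unchanged if it is not a csh/tcsh script, if its
    first line already carries options (including -f), or if it appears to
    run module commands, which depend on the normal shell startup.
    """
    lines = script.split("\n")
    match = CSH_SHEBANG.match(lines[0])
    if match is None or MODULE_CMD.search(script):
        return script
    interpreter, options = match.group(1), match.group(2).strip()
    if options:
        return script  # leave existing option strings for manual review
    lines[0] = f"#!{interpreter} -f"
    return "\n".join(lines)
```

For example, `add_fast_option("#!/bin/csh\necho done\n")` returns the same script with `#!/bin/csh -f` as its first line, while a script containing `module load ...` is returned unchanged.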

We will continue to keep you informed and are exploring ways to provide you information on a more “real-time” basis. Thank you for your patience and cooperation.

Jun. 21, 2013

Yellowstone, Geyser, Caldera: Downtime Tuesday, June 18, 9:00am - 5:00pm

GLADE: Downtime Tuesday, June 18, 8:00am - 10:00am

No Scheduled Downtime: HPSS, Lynx

Jun. 20, 2013

Allocations for some non-university projects are subject to 30- and 90-day thresholds as explained in our Allocation use and thresholds documentation. CISL will begin enforcing that policy on Monday, June 17.

The thresholds apply to several NCAR divisional allocations and a small number of projects that have very large allocations. No university projects are affected.

When usage exceeds the thresholds that apply to an allocation, LSF notifies users who submit jobs and redirects those jobs to the low-priority “standby” queue. The message includes the project code (for example, P12345678) and the statement: “Warning: Project group exceeds a 30/90 threshold.”

To check on the status of an allocation, log in to https://apps.weg.ucar.edu/reports with your Yellowstone username and your UCAS password. Select “Divisional Reports” and then the appropriate division.

Contact cislhelp@ucar.edu if you have questions.

Jun. 18, 2013

An upgrade to the module command is planned during the Yellowstone outage on Tuesday, June 18. After the upgrade, users may see a warning message when loading a saved default module set, either explicitly with the module "gd" command or when logging in. The warning message will look similar to:

---

Lmod Warning: The following modules have changed: pgi

Lmod Warning: Please re-create this collection

---

To get rid of the warning message, it should be sufficient to resave your defaults using the module "sd" command. If you encounter this or any other module-related problem after the upgrade, please contact CISL Consulting by email (cislhelp@ucar.edu), phone (303-497-2400) or ExtraView ticket.

Jun. 18, 2013

Yellowstone, Geyser, and Caldera will be taken down for maintenance on June 18 to apply a firmware update to the central Juniper switch of the management network. We are reserving a full day for this outage, since it may entail a full reboot of the compute nodes. We will send a Notifier message to let users know when Yellowstone returns to service.

During the outage, CISL will also apply some patches to LSF 8. These patches address a number of application start-up and performance issues.

The GLADE team has also decided to perform several updates, the most urgent of which is a firmware update for some of the GLADE drives to bring the systems up to date. Because Yellowstone will already be down, GLADE will be taken down from 8 a.m. until 10 a.m. to update the firmware.

Additional GLADE software upgrades related to the planned upgrade of GPFS will also be carried out during the day. However, these will be handled via a rolling upgrade that does not require downtime.

During the downtime, no services will be able to access GLADE. Web access to the Research Data Archive (RDA) will also not be available.

Web access to RDA data files and submission of subsetting requests, as well as other GLADE services, such as Globus Online, will return to service after the GLADE team completes the firmware update. Processing of RDA subsetting requests will be delayed until after the Yellowstone downtime.

Jun. 18, 2013

CISL documentation regarding the Intel Math Kernel Library (MKL) of optimized math routines now includes OpenMP and MPI usage examples. In addition to the new parallel examples, the MKL documentation presents sample batch job scripts and procedures for accessing the numerous Intel examples on the Yellowstone system. See MKL: Math Kernel Library and contact cislhelp@ucar.edu if you have questions.
