Daily Bulletin Archive

Sep. 22, 2013

CISL, IBM, and Mellanox have set Monday, September 30, as the start date for replacing the Yellowstone InfiniBand cables, a process first announced in July. Users should plan for Yellowstone to be out of service for up to three weeks from that date.

A large team from CISL, IBM, and Mellanox continues to refine the details of the process. The current plan for the outage has the following general phases:

* A full downtime will be taken Sept. 30 to Oct. 3 to remove the existing cables, conduct preventive maintenance in the NWSC central utility plant, and perform a file system check on GLADE. Yellowstone, GLADE, Geyser and Caldera will all be unavailable during this time.

* GLADE and the Geyser and Caldera clusters (along with the Yellowstone login nodes and LSF) will be returned to service as soon as possible to permit users to conduct analysis, visualization, and data access tasks. (The InfiniBand cables for Geyser and between GLADE and the Geyser and Caldera clusters will be replaced prior to October with limited downtime.)

* The recabling of the Yellowstone batch nodes will take approximately two weeks. After recabling is complete, CISL expects to restore the full system to service without additional downtime for GLADE, Geyser, and Caldera.

Note that HPSS will remain available during the entire period except during the NWSC utility maintenance.

We will continue to provide updates via the Daily Bulletin and Notifier as relevant details arise, but users should now plan for Yellowstone to be unavailable for up to three weeks starting Sept. 30.

Sep. 16, 2013

NCAR/CISL invites NSF-supported university researchers in the atmospheric, oceanic, and related sciences to submit large allocation requests for the Yellowstone system by September 16, 2013. All requesters are strongly encouraged to review the instructions before preparing their submissions.

These requests will be reviewed by the CISL High Performance Computing Advisory Panel (CHAP), and there must be a direct link between the NSF award and the proposed computational research. Please visit http://www2.cisl.ucar.edu/docs/allocations for more university allocation instructions and opportunities.

Allocations will be made on Yellowstone, NCAR's 1.5-petaflops IBM iDataPlex system; the data analysis and visualization clusters (Geyser and Caldera); the 11-petabyte GLADE disk resource; and the High Performance Storage System (HPSS) archive. Please see https://www2.cisl.ucar.edu/resources/yellowstone for more system details.

For the Yellowstone resource, a large allocation is any request for more than 200,000 core-hours. Researchers with needs for up to 200,000 core-hours can apply for Small University Allocations at any time. Small allocations are also recommended for researchers who are new to Yellowstone, in order to conduct benchmarking and test runs before submitting large allocation requests.
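The size rule above can be sketched as a small Python function. This is an illustrative sketch only, not a CISL tool: the function name `allocation_category` is hypothetical, and the only rule it encodes is the one stated here (more than 200,000 core-hours is large; up to 200,000 is small).

```python
# Hypothetical helper illustrating the Yellowstone university allocation
# size rule described above; not an official CISL tool.
LARGE_THRESHOLD_CORE_HOURS = 200_000  # requests above this are "large"


def allocation_category(requested_core_hours: int) -> str:
    """Return "small" or "large" for a Yellowstone university request."""
    if requested_core_hours < 0:
        raise ValueError("core-hours must be non-negative")
    # "Up to 200,000" core-hours is small and "more than 200,000" is large,
    # so a request of exactly 200,000 still counts as small.
    if requested_core_hours > LARGE_THRESHOLD_CORE_HOURS:
        return "large"
    return "small"


print(allocation_category(200_000))  # small
print(allocation_category(250_000))  # large
```

Small requests can be submitted at any time; anything classified as large goes through the CHAP review with its submission deadline.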

Sep. 15, 2013

Yellowstone in service, Help Desk and Consulting closed Friday, September 13.

Due to flooding in Boulder today, UCAR facilities in Colorado have been closed. Thus, the CISL Help Desk and Consulting Services will not be available today.

Yellowstone, GLADE, HPSS, and other systems at NWSC, as well as support systems at the Mesa Lab, remain in service.

Sep. 10, 2013

Numerous WRF runs on Yellowstone by the Consulting Services Group (CSG) provide the basis for new compiling and runtime recommendations that are documented here:

Optimizing WRF performance on Yellowstone

The WRF jobs ranged from small runs up to 4,096 nodes, nearly the entire Yellowstone system, and used various domain sizes and time steps. CSG also used the run data to develop equations for estimating the core-hours needed for individual WRF runs and for allocation requests. See the link above for the equations, scaling results, and a timing graph.
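For rough planning, the core-hours a single run consumes can be estimated from its node count and wall-clock time. This is a minimal sketch, not the CSG equations (those are in the linked document). It assumes Yellowstone's 16 cores per node, that jobs are charged for whole nodes, and an optional `queue_factor` multiplier assumed to default to 1; the function name `estimate_core_hours` is hypothetical.

```python
# Rough core-hour estimate for a Yellowstone job.
# Assumptions (not from the CSG WRF equations): 16 cores per node,
# whole nodes charged, and an optional queue-dependent multiplier.
CORES_PER_NODE = 16


def estimate_core_hours(nodes: int, wall_clock_hours: float,
                        queue_factor: float = 1.0) -> float:
    """Estimate the core-hours charged for one run."""
    if nodes <= 0 or wall_clock_hours < 0 or queue_factor <= 0:
        raise ValueError("invalid job parameters")
    return nodes * CORES_PER_NODE * wall_clock_hours * queue_factor


# A one-hour run on 4,096 nodes, the largest WRF size tested:
print(estimate_core_hours(4096, 1.0))  # 65536.0
# Ten 2-hour runs on 64 nodes each:
print(10 * estimate_core_hours(64, 2.0))  # 20480.0
```

Multiplying the per-run estimate by the number of planned runs gives a total that can be compared against the 200,000 core-hour small-allocation limit.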

Sep. 6, 2013

Commands for creating and managing customized collections of Yellowstone environment modules have changed.

To save a customized environment as your default environment, load the modules that you want to use in that environment, then simply run module save. The save command replaces the now-deprecated setdefault command. Similarly, restore replaces the getdefault command.

For information about other changes, see the updated CISL Environment modules documentation.

Aug. 30, 2013

Yellowstone, Geyser, Caldera: Downtime Tuesday, August 27, 6:00am - 6:00pm

No Scheduled Downtime: HPSS, GLADE, Lynx

Aug. 28, 2013

The upgrade to LSF 9.1.1 and IBM Parallel Environment (PE) on Yellowstone was completed Tuesday, and following testing of the nodes, the system was returned to users at approximately 7 p.m.

Although not technically required, CISL's consultants strongly recommend that users recompile their codes in the updated environment.

Aug. 27, 2013

NCAR researchers and eligible university researchers can now request "small" Janus allocations of up to 200,000 core-hours at any time, an increase from the previous limit of 50,000 core-hours for small allocations.

University researchers can request allocations of more than 200,000 core-hours as part of the semi-annual large allocation process. The next deadline is Sept. 16; see the University Large Allocation Request Form. For small allocations, use the University Small Allocation Request Form.

NCAR staff can also request both small and larger allocations on Janus via the Janus allocation request form. Large allocations require a brief write-up of the technical readiness and justification of the computational request. NCAR researchers should use the Alternative Allocation Request Form.

More information is available here: http://www2.cisl.ucar.edu/resources/janus/allocations

Aug. 26, 2013

CISL staff will be conducting a two-part update to key Yellowstone system software components August 20 and August 27. As part of this update, Yellowstone, Geyser, and Caldera will be taken out of service August 27, from 6 am MT until 6 pm MT. The updates include fixes for a number of issues experienced by users.

Although not technically required, CISL's consultants strongly recommend that users recompile their codes following the August 27 downtime.

Of most interest to users, the updates to LSF and the IBM Parallel Environment (PE) include:

* corrected wrappers for the PGI compiler;

* a fix for a bug with MPI_IN_PLACE in MPI_Allgather that some users have encountered;

* a fix that will allow Fortran codes with "USE MPI" statements to compile correctly under PGI and GNU compilers; and

* the LSF and PE versions needed to complete integration of the Pronghorn Xeon Phi cluster into the environment.

On August 20, CISL will perform the first part of the update, upgrading the xCAT administration software, which is a prerequisite to the LSF and PE updates. No outage will be needed if the upgrade process goes as planned. However, users should be aware of the slight chance that CISL staff may need to take the system down should they encounter problems.

On August 27, CISL staff will take the system down to upgrade LSF to version 9.1.1 and to upgrade the IBM PE. The downtime is necessary because all nodes must be rebooted for the changes to take effect.

During this period a number of other system firmware and software components will be brought up to date, but these will largely be invisible to users.

GLADE and HPSS will not be affected by the update process and are expected to remain in service throughout this period.

Aug. 25, 2013

An XSEDE training session for beginning and intermediate Linux/Unix users will be webcast from 1 to 4 p.m. Central time on Friday, September 6.

The Texas Advanced Computing Center will present the training session “Linux/Unix Basics.” XSEDE described it as an interactive lecture that will emphasize common strategies for interacting with clusters and HPC resources. It will include hands-on exercises. There are no prerequisites.

To register, see https://www.xsede.org/web/xup/course-calendar