Daily Bulletin Archive

August 28, 2013

CISL staff will be conducting a two-part update to key Yellowstone system software components August 20 and August 27. As part of this update, Yellowstone, Geyser, and Caldera will be taken out of service August 27, from 6 am MT until 6 pm MT. The updates include fixes for a number of issues experienced by users.

Although not technically required, CISL's consultants strongly recommend that users recompile their codes following the August 27 downtime.

Of most interest to users, the updates to LSF and the IBM Parallel Environment (PE) include:

* corrected wrappers for the PGI compiler;

* a fix for a bug with MPI_IN_PLACE in MPI_Allgather that some users have encountered;

* a fix that will allow Fortran codes with "USE MPI" statements to compile correctly under PGI and GNU compilers; and

* the LSF and PE versions needed to complete integration of the Pronghorn Xeon Phi cluster into the environment.

On August 20, CISL will perform the first part of the update, upgrading the xCAT administration software, which is a prerequisite to the LSF and PE updates. No outage will be needed if the upgrade process goes as planned. However, users should be aware of the slight chance that CISL staff may need to take the system down should they encounter problems.

On August 27, CISL staff will take the system down to upgrade LSF to version 9.1.1 and the IBM PE to version 1.3.0.4. The downtime is necessary since all the nodes must be rebooted to propagate all the changes.

During this period a number of other system firmware and software components will be brought up to date, but these will largely be invisible to users.

GLADE and HPSS will not be affected by the update process and are expected to remain in service throughout this period.

August 28, 2013

Yellowstone, Geyser, Caldera: Downtime Tuesday, August 27, 6:00am-6:00pm

No Scheduled Downtime: HPSS, GLADE, Lynx

August 23, 2013

NCAR researchers and eligible university researchers can now request "small" Janus allocations of up to 200,000 core-hours at any time, an increase from the previous limit of 50,000 core-hours for small allocations.

University researchers can request allocations of more than 200,000 core-hours as part of the semi-annual large allocation process. The next deadline is Sept. 16. See the University Large Allocation Request Form. For small allocations, use the University Small Allocation Request Form.

NCAR staff can also request both small and larger allocations on Janus via the Janus allocation request form. Large allocations require a brief write-up of the technical readiness and justification of the computational request. NCAR researchers should use the Alternative Allocation Request Form.

More information is available here: http://www2.cisl.ucar.edu/resources/janus/allocations

August 20, 2013

An XSEDE training session for beginning and intermediate Linux/Unix users will be webcast from 1 to 4 p.m. Central time on Friday, September 6.

The Texas Advanced Computing Center will present the training session “Linux/Unix Basics.” XSEDE describes it as an interactive lecture that emphasizes common strategies for interacting with clusters and HPC resources. It will include hands-on exercises. There are no prerequisites.

To register, see https://www.xsede.org/web/xup/course-calendar

August 16, 2013

Users are asked to plan around the 2013 Community Earth System Modeling (CESM) Tutorial schedule August 12 to 16 to reduce potential contention for Intel compiler licenses.

Tutorial participants will be using Yellowstone’s six login nodes and four Caldera nodes for compilation between these hours:

  • 2:30 and 5 p.m. Mountain time on Monday, Tuesday, and Thursday

  • 1 and 3 p.m. on Friday

During these windows, 80 attendees will work in two-person teams, compiling and submitting CESM jobs. They will not be using PGI, GNU, or PathScale compilers, so those will not be affected.

The results of the tutorial compilations on most days will be small, short compute jobs that should have minimal impact on the availability of batch nodes for other users.

August 14, 2013

Starting Friday and over the weekend, users may have experienced issues with interactive sessions on Yellowstone due to problems on two of the six login nodes.

Yslogin2 will be taken out of service today, Monday, August 12, from 2 p.m. to 4 p.m. so that IBM can replace the system board on the node. The other five login nodes will remain available.

Yslogin4 was taken out of service Friday evening through Saturday morning to replace a failing InfiniBand adapter. User sessions were interrupted to complete the fix, and the node has been returned to service.

August 9, 2013

HPSS: Downtime Tuesday, August 13, 7:00am-9:00am

No Scheduled Downtime: Yellowstone, Geyser, Caldera, GLADE, Lynx

August 5, 2013

This week, CISL staff are performing a rolling upgrade to the Yellowstone, Geyser and Caldera systems to bring the GPFS client software on the clusters up to version 3.5.

Sets of nodes have been placed under several system reservations and will be taken out of service and restarted with the new client software. After passing health checks, the nodes will be returned to service.

Users should not be affected by the updates, other than perhaps slightly longer queue waits as the reservations and upgrade process reduce the number of nodes available to jobs. Users should consult CISL's documentation on backfill windows to maximize their throughput around the reservations; see http://www2.cisl.ucar.edu/resources/yellowstone/using_resources/runningjobs#bslots

These updates complete the transition to the most recent version of GPFS, which provides Yellowstone and GLADE with a number of features to improve the management of the disk resource.

UPDATE, Aug. 1, 11:00 am MT: The upgrades to the login nodes have been completed and the nodes returned to users.

Two of the six Yellowstone login nodes have already been upgraded. The remaining four login nodes are scheduled to be updated between 10 a.m. and noon on Thursday, August 1. We will issue a screen message before bringing those nodes down. We recommend that you log in to yslogin3.ucar.edu or yslogin5.ucar.edu instead of yellowstone.ucar.edu on Thursday morning to avoid this disruption.

August 5, 2013

A new Yellowstone environment module (mpi4py/1.3.0) loads the MPI for Python package, enabling users to run Python programs on multiple processors (either single or multiple nodes). To load MPI for Python, first load Python after logging in to Yellowstone:

  • module load python

  • module load mpi4py (or load all-python-libs to get all of the Python packages and libraries)

A demo job script is available here:

  • /glade/apps/opt/mpi4py/1.3/gnu/4.7.2/demo/AAA.run_bw_latency.LSF
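For readers new to MPI for Python, the following is a minimal sketch of the kind of program the module enables. It is not taken from the demo script above. The function names `block_range` and `main` and the toy workload (summing a range of integers) are assumptions made for illustration. Each MPI rank sums its own block of the range, and rank 0 collects the total with a reduction. How the script is launched under LSF is not shown here; the demo job script listed above is the reference for that.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open [start, stop) block of range(n_items) owned by rank.

    Hypothetical helper: blocks differ in size by at most one item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def main(n_items=1000):
    # Assumption: the python and mpi4py modules are loaded. If mpi4py cannot be
    # imported, fall back to a single-process run so the sketch still executes.
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    except ImportError:
        comm, rank, size = None, 0, 1

    # Each rank works only on its own block of the index range.
    start, stop = block_range(n_items, size, rank)
    partial = sum(range(start, stop))

    # Combine the partial sums on rank 0 (other ranks receive None).
    if comm is None:
        total = partial
    else:
        total = comm.reduce(partial, op=MPI.SUM, root=0)

    if rank == 0:
        print("ranks=%d total=%d" % (size, total))
    return total


if __name__ == "__main__":
    main()
```

The lowercase `comm.reduce` call communicates ordinary Python objects, which keeps the sketch short; for large numeric arrays, the buffer-based uppercase methods (for example `comm.Reduce` with NumPy arrays) are generally the faster choice.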

Also see these related web pages for more information:

August 5, 2013

CISL experienced some issues with receiving and delivering email starting Tuesday afternoon, July 30, through mid-morning Wednesday, July 31. No email was lost, but incoming email, including messages sent to the CISL Help Desk, was queued, and outgoing mail, including email from the Daily Bulletin and Notifier, was delayed.

We apologize for any delays in our response to user tickets. We will be working to catch up after email service is restored.
