Daily Bulletin Archive

November 21, 2013

Using a command file when submitting an LSF batch job can help you run numerous independent tasks in parallel and make the most efficient use of Yellowstone compute nodes. If you have a serial data processing script, for example, and need to run many copies of it to process multiple data files, you can create a command file and job script that let you pack each node with the appropriate number of tasks and run them all in parallel. See CISL's Platform LSF tips for an example of how to use command files in batch jobs.
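
As a rough illustration of the idea above, the following Python sketch builds the text of a command file with one task per data file and estimates how many nodes are needed to run the tasks in parallel. It is a minimal sketch, not CISL's documented method: the script name `./process_file.sh`, the data file names, and the figure of 16 tasks per node are all hypothetical, and it assumes a command file format of one shell command per line. Check the Platform LSF tips page for the format and job script that Yellowstone actually expects.

```python
import math
import shlex


def build_commands(script, data_files):
    """Return one shell command per data file (one independent task each)."""
    return [f"{script} {shlex.quote(name)}" for name in data_files]


def nodes_needed(num_tasks, tasks_per_node):
    """Return the number of nodes required to run every task at once."""
    if tasks_per_node < 1:
        raise ValueError("tasks_per_node must be at least 1")
    return math.ceil(num_tasks / tasks_per_node)


def render_command_file(commands):
    """Return command file text, assuming one command per line."""
    return "".join(command + "\n" for command in commands)


if __name__ == "__main__":
    # Hypothetical inputs: replace with your own script and data files.
    data_files = [f"data_{index:03d}.nc" for index in range(40)]
    commands = build_commands("./process_file.sh", data_files)

    # Assumption: 16 tasks per node; use the value that suits your job.
    print("nodes to request:", nodes_needed(len(commands), 16))
    print(render_command_file(commands), end="")
```

The generated text can be saved to a file and referenced from the batch job script; how the job script consumes it depends on the launcher described in the CISL documentation.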

November 15, 2013

No Scheduled Downtime: Yellowstone, Geyser_Caldera, HPSS, GLADE

November 7, 2013

CISL will return the GLADE scratch retention period to 90 days on November 6. Users should evaluate their older files in scratch and make sure that they have taken steps to preserve necessary files. GLADE scratch space is not backed up.

Before the Yellowstone recabling outage at the start of October, the retention period was extended to 120 days to ensure that no files were deleted during the outage. With the recabling completed, we are restoring the original scratch retention policy.

November 6, 2013

The Yellowstone, Geyser, and Caldera systems will be unavailable from 6 a.m. to 2 p.m. Mountain time on Tuesday, November 5. CISL staff will use this scheduled downtime to perform a variety of minor system configuration, software, and firmware updates that have been recommended by IBM and tested by CISL to resolve a number of issues. The changes require the batch nodes to be rebooted, so a downtime is necessary.

November 1, 2013

The CISL Consulting Services Group is working with IBM and Rogue Wave Software to resolve an issue that prevents the TotalView debugger from working properly. Yellowstone users will be notified when the issue has been resolved.

October 25, 2013

Slide presentations from the October 11 CISL seminar on Yellowstone experiences and practices are available now on the CISL web site.

Michael Wiltberger (HAO) discussed using ParaView in the Yellowstone environment and Craig Schwartz (MMM) demonstrated submitting jobs using LSF array syntax. Rory Kelly (CISL/CSG) presented additional tips for submitting and managing batch jobs.

Based on the presentations and user input during the seminar, the User Services Section identified some software updates and user environment changes that are in the works. A new Platform LSF tips web page has also been added to CISL's Yellowstone user documentation based on input from various seminar participants.

To share your own ideas regarding our user environment or documentation, submit your comments on our Feedback form or email cislhelp@ucar.edu.

October 25, 2013

Ian Truslove and Erik Jasiak of the National Snow and Ice Data Center (NSIDC) will present a seminar on testing scientific code at 3 p.m. Thursday, October 24, in the Mesa Lab’s Main Seminar Room.

As their abstract states, computation and programming are increasingly inescapable in modern Earth sciences, but scientists and researchers often receive little or no formal software engineering or programming training. Research shows that computational and data errors contribute to high-profile retractions and to disappointingly low rates of repeatability in academic papers. These results increase the onus on researchers to write more repeatable, reliable, even reusable programs.

The presenters will discuss their experience with unit testing, test-driven development, and behavior-driven development, and will recommend some techniques that scientists and research programmers can use in their day-to-day programming. See http://sea.ucar.edu/event/testing-scientific-code for details.

October 21, 2013

James Kinter, director of the Center for Ocean-Land-Atmosphere Studies (COLA), will discuss the COLA team's Accelerated Scientific Discovery (ASD) project in a seminar from 10 to 11:30 a.m. MT on Friday, October 18, in the NCAR Mesa Lab Main Seminar Room. The COLA project was among the first to put CISL's Yellowstone system through its paces.

The COLA experiments on Yellowstone represent a continuation of the highly successful Project Athena, an international collaboration between COLA and the European Center for Medium-Range Weather Forecasts (ECMWF), in support of both centers’ ongoing efforts to understand and quantify predictability in the weather and climate system from daily to interannual time scales. Building upon the results of Project Athena, the team has explored the impact of increased atmospheric resolution on model fidelity and prediction skill in a coupled, seamless framework.

The presentation will be webcast for those unable to attend in person. To view the webcast, visit http://www.fin.ucar.edu/it/mms/ml-live.htm.

October 18, 2013

No Scheduled Downtime: Yellowstone, HPSS, Geyser_Caldera, GLADE

October 14, 2013

The CISL, IBM, and Mellanox team is pleased to return the Yellowstone system to users as of 12:10 p.m. MT, October 9, approximately 10 days ahead of schedule. CISL staff will be closely monitoring the system to ensure its health and stability under user workload. Users can now resume logging in to yellowstone.ucar.edu.

The recabling work was successful in producing a healthy InfiniBand fabric that has performed as well as or better than before in benchmarks and tests.

CISL and IBM are continuing to investigate pre-existing software issues related to running very large jobs -- typically 2,048 nodes or more. Such jobs are now succeeding more regularly, but some jobs still encounter errors that are being diagnosed.

Next week, IBM staff will be at the Mesa Lab to work closely with CISL on troubleshooting these large-job issues. As part of this effort, CISL and IBM will need to run large-scale jobs during the week. We will do our best to minimize the impact on users during the remainder of the three weeks originally planned for the recabling downtime.

Thank you for your patience and understanding during this large-scale replacement effort.
