Daily Bulletin Archive

May 20, 2014

We are correcting past charges to Geyser/Caldera allocations after finding an error in the original calculation. The corrected calculation reflects the charging formula for shared jobs posted at https://www2.cisl.ucar.edu/resources/yellowstone/using_resources/queues_charges. Recent jobs have already been charged using the corrected formula for more than a month.

We have already examined the impact on existing allocations and made adjustments so that no project becomes overspent as a result of the correction. Many other projects will see either no change or a decrease in their posted usage.

No action is needed on the part of users or project leads. Contact cislhelp@ucar.edu if you have any concerns or questions.

May 20, 2014

CU Research Computing is phasing in a new job scheduler to replace the Moab and Torque packages on the Janus cluster and other systems.

Janus users should prepare by June 3 to submit new jobs using the Simple Linux Utility for Resource Management (SLURM), an open-source cluster management and job scheduling system. Research Computing provides SLURM testing documentation here: https://www.rc.colorado.edu/support/examples/slurmtestjob. CISL's related documentation for NCAR users will be updated soon.

SLURM is reported to support many basic Torque commands and directives, so many users should notice little or no difference in behavior.
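For users preparing their first SLURM jobs, here is a minimal sketch of a batch script. The job name, resource requests, time limit and output file name are hypothetical placeholders rather than Janus-specific settings; the partitions, QOS values and limits that actually apply are described in Research Computing's SLURM documentation.

```shell
#!/bin/bash
# Minimal SLURM batch script sketch. All values below are hypothetical
# placeholders, not Janus-specific settings.
#SBATCH --job-name=slurm_test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=slurm_test.%j.out

# SLURM sets SLURM_JOB_ID inside a running job; the fallback lets the
# script also run as an ordinary shell script for a quick check.
job_id="${SLURM_JOB_ID:-none}"
echo "Job ${job_id} running on $(uname -n)"
```

Such a script would be submitted with sbatch (for example, sbatch slurm_test.sh), monitored with squeue -u $USER, and cancelled with scancel followed by the job ID. In the output file name, %j is replaced by the job ID.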

May 19, 2014

As of 8:40 p.m., May 15, GLADE, Yellowstone, Geyser, Caldera and Pronghorn have been returned to production.

No files on GLADE appear to have been lost or corrupted. However, files that were open during the original power incident may have been lost. Please check your data files before submitting jobs.

The downtime was prolonged by extensive file system integrity checks, run on more than 3 PB of data to confirm that no data had been lost.

Now that the systems are back in production, we will provide additional information once we have reviewed the details.

May 16, 2014

CISL and IBM staff worked through the night, but diagnostics are still underway, and there is no estimate yet for when GLADE, and then Yellowstone, will return to production.

We expect to provide more information later this morning, when the current round of diagnostics is complete.

May 15, 2014

Following the GLADE outage yesterday, Yellowstone was returned to service around 4:30 p.m. MT with only the /glade/u (home directories) and /glade/scratch file systems mounted on the Yellowstone, Geyser and Caldera clusters.

Users currently have an excellent opportunity to make use of Yellowstone. Jobs should run as usual as long as they do not try to access files in /glade/p project spaces or the /glade/p/work directories.

Attempts to access files in /glade/p will return error messages such as "No such file or directory" until /glade/p is remounted.

CISL is working with IBM, which has staff on site, to resolve the file system issue as soon as possible. At this time we have no estimate for when /glade/p will be available.

May 9, 2014

CISL recently asked HPSS users who had data stored on media called "B-tapes" to review those data and consider whether any could be deleted rather than migrated to new tape library media. If you have already reviewed such holdings, thank you. If you have not yet done your review, please do so. Removing unnecessary files reduces your ongoing storage charges, speeds the migration to new storage media, and lowers overall data storage and management costs.

To determine whether any of your data are stored on B-tapes, see HPSS B-tape files. Because updating the B-tape lists is costly, deletions you have already made will not be reflected in the B-tape listings.

Contact cislhelp@ucar.edu if you need help moving or deleting large numbers of files.

May 6, 2014

The WRF Users' Workshop will take place June 23-27. Papers focusing on development and testing in all areas of model applications are especially encouraged. The early registration deadline is June 8. Short abstracts are due May 2, and extended abstracts are due June 16. Authors may request either a poster or an oral presentation, although posters are encouraged because time for oral sessions is limited. The workshop will open June 23 with a half-day session on best practices for applying WRF, WRFDA, and WRF Chem, and will close June 27 with six tutorials. See WRF Users' Workshop for details.

May 6, 2014

Because many users have been submitting production jobs to Yellowstone's 'small' queue, which is intended only for interactive use, debugging, and testing, the charging factor for the small queue will be increased to 1.5 on May 2. That is, 'small' queue jobs will be charged 1.5 times their standard core-hour cost.

The small queue will remain available for its intended purposes. Users who do not need it for interactive work, debugging, or testing will be better served by the 'premium' queue, which is also charged at 1.5 times the standard core-hour cost.
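As a worked example, consider a hypothetical job charged for 32 cores over 2 hours of wall-clock time. This assumes the standard cost is simply cores charged times wall-clock hours; CISL's posted queues and charges documentation gives the exact formula.

```latex
\begin{aligned}
\text{standard cost} &= 32\ \text{cores} \times 2\ \text{h} = 64\ \text{core-hours} \\
\text{small-queue charge} &= 64 \times 1.5 = 96\ \text{core-hours}
\end{aligned}
```

Because the premium queue uses the same 1.5 factor, the same job would also be charged 96 core-hours there.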

May 2, 2014

On many Linux systems, including Yellowstone, the default locale settings cause the sort program to ignore many non-alphanumeric characters when ordering lines. This can produce unexpected results, especially when you are sorting directory and file paths. To prevent this, set the LC_COLLATE environment variable to C as follows. Note that doing so may also affect other Linux programs that rely on the locale's collation order.

  • For tcsh users: setenv LC_COLLATE C
  • For bash users: export LC_COLLATE=C

Users who receive lists of orphan files from CISL do not need to do this, because those lists are already sorted correctly.
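The difference can be seen with a minimal sketch, assuming a POSIX shell and a sort that honors the locale, such as GNU sort. The sample paths and the file names paths.txt, sorted_locale.txt and sorted_c.txt are hypothetical. Instead of exporting the variable for the whole session, the sketch applies C collation to a single sort command, which avoids side effects on other programs. It sets LC_ALL rather than LC_COLLATE because LC_ALL, when set, overrides LC_COLLATE.

```shell
#!/bin/sh
# Hypothetical sample paths; '-' and '/' are the kind of characters that
# locale-aware collation may skip over when ordering strings.
printf '%s\n' /glade/u/ab /glade/u/a/c /glade/u/a-b > paths.txt

# Ordering under the current locale; the result varies from system to system.
sort paths.txt > sorted_locale.txt

# Byte-by-byte ordering for this one command only. "env" makes the same
# line work from tcsh as well as from bash or sh.
env LC_ALL=C sort paths.txt > sorted_c.txt

# Prints /glade/u/a-b, /glade/u/a/c, /glade/u/ab (one per line),
# because '-' (0x2D) < '/' (0x2F) < 'b' (0x62) in byte order.
cat sorted_c.txt
```

Comparing sorted_locale.txt with sorted_c.txt on a system with a non-C default locale shows the reordering described above.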

April 21, 2014

An XSEDE workshop on May 7 and 8 will give C and Fortran programmers a hands-on introduction to MPI programming. Attendees will leave with a working knowledge of how to write scalable codes using MPI, the standard programming tool for scalable parallel computing. The workshop will not be available via webcast, but it will be telecast to several satellite sites. See the XSEDE portal for details.