Daily Bulletin Archive

May 23, 2013

The CISL Resource Status web page now shows near real-time activity in the Yellowstone environment’s job queues to help users identify opportunities to submit jobs and choose which queue to use. Updated every three minutes, it displays the number of running and pending jobs for each queue, the number of nodes being used, and the number of active users. Also see Queues and charges for help with queue selection.

May 23, 2013

Storing large files in your GLADE file spaces is more efficient than storing numerous small files. This is because the system allocates a minimum amount of space for each file, no matter how small. On /glade/scratch, for example, the smallest amount of space the system can allocate to a file is 128 KB. Any files smaller than 128 KB are still allocated 128 KB, so they require more space than you might expect.
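As a rough illustration of the arithmetic, the sketch below compares the space consumed by many small files with the same data stored in one file. The 128 KB minimum comes from the /glade/scratch example above; the function names are hypothetical, and the rule that larger files round up to a whole number of 128 KB blocks is a simplifying assumption, not a statement about how the file system actually allocates space.

```python
import math

# 128 KB minimum allocation on /glade/scratch, as stated in the bulletin.
BLOCK_SIZE = 128 * 1024


def allocated_bytes(file_size, block_size=BLOCK_SIZE):
    """Estimate the space allocated to one file (hypothetical helper).

    Assumption: every file occupies at least one block, and larger files
    round up to a whole number of blocks.
    """
    blocks = max(1, math.ceil(file_size / block_size))
    return blocks * block_size


def summarize(file_sizes, block_size=BLOCK_SIZE):
    """Return (bytes of data, estimated bytes allocated) for a set of files."""
    data = sum(file_sizes)
    allocated = sum(allocated_bytes(size, block_size) for size in file_sizes)
    return data, allocated


many_small = [4 * 1024] * 10_000   # 10,000 files of 4 KB each
one_large = [4 * 1024 * 10_000]    # the same data stored in a single file

for label, sizes in (("10,000 small files", many_small),
                     ("one large file", one_large)):
    data, allocated = summarize(sizes)
    print(f"{label}: {data} bytes of data, {allocated} bytes allocated")
```

Under these assumptions, each 4 KB file is charged a full 128 KB, so the small-file layout uses 32 times as much space as the data it holds, while the single large file uses only slightly more than its own size.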

See CISL best practices for more details and to learn how to make the best use of your computing and storage allocations.

May 17, 2013

The /glade/p file system was returned to service and remounted on the Yellowstone nodes at 8:30 p.m. May 16. Pending jobs resumed running and most completed overnight.

During the afternoon, while /glade/p was down, users may have noticed jobs switching repeatedly between the RUN and PEND states. This was due to LSF proactively working to keep jobs off nodes that did not have the /glade/p file system mounted. All such jobs should have run successfully once /glade/p was remounted. We will be looking at ways to allow jobs to run safely with only some of the GLADE spaces mounted.

May 13, 2013

To temporarily mitigate the failure or slowdown of large jobs distributed over a large part of the Yellowstone network fabric, the capability queue will be restricted to about 1500 nodes defined over a limited part of the network fabric.

We hope that jobs submitted to the capability queue that require fewer than 1500 nodes will run fine, but jobs that need more than 1500 nodes will not be scheduled. Once we resolve the fabric issue, this restriction will be removed. We sincerely apologize for any inconvenience caused by this measure.

May 10, 2013

No Scheduled Downtime: Yellowstone, HPSS, Geyser_Caldera, GLADE, Lynx

May 7, 2013

Videos and slide presentations from talks at the recent SEA Software Engineering Conference 2013 are now available at https://sea.ucar.edu/conference/2013. The conference included 30-minute presentations on many topics of interest to Yellowstone users, including several on scalable HPC profilers and tools such as Eclipse, Scalasca, TAU, and the Score-P run-time measurement system. (If no video is visible on a page you select, change “https” to “http” in the URL.)

Extra handouts from the one- and two-day conference tutorials are available to walk-ins through May 3 in Mesa Lab Suite 55. Copies will be sent via black bag to Center Green and Foothills campuses on request to cislhelp@ucar.edu.

April 29, 2013

Due to an unintended change within the administrative functions of SAM, the system default shell was changed from tcsh to bash around 2 p.m. on Tuesday. Users who rely on the system default shell on Yellowstone may have noticed the change.

We have restored the default shell to tcsh, and that change has now propagated to Yellowstone. We have taken steps to prevent such a change from being made unintentionally in the future. We apologize for the inconvenience.

April 25, 2013

For the past two days, Yellowstone utilization has been hovering around 50%. The system is ready and waiting for user jobs.

If you are encountering any issues with getting started on Yellowstone or having Yellowstone complete your jobs, please contact cislhelp@ucar.edu.

April 25, 2013

No Scheduled Downtime: Yellowstone, Geyser_Caldera, HPSS, GLADE, Lynx

April 23, 2013

NCAR's Mesa Lab data center will be undergoing significant electrical work during the semi-annual power-down scheduled for Saturday, April 20, from 6 a.m. to 6 p.m. While NWSC will not be directly affected, users will notice several effects of this maintenance period:

* During this outage, HPSS will be taken out of service to eliminate the possibility of errors or confusion while the Mesa Lab tape libraries are powered down.

* Some YubiKeys and CryptoCARDs will not work, including many of those belonging to UCAR staff, while the NCAR RADIUS server at Mesa Lab is out of service. The RADIUS server is slated to be returned to service as early as possible after the maintenance work has been completed.

* CISL's GridFTP service (and GlobusOnline) will be unavailable from 5 p.m. April 19 through 12 p.m. April 21. The GridFTP service is scheduled to be migrated to NWSC, but not until after the power-down.

* CISL's license server, which enables software including Matlab, IDL, and Fluent to be used on Yellowstone, Geyser, and Caldera, is also in the process of being migrated to NWSC. Licensed software may therefore be unusable from 5 p.m. April 19 through 12 p.m. April 21.

* Finally, CISL user support relies on a number of services that will also be down, including the ExtraView ticket system, Notifier, email, and the CISL web site. The CISL Help Desk phone line, 303-497-2400, will still be available for urgent situations, but note that Saturday is outside of normal business hours.

Aside from these Mesa Lab-housed services, Yellowstone, Geyser, Caldera, and GLADE will continue to operate normally during the power-down. Users are encouraged to submit jobs to be run over the April 20 weekend, though users may need to work around the HPSS outage.