Daily Bulletin Archive

May 29, 2013

CISL staff rebooted the main Ethernet switch for the Yellowstone management network yesterday morning, May 28, at 8 a.m. MT. The reboot completed at approximately 9:20 a.m., restoring traffic to a more normal state, though CISL is continuing to follow up with Juniper.

CISL has confirmed that compilation and license access have been restored on all Yellowstone login, Geyser, and Caldera nodes.

Users continuing to experience problems with licensed software should contact cislhelp@ucar.edu.

May 28, 2013

We have a workaround in place in our license setup. Compilation now works on login nodes yslogin3 through yslogin6 and on Geyser and Caldera, though at a slower speed than on yslogin1 and yslogin2.

As the final fix for the compilation issue, we will reboot the main Ethernet switch for the Yellowstone management network on Tuesday morning, May 28, at 8 a.m. MT.

The reboot should not affect any running jobs or any ongoing user sessions with licenses already checked out. However, during the reboot, users will not be able to check out new licenses or start new compile tasks. Since a workaround is available, we are waiting until after the holiday weekend to ensure CISL staff can be on hand to monitor the system during and after the reboot.

May 24, 2013

The UCAR Software Engineering Assembly (SEA) and CISL Consulting Services Group (CSG) are offering High Performance Computing and Software Carpentry workshops from Tuesday, May 21, through Friday, May 24, to help participants acquire essential knowledge and skills for working with supercomputers. The workshops are being presented by CSG members and by Alex Viana and Ted Hart of Software Carpentry. See CSG Summer Training for details.

May 23, 2013

The CISL Resource Status web page now shows near real-time activity in the Yellowstone environment’s job queues to help users identify opportunities to submit jobs and determine which queue to use when submitting jobs. Updated every three minutes, it displays the number of running and pending jobs for each queue, the number of nodes being used, and the number of active users. Also see Queues and charges for help with queue selection.

May 23, 2013

Storing large files in your GLADE file spaces is more efficient than storing numerous small files. This is because the system allocates a minimum amount of space for each file, no matter how small. On /glade/scratch, for example, the smallest amount of space the system can allocate to a file is 128 KB. Any files smaller than 128 KB are still allocated 128 KB, so they require more space than you might expect.
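The arithmetic can be sketched as follows. This is an illustration only, using the 128 KB /glade/scratch figure cited above; actual allocation behavior depends on the file system's configuration.

```shell
block=$((128 * 1024))                       # 128 KB minimum allocation on /glade/scratch

# Space actually reserved for a file of a given size: whole blocks, minimum one.
allocated() {
    size=$1
    blocks=$(( (size + block - 1) / block ))
    [ "$blocks" -lt 1 ] && blocks=1         # even an empty file consumes one block
    echo $(( blocks * block ))
}

allocated 1024      # a 1 KB file still occupies 131072 bytes (128 KB)
allocated 131073    # just over one block rounds up to 262144 bytes (256 KB)
```

At this rate, 10,000 one-kilobyte files would reserve roughly 1.3 GB of space for about 10 MB of data, which is why consolidating many small files into fewer large ones is worthwhile.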

See CISL best practices for more details and to learn how to make the best use of your computing and storage allocations.

May 17, 2013

The /glade/p file system was returned to service and remounted on the Yellowstone nodes at 8:30 p.m. May 16. Pending jobs resumed running and most completed overnight.

During the afternoon, while /glade/p was down, users may have noticed jobs switching repeatedly between the RUN and PEND states. This was LSF proactively keeping jobs off nodes that did not have the /glade/p file system mounted. All such jobs should have run successfully once /glade/p was remounted. We are looking at ways to allow jobs to run safely with only some of the GLADE spaces mounted.

May 13, 2013

To temporarily mitigate failures and slowdowns of large jobs distributed across much of the Yellowstone network fabric, the capability queue will be restricted to about 1,500 nodes on a limited portion of the fabric.

Jobs submitted to the capability queue that require fewer than 1,500 nodes should run normally, but jobs that need more than 1,500 nodes will not be scheduled. Once we resolve the fabric issue, this restriction will be removed. We sincerely apologize for any inconvenience caused by this measure.
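For reference, a minimal LSF batch-script sketch that stays within the restriction. Only the queue name comes from this bulletin; the project code, task counts, ptile value, and executable name are illustrative assumptions (Yellowstone nodes have 16 cores each).

```shell
#!/bin/bash
#BSUB -P PROJECT0001        # project code (placeholder)
#BSUB -q capability         # capability queue, temporarily capped near 1,500 nodes
#BSUB -n 16000              # total tasks: 16 per node x 1,000 nodes
#BSUB -R "span[ptile=16]"   # 16 tasks per node, so 1,000 nodes -- under the cap
#BSUB -W 2:00               # wall-clock limit (hh:mm)
#BSUB -J large_run          # job name

mpirun.lsf ./my_model       # launch the executable (illustrative)
```

A request implying more than about 1,500 nodes (for example, -n 32000 with ptile=16) would remain pending until the restriction is lifted.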

May 10, 2013

No Scheduled Downtime: Yellowstone, HPSS, Geyser_Caldera, GLADE, Lynx

May 7, 2013

Videos and slide presentations from talks at the recent SEA Software Engineering Conference 2013 are now available at https://sea.ucar.edu/conference/2013. The conference included 30-minute presentations on many topics of interest to Yellowstone users, including several on scalable HPC profilers and tools such as Eclipse, Scalasca, TAU, and the Score-P run-time measurement system. (If no video is visible on a page you select, change “https” to “http” in the URL.)

Extra handouts from the one- and two-day conference tutorials are available to walk-ins through May 3 in Mesa Lab Suite 55. Copies will be sent via black bag to Center Green and Foothills campuses on request to cislhelp@ucar.edu.

April 29, 2013

Due to an unintended change within the administrative functions of SAM, the system default shell was changed from tcsh to bash around 2 p.m. on Tuesday. Yellowstone users who were relying on the system default shell may have noticed the change.

We have restored the default shell to tcsh, and that change has propagated out to Yellowstone. We have taken steps to prevent such a change from being made unintentionally in the future. We apologize for the inconvenience.
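To check which shell you are getting, the following standard Linux commands work on most login nodes. This is a generic sketch, not a CISL-specific procedure; whether the SAM-managed default is reflected in your passwd entry is an assumption.

```shell
# Shell of your current interactive session:
echo "$SHELL"

# Login shell recorded for your account in the passwd database:
getent passwd "$(id -un)" | cut -d: -f7
```

If the two disagree, the first reflects the session you are already in, while the second is what new logins will start with.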