Daily Bulletin Archive

October 26, 2018

Registration is now open for the NCAR/CISL Consulting Services Group’s 45-minute Casper user tutorial at 2:30 p.m. MST on Wednesday, November 14. The tutorial will introduce the capabilities of the new Casper system, describe how to access its features, and provide some best practices. These topics will be covered in detail:

  • The four types of Casper nodes and their features

  • Accessing Casper resources using Slurm

  • Interactive jobs and remote virtual desktops (VNC)

  • Using the GPU capabilities of Casper

Register to attend in person – in the Damon Conference Room at NCAR’s Mesa Lab in Boulder – or attend online by selecting one of these links:

October 24, 2018

A new lightweight, easy-to-use command is available for reporting Cheyenne job failures. The reportfailure command is intended for reporting failures that users suspect were caused by system issues such as node problems.

To report a suspected failure, enter reportfailure at your Cheyenne command prompt, followed by a one-line description that includes the job ID. Here are two examples:

  • reportfailure  Job 1234567  MPT Warning, timeout with communication to r3i7n21

  • reportfailure  MPT: shepherd terminated: r2i2n21.ib0.cheyenne.ucar.edu   JobID: 2705231

The command is not intended to be a substitute for opening ExtraView tickets, and users will not receive a response. Rather, using reportfailure will help CISL identify and respond to potential system issues more quickly. If you have questions about this, contact CISL at 303-497-2400 or cislhelp@ucar.edu.

October 23, 2018

GLADE users who have not already done so need to copy their files out of the old project and work spaces soon, following several recent and important changes to GLADE’s project and work spaces.

As announced previously, the GLADE project and work spaces in /glade/p_old/ are now read-only, and users can no longer move or delete files in those spaces. The spaces will be decommissioned on December 31, and all files remaining in them will be permanently removed from the system. Users who still have files in /glade/p_old/ or /glade/p_old/work should copy them to one of the new storage systems as soon as possible.

Project spaces

CISL recommends moving active project data to /glade/p/<entity>/<project_code>, where <entity> can be univ, uwyo, cesm, mmm, nsc, or another designated NCAR lab or special program. A one-year purge policy will be enforced on files in those new spaces, meaning files that are not accessed for more than one year will be deleted.
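
As a rough illustration of the one-year rule, the following shell sketch lists files that have not been accessed for more than a given number of days. It assumes GNU find and uses 365 days as a stand-in for one year. The function name list_purge_candidates is hypothetical, and CISL's actual purge process may determine access age differently.

```shell
# Sketch only: list files whose last access is older than a threshold.
# Assumptions: GNU find is available; 365 days approximates "one year".
list_purge_candidates() {
    # $1: directory to scan
    # $2: age threshold in days (defaults to 365)
    # -atime +N matches files last accessed more than N whole days ago.
    find "$1" -type f -atime "+${2:-365}" -print
}

# Example call (replace the placeholders with your own values):
#   list_purge_candidates /glade/p/<entity>/<project_code>
```

Access times reported by a file system depend on how it is configured, so treat the output as a hint about which files to review rather than a definitive purge list.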

Project data that are not active but need to be preserved should be moved to the Campaign Storage archive. Users access and manage their Campaign Storage files with Globus services. A five-year purge policy will be enforced on Campaign Storage effective from the date files are created in that archive.

Work spaces

All users now have individual directories in /glade/work with 1-TB quotas. Files in those directories are not purged. Users should copy any files they need from their /glade/p_old/work/ directories to their new /glade/work directories before December 31.
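
One way to carry out such a copy is sketched below. It assumes GNU cp and assumes that per-user directories are named after the login name, which the announcement does not state. The function name copy_work_files is hypothetical. For very large directory trees another transfer method may be more appropriate.

```shell
# Sketch only: copy the contents of an old work directory into a new one.
# Assumption: GNU cp, where -a preserves timestamps, permissions, and links.
copy_work_files() {
    # $1: source directory (old work space, read-only is fine)
    # $2: destination directory (created if it does not exist)
    # The trailing "/." copies the directory contents, including dotfiles.
    mkdir -p "$2" && cp -a "$1"/. "$2"/
}

# Example call, assuming directories are named after your login name:
#   copy_work_files "/glade/p_old/work/$USER" "/glade/work/$USER"
# Optional check that both trees match after the copy:
#   diff -r "/glade/p_old/work/$USER" "/glade/work/$USER"
```

Because the old spaces are read-only, the sketch only copies. Nothing needs to be deleted from the source.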

These and other updates to storage systems have been published in table format here.

Contact cislhelp@ucar.edu with questions or for help copying files.

October 15, 2018

A recording of the October 11 NCAR/CISL tutorial, “Using Casper for Data Analysis and Visualization,” is now available on the CISL web site. See the course page to review the presentation and download the slides. The presentation introduces the capabilities of the new Casper system, describes how to access its features, and provides some best practices.

Topics covered include:

  • The three types of Casper nodes and their features

  • Accessing Casper resources using Slurm

  • Using X11 and VNC for visualization

  • Running code on the Casper GPUs

October 12, 2018

Intel software engineers will conduct a training class titled “Intel developer tools training for research computing” on Thursday, October 18, from 10 a.m. to approximately 4 p.m. MDT. The class is open to all UCAR and NCAR employees and will be held at the University of Colorado’s Boulder East Campus at 3100 Marine St., Rooms 646a/b. Registration is not required, and the class will not be broadcast or recorded.

The announced agenda is:

10:00    Intel Distribution for Python

11:15    Intel VTune and Analysis Tools:

  • Intel Inspector

  • Intel Roofline Analysis

  • Intel Advisor

  • Intel Platform Profiler

12:00    Lunch

 1:00    Intel VTune and Analysis Tools (continued)

 2:30    Intel Performance Libraries:

  • Intel Math Kernel Library (MKL)

  • Intel Threading Building Blocks (TBB)

  • Intel Data Analytics Acceleration Library (DAAL)

  • Intel Performance Primitives (IPP) – Overview

 3:30    Intel OpenVINO Overview and What’s New in 2019

 4:00    Q&A

October 12, 2018

Early-career women working in climate science are encouraged to apply by November 18 to attend the Women in Mathematics and Public Policy workshop January 22-25, 2019, in Los Angeles. The workshop is designed to bring together women in mathematics, science, engineering, and policy to work on pressing research topics in the fields of cybersecurity and climate change. In addition to networking and talks by keynote speakers, the workshop offers opportunities to work on research projects in small groups.

While participation in the group projects is by invitation only, the keynote lectures by Lucy Jones (Caltech) and Kristin Lauter (Microsoft Research) will be open to the public. For more information, see Women in Mathematics and Public Policy.

October 9, 2018

A semi-annual Mesa Lab building maintenance power-down scheduled for Saturday, October 13, should have little impact on university users of CISL’s high-end resources. Some Boulder-based UCAR/NCAR staff will be unable to log in to the Cheyenne system or other services with their authentication tokens, but sessions that start before the power-down will not be affected.

The power-down should otherwise not affect the Cheyenne, Casper, Geyser, and Caldera clusters, the GLADE system, or HPSS, which will remain in service at the NCAR-Wyoming Supercomputing Center (NWSC) in Cheyenne. The maintenance work is scheduled to begin at 6 a.m. and conclude by 6 p.m.

Some HPC support services and the HPSS disaster recovery resources that are housed at the Mesa Lab will be unavailable during the power-down. The affected services include the license servers for Mathematica and the PGI compilers, the CISL website, the ExtraView help desk ticketing system, and the SAM accounting system. The license server supporting MATLAB users on Cheyenne will not be affected.

Users who have urgent help requests during this time should call 303-497-2400 or 307-996-4300 to reach the NWSC operations center.

October 4, 2018

NCAR’s new data analysis and visualization cluster, Casper, was released to the user community on Wednesday, October 3. See the Casper home page for documentation, which includes guidance for Geyser and Caldera users on preparing to transition to running jobs on Casper nodes.

An introductory Casper training workshop is scheduled for 9 a.m. Thursday, October 11. Get more information and register here.

Casper has 24 nodes featuring Intel’s new Skylake processors. Four of the system’s nodes feature large-memory, dense GPU configurations to support machine learning and deep learning in atmospheric and related sciences.

October 2, 2018

Registration is now open for the NCAR/CISL Consulting Services Group’s 45-minute Casper user tutorial at 9 a.m. MDT on Thursday, October 11. “Using Casper for Data Analysis and Visualization” will introduce the capabilities of the new Casper system, describe how to access its features, and provide some best practices. These topics will be covered in detail:

  • The three types of Casper nodes and their features

  • Accessing Casper resources using Slurm

  • Using X11 and VNC for visualization

  • Running code on the Casper GPUs

Register to attend in person – in the Damon Conference Room at NCAR’s Mesa Lab in Boulder – or attend online by selecting one of these links:

October 2, 2018

The Cheyenne, Geyser, and Caldera clusters and the GLADE file system will be unavailable on Tuesday, October 2, starting at approximately 7 a.m. MDT to allow CISL staff to perform system maintenance on important hardware and software components. The downtime is expected to last until approximately 6 p.m., but every effort will be made to return the systems to service as soon as possible. The planned updates include the previously announced changes to GLADE file spaces and repairs to damaged InfiniBand switches.

A system reservation will prevent batch jobs from executing after 7 a.m. All batch queues will be suspended and the clusters’ login nodes will be unavailable throughout the update period. All batch jobs and interactive processes that are still executing when the outage begins will be killed.

CISL will inform users through the Notifier service when all of the systems are restored.

 
