Daily Bulletin Archive

October 12, 2018

Intel software engineers will conduct a training class titled “Intel developer tools training for research computing” on Thursday, October 18, from 10 a.m. to approximately 4 p.m. MDT. The class is open to all UCAR and NCAR employees and will be held at the University of Colorado’s Boulder East Campus at 3100 Marine St., Rooms 646a/b. Registration is not required, and the class will not be broadcast or recorded.

The announced agenda is:

10:00    Intel Distribution for Python

11:15    Intel VTune and Analysis Tools:

  • Intel Inspector

  • Intel Roofline Analysis

  • Intel Advisor

  • Intel Platform Profiler

12:00    Lunch

 1:00    Intel VTune and Analysis Tools (continued)

 2:30    Intel Performance Libraries:

  • Intel Math Kernel Library (MKL)

  • Intel Threading Building Blocks (TBB)

  • Intel Data Analytics Acceleration Library (DAAL)

  • Intel Performance Primitives (IPP) – Overview

 3:30    Intel OpenVINO Overview and What’s New in 2019

 4:00    Q&A

October 12, 2018

Early-career women working in climate science are encouraged to apply by November 18 to attend the Women in Mathematics and Public Policy workshop January 22-25, 2019, in Los Angeles. The workshop is designed to bring together women in mathematics, science, engineering, and policy to work on pressing research topics in the fields of cybersecurity and climate change. In addition to networking and talks by keynote speakers, the workshop offers opportunities to work on research projects in small groups.

While participation in the group projects is by invitation only, the keynote lectures by Lucy Jones (Caltech) and Kristin Lauter (Microsoft Research) will be open to the public. For more information, see Women in Mathematics and Public Policy.

October 9, 2018

A semi-annual Mesa Lab building maintenance power-down scheduled for Saturday, October 13, should have little impact on university users of CISL’s high-end resources. Some Boulder-based UCAR/NCAR staff will be unable to log in to the Cheyenne system or other services with their authentication tokens, but sessions that start before the power-down will not be affected.

The power-down should otherwise not affect the Cheyenne, Casper, Geyser, and Caldera clusters, the GLADE system, or HPSS, which will remain in service at the NCAR-Wyoming Supercomputing Center (NWSC) in Cheyenne. The maintenance work is scheduled to begin at 6 a.m. and conclude by 6 p.m.

Some HPC support services and the HPSS disaster recovery resources that are housed at the Mesa Lab will be unavailable during the power-down. The affected services include the license servers for Mathematica and the PGI compilers, the CISL website, the ExtraView help desk ticketing system, and the SAM accounting system. The license server supporting MATLAB users on Cheyenne will not be affected.

Users who have urgent help requests during this time should call 303-497-2400 or 307-996-4300 to reach the NWSC operations center.

October 4, 2018

NCAR’s new data analysis and visualization cluster, Casper, was released to the user community on Wednesday, October 3. See the Casper home page for documentation, which includes guidance for Geyser and Caldera users on preparing to transition to running jobs on Casper nodes.

An introductory Casper training workshop is scheduled for 9 a.m. Thursday, October 11. Get more information and register here.

Casper has 24 nodes featuring Intel’s new Skylake processors. Four of the system’s nodes feature large-memory, dense GPU configurations to support machine learning and deep learning in atmospheric and related sciences.

October 2, 2018

Registration is now open for the NCAR/CISL Consulting Services Group’s 45-minute Casper user tutorial at 9 a.m. MDT on Thursday, October 11. “Using Casper for Data Analysis and Visualization” will introduce the capabilities of the new Casper system, describe how to access its features, and provide some best practices. These topics will be covered in detail:

  • The three types of Casper nodes and their features

  • Accessing Casper resources using Slurm

  • Using X11 and VNC for visualization

  • Running code on the Casper GPUs

Register to attend in person—in the Damon Conference Room at NCAR’s Mesa Lab in Boulder—or attend online by selecting one of these links:

October 2, 2018

The Cheyenne, Geyser, and Caldera clusters and the GLADE file system will be unavailable on Tuesday, October 2, starting at approximately 7 a.m. MDT to allow CISL staff to perform system maintenance on important hardware and software components. The downtime is expected to last until approximately 6 p.m., but every effort will be made to return the systems to service as soon as possible. The planned updates include the previously announced changes to GLADE file spaces and repairs to damaged InfiniBand switches.

A system reservation will prevent batch jobs from executing after 7 a.m. All batch queues will be suspended and the clusters’ login nodes will be unavailable throughout the update period. All batch jobs and interactive processes that are still executing when the outage begins will be killed.
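As a rough illustration of how a reservation can keep jobs from executing during a maintenance window, the sketch below checks whether a job's requested wall-clock limit would end before the reservation begins. The rule itself is an assumption about typical batch scheduler behavior, not a description of Cheyenne's actual configuration, and the name can_start is hypothetical.

```python
from datetime import datetime, timedelta

# Start of the maintenance reservation announced above (7 a.m. MDT, October 2).
RESERVATION_START = datetime(2018, 10, 2, 7, 0)


def can_start(now: datetime, walltime: timedelta,
              reservation_start: datetime = RESERVATION_START) -> bool:
    """Assumed rule: a job may start only if its requested wall-clock
    limit ends no later than the start of the reservation."""
    return now + walltime <= reservation_start


# A 2-hour job considered at 4 a.m. would finish before the reservation.
short_job = can_start(datetime(2018, 10, 2, 4, 0), timedelta(hours=2))

# A 12-hour job considered at the same time would overlap the reservation.
long_job = can_start(datetime(2018, 10, 2, 4, 0), timedelta(hours=12))

print(short_job, long_job)
```

Under this assumed rule, requesting a shorter wall-clock limit is what would let a job fit in ahead of the outage.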

CISL will inform users through the Notifier service when all of the systems are restored.

 

September 27, 2018

HPSS downtime: Tuesday, September 25th, 10:30 - 14:30 MDT 

No downtime: Cheyenne, GLADE, Geyser, Caldera

September 26, 2018

NCAR’s new data analysis and visualization cluster, Casper, will be released to all users on Wednesday, October 3. Casper has 24 nodes featuring Intel’s new Skylake processors. Four of the system’s nodes feature large-memory, dense GPU configurations to support machine learning and deep learning in atmospheric and related sciences. Users who were granted early access to Casper to test their applications and workflows have provided very positive feedback.

An introductory Casper training workshop is being scheduled for 9 a.m. on Thursday, October 11. Watch for more details in the Daily Bulletin later this week.

The Geyser and Caldera clusters will remain available until the end of 2018, when they will be decommissioned.

 

September 25, 2018

CISL has implemented and released a new version of the vncserver_submit script for launching VNC sessions on the data analysis and visualization clusters. The new version should improve the overall user experience on the Geyser and Caldera systems and will be compatible with the new Casper system when it launches.

The biggest improvement is that it will be easier to produce new one-time-password codes when returning to existing VNC sessions. You can read more about running the new version of the script in this updated documentation: Starting TurboVNC for Visualization Applications.

September 25, 2018

Many NCAR users are reporting long queue wait times for batch jobs. Notably, jobs requesting larger numbers of nodes are seeing significantly longer-than-expected wait times regardless of the requested queue or wall-clock time. Contrary to a number of reported concerns, CISL has not made any adjustments to Cheyenne’s job prioritization scheme or fairshare policy.

Cheyenne system utilization has climbed sharply from July’s daily average of about 60% to an average daily utilization of 96% in late August and early September. NCAR usage increased from 18.9 million core-hours in July to 31.9 million core-hours in August, while university and CSL groups also reached some of their highest monthly usage levels. During this period of high demand, NCAR has been hitting its targeted percentage of the system’s delivered core-hours. Given these circumstances, the scheduler’s fairshare algorithm is functioning as designed and expected, with university, CSL, and Wyoming jobs being given higher priority than NCAR jobs.

We do recognize that ongoing hardware issues are causing longer job run times and some job failures, which exacerbate the backlog of queued jobs and wait times. Cheyenne continues to operate with several damaged InfiniBand switches, and replacement switches are to be installed during the maintenance downtime scheduled for Tuesday, October 2.
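To put the usage figures quoted above in relative terms, the following short sketch computes the percentage growth in NCAR core-hours and in system utilization. The input numbers come from this bulletin; the helper name relative_increase is only illustrative.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Figures quoted in the bulletin.
ncar_july = 18.9      # million core-hours
ncar_august = 31.9    # million core-hours
util_july = 60.0      # percent, approximate daily average
util_late = 96.0      # percent, late August and early September

ncar_growth = relative_increase(ncar_july, ncar_august)
util_growth = relative_increase(util_july, util_late)

print(f"NCAR core-hours: +{ncar_growth:.1f}%")
print(f"Utilization: +{util_late - util_july:.0f} points "
      f"(+{util_growth:.0f}% relative)")
```

By this calculation, NCAR usage grew by roughly 69% from July to August, while utilization rose by 36 percentage points.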

 
