Daily Bulletin Archive

October 9, 2018

A semi-annual Mesa Lab building maintenance power-down scheduled for Saturday, October 13, should have little impact on university users of CISL’s high-end resources. Some Boulder-based UCAR/NCAR staff will be unable to log in to the Cheyenne system or other services with their authentication tokens, but sessions that start before the power-down will not be affected.

The power-down should otherwise not affect the Cheyenne, Casper, Geyser, and Caldera clusters, the GLADE system, or HPSS, which will remain in service at the NCAR-Wyoming Supercomputing Center (NWSC) in Cheyenne. The maintenance work is scheduled to begin at 6 a.m. and conclude by 6 p.m.

Some HPC support services and the HPSS disaster recovery resources that are housed at the Mesa Lab will be unavailable during the power-down. The affected services include the license servers for Mathematica and the PGI compilers, the CISL website, the ExtraView help desk ticketing system, and the SAM accounting system. The license server supporting MATLAB users on Cheyenne will not be affected.

Users who have urgent help requests during this time should call 303-497-2400 or 307-996-4300 to reach the NWSC operations center.

October 4, 2018

NCAR’s new data analysis and visualization cluster, Casper, was released to the user community on Wednesday, October 3. See the Casper home page for documentation, which includes guidance for Geyser and Caldera users on preparing to transition to running jobs on Casper nodes.

An introductory Casper training workshop is scheduled for 9 a.m. Thursday, October 11. Get more information and register here.

Casper has 24 nodes featuring Intel’s new Skylake processors. Four of the system’s nodes feature large-memory, dense GPU configurations to support machine learning and deep learning in atmospheric and related sciences.

October 2, 2018

Registration is now open for the NCAR/CISL Consulting Services Group’s 45-minute Casper user tutorial at 9 a.m. MDT on Thursday, October 11. “Using Casper for Data Analysis and Visualization” will introduce the capabilities of the new Casper system, describe how to access its features, and provide some best practices. These topics will be covered in detail:

  • The three types of Casper nodes and their features

  • Accessing Casper resources using Slurm

  • Using X11 and VNC for visualization

  • Running code on the Casper GPUs

Register to attend in person (in the Damon Conference Room at NCAR’s Mesa Lab in Boulder) or online.

October 2, 2018

The Cheyenne, Geyser, and Caldera clusters and the GLADE file system will be unavailable on Tuesday, October 2, starting at approximately 7 a.m. MDT to allow CISL staff to perform system maintenance on important hardware and software components. The downtime is expected to last until approximately 6 p.m., but every effort will be made to return the systems to service as soon as possible. The planned updates include the previously announced changes to GLADE file spaces and repairs to damaged InfiniBand switches.

A system reservation will prevent batch jobs from executing after 7 a.m. All batch queues will be suspended and the clusters’ login nodes will be unavailable throughout the update period. All batch jobs and interactive processes that are still executing when the outage begins will be killed.

CISL will inform users through the Notifier service when all of the systems are restored.

September 27, 2018

HPSS downtime: Tuesday, September 25th, 10:30 - 14:30 MDT 

No downtime: Cheyenne, GLADE, Geyser_Caldera

September 26, 2018

NCAR’s new data analysis and visualization cluster, Casper, will be released to all users on Wednesday, October 3. Casper has 24 nodes featuring Intel’s new Skylake processors. Four of the system’s nodes feature large-memory, dense GPU configurations to support machine learning and deep learning in atmospheric and related sciences. Users who were granted early access to Casper to test their applications and workflows have provided very positive feedback.

An introductory Casper training workshop is being scheduled for 9 a.m. on Thursday, October 11. Watch for more details in the Daily Bulletin later this week.

The Geyser and Caldera clusters will remain available until the end of 2018, when they will be decommissioned.

September 25, 2018

CISL has implemented and released a new version of the vncserver_submit script for launching VNC sessions on the data analysis and visualization clusters. The new version should improve the overall user experience on the Geyser and Caldera systems and will be compatible with the new Casper system when it launches.

The biggest improvement is that it will be easier to produce new one-time-password codes when returning to existing VNC sessions. You can read more about running the new version of the script in this updated documentation: Starting TurboVNC for Visualization Applications.

September 25, 2018

Many NCAR users are reporting long queue wait times for batch jobs. Notably, jobs requesting larger numbers of nodes are seeing significantly longer-than-expected wait times, regardless of the requested queue or wall-clock time. Contrary to a number of reported concerns, CISL has not made any adjustments to Cheyenne’s job prioritization scheme or fairshare policy.

Cheyenne system utilization has climbed sharply, from a daily average of about 60% in July to 96% in late August and early September. NCAR usage increased from 18.9 million core-hours in July to 31.9 million core-hours in August, while university and CSL groups also reached some of their highest monthly usage levels. During this period of high demand, NCAR has been hitting its targeted percentage of the system’s delivered core-hours. Under these circumstances, the scheduler’s fairshare algorithm is functioning as designed and expected, with university, CSL, and Wyoming jobs being given higher priority than NCAR jobs.

We recognize that ongoing hardware issues are causing longer job run times and some job failures, which exacerbate the backlog of queued jobs and wait times. Cheyenne continues to operate with several damaged InfiniBand switches; replacement switches are scheduled to be installed during the maintenance downtime on Tuesday, October 2.

September 21, 2018

HPSS downtime: Tuesday, Sep. 25th from 10:30 to 14:30 MDT

No downtime: Cheyenne, GLADE, Geyser_Caldera

September 20, 2018

Major changes to the GLADE file spaces will be executed on Tuesday, October 2, as announced previously in the Daily Bulletin. The changes include:

  • /glade/scratch_old will be removed

  • /glade/p_old will become read-only

  • /glade/p_old/work will become read-only

Data remaining in the space being removed (decommissioned) will be deleted with no backups. Users should copy all valuable data from all of these old file spaces to their new spaces as soon as possible.
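As one possible way to carry out the copy described above, the following is a minimal Python sketch rather than a CISL-provided tool. The function name `copy_tree_preserving` is hypothetical, and the source and destination directories are placeholders that each user would need to replace with their own old and new GLADE paths. It assumes Python 3.8 or later (for `dirs_exist_ok`).

```python
import shutil
from pathlib import Path


def copy_tree_preserving(src: str, dst: str) -> int:
    """Copy everything under src into dst; return the number of files found under src.

    Files already in dst with the same relative path are overwritten.
    Hypothetical helper for illustration only, not a CISL utility.
    """
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise NotADirectoryError(f"source directory not found: {src_path}")
    # copy2 keeps timestamps and permission bits; dirs_exist_ok needs Python 3.8+
    shutil.copytree(src_path, dst_path, copy_function=shutil.copy2, dirs_exist_ok=True)
    return sum(1 for p in src_path.rglob("*") if p.is_file())
```

A call would take the form `copy_tree_preserving(old_dir, new_dir)`, where `old_dir` is a directory under one of the old file spaces listed above and `new_dir` is the user's directory in the corresponding new space. Comparing the returned count against a file count in the destination is a simple sanity check before the old space is removed.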
