Daily Bulletin Archive

April 27, 2020

Scheduled downtimes (tentative): Cheyenne login nodes at 8 p.m. April 29; brief outage for Cheyenne service nodes at noon April 30. More details will be provided in the CISL Daily Bulletin before those events.

April 24, 2020

Video and slides from the April 22 tutorial about using NCAR data storage resources are now available here in the CISL training library. The presentation by Mick Coady of the CISL Consulting Services Group covered these topics: GLADE file spaces, Campaign Storage, HPSS file migration, data collections, and support for managing data.

April 23, 2020

Registration is open until May 1 for a two-day MPI workshop presented by XSEDE and the Pittsburgh Supercomputing Center. The online workshop on May 5 and 6 is intended to give C and Fortran programmers a hands-on introduction to MPI programming.

Both days are compact to accommodate multiple time zones, but packed with useful information and lab exercises. Attendees will leave with a working knowledge of how to write scalable codes using MPI, the standard programming tool of scalable parallel computing.

The schedule and registration link are available here.

April 22, 2020

Cheyenne user documentation on PBS job scripts has been updated to reflect a new recommendation from CISL staff for how to write standard output and error messages.

The recommendation is to add the directive #PBS -k eod to job scripts to reduce the risk of job failures that kill Cheyenne nodes. With this directive, a job writes its output directly to the final destination directory as it progresses rather than only when the job ends. This prevents the node from being killed if the user’s application writes too much data, which is a common cause of Cheyenne node failures. In addition, if a node dies at the end of the job, the directive increases the likelihood that error messages will be preserved.

A previous version of this announcement recommended replacing the #PBS -j oe directive – which combines stdout and stderr into one file – with #PBS -k eod. In fact, both directives may be used in the same job script to simultaneously join output and error logs and write logs directly to the destination directory. Please use #PBS -k eod in all job scripts, whether or not they include #PBS -j oe.
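
As one way to audit existing job scripts against this recommendation, the following Python sketch reports whether a script's text contains the #PBS -k eod and #PBS -j oe directives. The function names and the sample script are hypothetical, the check is a plain line match that only recognizes directives written one per line exactly as shown above, and it is not a CISL tool.

```python
def pbs_directives(script_text):
    """Return the set of '#PBS' directive lines found in a job script.

    Whitespace is normalized, so '#PBS   -k  eod' is reported as '#PBS -k eod'.
    """
    found = set()
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#PBS"):
            found.add(" ".join(stripped.split()))
    return found


def check_output_directives(script_text):
    """Report whether the two directives named in the bulletin are present."""
    directives = pbs_directives(script_text)
    return {
        "keeps_output_direct": "#PBS -k eod" in directives,
        "joins_stdout_stderr": "#PBS -j oe" in directives,
    }


# Hypothetical job script used only for illustration.
SAMPLE_SCRIPT = """#!/bin/bash
#PBS -j oe
#PBS -k eod
./my_application
"""

if __name__ == "__main__":
    print(check_output_directives(SAMPLE_SCRIPT))
```

A script that lacks #PBS -k eod yields keeps_output_direct set to False, which flags it for an update regardless of whether it also uses #PBS -j oe.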

April 21, 2020

A limit of 36 concurrently running jobs per user is now in effect on the Casper cluster. Previously, there was no limit on how many jobs a user could run concurrently. The limit was implemented to address the problem of users submitting large numbers of jobs – sometimes numbering in the thousands – and making the system unusable for other users for many hours. Jobs submitted in excess of the 36-job limit will be put into a “Pending” state by the Slurm workload manager. As running jobs complete, pending jobs will be released for execution in the order in which they were submitted.

CISL will continue to carefully monitor Casper’s workload and may make adjustments to the job limit as necessary.
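
The queueing behavior described in this announcement can be illustrated with a toy model in Python. This is a sketch of the stated policy only (a fixed cap on one user's running jobs, with pending jobs released in submission order), not Slurm's scheduling logic. The class and method names are hypothetical, and the model ignores priorities, resource availability, and other users.

```python
from collections import deque


class PerUserJobLimit:
    """Toy model of a per-user cap on concurrently running jobs."""

    def __init__(self, max_running=36):
        self.max_running = max_running
        self.running = set()
        self.pending = deque()  # first-in, first-out by submission order

    def submit(self, job_id):
        """Start the job if under the cap; otherwise hold it as pending."""
        if len(self.running) < self.max_running:
            self.running.add(job_id)
            return "Running"
        self.pending.append(job_id)
        return "Pending"

    def complete(self, job_id):
        """Finish a running job and release the oldest pending job, if any.

        Returns the ID of the released job, or None if nothing was pending.
        """
        self.running.remove(job_id)
        if self.pending:
            released = self.pending.popleft()
            self.running.add(released)
            return released
        return None


if __name__ == "__main__":
    queue = PerUserJobLimit()
    states = [queue.submit(n) for n in range(40)]
    print(states.count("Running"), "running,", states.count("Pending"), "pending")
    print("released after one completion:", queue.complete(0))
```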

April 20, 2020

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE.

Scheduled downtime for HPSS: April 21, 7:00 a.m. to 9:00 a.m.

April 16, 2020

Users are reminded that the High Performance Storage System (HPSS) will reach its end of life and be decommissioned in 2021. HPSS file owners and project leads were contacted individually earlier this year and instructed on how to access lists of their files. The lists are updated weekly.

For reference, the lists can be found here:

  • /glade/work/csgteam/hpssreports/current/byusers/<userID>.data.gz
  • /glade/work/csgteam/hpssreports/current/byprojects/<projectID>.data.gz
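
For a first look at one of these reports, a minimal Python sketch such as the following can print the first few lines of a gzip-compressed file. The function name preview_report is hypothetical, and the sketch assumes only that the report is gzip-compressed text. The layout of the lines is not described here, so they are printed unparsed.

```python
import gzip
from itertools import islice


def preview_report(path, max_lines=5):
    """Return up to max_lines lines from a gzip-compressed text file."""
    # errors="replace" keeps the preview going if the report contains
    # bytes that are not valid UTF-8 (an assumption about possible content).
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as report:
        return [line.rstrip("\n") for line in islice(report, max_lines)]


if __name__ == "__main__":
    # Replace <userID> with your own user ID; the path pattern is the one listed above.
    report_path = "/glade/work/csgteam/hpssreports/current/byusers/<userID>.data.gz"
    try:
        for line in preview_report(report_path):
            print(line)
    except OSError as err:
        print(f"Could not read {report_path}: {err}")
```

Reading lazily with islice avoids decompressing the whole report when only a preview is needed, which may matter for users with large HPSS holdings.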

Writing HPSS files is no longer possible, but users can perform most common metadata operations on their HPSS holdings, including deleting, renaming, and moving files. Those who have not already done so should begin deleting files that are no longer needed and moving other data to alternative storage systems.

Documentation and training are available on recommended processes for identifying and organizing HPSS holdings; copying files that need to be preserved to another storage resource; and deleting files that are no longer needed.

Please contact CISL for advice on individual workflows and storage options.

April 16, 2020

Video and slides from the April 7 tutorial about using remote desktop services for work on Casper are now available here in the CISL training library. The presentation by Sidd Ghosh of the CISL Consulting Services Group shows how to use FastX and VNC remote desktop services to work on the Casper data analysis and visualization cluster.

April 14, 2020

Nominations are open until April 30 for SIGHPC’s international program of graduate fellowships in computational and data science. The ACM SIGHPC Computational & Data Science Fellowships were created to increase the diversity of students pursuing graduate degrees in data science and computational science, including women and students from racial/ethnic backgrounds that have not traditionally participated in the computing field. The program supports students pursuing degrees at institutions anywhere in the world.

Interested faculty advisors and students can find more information on the fellowships, including a description of the online nomination process, here: Meeting Your Needs - Computational & Data Science Fellowships.

April 13, 2020

No scheduled downtime: Cheyenne, Casper, Campaign Storage, GLADE, HPSS.