Daily Bulletin Archive

Feb. 26, 2018

User sessions that consume excessive resources on the Cheyenne system’s login nodes will be killed automatically beginning Monday, February 26, to ensure an appropriate balance between user convenience and login node performance. Users whose sessions are killed will be notified by email.

Misuse of the login nodes can significantly slow response times and make it harder to use the nodes for their main purposes: submitting batch jobs, editing scripts, and running other processes that consume only modest resources. Some Cheyenne users have been running compute-intensive jobs, data processing, large file transfers, and compilations from the command line on those nodes.

Users are encouraged to compile large codes on the Cheyenne batch nodes or the Geyser or Caldera clusters, depending on where they want to run their programs. CISL provides the qcmd script for running CESM and WRF builds and other compiles as well as running compute jobs on batch nodes. Other resource-intensive work such as R and Python jobs that use large amounts of memory and/or processing power can be run efficiently in the Cheyenne “share” queue. Users can contact the Consulting Services Group for assistance.
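
For example, a large build can be handed off to a Cheyenne batch node with qcmd instead of being run on a login node. The sketch below is illustrative only, not an official CISL example: it assumes qcmd accepts PBS-style resource options followed by the command to run, it is wrapped in Python's subprocess module, and the walltime and build command are placeholders.

    import subprocess

    # Run a large compile on a Cheyenne batch node via the qcmd script
    # rather than on a login node. The resource request and build command
    # are placeholders; adjust them for your own code.
    build_cmd = [
        "qcmd",
        "-l", "walltime=00:30:00",   # assumed PBS-style walltime option
        "--",
        "make", "-j", "8",           # replace with your CESM, WRF, or other build
    ]

    result = subprocess.run(build_cmd, check=False)
    print("qcmd exited with status", result.returncode)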

Feb. 23, 2018

A job-dependency issue in PBS Pro, the workload management system used to schedule jobs on Cheyenne, sometimes allows dependent jobs to run out of sequence. This occurs when such jobs are released from hold status (H).

CISL and the vendor are working on a solution. In the meantime, CISL recommends submitting dependent jobs manually as their parent jobs finish, particularly if running them out of sequence will cause extra cleanup work or damage control. Contact the CISL Consulting Services Group with any questions or requests for assistance.
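
One way to automate that workaround is to wait for the parent job to leave the queue and only then submit the dependent job. The Python sketch below is not a CISL-provided tool; it assumes that qstat returns a nonzero exit status once a job is no longer known to PBS, and the job ID and script name are hypothetical.

    import subprocess
    import time

    parent_jobid = "1234567.chadmin1"   # hypothetical parent job ID
    child_script = "postprocess.pbs"    # hypothetical dependent job script

    # Poll until the parent job is no longer visible to qstat.
    while subprocess.run(["qstat", parent_jobid],
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL).returncode == 0:
        time.sleep(60)                  # check once per minute

    # The parent has left the queue, so submit the dependent job by hand.
    subprocess.run(["qsub", child_script], check=True)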

Feb. 22, 2018

Data sets that are provided to researchers through the CMIP Analysis Platform can now be found on the GLADE disk storage system in /glade2/collections/cmip. The original location (/glade/p/CMIP) will be removed on February 28.

By hosting climate data on GLADE, the CMIP Analysis Platform enables researchers to work with it on the Geyser and Caldera analysis and visualization clusters without needing to transfer large data sets from Earth System Grid Federation (ESGF) sites to their local machines.
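
Because the data already reside on GLADE, an analysis job on Geyser or Caldera can simply open the files in place. The sketch below is only an illustration: the file name is hypothetical, and it assumes a Python environment with the xarray package available.

    import xarray as xr

    # Hypothetical file under the CMIP collection; browse /glade2/collections/cmip
    # to see which data sets are actually hosted there.
    path = "/glade2/collections/cmip/EXAMPLE_MODEL/tas_Amon_example.nc"

    # The file is read directly from GLADE; nothing is copied to local storage.
    ds = xr.open_dataset(path)
    print(ds.data_vars)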

See Adding data sets to request the addition of data sets that are not already available on GLADE.

Feb. 22, 2018

Documentation for CISL's peak_memusage tool now includes information about running it with Slurm jobs on Geyser and Caldera. Sample PBS scripts for Cheyenne jobs also have been updated. The utility helps users determine how much memory a program needs in order to run successfully. See Checking memory use for details.
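
As a complement to peak_memusage, not a replacement for it, a Python job can also report its own peak resident set size with the standard-library resource module, which gives a quick estimate of how much memory to request. A minimal sketch:

    import resource

    # ... run the memory-intensive part of the workflow here ...

    # ru_maxrss is reported in kilobytes on Linux.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"Peak resident set size: {peak_kb / 1024:.1f} MB")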

Feb. 22, 2018

Cheyenne users have increasingly been misusing the system's login nodes by running compute-intensive jobs, data processing, large file transfers, and compilations from the command line on those nodes. This significantly slows response times for others and makes it harder to use the login nodes for their main purposes: logging in, editing scripts, and running other processes that consume only modest resources.

As noted here, use of the login nodes is restricted to running processes that do not consume excessive resources, in order to ensure an appropriate balance between user convenience and login node performance. Because the situation has recently become acute, users who run jobs that consume excessive resources on the Cheyenne login nodes will have their jobs killed.

Users are encouraged to compile on the Cheyenne batch nodes or the Geyser or Caldera clusters, depending on where they want to run their programs. CISL provides the qcmd script for running CESM and WRF builds and other compiles, in addition to compute jobs, on batch nodes. Other resource-intensive work, such as R and Python jobs that generate hundreds of files, can be run efficiently in the Cheyenne “share” queue. Large file transfers are best done using Globus.

Contact the Consulting Services Group for information if you need help using the Cheyenne batch queues or Globus, or if you would like to discuss what is meant by modest usage of the login nodes.

Feb. 20, 2018

No downtime: Cheyenne, GLADE, Geyser/Caldera, and HPSS

Feb. 20, 2018

CISL will reactivate the purge policy for the GLADE scratch file space on Wednesday, February 7. The purge policy was turned off following the December 30 power outage at the NWSC facility so that users would not suddenly lose files when Cheyenne, Geyser, Caldera, and GLADE were restored to service.

The purge policy's data-retention limit will be increased from 45 days to 60 days, and the policy will now consider two factors: a file's creation date and its last access date. Previously, only the last access date was considered.

Files that were created more than 60 days ago and have not been accessed for more than 60 days will be deleted. CISL monitors scratch space usage carefully and reserves the right to decrease the 60-day limit as usage increases. Users will be informed of any change to the purge policy.
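
As an illustration of those criteria, the Python sketch below walks a scratch directory and lists files whose timestamps are both more than 60 days old. The path is a placeholder, and Linux does not expose a true creation time through os.stat, so st_ctime is used here only as a rough stand-in for a file's age.

    import os
    import time

    SCRATCH = "/glade/scratch/username"   # placeholder; point at your own scratch space
    LIMIT = 60 * 24 * 3600                # 60 days in seconds
    now = time.time()

    for dirpath, _, filenames in os.walk(SCRATCH):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue
            # At risk only if BOTH the age estimate and last access exceed 60 days.
            if (now - st.st_ctime) > LIMIT and (now - st.st_atime) > LIMIT:
                print(path)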

GLADE scratch space is for temporary, short-term use and is not intended for long-term storage.

Feb. 20, 2018

The Cheyenne “standby” batch queue has been removed from the system until further notice due to recently discovered difficulties with scheduling jobs in that queue. The other batch queues remain available to users: premium, regular, economy, and share. See Job-submission queues and charges for more complete information on Cheyenne’s batch queues.

Feb. 16, 2018

All research projects are undertaken with the hope of producing findings and products of lasting value. It may seem unthinkable that anyone could forget the details of a project, especially how its results were produced. Nevertheless, data sets often become “unloved” unintentionally over time. Specifically, if a research project loses sight of data management, its results and products risk being forgotten, or becoming “unloved,” when the team moves on to new projects.

The Data Stewardship Engineering Team (DSET) is a cross-organizational team formed by the NCAR Directors. DSET's charter specifies that the team leads the organization's efforts to provide enhanced, comprehensive digital data discovery and access, and the team focuses on providing a user-focused, integrated system for discovering and accessing digital scientific assets.

DSET and the DASH services are here to help promote NCAR's scientific results and make them usable so that they remain valuable for the long term.

If you would like to learn more about DSET/DASH and their services after Love Your Data (LYD) Week, please contact us at datahelp@ucar.edu.

Thank you for participating in Love Your Data Week by reading this and the previous four posts. If you missed any of the five posts this week, they are available in Staff Notes and in the Daily Bulletin archive, or feel welcome to contact the Data Curation & Stewardship Coordinator.

Feb. 15, 2018

XSEDE is offering introductory and advanced training sessions this Thursday and Friday via webcast from the Texas Advanced Computing Center. The focus of these training sessions will be on programming for manycore architectures such as Intel's Xeon Phi and Xeon Scalable processors. Both classes run from 7 a.m. to 11 a.m. MST. See these links for registration and class details:
