Daily Bulletin Archive

August 14, 2018

8/13/2018 - HPSS downtime: Tuesday, August 14th 7:00 a.m. - 11:00 a.m.

No downtime: Cheyenne, GLADE, Geyser_Caldera

August 14, 2018

8/2/2018 - CISL is now accepting large-scale allocation requests from university-based researchers for the 5.34-petaflops Cheyenne cluster. Submissions are due September 11.

The fall opportunity will include allocations on some new supporting resources. The Casper analysis and visualization cluster will soon enter production to replace the Geyser/Caldera clusters. University projects should request long-term space on the Campaign Storage resource instead of HPSS. Unlike HPSS, Campaign Storage has no default minimum amount; users are asked to justify the amount requested. Scrutiny of the justification will increase with the size of the request.

Researchers are encouraged to review the allocation instructions before preparing their requests. The CISL HPC Allocations Panel (CHAP) is applying increased scrutiny to data management plans and justifications for storage requests. For instructions and information regarding available resources, see the CHAP page: https://www2.cisl.ucar.edu/chap

At the fall meeting, CISL will allocate 180 million core-hours on Cheyenne, up to 2 PB of Campaign Storage space, and up to 500 TB of GLADE project space. Large allocations on Cheyenne are those requesting more than 400,000 core-hours. CISL accepts requests from university researchers for these large-scale allocations every six months.

Please contact cislhelp@ucar.edu if you have any questions.

August 13, 2018

8/9/2018 - The Cheyenne system’s share queue is operating with far fewer nodes available than normal. CISL is exploring solutions, with the priorities of restoring the queue as soon as possible and minimizing disruption to users. The time frame for resolving the issue is not yet known.

Until the issue is resolved, users will experience a significant backlog of jobs submitted to the share queue. If turnaround time in the share queue becomes untenable, users are advised to submit their jobs to one of Cheyenne’s other queues, such as the regular queue, as an interim workaround. Note that jobs run in the non-shared queues are charged for full use of their nodes and therefore consume more core-hours, but they will likely begin executing sooner than jobs in the share queue in its present state.
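The charging difference can be sketched with a little arithmetic. This is an illustration only: it assumes 36-core Cheyenne nodes, per-core charging in the share queue, and whole-node charging in the exclusive queues; the exact accounting formula may differ.

```shell
# Illustrative core-hour arithmetic (assumptions, not official accounting):
# 36 cores per node, per-core charging in the share queue,
# whole-node charging in exclusive queues such as "regular".
CORES_PER_NODE=36
CORES_USED=4
WALL_HOURS=2
NODES=1

# Share queue: charged only for the cores requested.
SHARE_CHARGE=$((CORES_USED * WALL_HOURS))                   # 8 core-hours
# Exclusive queue: charged for every core on each allocated node.
EXCLUSIVE_CHARGE=$((NODES * CORES_PER_NODE * WALL_HOURS))   # 72 core-hours

echo "share: ${SHARE_CHARGE} core-hours, exclusive: ${EXCLUSIVE_CHARGE} core-hours"
```

So a small 4-core, 2-hour job that fits in the share queue costs roughly nine times more when it occupies a full node exclusively, which is the trade-off the workaround above asks users to weigh against faster turnaround.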

August 3, 2018

08/06/2018 - No downtime: Cheyenne, GLADE, Geyser_Caldera and HPSS

July 30, 2018

Globus team members will present a workshop at NCAR from 8:30 a.m. to 5 p.m. MDT on Wednesday, September 5. The workshop is intended for system administrators who have deployed or are planning to deploy Globus, developers building applications for research, and others who are interested in learning more about the service for research data management.

Place: Center Green Campus, CG1-1210-South-Auditorium, 3080 Center Green Drive, Boulder
Agenda: https://www.globusworld.org/tour/program?c=14  
Registration: https://www.globusworld.org/tour/register
Cost: No charge to attend; space is limited, so register early

The session will include hands-on walkthroughs of:

  • Using Globus for file transfer, sharing and publication

  • Installing and configuring Globus endpoints

  • Incorporating Globus capabilities into your own data portals, science gateways, and other web applications

  • Automating research data workflows using Globus CLI and API — including how to automate scripted transfers to and from the new NCAR Campaign Storage

  • Using Globus in conjunction with the Jupyter platform

  • Integrating Globus services into your institutional repository and data publication workflows

  • Using Globus Auth authentication and fine-grained authorization for accessing your own services

Globus (www.globus.org) is a research data management service developed by the University of Chicago and used by hundreds of thousands of researchers at institutions in the U.S. and abroad.

July 30, 2018

Cheyenne, Geyser, and Caldera users can now get a quick look at which software environment modules are installed on those systems before they log in. Two documentation pages, updated daily with the output of module spider commands, list the modules available on each system.

For more information about module commands, see our environment modules documentation.

July 24, 2018

The Globus interface for transferring data does not handle symbolic links and will not create a symbolic link on a destination endpoint. This is true in both the web and command-line interfaces. If you explicitly request a transfer of a symbolic link, Globus follows the link and transfers the data it points to. More importantly, if a directory that you copy recursively with Globus contains symbolic links, those links are ignored entirely. Run the following command to determine whether your transfer includes symbolic links:

find /path/to/folder -type l

Because symbolic links are common in working directories, CISL recommends using the cp or rsync commands to move data between various spaces on GLADE. To move data from old work spaces to new work spaces, for example, use the following recursive copy (the -a option implies recursion and preserves symbolic links, permissions, and timestamps):

cp -a /glade/p_old/work/${USER}/data_directory /glade/work/${USER}

For transfers to and from the new Campaign Storage, and for large transfers to file systems at other sites, CISL still recommends Globus as the easy, fast, and secure option to move data. However, it is important to prepare your data for transfer by identifying and managing your symbolic links. There are two approaches you can take:

  1. If you wish to preserve the linked data, simply replace the symbolic link with the target data using cp.

  2. If you wish to preserve the symbolic links themselves, the easiest approach is to create a tarball containing all of the files you want to copy (including the symbolic links), and then use Globus to transfer that tarball to the target file system.
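Both approaches can be sketched as follows. The setup commands simply create a small example directory containing one real file and one symbolic link; the file names are illustrative:

```shell
# Example setup: a directory with one real file and one symbolic link.
mkdir -p data_directory
echo "payload" > data_directory/file.txt
ln -sf file.txt data_directory/link.txt

# Approach 1: replace links with the data they point to.
# cp's -L option dereferences symbolic links, so the copy contains a
# regular file wherever the original had a link, and Globus can then
# transfer the copy directly.
cp -rL data_directory data_directory_deref

# Approach 2: preserve the links themselves. tar stores symbolic
# links as links by default, so the tarball can be transferred with
# Globus and unpacked intact on the destination file system.
tar -cf data_directory.tar data_directory
```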

If you need guidance on which approach is the best for your particular data transfer, please contact cislhelp@ucar.edu with questions.


July 20, 2018

The Cheyenne, Geyser, and Caldera clusters will be unavailable Tuesday, July 24, starting at approximately 6 a.m. MDT to allow CISL staff to update key system software components. The outages are expected to last until approximately 6 p.m. Tuesday evening, but every effort will be made to return the systems to service as soon as possible.

To minimize the impact on running jobs, all Cheyenne batch queues will be suspended at approximately 6:00 p.m. tonight. Running jobs will not be interrupted. After the queues are suspended, users will still be able to submit batch jobs, but those jobs will be held until the system is returned to service Tuesday evening. A system reservation will be created on Geyser and Caldera to prevent batch jobs from executing past 6:00 a.m. Tuesday morning.

All batch jobs and interactive processes that are still executing when the outages begin will be killed. The clusters’ login nodes will be unavailable throughout the outages.

CISL will inform users through the Notifier service when the systems are restored.

July 16, 2018

Containers are a hot topic in high-performance and scientific computing, but while they can provide significant advantages they don't always live up to the hype. That’s why CISL is offering a hands-on class, “Containers and How They Work,” from 9 a.m. to noon MDT on Friday, July 20, at the NCAR Mesa Lab in Boulder.

The course explains what containers are and how they work, and it surveys some popular implementations with an eye toward supporting scientific workloads. Security and other operational concerns will also be covered for cluster administrators who are thinking about supporting containerized workloads on their systems. Topics to be covered include:

  • What are containers and how do they work?

  • Image formats (tar/filesystem/overlayfs/dense filesystem image/singularity)

  • Build vs. run [singularityhub]

  • Platform independence/reproducibility

  • User namespaces and rootless containers

  • Scheduler integration

  • Container runtimes (Docker, Charliecloud, Inception, Singularity, others)

  • OCI/Standards/runc

  • Applications

  • Education/outreach

  • Cloud

  • Reproducible science

The class is intended for anyone who is planning to deploy applications and create application environments using containers; developers and systems support staff getting started with containers; and others interested in learning about containers. Participants should be familiar with Linux and system operations and should bring a laptop and an authentication token for connecting to Geyser and Caldera. Laptops should be fully charged, as there may not be enough power receptacles in the seminar room.

Please use this form to register so CISL knows how many participants to expect. Space is limited to 50 participants and registration is open through July 16.

July 13, 2018

HPSS downtime: Tuesday, July 17th 7:30 a.m. - 10:00 a.m.

No downtime: Cheyenne, GLADE, Geyser_Caldera