The Daily Bulletin

IBM engineers and CISL system administrators cleared a significant hurdle yesterday, March 22, by successfully recovering missing disk drives on the /glade/p file system. The recovered disks had been unreachable since the hardware component failure on March 8, which caused errors when reading many /glade/p files.

A full file system check was performed overnight to verify data, locate any file corruption, and generate a report for IBM that will be reviewed this morning. Pending the outcome of that review, a full repair of the file system may be initiated that will keep all of /glade/p offline throughout the day.

Users will be kept up to date on all developments through the Notifier service.

The PBS qstat command will be modified during this week's maintenance outage. When Cheyenne is returned to service late this week, users will be able to query for information about their own jobs only, not jobs submitted by other users. The change is intended to reduce demands on the PBS server, which has frequently been overloaded, resulting in poor system performance and job failures. Users should be aware that this change may affect some existing scripts and workflow managers.

CISL learned recently that some users’ scripts were issuing multiple qstat commands every minute or every second, which can be highly resource intensive. Limiting qstat to return information only for jobs belonging to the user will significantly reduce demands on the system. Before this change, the command’s default behavior was to return information on all jobs in the PBS database.

Users can further help reduce demands on the system by adopting the following changes wherever possible:

  • Use “qstat <jobid>” instead of just “qstat”

  • Avoid using “qstat -f -x”

  • Limit the number and frequency of qstat commands. Multiple calls every minute provide little extra information and adversely affect overall system performance.
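For scripts that need to wait on a job, the recommendations above can be sketched roughly as follows. This is a minimal illustration, not a CISL-provided tool: the function names, the five-minute default interval, and the cap on the number of polls are all assumptions chosen for the example. The caller supplies the is_done check, because the sketch makes no assumption about what qstat prints for a finished job.

```python
import subprocess
import time
from typing import Callable, List


def build_qstat_command(job_id: str) -> List[str]:
    """Build a qstat call for a single job ID (no bare qstat, no -f -x)."""
    if not job_id:
        raise ValueError("job_id must be non-empty")
    return ["qstat", job_id]


def run_qstat(command: List[str]) -> str:
    """Run the command and return whatever it wrote to standard output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout


def poll_job(
    job_id: str,
    is_done: Callable[[str], bool],
    interval_seconds: float = 300.0,  # assumed interval; pick what your workflow needs
    max_polls: int = 12,              # assumed cap so the loop always ends
    run: Callable[[List[str]], str] = run_qstat,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Query one job at a fixed interval; return the number of qstat calls made."""
    command = build_qstat_command(job_id)
    for attempt in range(1, max_polls + 1):
        output = run(command)
        if is_done(output):
            return attempt
        if attempt < max_polls:
            # Wait between calls instead of querying several times a minute.
            sleep(interval_seconds)
    return max_polls
```

A script might call poll_job("123456", my_check), where "123456" stands in for a real job ID and my_check is a hypothetical user-written function that inspects the qstat output. Passing run and sleep as parameters keeps the loop testable without a PBS server.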

CISL thanks all users for their cooperation. Please contact if you have any questions or would like help in this matter.

CISL is now accepting requests from university-based researchers for large-scale allocations for the 5.34-petaflops Cheyenne cluster; submissions are due March 26.

You should be aware that the CHAP is scrutinizing requests for disk and tape storage much more closely because of the rapidly growing scale of the data generated by many university projects and constraints on the available storage within the CISL environment. Be sure to review the guidance on the updated instruction page before preparing your submission.

Large allocations on Cheyenne are those of more than 400,000 core-hours. CISL accepts requests from university researchers for these large-scale allocations every six months. For updated submission instructions and information regarding available resources, see the CISL HPC Allocations Panel (CHAP) page.

Please contact if you have any questions about this opportunity.