The Yellowstone system, Geyser and Caldera clusters, and GLADE file systems will be unavailable during a scheduled maintenance period from 6 a.m. to 4 p.m. Tuesday, July 7. CISL will implement a new Mellanox “PQFT” routing engine algorithm that is expected to reduce congestion in the Yellowstone InfiniBand interconnect fabric. Other updates include replacing a faulty Ethernet switch in the Geyser cluster and upgrading GPFS to version 4.1, which will likely reduce system hangs.
A system reservation will be put in place 12 hours before the scheduled downtime. Jobs whose specified run times overlap the reservation period will remain on hold until the system is restored to service. Jobs that are still running at 6 a.m. Tuesday will need to be resubmitted after the maintenance period.