Yellowstone deployment efforts continue to pursue performance improvements

September 6, 2012

The Yellowstone timeline has continued to slip despite long hours put in by CISL, IBM and Mellanox staff. At this writing, the most optimistic timeline has the three-week acceptance test period beginning late this week (the tail end of August), which pushes first user access at least to late September.

While the compute and storage hardware looks good and has proven more stable than anticipated, with little “infant mortality” observed so far, IBM and Mellanox are continuing to address challenges in achieving the expected performance of 90 GB/s between the compute and storage systems.

The performance tuning involves complex hardware, software, and firmware interactions among the more than 4,500 compute nodes on Yellowstone; the 4,500 disk drives, 76 disk controllers, and 20 GPFS servers of the GLADE resource; and the InfiniBand interconnect, which comprises nine core switches, 250 leaf switches, and more than 9,500 copper and fiber cables.

CISL is monitoring the deployment process closely through ongoing interactions with, and updates from, the IBM team. Given the extent of the delays thus far, CISL is watching the system's stability and performance results and looking for the earliest possible opportunity to move into acceptance testing. If IBM's performance results fall slightly short of the promised levels, CISL may elect to proceed with acceptance despite the shortfall and discuss with IBM alternate methods of achieving the performance targets later.

When the Yellowstone timeline solidifies, CISL will also re-evaluate the schedule for Bluefire. With any Yellowstone delays, users can expect Bluefire’s decommissioning date to be extended accordingly.