Keynote Presentation

Metrics for HPC Data Center Power Proportionality and Efficiency

Abstract: Supercomputing centers for high performance computing (HPC) with petascale capabilities have high power demands, with peak requirements of over 30 megawatts and intra-hour fluctuations of up to 10 megawatts. Over the last decade, power consumption has moved from an afterthought to a foremost design challenge for new supercomputers. Unfortunately, there is no single silver bullet that can resolve this challenge. Instead, there are many silver BBs. Stranded capacity and trapped capacity are key issues for reducing both the capital and operational costs associated with the high and increasing power requirements of supercomputing centers. Stranded capacity results from overestimating the maximum power requirements during capacity planning, design and procurement. It generally produces higher than necessary capital costs. Trapped capacity, on the other hand, is caused by imbalance or sub-optimization in design: equipment is 'on' but not productive, which produces higher than necessary operational costs. Creating more energy-efficient HPC is a continuous process that requires the proper tools for measuring, responding, checking and validating the results, and iterating in a continuous cycle. This process needs to be adopted as an integral part of overall computer operations. This is true for the infrastructure as well as for all levels of the system, from building power and system components up through the applications. We are transitioning from the early stages of focusing on the measurement capabilities and tools needed to allow improvements to be made. The measurement capabilities are developing, and we are now in the early stages of monitoring and managing power and energy. The continuous improvement process requires understanding key organizational strategic and operational goals and objectives, which are translated into metrics for managing progress.
This talk will explore recommendations for energy efficiency metrics, including those commonly in use as well as those that are currently emerging. It will explore how these metrics address the issues of stranded and trapped capacity. Finally, it will describe some future technologies for dynamic power management that may help resolve these issues. The Energy Efficient HPC Working Group (EE HPC WG) is helping to drive energy efficiency measures and design in HPC. It is a forum for sharing information (e.g., best practices and peer-to-peer exchange) as well as for collective action (guidelines, recommendations, collaborations). There are over 550 members from more than 20 different countries. This collective voice provides a strong influence that encourages system integrators, standards bodies and other organizations to actively participate in the drive for energy efficiency measures and design. Natalie Bates, Chair of the EE HPC WG, will describe some key activities and results of the working group that are driving improvements in HPC energy efficiency.

Biography: For the past five years, Natalie Bates has led the Energy Efficient High Performance Computing Working Group (EE HPC WG). The purpose of the WG is to drive implementation of energy-efficient design in HPC. Today, there are over 550 members from 20+ countries. Natalie has been the technical and executive leader for this 'open source' working group, which disseminates best practices, shares information (peer-to-peer exchange) and takes collective action. The EE HPC WG has collaborated and negotiated with industry standards committees and major HPC organizations, and has influenced HPC system development.

Prior to leading the EE HPC WG, Natalie's career spanned twenty years with Intel Corporation, where she was a senior manager of highly complex programs: taking new products to market, delivering multi-component and multi-partner platforms, and negotiating strategic technical industry initiatives.