Keynote Speeches

Energy-Efficient Exascale Computing: An Oxymoron?

Prof. Wu-chun Feng, Virginia Tech, USA

With the annual power and cooling cost for a 1U server now exceeding the acquisition cost of the server itself, power consumption has arguably become the leading design constraint in high-performance computing. Furthermore, extrapolating from today's petascale machines to exascale machines puts the overall power consumption on the order of 10 GW, enough to power New York City twice over (i.e., a population of 16 million). This talk will address the conflicting goals of performance and power on the road to energy-efficient exascale computing via the "second coming" of Green Destiny, arguably the world's first energy-efficient supercomputer when it debuted in 2002 ... when being cool wasn't "cool".

Energy Efficient HPC in Metropolitan Environments

Dr. Michel Bénard, HP Laboratories

Among other industries and human activities, IT today accounts for only a small fraction of total energy consumption, around 2%. However, this consumption is growing very fast, at a double-digit rate per year. New trends in the use of enterprise, public and private information are fostering an explosion in the amount of data created, transmitted, stored and processed. This situation puts the IT industry in charge of solving its own energy consumption problem, using its own innovation resources as well as technologies from other fields.

For about 10 years, HP Laboratories has been conducting research on a concept named the Sustainable Data Center. The Sustainable Data Center is designed for net-zero energy consumption from non-renewable sources over its entire lifecycle, from initial resource extraction and manufacturing to operation and end-of-life reclamation. This data center satisfies the Service-Level Agreements (SLAs) of the hosted services while reducing the Total Cost of Ownership (TCO) and emissions. Research thrusts include IT-facility demand management, supply-side management, integrated design and management, and management information systems.

Recently the research has moved towards problems related to supply-demand management of resources in the context of a home, a campus or a city. Metropolitan environments provide a specific and challenging set of constraints which have to be taken into account in the SLAs, the data acquisition and processing, and the decision process.

Opening Speeches

Sustainability on a Smarter Planet

Andreas Pflieger, IBM

When we at IBM speak of a smarter planet, we mean that intelligence is being infused into the systems and processes that make the world work, into things no one would recognize as computers: cars, appliances, roadways, power grids, clothes, even natural systems such as agriculture and waterways.

Today, it is no longer a question of whether the technology to build a smarter planet is real. Now, we need to know what to do next. How do you infuse intelligence into a system for which no one enterprise or agency is responsible? How do you bring all the necessary constituents together? How do you make the case for budget? Where should you start?

We've learned a lot over the past year about what it takes to build a smarter planet. Importantly, we've learned that our companies, our cities and our world are complex systems, indeed systems of systems, that require new things of us as leaders, as workers and as citizens. A smarter planet will require a profound shift in management and governance toward far more collaborative approaches. This holds for macroeconomic scenarios, such as reducing wasted power in large power grids, as well as for microeconomic ones, such as enhancing the power and cooling efficiency of supercomputers.

Scientific Session - General Purpose Hardware

A First Look at Integrated GPUs for Green High-Performance Computing

Thomas Scogland, Heshan Lin, Wu-chun Feng

The graphics processing unit (GPU) has evolved from a single-purpose graphics accelerator to a tool that can greatly accelerate the performance of high-performance computing (HPC) applications. Previous work evaluated the energy efficiency of discrete GPUs for compute-intensive scientific computing and found them to be energy efficient but very high power. In fact, a compute-capable discrete GPU can draw more than 200 watts by itself, which can be as much as an entire compute node consumes (without a GPU). This massive power draw presents a serious roadblock to the adoption of GPUs in low-power environments, such as embedded systems. Even when GPUs are considered for data centers, their power draw presents a problem, as it increases the demand placed on support infrastructure such as cooling and available power supplies, driving up cost. With the advent of compute-capable integrated GPUs with power consumption in the tens of watts, we believe it is time to re-evaluate the notion that GPUs are power-hungry.

Optimizing Energy Consumptions of High Performance Computing Applications on POWER7

Luigi Brochard

Power consumption is a critical consideration in high performance computing systems, and it is becoming the limiting factor in building and operating petascale and exascale systems. When studying the power consumption of existing systems running HPC workloads, we find that power, energy and performance are closely related, making it possible to optimize energy without sacrificing (much or any) performance. This paper presents the power features of the POWER7 processor and shows how innovative software can use these features to optimize the power and energy consumption of large clusters running HPC workloads. The paper starts by presenting the new features introduced in POWER7 to manage power consumption. We then analyze the power consumption and performance of different HPC workloads at various levels of the POWER7 server (processor, memory, I/O) for different frequencies. We propose a model to predict both the power and energy consumption of real workloads based on their performance characteristics as measured by hardware performance monitoring (HPM) counters. We show that the power estimation model can achieve less than 2% error versus actual measurements. In conclusion, we present how an innovative scheduler can help optimize both the power and energy consumption of large HPC clusters.
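
To make the counter-based modeling concrete, the sketch below fits a linear power model to performance-counter rates by least squares. The counter choice, coefficients and sample data are hypothetical illustrations, not the paper's actual model:

    # Minimal sketch of a counter-based power model (hypothetical data).
    # P_est = c0 + c1 * instruction_rate + c2 * memory_access_rate
    import numpy as np

    # Each row: [instructions/s, memory accesses/s] for one workload sample.
    counters = np.array([[1.2e9, 3.0e7],
                         [2.5e9, 8.0e7],
                         [0.8e9, 1.5e7],
                         [3.1e9, 9.5e7]])
    measured_watts = np.array([410.0, 520.0, 380.0, 560.0])

    # Constant column supplies the idle/base power term c0.
    X = np.hstack([np.ones((len(counters), 1)), counters])
    coeffs, *_ = np.linalg.lstsq(X, measured_watts, rcond=None)

    estimated = X @ coeffs
    error = np.abs(estimated - measured_watts) / measured_watts
    print("mean relative error: %.1f%%" % (100 * error.mean()))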

Scientific Session - Special Purpose Hardware

Energy Efficiency of Mixed Precision Iterative Refinement Methods using Hybrid Hardware Platforms

Hartwig Anzt, Björn Rocker, Vincent Heuveline

In this paper we evaluate the possibility of using mixed precision algorithms on different hardware platforms to obtain energy-efficient solvers for linear systems of equations. Our test cases arise in the context of computational fluid dynamics. To this end, we analyze the energy efficiency of common cluster nodes and of a hybrid, GPU-accelerated cluster node when applying a linear solver that can benefit from the use of different precision formats. We show the high potential of hardware-aware computing in terms of performance and energy efficiency.
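
As a concrete illustration of the underlying technique, here is a generic mixed precision iterative refinement sketch (not the authors' solver): the expensive solve runs in single precision, while residuals and the solution update are kept in double precision.

    # Generic mixed precision iterative refinement (illustrative only).
    # A production solver would factor A32 once (LU) and reuse the factors.
    import numpy as np

    def mixed_precision_solve(A, b, tol=1e-12, max_iter=20):
        A32 = A.astype(np.float32)                  # low-precision copy
        x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
        for _ in range(max_iter):
            r = b - A @ x                           # residual in double precision
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            d = np.linalg.solve(A32, r.astype(np.float32))  # cheap correction
            x += d.astype(np.float64)
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 100)) + 100 * np.eye(100)  # well conditioned
    b = rng.standard_normal(100)
    x = mixed_precision_solve(A, b)
    print(np.linalg.norm(A @ x - b))                # close to double precision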

QPACE: Power-efficient parallel architecture based on IBM PowerXCell 8i

Dirk Pleiter, Heinz Baier, Hans Boettiger, Matthias Drochner, Norbert Eicker, Uwe Fischer, Zoltan Fodor, Andreas Frommer, Claude Gomez, Gottfried Goldrian, Simon Heybrock, Dieter Hierl, Matthias Hüsken, Thomas Huth, Benjamin Krill, Jack Lauritsen, Thomas Lippert, Thilo Maurer, Bernhard Mendl, Nils Meyer, Andrea Nobile, Ibrahim Ouda, Marcello Pivanti, Manfred Ries, Andreas Schäfer, Heiko Schick, Fabio Schifano, Hubert Simma, Stefan Solbrig, Thomas Streuer, Karl-Heinz Sulanke, Raffaele Tripiccione, Jörg-Stephan Vogt, Tilo Wettig, Frank Winter

QPACE is a novel massively parallel architecture optimized for lattice QCD simulations. Each node comprises an IBM PowerXCell 8i processor. The nodes are interconnected by a custom 3-dimensional torus network implemented on an FPGA. The architecture was systematically optimized with respect to power consumption. This put QPACE in the number one spot on the Green500 List published in November 2009. In this paper we give an overview of the architecture and highlight the steps taken to improve power efficiency.
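
To make the topology concrete, a 3-dimensional torus gives every node exactly six neighbors with periodic (wrap-around) links; the following generic sketch (not QPACE's actual network logic) computes them:

    # Generic 3D torus neighbor computation (illustrative, not QPACE code).
    def torus_neighbors(x, y, z, dims):
        nx, ny, nz = dims  # torus extent in each dimension
        return [((x + dx) % nx, (y + dy) % ny, (z + dz) % nz)
                for dx, dy, dz in [(1, 0, 0), (-1, 0, 0),
                                   (0, 1, 0), (0, -1, 0),
                                   (0, 0, 1), (0, 0, -1)]]

    # Wrap-around keeps hop counts bounded without a central switch.
    print(torus_neighbors(0, 0, 0, dims=(4, 4, 8)))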

Scientific Session - Power Extrapolation, Estimation and Simulation

Quantifying Power Consumption Variations of HPC Systems Using SPEC MPI Benchmarks

Daniel Hackenberg, Robert Schöne, Daniel Molka, Matthias Müller, Andreas Knüpfer

The power consumption of an HPC system is a major concern not only because of the huge associated operational cost; it also places high demands on the infrastructure required to operate such a system. The power consumption strongly depends on the executed workload and is influenced by the system's hardware, software and setup. In this paper we analyze the power consumption of a 32-node cluster across a wide range of parallel applications using the SPEC MPI2007 benchmark. By measuring the variations in the power consumed by different hardware nodes and by the processes of an application, we lay the groundwork for extrapolating the energy demand of large parallel HPC systems.
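
A minimal sketch of the extrapolation idea, under assumed numbers and an assumed method (characterize per-node mean and variation, then scale by node count), follows:

    # Illustrative extrapolation from measured node power to a larger system
    # (assumed approach and hypothetical samples, not the paper's methodology).
    import statistics

    node_watts = [312.0, 305.5, 318.2, 309.9]  # hypothetical per-node samples
    mean_w = statistics.mean(node_watts)
    stdev_w = statistics.stdev(node_watts)

    target_nodes = 10_000
    expected_kw = target_nodes * mean_w / 1000
    # Node-to-node variation matters when provisioning worst-case power.
    worst_case_kw = target_nodes * (mean_w + 3 * stdev_w) / 1000
    print(f"expect {expected_kw:.0f} kW, provision {worst_case_kw:.0f} kW")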

Simulation of Power Consumption of Energy Efficient Cluster Hardware

Timo Minartz, Julian Kunkel, Thomas Ludwig

In recent years the power consumption of high-performance computing clusters has become a growing problem because the number and size of cluster installations have been rising. The high power consumption of clusters is a consequence of their design goal: high performance. At low utilization, cluster hardware consumes nearly as much energy as when it is fully utilized. In theory, cluster hardware can be turned off or switched to a lower-power state during these low-utilization phases. We designed a model to estimate the power consumption of hardware based on its utilization. Applications are instrumented to create utilization trace files for a simulator realizing this model. Different hardware components can be simulated using multiple estimation strategies. An optimal strategy determines an upper bound on the energy savings for existing hardware without affecting the time-to-solution. Additionally, the simulator can estimate the power consumption of efficient hardware which is energy-proportional. In this way the minimum power consumption can be determined for a given application; naturally, this minimum bounds what any power saving strategy can achieve. After evaluating the correctness of the simulator, several different strategies and energy-proportional hardware are compared.
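
The contrast between existing and energy-proportional hardware can be sketched with two simple power models over a utilization trace (the wattages below are assumed for illustration, not the simulator's calibrated values):

    # Illustrative power models over a utilization trace (hypothetical watts).
    P_IDLE, P_MAX = 150.0, 250.0

    def conventional_power(utilization):
        # Large idle floor: even at 0% load the node draws P_IDLE watts.
        return P_IDLE + (P_MAX - P_IDLE) * utilization

    def energy_proportional_power(utilization):
        # Idealized hardware: power tracks utilization all the way to zero.
        return P_MAX * utilization

    # Energy over a trace = sum of power * interval length.
    trace = [(0.9, 60.0), (0.1, 240.0), (0.0, 300.0)]  # (utilization, seconds)
    for model in (conventional_power, energy_proportional_power):
        joules = sum(model(u) * dt for u, dt in trace)
        print(model.__name__, "%.1f kJ" % (joules / 1000))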

Adaptive Estimation and Prediction of Power and Performance in High Performance Computing

Reza Zamani, Ahmad Afsahi

Power consumption has become an increasingly important constraint in high-performance computing systems, shifting the focus from peak performance towards improving power efficiency. This has resulted in significant research on reducing and managing power consumption. To have an effective power management system in place, it is essential to model and estimate the runtime power of a computing system. Performance monitoring counters (PMCs) along with regression methods are commonly used to model and estimate runtime power. However, architectural intuition remains fundamental to the current models that relate a computing system's power to its PMCs. Taking an orthogonal approach, we examine the relationship between power and PMCs from a stochastic perspective. In this paper, we argue that autoregressive moving average (ARMA) models are excellent candidates for modeling various trends in performance and power. ARMA models take a time-series perspective on events, and we adaptively update them through algorithms such as the recursive least-squares (RLS) filter, the Kalman filter (KF), or multivariate normal regression (MVNR). We extend the model to predict near-future power and PMC values. Our empirical results show that system-level dynamic power is estimated with an average error of 8%, and that dynamic runtime power and instructions per cycle can be predicted (65 time steps ahead) with average errors of less than 11.1% and 7%, respectively.
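
The flavor of such time-series modeling can be sketched with an off-the-shelf ARMA fit; the model orders, the synthetic trace and the use of statsmodels here are assumptions for illustration, whereas the paper adaptively updates its models with RLS, Kalman or MVNR algorithms:

    # Illustrative ARMA-based power prediction with synthetic data.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)
    t = np.arange(500)
    # Synthetic power trace: base load plus a slow oscillation plus noise.
    power = 200 + 15 * np.sin(t / 25) + rng.normal(0, 2, len(t))

    # ARMA(2,1) is ARIMA with no differencing: order=(p, 0, q).
    fitted = ARIMA(power, order=(2, 0, 1)).fit()

    # Predict 65 steps ahead, the horizon used in the paper's evaluation.
    forecast = fitted.forecast(steps=65)
    print(forecast[:5])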

Scientific Session - Metrics and Models

A new Energy Aware Performance Metric

Costas Bekas, Alessandro Curioni

Energy-aware algorithms are the wave of the future. The development of exascale systems has made it clear that extrapolations of current technologies, algorithmic practices and performance metrics are simply inadequate. The community reacted by introducing the FLOPS/watt metric in order to promote energy awareness. In this work we take a step forward and argue that what one should aim for is a reduction of the total energy spent in conjunction with the minimization of time to solution. Thus, we propose f(time to solution)⋅energy (FTTSE) as the performance metric, where f(⋅) is an application-dependent function of time. In this paper, we introduce our ideas and showcase them with a recently developed framework for solving large dense linear systems.
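
As a minimal illustration, with the assumed choice f(t) = t the metric reduces to the well-known energy-delay product:

    # FTTSE = f(time_to_solution) * energy, with application-dependent f.
    def fttse(time_to_solution, energy, f=lambda t: t):
        return f(time_to_solution) * energy

    # Hypothetical comparison: a slower but thriftier run can still win.
    run_a = fttse(time_to_solution=100.0, energy=50_000.0)  # 100 s, 50 kJ
    run_b = fttse(time_to_solution=120.0, energy=38_000.0)  # 120 s, 38 kJ
    print(run_a, run_b)  # run_b scores lower (better) under f(t) = t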

Collecting Energy Consumption of Scientific Data

Julian Kunkel, Olga Mordvinova, Michael Kuhn, Thomas Ludwig

In this paper, data life cycle management is extended by accounting for energy consumption during the life cycle of files. Information about the energy consumption of data not only makes it possible to account for the correct costs of its life cycle, but also provides feedback to users and administrators and improves awareness of the energy consumption of file I/O. We discuss ideas for realizing a storage landscape that determines the energy consumption for maintaining and accessing each file. We propose adding new extended attributes to file metadata that make it possible to compute the energy consumed during the life cycle of each file.
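
On Linux, extended attributes offer one possible realization; the sketch below accumulates a per-file energy counter, where the attribute name user.energy_mj is a hypothetical convention, not one proposed in the paper:

    # Sketch: accumulate I/O energy in an extended attribute (Linux only).
    import os

    ATTR = b"user.energy_mj"  # millijoules as a decimal string (hypothetical)

    def add_energy(path, millijoules):
        try:
            current = int(os.getxattr(path, ATTR))
        except OSError:        # attribute not set yet
            current = 0
        os.setxattr(path, ATTR, str(current + millijoules).encode())

    def get_energy(path):
        return int(os.getxattr(path, ATTR))

    # Usage: charge 42 mJ of I/O energy to a file after an access, e.g.
    # add_energy("/data/results.dat", 42); print(get_energy("/data/results.dat"))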

Utilization Driven Power-Aware Parallel Job Scheduling

Maja Etinski, Julita Corbalan, Jesus Labarta, Mateo Valero

In this paper, we propose UPAS (Utilization driven Power-Aware parallel job Scheduler) for DVFS-enabled clusters. A CPU frequency assignment algorithm is integrated into the well-established EASY backfilling job scheduling policy. Running a job at a lower frequency reduces its power dissipation and energy consumption, but introduces a performance penalty. Furthermore, the performance of other jobs may be affected, as their wait times can increase. For this reason, we propose to apply DVFS only when system utilization is below a certain threshold, exploiting periods of low system activity. As the increase in run times due to frequency scaling can be seen as an increase in computational load, we also analyze the dimensioning of HPC systems. This paper investigates whether having more DVFS-enabled processors and scheduling jobs with UPAS can lead to lower energy consumption and higher performance. Five workload traces from systems in production use with up to 9,216 processors are simulated to evaluate the proposed algorithm and the dimensioning problem. Our approach decreases CPU energy by 8% on average, depending on the allowed job performance penalty. Applying UPAS to 20% larger systems, the CPU energy needed to execute the same workloads can be decreased by 20% while achieving the same or better job performance.
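
A highly simplified sketch of the utilization-driven idea follows; the threshold, frequency set and slowdown model are assumptions for illustration and not the actual UPAS algorithm, which is integrated with EASY backfilling:

    # Simplified utilization-driven frequency assignment (illustrative only).
    FREQUENCIES = [1.6, 2.0, 2.4]  # GHz, lowest to nominal (assumed values)
    UTIL_THRESHOLD = 0.7           # apply DVFS only below 70% utilization

    def assign_frequency(system_utilization, max_slowdown=1.2):
        if system_utilization >= UTIL_THRESHOLD:
            return FREQUENCIES[-1]  # busy system: run at nominal frequency
        # Pick the lowest frequency whose run-time penalty stays within
        # bounds, assuming run time scales roughly with f_nominal / f.
        for f in FREQUENCIES:
            if FREQUENCIES[-1] / f <= max_slowdown:
                return f
        return FREQUENCIES[-1]

    print(assign_frequency(0.5))  # 2.0 GHz: 2.4 / 2.0 = 1.2 <= max_slowdown
    print(assign_frequency(0.9))  # 2.4 GHz: system too busy to scale down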

Industrial Sessions

Macro and Micro-level Energy-Savings Through COTS Technologies

Dr. Markus Leberecht, Intel

Data center infrastructure optimization and energy-aware programming are two important practices in the toolbox of energy-efficient IT operation. In this talk, we are going to demonstrate how Intel engages in these two fields to drive highly pragmatic approaches towards lowering total and relative energy use in enterprise and high-performance computing.

Standards-based Energy-efficient HPC Systems: Trends, Implementations and Solutions

Dr. Frank Baetke, HP

HP's HPC product portfolio, which has always been based on standards at the processor, node and interconnect level, has led to a successful penetration of the high-performance computing market across all application segments. The rich portfolio of compute, storage and workstation blades comprises a family of components called the ProLiant BL-series, complementing the well-established rack-based ProLiant DL family of nodes. To address additional challenges at the node and system level, HP recently introduced the ProLiant SL-series, which became the architectural basis for HP's first petascale system, to be delivered later this year with new nodes specifically designed for highest density and energy efficiency.

Beyond acquisition cost, the other major factor is power and cooling efficiency. This is primarily a matter of the cost of power, but also of the power and thermal density that can be managed in a data center. To leverage economies of scale, established HPC centers as well as providers of innovative services are evaluating new concepts that have the potential to make classical data center designs obsolete. These new concepts provide significant advantages in terms of energy efficiency, deployment flexibility and manageability. Examples of this new approach, often dubbed POD for Performance Optimized Datacenter, will be shown, including a concept to scale to multiple PFLOPS at the highest energy efficiency.

Green Sustained Performance: the Eurotech Aurora Supercomputing Architecture for Large Scale Simulations

Giampietro Tecchiolli, Eurotech

In this speech we present the Aurora supercomputing architecture, a novel, holistic approach to sustained power and computational efficiency. Large-scale simulations pose great challenges with regard to scalability, energy cost, I/O/memory/CPU balance and overall system size, and they require suitable HPC architectures. In this analysis, the design of Aurora is confronted with each of these challenges, with an in-depth focus on power efficiency features such as direct liquid cooling, the power conversion strategy and the interplay between software and hardware. Computational efficiency and scalability features will also be presented, with emphasis on the UNA (Unified Network Architecture), which merges three different networks: a 3D torus, a switched InfiniBand network and a three-level synchronization system. Finally, the impact of the technologies used in Aurora on the overall size and complexity of a large-scale deployment (overall footprint, cabling) will be compared to the equivalent traditional approach.

Energy Efficiency Aspects in Cray Supercomputers

Vincent Pel, Cray

Power consumption of large HPC systems is of increasing concern, and consequently numerous efforts target improved energy efficiency. A very common metric is Power Usage Effectiveness (PUE), which, however, only measures relative energy use: a poorly designed system with high energy consumption may still have an excellent PUE value if the data center infrastructure provides effective cooling. Therefore, an absolute metric, such as sustained MFLOPS per watt, needs to be considered as well. This presentation discusses the various aspects Cray takes into consideration to minimize power consumption and cool its systems efficiently, and it also highlights the hardware and software concepts that enable real applications to sustain a high degree of performance.
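
A quick worked comparison (with hypothetical numbers) shows why PUE alone can mislead:

    # PUE = total facility energy / IT equipment energy (hypothetical data).
    def pue(total_kwh, it_kwh):
        return total_kwh / it_kwh

    # System A: efficient hardware, mediocre cooling.
    # System B: wasteful hardware, excellent cooling. Same sustained MFLOPS.
    a_it, a_total = 1000.0, 1500.0
    b_it, b_total = 2000.0, 2200.0

    print("A: PUE %.2f" % pue(a_total, a_it))  # 1.50
    print("B: PUE %.2f" % pue(b_total, b_it))  # 1.10, the "better" PUE,
    # yet B burns about 47% more total energy for the same delivered work.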

Energy Management for HPC

Klaus Gottschalk, IBM

The size of HPC systems, in terms of both processor count and installed memory, is rising fast. Even though the energy efficiency of processors is increasing, the rapid growth of HPC systems makes energy consumption a potential problem for the future. This talk presents where the energy is used and how power consumption can be better managed. It explains the technology available today and possible strategies to improve the power consumption of a single application as well as of a complete HPC system.

Hybrid-Core Architectures for Energy-Efficient Computing

Ernst Mutke, Convey

Clock rates of standard x86-64 processors are essentially flat, limiting the scalability and performance of many applications. Acceleration is typically achieved by adding more servers, which drastically increases the energy cost of running and cooling these larger clusters. To counteract the growing demand for energy and its cost, companies today are implementing hardware accelerators for specific applications or functions. Convey Computer Corporation has developed the first Hybrid-Core Computer, which closely integrates a standard x86 processor with a hardware accelerator that implements application-specific instructions on an FPGA-based co-processor. With this approach, instead of stacking up more servers in the data center and watching power, cooling, cabling and data center space costs skyrocket, users can get far more performance for specific applications while actually reducing hardware, energy, cabling and footprint expenditures.

How green is red?

Roland Rambau, Oracle

Energy efficiency isn't just a chip or a hardware problem. It is a virtualization problem, an OS problem, a systems management problem, a networking problem, and a storage problem. This presentation provides pointers to many of the topics where Oracle is currently engaged and can help to improve energy efficiency in computing.