Automatic Online Tuning

High-Performance Computing (HPC) is a key enabler of research and development. It relies on parallel platforms ranging from high-end workstations and servers to large-scale supercomputers, which use multicore and manycore processors to reach high execution rates. Heterogeneous systems are becoming more common, driven by advances in accelerator technology and by dynamic variation in resource capabilities, e.g., Dynamic Voltage and Frequency Scaling (DVFS).

The heterogeneous nature of current and future HPC systems requires a variety of programming paradigms, such as MPI, OpenMP, PGAS, CUDA, and OpenCL, which are often combined within a single program. Higher-level approaches that simplify accelerator programming through directive-based automatic code generation, e.g., HMPP and OpenACC, have recently become available.

Due to the increasing complexity of parallel HPC architectures, it is extremely difficult to develop programs that exploit the full capability of the hardware. After a program has been written and debugged, application developers must still go through a time-consuming tuning process. The whole development process is therefore cumbersome and reveals a large productivity gap.

Program tuning covers many different aspects, e.g., core pipeline utilization, cache optimization, data distribution, idle time reduction in message passing, load balancing, and compiler flag selection. Beyond tuning applications for performance, reducing energy consumption is becoming increasingly important in light of rising energy prices and the push towards exascale systems. Moreover, since many tuning actions depend on the input data, they must be verified for different data sets, which requires a large number of experiments.

The goal of the AutoTune project, which started in October 2011, is to develop an extensible tuning environment that automates the tuning process of applications. The framework, called the Periscope Tuning Framework (PTF), will focus on static tuning, i.e., it will identify tuning recommendations in dedicated application tuning runs. These recommendations can then be applied to optimize the code for later production runs.
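To make the idea of static tuning concrete, the following is a minimal sketch, not PTF's actual interface or search strategy. It assumes a hypothetical two-parameter search space (thread count and compiler optimization level) and a hypothetical `measure` function that returns a synthetic runtime in place of a real timed execution. A tuning run evaluates every configuration on every input data set and keeps the configuration with the lowest mean runtime as the recommendation, which a later production run would then reuse.

```python
from itertools import product
from statistics import mean

# Hypothetical tuning space with two illustrative parameters (not PTF's).
SEARCH_SPACE = {
    "threads": [1, 2, 4, 8],
    "opt_level": ["-O1", "-O2", "-O3"],
}

# Synthetic speedup factors per optimization level (assumed values).
SPEEDUP = {"-O1": 1.0, "-O2": 1.3, "-O3": 1.4}


def measure(config, dataset_size):
    """Hypothetical stand-in for one tuning experiment.

    A real tuner would build and run the application and time it; here the
    runtime is a synthetic model: work shrinks with threads and optimization,
    while a per-thread overhead grows with the thread count.
    """
    work = dataset_size / (config["threads"] * SPEEDUP[config["opt_level"]])
    overhead = 3.0 * config["threads"]
    return work + overhead


def static_tuning_run(datasets):
    """Evaluate every configuration on every data set (exhaustive search).

    Returns the configuration with the lowest mean runtime (the tuning
    recommendation), that mean runtime, and the number of experiments run.
    """
    names = list(SEARCH_SPACE)
    best_config, best_time = None, float("inf")
    experiments = 0
    for values in product(*(SEARCH_SPACE[n] for n in names)):
        config = dict(zip(names, values))
        times = [measure(config, d) for d in datasets]
        experiments += len(times)
        avg = mean(times)
        if avg < best_time:
            best_config, best_time = config, avg
    return best_config, best_time, experiments


if __name__ == "__main__":
    recommendation, runtime, n = static_tuning_run([40, 80, 160])
    print(f"{n} experiments, recommendation: {recommendation}")
    # A production run would now simply apply `recommendation`.
```

With three data sets, the 12 configurations already require 36 experiments, which illustrates why verifying tuning actions across inputs quickly becomes expensive. In this synthetic model the best thread count also differs per input (4 threads for a size-40 data set, 8 for size 160), mirroring the input dependence of tuning actions described above.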