Optimizing High Performance Loops for Embedded Processors


High Performance computing is becoming more and more used in embedded systems (multimedia applications, games, high resolution printers, etc.). Research topics in classical high performance computing had brought many brilliant results and advances during many decades. For instance, software pipelining methods become commonly used in most of the optimizing compilers now. However, such techniques do not consider yet the specific constraints of the embedded applications such that reduced code size and memory footprint. Establishing the fundamental relationship between code size and performance is still an open problem in computer science and academia in general, even if many ad-hoc techniques exist from the computer engineering perspective. Unfortunately, most of the existing ad-hoc techniques are simple heuristics that make tradeoffs between code size and performance whereas there does not exist any relevant study that proves that such tradeoff is mandatory. Furthermore, our recent experimental results with Pursuit scheduling (a software pipelining method) show that there is indeed much room for improvement (up to 35% reduction in makespan for Livermore loops), with identical or better initiation interval II than a reference compiler (intel icc). Knowing the optimal makespan and code size for a given II is an important aspect from either academic and design-space exploration perspectives.

The community really needs to formally describe the optimal optimization of software pipelining under code size constraints.  We provide here three main arguments:

  1. To be able to provide optimal solutions if really needed (the case of small embedded applications require it)
  2. To be able to provide much better scalable heuristics that tackle the real precise problem instead of tackling inaccurate problems as usually done with ad-hoc techniques
  3. To be able to measure and to compare the efficiency of all the actual and future heuristics

This last point becomes more and more important. Indeed, there exist many published software pipelining techniques under register, resource and sometimes code size constraints. All these publications claim experimental improvements but nobody can really check the real efficiency of such heuristics: no serious enough common benchmarks, static metrics only, variety of resource models, no reference implementations in a robust back-end, etc. However, if we are able to give a formal way to compute optimal solutions, it would be possible to compare all the existing techniques against the optimal solutions and hence we could objectively measure the efficiency of such methods.

Our research cluster will provide an optimizing compilation method that optimally reduces the code size of high performance loops (software pipelined loops) with the mathematical guarantee of non-loosing performance. Also, as a dual problem, we will investigate the optimal way to optimize the loop performance given a code size budget. Our project is based on many fundamental results that prove some of our assertions (see Refs below).

Références:

  • Sid-Ahmed-Ali Touati and Christine Eisenbeis. Early Periodic Register Allocation on ILP Processors.  Parallel Processing Letters, Vol. 14, No. 2, June 2004. World Scientific.
  • Benoit Dupont-de-Dinechin. Modulo Scheduling with Regular Unwinding. Discrete Optimization Methods in Production and Logistics (DOM). 2004, Omsk-Irkutsk, Russia.
  • Benoit Dupont-de-Dinechin. From Machine Scheduling to VLIW Instruction Scheduling. 2004, ST Journal of Research, number 2, volume 1.

Research cluster

Requested: € 14400
Granted: € 14400

Requested: € 8400
Granted: € 8400

  • 8400 euros : 6 months master fellowship (with all included taxes)
  • 6000 euros : travelling and visiting expenses



Requested: 24 month(s)
Granted: 0 month(s), starting on: Sat, September 30, 2006

TOUATI Sid (INRIA) (--colleague--)
ZAKS Ayal (IBM) (--member--)
DUPONT DE DINECHIN Benoit (STMicroelectronics) (--colleague--)

  • Albert Cohen, INRIA-Futurs, France.
  • Mon-Ping Wang, ST-microelectronics, Switzerland.