User login |
Optimizing High Performance Loops for Embedded ProcessorsHigh Performance computing is becoming more and more used in embedded systems (multimedia applications, games, high resolution printers, etc.). Research topics in classical high performance computing had brought many brilliant results and advances during many decades. For instance, software pipelining methods become commonly used in most of the optimizing compilers now. However, such techniques do not consider yet the specific constraints of the embedded applications such that reduced code size and memory footprint. Establishing the fundamental relationship between code size and performance is still an open problem in computer science and academia in general, even if many ad-hoc techniques exist from the computer engineering perspective. Unfortunately, most of the existing ad-hoc techniques are simple heuristics that make tradeoffs between code size and performance whereas there does not exist any relevant study that proves that such tradeoff is mandatory. Furthermore, our recent experimental results with Pursuit scheduling (a software pipelining method) show that there is indeed much room for improvement (up to 35% reduction in makespan for Livermore loops), with identical or better initiation interval II than a reference compiler (intel icc). Knowing the optimal makespan and code size for a given II is an important aspect from either academic and design-space exploration perspectives. The community really needs to formally describe the optimal optimization of software pipelining under code size constraints. We provide here three main arguments:
This last point becomes more and more important. Indeed, there exist many published software pipelining techniques under register, resource and sometimes code size constraints. All these publications claim experimental improvements but nobody can really check the real efficiency of such heuristics: no serious enough common benchmarks, static metrics only, variety of resource models, no reference implementations in a robust back-end, etc. However, if we are able to give a formal way to compute optimal solutions, it would be possible to compare all the existing techniques against the optimal solutions and hence we could objectively measure the efficiency of such methods. Our research cluster will provide an optimizing compilation method that optimally reduces the code size of high performance loops (software pipelined loops) with the mathematical guarantee of non-loosing performance. Also, as a dual problem, we will investigate the optimal way to optimize the loop performance given a code size budget. Our project is based on many fundamental results that prove some of our assertions (see Refs below). Références:
Research cluster Requested: € 14400 Granted: € 14400 Requested: € 8400 Granted: € 8400
Requested: 24 month(s) Granted: 0 month(s), starting on: Sat, September 30, 2006 TOUATI Sid (INRIA) (--colleague--) ZAKS Ayal (IBM) (--member--) DUPONT DE DINECHIN Benoit (STMicroelectronics) (--colleague--)
|