4th HiPEAC Industrial Workshop on Compilers and Architectures at Robinson College, Cambridge

November 26, 2007
Organized by ARM Ltd. in Cambridge, UK


Call for Papers

Find the call for papers here.


Program and Presentations

Topics

The main focus of this workshop is advanced embedded computer architecture and compiler technology. The topics of interest for this workshop include, but are not limited to:
  • Modern embedded architectures
  • High-performance low-power architectures
  • Ultra-low-power circuit and microarchitecture design techniques
  • Reliability and fault tolerance
  • Symmetric/asymmetric multicore, multithreading, superscalar, and VLIW architectures
  • Reconfigurable and soft-core computing
  • Compilers and programming tools for modern embedded systems
  • Dynamic translation and optimization
  • Parallel programming and concurrency support for multicore/multithreaded systems
  • Performance tools for embedded systems
  • Non-traditional embedded computing systems topics

Workshop Program


08:30 Arrival + Registration
09:00 - 09:10 Opening
SESSION 1: Binary, Compiler and Memory Optimization for Embedded Systems
09:10 - 09:35 Benoît Dupont de Dinechin, STMicroelectronics
09:35 - 10:00 Dominique Chanet, Jonas Maebe and Koen De Bosschere, Ghent University
10:00 – 10:25 Peter Marwedel, Heiko Falk, Sascha Plazar, Robert Pyka and Lars Wehmeyer, University of Dortmund and Informatik Centrum Dortmund (ICD)
10:25 – 10:50 Benedict R. Gaster, Clearspeed Tech.
10:50 – 11:20 COFFEE BREAK
SESSION 2: Language and Tool Support for Multicore Architectures
11:20 – 11:45 Philippe Bonnot, Sami Yehia, Arnaud Grasset, Eric Lenormand and Gilbert Edelin, Thales Research and Technology
11:45 – 12:10 Marina Biberstein, Moon S. Chang, Bilha Mendelson, Uzi Shvadron and Javier Turek, IBM Haifa
12:10 – 12:35 Alastair Donaldson, Colin Riley, Anton Lokhmotov and Andrew Cook, Codeplay Software and University of Cambridge
12:35 – 13:50 LUNCH + POSTER SESSION
13:50 - 14:35 Keynote Speech: Krisztian Flautner, Director of Research, ARM
SESSION 3: Dependable Computing
14:35 – 15:00 Ricardo Fernandez-Pascual, Jose M. Garcia, Manuel E. Acacio and Jose Duato, University of Murcia and University of Valencia
15:00 – 15:25 Veerle Desmet, Yiannakis Sazeides and Costas Vrioni, Ghent University and University of Cyprus
15:25 – 15:50 COFFEE BREAK
SESSION 4: Modeling and Simulation
15:50 – 16:15 Sanjay Jinturkar, Vitaly Kalashnikov, Mayan Moudgill, Gary Nacer and John Glossner, Sandbridge Technologies
16:15 – 16:40 Stefan Kraemer, Lei Gao, Rainer Leupers, Gerd Ascheid and Heinrich Meyr, RWTH-Aachen University
16:40 – 17:05 Veerle Desmet, Grigori Fursin, Sylvain Girbal and Olivier Temam, Ghent University and INRIA
17:05 – 17:25 COFFEE BREAK
SESSION 5: Relevant EU Projects
17:25 – 17:40 Mike O’Boyle, University of Edinburgh
17:40 – 17:55 Georgi N. Gaydadjiev, Delft University of Technology
17:55 - 18:00 Closing

POSTER SESSION details

Towards an Energy Efficient Branch Prediction Scheme Using Profiling
Michael Hicks, Colin Egan, Bruce Christianson and Patrick Quick, University of Hertfordshire, UK

Abstract: Dynamic branch predictors account for between 10% and 40% of a processor’s dynamic power consumption. This power cost is proportional to the number of accesses made to that dynamic predictor during a program’s execution. In this paper we propose the combined use of local delay region scheduling and profiling with an original adaptive branch bias measurement. The adaptive branch bias measurement takes note of the dynamic predictor’s accuracy for a given branch and decides whether or not to assign a static prediction for that branch. The static prediction and local delay region scheduling information is represented as two hint bits in branch instructions. We show that, with the combined use of these two methods, the number of dynamic branch predictor accesses/updates can be reduced by up to 62%. The associated average power saving is very encouraging; for the example high-performance embedded architecture, an average global processor power saving of 6.22% is achieved.
The ARISE Framework: Extending Processors with Arbitrary Hardware Accelerators
Nikolaos Vassiliadis, George Theodoridis, and Spiridon Nikolaidis, Aristotle University of Thessaloniki, Greece

Abstract: ARISE introduces a systematic approach for extending a processor once so that it can thereafter be coupled with an arbitrary number of Custom Computing Units (CCUs). A CCU can be a hardwired or reconfigurable unit, and it can be used under a hybrid (tight and/or loose) model of computation. By selecting the appropriate model of computation for each part of the application, the complete application space can be considered for acceleration, resulting in significant performance improvements. To support these features, we introduce a machine organization that allows the co-operation of a processor and a set of CCUs. To control the CCUs, the instruction set of the processor is extended with eight instructions. To efficiently incorporate these features into an embedded processor, a micro-architecture implementation that minimizes the control and communication overhead between the processor and the CCUs is introduced. To evaluate our proposal, we extended a MIPS processor with the ARISE infrastructure, implemented it on a Xilinx FPGA, and proved that the timing model of the processor is not affected. A set of benchmarks was implemented on the ARISE evaluation machine. Performance results prove that, by exploiting the hybrid model of computation, the ARISE machine achieves performance improvements of up to 68% compared to a typical approach.
Light-Weight SIMD Extension for Embedded Processors
Magnus Sjalander and Per Larsson-Edefors, Chalmers University of Technology, Sweden

Abstract: We present a light-weight SIMD extension for embedded general-purpose processors with negligible impact on delay and power dissipation. This is achieved by modifying existing functional units so that they support multiple-precision operations, and by limiting the number of added SIMD instructions. In particular, a twin-precision multiplier is utilized to support low-overhead SIMD multiplications. A MIPS-R2000-like processor is extended with the proposed light-weight SIMD support, and the performance estimates from placed-and-routed layouts in a 0.13-μm technology are subsequently analyzed. A SIMD-enabled version of EEMBC’s FFT benchmark shows that, on top of a dramatically reduced memory access activity, the total execution time and total energy are reduced by 15% and 14%, respectively.
Filtering drowsy caches to improve their performance
Paolo Bennati and Roberto Giorgi, University of Siena, Italy

Abstract: Leakage power in data cache memories represents a sizable fraction of total power consumption, and many techniques have been proposed to reduce it. During any fixed period of time, only a small subset of cache lines is used. The drowsy technique, for instance, puts unused lines into a drowsy state in order to save power. Our idea is to adaptively select the most used cache lines in order to keep the most used data always available. We found that this can be achieved automatically by using a tiny cache acting as a filter “L0” cache. Our main contributions are: i) evaluation of a filter cache to reduce leakage; ii) improvement of an existing power-saving technique. Our experiments, with the complete MiBench suite for an ARM-based processor, show (on average) a 10% improvement in leakage saving and 17% in leakage energy-delay versus drowsy-cache.
Automatic Parallelization in GCC
Razya Ladelsky, IBM Haifa, Israel

Abstract: With the emergence of multicore architectures there is a growing need for automatic parallelization, which transforms sequential code into multi-threaded code. OpenMP defines language extensions to C, C++, and Fortran for implementing multi-threaded shared memory applications. Generation of such extensions by the compiler relieves programmers from the manual parallelization process. The OpenMP specification has been implemented in GCC and integrated into version 4.2. The OpenMP infrastructure, together with existing data dependence analyses, served as the basic infrastructure for an automatic parallelization optimization recently implemented in GCC. The initial automatic parallelization work was contributed by Sebastian Pop and Zdenek Dvorak, and supports loops whose iterations are independent of each other. We later enhanced these capabilities to support loops with reduction dependence among the iterations, thereby parallelizing additional loops. These auto-parallelization contributions are being incorporated into the upcoming version 4.3. In this talk we summarize the existing OpenMP and data dependence infrastructures in GCC, then describe the current state of automatic parallelization in GCC, demonstrated by some examples. Finally, we discuss future directions of work that may further extend the optimization's applicability.
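
The two loop classes named in the abstract above can be illustrated with a minimal C sketch. It is not taken from the talk: the function names (vector_add, sum_serial, sum_openmp) are hypothetical, and the OpenMP version only shows by hand the kind of annotation the abstract says the compiler can generate automatically.

```c
/* Loop with independent iterations: a[i] depends only on b[i] and c[i],
 * so iterations can run in any order or concurrently. */
void vector_add(long *a, const long *b, const long *c, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop with a reduction dependence: every iteration updates the same
 * accumulator, so iterations are not independent as written. */
long sum_serial(const long *x, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Hand-written OpenMP form of the same reduction (illustrative only).
 * Each thread accumulates a private partial sum, and the partial sums
 * are combined at the end of the loop. If OpenMP is not enabled at
 * compile time, the pragma is ignored and the loop runs serially. */
long sum_openmp(const long *x, int n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```

With GCC, the hand-annotated version is enabled with -fopenmp. In GCC releases that include the auto-parallelizer, -ftree-parallelize-loops=N requests automatic parallelization of loops such as those in vector_add and sum_serial; whether a particular loop is actually transformed depends on the compiler version and on what its analyses can prove.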


Workshop Registration

The registration website is open for HiPEAC members at Online Registration.

For people who are not HiPEAC members, please click here to register.

Practical Information