Lecturers

The topics of this year's Summer School will be presented by the following world-class experts.

The Summer School is structured so that participants can interact intensively with the lecturers throughout the week (during meals, breaks, and evening activities). All lecturers will stay on campus for the full week.

Courses

The summer school consists of 12 courses spread over two morning slots and two afternoon slots. Per slot there are three parallel courses of which you can take only one. The number of students that can enroll in one course is limited to 70. When applying for admission, you will be asked to indicate your preference.

The courses have been allocated to slots so that it is always possible to create a summer school program that matches your research interests.

Slot    Lecturer            Course title
Slot 1  David Albonesi      Power- and Reliability-Aware Microarchitecture
        Walid Najjar        Opportunities and Challenges of Reconfigurable Computing
        Peter Puschner      WCET Analysis: Problems, Methods, and Time-Predictable Architectures
Slot 2  Jaejin Lee          Compilers and Runtimes Support for Explicitly Managed Memory Hierarchies
        Radu Marculescu     Networks-on-Chip: Why, What, and How?
        Paul McKenney       Performance, Scalability, and Real-Time Response From the Linux Kernel
Slot 3  David Wood          Transactional Memory
        Grant Martin        Practical System-Level Design Methodologies for Processor-Centric SoC and Embedded Systems
        Kim Hazelwood       Process Virtualization and Symbiotic Program Optimization
Slot 4  Bruce Jacob         Embedded Systems, Memory Systems, and Embedded Memory Systems
        J. (Ram) Ramanujam  Optimizations for multicore and GPGPU architectures
        Lieven Eeckhout     Performance Evaluation and Benchmarking

Course information

Power- and Reliability-Aware Microarchitecture

by David Albonesi
Abstract

Power consumption is the number one design constraint in many computing systems today, with reliability rapidly becoming a major issue as we continue to scale CMOS technology. Process and circuit designers can no longer address these issues alone; computer architects must devise power- and reliability-aware microarchitectures that meet performance goals while dramatically improving power-efficiency and reliability.

This course will introduce students to the issues of power and reliability in modern computer systems and the numerous solutions that have been proposed. The topics covered will include the following:

Bio

David Albonesi is an Associate Professor of Electrical and Computer Engineering at Cornell University. Before joining Cornell in 2004, he was on the faculty of the University of Rochester. He previously spent 10 years in the computer industry in technical and management positions before receiving his PhD in 1996 from the University of Massachusetts Amherst. He has received the National Science Foundation CAREER Award and IBM Faculty Awards and is currently Editor-in-Chief of IEEE Micro.


Opportunities and Challenges of Reconfigurable Computing

by Walid Najjar
Abstract

The recent growth in both the size and speed of FPGAs (Field Programmable Gate Arrays) has opened up tremendous opportunities for using them as spatial computing platforms, in the form of hardware accelerators, for applications including image and video processing, cryptography, bioinformatics, high-performance computing, molecular dynamics, databases, and information retrieval. These implementations have routinely demonstrated speedups of two or more orders of magnitude. This course looks at FPGAs as code accelerators: what opportunities they offer, what challenges must be overcome, and what role they can play in a post-Moore's Law era. Relevant topics include architecture, languages and compilation, run-time systems, algorithms, and data representation.

OUTLINE

Brief look at FPGA structure and architectures and Hardware Description Languages (HDLs)
Review of applications using FPGAs as code accelerators
Platforms for reconfigurable computing: SGI RASC, Intel QuickAssist, etc.
Programming FPGAs: HDLs or HLLs?
New algorithmic approaches for spatial computing?
Open research issues

Bio

Walid A. Najjar is a Professor in the Department of Computer Science and Engineering at the University of California Riverside. His research interests are in the fields of computer architecture, compiler optimizations and embedded systems. Lately, he has been very active in the area of compilation for FPGA-based code acceleration and reconfigurable computing. His research has been supported by NSF, DARPA and various industry sponsors. He received a B.E. in Electrical Engineering from the American University of Beirut in 1979 and the M.S. and Ph.D. in Computer Engineering from the University of Southern California in 1985 and 1988 respectively. He was on the faculty of the Department of Computer Science at Colorado State University from 1989 to 2000; before that, he was with the USC-Information Sciences Institute (1986 to 1989). He has served on the program committees for a number of leading conferences in this area including CASES, ISSS-CODES, DATE, HPCA, and MICRO. He is a Fellow of the IEEE.



WCET Analysis: Problems, Methods, and Time-Predictable Architectures

by Peter Puschner
Abstract

As embedded real-time computing systems find their ways into a steadily increasing number of safety- and time-critical applications, the timely operation of real-time computer systems must be guaranteed. Embedded systems engineers and researchers must therefore be familiar with problems in and methods for obtaining information about the timing of a real-time application, i.e., schedulability analysis and Worst-Case Execution-Time analysis (WCET analysis).

This course will provide participants with a solid understanding of the problems of WCET analysis and WCET-analysis methods. We will introduce the challenges in WCET analysis, present strategies for static and measurement-based WCET analysis, and discuss software and hardware architectures that improve the temporal predictability of applications.


An overview of the course is given below:

Lecture 1: WCET foundations, methods of static WCET analysis
Lecture 2: Modeling the timing of hardware 1: simple vs. complex architectures, pipelines, caches
Lecture 3: Modeling the timing of hardware 2: cache modeling revisited, timing anomalies
Lecture 4: Measurement-based WCET analysis
Lecture 5: Time-predictable software and hardware architectures

Bio

Peter Puschner is a professor in computer science at Vienna University of Technology. He received his PhD from Vienna University of Technology in 1994 and then worked as a research associate at TU Vienna. In 1999 Puschner became a professor. From February 2000 to February 2001 Puschner was a Marie-Curie research fellow at the University of York, England.

P. Puschner's research interests are in hard real-time systems for safety-critical applications, with a focus on the worst-case execution time (WCET) analysis of real-time programs and on software and hardware architectures for time-predictable computing. He has strongly influenced the state of the art in these research areas, published more than 100 refereed conference and journal papers, and was a guest editor for the special issue on WCET analysis of the Kluwer International Journal on Real-Time Systems in 2000.

P. Puschner has been a member of numerous program committees on embedded real-time systems, and was the program-committee chair of the IEEE International Symposium on Object-oriented Real-time distributed Computing (ISORC) in 2003 and of the Euromicro Conference on Real-Time Systems (ECRTS) in 2004. He was general chair of the Euromicro Conference on Real-Time Systems in 2002 and of ISORC in 2004. P. Puschner is a member of the Euromicro technical committee on real-time systems and of the steering committee for the Euromicro workshop series on worst-case execution time analysis. He chairs the steering committee of the IFIP Workshop series on Software Technologies for Embedded and Ubiquitous Computing Systems (SEUS). Peter Puschner is a member of the IEEE Computer Society, Euromicro, the Austrian Computer Society (OCG), and the Marie-Curie Fellowship Association.


Compilers and Runtimes Support for Explicitly Managed Memory Hierarchies

by Jaejin Lee
Abstract

Directly addressable memory structures (a.k.a. scratchpad memory or tightly-coupled memory) can be easily found today at a higher level in the memory hierarchies of many embedded processors and high-performance heterogeneous multicores (e.g., Cell BE and general-purpose GPUs). These memory hierarchies are explicitly managed by software. This course focuses on the management techniques of such memory hierarchies at the compiler or runtime level both for code and data. The management goals are ease of programming, small memory footprint, energy efficiency, and high performance.

This course is self-contained. The topics covered will include preliminaries for explicitly managed memory hierarchies, coherence and consistency issues, real-time issues, design and implementation issues of software-managed caches, compiler (postpass) analysis and optimization techniques, runtimes support, open-source tools (FaCSim and COMIC), and some open research issues.

Bio

Jaejin Lee is an associate professor in the School of Computer Science and Engineering at Seoul National University (SNU), Korea. He received his PhD degree in Computer Science from the University of Illinois at Urbana-Champaign (UIUC) in 1999. His PhD study was supported in part by graduate fellowships from IBM and the Korea Foundation for Advanced Studies. He received an MS degree in Computer Science from Stanford University in 1995 and a BS degree in Physics from Seoul National University in 1991. After obtaining his PhD degree, he spent half a year at UIUC as a visiting lecturer and postdoctoral research associate. He was an assistant professor in the Department of Computer Science and Engineering at Michigan State University from January 2000 to August 2002 before joining SNU.

He has published over 40 technical papers in conferences and journals on compilers, architectures, parallel processing, and embedded systems, including ISCA, HPCA, PACT, PPoPP, ICS, IPDPS, EMSoft, LCTES, and CASES. He has also served on program committees in these areas, including PACT, IPDPS, ICS, EMSoft, and LCTES. He is the general chair of LCTES 2010. He received the NSF CAREER Award but withdrew it before finalization due to his move to SNU. He is a member of ACM and IEEE. More information can be found at http://aces.snu.ac.kr


Networks-on-Chip: Why, What, and How?

by Radu Marculescu
Abstract

Nowadays, many embedded systems ranging from set-top boxes to mobile phones and PDAs are being designed using multiprocessor System-on-Chip (SoC) platforms which are difficult to optimize but offer the promise of flexibility, low cost, and time-to-market advantages. The abundance of the computational resources places tremendous demands on the communication infrastructure; this makes the bus-based (or point-to-point) communication inappropriate for complex designs involving tens or hundreds of IP cores. The Network-on-Chip (NoC) approach emerged recently as a promising solution to these complex communication problems.

This course will introduce participants to the emerging area of NoC design. Specifically, we plan to discuss performance models and optimization techniques that can be used to design different NoC architectures, while reasoning about performance, energy, and fault-tolerance tradeoffs. To better understand the advantages offered by the NoC approach, we plan to discuss a concrete NoC-based implementation of an MPEG-2 encoder and provide direct measurements using an FPGA prototype and actual video clips.

The topics covered will include the following:
Bio

Radu Marculescu received his Ph.D. in Electrical Engineering from the University of Southern California. He is currently a Professor in the Dept. of Electrical and Computer Engineering at Carnegie Mellon University, USA.

Dr. Marculescu has received the Best Paper Award of IEEE Transactions on VLSI Systems in 2005, as well as several best paper awards in major conferences in the area of design automation. Dr. Marculescu was the recipient of the CAREER Award from the National Science Foundation in 2000. Dr. Marculescu is currently an Associate Editor of IEEE Transactions on Computers, ACM Transactions on Design Automation of Embedded Systems, and ACM Transactions on Embedded Computing Systems. In the past, Dr. Marculescu was an Associate Editor of IEEE Transactions on Very Large Scale Integration Systems.

Dr. Marculescu has been involved in organizing many international symposia, conferences and workshops, and has served as guest editor of special issues in archival journals and magazines. He was the co-Founding General Chair of the IEEE Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia).

His research focuses on design methodologies and software tools for system-on-chip design, on-chip communication, and ambient intelligence. He has published extensively in these areas and contributed to several edited books.


Performance, Scalability, and Real-Time Response From the Linux Kernel

by Paul McKenney
Abstract

Embedded systems increasingly use multi-core processors, which increasingly means that embedded software must provide good performance and scalability on multi-core systems, while also meeting real-time-response requirements. This course will describe some of the primitives and capabilities provided by the Linux kernel to enable performance, scalability, and real-time response, as well as how to adapt these primitives and capabilities to open-source application-level software.


Lecture 1: Introduction to performance, scalability, and real-time issues on modern multicore hardware
Lecture 2: Performance and scalability technologies in the Linux kernel
Lecture 3: Creating performant and scalable Linux applications
Lecture 4: Real-time technologies in the Linux kernel
Lecture 5: Creating real-time Linux applications

Bio

Paul E. McKenney (http://www.rdrop.com/users/paulmck) is a Distinguished Engineer and the CTO of Linux at IBM. McKenney is an active member of the Linux kernel community, where he leads development on the read-copy update (RCU) synchronization mechanism and also contributes to its real-time capabilities. Prior to joining IBM, McKenney worked at Sequent Computer Systems on SMP and NUMA algorithms in the DYNIX/PTX operating-system kernel, which ran seven of the ten largest Oracle installations as of 1999. Prior to that, McKenney worked at SRI International on Internet and packet-radio research, prior to which he worked as an independent contractor on soft real-time systems.

McKenney holds more than 30 patents with 30 more pending, and has published more than 60 papers, including one book chapter, 12 journal articles, and 24 refereed conference papers, including a best-paper award. A number of these papers are cited in leading textbooks in the areas of parallel programming, operating systems, and networking.

McKenney received bachelor's degrees in Computer Science and in Mechanical Engineering from Oregon State University in 1981, a master's degree in Computer Science from Oregon State University in 1988, and a Ph.D. in Computer Science and Engineering from Oregon Health and Science University in 2004. His Ph.D. topic was RCU (http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf).


Transactional Memory

by David Wood, University of Wisconsin-Madison, USA
Abstract

Chip multiprocessors (a.k.a. multi-core chips) promise unprecedented computing power on a single chip. Yet to harness this power, most programmers rely on low-level programming constructs such as locks to coordinate multiple threads. Experience has shown that the resulting parallel programs are difficult to write, debug, and maintain, and often perform poorly as well.

Transactional memory is one promising approach to solving this problem. A TM system lets a programmer declare that a code region should appear atomic and rely on the system to make it so. A successful transaction commits, while an unsuccessful one (e.g., one that conflicts with a concurrent transaction) aborts and may transparently or explicitly retry. TM systems may be implemented with direct hardware support (fast but currently not completely virtualized), software only (virtualized but currently slow), or as hybrids where hardware accelerates a software TM system.

This class will introduce students to the fundamental concepts of transactional memory, discuss recent research in TM systems, and prepare students to contribute to future TM systems. Four lectures are planned:

  1. Shared-memory multiprocessor review, with emphasis on cache coherence via snooping or directories, and basic parallel programming.
  2. Origins of transactional memory (TM), including database management systems, IBM 801, original TM, and Speculative Lock Elision.
  3. Modern hardware transactional memory (HTM), including TCC, LogTM, and TokenTM.
  4. Software transactional memory (STM) and Hybrid transactional memory (HybridTM), including HyTM (i.e., Sun Rock) and USTM.

Bio

Prof. David A. Wood is a Professor in the Computer Sciences Department at the University of Wisconsin, Madison. Dr. Wood also holds a courtesy appointment in the Department of Electrical and Computer Engineering. Dr. Wood received a B.S. in Electrical Engineering and Computer Science (1981) and a Ph.D. in Computer Science (1990), both at the University of California, Berkeley. He joined the faculty at the University of Wisconsin in 1990.

Dr. Wood was named an ACM Fellow (2005) and IEEE Fellow (2004), received the University of Wisconsin's H.I. Romnes Faculty Fellowship (1999), and received the National Science Foundation's Presidential Young Investigator award (1991). Dr. Wood is Area Editor (Computer Systems) of ACM Transactions on Modeling and Computer Simulation, is Associate Editor of ACM Transactions on Architecture and Compiler Optimization, served as Program Committee Chairman of ASPLOS-X (2002), and has served on numerous program committees. Dr. Wood is a member of the IEEE Computer Society. He has published over 70 technical papers and is an inventor on twelve U.S. and International patents.

Dr. Wood co-leads the Wisconsin Multifacet project (http://www.cs.wisc.edu/multifacet) with Prof. Mark Hill. The project is exploring techniques for improving the availability, designability, programmability, and performance of commercial multiprocessor and chip multiprocessor servers. Research on transactional memory includes the LogTM, LogTM-SE, and TokenTM systems, which are supported by the widely-distributed Wisconsin GEMS full-system simulator.


Practical System-Level Design Methodologies for Processor-Centric SoC and Embedded Systems

by Grant Martin
Abstract

Over the past several years, a number of advanced system level design methodologies have moved from the speculative or research domain into real practical usage. Among these are application-specific instruction set processor (ASIP) customisation, use of multiple heterogeneous processors, system level modelling using SystemC for architectural design space exploration, the use of virtual platforms or prototypes for early software development, linking algorithmic modelling to processor customisation and HW-SW tradeoffs, modelling and specification of advanced interconnect, and high-level synthesis. Although most of these technologies still have areas for further development, they can be used in very pragmatic ways as part of an advanced design methodology for SoCs and embedded systems today. This course will concentrate on defining what the usable state of the art is in these areas, and build towards a set of practical design methodologies that can be immediately applied. They will be illustrated with examples drawn from real processor-centric subsystem designs in multimedia, signal and image processing and wireless and wired communications.

Topics:
Bio

Grant Martin is a Chief Scientist at Tensilica, Inc. in Santa Clara, California. Before that, Grant worked for Burroughs in Scotland for 6 years; Nortel/BNR in Canada for 10 years; and Cadence Design Systems for 9 years, eventually becoming a Cadence Fellow in their Labs. He received his Bachelor's and Master's degrees in Mathematics (Combinatorics and Optimisation) from the University of Waterloo, Canada, in 1977 and 1978.

Grant is a co-author or co-editor of nine books dealing with SoC design, SystemC, UML, modelling, EDA for integrated circuits and system-level design, including the first book on SoC design published in Russian. His most recent book, ESL Design and Verification, written with Brian Bailey and Andrew Piziali, was published by Elsevier Morgan Kaufmann in February 2007.

He was co-chair of the DAC Technical Programme Committee for Methods for 2005 and 2006. His particular areas of interest include system-level design, IP-based design of system-on-chip, platform-based design, and embedded software. Grant is a Senior Member of the IEEE.


Process Virtualization and Symbiotic Program Optimization

by Kim Hazelwood
Abstract

Process-level virtualization systems, such as Pin, Valgrind, and DynamoRIO, have proven to be extremely valuable for program introspection, architectural simulation, bug detection, and security enforcement. These systems enable transparent access to processor and memory state after every executed application instruction. Users may gather arbitrary statistics about the run-time behavior of an executing application to locate inefficiencies or errors without perturbing the normal execution of that application. More recently, researchers have taken advantage of the transparent nature of process virtualization to do more than just inspect program behavior. The same frameworks have been used to implement runtime optimizations, run-time enforcement of security policies, and run-time adaptation to environmental factors, such as power and temperature. Process virtualization is therefore a significant opportunity that comes with significant implementation challenges.

In this course, I will provide an overview of process-level virtualization, discuss several implementation considerations, and cover various applications of the technology. One application I will cover in detail is symbiotic program optimization. Research efforts in optimizing computer systems have historically targeted a single logical layer in the system stack, be it application code, operating systems, virtual machines, microarchitecture, or circuits. An effective way to mitigate the complexity of modern designs is to employ a holistic solution that considers multiple layers of hardware and software in conjunction, allowing software to adapt and react to changing system conditions at run time. A move toward cross-layer design means that one layer will be required to collate, analyze, and respond to the various inputs from multiple design layers in order to orchestrate a cohesive solution, and a virtualization layer is an ideal place to perform this orchestration.

Bio

Kim Hazelwood received her Ph.D. in computer science from Harvard University in 2004, and has been an Assistant Professor at the University of Virginia since 2005. Her research lies at the interface of hardware and software, where she focuses on virtual execution environments, their applications, and their implementation. Prior to joining UVa, Kim held a post-doctoral position with the Intel Pin team, and she continues to collaborate with Intel as a faculty consultant. She has also contributed to several similar projects over 10 years, including HP Dynamo, CarbonFire, DELI, DynamoRIO, and IBM Jikes RVM. Kim has published over 25 peer-reviewed articles relating to architecture, compilers and virtual machines. She has served on over a dozen program committees, including PLDI, MICRO, and PACT, and is the program chair of CGO 2010. She is the recipient of numerous awards, including the FEST Distinguished Young Investigator Award for Excellence in Science and Technology, an NSF CAREER Award, a Woodrow Wilson Career Enhancement Fellowship, and research awards from Microsoft, Google, NSF, and the SRC.


Embedded Systems, Memory Systems, and Embedded Memory Systems

by Bruce Jacob
Abstract

One of the primary figures of merit that distinguishes embedded systems from general-purpose systems is amenability to design-time analysis, because the degree to which design-time assumptions match run-time behaviors often dictates (and limits) the correctness and precision of an embedded system. Whereas general-purpose systems can often tolerate run-time variances from design-time assumptions, most embedded systems cannot. For instance, most general-purpose systems can tolerate long or variable-length memory latencies; the only ramification is reduced performance. By contrast, run-time memory latencies that fail to match design-time expectations can be catastrophic to an embedded system.

A problem is that main memory systems today, especially those using commodity memory controllers, are as complex and as unpredictable (at design time) as out-of-order microprocessors. They exhibit multiple levels of caching, arbitration, queueing, and sophisticated scheduling. They are deeply pipelined and can reorder requests to a significant degree. One consequence is that real-world latency distributions look more like pink noise than spikes or bi-modal distributions. This is problematic because most simulators and analytical performance models assume simple latency distributions and simple behavior ... with obvious ramifications for the design of embedded systems when using these tools.

This course will cover the fundamentals of embedded systems and their design, a tutorial on memory systems, in-depth discussion of how the two conflict, and some potential mitigation techniques.

Bio

Bruce Jacob is an Associate Professor and the Director of Computer Engineering in the Dept. of Electrical and Computer Engineering at the University of Maryland, College Park. He received his Ars Baccalaureate, cum laude, in Mathematics from Harvard University in 1988, and his M.S. and Ph.D. in Computer Science and Engineering from the University of Michigan in 1995 and 1997, respectively. In addition to his academic credentials, he has extensive experience in industry: he designed real-time embedded applications and real-time embedded architectures in the area of telecommunications for two successful Boston-area startup companies, Boston Technology (now part of Comverse Technology) and Priority Call Management (now part of uReach Technologies). At Priority Call Management he was employee number 2, the system architect, and the chief engineer.

Jacob's work in advanced DRAM architectures at Maryland is the first comparative evaluation of today's memory technologies, and he received the prestigious CAREER Award in 1999 from the National Science Foundation for his early work in that area. Honors for his teaching include the departmental George Corcoran Award, the University of Maryland Award for Teaching Excellence, and his 2006 induction as a Clark School of Engineering Keystone Professor. He has published over 50 papers on a wide range of topics, including computer architecture and memory systems, low-power embedded systems, electromagnetic interference and circuit integrity, distributed computing, astrophysics, and algorithmic composition. His recently published book on computer memory systems (Jacob, Ng, and Wang: Memory Systems -- Cache, DRAM, Disk, Morgan Kaufmann Publishers, Fall 2007) is large enough to choke a small elephant.


Optimizations for multicore and GPGPU architectures

by J. (Ram) Ramanujam
Abstract

On-chip parallelism with multiple cores is now ubiquitous. Because of power and cooling constraints, recent performance improvements in both general-purpose and special-purpose processors have come primarily from increased on-chip parallelism rather than increased clock rates. Parallelism is therefore of considerable interest to a much broader group than developers of parallel applications for high-end supercomputers. Several programming environments have recently emerged in response to the need to develop applications for GPUs, the Cell processor, and multi-core processors from AMD, IBM, Intel etc. As commodity computing platforms all go parallel, programming these platforms in order to attain high performance has become an extremely important issue. There has been considerable recent interest in using compiler optimization frameworks to automatically transform sequential programs for parallel execution. This course will provide an extensive overview of this topic in addition to addressing the issue of programming models.

This course will focus on source-to-source compiler optimizations aimed at multicore and general-purpose GPU architectures. Concerning multicore, we will discuss (i) dependences, (ii) transformations, (iii) polyhedral models and tiling, (iv) simdization, and (v) locality and parallelism optimizations. Concerning GPUs, we will discuss (i) performance characterization, (ii) memory access optimizations, (iii) scratchpad management, and (iv) multi-level parallelism optimization.

Bio

J. Ramanujam received the B. Tech. degree in Electrical Engineering from the Indian Institute of Technology, Madras, India in 1983, and his M.S. and Ph.D. degrees in Computer Science from The Ohio State University in 1987 and 1990 respectively. He is currently the John E. and Beatrice L. Ritter Distinguished Professor in the Department of Electrical and Computer Engineering at Louisiana State University (LSU), Baton Rouge, Louisiana, USA. In addition, he is on the faculty at the Center for Computation and Technology at LSU. His research interests are in compilers for high-performance computer systems, embedded systems, software optimizations for low-power computing, high-level hardware synthesis, parallel architectures and algorithms. He has led and participated in several NSF-funded projects. Additional details can be found at: http://www.ece.lsu.edu/jxr/


Performance Evaluation and Benchmarking

by Lieven Eeckhout
Abstract

Performance evaluation and benchmarking are at the heart of experimental computer science and engineering research and development, both in software and hardware. Researchers use benchmarking to evaluate the impact on performance of their novel research ideas; developers benchmark products under development to assess their performance; and market analysts compare commercial products based on published performance numbers. As such, it is absolutely crucial to have a rigorous and effective performance evaluation and benchmarking toolbox; inappropriate tools may lead to incorrect conclusions in practice.

This course will cover a number of performance evaluation and benchmarking tools, applicable to both software and hardware researchers and developers. The four main topics of this course are:

  1. analytical performance modeling: a mechanistic model will be presented that provides deep insight into superscalar processor performance.
  2. statistical simulation: a paradigm for capturing large workloads in very small synthetic benchmarks.
  3. workload composition: a methodology for composing representative benchmark suites.
  4. rigorous performance evaluation and benchmarking of managed runtime systems and multi-program workloads.

Bio

Lieven Eeckhout is an assistant professor at Ghent University, Belgium, and is a postdoctoral fellow with the Fund for Scientific Research Flanders (FWO). He received his PhD degree in computer science and engineering from Ghent University in Dec 2002. His main research interests include computer architecture, virtual machines, performance modeling and analysis, simulation methodology, and workload characterization. He has published papers in top conferences such as ISCA, ASPLOS, HPCA, OOPSLA, PACT, CGO, DAC and DATE; he has served on multiple program committees including ISCA, PLDI, HPCA and IEEE Micro Top Picks; and he is the program chair for ISPASS 2009. His work on hardware performance counter architectures was selected by IEEE Micro Top Picks from 2006 Computer Architecture Conferences as one of the "most significant research publications in computer architecture based on novelty and industry relevance". He has graduated 5 PhD students, and currently supervises one postdoctoral researcher and 5 PhD students.