Multicore Programming Models and their Compilation Challenges

Vivek Sarkar, Rice University


Abstract

The computer industry is at a major inflection point in its hardware roadmap due to the end of a decades-long trend of exponentially increasing clock frequencies. It is widely agreed that spatial parallelism in the form of multiple power-efficient cores must be exploited to compensate for this lack of frequency scaling. Unlike previous generations of hardware evolution, this shift towards homogeneous and heterogeneous manycore computing will have a profound impact on software. Two complementary compiler approaches address this problem: 1) compilation and optimization of explicitly parallel programs, and 2) automatic extraction of parallelism from sequential programs. This course addresses the first approach; the second is covered in the course titled "Compilation for Multicore Processors" by Prof. Scott Mahlke.

In this course, we will start with a brief overview of modern programming models for multicore processors, including Cilk, CUDA, Java threads, and OpenMP 3.0. We will examine these programming models from a compiler viewpoint and identify a common set of primitives suitable for use in parallel intermediate representations (PIRs) for multicore programs. These primitives (async, finish, isolated, phasers, places) are derived from the X10 language and are directly embodied in the pedagogical Habanero-Java (HJ) language developed at Rice University.
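
To make the async/finish pair concrete, here is a minimal sketch that emulates a single finish scope in plain Java using the JDK's java.util.concurrent.Phaser. In HJ one would write roughly `finish { async S1; async S2; }` and let the runtime track the spawned tasks. The `Finish` class, its `async` and `end` methods, and the `sumToN` demo are hypothetical names chosen for illustration. They are not HJ's actual API or runtime implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicLong;

public class AsyncFinishSketch {
    /** Hypothetical helper (not an HJ or JDK API) modelling one finish scope. */
    static final class Finish {
        private final ExecutorService pool;
        // One party for the thread that opened the scope; each async adds one more.
        private final Phaser phaser = new Phaser(1);

        Finish(ExecutorService pool) { this.pool = pool; }

        /** Analogue of `async`: run body on the pool, tracked by this scope. */
        void async(Runnable body) {
            phaser.register();                    // register before the task can start
            pool.execute(() -> {
                try {
                    body.run();                   // body may itself call async on this scope
                } finally {
                    phaser.arriveAndDeregister(); // task done: stop tracking it
                }
            });
        }

        /** End of the `finish` block: wait until every tracked task has completed. */
        void end() {
            phaser.arriveAndAwaitAdvance();
        }
    }

    /** Sums 1..n with two async tasks inside one finish scope. */
    static long sumToN(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            AtomicLong total = new AtomicLong();
            int mid = n / 2;
            Finish f = new Finish(pool);                       // finish {
            f.async(() -> total.addAndGet(range(1, mid)));     //   async: sum 1..mid
            f.async(() -> total.addAndGet(range(mid + 1, n))); //   async: sum mid+1..n
            f.end();                                           // } join both tasks
            return total.get();
        } finally {
            pool.shutdown();
        }
    }

    /** Sequential sum of the integers lo..hi (inclusive). */
    static long range(long lo, long hi) {
        long s = 0;
        for (long i = lo; i <= hi; i++) s += i;
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sumToN(1000)); // prints 500500
    }
}
```

The design choice mirrors the PIR view of finish as a join point. Every async increments an outstanding-task count before the task starts. The end of the scope then blocks until that count drains to zero, so the compiler and runtime can treat the whole finish block as one synchronization edge.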

The remainder of the course focuses on compilation challenges for parallel programs at the PIR level. The historical foundations of code optimization, including intermediate representations, data flow analyses, and optimizing transformations, are deeply entrenched in the von Neumann model of sequential computing and have to be reworked for parallelism. We summarize the state of the art in analysis and optimization of parallel programs by covering the following topics:

  1. Intermediate Representations for Parallel Programs
  2. Data Flow Analysis Frameworks for Parallel Programs
  3. Memory Models and their impact on Code Optimization
  4. Privatization and Escape Analyses
  5. Optimization of Task Granularity and Synchronization

Bio

Vivek Sarkar is the E.D. Butcher Professor of Computer Science at Rice University. He conducts research in multiple aspects of parallel software, including programming languages, program analysis, compiler optimizations, and runtimes for parallel and high-performance computer systems. He currently leads the Habanero Multicore Software Research project at Rice University, serves as Associate Director of the NSF-funded Center for Domain-Specific Computing, and is co-PI on the DARPA-funded Platform-Aware Compilation Environment (PACE) project. Prior to joining Rice in July 2007, Vivek was Senior Manager of Programming Technologies at IBM Research. His past projects include the X10 programming language, the Jikes Research Virtual Machine, the ASTI optimizer used in IBM's XL Fortran product compilers, the PTRAN automatic parallelization system, and profile-directed partitioning and scheduling of Sisal programs. Vivek became a member of the IBM Academy of Technology in 1995, the E.D. Butcher Professor of Computer Science at Rice University in 2007, and was inducted as an ACM Fellow in 2008. He holds a B.Tech. degree from the Indian Institute of Technology, Kanpur, an M.S. degree from the University of Wisconsin-Madison, and a Ph.D. from Stanford University. In 1997, he was on sabbatical as a visiting associate professor at MIT, where he was a founding member of the MIT RAW multicore project.