B.1.3 State of the Art
Unfortunately, general-purpose architectures and compilers are not suitable for the design of real-time, high-performance (massively parallel) programmable systems-on-chip. Indeed, to achieve tera-operations per second, the multi-core VLIW or superscalar architectures foreseen in the mid term will require both a high clock frequency and a large die area, with a power budget incompatible with most embedded markets. Achieving higher compute density while preserving programmability makes the choice of an appropriate architecture, programming language and compiler a challenge. Typically, a low-power design requires the clock frequency of the chip to be as low as possible; this means that hundreds of operations per cycle must be sustained in real time, exploiting the multiple levels of parallelism present in the application. This is particularly true for streaming applications, i.e., applications in which the system has to process, in real time, a continuous flow of information that needs to be transformed, such as video and audio applications.
Furthermore, the variety of processing tasks that a system has to perform may require heterogeneous multiprocessor systems-on-chip (MPSoCs), consisting of multiple tightly coupled processor cores, each aimed at handling specific tasks and types of concurrency, e.g., vector processing, data-dependent SIMD operations, and instruction-level parallelism in control-oriented parts of the code.
Currently, the typical approach to programming such highly parallel systems consists of many stages of empirical, error-prone, manual transformations, involving many engineers with specific skills and expertise. Much of the complexity of this approach comes from mismatches between the properties of the application domain and the execution model underlying general-purpose languages, compilers, and architectures:
| Application domain | Execution model |
|---|---|
In response to this issue, the Massachusetts Institute of Technology (Cambridge, MA, United States) proposed replacing traditional programming languages entirely, and introduced the StreamIt language. In StreamIt, streams and stream operators (so-called "filters") are the primitive elements of the language. This approach offers direct advantages, especially clearly defined semantics and opportunities for powerful program analysis. However, it also sets a very high barrier to the acceptance of the language in industry: as long as it remains unknown or unfamiliar to developers and software architects, it will be neither easily accepted nor deployed. Finally, while compiling StreamIt programs is simplified by the clear definition of the language, generating optimised code for a variety of actual hardware targets remains a task beyond the capacity of a research laboratory, yet it must be addressed before industrial deployment.
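To illustrate the kind of program analysis that a language built on streams and filters enables: when every filter declares fixed pop and push rates, a compiler can compute a steady-state schedule by solving balance equations, in the style of synchronous dataflow. The following Python sketch does this for a linear pipeline only. The function name `steady_state` and the `(pop, push)` encoding are hypothetical; the sketch ignores StreamIt's other composition constructs (split-joins, feedback loops) and peek rates, and it is neither StreamIt syntax nor StreamIt compiler code.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm needs Python 3.9+


def steady_state(pipeline):
    """Compute steady-state firing counts for a linear pipeline of filters.

    `pipeline` is a list of (pop, push) pairs, one per filter, in stream
    order. The pop rate of the first filter (the source) and the push rate
    of the last filter (the sink) are not used. The result is the smallest
    list of positive integers reps such that, for every channel i,
        reps[i] * push[i] == reps[i + 1] * pop[i + 1],
    i.e. each channel holds the same number of items before and after one
    steady-state iteration.
    """
    # Relative firing rate of each filter, normalised to the source (= 1).
    ratios = [Fraction(1)]
    for (_, push_prev), (pop_cur, _) in zip(pipeline, pipeline[1:]):
        if push_prev <= 0 or pop_cur <= 0:
            raise ValueError("inner channels need positive push/pop rates")
        ratios.append(ratios[-1] * Fraction(push_prev, pop_cur))
    # Clearing the denominators gives the smallest integer solution.
    scale = lcm(*(r.denominator for r in ratios))
    return [int(r * scale) for r in ratios]


if __name__ == "__main__":
    # Source pushes 2 items per firing; a 3:1 downsampler pops 3 and pushes 1;
    # the sink pops 2 items per firing.
    print(steady_state([(0, 2), (3, 1), (2, 0)]))  # [3, 2, 1]
```

In this example, three source firings and two downsampler firings feed each sink firing. Because such counts are known at compile time, a compiler can also bound channel buffer sizes statically, which is one reason fixed rates simplify code generation.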
Another source of complexity and cost is the heterogeneity of the targeted MPSoC systems. Currently, the program development flow for MPSoCs is split early on into processor-specific tasks, forcing fundamental design decisions early in the application development process. These decisions cannot easily be modified at later stages, and lead to substantial redesign costs and delays if and when they prove wrong. Furthermore, using multiple compilation chains implies additional costs, such as personnel training, duplication of required skills, and the purchase and maintenance of additional software tools.
These issues are best illustrated by the current approach to programming the Philips Nexperia "Viper" processor series (a MIPS core and one or two TriMedia cores) and the IBM Cell system (a PowerPC core and a set of Synergistic Processing Units, or SPUs). In both cases, two distinct compilers (from two distinct vendors, in the Philips case) are used to program the two types of processor cores present in the system. The programs for the control processor (MIPS and PowerPC, respectively) and for the stream-oriented cores (TriMedias and SPUs, respectively) are separated at source level and processed by two separate compilers. A recent research project on compiler technology for scalable architectures [1] aims at supporting SIMD and heterogeneous parallelism by automatically partitioning the data and code of a single source program among the different processors of Cell (one PPE and eight SPUs). However, this work is based on a proprietary compiler rather than the open-source GCC, and does not address an Abstract Streaming Machine model.
A solution acceptable to the industrial players will therefore have to combine two complementary factors: domain-oriented extensions to a programming language accepted in the industry, and efficient code generation for multiple, heterogeneous, tightly coupled processor cores from a single program representation.
While this would be a very ambitious venture for general-purpose applications, restricting it to stream-oriented applications and architectures yields a clearly defined task that responds to the current and future needs of the European semiconductor industry. This is why the ACOTES consortium combines expertise in streaming applications (Inria, Philips, ST), in representations of concurrency (Inria, Philips, UPC), in compiler construction (IBM, Inria, Philips, ST), and in high-performance stream-oriented embedded architectures (IBM, Philips, ST, UPC).
Coordination with the projects and Networks of Excellence discussed below is described in Section B3.6, "Other National and International Research Activities".
The topics of this project fully match those of the HiPEAC Network of Excellence (High Performance Embedded Architecture and Compilers). The ACOTES project will directly benefit from the communication infrastructure and activities of HiPEAC (conference, cluster meetings, summer school, and other dissemination and integration activities). Since most members of the consortium are also members of the network, HiPEAC events will provide additional opportunities for meetings among consortium members, as well as for discussion and exchange of ideas with other HiPEAC members; the project will also benefit from the dissemination and spreading-excellence activities of the NoE.
It must be noted that Networks of Excellence are not research projects themselves, but instruments intended to ease collaborative research among their members on a given set of topics. As such, the ACOTES project is evidence of HiPEAC's success in bringing together key players in the embedded compiler and technology area.
The SCALA Integrated Project will explore different alternatives for programming models on new chip multiprocessors, ranging from traditional MPI and OpenMP to newer proposals such as Cray's Chapel and IBM's X10. Most of the prototypes built in SCALA will run in the simulation environment of the new architecture proposed in that project.
In contrast, this STREP proposes to focus on one of these alternatives: a programming model oriented to streaming applications. The project will use an existing compiler technology (GCC 4) to develop the programming model and the code generation for some existing chip multiprocessors (Cell, and the Philips and ST platforms), ensuring that the final binaries run on actual machines. With five of the six partners of this project also participating in SCALA, there is a very good opportunity to go beyond pure research and transfer the technology into real products.
MilePost is a STREP on machine-learning compilation, led by the University of Edinburgh, that is being submitted concurrently by other members of the HiPEAC network (and by non-HiPEAC members). It will focus on improving the productivity of compiler designers, relying on machine learning techniques and iterative optimization to drive complex optimizations automatically. ACOTES will eventually benefit from the research and development carried out in this STREP, since some of its work will also target GCC (a decision coordinated by the HiPEAC network to favor the emergence of a competitive common platform for compilation research). However, no machine learning work will take place in ACOTES: the domain-specific nature of streaming applications and architectures makes classical static-model approaches more appropriate, benefiting from a custom-designed programming model, user annotations, and the Abstract Streaming Machine (ASM) description.
[1] http://researchweb.watson.ibm.com/cellcompiler/compiler.htm
