Research on High Performance Interconnection Networks for Embedded Applications


General Topic:

After hitting the power dissipation wall, the computer industry moved to multi-core processing chips in order to continue increasing the computing speed while having a bounded power consumption budget. The embedded processor industry soon followed the same trend, mostly motivated by the need to minimize power and energy consumption.

Although the number of cores in current processing devices is rather small (i.e., two to eight cores per chip), this trend is expected to continue for many years, with some recent announcements claiming that 80-core chips will be available on the market within 5 years. Such a large number of cores requires a high-performance interconnection network to efficiently connect the cores to one another and to cache blocks and/or memory controllers.

The lower speed of embedded devices makes it easier to interconnect the cores within a chip, a shared bus being enough for current designs. However, although a high-performance shared bus can provide enough communication bandwidth to fulfill the requirements of embedded multi-core chips during the next five to ten years, switched interconnects with point-to-point links have the potential to drastically reduce power consumption with respect to shared buses. The reason is that packets are transmitted only to the required destination (instead of being broadcast through the shared medium, with every node checking whether the packet is addressed to it), and they travel through links with much lower parasitic capacitance than a shared bus. Thus, research on switched interconnection networks is crucial for the success of future embedded systems.

Various European researchers have been working on interconnection networks for a varying number of years. This cluster aims to bring most of these researchers together and to promote collaboration among them on interconnection networks applied to embedded systems. Although this is the first time this cluster is applied for, it is intended to start a series of clusters that will tighten the relationship among the European researchers working on interconnection networks.

Within HiPEAC, these topics constituted part of the subject of the "Scalable Systems Architecture" Cluster. That cluster is now being divided into two new clusters, because its subject was found to be too wide: this is the first of these two new clusters, the other one being "Interprocessor Communication Mechanisms".

Besides the specific topics that will be covered in this cluster (see below), all the participants are interested in exchanging ideas and working together. For instance, the group at Poznan University of Technology (PUT) works on packet scheduling algorithms for single-stage (crossbar) and multistage (Clos-type) packet switches. The aim of their current work is to study the possibilities of implementing the proposed algorithms and solutions in interconnection networks for multiprocessor systems. Therefore, contacts, information exchange and meetings with people working on interconnection networks for multiprocessor systems will help them to gain experience in this field and to consider possible future collaborations focused on implementing the proposed algorithms.

In this first cluster proposal, three well-defined research lines on interconnection networks have already been identified, and the first contacts among the interested participants have already taken place. In particular, in this first cluster we intend to study Networks-on-Chip from the routing, latency-power trade-off, and task mapping perspectives. We will also study high-radix switch organizations for off-chip networks.

In this cluster we apply for general meetings among all the participants and also for some local meetings on the already identified research topics. In addition, several related cluster proposals will be submitted concurrently, requesting travel fellowships to work on the specific topics described below.

Specific Topics:

Systems-on-Chip (SoCs) are becoming increasingly large, complex, and heterogeneous. Future SoCs will integrate hundreds of heterogeneous intellectual property (IP) cores, making the on-chip interconnection system a key issue. The International Technology Roadmap for Semiconductors foresees that the interconnection system will be the limiting factor for performance and power consumption in next-generation SoCs. In fact, the on-chip interconnection system is one of the major elements that have to be optimized when designing a complex digital system.

Networks-on-Chip (NoCs) are considered the answer to the growing communication demand of such complex systems. They are generally viewed as the ultimate solution for the design of modular and scalable communication architectures, and they provide inherent support for the integration of heterogeneous cores through the standardization of the network boundary. NoC architectures loosen the delay bottleneck in signal propagation across deep-submicron interconnects and are likely to improve design predictability, although their area and power overheads remain critical issues to be addressed by research.

The NoC paradigm brings networking issues to the on-chip domain. Many of the approaches and techniques used in the NoC domain are inspired by the classical computer network domain and, more frequently, by the interconnection networks for parallel computers. Although these worlds share several design goals (such as performance optimization, quality of service, connectivity, and reliability), in the NoC world new design goals such as power dissipation, energy consumption, and silicon area must also be considered.

Many factors affect the overall performance of a NoC. Network topology, flow control mechanism, switching technique and routing algorithm are just a few of them. This proposal focuses mainly on the following three topics:

- Routing algorithms,
- Task mapping,
- Optimized routing unit implementations.

PROJECT 1: Routing in NoCs (UPV, UNICT, Jonkoping, UV)
------------------------------------------------------

A routing algorithm determines the path a packet follows to reach its destination. Wormhole switching is widely regarded as the most suitable switching technique for on-chip communication. Unfortunately, with wormhole switching, routing is very prone to deadlock because messages are allowed to hold many resources while requesting others. In general-purpose platforms, freedom from deadlock is achieved at a high loss of adaptivity. A routing algorithm with high adaptivity has the potential to achieve high performance (low latency, low packet drop and high throughput), fault tolerance and a more uniform utilization of network resources.
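To make the trade-off between deadlock freedom and adaptivity concrete, the following minimal sketch shows dimension-order (XY) routing on a 2D mesh. XY routing is a standard textbook scheme and is used here only as an illustration; it is not one of the algorithms proposed in this project. The function name and the (x, y) node encoding are assumptions made for the example.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst under XY routing.

    Nodes are (x, y) tuples on a 2D mesh. The packet first travels along
    the X dimension and then along the Y dimension, so it never turns
    from a Y channel back into an X channel.
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Because the turn from Y back to X is never taken, XY routing is deadlock-free on a mesh, but every source-destination pair is restricted to a single path, which is the loss of adaptivity referred to above.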

Current general-purpose routing algorithms have been designed without any reference to the characteristics of the communication traffic. The only requirement they have to satisfy is complete connectivity, that is, the possibility for any node of the network to send a message to any other node of the network. Although this is a necessary requirement in a general-purpose scenario, it is too strong for an application-specific scenario where, often, there are several pairs of network nodes that never communicate. The weaker connectivity requirements of such a scenario allow some of the hypotheses assumed during the design of deadlock-free routing algorithms to be relaxed, which translates into more freedom in the routing optimization process. Other domain-specific information can also be exploited to optimize the communication infrastructure. For example, communication scheduling information, available after the task mapping and task scheduling phases of the design flow, can be exploited to further improve communication performance.
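The effect of relaxed connectivity can be sketched with a channel dependency graph, a standard tool for reasoning about deadlock in wormhole networks: if the graph is acyclic, the routing function is deadlock-free. The example below is only an illustration under assumed names (`channel_dependencies`, `has_cycle`) and an assumed toy topology (a unidirectional four-node ring); it is not the method proposed in this project.

```python
from collections import defaultdict


def channel_dependencies(paths):
    """Build the channel dependency graph induced by a set of paths.

    Each path is a list of nodes; a channel is a directed link (u, v).
    A dependency c1 -> c2 exists when some path uses c2 right after c1.
    """
    deps = defaultdict(set)
    for path in paths:
        channels = list(zip(path, path[1:]))
        for c1, c2 in zip(channels, channels[1:]):
            deps[c1].add(c2)
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every channel starts WHITE

    def visit(channel):
        colour[channel] = GREY
        for nxt in deps.get(channel, ()):
            if colour[nxt] == GREY:
                return True  # back edge: cyclic dependency
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[channel] = BLACK
        return False

    return any(colour[c] == WHITE and visit(c) for c in list(deps))


# Hypothetical traffic on a unidirectional ring 0 -> 1 -> 2 -> 3 -> 0.
all_pairs = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
app_pairs = [[0, 1, 2], [2, 3, 0]]  # only the pairs the application uses

if __name__ == "__main__":
    print(has_cycle(channel_dependencies(all_pairs)))
    print(has_cycle(channel_dependencies(app_pairs)))
```

With all four two-hop flows the dependencies close a cycle around the ring, whereas the reduced set of communicating pairs leaves the graph acyclic, so the same paths can be used without further restrictions.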

In this subproject we aim to study application-specific routing algorithms for NoCs and their efficient implementation in terms of latency, area, and power consumption. We will also analyze and explore alternatives for implementing efficient task mapping algorithms in order to improve performance and reduce cost and power.

PROJECT 2: Latency-power trade-off in the design of Network on Chip topologies (UNIBO, UNIFE, UPV, Simula)
----------------------------------------------------------------------------------------------------------

In this project, the basic idea is to explore competing NoC topologies for a given system specification (i.e., a given number of communication initiators and targets) at different levels of abstraction. The high-level analysis will provide an indication of candidate topologies, spanning the latency-area (or wiring) trade-off. Consider, for instance, the different trade-offs spanned by a mesh and a torus topology.
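As a rough illustration of such a high-level comparison, the sketch below computes the average minimal hop count and the number of bidirectional links for a k x k mesh and a k x k torus. Hop count and link count are used here only as simple proxies for latency and wiring; the function names are assumptions made for the example, and k is assumed to be at least 3 so that the wrap-around links of the torus are distinct from the mesh links.

```python
from itertools import product


def avg_hops(k, wrap):
    """Average minimal hop count over all ordered pairs of distinct nodes."""
    def dist(a, b):
        d = abs(a - b)
        # On a torus the packet may also go around the wrap-around link.
        return min(d, k - d) if wrap else d

    nodes = list(product(range(k), repeat=2))
    total = sum(dist(ax, bx) + dist(ay, by)
                for (ax, ay) in nodes for (bx, by) in nodes)
    return total / (len(nodes) * (len(nodes) - 1))


def link_count(k, wrap):
    """Bidirectional links: k rows and k columns, each with k-1 links
    (mesh) or k links (torus, one extra wrap-around link per row/column)."""
    per_dimension = k * (k if wrap else k - 1)
    return 2 * per_dimension


if __name__ == "__main__":
    for name, wrap in (("mesh", False), ("torus", True)):
        print(name, link_count(4, wrap), round(avg_hops(4, wrap), 3))
```

The torus reaches a lower average hop count at the price of additional (and physically longer) links, which is the kind of latency-wiring trade-off the high-level analysis is meant to expose.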

However, if we move to a lower-level analysis, we can introduce architectural or even physical-level design trade-offs into the comparison framework. In particular, the lower latency of a given topology with respect to another one can be exploited to tune the synthesis process of the switches of that topology for low power. This will come at the expense of performance, thus offsetting the inherent performance benefits of that topology. Moreover, any wiring overhead of the lower-latency topology will be accounted for in the synthesis backend.

Overall, while meeting the same (or equivalent) performance requirements, the different candidate topologies will incur different power dissipations. This latency-power trade-off exploration of alternative NoC topologies will be the specific objective of this project.

The condition for the above exploration to be technically feasible is to have a NoC architecture that can be tuned either for low latency or for low power. UNIFE and UNIBO have strong expertise in this domain, and some ongoing joint (and/or partially overlapping) research activities. The main idea is to have network components with short critical paths, which can therefore be easily optimized for low latency (i.e., high operating frequency) or for low power (by applying netlist transformations and high-Vt library cells). The development of a suitable synthesis methodology spanning the logic synthesis, floorplanning and place-and-route steps is part of the research effort. The network protocol might also change as an effect of the specific optimization (e.g., circuit switching vs. packet switching).

The component level latency-power trade-off exploration should be carried out both for switches and for network interfaces, in order to get a global assessment of network trade-offs. Such exploration will involve the development of cycle-accurate functional models of network components with corresponding synthesizable views in SystemC and/or Verilog/VHDL.

We will also study and propose methodologies for dynamically turning off cores that are not in use.

In this subproject we aim at:

- Topology exploration with physical implementation awareness
- High-level analysis of NoC topologies
- Development of functional models of network components
- Latency-power trade-off exploration for switch implementation
- Latency-power-area trade-off for network interface implementation
- Topology-level trade-off analysis

PROJECT 3: High-Radix Switch Organization (FORTH and UPV)
----------------------------------------------------------

Crossbar switches with a large number of ports ("high-radix" switches) are used to build multi-chip interconnection networks, and are occasionally used in networks-on-chip as well. Their advantage is that they minimize hop count, and thus energy consumption as well; their drawback is that the cost of a crossbar grows quadratically with the number of ports. In some cases, buffered crossbars are advantageous; FORTH has worked on reducing their cost by reducing the size of crosspoint buffers. In other cases, input/output buffered switches are preferable; UPV has worked on reducing their cost by partitioning their internal organization into separate wires going to odd/even outputs. A doctoral student of FORTH (George Passas) will visit UPV for a couple of extended periods, and will work on combining the two schemes and on proposing and studying new variants.
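The hop-count versus crossbar-cost trade-off can be illustrated with a small sketch. It assumes a butterfly-style multistage network in which each stage multiplies the number of reachable endpoints by the switch radix, and it counts crosspoints as radix squared; both the model and the function names are assumptions made for this example, not figures from the FORTH or UPV designs.

```python
def crossbar_crosspoints(radix):
    """Crosspoints of a radix x radix crossbar (quadratic in the radix)."""
    return radix * radix


def stages_needed(endpoints, radix):
    """Stages needed to reach all endpoints when each stage multiplies
    the number of reachable endpoints by the switch radix."""
    stages, reach = 0, 1
    while reach < endpoints:
        reach *= radix
        stages += 1
    return stages


if __name__ == "__main__":
    for radix in (8, 64):
        print(radix, stages_needed(4096, radix), crossbar_crosspoints(radix))
```

Under these assumptions, moving from radix 8 to radix 64 halves the number of switch traversals for 4096 endpoints while multiplying the crosspoint count of each switch by 64, which is why reducing the internal cost of high-radix switches matters.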


Research cluster

Requested: € 37800

Requested: € 0

Meetings for PROJECT 1
----------------------
Meeting at UPV
For UNICT: 1 person (Maurizio)
For UPV: 1 invited person (Shashi)

Meeting at UPV
For UNICT: 1 person (Maurizio)
For UPV: 1 invited person (Shashi)

Meeting at Jonkoping
For UPV: 2 persons (Jose and Pepe)

Meetings for PROJECT 2
----------------------
Meeting at UPV
For UPV: 1 person (Davide)
For SIMULA: 2 persons (Tor and Thomas)

Meeting at UNIFE
For UPV: 2 persons (Maria Engracia and Pedro López)
For SIMULA: 2 persons (Tor and Thomas)

Global meetings
---------------
3 meetings at HiPEAC events:
For UNICT: 1 person
For UPV: 3 persons
For FORTH: 2 persons
For SIMULA: 3 persons

BUDGET:
-------

For FORTH: 5 person-trips * 1000 Euro/person-trip = 5000 Euro
For UNICT: 5 person-trips * 1000 Euro/person-trip = 5000 Euro
For UPV (invited persons): 3 persons * 2 days * 300 Euro = 1800 Euro
For UPV: 13 person-trips * 1000 Euro/person-trip = 13000 Euro
For SIMULA: 13 person-trips * 1000 Euro/person-trip = 13000 Euro

Total amount: 37800 Euro


Requested: 12 month(s)

FLICH José (University Politecnica de Valencia) (--colleague--)
MEJIA Andres (University Politecnica de Valencia) (--phd student--)
DUATO Jose (University Politecnica de Valencia) (--member--)
CATANIA Vincenzo (University of Catania) (--member--)
KATEVENIS Manolis (FORTH) (--member--)
PNEVMATIKATOS Dionisios (FORTH) (--member--)
LYSNE Olav (Simula Research Laboratory) (--member--)
PALESI Maurizio (University of Catania) (--colleague--)
SKEIE Tor (Simula Research Laboratory) (--colleague--)
SØDRING Thomas (Simula Research Laboratory) (--colleague--)
GILABERT VILLAMON Francisco (University Politecnica de Valencia) (--phd student--)
LÓPEZ Pedro (University Politecnica de Valencia) (--colleague--)
GOMEZ Maria (University Politecnica de Valencia) (--colleague--)
CHRYSOS Nikolaos (FORTH) (--colleague--)
ORDUÑA Juan Manuel (University of Valencia (Estudi General)) (--member--)
KABACINSKI Wojciech (Poznan University of Technology) (--member--)
KUMAR Shashi (Jönköping University) (--member--)
BERTOZZI Davide (University of Ferrara) (--member--)
BENINI Luca (University of Bologna) (--member--)

Rickard Holsmark (Jonkoping)
Juan Manuel Orduña (UV)
Federico Silla (UPV)
Vicente Santonja (UPV)
Crispín Gómez (UPV)
Nikos Chrysos (FORTH)
George Passas (FORTH)
Sven-Arne Reinemo (SIMULA)
Åshild Grønstad Solheim (SIMULA)
Jose Luis Sanchez (UCLM)
Francisco Jose Alfaro (UCLM)