Programming FPGA-Based Accelerators using ROCCC 2.0
Recent advances in VLSI technology have given a tremendous boost to both the size and speed of circuits that can be mapped onto FPGAs. This has opened up a huge range of potential applications for FPGAs at a time when the doubling of CPU speed every two years, made possible by Moore's Law, seems very unlikely to continue. This has in turn breathed new life into the area of reconfigurable computing, where a computation is expressed as a circuit through which data is streamed rather than as a sequence of instructions operating on a fixed data-path. In this model the circuit is reconfigured, at times dynamically, to conform to the computation at hand. Numerous studies have demonstrated speed-ups with this technology ranging from 10x to 10,000x across a wide spectrum of applications.
Programming these devices remains difficult, however. The programmability problem is two-fold: (1) programming the FPGA requires low-level knowledge that application designers do not normally have; and (2) the applications themselves are typically highly tuned for software execution. Transforming temporal code, intended to run on a microprocessor, into spatial code that takes full advantage of reconfigurable hardware is non-trivial, and a hardware compiler may not perform the proper set of transformations to generate efficient hardware. Configuring the software so that the optimal hardware is generated may require extensive changes to both the application and the compiler.
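The gap between temporal and spatial code can be illustrated with a small sketch in ordinary C. The function names, the tap count, and the dot-product kernel are all hypothetical and chosen only for illustration; this is not output from, or input required by, any particular hardware compiler. The first function is written the way software is usually tuned, as a sequential accumulation. The second expresses the same computation so that the independent multiplies and the adder tree are explicit, which is closer to what a circuit would implement.

```c
#define TAPS 4  /* hypothetical kernel size, chosen for illustration */

/* Temporal style: one multiply-accumulate per loop iteration,
 * with every iteration depending on the previous value of acc. */
int dot_temporal(const int x[TAPS], const int c[TAPS])
{
    int acc = 0;
    for (int i = 0; i < TAPS; ++i) {
        acc += x[i] * c[i];
    }
    return acc;
}

/* Spatial style: the loop is fully unrolled by hand. The four
 * products have no dependences on each other, and the sums form
 * a two-level adder tree instead of a serial chain. */
int dot_spatial(const int x[TAPS], const int c[TAPS])
{
    int p0 = x[0] * c[0];
    int p1 = x[1] * c[1];
    int p2 = x[2] * c[2];
    int p3 = x[3] * c[3];

    int s0 = p0 + p1;  /* first level of the adder tree */
    int s1 = p2 + p3;

    return s0 + s1;    /* second level */
}
```

Both functions compute the same value; the difference lies only in how much parallelism is visible in the source, which is the kind of restructuring a hardware compiler, or the programmer, has to perform.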
To address these issues, as well as provide an open-source platform for further growth, we introduce the second generation of the Riverside Optimizing Compiler for Configurable Computing (ROCCC 2.0). ROCCC 2.0 is a compiler that transforms modules and systems written in C into synthesizable VHDL. Modules are written in a subset of C and transformed into parallel, pipelined VHDL. Modules compiled by ROCCC are then exported back to the user and may be used to build up larger modules and complete systems that interface with memory through user-specified channels. The main features of ROCCC 2.0 are:
- a. Modular (LEGO-like) code design supporting code re-use and compiler-generated modular redundancy for increased reliability.
- b. Separation of user application code from the interface to external devices.
- c. Import of hard or soft IP cores into modules written in C.
Including a hardware module in C code is accomplished by calling the exported C function with no special syntax, resulting in the generation of hardware that includes the module directly in the pipeline where necessary. This allows for the creation of hardware systems from the bottom up while retaining the extensive parallelizing optimizations performed by ROCCC 1.0. Approaching the creation of hardware accelerators in this manner gives designers the control necessary to identify and create systems appropriate for hardware directly in C. In this tutorial we discuss the details of programming for and implementing ROCCC, features currently in development, including support for redundancy, and examples of ROCCC compilation.
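The bottom-up style described above can be sketched in ordinary C as follows. This is only an illustration of the calling pattern: the names `max3` and `max_filter` are hypothetical, and the sketch does not reproduce ROCCC's actual module or system interface conventions, which are not given here. A small function plays the role of a module, and a loop that streams over an array plays the role of a system that calls it like any other C function.

```c
/* Hypothetical "module": a small, self-contained computation that
 * could be compiled once and reused as a building block. */
int max3(int a, int b, int c)
{
    int m = (a > b) ? a : b;
    return (m > c) ? m : c;
}

/* Hypothetical "system": streams a sliding window of three elements
 * through the module. The module is invoked with a plain function
 * call; no special syntax marks it as a hardware block.
 * Writes n - 2 results to out when n >= 3, and nothing otherwise. */
void max_filter(const int *in, int *out, int n)
{
    for (int i = 0; i + 2 < n; ++i) {
        out[i] = max3(in[i], in[i + 1], in[i + 2]);
    }
}
```

For example, calling `max_filter` on the five-element input `{3, 1, 4, 1, 5}` fills a three-element output with `{4, 4, 5}`. In a hardware flow of the kind the text describes, each call site would correspond to an instance of the module placed in the pipeline; here it is simply a function call.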