Optimising and Generating OpenCL Code for an Embedded GPU Architecture
Affiliated to: ARM
Location: Cambridge, UK
Timing: Flexible
Description:
OpenCL aims to provide software portability across heterogeneous systems consisting of a host (e.g. an ARM CPU core) and accelerator devices (e.g. ARM GPU cores). OpenCL per se, however, does not address the problem of performance portability. That is, OpenCL code optimised for one accelerator device may perform dismally on another, since performance can depend heavily on low-level details such as data layout and iteration space mapping. To enable performance portability, OpenCL is better viewed as a target language for innovative programming tools based on advanced code generation techniques.
You will first investigate, for a class of algorithms important for mobile computing, the differences between optimising OpenCL code for a high-end graphics card (e.g. from NVIDIA) and an ARM embedded GPU. You will then investigate, prototype and evaluate a tool for generating efficient system-specific OpenCL code from a system-independent algorithm representation and system-dependent mapping parameters.
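As a rough illustration of the second part, the sketch below shows one way a generator could combine a system-independent description (here just an elementwise expression) with system-dependent mapping parameters to emit OpenCL C source. It is a minimal sketch under stated assumptions, not the tool the project will produce: the names `Mapping`, `generate_kernel`, `vector_width` and `items_per_work_item` are hypothetical, and the generator handles only elementwise operations over two float buffers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical system-dependent mapping parameters."""
    vector_width: int = 1         # 1 -> float, 4 -> float4, and so on
    items_per_work_item: int = 1  # consecutive elements processed per work-item


def generate_kernel(name: str, expr: str, mapping: Mapping) -> str:
    """Emit OpenCL C source computing out[i] = expr(in_a[i], in_b[i]).

    `expr` is the system-independent part: a format string that uses the
    placeholders {a} and {b} for the two inputs, e.g. "{a} * 2.0f + {b}".
    `n` in the generated kernel counts elements of the chosen element type.
    """
    if mapping.vector_width not in (1, 2, 4, 8, 16):
        raise ValueError("this sketch supports vector widths 1, 2, 4, 8 and 16")
    if mapping.items_per_work_item < 1:
        raise ValueError("items_per_work_item must be at least 1")

    # Choose the element type from the mapping, not from the algorithm.
    elem = "float" if mapping.vector_width == 1 else "float%d" % mapping.vector_width
    body = expr.format(a="in_a[i]", b="in_b[i]")

    lines = [
        "__kernel void %s(__global const %s *in_a," % (name, elem),
        "                 __global const %s *in_b," % elem,
        "                 __global %s *out," % elem,
        "                 const uint n)",
        "{",
        "    const uint base = get_global_id(0) * %du;" % mapping.items_per_work_item,
        "    for (uint k = 0; k < %du; ++k) {" % mapping.items_per_work_item,
        "        const uint i = base + k;",
        "        if (i < n) out[i] = %s;" % body,
        "    }",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    algorithm = "{a} * 2.0f + {b}"  # system-independent description
    # Two illustrative mappings; which one suits which device is an
    # assumption that would have to be measured, not a known result.
    print(generate_kernel("scale_add", algorithm, Mapping(1, 1)))
    print()
    print(generate_kernel("scale_add", algorithm, Mapping(4, 2)))
```

The same algorithm string yields a scalar kernel or a `float4` kernel depending only on the `Mapping` value, which is the separation of concerns the description refers to. A real tool would need a richer algorithm representation and would also have to choose work-group sizes and data layouts.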
Requirements:
- Keen interest in accelerated computing: accelerator architectures, programming tools and applications;
- Excellent programming skills in C/C++, good knowledge of Linux, familiarity with CUDA and OpenCL;
- Strong communication skills; proactive and positive attitude;
- Experience with Clang/LLVM is a plus.
