With a full implementation of OpenMP 3.1 already available upstream, we aim to continue that work and add support for OpenMP 4.0 as well. One important component introduced by OpenMP 4.0 is offloading, which enables the execution of a given structured block to be transferred to a device other than the host.
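For readers less familiar with the feature, here is a minimal sketch of what an offloaded structured block looks like in user code. The function name `scale` and its parameters are only illustrative. A compiler without OpenMP offloading support ignores the pragma, and the loop then runs on the host.

```cpp
// Scales each of the n elements of v by factor.
// The structured block after the target pragma is the region eligible for offloading.
void scale(float *v, int n, float factor) {
  // map(tofrom: ...) copies the array section to the device before the region
  // and back to the host after it.
#pragma omp target map(tofrom : v[0:n])
  for (int i = 0; i < n; ++i)
    v[i] *= factor;
}
```

The result is the same whether the region runs on the host or on a device; only where the loop executes changes.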
An implementation of the OpenMP offloading infrastructure in clang is proposed in http://goo.gl/L1rnKJ. This document is already a second iteration that includes contributions from several vendors and members of the LLVM community. It was published at http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084304.html for discussion by the community, and so far no major concerns about the design have been raised.
Unlike other OpenMP components, offloading requires support from the compiler driver, given that for the same source file several (host and target) objects will be generated using potentially different toolchains. At the same time, the compiler needs a mechanism to relate variables in the host with the ones generated for the target, so communication between toolchains is required. The way the driver supports this relation will also have implications for code generation.
This patch proposes an implementation of the driver support for offloading. The following summarizes the main changes this patch introduces:
a) clang can be invoked with -fopenmp=libiomp5 -omptargets=triple1,…,tripleN, where triple1 through tripleN are the target triples the user wants to be able to offload to.
b) The driver detects whether the offloading triples are valid and whether the corresponding toolchain is prepared to offload. This patch only enables offloading for Linux toolchains.
c) Each target compiler phase takes the host IR (the result of the host compiler phase) as a second input. This will enable the host code generation to specify, in the form of metadata, the variables that should be emitted for the target, and this metadata could then be read by the target frontend.
d) Given that the same host IR result is used by the different toolchains, the driver keeps a cache of results so that the job that generates a given result is not emitted twice.
e) Offloading leverages the argument translation functionality in order to convert host arguments into target arguments. This is currently used to make sure a shared library is always produced by the target toolchain - a library that can be loaded by the OpenMP runtime library.
f) The target shared libraries are embedded into the host binary by using a linker script produced by the driver and passed to the host linker.
g) The driver passes an offloading option to the frontend that specifies whether the frontend is producing code for a target. This is required because code generation for the target and for the host has to be different.
h) A full path to the original source file is passed to the frontend so it can be used to produce unique IDs that are the same for the host and targets.
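To illustrate item d), the following is a minimal sketch of a result cache, not the code in this patch. The class `ResultCache`, its key layout, and the `produce` callback are hypothetical names chosen for the example. The idea is simply that the job producing a given result (for instance the host IR) is created on the first request and reused when other toolchains ask for the same result.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the driver's result cache. The names are
// illustrative only and do not correspond to clang's actual classes.
class ResultCache {
public:
  // Key: (input file, triple of the toolchain that produces the result).
  using Key = std::pair<std::string, std::string>;

  // Returns the cached result for key. The produce callback, which stands in
  // for emitting the job, runs only the first time a key is requested.
  const std::string &getOrCreate(const Key &key,
                                 const std::function<std::string()> &produce) {
    auto it = results_.find(key);
    if (it == results_.end()) {
      ++jobsEmitted_;
      it = results_.emplace(key, produce()).first;
    }
    return it->second;
  }

  // Number of jobs actually emitted so far.
  int jobsEmitted() const { return jobsEmitted_; }

private:
  std::map<Key, std::string> results_;
  int jobsEmitted_ = 0;
};
```

With this shape, two target toolchains that both need the host IR for the same source file call `getOrCreate` with the same key, and only the first call emits the job.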
Thanks!
Samuel
Example?