This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][Clang] Usability improvements for OpenMP offloading
Abandoned · Public

Authored by sdmitriev on Jul 18 2018, 2:34 PM.

Details

Summary

The current OpenMP offload implementation assumes that offload targets cannot be dropped: if an object file is compiled for some set of offload targets, none of those targets can be dropped at the link step. This limitation hurts the usability of offloading.

Offload targets are non-discardable because of the compiler-generated offload initialization code, which registers the target binary for each offload target at program startup. The compiler emits this initialization code into every host object. Because the code depends on the offload targets the object was compiled with, the host part of the object becomes dependent on the particular targets specified at compile time.
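To make that dependency concrete, here is a minimal C sketch of the arrangement described above, under stated assumptions. The image bytes, the `image_ref` struct, `mock_register_images`, and `host_object_register` are all hypothetical stand-ins. The real runtime entry point in libomptarget is `__tgt_register_lib`, which this sketch does not call. `__attribute__((constructor))` is the GCC/Clang extension used here to run code at startup. The only point illustrated is that the host object's own startup code names the image of every compile-time target, so none of them can be left out at link time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholders for the device images of the two targets the
   object was compiled for. Defined here only so the sketch links alone. */
static const unsigned char image_target_a[] = {0xA0, 0xA1};
static const unsigned char image_target_b[] = {0xB0};

/* Hypothetical descriptor of one device image: begin/end of its bytes. */
struct image_ref {
    const unsigned char *begin;
    const unsigned char *end;
};

/* The host object itself carries this table, fixed at COMPILE time, so its
   host code references the image of every compile-time target. */
static const struct image_ref host_object_images[] = {
    {image_target_a, image_target_a + sizeof image_target_a},
    {image_target_b, image_target_b + sizeof image_target_b},
};

static size_t registered_images; /* images seen by the mock runtime */

/* Mock of a runtime registration entry point (hypothetical name). */
static void mock_register_images(const struct image_ref *imgs, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        assert(imgs[i].begin < imgs[i].end); /* each image must be present */
        ++registered_images;
    }
}

/* Startup code of this kind ends up in EVERY host object. */
__attribute__((constructor))
static void host_object_register(void) {
    mock_register_images(host_object_images,
                         sizeof host_object_images / sizeof host_object_images[0]);
}
```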

This patch implements a different way of registering target binaries at runtime. The compiler no longer emits registration code for the target binaries in the host part of each fat object, which makes the host object completely independent of particular offload targets. Instead, the registration code moves to a dynamically generated "offload-wrapper" object, which, besides the registration code, also contains the offload target binaries packaged as read-only data. The clang driver creates the wrapper object at the link step with the help of a new tool called clang-offload-wrapper.
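A rough C sketch of what such a wrapper object might contain, assuming a much simplified layout. The `device_image` and `bin_desc` structs are only loosely modelled on libomptarget's `__tgt_device_image` / `__tgt_bin_desc`, and their field names are illustrative. `mock_register_lib` is a hypothetical stand-in for the runtime call. The image bytes are placeholders, and clang-offload-wrapper's actual output format is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Device images for the targets chosen at LINK time, embedded as read-only
   data. The contents are placeholders, not real device binaries. */
static const unsigned char wrapped_image_0[] = {0x7f, 'E', 'L', 'F'};
static const unsigned char wrapped_image_1[] = {0xde, 0xad};

/* Hypothetical descriptor types (simplified, illustrative field names). */
struct device_image {
    const unsigned char *image_start;
    const unsigned char *image_end;
};

struct bin_desc {
    size_t num_images;
    const struct device_image *images;
};

static const struct device_image wrapped_images[] = {
    {wrapped_image_0, wrapped_image_0 + sizeof wrapped_image_0},
    {wrapped_image_1, wrapped_image_1 + sizeof wrapped_image_1},
};

static const struct bin_desc wrapper_desc = {
    sizeof wrapped_images / sizeof wrapped_images[0], wrapped_images};

static const struct bin_desc *registered_desc; /* set by the mock runtime */

/* Mock standing in for the runtime's registration entry point. */
static void mock_register_lib(const struct bin_desc *desc) {
    assert(desc != NULL);
    registered_desc = desc;
}

/* The only registration constructor in the program: it lives in the
   wrapper object, not in any host object. */
__attribute__((constructor))
static void offload_wrapper_register(void) {
    mock_register_lib(&wrapper_desc);
}
```

Because the descriptor is generated when the program is linked, dropping a target in this model just means the wrapper is emitted without that image. No host object has to change.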

Making the host part of the object independent of offload targets relaxes the requirement to use exactly the same set of offload targets for the compile and link steps. The offload targets provided at the link step now only need to be a subset of the targets used at the compile steps.
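The difference between the two rules can be expressed as a pair of set checks. This is a hypothetical C sketch: `link_ok_old` and `link_ok_new` are illustrative helpers, not part of clang. Each target list is assumed to hold distinct target triples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if target triple t appears in list[0..n). */
static bool contains(const char *const *list, size_t n, const char *t) {
    for (size_t i = 0; i < n; ++i)
        if (strcmp(list[i], t) == 0)
            return true;
    return false;
}

/* Old rule: link-time targets must be exactly the compile-time targets. */
static bool link_ok_old(const char *const *compiled, size_t nc,
                        const char *const *linked, size_t nl) {
    for (size_t i = 0; i < nc; ++i)
        if (!contains(linked, nl, compiled[i]))
            return false; /* a compile-time target was dropped */
    for (size_t i = 0; i < nl; ++i)
        if (!contains(compiled, nc, linked[i]))
            return false; /* no device code exists for this target */
    return true;
}

/* New rule: link-time targets only need to be a subset of compile-time ones. */
static bool link_ok_new(const char *const *compiled, size_t nc,
                        const char *const *linked, size_t nl) {
    for (size_t i = 0; i < nl; ++i)
        if (!contains(compiled, nc, linked[i]))
            return false; /* still no device code for an unknown target */
    return true;
}
```

For example, suppose objects were compiled with `-fopenmp-targets=nvptx64-nvidia-cuda,x86_64-pc-linux-gnu` and the program is linked for `nvptx64-nvidia-cuda` only. The old rule rejects this combination and the new rule accepts it. Linking for a target that was never compiled for is rejected under both rules.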


Event Timeline

sdmitriev created this revision. Jul 18 2018, 2:34 PM
kkwli0 added a subscriber: kkwli0. Jul 18 2018, 2:55 PM
  1. The new tool definitely requires an RFC first.
  2. The patch is too big and should be split into several smaller patches.
sdmitriev abandoned this revision. Aug 2 2018, 8:28 PM

A simpler solution was proposed while discussing the RFC I submitted a while ago at Alexey's request. It would lead to (almost) the same results but requires far fewer changes.