Page MenuHomePhabricator

[Clang][OpenMP offload] Eliminate use of OpenMP linker script
AbandonedPublic

Authored by sdmitriev on Jul 18 2019, 1:23 PM.

Details

Summary

OpenMP linker script is known to cause problems for gold and lld linkers on Linux and it will also cause problems for Windows enabling in future. This is a sufficient justification for eliminating use of OpenMP linker script and replacing it with a portable solution. This patch contains prototype changes which implement such solution.

First a brief explanation how OpenMP linker script is being used in existing implementation. OpenMP linker script is dynamically generated by the clang driver and is added to the host link command to fulfil the following tasks

  1. Insert device binaries into the host binary at link time as data (makes host binary fat)
  2. Creates pair of symbols for the start/end address for each device binary
  3. And creates pair of symbols with start/end addresses around the compiler generated offload entry table

All symbols that are created by the linker script are used by the offload registration code that is added by the compiler to each host object as a comdat group. This compiler generated code consists of a pair of data objects (device binary descriptor) that use those symbols as initializers and two functions. One of those functions registers device binary descriptor at OpenMP runtime at program startup and the other unregisters it. BTW, having offload registration code in each host object is not good because it makes host object dependent on a particular list of targets (device binary descriptor depends on the offload targets).

This patch implements an alternative solution for the above tasks. Device binaries are inserted into the host binary with a help of the wrapper bit-code file which contains device binaries as data as well as the offload registration code for registering device binaries in offload runtime (tasks 1 and 2 in the above list). Wrapper bit-code file is dynamically created by the clang driver with a help of new tool clang-offload-wrapper which takes device binaries as input and produces bit-code file with required contents. Wrapper bit-code is then compiled to an object and resulting object is appended to the host linking by the clang driver.

Start/end symbols around the offload entry table (3 in the list above) are added by the linker which provides definition of start_name/stop_name symbols to satisfy unresolved references for ELF sections with a name representable as C identifier (see https://sourceware.org/binutils/docs/ld/Input-Section-Example.html for details). On Windows start/end symbols can be defined in the wrapper bit-code file with a help of the sections grouping (see https://docs.microsoft.com/en-us/windows/win32/Debug/pe-format#grouped-sections-object-only); Windows support should still be added in future.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sdmitriev added inline comments.Aug 6 2019, 9:45 AM
clang/include/clang/Driver/Action.h
74

Well, I can probably try to reuse bundling action for wrapping, but I think it will just complicate the logic. Wrapping logically differs from bundling and wrapping is done by a different tool, so I think it is natural to add a distinct action class for it.

clang/lib/CodeGen/CGOpenMPRuntime.cpp
9785–9786

Offload entries are actually emitted both for host and target compilations. I have added a check for OpenMP simd mode to createOffloadingBinaryDescriptorRegistration().

clang/lib/CodeGen/CGOpenMPRuntime.h
1472

Ok. Without virtual I do not see much reasons for adding new function which just calls createOffloadEntriesAndInfoMetadata(), so instead I have just made createOffloadEntriesAndInfoMetadata() public and added a check for OpenMP simd mode to this function.

ABataev added inline comments.Aug 6 2019, 11:36 AM
clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
70

Why not ArrayRef?

73–76

Not sure that this is the best solution, we may end up with incorrect size_t type in some cases.

74

Use real type here, not auto

134

Add comment for the true constant with the name of parameter.

137

Same here

144

Comments?

145

Use real type, not auto

149

Seems to me the code is not formatted

197

Use real type, not auto

210

llvm::None instead of {}

217

Remove extra braces here, they are not needed.

223

Extra braces

sdmitriev updated this revision to Diff 213775.Aug 6 2019, 6:29 PM
sdmitriev marked 2 inline comments as done.
sdmitriev marked 14 inline comments as done.Aug 6 2019, 6:39 PM
sdmitriev added inline comments.
clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
70

Changed to MemoryBufferRef which in some sense is similar to ArrayRef.

73–76

I have slightly revised this code to avoid unexpected results.

149

Right. Fixed.

210

Switched to a different constructor which does not have argument types.

vzakhari added inline comments.
clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
227

FYI, llvm.global_dtors does not work on Windows. The symbol will be placed into .CRT$XC[A-Z] portion of .CRT in TargetLoweringObjectFileImpl.cpp, so, basically, tgt_unregister_lib will be called right after tgt_register_lib. We can either use a trick from ASAN, i.e. put llvm.global_dtors into .CRT$XT[A-Z] (I am not sure how solid this solution is) or call atexit() inside tgt_register_lib to register tgt_unregister_lib terminator.

sdmitriev marked 3 inline comments as done.Aug 7 2019, 11:36 AM
sdmitriev added inline comments.
clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
227

It works as expected on Linux, so I guess this is just a bug in lowering code for Windows that need to be fixed.

ABataev added inline comments.Aug 7 2019, 11:41 AM
clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
227

Still, better to call atexit(), this is common solution to call a global destructor/deinitializer

vzakhari added inline comments.Aug 7 2019, 11:42 AM
clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
227

I agree. One other thing: if __tgt_register_lib is never called do we want to call __tgt_unregister_lib? It is possible with global_ctors/global_dtors, but not possible with atexit/cxa_atexit.

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

Because it does not work on Windows, better to have portable solution, if possible.

clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
73
  1. Markt it const.
  2. This still is not the best solution, since size_t not necessarily has the pointer size. I don't know if there is a better solution. @hfinkel? If this is the best, why not just to use getIntPtrType(C)?
136

Use \\\ style for comments here

172

Use ArrayRef instead of const SmallVectorImpl &

331

Use std:string instead of auto

332

Also, better to specify type expplicitly.

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

Because it does not work on Windows, better to have portable solution, if possible.

Is there a bug # ?

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

Because it does not work on Windows, better to have portable solution, if possible.

Is there a bug # ?

@vzakhari?

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

Because it does not work on Windows, better to have portable solution, if possible.

Is there a bug # ?

@vzakhari?

I do not have bug #, but the issue was introduced with the following commit:
commit f803b23879d9e1d9415ec1875713534dcc203df5
Author: Reid Kleckner <rnk@google.com>
Date: Fri Sep 7 23:07:55 2018 +0000

[COFF] Implement llvm.global_ctors priorities for MSVC COFF targets

Summary:
MSVC and LLD sort sections ASCII-betically, so we need to use section
names that sort between .CRT$XCA (the start) and .CRT$XCU (the default
priority).

In the general case, use .CRT$XCT12345 as the section name, and let the
linker sort the zero-padded digits.

Users with low priorities typically want to initialize as early as
possible, so use .CRT$XCA00199 for prioties less than 200. This number
is arbitrary.

Implements PR38552.

Reviewers: majnemer, mstorsjo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D51820

llvm-svn: 341727

The destructors are still in .CRT$XT for default priority (65535) now, but for non-default priority they will go into .CRT$XC. I will upload a fixing patch with a LIT test shortly.

This clang-offload-wrapper commit will work properly, if we use default priority for the destructors.

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

Because it does not work on Windows, better to have portable solution, if possible.

Is there a bug # ?

@vzakhari?

I do not have bug #, but the issue was introduced with the following commit:
commit f803b23879d9e1d9415ec1875713534dcc203df5
Author: Reid Kleckner <rnk@google.com>
Date: Fri Sep 7 23:07:55 2018 +0000

[COFF] Implement llvm.global_ctors priorities for MSVC COFF targets
 
Summary:
MSVC and LLD sort sections ASCII-betically, so we need to use section
names that sort between .CRT$XCA (the start) and .CRT$XCU (the default
priority).
 
In the general case, use .CRT$XCT12345 as the section name, and let the
linker sort the zero-padded digits.
 
Users with low priorities typically want to initialize as early as
possible, so use .CRT$XCA00199 for prioties less than 200. This number
is arbitrary.
 
Implements PR38552.
 
Reviewers: majnemer, mstorsjo
 
Subscribers: hiraditya, llvm-commits
 
Differential Revision: https://reviews.llvm.org/D51820
 
llvm-svn: 341727

The destructors are still in .CRT$XT for default priority (65535) now, but for non-default priority they will go into .CRT$XC. I will upload a fixing patch with a LIT test shortly.

This clang-offload-wrapper commit will work properly, if we use default priority for the destructors.

'IMHO' if there is a problem with lowering of LLVM IR constructs for some
particular targets, that problem must be resolved instead of adding workarounds.

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

Because it does not work on Windows, better to have portable solution, if possible.

Is there a bug # ?

@vzakhari?

I do not have bug #, but the issue was introduced with the following commit:
commit f803b23879d9e1d9415ec1875713534dcc203df5
Author: Reid Kleckner <rnk@google.com>
Date: Fri Sep 7 23:07:55 2018 +0000

[COFF] Implement llvm.global_ctors priorities for MSVC COFF targets
 
Summary:
MSVC and LLD sort sections ASCII-betically, so we need to use section
names that sort between .CRT$XCA (the start) and .CRT$XCU (the default
priority).
 
In the general case, use .CRT$XCT12345 as the section name, and let the
linker sort the zero-padded digits.
 
Users with low priorities typically want to initialize as early as
possible, so use .CRT$XCA00199 for prioties less than 200. This number
is arbitrary.
 
Implements PR38552.
 
Reviewers: majnemer, mstorsjo
 
Subscribers: hiraditya, llvm-commits
 
Differential Revision: https://reviews.llvm.org/D51820
 
llvm-svn: 341727

The destructors are still in .CRT$XT for default priority (65535) now, but for non-default priority they will go into .CRT$XC. I will upload a fixing patch with a LIT test shortly.

This clang-offload-wrapper commit will work properly, if we use default priority for the destructors.

'IMHO' if there is a problem with lowering of LLVM IR constructs for some
particular targets, that problem must be resolved instead of adding workarounds.

I completely agree with you! I am testing the patch for destructors.

sdmitriev updated this revision to Diff 215728.Aug 16 2019, 6:05 PM
sdmitriev marked 4 inline comments as done.

Addressed review comments.

sdmitriev marked 2 inline comments as done.Aug 16 2019, 6:09 PM
sdmitriev added inline comments.
clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
73

It cannot be const because of Type::getIntXXTy(LLVMContext &C) calls.

172

Done. And I have also changed MemoryBufferRef => ArrayRef (as you earlier suggested).

sdmitriev marked an inline comment as done.Aug 16 2019, 6:11 PM

As I understand ‘atexit’ solution would be target dependent (‘__cxa_atexit’ on Linux and ‘atexit’ on Windows) whereas @llvm.global_ctors/dtors variables offer similar and platform neutral functionality (http://llvm.org/docs/LangRef.html#the-llvm-global-ctors-global-variable). Why do you think that ‘atexit’ is a better choice?

Because it does not work on Windows, better to have portable solution, if possible.

@vzakhari has already committed his fix for llvm.global_dtors (https://reviews.llvm.org/D66373), so I assume use of llvm.global_dtors in this patch would no longer cause problems on Windows.

Looks like there will be no more comments. If so, I will update the first part https://reviews.llvm.org/D65130 which adds clang-offload-wrapper tool with the latest changes.

sdmitriev added a reviewer: grokos.

Rebase

I'm not sure copying the crtbegin/crtend mechanism from the early days of C runtime is ideal. Since the data is stored in a common section anyway, please could we rename it to __omp_offloading_entries in which case the linker will provide start/end symbols automatically? That removes the two object files and the link order dependency which is a hazard to bitcode libraries.

OpenMP linker script is known to cause problems for gold and lld linkers on Linux and it will also cause problems for Windows enabling in future

What are the known problems with the linker script? I'm wondering if they can be resolved without the overhead of introducing a new tool.

OpenMP linker script is known to cause problems for gold and lld linkers on Linux and it will also cause problems for Windows enabling in future

What are the known problems with the linker script? I'm wondering if they can be resolved without the overhead of introducing a new tool.

They just do not support linker script. And, thus, cannot be used for offloading. Only ld supports it.

OpenMP linker script is known to cause problems for gold and lld linkers on Linux and it will also cause problems for Windows enabling in future

What are the known problems with the linker script? I'm wondering if they can be resolved without the overhead of introducing a new tool.

They just do not support linker script. And, thus, cannot be used for offloading. Only ld supports it.

In what respect? I've used linker scripts with both gold and lld, and both instances of --help text claim to support them. In the case of lld, a very complicated script hit a few internal errors, but I believe they've all been fixed since.

OpenMP linker script is known to cause problems for gold and lld linkers on Linux and it will also cause problems for Windows enabling in future

What are the known problems with the linker script? I'm wondering if they can be resolved without the overhead of introducing a new tool.

They just do not support linker script. And, thus, cannot be used for offloading. Only ld supports it.

In what respect? I've used linker scripts with both gold and lld, and both instances of --help text claim to support them. In the case of lld, a very complicated script hit a few internal errors, but I believe they've all been fixed since.

Hmm, I tried it with gold some time ago and it just did not work for me. The linking failed with diagnostics that some of the commands in the script are unknown.

JonChesterfield added a comment.EditedSep 11 2019, 12:33 PM

OpenMP linker script is known to cause problems for gold and lld linkers on Linux and it will also cause problems for Windows enabling in future

What are the known problems with the linker script? I'm wondering if they can be resolved without the overhead of introducing a new tool.

They just do not support linker script. And, thus, cannot be used for offloading. Only ld supports it.

In what respect? I've used linker scripts with both gold and lld, and both instances of --help text claim to support them. In the case of lld, a very complicated script hit a few internal errors, but I believe they've all been fixed since.

Hmm, I tried it with gold some time ago and it just did not work for me. The linking failed with diagnostics that some of the commands in the script are unknown.

The problem turns out to be the 'insert before' statement. ld and lld support it, gold does not. According to https://bugzilla.redhat.com/show_bug.cgi?id=927573, the recommended workaround is essentially that implemented in this differential. See also https://sourceware.org/bugzilla/show_bug.cgi?id=15373.

OpenMP linker script is known to cause problems for gold and lld linkers on Linux and it will also cause problems for Windows enabling in future

What are the known problems with the linker script? I'm wondering if they can be resolved without the overhead of introducing a new tool.

They just do not support linker script. And, thus, cannot be used for offloading. Only ld supports it.

In what respect? I've used linker scripts with both gold and lld, and both instances of --help text claim to support them. In the case of lld, a very complicated script hit a few internal errors, but I believe they've all been fixed since.

Hmm, I tried it with gold some time ago and it just did not work for me. The linking failed with diagnostics that some of the commands in the script are unknown.

The problem turns out to be the 'insert before' statement. ld and lld support it, gold does not. According to https://bugzilla.redhat.com/show_bug.cgi?id=927573, the recommended workaround is essentially that implemented in this differential. See also https://sourceware.org/bugzilla/show_bug.cgi?id=15373.

A small example that I presented on the OpenMP multi company meeting earlier:

bash-4.2$ cat foo.c
#include <stdio.h>

int main() {
  int X = 0;

#pragma omp target map(tofrom: X)
  X += 3;

  printf("X = %d\n", X);
  return 0;
}

bash-4.2$ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu -fuse-ld=gold foo.c
/usr/bin/ld.gold: error: /tmp/a-c699cd.lk:25:8: syntax error, unexpected STRING
/usr/bin/ld.gold: fatal error: unable to parse script file /tmp/a-c699cd.lk
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)
bash-4.2$ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu -fuse-ld=lld foo.c
ld.lld: error: unable to INSERT AFTER/BEFORE .data: section not defined
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)
bash-4.2$

Also OpenMP linker script will obviously cause problems on Windows once we start enabling offload on Windows.

I'm not sure copying the crtbegin/crtend mechanism from the early days of C runtime is ideal. Since the data is stored in a common section anyway, please could we rename it to __omp_offloading_entries in which case the linker will provide start/end symbols automatically?

Well, I never said that it is an ideal solution, but it is a known mechanism that works well in many cases and can also be reused for the offloading entry table.
I do not fully understand your suggestion for renaming entries section, how it will help with providing start/end symbols for the entries. Can you please provide more details?

JonChesterfield added a comment.EditedSep 11 2019, 2:01 PM

I'm not sure copying the crtbegin/crtend mechanism from the early days of C runtime is ideal. Since the data is stored in a common section anyway, please could we rename it to __omp_offloading_entries in which case the linker will provide start/end symbols automatically?

Well, I never said that it is an ideal solution, but it is a known mechanism that works well in many cases and can also be reused for the offloading entry table.
I do not fully understand your suggestion for renaming entries section, how it will help with providing start/end symbols for the entries. Can you please provide more details?

Given a custom elf section with a C identifier as a name, the linker will provide definitions of __start_name/__stop_name to satisfy unresolved symbols. I don't believe this occurs if the section name is not a C identifier, e.g. contains a period. So unless I've misinterpreted the purpose of the two object files, they can be removed in exchange for renaming the section.

I'm not sure copying the crtbegin/crtend mechanism from the early days of C runtime is ideal. Since the data is stored in a common section anyway, please could we rename it to __omp_offloading_entries in which case the linker will provide start/end symbols automatically?

Well, I never said that it is an ideal solution, but it is a known mechanism that works well in many cases and can also be reused for the offloading entry table.
I do not fully understand your suggestion for renaming entries section, how it will help with providing start/end symbols for the entries. Can you please provide more details?

Given a custom elf section with a C identifier as a name, the linker will provide definitions of __start_name/__stop_name to satisfy unresolved symbols. I don't believe this occurs if the section name is not a C identifier, e.g. contains a period. So unless I've misinterpreted the purpose of the two object files, they can be removed in exchange for renaming the section.

Hm, I was not aware of this Linux linker feature, thanks a lot for the explanation! I see only one problem with using it as a replacement for the begin/end objects – it looks like __start_name/__stop_name symbols are created with default visibility instead of hidden. I guess it will cause problems for offload programs that use shared libraries because DSO’s __start_name/__stop_name symbols will be preempted by the executable’s symbols and that is not what we want. Is there any way to change this behavior?

As for the Windows support, you are right, __omp_offloading_entries_begin/__omp_offloading_entries_end symbols can be defined in the wrapper bit-code file with a help of the sections grouping (https://docs.microsoft.com/en-us/windows/win32/Debug/pe-format#grouped-sections-object-only). We were going to add this code to the wrapper tool later while adding Windows support.

Hm, I was not aware of this Linux linker feature, thanks a lot for the explanation! I see only one problem with using it as a replacement for the begin/end objects – it looks like __start_name/__stop_name symbols are created with default visibility instead of hidden. I guess it will cause problems for offload programs that use shared libraries because DSO’s __start_name/__stop_name symbols will be preempted by the executable’s symbols and that is not what we want. Is there any way to change this behavior?

Declaring the symbol as __attribute__((__visibility__("hidden"))) just works as far as I can tell. The linker still provides the right definition, objdump says it's hidden.

I'm on board with getting rid of the linker script. Gold's limited support for that seems conclusive.

I believe the current script does two things:
1/ takes a binary and embeds it in a section named .omp_offloading.amdgcn-amd-amdhsa
2/ provides start/end symbols for that section and for .omp_offloading.entries.

2/ is discussed above.
1/ can be implemented as a call to (llvm-)objcopy

If binary is used as the value for --input-target, the input file will be embedded as a data section in an ELF relocatable object, with symbols _binary_<file_name>_start, _binary_<file_name>_end, and _binary_<file_name>_size representing the start, end and size of the data, where <file_name> is the path of the input file as specified on the command line with non-alphanumeric characters converted to _.

I think dropping the linker script means that cmake will need to invoke an extra executable. As far as I can see, that tool can be objcopy instead of clang-offload-wrapper.

Does this diff mix getting rid of the linker script with other changes? E.g. it looks like the metadata generation is moving from clang to the new tool, but that seems orthogonal to dropping the linker script.

I'm on board with getting rid of the linker script. Gold's limited support for that seems conclusive.

I believe the current script does two things:
1/ takes a binary and embeds it in a section named .omp_offloading.amdgcn-amd-amdhsa
2/ provides start/end symbols for that section and for .omp_offloading.entries.

2/ is discussed above.
1/ can be implemented as a call to (llvm-)objcopy

If binary is used as the value for --input-target, the input file will be embedded as a data section in an ELF relocatable object, with symbols _binary_<file_name>_start, _binary_<file_name>_end, and _binary_<file_name>_size representing the start, end and size of the data, where <file_name> is the path of the input file as specified on the command line with non-alphanumeric characters converted to _.

I think dropping the linker script means that cmake will need to invoke an extra executable. As far as I can see, that tool can be objcopy instead of clang-offload-wrapper.

Does this diff mix getting rid of the linker script with other changes? E.g. it looks like the metadata generation is moving from clang to the new tool, but that seems orthogonal to dropping the linker script.

Metadata is still generated by the clang, there are no changes in this area. What is moving to a wrapper tool is the generation of the offload registration code. Let me just attach the slides that I presented on the inter company meeting were the proposal was discussed. It'll probably answer most of your questions.

Does this diff mix getting rid of the linker script with other changes? E.g. it looks like the metadata generation is moving from clang to the new tool, but that seems orthogonal to dropping the linker script.

Metadata is still generated by the clang, there are no changes in this area. What is moving to a wrapper tool is the generation of the offload registration code. Let me just attach the slides that I presented on the inter company meeting were the proposal was discussed. It'll probably answer most of your questions.

It does indeed, thanks. I see the motivation for delaying offload registration code. I'm pretty sure that is indeed orthogonal to removing the linker script.

How would you feel about using objcopy to embed the device binary?

Does this diff mix getting rid of the linker script with other changes? E.g. it looks like the metadata generation is moving from clang to the new tool, but that seems orthogonal to dropping the linker script.

Metadata is still generated by the clang, there are no changes in this area. What is moving to a wrapper tool is the generation of the offload registration code. Let me just attach the slides that I presented on the inter company meeting were the proposal was discussed. It'll probably answer most of your questions.

It does indeed, thanks. I see the motivation for delaying offload registration code. I'm pretty sure that is indeed orthogonal to removing the linker script.

How would you feel about using objcopy to embed the device binary?

I see some problems with using llvm-objcopy for that. First issue is that symbols created by llvm-objcopy for embedded data depend on the input file name. As you know these symbols are referenced from the offload registration code that is currently added to an object by the clang at compile time. I not sure how you can guarantee that symbol names will match. And another, more important problem is that it won't work on Windows because llvm-objcopy produces ELF object according to the description.

Anyway I am going to change entries section name to "omp_offloading_entries", remove omptargetbegin.o/omptargetend.o and upload the revised patch.

I see some problems with using llvm-objcopy for that. First issue is that symbols created by llvm-objcopy for embedded data depend on the input file name. As you know these symbols are referenced from the offload registration code that is currently added to an object by the clang at compile time. I not sure how you can guarantee that symbol names will match.

That seems solvable by renaming the input file / passing a string to clang.

And another, more important problem is that it won't work on Windows because llvm-objcopy produces ELF object according to the description.

objcopy works with coff in the meantime, and we already need a bunch of unix tools to build llvm on windows.

Anyway I am going to change entries section name to "omp_offloading_entries", remove omptargetbegin.o/omptargetend.o and upload the revised patch.

Thanks!

sdmitriev updated this revision to Diff 220039.Sep 12 2019, 9:19 PM
sdmitriev edited the summary of this revision. (Show Details)
  • Changed offload entry section name to “omp_offloading_entries”
  • Wrapper bit-code now uses start_ omp_offloading_entries/stop_ omp_offloading_entries symbols for accessing offload entry table assuming that these symbols are defined by the linker
  • Removed omptargetbegin.o/omptargetend.o objects

This LGTM. I'm happy that this is a design improvement over the current scheme. @JonChesterfield , @ABataev , any further comments?

This LGTM. I'm happy that this is a design improvement over the current scheme. @JonChesterfield , @ABataev , any further comments?

This patch mixes two concerns.
1/ Remove the linker script
2/ Change generation of offload registration code

These should be separate patches. I think the linker script removal would then be uncontentious.

It'll be easier to consider the offload registration changes without the linker script changes. That's a more complicated design space. In particular, this change is motivated by supporting additional platforms, and I don't see how offload registration is related to that.

ABataev added inline comments.Sep 25 2019, 6:50 AM
clang/lib/Driver/Driver.cpp
3124

User real type instead of auto *

3433

auto *->real type

clang/lib/Driver/ToolChains/Clang.cpp
6482

auto->real type

6494

auto->real type

clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
73

Maybe better to get the size of size_t type from command line option rather than rely on some non-stable assumptions?

204

Just ArrayRef<char>, no need to make it const ArrayRef<char> &

sdmitriev updated this revision to Diff 221847.Sep 25 2019, 3:31 PM
sdmitriev marked 3 inline comments as done.

I have rebased patch and addressed last Alexey’s comments. If there are no more comments, I propose to split this patch into 3 pieces

  1. Use linker for creating start/end symbols for the offload entry table. This part will also include renaming offload entry section name from “.omp.offloading_entries” to “omp_offloading_entries” since section name must be representable as a C identifier if we want linker to create start/stop symbols for it.
  2. Add offload wrapper tool and associated driver changes for invoking this tool for wrapping device binaries. Wrapper bit-code file at this step will contain device binaries only (no offload registration code). Technically OpenMP linker script will be eliminated at this step.
  3. And the last piece will move offload registration code generation from clang to the wrapper tool.

@hfinkel, @ABataev, @JonChesterfield , are you Ok with this?

The three way split looks great, thanks.

The three way split looks great, thanks.

Makes sense to me.

I have uploaded the first part to https://reviews.llvm.org/D68070

The second part was uploaded to https://reviews.llvm.org/D68166.

I have uploaded the last part to https://reviews.llvm.org/D68746

sdmitriev abandoned this revision.Oct 16 2019, 1:59 PM

All three parts have been committed, so I am abandoning the original patch.