Page MenuHomePhabricator

[libomptarget] Read standard notes for ELF offload images
AbandonedPublic

Authored by vzakhari on Mar 29 2021, 4:19 PM.

Details

Summary

This patch is a piece of D99360.

The patch prints ELF notes in SHT/PT_NOTE sections/segments of the ELF offload images. The implementation is based on LLVM ELFObjectFile.

Debug output from the plugins would look like this:

TARGET Common ELF --> LLVMOMPOFFLOAD ELF note NT_LLVM_OPENMP_OFFLOAD_VERSION with value: '1.0'
TARGET Common ELF --> LLVMOMPOFFLOAD ELF note NT_LLVM_OPENMP_OFFLOAD_PRODUCER with value: 'LLVM'
TARGET Common ELF --> LLVMOMPOFFLOAD ELF note NT_LLVM_OPENMP_OFFLOAD_PRODUCER_VERSION with value: '13.0.0git 9f8975163c75b1f9f736f9a8e0a60e29ac062754'

Diff Detail

Event Timeline

vzakhari created this revision.Mar 29 2021, 4:19 PM
vzakhari requested review of this revision.Mar 29 2021, 4:19 PM

That's a lot of code. Mostly straightforward, though the raw new/delete makes me slightly nervous. I think it's almost all here to abstract over libelf or llvm's elf library with a higher level interface. I haven't read through it carefully yet.

There is a profiling build that depends on llvm. Enabling that did break some builds. It might be worth reviewing whether to treat llvm binaries as a hard requirement for libomptarget.

Aside from that, if we want to continue on this path. I suspect we don't need a third interface. In particular, we can probably implement llvm's libelf interface in terms of elf.h or vice versa. I'd lean towards converting the plugins to use llvm libelf's interface and providing a fallback implementation of the (few) non-header pieces that are used by the plugins. That gives zero bug surface for builds using llvm's libelf and a clear path to dropping the gnu libelf dependency.

The other way round, implementing the subset of libelf that we use in terms of llvm (or from scratch, elf isn't very complicated), is simpler today as the plugins don't have to change, but kind of ties us into an abstraction layer that looks like debt in the medium term.

(tagging myself as reviewer because I'd like to drop the libelf dependency from the amdgpu plugin, and this is a path by which that could be done)

That's a lot of code. Mostly straightforward, though the raw new/delete makes me slightly nervous. I think it's almost all here to abstract over libelf or llvm's elf library with a higher level interface. I haven't read through it carefully yet.

This is correct. This is all here just to abstract ELF reading with a new interface.

There is a profiling build that depends on llvm. Enabling that did break some builds. It might be worth reviewing whether to treat llvm binaries as a hard requirement for libomptarget.

The LLVM ELF implementation is disabled by default for upstream, so it should not cause any issues with LLVM component libraries. I have tested the LLVM ELF implementation downstream, and I believe it is working now (I will upload my latest patch shortly).

Aside from that, if we want to continue on this path. I suspect we don't need a third interface. In particular, we can probably implement llvm's libelf interface in terms of elf.h or vice versa. I'd lean towards converting the plugins to use llvm libelf's interface and providing a fallback implementation of the (few) non-header pieces that are used by the plugins. That gives zero bug surface for builds using llvm's libelf and a clear path to dropping the gnu libelf dependency.

Implementing GNU libelf interface as-is via LLVM ELF is problematic because this will require const casting of constant references/pointers exposed by LLVM ELF. Another side effect of linking LLVM component libraries to the plugins is the binary size. The plugins grow in size quite significantly, and I do not know whether this is acceptable for everyone.

Can you please clarify which non-header pieces you are referring to?

The other way round, implementing the subset of libelf that we use in terms of llvm (or from scratch, elf isn't very complicated), is simpler today as the plugins don't have to change, but kind of ties us into an abstraction layer that looks like debt in the medium term.

(tagging myself as reviewer because I'd like to drop the libelf dependency from the amdgpu plugin, and this is a path by which that could be done)

FWIW, generic-elf-64bit code includes link.h (for link_map declaration), which, in turn, includes system elf.h. This will conflict with LLVM ELF, if we try to use it directly in generic-elf-64bit code.

I assume getting rid of GNU libelf dependency in all plugins will take some time. Moreover, not all plugin developers may be willing to switch to LLVM ELF implementaion. I wonder if we can start using the new interface, and then gradually switch to LLVM ELF and deprecate the new interface, if it is convenient for all.

vzakhari updated this revision to Diff 334243.Mar 30 2021, 11:56 AM

Implementing GNU libelf interface as-is via LLVM ELF is problematic because this will require const casting of constant references/pointers exposed by LLVM ELF.

Ah, yes. GNU uses char* everywhere. We could implement that using the interfaces that take const char* by malloc&memcpy&free, but that is fairly horrible. We could implement the llvm elf interface in terms of gnu by casting away the const and hoping GNU doesn't actually mutate the arguments, but that seems hazardous too. Overall this seems to be digging in the wrong direction - we are taking on a lot of debt in order to preserve the ability to use libelf, which is only of use to toolchains that want to use llvm's libomptarget without compiling llvm.

Another side effect of linking LLVM component libraries to the plugins is the binary size. The plugins grow in size quite significantly, and I do not know whether this is acceptable for everyone.

This doesn't seem necessary. Either we dynamically link the LLVMObject and the footprint will hopefully be comparable to using libelf, or we statically link it and let the linker deadstrip.

openmp/libomptarget/plugins/common/elf_common/elf_common.cpp
54

Missed this at first glance. This patch is a functional change - it abstracts over the elf interface, and also does some new stuff with said interface. It would probably be easier/safer to review when split down that line.

openmp/libomptarget/plugins/common/elf_common/elf_constants.h
20 ↗(On Diff #334243)

This enum is presumably in llvm somewhere. We should include it from llvm (note that some other header-only files are already included from llvm, and the cmake ensures that it is available when building libomptarget). We may need to slightly refactor llvm first to make the numbers available without pulling it lots of other stuff.

Thank you for the comments, Jon!

Another side effect of linking LLVM component libraries to the plugins is the binary size. The plugins grow in size quite significantly, and I do not know whether this is acceptable for everyone.

This doesn't seem necessary. Either we dynamically link the LLVMObject and the footprint will hopefully be comparable to using libelf, or we statically link it and let the linker deadstrip.

I will rebuild it from scratch to see how big the difference is in the release build. I guess we do not care about the debug build (the code size hit is much bigger for the debug).

Just to reiterate, I believe we agreed that we should link LLVM ELF library statically, otherwise, it will have to become redistributable. This may be quite inconvenient for some users.

openmp/libomptarget/plugins/common/elf_common/elf_common.cpp
54

I can split it, but I wanted to have at least something functional in this patch.

openmp/libomptarget/plugins/common/elf_common/elf_constants.h
20 ↗(On Diff #334243)

Yes, these constants are defined in D99612.

elf_constants.h is included for both libelf and LLVM ELF implementations. If I include the LLVM ELF header file here it will conflict with libelf header files in libeld implementation. I can get rid of these definitions in two cases:

  1. System elf.h gets these definitions, so that I can include two different headers defining these constants for two different implementations.
  2. We get rid of libelf dependency altogether.

Right, we don't really have minutes for the weekly call.

I suggested we link the libraries dynamically when clang is linked dynamically and statically when clang is linked statically. Someone pointed out that libomptarget gets distributed with applications that haven't necessarily distributed clang, so unconditionally statically linking makes that use case much easier. That seems right to me, at least as the default cmake thing.

I've emailed openmp-dev. No objections yet. A suggestion for that is to dig out the profiler diff that wanted to unconditionally link an llvm library and reach out to the people who objected then, or to create a diff making the llvm library unconditionally linked in the context of the profiler and see if that attracts objections.

Once the gates are open (i.e. the profiler's required libs are in), depending on llvmobject et al are unlikely to cause further problems. One hopes. It's possible we'll want to link libomptarget itself against the libraries and have the plugins depend on libomptarget for elf munging, just so we don't have N+1 copies of llvm for N plugins. Unless we can statically link (some of) the plugins.

Hi Jon,
Regarding the LLVM component libraries linking into the libomptarget/plugins, I guess the bigger change is that OpenMP offload runtime will not build in out-of-tree mode. Should we actually start with disabling the out-of-tree builds, and see how people react to this?

Hi Jon,
Regarding the LLVM component libraries linking into the libomptarget/plugins, I guess the bigger change is that OpenMP offload runtime will not build in out-of-tree mode. Should we actually start with disabling the out-of-tree builds, and see how people react to this?

If I understand you correctly, we took that step some time ago. If you check the libomptarget cmake there's some logic that tries to find the rest of llvm and bails if it isn't available. That is currently used to allow including llvm headers - at least the amdgpu plugin does so to keep enums consistent between clang and the plugin. The most recent attempt to unconditionally link a llvm binary is discussed in the comments of D93055

Hi Jon,
Regarding the LLVM component libraries linking into the libomptarget/plugins, I guess the bigger change is that OpenMP offload runtime will not build in out-of-tree mode. Should we actually start with disabling the out-of-tree builds, and see how people react to this?

If I understand you correctly, we took that step some time ago. If you check the libomptarget cmake there's some logic that tries to find the rest of llvm and bails if it isn't available. That is currently used to allow including llvm headers - at least the amdgpu plugin does so to keep enums consistent between clang and the plugin. The most recent attempt to unconditionally link a llvm binary is discussed in the comments of D93055

Note that in D93055 there is still a path for OPENMP_STANDALONE_BUILD. I believe we have not yet abandoned the out-of-tree builds (https://openmp.llvm.org/README.txt). And it seems like a big change to me.

Note that in D93055 there is still a path for OPENMP_STANDALONE_BUILD. I believe we have not yet abandoned the out-of-tree builds (https://openmp.llvm.org/README.txt). And it seems like a big change to me.

That is true. I'm under the impression that a 'standalone build' means one without LLVM, in which case we would hit

# LLVM source tree is required at build time for libomptarget                                                                                                                                                      
if (NOT LIBOMPTARGET_LLVM_INCLUDE_DIRS)
  message(FATAL_ERROR "Missing definition for LIBOMPTARGET_LLVM_INCLUDE_DIRS")
endif()

in which case the branch in D93055 for standalone build would be dead, but 'standalone build' might not mean what I think it does.

The current status, as I understand it, is libomp can build without llvm available, but libomptarget requires a copy of llvm on disk and optionally a copy of the llvm support library.

vzakhari added a comment.EditedApr 13 2021, 11:00 PM

The current status, as I understand it, is libomp can build without llvm available, but libomptarget requires a copy of llvm on disk and optionally a copy of the llvm support library.

You are right, libomptarget requires a copy of llvm on disk (for the header files). At the same time just a copy of the llvm support library is not enough to use it, e.g. for profiling or something else. You may see that D93055 uses add_llvm_library() that is simply not available in an out-of-tree build. In this change-set I am using llvm_update_compile_flags(), which also requires true in-tree build. Maybe we can make these macros available by using find_package(LLVM), but I am not sure about that.

So I believe getting rid of libelf dependency implies strong requirement for a true in-tree build, and this is a big change for downstream projects. I am not sure how I should approach this major shift.

JonChesterfield added a comment.EditedApr 26 2021, 4:07 AM

Hitting consensus on unconditionally using the llvm libraries seems to be taking longer than I hoped. Perhaps this is therefore the way to go in the mean time? There's always the hope that we can delete this abstraction layer later.

edit: an alternative might be to move some of the llvm elf handling into header-only style

This patch still mixes the large but routine elf library shim with the note reading, those should be separate. Mostly for ease of review and testing - swapping out the elf library should leave everything still working, and if it doesn't, that'll be easier to track down in the smaller delta.

edit2: looked through the code again. Would it be a disaster to use the llvm native interface, with a fallback that goes through libelf that does a lot of largeish copies to handle the const char* vs char* distinction? We're not totally sure who needs the no-llvm-libraries build, so can perhaps assume they'll be OK with some performance overhead, and if they aren't, they'll speak up and we can reinstate something like this.

edit3: in all this I'm assuming llvm elf reports errors up the stack instead of aborting. That may not be a safe assumption. The runtime is set up for soft fail and recover, clang is not.

vzakhari updated this revision to Diff 364194.Aug 4 2021, 11:12 AM
vzakhari edited the summary of this revision. (Show Details)

ping...
Can someone please review the updated (simplified) patch?

JonChesterfield accepted this revision.EditedAug 9 2021, 1:29 PM

The commit message is out of date.

The implementation is very clear. Thank you for the ping, missed the patch update email.

Edit: appears I had some inline comments from a previous revision that I had not posted, please disregard them.

openmp/libomptarget/plugins/common/elf_common/elf_light.cpp
25 ↗(On Diff #334243)

stuff like this is likely to be available in a freestanding llvm header already

42 ↗(On Diff #334243)

I believe this to be correct for 64 bit elf. For example, from https://llvm.org/doxygen/BinaryFormat_2ELF_8h_source.html#l01659

using Elf64_Word = uint32_t;
// ...
// Node header for ELF64.
 struct Elf64_Nhdr {
   Elf64_Word n_namesz;
   Elf64_Word n_descsz;
   Elf64_Word n_type;
 };

I remember being surprised by Elf64_word being uint32_t last time I saw this.

This revision is now accepted and ready to land.Aug 9 2021, 1:29 PM

The commit message is out of date.

The implementation is very clear. Thank you for the ping, missed the patch update email.

Edit: appears I had some inline comments from a previous revision that I had not posted, please disregard them.

Thank you for the review, Jon!
I would like to clarify whether I should wait for more comments at tomorrow meeting. If there are no objections, I will start merging the patches tomorrow.

vzakhari updated this revision to Diff 366799.Aug 16 2021, 9:47 PM
vzakhari edited the summary of this revision. (Show Details)

In the latest update I added llvm-objcopy as a dependence for clang-offload-wrapper, because otherwise make check-libomptarget would fail.

vzakhari abandoned this revision.Fri, Aug 5, 11:50 AM

Not working on the project any more.

Herald added a project: Restricted Project. · View Herald TranscriptFri, Aug 5, 11:50 AM