This is an archive of the discontinued LLVM Phabricator instance.

[clang-offload-wrapper] Add standard notes for ELF offload images
ClosedPublic

Authored by vzakhari on Mar 29 2021, 4:13 PM.

Details

Summary

This patch is a piece of D99360.

The patch adds ELF notes into SHT_NOTE sections of ELF offload images passed to clang-offload-wrapper.

The new notes use the null-terminated note name "LLVMOMPOFFLOAD". There are currently three types of notes:

- VERSION: a string (not null-terminated) identifying the version of the ELF offload image structure. The current version, '1.0', does not put any restrictions on the structure of the image. If we ever need to come up with a common structure for ELF offload images (e.g. to be able to analyze the images in libomptarget in some standard way), we will introduce new versions.
- PRODUCER: a vendor-specific name of the producing toolchain. Upstream LLVM uses "LLVM" (not null-terminated).
- PRODUCER_VERSION: a vendor-specific version of the producing toolchain. Upstream LLVM uses LLVM_VERSION_STRING, optionally followed by <space> LLVM_REVISION.

None of the three notes is currently mandatory.
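For context, each record in an SHT_NOTE section follows the generic ELF note layout: three 4-byte fields (namesz, descsz, type) followed by the name and the descriptor, each padded to a 4-byte boundary. The sketch below packs one such record in Python; the note type value 1 is illustrative and is not taken from this patch.

```python
import struct

def pack_elf_note(name: bytes, note_type: int, desc: bytes) -> bytes:
    """Pack one ELF note record (little-endian generic note layout)."""
    name_z = name + b"\x00"  # namesz counts the terminating null byte

    def pad4(b: bytes) -> bytes:
        # Pad to the next 4-byte boundary with zero bytes.
        return b + b"\x00" * (-len(b) % 4)

    header = struct.pack("<III", len(name_z), len(desc), note_type)
    return header + pad4(name_z) + pad4(desc)

# A VERSION-style note: null-terminated "LLVMOMPOFFLOAD" name,
# non-null-terminated "1.0" descriptor.
note = pack_elf_note(b"LLVMOMPOFFLOAD", 1, b"1.0")
print(len(note))  # 12-byte header + 16-byte padded name + 4-byte padded desc = 32
```

Note that namesz (15 here) includes the name's null terminator, while the stored name is padded out to 16 bytes; descsz (3) likewise excludes the padding that rounds the descriptor up to 4 bytes.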

Diff Detail

Event Timeline

vzakhari created this revision. Mar 29 2021, 4:13 PM
vzakhari requested review of this revision. Mar 29 2021, 4:13 PM
Herald added projects: Restricted Project, Restricted Project.
vzakhari updated this revision to Diff 334282. Mar 30 2021, 3:22 PM

I am not sure what is wrong with the uploaded ELF files. The test works locally. Let's try it with a thinner YAML template.

vzakhari updated this revision to Diff 334292. Mar 30 2021, 3:59 PM

Updated revision fixes BUILD_SHARED_LIBS build.

grokos accepted this revision. Apr 6 2021, 7:19 PM

Change looks good, so it's accepted on my end. I'll let the other reviewers have a look and post their comments. Please do not commit until we have reached an agreement for all 4 patches together (D99551, D99552, D99553, D99612).

This revision is now accepted and ready to land. Apr 6 2021, 7:19 PM
vzakhari updated this revision to Diff 364193. Aug 4 2021, 11:10 AM
This revision was landed with ongoing or failed builds. Aug 16 2021, 1:20 PM
This revision was automatically updated to reflect the committed changes.

This patch broke offloading on my machine. If I compile a basic offloading application, I get an invalid device image.

#include <omp.h>

int main() {
  int isHost;
#pragma omp target map(from:isHost)
  { isHost = omp_is_initial_device(); }
  return isHost;
}

Compiling and running with clang -fopenmp -fopenmp-targets=nvptx64 gives me the following results.

CUDA error: Error returned from cuModuleLoadDataEx
CUDA error: device kernel image is invalid 
Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
Libomptarget fatal error 1: failure of target construct while offloading is mandatory

Reverting 93d08acaacec951dbb302f77eeae51974985b6b2 fixes it.

@jhuber6, thank you for reporting this. I do not have a properly set up CUDA system currently. Can you please invoke clang-offload-wrapper with -save-temps and send the temporary files to me?

I sent it to you. Any luck, or should we revert this upstream for the time being?

Unfortunately, I did not receive it. Can you please check whether the mail was blocked on your side? Can you please try archiving it with a password and sending it again?
I would like to prepare a patch to put this functionality under a switch that is off by default. I think we want to have this working eventually (and it actually works with x86_64 offload), so it will be easier to just flip the switch once I figure out what the CUDA API does not like about the modified ELF image. Does that sound appropriate to you?

I put it in a .tar.gz file; I think your domain is blocking it. Anything else I can use?

Can you please try sharing it via Google Drive?

NVPTX is broken here too; AMDGPU is fine. I'm guessing one of the CUDA tools does some overly aggressive input validation that we're running afoul of.

There was a discussion about this on the call today. The plan was to put it behind a boolean argument that is disabled by default while the fix is worked out, to avoid downstream churn. Sadly, the original patch was not authored with that in mind. I suggest that if we can't get such a patch together ASAP, we revert this and fix it offline (even if the fix is adding said flag).

I am about to merge D108246, which adds the switch.

It's a hack, but D108303 will unblock NVPTX offloading. It's an alternative to reverting. I suggest we go with that and then revisit in a couple of weeks.