OpenMP device offloading code generation produces a cubin file which is then integrated in the host binary using the host linker.
Details
- Reviewers
arpith-jacob caomhin carlo.bertolli ABataev Hahnfeld jlebar rnk hfinkel tstellar - Commits
- rG4cdba82ee0ca: [OpenMP] Integrate OpenMP target region cubin into host binary
rC310291: [OpenMP] Integrate OpenMP target region cubin into host binary
rL310291: [OpenMP] Integrate OpenMP target region cubin into host binary
Diff Detail
- Repository
- rL LLVM
Event Timeline
With your latest set of updates, I am not sure which, if any, patches you need me to take another look at. Unfortunately I don't have a ton of time for CUDA stuff these days, so where possible I'd prefer to shunt reviews over to hfinkel or chandlerc. But do let me know.
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
324 ↗ | (On Diff #93192) | This entire if block constructs a completely unrelated link line from the rest of the function. They should be separate functions. In fact, CudaToolChain::buildLinker() should probably return a different tool depending on whether you're using OpenMP or not, and this logic should be in a different tool. Maybe NVPTX::Linker should be called NVPTX::FatLink as well. |
388–394 ↗ | (On Diff #93192) | I don't think copy is an actual binary on Windows, it's a builtin cmd command. At least, that's what where copy suggests. clang-cl can also run on Linux, so you probably want to check #ifdef LLVM_ON_WIN32 for this. Honestly, I'd prefer it if you instead found a way to get the driver to rename the inputs right before executing nvlink. We have cross-platform utilities in Support for doing renames. You can probably make some kind of custom Command to do this. |
test/Driver/openmp-offload.c | ||
594 | In this case, it would be nicer if you arranged for ptxas to output to a .cubin file. | |
601 | Can you add a test that links two input ptxas .o files into a binary? You can use touch to make the .o files. |
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
319 ↗ | (On Diff #94475) | Extra whitespace. |
Use the rename() utility function of LLVM for renaming the PTXAS output before invoking NVLINK.
Avoid renaming by enabling PTXAS to generate an output file with the appropriate extension, in this case a cubin extension.
Change output of unbundler to produce cubin files when using OpenMP to offload to NVIDIA GPUs.
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
388 ↗ | (On Diff #97331) | Comment sentences should start with a capital letter (here and below). |
407 ↗ | (On Diff #97331) | Add period at the end of the sentence. |
409 ↗ | (On Diff #97331) | Remove this blank line. |
420 ↗ | (On Diff #97331) | Can you please explain how this works for libraries specified by file name (e.g. /path/to/foo.so or /path/to/foo.a) or why these should be treated differently? Maybe I'm misunderstanding what this check does. |
[Update regression tests] Add a test for propagating the compute capability to the OpenMP device offloading toolchain which targets NVIDIA GPUs.
This is a test for patch D34784 which is enabled by this patch.
Hi @gtbercea,
I couldn't reply to the email as cfe-commits didn't even register this commit somehow, so I'm replying here.
Unfortunately I had to revert this commit (r310291), + two others for a clean revert (r310300 and r310332) because it caused a test failure on macOS. This particular run line:
// RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o %t2.o 2>&1 \ // RUN: | FileCheck -check-prefix=CHK-TWOCUBIN %s
Causes the following assertion failure:
assert(CachedResults.find(ActionTC) != CachedResults.end() && "Result does not exist??");
Here's a backtrace:
* frame #0: 0x00007fffbf3a2b2e libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fffbf4c72de libsystem_pthread.dylib`pthread_kill + 303 frame #2: 0x00007fffbf30041f libsystem_c.dylib`abort + 127 frame #3: 0x00007fffbf2c9f34 libsystem_c.dylib`__assert_rtn + 320 frame #4: 0x0000000103a1f2d1 clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b11ed0, TC=0x0000000112819000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3418 frame #5: 0x0000000103a1cf11 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b11ed0, TC=0x0000000112819000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210 frame #6: 0x0000000103a1dfb3 clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b12130, TC=0x0000000112819000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3348 frame #7: 0x0000000103a1cf11 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b121f0, TC=0x0000000112819000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210 frame #8: 0x0000000103a1db3e clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b11bf0, TC=0x0000000112819000, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3310 frame #9: 0x0000000103a1cf11 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b11bf0, TC=0x0000000112819000, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210 frame #10: 0x0000000103a0a5c2 clang`clang::driver::Driver::BuildJobs(this=0x00007fff5fbfe4e8, C=0x0000000111b11830) const at Driver.cpp:2843 frame #11: 0x0000000103a00b5c clang`clang::driver::Driver::BuildCompilation(this=0x00007fff5fbfe4e8, ArgList=ArrayRef<const char *> @ 0x00007fff5fbfc1e8) at Driver.cpp:746 frame #12: 0x0000000100005a52 clang`main(argc_=9, argv_=0x00007fff5fbff670) at driver.cpp:463 frame #13: 0x00007fffbf260c05 libdyld.dylib`start + 1
Could you please take a look?
Let me know if you need anything else,
Cheers,
Alex
Hi Alex,
I have a fix for the failing test. What's the easiest way to do this? Do I have to commit those patches again or can you push them back in and then I also push the fix?
Thanks,
--Doru
--
Great, thanks! I think that you can just revert my revert with the fix applied in one commit
Hi Alex,
I just commited the changes again.
Let me know if it still fails for you. I think the issue was actually that the test wasn't quite correct so I cleaned that up.
--Doru
Looks like it's still failing unfortunately: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental_check/39182/console
The last RUN line in the new commit triggers the same assertion failure:
Assertion failed: (CachedResults.find(ActionTC) != CachedResults.end() && "Result does not exist??"), function BuildJobsForActionNoCache, file /Users/alex/bisect/llvm/tools/clang/lib/Driver/Driver.cpp, line 3419.
backtrace:
* frame #0: 0x00007fffbf3a2b2e libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fffbf4c72de libsystem_pthread.dylib`pthread_kill + 303 frame #2: 0x00007fffbf30041f libsystem_c.dylib`abort + 127 frame #3: 0x00007fffbf2c9f34 libsystem_c.dylib`__assert_rtn + 320 frame #4: 0x0000000103a1f311 clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c017d0, TC=0x0000000113000000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3418 frame #5: 0x0000000103a1cf51 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c017d0, TC=0x0000000113000000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210 frame #6: 0x0000000103a1dff3 clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c01a30, TC=0x0000000113000000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3348 frame #7: 0x0000000103a1cf51 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c01af0, TC=0x0000000113000000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210 frame #8: 0x0000000103a1db7e clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c014f0, TC=0x0000000113000000, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3310 frame #9: 0x0000000103a1cf51 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c014f0, TC=0x0000000113000000, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210 frame #10: 0x0000000103a0a602 clang`clang::driver::Driver::BuildJobs(this=0x00007fff5fbfe518, C=0x0000000111c01130) const at Driver.cpp:2843 frame #11: 0x0000000103a00b9c clang`clang::driver::Driver::BuildCompilation(this=0x00007fff5fbfe518, ArgList=ArrayRef<const char *> @ 0x00007fff5fbfc218) at Driver.cpp:746 frame #12: 0x0000000100005a92 clang`main(argc_=7, argv_=0x00007fff5fbff6a8) at driver.cpp:463 frame #13: 0x00007fffbf260c05 libdyld.dylib`start + 1 frame #14: 0x00007fffbf260c05 libdyld.dylib`start + 1
Hi Alex, I'm not sure why it's failing as I can't reproduce the error locally. Do you have access to a machine with the configuration the test uses?
The cached results map doesn't have the key:
(lldb) p CachedResults (std::__1::map<std::__1::pair<const clang::driver::Action *, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, clang::driver::InputInfo, std::__1::less<std::__1::pair<const clang::driver::Action *, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, std::__1::allocator<std::__1::pair<const std::__1::pair<const clang::driver::Action *, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, clang::driver::InputInfo> > >) $0 = size=8 { [0] = { first = { first = 0x0000000111c01320 second = "nvptx64-nvidia-cuda-host" } second = { Data = { Filename = 0x00007fff5fbff8f8 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o" InputArg = 0x00007fff5fbff8f8 } Kind = Filename Act = 0x0000000111c01320 Type = TY_Object BaseInput = 0x00007fff5fbff8f8 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o" } } [1] = { first = { first = 0x0000000111c01320 second = "x86_64-apple-darwin17.0.0-x86_64-host" } second = { Data = { Filename = 0x00007fff5fbff8f8 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o" InputArg = 0x00007fff5fbff8f8 } Kind = Filename Act = 0x0000000111c01320 Type = TY_Object BaseInput = 0x00007fff5fbff8f8 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o" } } [2] = { first = { first = 0x0000000111c01420 second = "nvptx64-nvidia-cuda-host" } second = { Data = { Filename = 0x00007fff5fbff949 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp2.o" InputArg = 0x00007fff5fbff949 } Kind = Filename Act = 0x0000000111c01420 Type = TY_Object BaseInput = 0x00007fff5fbff949 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp2.o" } } [3] = { first = { first = 0x0000000111c017d0 second = "nvptx64-nvidia-cuda-openmp" } second = { Data = { Filename = 0x0000000111c048b0 "/var/folders/sh/cpr85hld32qf79m8x7vd31bw0000gn/T/openmp-offload-e30496.o" InputArg = 0x0000000111c048b0 } Kind = Filename Act = 0x0000000111c017d0 Type = TY_Object BaseInput = 0x00007fff5fbff8f8 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o" } } [4] = { first = { first = 0x0000000111c017d0 second = "x86_64-apple-darwin17.0.0-host" } second = { Data = { Filename = 0x0000000111c04830 "/var/folders/sh/cpr85hld32qf79m8x7vd31bw0000gn/T/openmp-offload-b856ec.o" InputArg = 0x0000000111c04830 } Kind = Filename Act = 0x0000000111c017d0 Type = TY_Object BaseInput = 0x00007fff5fbff8f8 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o" } } [5] = { first = { first = 0x0000000111c01900 second = "nvptx64-nvidia-cuda-openmp" } second = { Data = { Filename = 0x0000000111c035d0 "/var/folders/sh/cpr85hld32qf79m8x7vd31bw0000gn/T/openmp-offload-be86a1.o" InputArg = 0x0000000111c035d0 } Kind = Filename Act = 0x0000000111c01900 Type = TY_Object BaseInput = 0x00007fff5fbff949 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp2.o" } } [6] = { first = { first = 0x0000000111c01900 second = "x86_64-apple-darwin17.0.0-host" } second = { Data = { Filename = 0x0000000111c034e0 "/var/folders/sh/cpr85hld32qf79m8x7vd31bw0000gn/T/openmp-offload-92791a.o" InputArg = 0x0000000111c034e0 } Kind = Filename Act = 0x0000000111c01900 Type = TY_Object BaseInput = 0x00007fff5fbff949 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp2.o" } } [7] = { first = { first = 0x0000000111c01a90 second = "nvptx64-nvidia-cuda-openmp" } second = { Data = { Filename = 0x0000000111c03d90 "/var/folders/sh/cpr85hld32qf79m8x7vd31bw0000gn/T/openmp-offload-8db204.out" InputArg = 0x0000000111c03d90 } Kind = Filename Act = 0x0000000111c01a90 Type = TY_Image BaseInput = 0x00007fff5fbff8f8 "/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o" } } }
Key:
(lldb) p ActionTC (std::__1::pair<const clang::driver::Action *, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >) $1 = { first = 0x0000000111c017d0 second = "x86_64-apple-darwin17.0.0-x86_64-host" }
If you look at the map then you can see that it contains very similar keys, but not the exact one:
first = { first = 0x0000000111c017d0 second = "x86_64-apple-darwin17.0.0-host" } and first = { first = 0x0000000111c01320 second = "x86_64-apple-darwin17.0.0-x86_64-host" }
It looks like the triple is in the list though:
second = "x86_64-apple-darwin17.0.0-x86_64-host
it is entry [1].
Is the assertion the last access? Yes.
There must be a discrepancy between
UI.DependentBoundArch in the loop above and BoundArch that's used to compute TargetTC, otherwise GetTriplePlusArchString would return the key that matches the 0x0000000111c017d0 pointer, i.e. without the additional x86_64.
Maybe I misunderstood the output you pasted but it looks like ActionTC contains the same triple+arch that you can find in entry [1] in the map.
Yes, but entry 1 has a different pointer (0x0000000111c01320 vs 0x0000000111c017d0) so the keys are different.
Why is there a DependentBoundArch if it's always empty?
...
Hi Alex, I'm not sure why it's failing as I can't reproduce the error locally. Do you have access to a machine with the configuration the test uses?
Can you reproduce if you specifically force the host target to x86_64-apple-darwin17.0.0 (e.g., you pass -target x86_64-apple-darwin17.0.0)?
Driver/openmp-offload.c still fails on http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/7038, please fix.
In this case, it would be nicer if you arranged for ptxas to output to a .cubin file.