gtbercea (Gheorghe-Teodor Bercea)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 29 2016, 12:44 AM (33 w, 3 d)

Recent Activity

Sat, Aug 12

gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

Couldn't fix/find the actual error so for now, just moving the flag patch tests to openmp-offload-gpu.c which is a disabled test.

310765

Bad news, the bot is still red: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/7114

Disabled openmp-offload.c on Linux again: https://reviews.llvm.org/rL310772

Sat, Aug 12, 11:03 AM

Fri, Aug 11

gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.
Fri, Aug 11, 2:19 PM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

I have re-enabled the previous offloading tests and moved the new GPU offloading tests to a new file which is disabled for linux (for now).

310718

Alex thanks so much for the logs, they have been very useful to understand what's going on.

Aleksey, I have since tried to install a Clang version with the address sanitizer enabled but without much success. Apart from turning on the sanitizer in the cmake using the -DLLVM_USE_SANITIZER="Address" flag is there any other flag that I need to pass to cmake?
I am trying to run this on my macbook x86_64 and OS X 10.11. I am getting the following error when building the compiler:

[2966/4254] Linking CXX shared library lib/libc++abi.1.0.dylib
FAILED: lib/libc++abi.1.0.dylib
Undefined symbols for architecture x86_64:

"___asan_after_dynamic_init", referenced from:
    __GLOBAL__sub_I_cxa_default_handlers.cpp in cxa_default_handlers.cpp.o
"___asan_before_dynamic_init", referenced from:
    __GLOBAL__sub_I_cxa_default_handlers.cpp in cxa_default_handlers.cpp.o

[...]
ld: symbol(s) not found for architecture x86_64

Actually, you can run our bot, it is in zorg (http://llvm.org/git/zorg.git), zorg/buildbot/builders/sanitizers/buildbot_fast.sh (the one I linked the last time).

Create a temp folder and from that folder run:
BUILDBOT_REVISION= BUILDBOT_CLOBBER= $PATH_YOUR_PROJECTS$/zorg/zorg/buildbot/builders/sanitizers/buildbot_fast.sh

Fri, Aug 11, 1:38 PM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

I have re-enabled the previous offloading tests and moved the new GPU offloading tests to a new file which is disabled for linux (for now).

Fri, Aug 11, 9:08 AM

Thu, Aug 10

gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

First of all, I apologize if I've upset you with my previous post. I am actively working on understanding what is causing these issues. It is not my intention to write tests that work on local configurations only. I am upset to see that these tests keep failing for your and maybe other configurations. Without knowing the actual reason of the failures I can only speculate what is going wrong with them hence the flurry of changes.

Thank you, apology accepted. That was exactly my point, not to start a fight, but to emphasize that depending on local configuration is never going to work, you will never be able to see and test all of them. Please disable the test ASAP and until the better way to handle it is determined.

Thu, Aug 10, 10:00 AM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

The failures were very widespread, e.g. there's a linux buildbot that was red until the revert: http://bb.pgr.jp/builders/test-clang-i686-linux-RA. If you have access to a linux machine you should be able to reproduce the failures that the bot experienced by using the same cmake arguments (I don't know the exact ones, but judging from the bot you should be able to reproduce them using 32 bit release build with assertions enabled). I don't know what GPU that buildbot has.

Thu, Aug 10, 9:24 AM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

First of all, I apologize if I've upset you with my previous post. I am actively working on understanding what is causing these issues. It is not my intention to write tests that work on local configurations only. I am upset to see that these tests keep failing for your and maybe other configurations. Without knowing the actual reason of the failures I can only speculate what is going wrong with them hence the flurry of changes.

Thu, Aug 10, 8:35 AM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.
Thu, Aug 10, 6:44 AM

Wed, Aug 9

gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

310549 should solve this problem by using a default architecture that is supported by the current device version.

Wed, Aug 9, 10:04 PM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

Thanks for running the test on your machine! This is very useful.

Wed, Aug 9, 8:15 PM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

I've removed that test. Let's see if the other two tests pass or not.

Wed, Aug 9, 4:50 PM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

Even after r310505, openmp-offload.c continues to haunt our bots, for example http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/2012. Can you please fix this test?

Wed, Aug 9, 1:51 PM
gtbercea closed D36537: [OpenMP] Enable executable lookup into driver directory..
Wed, Aug 9, 12:53 PM
gtbercea updated the summary of D36537: [OpenMP] Enable executable lookup into driver directory..
Wed, Aug 9, 12:03 PM
gtbercea updated the diff for D36537: [OpenMP] Enable executable lookup into driver directory..

Add comment.

Wed, Aug 9, 12:02 PM
gtbercea created D36537: [OpenMP] Enable executable lookup into driver directory..
Wed, Aug 9, 11:50 AM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

Revision 310505 fixes the tests for this patch.

Wed, Aug 9, 11:30 AM
gtbercea added a comment to D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

Looks like this test is failing on macOS again after this change:

http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/39231/testReport/Clang/Driver/openmp_offload_c/

Can you please take a look?

Wed, Aug 9, 10:50 AM
gtbercea closed D29905: [OpenMP] Pass argument to device kernel by reference when map is used. .

Already covered by D34888

Wed, Aug 9, 9:18 AM
gtbercea closed D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.
Wed, Aug 9, 8:57 AM
gtbercea closed D29659: [OpenMP] Add flag for disabling the default generation of relocatable OpenMP target code for NVIDIA GPUs..
Wed, Aug 9, 8:28 AM

Tue, Aug 8

gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

I have just pushed a fix, revision 310433.

Tue, Aug 8, 6:05 PM
gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

The last RUN line in the new commit triggers the same assertion failure:

...

Hi Alex, I'm not sure why it's failing as I can't reproduce the error locally. Do you have access to a machine with the configuration the test uses?

Can you reproduce if you specifically force the host target to x86_64-apple-darwin17.0.0 (e.g., you pass -target x86_64-apple-darwin17.0.0)?

Tue, Aug 8, 10:38 AM
gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

Is that the last access to CachedResults before the error?

Is the assertion the last access? Yes.

There must be a discrepancy between

UI.DependentBoundArch in the loop above and BoundArch that's used to compute TargetTC, otherwise GetTriplePlusArchString would return the key that matches the 0x0000000111c017d0 pointer, i.e. without the additional x86_64.

Tue, Aug 8, 10:07 AM
gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

Is that the last access to CachedResults before the error?

Tue, Aug 8, 9:57 AM
gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

The "x86_64-apple-darwin17.0.0-x86_64-host" triple looks suspicious though

Tue, Aug 8, 9:49 AM
gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

The last RUN line in the new commit triggers the same assertion failure:

Assertion failed: (CachedResults.find(ActionTC) != CachedResults.end() && "Result does not exist??"), function BuildJobsForActionNoCache, file /Users/alex/bisect/llvm/tools/clang/lib/Driver/Driver.cpp, line 3419.

backtrace:

* frame #0: 0x00007fffbf3a2b2e libsystem_kernel.dylib`__pthread_kill + 10
  frame #1: 0x00007fffbf4c72de libsystem_pthread.dylib`pthread_kill + 303
  frame #2: 0x00007fffbf30041f libsystem_c.dylib`abort + 127
  frame #3: 0x00007fffbf2c9f34 libsystem_c.dylib`__assert_rtn + 320
  frame #4: 0x0000000103a1f311 clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c017d0, TC=0x0000000113000000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3418
  frame #5: 0x0000000103a1cf51 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c017d0, TC=0x0000000113000000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
  frame #6: 0x0000000103a1dff3 clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c01a30, TC=0x0000000113000000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3348
  frame #7: 0x0000000103a1cf51 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c01af0, TC=0x0000000113000000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
  frame #8: 0x0000000103a1db7e clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c014f0, TC=0x0000000113000000, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3310
  frame #9: 0x0000000103a1cf51 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe518, C=0x0000000111c01130, A=0x0000000111c014f0, TC=0x0000000113000000, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
  frame #10: 0x0000000103a0a602 clang`clang::driver::Driver::BuildJobs(this=0x00007fff5fbfe518, C=0x0000000111c01130) const at Driver.cpp:2843
  frame #11: 0x0000000103a00b9c clang`clang::driver::Driver::BuildCompilation(this=0x00007fff5fbfe518, ArgList=ArrayRef<const char *> @ 0x00007fff5fbfc218) at Driver.cpp:746
  frame #12: 0x0000000100005a92 clang`main(argc_=7, argv_=0x00007fff5fbff6a8) at driver.cpp:463
  frame #13: 0x00007fffbf260c05 libdyld.dylib`start + 1
  frame #14: 0x00007fffbf260c05 libdyld.dylib`start + 1
Tue, Aug 8, 8:22 AM
gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

Great, thanks! I think that you can just revert my revert with the fix applied in one commit

Tue, Aug 8, 7:39 AM
gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

Hi @gtbercea,
I couldn't reply to the email as cfe-commits didn't even register this commit somehow, so I'm replying here.

Unfortunately I had to revert this commit (r310291), + two others for a clean revert (r310300 and r310332) because it caused a test failure on macOS. This particular run line:

// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o %t2.o 2>&1 \
// RUN:   | FileCheck -check-prefix=CHK-TWOCUBIN %s

Causes the following assertion failure:

assert(CachedResults.find(ActionTC) != CachedResults.end() &&
       "Result does not exist??");

Here's a backtrace:

* frame #0: 0x00007fffbf3a2b2e libsystem_kernel.dylib`__pthread_kill + 10
  frame #1: 0x00007fffbf4c72de libsystem_pthread.dylib`pthread_kill + 303
  frame #2: 0x00007fffbf30041f libsystem_c.dylib`abort + 127
  frame #3: 0x00007fffbf2c9f34 libsystem_c.dylib`__assert_rtn + 320
  frame #4: 0x0000000103a1f2d1 clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b11ed0, TC=0x0000000112819000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3418
  frame #5: 0x0000000103a1cf11 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b11ed0, TC=0x0000000112819000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
  frame #6: 0x0000000103a1dfb3 clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b12130, TC=0x0000000112819000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3348
  frame #7: 0x0000000103a1cf11 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b121f0, TC=0x0000000112819000, BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
  frame #8: 0x0000000103a1db3e clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b11bf0, TC=0x0000000112819000, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3310
  frame #9: 0x0000000103a1cf11 clang`clang::driver::Driver::BuildJobsForAction(this=0x00007fff5fbfe4e8, C=0x0000000111b11830, A=0x0000000111b11bf0, TC=0x0000000112819000, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
  frame #10: 0x0000000103a0a5c2 clang`clang::driver::Driver::BuildJobs(this=0x00007fff5fbfe4e8, C=0x0000000111b11830) const at Driver.cpp:2843
  frame #11: 0x0000000103a00b5c clang`clang::driver::Driver::BuildCompilation(this=0x00007fff5fbfe4e8, ArgList=ArrayRef<const char *> @ 0x00007fff5fbfc1e8) at Driver.cpp:746
  frame #12: 0x0000000100005a52 clang`main(argc_=9, argv_=0x00007fff5fbff670) at driver.cpp:463
  frame #13: 0x00007fffbf260c05 libdyld.dylib`start + 1

Could you please take a look?

Let me know if you need anything else,
Cheers,
Alex

Tue, Aug 8, 7:10 AM

Mon, Aug 7

gtbercea closed D32035: [OpenMP] Error when trying to offload to an unsupported architecture.
Mon, Aug 7, 2:13 PM
gtbercea closed D29904: [OpenMP] Prevent emission of exception handling code when using OpenMP to offload to NVIDIA devices..
Mon, Aug 7, 1:59 PM
gtbercea closed D29642: [OpenMP] Make OpenMP generated code for the NVIDIA device relocatable by default.
Mon, Aug 7, 1:32 PM
gtbercea closed D29644: [OpenMP] Pass -v to PTXAS if it was passed to the driver..
Mon, Aug 7, 1:22 PM
gtbercea closed D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.
Mon, Aug 7, 1:02 PM
gtbercea updated the diff for D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

Add -no-canonical-prefixes to tests.

Mon, Aug 7, 12:46 PM
gtbercea closed D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
Mon, Aug 7, 8:39 AM · Restricted Project
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

Fix test comments.

Mon, Aug 7, 8:36 AM · Restricted Project

Sun, Aug 6

gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

Fix -march special casing.

Sun, Aug 6, 2:45 PM · Restricted Project
gtbercea added inline comments to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
Sun, Aug 6, 2:27 PM · Restricted Project
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

Address comments.

Sun, Aug 6, 2:23 PM · Restricted Project

Sat, Aug 5

gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
Don't exclude flags when host matches offload toolchain.
Sat, Aug 5, 5:35 PM · Restricted Project
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
New way to handle OpenMP target flags.
Sat, Aug 5, 4:45 PM · Restricted Project

Jul 10 2017

gtbercea added a comment to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

I think I have something that works which is similar to what you were requesting. Please let me know your thoughts!

Jul 10 2017, 4:15 PM · Restricted Project
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

Address comments.

Jul 10 2017, 4:12 PM · Restricted Project
gtbercea added inline comments to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
Jul 10 2017, 10:08 AM · Restricted Project

Jul 6 2017

gtbercea added a comment to D29905: [OpenMP] Pass argument to device kernel by reference when map is used. .

Does this also include the fixes in the following revision?

https://reviews.llvm.org/D29905

Sorry, I wasn't aware of this revision and thought that it had long been committed. I just verified that the bug referenced in the summary is also fixed by my patch in D34888. However, I can't comment on whether this patch is still needed. Sorry for the conflicts if yes...

You probably should commit your patches earlier, you currently have 10 accepted revisions that have not yet been committed. This will also avoid complicated rebases and so on.

Jul 6 2017, 1:31 PM
gtbercea closed D29647: [OpenMP] Extend CLANG target options with device offloading kind..
Jul 6 2017, 9:22 AM
gtbercea updated the diff for D29647: [OpenMP] Extend CLANG target options with device offloading kind..

.

Jul 6 2017, 9:21 AM
gtbercea closed D29658: [OpenMP] Customize CUDA-based tool chain selection.
Jul 6 2017, 9:08 AM
gtbercea updated the diff for D29658: [OpenMP] Customize CUDA-based tool chain selection.

Use str()

Jul 6 2017, 9:01 AM

Jul 5 2017

gtbercea added inline comments to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
Jul 5 2017, 4:37 PM · Restricted Project
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

Address Comments.

Jul 5 2017, 4:34 PM · Restricted Project
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

Address comments.

Jul 5 2017, 4:33 PM · Restricted Project
gtbercea added a comment to D29658: [OpenMP] Customize CUDA-based tool chain selection.

Tests?

Jul 5 2017, 1:06 PM
gtbercea added inline comments to D29658: [OpenMP] Customize CUDA-based tool chain selection.
Jul 5 2017, 1:04 PM
gtbercea updated the diff for D29658: [OpenMP] Customize CUDA-based tool chain selection.

Rebase on latest master.

Jul 5 2017, 12:55 PM
gtbercea added inline comments to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
Jul 5 2017, 11:33 AM · Restricted Project
gtbercea added inline comments to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
Jul 5 2017, 9:05 AM · Restricted Project
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

Address comments.

Jul 5 2017, 9:05 AM · Restricted Project
gtbercea added a comment to D34888: [OpenMP] Fix mapping of scalars for combined directives.

Does this also include the fixes in the following revision?

Jul 5 2017, 7:53 AM

Jun 30 2017

gtbercea added a comment to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

@hfinkel I've add the flag as suggested. There is one minor change, I used "=" instead of ":" when specifying the toolchain/triple. I also support the triple being omitted when there is only one offloading toolchain specified with -fopenmp-targets.

Jun 30 2017, 4:59 PM · Restricted Project
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

Check -fopenmp-targets has one entry when using default toolchain in -Xopenmp-target.

Jun 30 2017, 4:55 PM · Restricted Project
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

Pass OpenMP target options.

Jun 30 2017, 4:43 PM · Restricted Project
gtbercea added a comment to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

What happens if you have multiple targets? Maybe this should be -fopenmp-targets-arch=foo,bar,whatever?

Once this all lands, please make sure that you add additional test cases here. Make sure that the arch is passed through to the ptx and cuda tools as it should be. Make sure that the defaults work. Make sure that something reasonable happens if the user specifies the option more than once (if they're all the same).

Hi Hal,

At the moment only one arch is supported and it would apply to all the target triples under -fopenmp-targets.

I was planning to address the multiple archs problem in a future patch.

I am assuming that in the case of multiple archs, each arch in -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this a practical interpretation of what should happen?

Yea, that's what I was thinking. I'm a bit concerned that none of this generalizes well. To take a step back, under what circumstances do we support multiple targets right now?

We allow -fopenmp-targets to get a list of triples. I am not aware of any limitations in terms of how many of these triples you can have. Even in the test file of this patch we have the following: "-targets=openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu,host-powerpc64le--linux"

Regarding tests: more tests can be added as a separate patch once offloading is enabled by the patch following this one (i.e. D29654). There actually is a test in D29654 where I check that the arch is passed to ptxas and nvlink correctly using this flag. I will add some more test cases to cover the other situations you mentioned.

Sounds good.

Thanks,

--Doru

In our previous solution there might be a problem. The same triple might be used multiple times just so that you can have several archs in the other flag (T1 and T2 being the same). There are some alternatives which I have discussed with @ABataev.

One solution could be to associate an arch with each triple to avoid positional matching of triples in one flag with archs in another flag:

-fopenmp-targets=T1:A1,T2,T3:A2

":A1" is optional, also, in the future, we can pass other things to the toolchain such as "-L/a/b/c/d":

-fopenmp-targets=T1:A1: -L/a/b/c/d,T2,T3:A2

Okay, good, this is exactly where I was going when I said I was worried about generalization. -march seems like one of many flags I might want to pass to the target compilation. Moreover, it doesn't seem special in what regard.

We have -Xclang and -mllvm, etc. to pass flags through to other stages of compilation. Could we do something similar here? Maybe something like: `-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7`. That's unfortunately long, but if there's only one target, we could omit the triple?

The triple could be omitted, absolutely.

If you have the following:

-fopenmp-targets=openmp-powerpc64le-ibm-linux-gnu `-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7 -Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr8`

This would end up having a toolchain called for each one of the -Xopenmp-target sets of flags even though a single triple was specified under the -fopenmp-targets. Would this be ok?

Why? That does not sound desirable. And could you even use these multiple outputs? I think you'd want to pass all of the arguments for each target triple to the one toolchain invocation for that target triple. Is that possible?

Jun 30 2017, 7:41 AM · Restricted Project

Jun 29 2017

gtbercea added a comment to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

What happens if you have multiple targets? Maybe this should be -fopenmp-targets-arch=foo,bar,whatever?

Once this all lands, please make sure that you add additional test cases here. Make sure that the arch is passed through to the ptx and cuda tools as it should be. Make sure that the defaults work. Make sure that something reasonable happens if the user specifies the option more than once (if they're all the same).

Hi Hal,

At the moment only one arch is supported and it would apply to all the target triples under -fopenmp-targets.

I was planning to address the multiple archs problem in a future patch.

I am assuming that in the case of multiple archs, each arch in -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this a practical interpretation of what should happen?

Yea, that's what I was thinking. I'm a bit concerned that none of this generalizes well. To take a step back, under what circumstances do we support multiple targets right now?

We allow -fopenmp-targets to get a list of triples. I am not aware of any limitations in terms of how many of these triples you can have. Even in the test file of this patch we have the following: "-targets=openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu,host-powerpc64le--linux"

Regarding tests: more tests can be added as a separate patch once offloading is enabled by the patch following this one (i.e. D29654). There actually is a test in D29654 where I check that the arch is passed to ptxas and nvlink correctly using this flag. I will add some more test cases to cover the other situations you mentioned.

Sounds good.

Thanks,

--Doru

In our previous solution there might be a problem. The same triple might be used multiple times just so that you can have several archs in the other flag (T1 and T2 being the same). There are some alternatives which I have discussed with @ABataev.

One solution could be to associate an arch with each triple to avoid positional matching of triples in one flag with archs in another flag:

-fopenmp-targets=T1:A1,T2,T3:A2

":A1" is optional, also, in the future, we can pass other things to the toolchain such as "-L/a/b/c/d":

-fopenmp-targets=T1:A1: -L/a/b/c/d,T2,T3:A2

Okay, good, this is exactly where I was going when I said I was worried about generalization. -march seems like one of many flags I might want to pass to the target compilation. Moreover, it doesn't seem special in what regard.

We have -Xclang and -mllvm, etc. to pass flags through to other stages of compilation. Could we do something similar here? Maybe something like: `-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7`. That's unfortunately long, but if there's only one target, we could omit the triple?

Jun 29 2017, 3:42 PM · Restricted Project
gtbercea added a comment to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

What happens if you have multiple targets? Maybe this should be -fopenmp-targets-arch=foo,bar,whatever?

Once this all lands, please make sure that you add additional test cases here. Make sure that the arch is passed through to the ptx and cuda tools as it should be. Make sure that the defaults work. Make sure that something reasonable happens if the user specifies the option more than once (if they're all the same).

Hi Hal,

At the moment only one arch is supported and it would apply to all the target triples under -fopenmp-targets.

I was planning to address the multiple archs problem in a future patch.

I am assuming that in the case of multiple archs, each arch in -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this a practical interpretation of what should happen?

Yea, that's what I was thinking. I'm a bit concerned that none of this generalizes well. To take a step back, under what circumstances do we support multiple targets right now?

Jun 29 2017, 2:36 PM · Restricted Project
gtbercea added a comment to D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.

What happens if you have multiple targets? Maybe this should be -fopenmp-targets-arch=foo,bar,whatever?

Once this all lands, please make sure that you add additional test cases here. Make sure that the arch is passed through to the ptx and cuda tools as it should be. Make sure that the defaults work. Make sure that something reasonable happens if the user specifies the option more than once (if they're all the same).

Jun 29 2017, 9:35 AM · Restricted Project
gtbercea closed D29645: [OpenMP] Pass -fopenmp-is-device to preprocessing and machine specific code generation stages.
Jun 29 2017, 8:59 AM
gtbercea closed D29339: [OpenMP] Add support for auxiliary triple specification.
Jun 29 2017, 8:49 AM
gtbercea updated the diff for D29339: [OpenMP] Add support for auxiliary triple specification.

Rebase

Jun 29 2017, 8:27 AM

Jun 28 2017

gtbercea updated the diff for D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

[Update regression tests] Add a test for propagating the compute capability to the OpenMP device offloading toolchain which targets NVIDIA GPUs.
This is a test for patch D34784 which is enabled by this patch.

Jun 28 2017, 4:06 PM
gtbercea updated the diff for D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
Jun 28 2017, 3:45 PM · Restricted Project
gtbercea created D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading.
Jun 28 2017, 3:44 PM · Restricted Project
gtbercea updated the diff for D29647: [OpenMP] Extend CLANG target options with device offloading kind..

Split previous diff into a "device offloading kind" patch (show here) and a new patch D34784 which relies on a new compiler flag.

Jun 28 2017, 3:27 PM
gtbercea added inline comments to D29647: [OpenMP] Extend CLANG target options with device offloading kind..
Jun 28 2017, 11:38 AM
gtbercea added inline comments to D29647: [OpenMP] Extend CLANG target options with device offloading kind..
Jun 28 2017, 11:19 AM
gtbercea abandoned D29651: [OpenMP] Consider LIBRARY_PATH when selecting library paths for NVPTX targets in OpenMP mode..

Not needed.
These changes are related to looking up the .bc library for inlining purposes. I believe @arpith-jacob has already handled this in trunk. Therefore this is obsolete code.

Jun 28 2017, 9:07 AM
gtbercea added inline comments to D29647: [OpenMP] Extend CLANG target options with device offloading kind..
Jun 28 2017, 8:37 AM
gtbercea updated the diff for D29647: [OpenMP] Extend CLANG target options with device offloading kind..

Updated diff to address comments.

Jun 28 2017, 8:34 AM
gtbercea added inline comments to D29647: [OpenMP] Extend CLANG target options with device offloading kind..
Jun 28 2017, 8:02 AM
gtbercea added inline comments to D29647: [OpenMP] Extend CLANG target options with device offloading kind..
Jun 28 2017, 7:41 AM
gtbercea added inline comments to D29647: [OpenMP] Extend CLANG target options with device offloading kind..
Jun 28 2017, 7:29 AM

May 12 2017

gtbercea updated the diff for D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

Address comments.

May 12 2017, 7:52 AM

May 9 2017

gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

ping

May 9 2017, 1:15 PM

May 2 2017

gtbercea added a reviewer for D29647: [OpenMP] Extend CLANG target options with device offloading kind.: Hahnfeld.
May 2 2017, 2:07 PM

May 1 2017

gtbercea added a comment to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

@rnk any further thought on the changes to this patch? Thanks :)

May 1 2017, 12:35 PM
gtbercea updated the diff for D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

Change output of unbundler to produce cubin files when using OpenMP to offload to NVIDIA GPUs.

May 1 2017, 12:33 PM

Apr 21 2017

gtbercea updated the diff for D29659: [OpenMP] Add flag for disabling the default generation of relocatable OpenMP target code for NVIDIA GPUs..

Refactor if condition.

Apr 21 2017, 8:05 AM

Apr 20 2017

gtbercea added a reviewer for D29659: [OpenMP] Add flag for disabling the default generation of relocatable OpenMP target code for NVIDIA GPUs.: Hahnfeld.
Apr 20 2017, 11:26 AM

Apr 19 2017

gtbercea updated the diff for D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

Avoid renaming by enabling PTXAS to generate an output file with the appropriate extension, in this case a cubin extension.

Apr 19 2017, 7:22 AM

Apr 17 2017

gtbercea updated the diff for D32035: [OpenMP] Error when trying to offload to an unsupported architecture.

Merge IF statements.

Apr 17 2017, 11:22 AM

Apr 13 2017

gtbercea updated the diff for D32035: [OpenMP] Error when trying to offload to an unsupported architecture.

Re-use an already existing flag rather than creating a new one.

Apr 13 2017, 1:46 PM
gtbercea added inline comments to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.
Apr 13 2017, 11:21 AM
gtbercea updated the diff for D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

Use the rename() utility function of LLVM for renaming the PTXAS output before invoking NVLINK.

Apr 13 2017, 11:19 AM
gtbercea created D32035: [OpenMP] Error when trying to offload to an unsupported architecture.
Apr 13 2017, 11:15 AM
gtbercea updated the diff for D29658: [OpenMP] Customize CUDA-based tool chain selection.

Remove tests which belong into a different patch.

Apr 13 2017, 11:10 AM
gtbercea updated the diff for D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.

Fix.

Apr 13 2017, 8:02 AM

Apr 12 2017

gtbercea updated the diff for D29905: [OpenMP] Pass argument to device kernel by reference when map is used. .

Update check before loop.

Apr 12 2017, 10:00 AM
gtbercea updated the diff for D29905: [OpenMP] Pass argument to device kernel by reference when map is used. .

Fix for loop range.

Apr 12 2017, 9:57 AM
gtbercea updated the diff for D29905: [OpenMP] Pass argument to device kernel by reference when map is used. .

Refactor code.

Apr 12 2017, 9:41 AM