
[OpenMP] enable bc file compilation using the latest clang
ClosedPublic

Authored by guansong on Mar 28 2018, 11:42 AM.

Diff Detail

Repository
rOMP OpenMP

Event Timeline

guansong created this revision. Mar 28 2018, 11:42 AM

What is the problem with building the bc library with the latest clang? I cannot reproduce the issue.

I mean using -DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=1 on the cmake command line to build the bc libraries. Did you use that flag?

By latest clang I mean

/builds/guansong/llvm.trunk/llvm

  • 9273bb3 2018-03-27 | Use .set instead of = when printing assignment in assembly output (HEAD -> master, origin/master, origin/HEAD) [Krzysztof Parzyszek]

/builds/guansong/llvm.trunk/llvm/tools/clang

  • a4c6f25 2018-03-27 | [clang] Change std::sort to llvm::sort in response to r327219 (HEAD -> master, origin/master, origin/HEAD) [Mandeep Singh Grang]

And the OpenMP runtime I used

  • 24e7752 2018-03-26 | Move blocktime_str variable right before its first use (origin/master, origin/HEAD, master) [Jonathan Peyton]

Right, I remember this issue. It is caused by the fact that clang does not support __shared__ variables to be extern. This restriction was explicitly introduced here: https://reviews.llvm.org/D25125.

When upstreaming the nvptx RTL, this issue was discussed here: https://reviews.llvm.org/D14254?id=121993#inline-347991. That is why building the bc lib has been disabled by default. Back then you had mentioned that you had a fix for clang. What is the status of that fix?

I am really hesitant to accept the proposed solution because rdc is known to slow down CUDA code considerably. A clang-side fix would be much more preferable.
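For illustration, the restriction under discussion looks roughly like this; the file and variable names below are invented, not taken from the actual deviceRTL:

```cuda
// state.cu -- the translation unit that defines the shared variable.
__device__ __shared__ int scratch[32];

// worker.cu -- a second translation unit that wants to reference it.
// Without relocatable device code (-fcuda-rdc), Clang rejects this
// declaration: __shared__ variables may not be extern (see D25125).
extern __device__ __shared__ int scratch[32];

__device__ int readScratch(int i) { return scratch[i]; }
```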

The fix I had is similar to 751ae2ab8, which addresses exactly this.

I don't think this will create different LLVM IR, except that it allows extern __shared__, which is needed here. It is the proper flag to use for multiple compilation units.

Maybe my search is missing something, but the only place I see CUDARelocatableDeviceCode is in lib/Sema/SemaDeclAttr.cpp to allow for extern shared. How could this be causing slowness? I would think forcing extern to be global would be slower.

In my search for cuda_rdc, I only see code to forward the option to clang cc1 if set in the driver. So again, I don't see it affecting anything but SemaDeclAttr.cpp.

Greg

My search reached the same conclusion as Greg's: I don't think -fcuda-rdc will change the IR, except for allowing multiple compilation units, as specified in Options.td:

def fcuda_rdc : Flag<["-"], "fcuda-rdc">, Flags<[CC1Option, HelpHidden]>,
  HelpText<"Generate relocatable device code, also known as separate compilation mode.">;

I believe this mode is needed for the code used in the deviceRTL, as we do want to create a bc library from multiple .cu files.

grokos accepted this revision. Apr 2 2018, 1:51 PM

OK, so we can proceed with this solution, and if we observe any performance problems in the future I will push a different fix: instead of declaring __shared__ variables as extern, we can have getter functions in the .cu file where the variable is defined, which other .cu files can call to get a pointer/reference to the variable.
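The alternative described above could look roughly like this; the names are illustrative and not taken from the actual deviceRTL:

```cuda
// state.cu -- the only file that defines the __shared__ variable.
__device__ __shared__ int teamState;

// Getter defined next to the variable; other translation units link
// against this function instead of declaring the variable extern.
__device__ int *getTeamState() { return &teamState; }

// worker.cu -- uses the getter, so no extern __shared__ is needed.
__device__ int *getTeamState();

__device__ void resetTeamState() { *getTeamState() = 0; }
```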

This revision is now accepted and ready to land. Apr 2 2018, 1:51 PM
This revision was automatically updated to reflect the committed changes.

Post-commit because your commit didn't trigger an email (please subscribe to openmp-commits!).

IMO this is wrong and should be reverted. What should be done instead is to detect whether the compiler supports that flag, because it was only added recently. Older compilers (pre 4.0?) are able to build the bclib without that flag. In both cases, the build system should enable the bclib by default because that is the sensible thing to do.

Regarding performance: Relocatable code is possibly slower, but we only use that flag to produce bitcode which is inlined by the compiler. However, I'm still working on that feature (see D42922) and I can't say for sure that we won't end up emitting different IR once the support matures. So this definitely needs to be considered. That's also the reason why I didn't submit a patch yet.


I agree. I tried using this flag prior to this being committed and I was unable to successfully compile the BC lib.


Yes, it's only in trunk and not in any released version (yet). It should work with current trunk, though; however, http://lab.llvm.org:8011/builders/openmp-clang-ppc64le-linux-debian seems to be offline...


I see. We compared the change in (4.0?) and after, and realized that extern __shared__ was turned off. We re-enabled it (basically allowing it to pass parsing) and everything works as expected.

This also was my original intention when implementing rdc, I just hadn't finished yet to write a patch for libomptarget.

We noticed that change 751ae2ab8 also re-enables this under a flag, with comments about separate compilation units. Based on that, I think this is the right thing to do, although you mentioned cuda-rdc may have more profound effects than it has now.

Please do use SVN revisions; SVN is still the canonical revision control system. For reference, this is rC325136.

Thinking about it differently: nvcc allows extern __shared__, so does that mean we should allow this without requiring a special flag? Or should there be a dedicated flag that means "allow extern __shared__", to match nvcc's behavior for this kind of code?

No, I think Clang does the reasonable thing and emits an error for a declaration that is disallowed by the CUDA programming manual itself.


So, will the deviceRTLs/nvptx change? Instead of extern __shared__, what will it use for those data structures?

I don't question the flag; as I said, libomptarget-nvptx was the main reason to implement that functionality. What's bad about this is adding the flag unconditionally: the build system needs to properly detect whether the compiler supports it.

If you can't work on that right now, I hereby request this change to be reverted as it results in a build regression.


I see. I will take your advice and guard this flag with the clang version, so we have a way to test the bc file path with the latest clang (7.0). This is an important path for devices, which will take advantage of the LLVM linker/link-time optimizer.

I don't think this is a regression though, as it is broken right now unless one uses an older (pre-4.0) clang. Besides, the bc file path is only enabled under -DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=1.


... which is enough to satisfy the definition of a regression. Yes, Clang also regressed but that's no excuse to regress the runtime library. Please revert this change or let me know if you can't and I'll do it.


I can add this flag conditionally as you suggested. How about something like this, where we check the LLVM version?

# Set flags for Clang cuda compilation.  Only Clang is supported because there is
# no other compiler capable of generating bitcode from cuda sources.
if (LLVM_VERSION_MAJOR GREATER 6)
  set(CUDA_FLAGS
    -emit-llvm
    -fcuda-rdc
    -O1
    -Xclang -target-feature
    -Xclang +${CUDA_PTX_VERSION}
    --cuda-device-only
    -DOMPTARGET_NVPTX_TEST=0 -DOMPTARGET_NVPTX_DEBUG=0
    )
else()
  set(CUDA_FLAGS
    -emit-llvm
    -O1
    -Xclang -target-feature
    -Xclang +${CUDA_PTX_VERSION}
    --cuda-device-only
    -DOMPTARGET_NVPTX_TEST=0 -DOMPTARGET_NVPTX_DEBUG=0
    )
endif()


No, it needs to be a compiler check. LLVM_VERSION won't be set if building openmp standalone.



Currently we only enable the bc file path when the flags -DLIBOMPTARGET_NVPTX_CUDA_COMPILER and -DLIBOMPTARGET_NVPTX_SELECTED_BC_LINKER are specified on the cmake command line.

We already enable it as soon as the user says -DLIBOMPTARGET_NVPTX_ENABLE_BCLIB and we find the Clang compiler suitable. This change breaks the build with Clang 3.9.1 and I will revert it early next week to fix this regression.

What we need is to check that compiler's and linker's LLVM version, and then test that LLVM version here.

We shouldn't hardcode the versions. I'd propose the following:

  1. Can LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER compile extern __shared__ without additional flag? (pre Clang 4.0)
  2. If we add -fcuda-rdc will it work then? (Clang trunk and later on released version 7.0)

In both cases, we should default LIBOMPTARGET_NVPTX_ENABLE_BCLIB to On. (All these checks should probably reside in some config-ix.cmake...)
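A hedged CMake sketch of such a dynamic probe (variable and function names below are invented for illustration; the real implementation may differ):

```cmake
# Try to compile a file containing an extern __shared__ declaration,
# first without and then with -fcuda-rdc, instead of hardcoding versions.
set(BCLIB_TEST_SRC ${CMAKE_BINARY_DIR}/extern_shared_test.cu)
file(WRITE ${BCLIB_TEST_SRC} "extern __device__ __shared__ int s;\n")

function(check_bclib_flags result)
  # ARGN holds any extra flags to test, e.g. -fcuda-rdc.
  execute_process(
    COMMAND ${LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER} -fsyntax-only
            --cuda-device-only ${ARGN} ${BCLIB_TEST_SRC}
    RESULT_VARIABLE rc OUTPUT_QUIET ERROR_QUIET)
  if(rc EQUAL 0)
    set(${result} TRUE PARENT_SCOPE)
  else()
    set(${result} FALSE PARENT_SCOPE)
  endif()
endfunction()

check_bclib_flags(BCLIB_PLAIN_WORKS)            # pre-4.0 Clang
check_bclib_flags(BCLIB_RDC_WORKS -fcuda-rdc)   # trunk / 7.0 and later
if(BCLIB_PLAIN_WORKS OR BCLIB_RDC_WORKS)
  set(LIBOMPTARGET_NVPTX_ENABLE_BCLIB ON CACHE BOOL "" FORCE)
endif()
```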


I understand your situation now: if clang 3.9.1 is used as the runtime build compiler, this will cause issues when you specify ENABLE_BCLIB.


Checking that way is good, though personally I am in favor of using find_package(LLVM) to check the LLVM version. Regardless, I think we want the same thing here, i.e. to enable the bc lib path by default.


Again: I'm against any hardcoding of compiler versions; support for flags can be detected dynamically. As such I don't see the point of using find_package. Can you give an example of which information you'd like to use from there?

I can see the advantage of dynamic testing. But testing everything dynamically may be tedious. For bc file handling, I consider this more of a case where the project needs to embed LLVM itself, so using find_package is more suitable. http://llvm.org/docs/CMake.html#embedding
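For reference, a minimal sketch of the find_package route, following the embedding guide linked above (the flag handling at the end is illustrative, not the actual patch):

```cmake
# Locate an installed LLVM and read its version, as described in
# http://llvm.org/docs/CMake.html#embedding-llvm-in-your-project
find_package(LLVM REQUIRED CONFIG)
message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION} in ${LLVM_INSTALL_PREFIX}")

# Example use of the version: -fcuda-rdc exists only in Clang/LLVM >= 7.
if(LLVM_VERSION_MAJOR GREATER 6)
  list(APPEND CUDA_FLAGS -fcuda-rdc)
endif()
```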

I have reverted the change; let's continue to discuss how to better enable the bc file path.


proper link: http://llvm.org/docs/CMake.html#embedding-llvm-in-your-project


! In D44992#1061781, @guansong wrote:
I have reverted the change; let's continue to discuss how to better enable the bc file path.

You are still not subscribed to openmp-commits, so again no commit email! For reference, the commit is rL329576 (and you shouldn't quote git commit hashes, there is a script in LLVM to revert SVN changes from git-svn.)


We probably don't need to test all flags on their own; we could group the basic flags together. And I still don't get which variables you want to use from that CMake integration?


I was trying to use my office email to subscribe to the mailing list; it may not be approved yet. How can I check on that? Currently I only get openmp-commits messages at my gmail.

For reverting, I reverted the patch in git and committed it using git svn dcommit. I guess you are referring to the following comment in the log

This reverts commit 6849e31c36d712d97433bca9af39b7a09c8c1207.

generated during the process. What is the proper script to use?


Unless the grouped flags get accepted or rejected together by the compiler, we won't know which flag caused the failure. For the moment, I intend to use only LLVM_VERSION_MAJOR and LLVM_INSTALL_PREFIX/LLVM_BUILD_BINARY_DIR, but I can see needing more if we explore the bc files further.


There is utils/git-svn/git-svnrevert in the main LLVM repository.


I'd group the current set of flags and add a second check for -fcuda-rdc. I'm saying this a third time: There should be no hardcoding of versions!


Why is using the version not acceptable? It may not be desirable, but compared to dynamic testing, doesn't it have its own advantages?

Has this been implemented elsewhere already? Last I tried, this flag was still needed here in order to build the bitcode library.


No, at least I'm not aware of a new patch adding this functionality. I explained above what I think should be done to make this work in a CMake-ish way and avoid regressions. I might try to tackle this myself, but no guarantees.

Thanks for the response. Can you point me to the final solution for this or re-explain it? From the comments I'm not sure I can distill the solution you want.


My understanding from Jonas is that he prefers to use a cmake test to find out whether this flag works, and to add the flag if it does.

My suggestion was to use the LLVM version number, which Jonas does not agree with. I think he has a reason for that, though personally I don't feel the reason is strong enough.

I've posted D46901 which dynamically checks supported flags, compiles with Clang 3.9 and current trunk and also allows us to enable libomptarget-nvptx by default.