This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][NFC] Remove unused (always fixed) arguments
ClosedPublic

Authored by jdoerfert on Jul 6 2020, 6:05 PM.

Details

Summary

There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two before as we match them in OpenMPOpt now.

Diff Detail

Event Timeline

jdoerfert created this revision.Jul 6 2020, 6:05 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptJul 6 2020, 6:05 PM
Hahnfeld added a subscriber: Hahnfeld.EditedJul 7 2020, 12:10 AM

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

ABataev added a subscriber: ABataev.Jul 7 2020, 4:35 AM

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

+1. Better to introduce new entry points and mark these ones as deprecated.

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

This is the device RTL. I am not aware we (want to) keep the API stable. If we are, I'm not sure why:

  • Dynamic linking (among other things) is not really an option so people that link against the device runtime (should) do so statically.
  • Linking against an old device runtime with a new clang seems unreasonable to me. If you replace clang you must replace the static runtime as the new clang might use new functions.

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

+1. Better to introduce new entry points and mark these ones as deprecated.

Same response as above. What is the use case here which we want to continue to support?

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

This is the device RTL. I am not aware we (want to) keep the API stable. If we are, I'm not sure why:

  • Dynamic linking (among other things) is not really an option so people that link against the device runtime (should) do so statically.
  • Linking against an old device runtime with a new clang seems unreasonable to me. If you replace clang you must replace the static runtime as the new clang might use new functions.

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

+1. Better to introduce new entry points and mark these ones as deprecated.

Same response as above. What is the use case here which we want to continue to support?

Use of the new library with the previous version of the compiler.

ABataev added inline comments.Jul 7 2020, 5:56 AM
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
42

I think, instead the optimizer can try to detect if the runtime library is used by the kernel and switch this flag to 0 if no runtime calls are used in the kernel. For non-SPMD mode in most cases, the runtime is required, but in some cases, it can be disabled.

I'm not sure we have a consensus on api stability. Usually llvm allows mixing libraries and compilers from different sources, e.g. the various libunwind or compiler-rt vs libgcc. Libomptarget in general appears to be considered fixed and has external users (intel, maybe gcc).

The device runtime would be ill served by this default. This is the only openmp device runtime library which works with llvm. It's statically linked, usually as bitcode when performance is important. The code used to handle target offloading for nvptx is spread across the compiler and the runtime, probably not optimally.

I'm not familiar with the gcc-nvptx-openmp implementation. Reading through https://gcc.gnu.org/wiki/Offloading strongly suggests a totally independent implementation to this one. I don't think gcc can be using this runtime library for nvptx. It definitely doesn't for amdgcn. Proprietary compilers might be using this code, but we can have no duty of care to toolchains that use this code without telling us they're doing so.

Therefore the only backwards/forwards compatibility we can strive for is between different versions of clang and the device runtime. That seems potentially useful - one could use a release clang binary while working on the deviceRTL or vice versa. It's an expensive developer convenience though.

We would pay with things like the API rot fixed above. Introducing a faster lowering for an openmp construct would mean a redundant path through clang and some version checking to guess which runtime library we're targeting, which is not presently versioned. Likewise moving code between compiler and runtime becomes much more expensive to implement. Getting it wrong is inevitable given our test coverage.

I think the project is much better served by assuming that the runtime library used by clang is the one from the same hash in the monorepo. That leaves us free to fix debt and improve performance, at the price of needing to build clang from (near to) trunk while developing the rtl.

Perhaps we can embrace API stability later on, once we have things like versioning and a solid optimisation pipeline in place, especially if gcc wants to use the deviceRTL for nvptx. Now is too early.

JonChesterfield accepted this revision.Jul 7 2020, 6:25 AM

Aside from the API stability concern this looks uncontentious. Removes dead arguments, generally makes things simpler. Thus LGTM.

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
42

If we can detect that no runtime calls are used, we should be able to do better than passing a different argument. E.g. delete some setup calls.

Failing that, if we want to pass an argument which says 'actually don't do any work', it shouldn't be the same argument used to check whether the runtime has been initialised.

This revision is now accepted and ready to land.Jul 7 2020, 6:25 AM

Aside from the API stability concern this looks uncontentious. Removes dead arguments, generally makes things simpler. Thus LGTM.

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

No, I'm not. Long before that, we relied on the API stability and already have some runtime calls marked as deprecated. Especially taking into account, that libomp can be built separately.

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
42

No, I don't think you can do this in all cases

jdoerfert marked an inline comment as done.Jul 7 2020, 6:36 AM

I don't think gcc can be using this runtime library for nvptx.

Yes, and: We are (going to) use clang specific intrinsics to avoid CUDA (soon).

Use of the new library with the previous version of the compiler.

Except that you cannot generally expect this to work. In our supported use case the library is kept as bitcode (LLVM-IR). Bitcode is not backward compatible. An old toolchain (clang, llvm-link, ...) cannot be fed new IR and be expected to work. So, we are already not able to give a stability guarantee here, why pretend we do. The bitcode runtime has to be kept in-sync with the toolchain.

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
42

We can detect all we want but switching it *does not have any effect*.

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

No, I'm not. Long before that, we relied on the API stability and already have some runtime calls marked as deprecated. Especially taking into account, that libomp can be built separately.

Yes, the existing v# naming and deprecated comments should also go.

What can libomp be built by separately? Nvcc and gcc don't use this runtime. That leaves us with downstream proprietary compilers derived from clang that are already stuck carrying extensive compatibility patches and usually ship as one large toolchain blob which only needs to be internally self consistent.

Especially taking into account, that libomp can be built separately.

This is *not* affecting libomp in any way.

Especially taking into account, that libomp can be built separately.

This is *not* affecting libomp in any way.

libomptarget and device runtimes are part of libomp. If you're going to remove some params, you'll need to modify the runtime functions too, no?

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

No, I'm not. Long before that, we relied on the API stability and already have some runtime calls marked as deprecated. Especially taking into account, that libomp can be built separately.

Yes, the existing v# naming and deprecated comments should also go.

What can libomp be built by separately? Nvcc and gcc don't use this runtime. That leaves us with downstream proprietary compilers derived from clang that are already stuck carrying extensive compatibility patches and usually ship as one large toolchain blob which only needs to be internally self consistent.

Answered already: the previous version of the compiler with the new version of the runtime.

What can libomp be built by separately? Nvcc and gcc don't use this runtime. That leaves us with downstream proprietary compilers derived from clang that are already stuck carrying extensive compatibility patches and usually ship as one large toolchain blob which only needs to be internally self consistent.

Answered already: the previous version of the compiler with the new version of the runtime.

Still cannot be expected to work: https://reviews.llvm.org/D83268#2135951
Are there other use cases?

Especially taking into account, that libomp can be built separately.

This is *not* affecting libomp in any way.

libomptarget and device runtimes are part of libomp. If you're going to remove some params, you'll need to modify the runtime functions too, no?

No they are not.

Especially taking into account, that libomp can be built separately.

This is *not* affecting libomp in any way.

libomptarget and device runtimes are part of libomp. If you're going to remove some params, you'll need to modify the runtime functions too, no?

No they are not.

llvm-project/openmp/libomptarget

I don't think gcc can be using this runtime library for nvptx.

Yes, and: We are (going to) use clang specific intrinsics to avoid CUDA (soon).

Use of the new library with the previous version of the compiler.

Except that you cannot generally expect this to work. In our supported use case the library is kept as bitcode (LLVM-IR). Bitcode is not backward compatible. An old toolchain (clang, llvm-link, ...) cannot be fed new IR and be expected to work. So, we are already not able to give a stability guarantee here, why pretend we do. The bitcode runtime has to be kept in-sync with the toolchain.

There is still compatibility between clang10 and clang11. Or they are incompatible in LLVM IR level? Also, there was a mode (I don't remember if it was removed or not) where the runtime library could be linked as .a library, without LLVM IR inlining.

There is still compatibility between clang10 and clang11. Or they are incompatible in LLVM IR level?

That is the point. They might or might not be, right? There is no guarantee they are.

Also, there was a mode (I don't remember if it was removed or not) where the runtime library could be linked as .a library, without LLVM IR inlining.

That mode is deprecated.

llvm-project/openmp/libomptarget

Please use more words.

llvm-project/openmp/libomptarget

Please use more words.

libomptarget is part of libomp

There is still compatibility between clang10 and clang11. Or they are incompatible in LLVM IR level?

That is the point. They might or might not be, right? There is no guarantee they are.

Better to ask the users. Maybe, send an RFC to openmp-devs?

Also, there was a mode (I don't remember if it was removed or not) where the runtime library could be linked as .a library, without LLVM IR inlining.

That mode is deprecated.

llvm-project/openmp/libomptarget

Please use more words.

libomptarget is part of libomp

As mentioned before, no it is not.

Despite the similarity in name, libomp and libomptarget are distinct libraries, this was a conscious design choice.
FWIW, this patch does *not* modify libomptarget either. This modifies *the device runtime*, aka. libomptarget-nvptx-sm_XXX.bc.

Hahnfeld removed a subscriber: Hahnfeld.Jul 7 2020, 7:21 AM

Aside from the API stability concern this looks uncontentious. Removes dead arguments, generally makes things simpler. Thus LGTM.

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

I'm neither, and I've long argued that being able to build the OpenMP runtime(s) without Clang trunk is an important use case. These arguments have gone largely unheard, so I'll not join this discussion once more.

Better to ask the users. Maybe, send an RFC to openmp-devs?

Sure: http://lists.llvm.org/pipermail/openmp-dev/2020-July/003531.html

__kmpc_spmd_kernel_init is always called with RequiresDataSharing == 0
Specifically, it's only called from clang, and emitSPMDEntryHeader unconditionally passes zero to it

I.e. I think there's more stuff that can be cleaned up in the theme of the above, suggest in later patches

This revision was automatically updated to reflect the committed changes.

This patch breaks compilation of previously working code.

I added the following to openmp/libomptarget/test/offloading/offloading_success.c:

+// RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda

which results in

# command stderr:
ptxas offloading_success-openmp-nvptx64-nvidia-cuda.s, line 173; error   : Call has wrong number of parameters
ptxas fatal   : Ptx assembly aborted due to errors
clang-12: error: ptxas command failed with exit code 255 (use -v to see invocation)

The file offloading_success-openmp-nvptx64-nvidia-cuda.s contains:

.func  (.param .b32 func_retval0) __kmpc_kernel_parallel
(
        .param .b64 __kmpc_kernel_parallel_param_0,
        .param .b32 __kmpc_kernel_parallel_param_1
)
;

The same file then calls the function with one argument:

call.uni (retval0), 
__kmpc_kernel_parallel, 
(
param0
);

For the clang 11 release, we should either fix the codegen or revert this patch.

protze.joachim reopened this revision.Jul 24 2020, 10:18 AM

I carefully made sure, that the freshly built clang was used to execute the test. I opened https://bugs.llvm.org/show_bug.cgi?id=46836 to track the issue and made it release blocker.

This revision is now accepted and ready to land.Jul 24 2020, 10:18 AM

I carefully made sure, that the freshly built clang was used to execute the test. I opened https://bugs.llvm.org/show_bug.cgi?id=46836 to track the issue and made it release blocker.

The question is if you picked up a fresly build device runtime as well.

jdoerfert closed this revision.Jul 27 2020, 5:48 AM