This is an archive of the discontinued LLVM Phabricator instance.

Differential D130096

[Clang][AMDGPU] Emit AMDGPU library control constants in clang
Needs Review · Public

Authored by jhuber6 on Jul 19 2022, 9:31 AM.

Details

Reviewers

JonChesterfield
yaxunl
saiislam
arsenm
carlo.bertolli
MaskRay
jdoerfert
tianshilei1992
b-sumner

Summary

The AMDGPU library uses several control constants to change code paths
for the math functions and intrinsics. These are normally included using
several individual bitcode libraries at link time. However, this is
problematic because it requires us to know the AMDGPU architecture at
link time, which should not be strictly necessary. This patch adds new
code that emits the constants that would normally be included by the
bitcode libraries. This removes around six libraries we would otherwise
need to include, so the remaining libraries can be linked in
unconditionally, like we do with libdevice.
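
As a rough illustration of what "emitting the constants" amounts to, the sketch below renders one control constant as a line of LLVM IR text. It is not the patch's code: `ControlConstant` and `renderControlConstant` are hypothetical names, the `i8` type and `addrspace(4)` follow the declaration quoted later in this review, and `hidden` visibility follows the author's first comment.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical description of one library control constant.
struct ControlConstant {
  std::string Name;   // e.g. "__oclc_wavefrontsize64"
  std::uint8_t Value; // 0 or 1 for the boolean-style constants
};

// Renders the constant as a line of LLVM IR text. The linkage is a parameter
// because the review goes back and forth on which one is appropriate.
std::string renderControlConstant(const ControlConstant &C,
                                  const std::string &Linkage) {
  assert(!C.Name.empty() && "a control constant needs a name");
  return "@" + C.Name + " = " + Linkage +
         " hidden local_unnamed_addr addrspace(4) constant i8 " +
         std::to_string(static_cast<int>(C.Value)) + ", align 1";
}
```

Calling `renderControlConstant({"__oclc_wavefrontsize64", 1}, "weak_odr")` yields a single global definition; the rest of the thread is largely about which linkage string belongs in that slot.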

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Jul 19 2022, 9:31 AM

Herald added a project: Restricted Project. Jul 19 2022, 9:31 AM

Herald added subscribers: kosarev, StephenFan, t-tye and 4 others.

jhuber6 requested review of this revision. Jul 19 2022, 9:31 AM

Herald added a project: Restricted Project. Jul 19 2022, 9:31 AM

Herald added subscribers: cfe-commits, wdng.

Let me know if I should move this code somewhere else, or if there are problems. One change I made is that the constant is weak_odr and hidden instead of linkonce_odr and protected. This keeps the constant alive until link time; AMDGPU pretty much always uses LTO, so these should be optimized out when we internalize symbols. I'm assuming we don't need protected visibility, as these shouldn't be read from another executable (e.g. the host).

Tagging Brian as the code owner of rocm device libs - emitting these in clang would simplify that library.

Currently clang reads these command line flags and conditionally links in bitcode files to introduce these symbols. There are existing command line flags for controlling which files are linked. I think this patch should probably use the existing flags to choose which values to set and delete the existing handling.

As written I think this is a no-op, in that the libraries will currently be linked anyway and override the symbols clang has injected.

I've thought that directly emitting these constants would be better. This will also make it so you can't try to continue using llvm-link for these libraries, which is a plus since llvm-link doesn't do the necessary attribute propagation that clang does when linking these.

In D130096#3663010, @JonChesterfield wrote:

Tagging Brian as the code owner of rocm device libs - emitting these in clang would simplify that library.

Currently clang reads these command line flags and conditionally links in bitcode files to introduce these symbols. There are existing command line flags for controlling which files are linked. I think this patch should probably use the existing flags to choose which values to set and delete the existing handling.

As written I think this is a no-op, in that the libraries will currently be linked anyway and override the symbols clang has injected.

Yeah, I wasn't sure if I should do some scan to check if we actually need these. Basically just check if any function declarations start with __ocml. But that might be untenable in the future as we try to move to a generic math library that doesn't eagerly emit target-specific declarations in clang.
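
The scan mentioned above could look roughly like the following. This is only a sketch under the assumption that the declaration names are already available as strings; `declaresOcmlFunction` is a hypothetical helper, and a real implementation would walk the module's function declarations instead.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Returns true if any declared function name starts with "__ocml".
// The input is a plain list of names standing in for the declarations a real
// implementation would visit.
bool declaresOcmlFunction(const std::vector<std::string> &DeclNames) {
  constexpr std::string_view Prefix = "__ocml";
  return std::any_of(DeclNames.begin(), DeclNames.end(),
                     [Prefix](const std::string &Name) {
                       return std::string_view(Name).substr(0, Prefix.size()) ==
                              Prefix;
                     });
}
```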

arsenm added inline comments.Jul 19 2022, 10:16 AM

clang/lib/CodeGen/TargetInfo.cpp
9436	Should use the address space enum
9443	Typo At

A safer bet is to use the current control flow that links in specific bitcode files, but create the global directly instead of linking in the file. That'll give us zero semantic change and a clang that ignores those bitcode files if present.

In D130096#3663062, @JonChesterfield wrote:

A safer bet is to use the current control flow that links in specific bitcode files, but create the global directly instead of linking in the file. That'll give us zero semantic change and a clang that ignores those bitcode files if present.

Do we expect those libraries to be linked per-TU via -mlink-builtin-bitcode? The usage I see passes them to lld directly.

Harbormaster completed remote builds in B176277: Diff 445849.Jul 19 2022, 10:41 AM

There is no constant propagation for globals with weak linkage, right? Otherwise, it won't work. My concern is that there may be optimization passes which do not respect the weak linkage and use the incorrect default value for OpenCL or HIP. Therefore I am not very confident enabling this for OpenCL or HIP unless these variables have the correct value based on the compilation options.

clang/lib/CodeGen/TargetInfo.cpp
9460	should be determined by the code object version option.
clang/test/CodeGen/amdgcn-occl-constants.c
8 ↗	(On Diff #445849)	need a check for __oclc_wavefrontsize64=0 for gfx1030

In D130096#3663295, @yaxunl wrote:

There is no constant propagation for globals with weak linkage, right? Otherwise, it won't work. My concern is that there may be optimization passes which do not respect the weak linkage and use the incorrect default value for OpenCL or HIP. Therefore I am not very confident enabling this for OpenCL or HIP unless these variables have the correct value based on the compilation options.

Yes, the problem is that linkonce_odr can be removed and as such isn't usable for linking libraries late like we want to. You are correct that weak_odr normally cannot be propagated as another TU could potentially change it, but if we're linking this via LTO like AMDGPU does, it should always be internalized in the linker. The OpenMP runtime has a similar weak_odr variable that gets internalized when we do LTO, so it should apply here as well. My assumption is that AMDGPU always feeds bitcode directly to either lld or clang-linker-wrapper without invoking llc manually, though I may be wrong there.

clang/lib/CodeGen/TargetInfo.cpp
9460	Yes I wasn't sure about this one. Could you elaborate where we derive that?

In D130096#3663295, @yaxunl wrote:

There is no constant propagation for globals with weak linkage, right? Otherwise, it won't work. My concern is that there may be optimization passes which do not respect the weak linkage and use the incorrect default value for OpenCL or HIP. Therefore I am not very confident enabling this for OpenCL or HIP unless these variables have the correct value based on the compilation options.

Instead of weak_odr we could probably add this to compiler.used instead if that's an issue.

In D130096#3663398, @jhuber6 wrote:

In D130096#3663295, @yaxunl wrote:

There is no constant propagation for globals with weak linkage, right? Otherwise, it won't work. My concern is that there may be optimization passes which do not respect the weak linkage and use the incorrect default value for OpenCL or HIP. Therefore I am not very confident enabling this for OpenCL or HIP unless these variables have the correct value based on the compilation options.

Instead of weak_odr we could probably add this to compiler.used instead if that's an issue.

The libraries get internalized as-is. Why does this need to be weak_odr?

In D130096#3663411, @arsenm wrote:

In D130096#3663398, @jhuber6 wrote:

In D130096#3663295, @yaxunl wrote:

There is no constant propagation for globals with weak linkage, right? Otherwise, it won't work. My concern is that there may be optimization passes which do not respect the weak linkage and use the incorrect default value for OpenCL or HIP. Therefore I am not very confident enabling this for OpenCL or HIP unless these variables have the correct value based on the compilation options.

Instead of weak_odr we could probably add this to compiler.used instead if that's an issue.

The libraries get internalized as-is. Why does this need to be weak_odr?

It depends where we want to do the linking. For my purposes I'd like to be able to link in these libraries at link time. This allows us to link in target specific libraries as-needed so we can make generated code more generic until linking or the backend. The problem with linkonce_odr is that it does not need to emit a symbol, so it will usually be optimized out by clang. E.g. the following won't work because these generated globals will be optimized out completely before we have any library to use them.

clang amdgpu.c -c -O3
clang amdgpu.o <link ocml.bc>
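
The failure mode described above can be modelled in a few lines. The sketch below is a simplification, not LLVM's implementation: the `Linkage` enum lists only the linkages discussed in this review, and `dropDeadGlobals` is a hypothetical stand-in for the global cleanup that runs while compiling with optimizations. The rule it encodes (local and `linkonce` globals may be discarded when unreferenced, `weak_odr` globals may not) follows the LLVM language reference as I understand it.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Linkage { Private, Internal, LinkOnceODR, WeakODR, External };

// True for linkages that let the optimizer drop a global that nothing in the
// current module references.
bool isDiscardableIfUnused(Linkage L) {
  return L == Linkage::Private || L == Linkage::Internal ||
         L == Linkage::LinkOnceODR;
}

struct Global {
  Linkage Link;
  bool ReferencedInModule;
};

using Module = std::map<std::string, Global>;

// Models the cleanup during `clang -c -O3`: unreferenced discardable globals
// do not survive into the object file.
Module dropDeadGlobals(const Module &M) {
  Module Out;
  for (const auto &[Name, G] : M)
    if (G.ReferencedInModule || !isDiscardableIfUnused(G.Link))
      Out.emplace(Name, G);
  return Out;
}
```

In this model an unreferenced `linkonce_odr` control constant is gone before any library that reads it is linked, while a `weak_odr` one is still present at link time.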

In D130096#3663062, @JonChesterfield wrote:

A safer bet is to use the current control flow that links in specific bitcode files, but create the global directly instead of linking in the file. That'll give us zero semantic change and a clang that ignores those bitcode files if present.

I think I understand what you're saying better now. We should instead have this controlled as a flag via clang that the driver will add. This will just tell us to trigger some backend utility to emit the same code. I can look into doing that, will make it easier to just have the clang driver state that we should emit this for HIP / OpenMP unless nogpulib is passed for example.

In D130096#3663411, @arsenm wrote:

In D130096#3663398, @jhuber6 wrote:

In D130096#3663295, @yaxunl wrote:

There is no constant propagation for globals with weak linkage, right? Otherwise, it won't work. My concern is that there may be optimization passes which do not respect the weak linkage and use the incorrect default value for OpenCL or HIP. Therefore I am not very confident enabling this for OpenCL or HIP unless these variables have the correct value based on the compilation options.

Instead of weak_odr we could probably add this to compiler.used instead if that's an issue.

The libraries get internalized as-is. Why does this need to be weak_odr?

The current patch does not consider HIP/OpenCL compile options, therefore the values of these variables are not correct for OpenCL/HIP. They need to be overridden by the variables with the same name in device libraries by clang through -mlink-builtin-bitcode.

If the patch checks HIP/OpenCL compilation options to set the correct value for these variables, then it does not need weak linkage.

yaxunl added inline comments.Jul 20 2022, 10:20 AM

clang/lib/CodeGen/TargetInfo.cpp
9460	@jhuber6 wrote: Yes I wasn't sure about this one. Could you elaborate where we derive that?
	CGM.getTarget().getTargetOpts().CodeObjectVersion

In D130096#3666155, @yaxunl wrote:

The current patch does not consider HIP/OpenCL compile options, therefore the values of these variables are not correct for OpenCL/HIP. They need to be overridden by the variables with the same name in device libraries by clang through -mlink-builtin-bitcode.

If the patch checks HIP/OpenCL compilation options to set the correct value for these variables, then it does not need weak linkage.

If we instead add it to compiler.used it should be propagated while staying alive for the linker: https://godbolt.org/z/MG5n1MWWj. The downside is that this symbol will not be removed and a symbol for it will live in the binary. The symbol will have weak binding, so it won't cause any linker errors. But it's a little annoying to have things stick around like that. I'm considering making this code generation be controlled by a clang driver flag so we could potentially change behavior as needed there.

Adjusting, adding code generation options for the other constants and changing to use linkonce ODR linkage.

I attempted to follow Jon's suggestion and group it with the existing code, but all the existing handling for this occurs in the driver. So I don't think there's a convenient way to drop in this functionality without adding a new function as in this patch.

Harbormaster completed remote builds in B181539: Diff 453021.Aug 16 2022, 9:14 AM

lamb-j added a subscriber: lamb-j.Aug 16 2022, 9:25 AM

yaxunl added inline comments.Aug 22 2022, 4:36 PM

clang/lib/CodeGen/TargetInfo.cpp
9435	This does not support per-TU control variables. Probably should use internal linkage.
clang/lib/Frontend/CompilerInvocation.cpp
1679–1682 ↗	(On Diff #453021)	For OpenCL, it should be determined by options::OPT_cl_denorms_are_zero
clang/test/CodeGen/amdgcn-control-constants.c
8	need a test for -target-cpu gfx1030 -target-feature +wavefrontsize64 and check __oclc_wavefrontsize64 to be 1.
9	need an OpenCL test for -cl-denorms-are-zero
11	need OpenCL tests for -cl-finite-math-only and -cl-fast-relaxed-math
12	need OpenCL tests for -cl-unsafe-math-optimizations and -cl-fast-relaxed-math
13	need an OpenCL test for -cl-fp32-correctly-rounded-divide-sqrt. If it needs CodeGenOpt you may need to re-use the option for HIP.

Updating. I realized all of the math-related ones are already covered by driver options for AMDGPU passing the appropriate fp contract to the frontend. This patch gets rid of most of that handling and just uses those directly. Also makes it easier to test.

We also check if the `+wavefrontsize64` feature was explicitly turned on as part of @yaxunl's suggestion.
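
The feature check can be pictured as follows. This is a hedged sketch, not the code in TargetInfo.cpp: `wavefrontSize64` is a hypothetical helper, the target default is passed in by the caller, and the "last entry wins" handling of repeated features is an assumption of the sketch. It assumes gfx1030 defaults to wave32, which is what the requested test expectations imply (0 by default, 1 with `+wavefrontsize64`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Computes the value for __oclc_wavefrontsize64 from a target default and an
// explicit feature list. DefaultIsWave64 is supplied by the caller; the sketch
// does not try to reproduce clang's per-processor tables.
bool wavefrontSize64(bool DefaultIsWave64,
                     const std::vector<std::string> &Features) {
  bool Wave64 = DefaultIsWave64;
  for (const std::string &F : Features) {
    // Assumption: later entries override earlier ones.
    if (F == "+wavefrontsize64")
      Wave64 = true;
    else if (F == "-wavefrontsize64")
      Wave64 = false;
  }
  return Wave64;
}
```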

jhuber6 added inline comments.Aug 29 2022, 12:43 PM

clang/lib/CodeGen/TargetInfo.cpp
9435	The AMDGPU device libraries use `linkonce_odr` so I figured it was the most appropriate here. It should mean that we can have multiple identical definitions and they don't clash. There's also no requirement for these to be emitted as symbols AFAIK.

yaxunl added inline comments.Aug 29 2022, 1:10 PM

clang/lib/CodeGen/TargetInfo.cpp
9435	@jhuber6 wrote: The AMDGPU device libraries use `linkonce_odr` so I figured it was the most appropriate here. It should mean that we can have multiple identical definitions and they don't clash. There's also no requirement for these to be emitted as symbols AFAIK.
	clang uses -mlink-builtin-bitcode to link these device libraries for HIP and OpenCL. Then the linkage of these variables becomes internal linkage. That's why it works for per-TU control.

yaxunl added inline comments.Aug 29 2022, 1:13 PM

clang/lib/CodeGen/TargetInfo.cpp
9435	@jhuber6 wrote: The AMDGPU device libraries use `linkonce_odr` so I figured it was the most appropriate here. It should mean that we can have multiple identical definitions and they don't clash. There's also no requirement for these to be emitted as symbols AFAIK.
	clang uses -mlink-builtin-bitcode to link these device libraries for HIP and OpenCL. Then the linkage of these variables becomes internal linkage. That's why it works for per-TU control.
	You may let HIP and OpenCL use internal linkage and C/C++/OpenMP use linkonce_odr, since only the HIP and OpenCL toolchains use -mlink-builtin-bitcode to link these device libraries.

jhuber6 added inline comments.Aug 29 2022, 1:20 PM

clang/lib/CodeGen/TargetInfo.cpp
9435	I see, `linkonce_odr` implies that these should all have the same value which isn't necessarily true after linking. I'll change it to use private linkage. OpenMP right now links everything late which means that we don't allow these to be defined differently per-TU. This may be incorrect given this new method as each TU will have different things set. I can change OpenMP to use the `mlink` method after this patch which may be more strictly correct.

Changing to private linkage.

For OpenMP we could either make this use weak_odr so we have a single
definition surviving until link time for us to use. Or we could change OpenMP to
link in the bitcode libraries per-TU via -mlink-builtin-bitcode.

Harbormaster completed remote builds in B184003: Diff 456450.Aug 29 2022, 3:17 PM

If you want to overwrite them, weak/linkonce will work (no _odr). Private/internal will not be overwritten but existing uses will keep the private/internal version, IIRC. I assume you want the former.

Remove unused code gen option.

Harbormaster completed remote builds in B184055: Diff 456520.Aug 29 2022, 7:34 PM

yaxunl added inline comments.Aug 30 2022, 7:59 AM

clang/lib/CodeGen/TargetInfo.cpp
9435	@jhuber6 wrote: I see, `linkonce_odr` implies that these should all have the same value which isn't necessarily true after linking. I'll change it to use private linkage. OpenMP right now links everything late which means that we don't allow these to be defined differently per-TU. This may be incorrect given this new method as each TU will have different things set. I can change OpenMP to use the `mlink` method after this patch which may be more strictly correct.
	On second thoughts, the idea of letting clang emit these control variables might not work for HIP and OpenCL. The reason is that to support per-TU control variables, these variables need to have internal or private linkage; however, that means they cannot be used by other device library functions, which expect non-internal linkage for them. Those device library functions will end up using control variables from device library bitcode anyway. For OpenMP, it may be necessary to emit them as linkonce_odr, otherwise device library functions may not find them.

jhuber6 added inline comments.Aug 30 2022, 8:06 AM

clang/lib/CodeGen/TargetInfo.cpp
9435	@yaxunl wrote: On second thoughts, the idea of letting clang emit these control variables might not work for HIP and OpenCL. The reason is that to support per-TU control variables, these variables need to have internal or private linkage; however, that means they cannot be used by other device library functions, which expect non-internal linkage for them. Those device library functions will end up using control variables from device library bitcode anyway.
	Right now we include each file per-TU using `-mlink-builtin-bitcode` which converts `linkonce_odr` to `private` linkage. Shouldn't this be equivalent? It may be possible to make some test showing a user of these constants to verify they get picked up correctly. If you're worried about these getting removed we may be able to stash them in `compiler.used`, that shouldn't impede the necessary constant propagation.
	Side note, OpenCL seems to optimize these out without `-disable-llvm-optzns` while HIP will not. Does OpenCL use some mandatory passes to ensure that these control variables get handled? This method of using control constants in general is somewhat problematic as it hides invalid code behind some mandatory CP and DCE passes. For OpenMP right now we just generate one version for each architecture, which is wasteful but somewhat easier to work with.

yaxunl added inline comments.Aug 31 2022, 7:05 AM

clang/lib/CodeGen/TargetInfo.cpp
9435	@yaxunl wrote: On second thoughts, the idea of letting clang emit these control variables might not work for HIP and OpenCL. The reason is that to support per-TU control variables, these variables need to have internal or private linkage; however, that means they cannot be used by other device library functions, which expect non-internal linkage for them. Those device library functions will end up using control variables from device library bitcode anyway.
	@jhuber6 wrote: Right now we include each file per-TU using `-mlink-builtin-bitcode` which converts `linkonce_odr` to `private` linkage. Shouldn't this be equivalent? It may be possible to make some test showing a user of these constants to verify they get picked up correctly. If you're worried about these getting removed we may be able to stash them in `compiler.used`, that shouldn't impede the necessary constant propagation.
	Let's assume the main program calls `foo()` and `foo()` uses a control variable `bar`. `foo()` is in a bitcode file linked in by -mlink-builtin-bitcode. clang emits the control variable `bar` with private linkage in the main module. When clang tries to link `foo()`, it needs to resolve `bar`, but it cannot use the `bar` in the main module because `bar` has private linkage. Then `bar` becomes unresolved.
	@jhuber6 wrote: Side note, OpenCL seems to optimize these out without `-disable-llvm-optzns` while HIP will not. Does OpenCL use some mandatory passes to ensure that these control variables get handled? This method of using control constants in general is somewhat problematic as it hides invalid code behind some mandatory CP and DCE passes. For OpenMP right now we just generate one version for each architecture, which is wasteful but somewhat easier to work with.
	Are you using clang -cc1 without other options? There are LLVM passes by default, but they should not depend on language. You can see which pass is removing them by -mllvm -print-after-all.

jhuber6 added inline comments.Sep 1 2022, 10:45 AM

clang/lib/CodeGen/TargetInfo.cpp
9435	@yaxunl wrote: Let's assume the main program calls foo() and foo() uses a control variable bar. foo() is in a bitcode linked in by -mlink-builtin-bitcode. clang emits the control variable bar with private linkage in the main module. When clang tries to link foo(), it needs to resolve bar, but it cannot use the bar in the main module because bar has private linkage. Then bar becomes unresolved.
	This is a good point, we link only used definitions when using `-mlink-builtin-bitcode`. I think we link via `-mlink-builtin-bitcode` prior to running the backend, so this will be after we create these definitions. In this case we will import a definition like the following:
	@__oclc_wavefrontsize64 = external local_unnamed_addr addrspace(4) constant i8, align 1
	which has the same name, but cannot bind to the `private` variable. I think this is what `linkonce` linkage is supposed to provide, but I'm not overly familiar with the semantics.
	@yaxunl wrote: Are you using clang -cc1 without other options? There are LLVM passes by default, but they should not depend on language. You can see which pass is removing them by -mllvm -print-after-all.
	The sample tests in this patch show the `-x hip` version does not require `-disable-llvm-optzns` while the `-x cl` version does.
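
The `foo()` / `bar` scenario above can be reduced to a small resolution check. This is a simplified model rather than the IR linker itself: `unresolvedAfterLink` is a hypothetical function, and the only rule it encodes is that a private or internal definition in the main module cannot satisfy an external reference coming from an imported module.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Linkage { Private, Internal, LinkOnce, LinkOnceODR, External };

// Private and internal globals are local to the module that defines them.
bool isLocalLinkage(Linkage L) {
  return L == Linkage::Private || L == Linkage::Internal;
}

// Returns the names referenced by an imported module that the main module's
// definitions cannot satisfy.
std::vector<std::string>
unresolvedAfterLink(const std::map<std::string, Linkage> &MainDefs,
                    const std::vector<std::string> &ImportedRefs) {
  std::vector<std::string> Missing;
  for (const std::string &Ref : ImportedRefs) {
    auto It = MainDefs.find(Ref);
    if (It == MainDefs.end() || isLocalLinkage(It->second))
      Missing.push_back(Ref);
  }
  return Missing;
}
```

With `__oclc_wavefrontsize64` defined as `Private` in the main module, the imported external declaration stays unresolved; with `LinkOnce` it binds.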

Changing to linkonce linkage. According to the LLVM spec this should have the
expected behaviour where a single definition is kept at link-time for each
module. I tested this with a sample HIP program and it had the desired
behaviour. I could add a test attempting to show this if needed.

Harbormaster completed remote builds in B185234: Diff 458186.Sep 6 2022, 10:06 AM

arsenm added inline comments.Sep 15 2022, 3:41 PM

clang/lib/CodeGen/TargetInfo.cpp
9448–9449	Do we really have to scan through the features too? This seems broken
9454	s/DenormAtZero/DenormAreZero/?
9457	The "or" doesn't look right. Finite only is no infinities and no NaNs (not sure why the library control merges the two)
9472–9474	These probably should use linkonce_odr
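
To spell out the point about line 9457: "finite only" requires both assumptions at once. The sketch below uses hypothetical field names (`NoInfs`, `NoNaNs`) rather than clang's actual option names.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two floating-point assumptions involved.
struct FPAssumptions {
  bool NoInfs; // code may assume no infinities
  bool NoNaNs; // code may assume no NaNs
};

// "Finite only" holds when both assumptions hold, so the flags are combined
// with &&. Combining them with || would enable the finite-only paths when only
// one of the two is guaranteed.
constexpr bool finiteOnly(FPAssumptions A) { return A.NoInfs && A.NoNaNs; }

static_assert(finiteOnly({true, true}), "both assumptions give finite-only");
static_assert(!finiteOnly({true, false}), "NaNs may still occur");
```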

jhuber6 marked 3 inline comments as done.Sep 16 2022, 9:05 AM

jhuber6 added inline comments.

clang/lib/CodeGen/TargetInfo.cpp
9448–9449	@yaxunl wanted this so we didn't emit the global if the user manually overrode the features via `-Xclang` or similar.

Addressing comments.

yaxunl added inline comments.Sep 16 2022, 10:05 AM

clang/lib/CodeGen/TargetInfo.cpp
9467–9471	We need to disable emitting these variables for HIP -fgpu-rdc mode and OpenCL, since they will break per-TU control variables. Other cases are OK.

arsenm added inline comments.Sep 16 2022, 10:05 AM

clang/lib/CodeGen/TargetInfo.cpp
9467	wavefrontsize belongs with the system ones

jhuber6 added inline comments.Sep 16 2022, 10:07 AM

clang/lib/CodeGen/TargetInfo.cpp
9467–9471	But the code would still depend on these and they wouldn't be present, right?

yaxunl added inline comments.Sep 16 2022, 10:13 AM

clang/lib/CodeGen/TargetInfo.cpp
9467	@arsenm wrote: wavefrontsize belongs with the system ones
	You are right. `__oclc_wavefrontsize64` needs to be consistent among TUs, so it should always be emitted, with linkonce_odr linkage. `__oclc_daz_opt`, `__oclc_finite_only_opt`, `__oclc_unsafe_math_opt`, and `__oclc_correctly_rounded_sqrt32` can be different per TU, therefore they should not be emitted for HIP `-fgpu-rdc` and OpenCL.
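
The split proposed in this comment can be summarized as a small decision function. It is a sketch of the suggestion only, not of what the patch ended up doing: `Lang`, `GpuRDC`, and `constantsEmittedByClang` are hypothetical names, and the constant names are the ones listed in the comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Lang { C, CXX, OpenMP, HIP, OpenCL };

// Returns the control constants clang itself would emit under the proposed
// split. GpuRDC models HIP's -fgpu-rdc mode.
std::vector<std::string> constantsEmittedByClang(Lang L, bool GpuRDC) {
  // Must agree across all TUs, so it is always emitted.
  std::vector<std::string> Names = {"__oclc_wavefrontsize64"};

  // These may differ per TU; for OpenCL and HIP RDC they are left to the
  // device libraries linked with -mlink-builtin-bitcode.
  const bool PerTUFromDeviceLibs =
      L == Lang::OpenCL || (L == Lang::HIP && GpuRDC);
  if (!PerTUFromDeviceLibs) {
    static const char *const PerTU[] = {
        "__oclc_daz_opt", "__oclc_finite_only_opt", "__oclc_unsafe_math_opt",
        "__oclc_correctly_rounded_sqrt32"};
    for (const char *Name : PerTU)
      Names.push_back(Name);
  }
  return Names;
}
```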

Harbormaster completed remote builds in B187177: Diff 460812.Sep 16 2022, 10:29 AM

jhuber6 added inline comments.Sep 16 2022, 10:49 AM

clang/lib/CodeGen/TargetInfo.cpp
9467	I'm still unsure: if we do not emit any of those control variables, how will we use the device libraries for those builds?

yaxunl added inline comments.Sep 16 2022, 3:34 PM

clang/lib/CodeGen/TargetInfo.cpp
9467	@jhuber6 wrote: I'm still unsure: if we do not emit any of those control variables, how will we use the device libraries for those builds?
	In those cases, we will use -mlink-builtin-bitcode to get those variables from device libs, as we did before. They will have internal linkage after linking, therefore are per-TU.

Adding an extra check in CodeGenAction.cpp that forcibly internalizes these if we link in any modules in RDC mode. This is a considerable hack, but should solve the problem. It's not a great solution, so let me know if you think that this is a lesser evil than linking in many bitcode files as we do now.

To reiterate, what this patch offers is:
+ Reduces the number of files we need to link in (no on/off files, only ocml.bc and ockl.bc are needed).
+ Enforces that the architecture constants are the same across the compilation.
And I think negatively,
- Requires a hack to internalize some variables to prevent linking problems.
- Some extra code in Clang.

The best solution would be to handle these per-TU variables in the backend. Or maybe even all of these could be placed in the backend where the code paths that currently require a control constant could be a simple attribute that the backend will use to control code emission.

Harbormaster completed remote builds in B188735: Diff 462948.Sep 26 2022, 10:57 AM

In D130096#3815529, @jhuber6 wrote:

The best solution would be to handle these per-TU variables in the backend. Or maybe even all of these could be placed in the backend where the code paths that currently require a control constant could be a simple attribute that the backend will use to control code emission.

I'd prefer to avoid spreading special treatment of the device libraries into the backend. The contract is poorly defined and spread around too much as it is

yaxunl added inline comments.Sep 26 2022, 2:26 PM

clang/lib/CodeGen/CodeGenAction.cpp
299–308	need a test. Probably let clang generate a bitcode containing a function using these control vars, then link the bitcode by -mlink-builtin-bitcode, then check the linkage of these control vars.

Adding test

Harbormaster completed remote builds in B188804: Diff 463042.Sep 26 2022, 4:35 PM

ping

yaxunl added inline comments.Oct 3 2022, 7:45 AM

clang/test/CodeGen/amdgcn-link-control-constants.c
2–3	This is compiling HIP as host. Please add -fcuda-is-device.
7	use `__constant__` instead
9	add `__device__`
15	add `__device__`
17	add `__device__`

jhuber6 added inline comments.Oct 3 2022, 8:00 AM

clang/test/CodeGen/amdgcn-link-control-constants.c
2–3	This test should only require that the triple is `amdgcn`. I could potentially make the generation of the constants require that HIP, OpenMPDevice, or OpenCL is enabled if you think that's bad.
2–3	I can also change it to just `-x c` if the HIP is the problem.

In D130096#3816149, @arsenm wrote:

I'd prefer to avoid spreading special treatment of the device libraries into the backend. The contract is poorly defined and spread around too much as it is

clang/test/CodeGen/amdgcn-link-control-constants.c
2–3	We probably want these magic constants for C++ code as well, so keying it off the triple (at least triple + that we're using rocm / compute stuff, which I think is adequately indicated by hsa in the triple) is better. And likewise don't want to emit these constants for non-gpu code, e.g. x64 host hip doesn't need the daz_opt constant, which also suggests triple is the right hook.

yaxunl added inline comments.Oct 3 2022, 9:00 AM

clang/test/CodeGen/amdgcn-link-control-constants.c
2–3	We don't officially support C on amdgcn but we officially support HIP. I would suggest moving this to CodeGenCUDA, compiling it as HIP, and using HIP syntax.

Moving test

Harbormaster completed remote builds in B190037: Diff 464767.Oct 3 2022, 1:00 PM

yaxunl added inline comments.Oct 3 2022, 2:05 PM

clang/test/CodeGen/amdgcn-control-constants.c
9	still missing this test, and some other tests for -cl-* options as commented below. Also, missing a HIP test for -ffast-math

jhuber6 added inline comments.Oct 3 2022, 2:13 PM

clang/test/CodeGen/amdgcn-control-constants.c
9	The cc1 math options tested individually should be enabled by `-ffast-math`.

yaxunl added inline comments.Oct 4 2022, 1:52 PM

clang/test/CodeGen/amdgcn-control-constants.c
9	Since we cannot test -ffast-math directly, can we add a driver test to ensure we are not missing any -cc1 options needed by the control variables when -ffast-math is specified for the driver? Thanks. Also, the -cl-* options are -cc1 options. We need to test them.

I don't like the fact that we need to have two different kinds of control constants, one per-TU and others per-link job. I'm wondering how difficult it would be to make the fast versions of the math calls use different entry points. That way we could handle this in the math header wrappers.

In D130096#3850472, @jhuber6 wrote:

I don't like the fact that we need to have two different kinds of control constants, one per-TU and others per-link job. I'm wondering how difficult it would be to make the fast versions of the math calls use different entry points. That way we could handle this in the math header wrappers.

That's really how the C linkage model wants you to handle this. I also would like to have FP value tracking optimizations take care of the special cases in the library code
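
The "different entry points" idea could look something like this in a math header wrapper. Everything here is hypothetical: `sqrtPrecise`, `sqrtFast`, and `wrappedSqrt` are illustrative names, not device library functions, and both stand-ins compute the same result. The only real identifier is the `__FAST_MATH__` macro, which GCC and Clang predefine under `-ffast-math`.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {
// Hypothetical library entry points, with call counters so the selection is
// observable. In a real library the fast one would trade accuracy for speed.
inline int PreciseCalls = 0;
inline int FastCalls = 0;
inline float sqrtPrecise(float X) { ++PreciseCalls; return std::sqrt(X); }
inline float sqrtFast(float X) { ++FastCalls; return std::sqrt(X); }

// The wrapper a math header could provide: the choice is made per call site at
// compile time, so no per-TU global has to survive until link time.
template <bool FastMath> float wrappedSqrt(float X) {
  if constexpr (FastMath)
    return sqrtFast(X);
  else
    return sqrtPrecise(X);
}

#ifdef __FAST_MATH__
inline constexpr bool UseFastMath = true;
#else
inline constexpr bool UseFastMath = false;
#endif

// What user code would call; the TU's own math flags pick the entry point.
inline float sqrtWrapper(float X) { return wrappedSqrt<UseFastMath>(X); }
} // namespace sketch
```

Because each TU binds to a differently named function, the linker can treat the library like any other and link it once, which is the trade-off debated in the comments that follow.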

In D130096#3850473, @arsenm wrote:

In D130096#3850472, @jhuber6 wrote:

I don't like the fact that we need to have two different kinds of control constants, one per-TU and others per-link job. I'm wondering how difficult it would be to make the fast versions of the math calls use different entry points. That way we could handle this in the math header wrappers.

That's really how the C linkage model wants you to handle this. I also would like to have FP value tracking optimizations take care of the special cases in the library code

There's the "small matter" of implementing the new device library functions. Why is all that more likeable than two kinds of control constants?

In D130096#3850550, @b-sumner wrote:

There's the "small matter" of implementing the new device library functions. Why is all that more likeable than two kinds of control constants?

Different functions providing different behaviors can be handled at link time like any other function, instead of the same functions providing different behaviors per translation unit, which requires cloning. The current scheme transfers complexity from the device library build system into the driver and user binaries.

In D130096#3850628, @arsenm wrote:

In D130096#3850550, @b-sumner wrote:

There's the "small matter" of implementing the new device library functions. Why is all that more likeable than two kinds of control constants?

Different functions providing different behaviors can be handled at link time like any other function, instead of the same functions providing different behaviors per translation unit, which requires cloning. The current scheme transfers complexity from the device library build system into the driver and user binaries.

Another benefit of this would be that linking could be done only once in a sound manner rather than requiring an instance of the ROCm device libraries to be included for each TU. Although we would probably still need the attribute propagation that -mlink-builtin-bitcode offers, so it wouldn't be quite as easy as throwing the device libs into the lld invocation.

Different functions providing different behaviors can be handled at link time like any other function, instead of the same functions providing different behaviors per translation unit, which requires cloning. The current scheme transfers complexity from the device library build system into the driver and user binaries.

OK, but we are talking about trading a solved problem, with a solution that has worked for years, for a large amount of new work, new maintenance, and new bugs. Does this need to be done now, or at all?

In D130096#3850708, @b-sumner wrote:

> > Different functions providing different behaviors can be handled at link time like any other function, instead of the same functions providing different behaviors per translation unit and requires cloning. The current scheme transfers complexity from the device library build system into the driver and user binaries
>
> OK, but we are talking about trading a solved problem with a solution working for years for adding a large amount of new work and new maintenance and new bugs. Does this need to be done now, or at all?

I wouldn't really call it a solved problem when only one of the many users is currently linking these libraries correctly

We should just do this now. clang shouldn't have to dig around on disk to emit a constant definition for a constant it already knows, and we have a clear path to removing these globals altogether. I have adequate patches to completely delete __oclc_daz_opt today. __oclc_finite_only_opt should be deletable as soon as nofpclass is inferred by default. Deleting __oclc_correctly_rounded_sqrt32 and __oclc_unsafe_math_opt requires more work, but they are basically the same thing and require extending the libcall optimizer pass.

It will be easier to delete these from the library as they become unnecessary if clang stops requiring that these files exist, as it does today, and it's easier to just stop using them entirely than to delete them one at a time
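
To make the trade-off discussed in this thread concrete, here is a minimal C++ sketch of the two schemes. Every name in it (`kFiniteOnlyOpt`, `libFmax`, `libFmaxChecked`, `libFmaxFinite`, `checkSchemes`) is a hypothetical stand-in chosen for illustration and is not a real device-library symbol. The sketch only assumes that a control constant such as __oclc_finite_only_opt is read by library code to pick a code path.

```cpp
#include <cassert>
#include <cmath>

// Scheme A: one library function whose behaviour depends on a control
// constant. kFiniteOnlyOpt is a hypothetical stand-in for a constant such as
// __oclc_finite_only_opt; each translation unit may want a different value,
// which is why the library has to be cloned or relinked per TU.
static const bool kFiniteOnlyOpt = false;

float libFmax(float a, float b) {
  if (!kFiniteOnlyOpt) {
    // NaN handling is only needed when non-finite inputs are possible.
    if (std::isnan(a)) return b;
    if (std::isnan(b)) return a;
  }
  return a > b ? a : b;
}

// Scheme B: two separately named functions. The caller (or the compiler)
// selects one by name, so ordinary symbol resolution at link time is enough.
float libFmaxFinite(float a, float b) { return a > b ? a : b; }

float libFmaxChecked(float a, float b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a > b ? a : b;
}

// Small usage example exercising both schemes.
void checkSchemes() {
  assert(libFmax(NAN, 1.0f) == 1.0f);
  assert(libFmaxChecked(2.0f, NAN) == 2.0f);
  assert(libFmaxFinite(2.0f, 1.0f) == 2.0f);
}
```

In scheme A the constant must be folded away by constant propagation and dead-code elimination; in scheme B nothing needs to be folded, at the cost of writing and maintaining more library entry points, which is the objection raised above.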

arsenm added inline comments.Jul 26 2023, 3:04 PM

clang/lib/CodeGen/TargetInfo.cpp
9461–9463	Can we move this into something more proper in LangOpts?
9467	I'd hope you don't have to check relaxed math, finite only should suffice
9476	This should probably get an __llvm_amdgcn prefix and be renamed

arsenm mentioned this in D154129: [mlir][ROCDL] Adds the ROCDL target attribute..Aug 1 2023, 12:29 PM

saiislam mentioned this in D139730: [OpenMP][DeviceRTL][AMDGPU] Support code object version 5.Aug 21 2023, 11:25 AM

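Revision Contents

Path	Size
clang/lib/CodeGen/CodeGenAction.cpp	16 lines
clang/lib/CodeGen/CodeGenModule.cpp	1 line
clang/lib/CodeGen/TargetInfo.h	3 lines
clang/lib/CodeGen/TargetInfo.cpp	76 lines
clang/test/CodeGen/amdgcn-control-constants.c	49 lines
clang/test/CodeGen/amdgcn-link-control-constants.c	42 lines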

Diff 463042

clang/lib/CodeGen/CodeGenAction.cpp

Show First 20 Lines • Show All 284 Lines • ▼ Show 20 Lines	bool LinkInModules() {
} else {		} else {
Err = Linker::linkModules(*getModule(), std::move(LM.Module),		Err = Linker::linkModules(*getModule(), std::move(LM.Module),
LM.LinkFlags);		LM.LinkFlags);
}		}

if (Err)		if (Err)
return true;		return true;
}		}

		// If we linked in AMDGCN device libraries in RDC-mode we need these
		// constants to be internal to each TU. This is required as these
		// variables control math settings which can change per-TU and conflict
		// after linking.
		// TODO: This should be handled in the backend instead.
		if (!LinkModules.empty() && Gen->CGM().getTriple().isAMDGCN()) {
		const StringRef GVS[] = {"__oclc_daz_opt", "__oclc_unsafe_math_opt",
		"__oclc_finite_only_opt",
		"__oclc_correctly_rounded_sqrt32"};
		for (StringRef Name : GVS) {
		if (llvm::GlobalVariable *GV = getModule()->getGlobalVariable(Name))
		GV->setLinkage(llvm::GlobalValue::InternalLinkage);
		}
		}

[Inline comment] yaxunl (Not Done): need a test. Probably let clang generate a bitcode containing a function using these control vars, then link the bitcode by -mlink-builtin-bitcode, then check the linkage of these control vars.
return false; // success		return false; // success
}		}

void HandleTranslationUnit(ASTContext &C) override {		void HandleTranslationUnit(ASTContext &C) override {
{		{
llvm::TimeTraceScope TimeScope("Frontend");		llvm::TimeTraceScope TimeScope("Frontend");
PrettyStackTraceString CrashInfo("Per-file LLVM IR generation");		PrettyStackTraceString CrashInfo("Per-file LLVM IR generation");
if (TimerIsEnabled) {		if (TimerIsEnabled) {
▲ Show 20 Lines • Show All 946 Lines • Show Last 20 Lines

clang/lib/CodeGen/CodeGenModule.cpp


Show First 20 Lines • Show All 925 Lines • ▼ Show 20 Lines	void CodeGenModule::Release() {
if (getCodeGenOpts().StackProtectorGuardOffset != INT_MAX)		if (getCodeGenOpts().StackProtectorGuardOffset != INT_MAX)
getModule().setStackProtectorGuardOffset(		getModule().setStackProtectorGuardOffset(
getCodeGenOpts().StackProtectorGuardOffset);		getCodeGenOpts().StackProtectorGuardOffset);
if (getCodeGenOpts().StackAlignment)		if (getCodeGenOpts().StackAlignment)
getModule().setOverrideStackAlignment(getCodeGenOpts().StackAlignment);		getModule().setOverrideStackAlignment(getCodeGenOpts().StackAlignment);
if (getCodeGenOpts().SkipRaxSetup)		if (getCodeGenOpts().SkipRaxSetup)
getModule().addModuleFlag(llvm::Module::Override, "SkipRaxSetup", 1);		getModule().addModuleFlag(llvm::Module::Override, "SkipRaxSetup", 1);

		getTargetCodeGenInfo().emitTargetGlobals(*this);
getTargetCodeGenInfo().emitTargetMetadata(*this, MangledDeclNames);		getTargetCodeGenInfo().emitTargetMetadata(*this, MangledDeclNames);

EmitBackendOptionsMetadata(getCodeGenOpts());		EmitBackendOptionsMetadata(getCodeGenOpts());

// If there is device offloading code embed it in the host now.		// If there is device offloading code embed it in the host now.
EmbedObject(&getModule(), CodeGenOpts, getDiags());		EmbedObject(&getModule(), CodeGenOpts, getDiags());

// Set visibility from DLL storage class		// Set visibility from DLL storage class
▲ Show 20 Lines • Show All 6,149 Lines • Show Last 20 Lines

clang/lib/CodeGen/TargetInfo.h

Show First 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	virtual void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
CodeGen::CodeGenModule &M) const {}		CodeGen::CodeGenModule &M) const {}

/// emitTargetMetadata - Provides a convenient hook to handle extra		/// emitTargetMetadata - Provides a convenient hook to handle extra
/// target-specific metadata for the given globals.		/// target-specific metadata for the given globals.
virtual void emitTargetMetadata(		virtual void emitTargetMetadata(
CodeGen::CodeGenModule &CGM,		CodeGen::CodeGenModule &CGM,
const llvm::MapVector<GlobalDecl, StringRef> &MangledDeclNames) const {}		const llvm::MapVector<GlobalDecl, StringRef> &MangledDeclNames) const {}

		/// Provides a convenient hook to handle extra target-specific globals.
		virtual void emitTargetGlobals(CodeGen::CodeGenModule &CGM) const {}

/// Any further codegen related checks that need to be done on a function call		/// Any further codegen related checks that need to be done on a function call
/// in a target specific manner.		/// in a target specific manner.
virtual void checkFunctionCallABI(CodeGenModule &CGM, SourceLocation CallLoc,		virtual void checkFunctionCallABI(CodeGenModule &CGM, SourceLocation CallLoc,
const FunctionDecl *Caller,		const FunctionDecl *Caller,
const FunctionDecl *Callee,		const FunctionDecl *Callee,
const CallArgList &Args) const {}		const CallArgList &Args) const {}

/// Determines the size of struct _Unwind_Exception on this platform,		/// Determines the size of struct _Unwind_Exception on this platform,
▲ Show 20 Lines • Show All 298 Lines • Show Last 20 Lines

clang/lib/CodeGen/TargetInfo.cpp


Show All 27 Lines
#include "llvm/ADT/StringSwitch.h"		#include "llvm/ADT/StringSwitch.h"
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IntrinsicsNVPTX.h"		#include "llvm/IR/IntrinsicsNVPTX.h"
#include "llvm/IR/IntrinsicsS390.h"		#include "llvm/IR/IntrinsicsS390.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
		#include "llvm/Support/TargetParser.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm>		#include <algorithm>

using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;

// Helper for coercing an aggregate argument or return value into an integer		// Helper for coercing an aggregate argument or return value into an integer
// array of the same size (including padding) and alignment. This alternate		// array of the same size (including padding) and alignment. This alternate
▲ Show 20 Lines • Show All 9,238 Lines • ▼ Show 20 Lines
class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {		class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {
public:		public:
AMDGPUTargetCodeGenInfo(CodeGenTypes &CGT)		AMDGPUTargetCodeGenInfo(CodeGenTypes &CGT)
: TargetCodeGenInfo(std::make_unique<AMDGPUABIInfo>(CGT)) {}		: TargetCodeGenInfo(std::make_unique<AMDGPUABIInfo>(CGT)) {}

void setFunctionDeclAttributes(const FunctionDecl *FD, llvm::Function *F,		void setFunctionDeclAttributes(const FunctionDecl *FD, llvm::Function *F,
CodeGenModule &CGM) const;		CodeGenModule &CGM) const;

		void emitTargetGlobals(CodeGen::CodeGenModule &CGM) const override;

void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,		void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
CodeGen::CodeGenModule &M) const override;		CodeGen::CodeGenModule &M) const override;
unsigned getOpenCLKernelCallingConv() const override;		unsigned getOpenCLKernelCallingConv() const override;

llvm::Constant *getNullPointer(const CodeGen::CodeGenModule &CGM,		llvm::Constant *getNullPointer(const CodeGen::CodeGenModule &CGM,
llvm::PointerType *T, QualType QT) const override;		llvm::PointerType *T, QualType QT) const override;

LangAS getASTAllocaAddressSpace() const override {		LangAS getASTAllocaAddressSpace() const override {
▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
if (const auto *Attr = FD->getAttr<AMDGPUNumVGPRAttr>()) {		if (const auto *Attr = FD->getAttr<AMDGPUNumVGPRAttr>()) {
uint32_t NumVGPR = Attr->getNumVGPR();		uint32_t NumVGPR = Attr->getNumVGPR();

if (NumVGPR != 0)		if (NumVGPR != 0)
F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));		F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));
}		}
}		}

		/// Emits control constants used to change per-architecture behaviour in the
		/// AMDGPU ROCm device libraries.
		void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
		CodeGen::CodeGenModule &CGM) const {
		if (!CGM.getTriple().isAMDGCN())
		return;
		StringRef CPU = CGM.getTarget().getTargetOpts().CPU;
		llvm::AMDGPU::GPUKind Kind = llvm::AMDGPU::parseArchAMDGCN(CPU);
		unsigned Features = llvm::AMDGPU::getArchAttrAMDGCN(Kind);
		if (Kind == llvm::AMDGPU::GK_NONE)
		return;

		unsigned Minor;
		unsigned Major;
		StringRef Identifier = CPU.drop_while([](char C) { return !isDigit(C); });
		if (Identifier.take_back(2).getAsInteger(16, Minor) ||
		Identifier.drop_back(2).getAsInteger(10, Major))
		return;

		auto AddGlobal = [&](StringRef Name, unsigned Value, unsigned Size,
		llvm::GlobalValue::LinkageTypes Linkage =
		llvm::GlobalValue::LinkOnceAnyLinkage) {
		if (CGM.getModule().getNamedGlobal(Name))
		return;

		auto *Type =
		llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), Size);
		auto *GV = new llvm::GlobalVariable(
[Inline comment] yaxunl (Not Done): This does not support per-TU control variables. Probably should use internal linkage.
[Inline comment] jhuber6 (Done): The AMDGPU device libraries use `linkonce_odr` so I figured it was the most appropriate here. It should mean that we can have multiple identical definitions and they don't clash. There's also no requirement for these to be emitted as symbols AFAIK.
[Inline comment] yaxunl (Not Done): clang uses -mlink-builtin-bitcode to link these device libraries for HIP and OpenCL. Then the linkage of these variables becomes internal linkage. That's why it works for per-TU control.
[Inline comment] yaxunl (Not Done): You may let HIP and OpenCL use internal linkage and C/C++/OpenMP use linkonce_odr since only HIP and OpenCL toolchain use -mlink-builtin-bitcode to link these device libraries
[Inline comment] jhuber6 (Done): I see, `linkonce_odr` implies that these should all have the same value which isn't necessarily true after linking. I'll change it to use private linkage. OpenMP right now links everything late which means that we don't allow these to be defined differently per-TU. This may be incorrect given this new method as each TU will have different things set. I can change OpenMP to use the `mlink` method after this patch which may be more strictly correct.
[Inline comment] yaxunl (Not Done): On second thoughts, the idea for letting clang to emit these control variables might not work for HIP and OpenCL. The reason is that to support per-TU control variables, these variables need to be internal or private linkage, however, that means they cannot be used by other device library functions which are expecting non-internal linkage for them. Those device library functions will end up using control variables from device library bitcode any way. For OpenMP, it may be necessary to emit them as linkonce_odr, otherwise device library functions may not find them.
[Inline comment] jhuber6 (Done):
> On second thoughts, the idea for letting clang to emit these control variables might not work for HIP and OpenCL. The reason is that to support per-TU control variables, these variables need to be internal or private linkage, however, that means they cannot be used by other device library functions which are expecting non-internal linkage for them. Those device library functions will end up using control variables from device library bitcode any way.
Right now we include each file per-TU using `-mlink-builtin-bitcode` which converts `linkonce_odr` to `private` linkage. Shouldn't this be equivalent? It may be possible to make some test showing a user of these constants to verify they get picked up correctly. If you're worried about these getting removed we may be able to stash them in `compiler.used`, that shouldn't impede the necessary constant propagation.
Side note, OpenCL seems to optimize these out without `-disable-llvm-optzns` while HIP will not. Does OpenCL use some mandatory passes to ensure that these control variables get handled? This method of using control constants in general is somewhat problematic as it hides invalid code behind some mandatory CP and DCE passes. For OpenMP right now we just generate one version for each architecture, which is wasteful but somewhat easier to work with.
[Inline comment] yaxunl (Not Done):
> Right now we include each file per-TU using `-mlink-builtin-bitcode` which converts `linkonce_odr` to `private` linkage. Shouldn't this be equivalent? It may be possible to make some test showing a user of these constants to verify they get picked up correctly. If you're worried about these getting removed we may be able to stash them in `compiler.used`, that shouldn't impede the necessary constant propagation.
Let's assume the main program calls `foo()` and `foo()` uses a control variable `bar`. `foo()` is in a bitcode linked in by -mlink-builtin-bitcode. clang emits the control variable `bar` with private linkage in the main module. When clang tries to link `foo()`, it needs to resolve `bar`, but it cannot use the `bar` in the main module because `bar` has private linkage. Then `bar` becomes unresolved.
> Side note, OpenCL seems to optimize these out without `-disable-llvm-optzns` while HIP will not. Does OpenCL use some mandatory passes to ensure that these control variables get handled? This method of using control constants in general is somewhat problematic as it hides invalid code behind some mandatory CP and DCE passes. For OpenMP right now we just generate one version for each architecture, which is wasteful but somewhat easier to work with.
Are you using clang -cc1 without other options? There are LLVM passes by default, but they should not depend on language. You can see which pass is removing them by -mllvm -print-after-all.
[Inline comment] jhuber6 (Done):
> Let's assume the main program calls foo() and foo() uses a control variable bar. foo() is in a bitcode linked in by -mlink-builtin-bitcode. clang emits the control variable bar with private linkage in the main module. When clang tries to link foo(), it needs to resolve bar, but it cannot use the bar in the main module because bar has private linkage. Then bar becomes unresolved.
This is a good point, we link only used definitions when using `-mlink-builtin-bitcode`. I think we link via `-mlink-builtin-bitcode` prior to running the backend, so this will be after we create these definitions. In this case we will import a definition like the following:
@__oclc_wavefrontsize64 = external local_unnamed_addr addrspace(4) constant i8, align 1
which has the same name, but cannot bind to the `private` variable. I think this is what `linkonce` linkage is supposed to provide, but I'm not overly familiar with the semantics.
> Are you using clang -cc1 without other options? There are LLVM passes by default, but they should not depend on language. You can see which pass is removing them by -mllvm -print-after-all.
The sample tests in this patch show the `-x hip` version does not require `-disable-llvm-optzns` while the `-x cl` version does.
		CGM.getModule(), Type, true, Linkage,
[Inline comment] arsenm (Not Done): Should use the address space enum
		llvm::ConstantInt::get(Type, Value), Name, nullptr,
		llvm::GlobalValue::ThreadLocalMode::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant));
		GV->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Local);
		GV->setVisibility(llvm::GlobalValue::VisibilityTypes::HiddenVisibility);
		GV->setAlignment(CGM.getDataLayout().getABITypeAlign(Type));
		};
[Inline comment] arsenm (Not Done): Typo At

		// The wavefront size is 64 if defined by the target or explicitly specified
		// by the user.
		bool Wavefront64 =
		!(Features & llvm::AMDGPU::FEATURE_WAVE32) ||
		llvm::is_contained(CGM.getTarget().getTargetOpts().FeaturesAsWritten,
[Inline comment] arsenm (Not Done): Do we really have to scan through the features too? This seems broken
[Inline comment] jhuber6 (Done): @yaxunl wanted this so we didn't emit the global if the user manually overrode the features via `-Xclang` or similar.
		"+wavefrontsize64");

		// Different math flags set by the current floating point contract.
		bool RelaxedMath = CGM.getLangOpts().FastMath;
		bool UnsafeMath = CGM.getLangOpts().UnsafeFPMath;
[Inline comment] arsenm (Done): s/DenormAtZero/DenormAreZero/?
		bool DenormAreZero = CGM.getCodeGenOpts().FP32DenormalMode ==
		llvm::DenormalMode::getPreserveSign();
		bool FiniteOnly =
[Inline comment] arsenm (Done): or doesn't look right. finite only is no infinities and no nans (not sure why the library control merges the two)
		CGM.getLangOpts().NoHonorInfs && CGM.getLangOpts().NoHonorNaNs;

		// Set correct square root rounding depending on the target language.
[Inline comment] yaxunl (Not Done): should be determined by the code object version option.
[Inline comment] jhuber6 (Done): Yes I wasn't sure about this one. Could you elaborate where we derive that?
[Inline comment] yaxunl (Not Done): CGM.getTarget().getTargetOpts().CodeObjectVersion
		bool CorrectSqrt = CGM.getLangOpts().OpenCL
		? CGM.getCodeGenOpts().OpenCLCorrectlyRoundedDivSqrt
		: CGM.getCodeGenOpts().HIPCorrectlyRoundedDivSqrt;
[Inline comment] arsenm (Not Done): Can we move this into something more proper in LangOpts?

		// Control constants for math operations.
		AddGlobal("__oclc_daz_opt", DenormAreZero, /*Size=*/8);
		AddGlobal("__oclc_finite_only_opt", FiniteOnly || RelaxedMath, /*Size=*/8);
[Inline comment] arsenm (Not Done): wavefrontsize belongs with the system ones
[Inline comment] yaxunl (Not Done): You are right. `__oclc_wavefrontsize64` should always be emitted with linkonce_odr linkage since they need to be consistent among TU's. Therefore they should always be emitted. `__oclc_daz_opt`, `__oclc_finite_only_opt`, `__oclc_unsafe_math_opt`, and `__oclc_correctly_rounded_sqrt32` can be different per TU, therefore they should not be emitted for HIP `-fgpu-rdc` and OpenCL.
[Inline comment] jhuber6 (Done): I'm still unsure, if we do not emit any of those control variables how will we use the device libraries for those builds.
[Inline comment] yaxunl (Not Done): In those cases, we will use -mlink-builtin-bitcode to get those variables from device libs, as we did before. They will have internal linkage after linking, therefore are per-TU.
[Inline comment] arsenm (Not Done): I'd hope you don't have to check relaxed math, finite only should suffice
		AddGlobal("__oclc_unsafe_math_opt", UnsafeMath || RelaxedMath, /*Size=*/8);
		AddGlobal("__oclc_correctly_rounded_sqrt32", CorrectSqrt, /*Size=*/8);

		// Control constants for the system.
[Inline comment] yaxunl (Not Done): we need to disable emitting these variables for HIP -fgpu-rdc mode and OpenCL since they will break per-TU control variable. Other cases are OK.
[Inline comment] jhuber6 (Done): But the code would still depend on these and they wouldn't be present right
		AddGlobal("__oclc_wavefrontsize64", Wavefront64, /*Size=*/8,
		llvm::GlobalValue::LinkOnceODRLinkage);
		AddGlobal("__oclc_ISA_version", Minor + Major * 1000, /*Size=*/32,
[Inline comment] arsenm (Done): These probably should use linkonce_odr
		llvm::GlobalValue::LinkOnceODRLinkage);
		AddGlobal("__oclc_ABI_version",
[Inline comment] arsenm (Not Done): This should probably get an __llvm_amdgcn prefix and be renamed
		CGM.getTarget().getTargetOpts().CodeObjectVersion, /*Size=*/32,
		llvm::GlobalValue::LinkOnceODRLinkage);
		}

void AMDGPUTargetCodeGenInfo::setTargetAttributes(		void AMDGPUTargetCodeGenInfo::setTargetAttributes(
const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {		const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
if (requiresAMDGPUProtectedVisibility(D, GV)) {		if (requiresAMDGPUProtectedVisibility(D, GV)) {
GV->setVisibility(llvm::GlobalValue::ProtectedVisibility);		GV->setVisibility(llvm::GlobalValue::ProtectedVisibility);
GV->setDSOLocal(true);		GV->setDSOLocal(true);
}		}

if (GV->isDeclaration())		if (GV->isDeclaration())
▲ Show 20 Lines • Show All 2,965 Lines • Show Last 20 Lines
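
The `__oclc_ISA_version` value above is derived from the CPU name: the non-digit prefix is dropped, the last two characters are parsed as hexadecimal (the minor part), the rest as decimal (the major part), and the result is `Minor + Major * 1000`. The following standalone C++ sketch mirrors that arithmetic without any LLVM dependency. The helper names `encodeIsaVersion` and `checkIsaVersion` are hypothetical and not part of the patch, and the sketch follows LLVM's `getAsInteger` convention of returning true on failure.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper mirroring the Minor/Major parsing in emitTargetGlobals.
// Returns true on FAILURE (like llvm::StringRef::getAsInteger); on success it
// stores Major * 1000 + Minor in `version`.
bool encodeIsaVersion(const std::string &cpu, unsigned &version) {
  // Skip the non-digit prefix, e.g. "gfx".
  std::size_t pos = 0;
  while (pos < cpu.size() &&
         !std::isdigit(static_cast<unsigned char>(cpu[pos])))
    ++pos;
  const std::string id = cpu.substr(pos);
  if (id.size() < 3)
    return true; // need at least one major digit plus two minor characters

  const std::string majorStr = id.substr(0, id.size() - 2);
  const std::string minorStr = id.substr(id.size() - 2);
  for (char c : majorStr)
    if (!std::isdigit(static_cast<unsigned char>(c)))
      return true;
  for (char c : minorStr)
    if (!std::isxdigit(static_cast<unsigned char>(c)))
      return true;

  const unsigned major =
      static_cast<unsigned>(std::stoul(majorStr, nullptr, 10));
  const unsigned minor =
      static_cast<unsigned>(std::stoul(minorStr, nullptr, 16));
  assert(minor <= 0xFF); // two hex digits can never exceed 255
  version = major * 1000 + minor;
  return false;
}

// Usage example: the two CPUs exercised by the tests in this revision.
void checkIsaVersion() {
  unsigned v = 0;
  assert(!encodeIsaVersion("gfx90a", v) && v == 9010);   // 9 * 1000 + 0x0a
  assert(!encodeIsaVersion("gfx1030", v) && v == 10048); // 10 * 1000 + 0x30
  assert(encodeIsaVersion("generic", v));                // no digits: rejected
}
```

The hexadecimal minor part is why gfx1030 encodes as 10048 rather than 10030, which matches the GFX1030 check line in the test file.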

clang/test/CodeGen/amdgcn-control-constants.c

This file was added.

				// Check that we generate all the expected default features for the target.
				// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX90A
				// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=GFX1030

				// GFX90A: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
				// GFX90A: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
				// GFX90A: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
				// GFX90A: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
[Inline comment] yaxunl (Not Done): need a test for -target-cpu gfx1030 -target-feature +wavefrontsize64 and check __oclc_wavefrontsize64 to be 1.
				// GFX90A: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1
[Inline comment] yaxunl (Not Done): need an OpenCL test for -cl-denorms-are-zero
[Inline comment] yaxunl (Not Done): still missing this test, and some other tests for -cl-* options as commented below. Also, missing a HIP test for -ffast-math
[Inline comment] jhuber6 (Done): The cc1 math options tested individually should be enabled by `-ffast-math`.
[Inline comment] yaxunl (Not Done): Since we cannot test -ffast-math directly, can we add a driver test to ensure we are not missing any -cc1 options needed by the control variables when -ffast-math is specified for the driver? Thanks. Also, the -cl-* options are -cc1 options. We need to test them.
				// GFX90A: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 9010
				// GFX90A: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400
[Inline comment] yaxunl (Not Done): need OpenCL tests for -cl-finite-math-only and -cl-fast-relaxed-math

[Inline comment] yaxunl (Not Done): need OpenCL tests for -cl-unsafe-math-optimizations and -cl-fast-relaxed-math
				// GFX1030: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
[Inline comment] yaxunl (Not Done): need an OpenCL test for -cl-fp32-correctly-rounded-divide-sqrt. If it needs CodeGenOpt you may need to re-use the option for HIP.
				// GFX1030: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
				// GFX1030: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
				// GFX1030: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
				// GFX1030: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 0
				// GFX1030: @__oclc_ISA_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 10048
				// GFX1030: @__oclc_ABI_version = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i32 400

				// Check that we can override the wavefront features.
				// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx1030 -target-feature +wavefrontsize64 \
				// RUN: -S -emit-llvm -o - %s | FileCheck %s --check-prefix=WAVEFRONT
				// WAVEFRONT: @__oclc_wavefrontsize64 = linkonce_odr hidden local_unnamed_addr addrspace(4) constant i8 1

				// Check that we can enable denormalization at zero.
				// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fdenormal-fp-math-f32=preserve-sign,preserve-sign \
				// RUN: -S -emit-llvm -o - %s | FileCheck %s --check-prefix=DENORM-AT-ZERO
				// DENORM-AT-ZERO: @__oclc_daz_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1

				// Check that we can enable finite math.
				// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -ffinite-math-only \
				// RUN: -S -emit-llvm -o - %s | FileCheck %s --check-prefix=FINITE-MATH
				// FINITE-MATH: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1
				// FINITE-MATH: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0

				// Check that we can enable unsafe math.
				// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -menable-unsafe-fp-math \
				// RUN: -S -emit-llvm -o - %s | FileCheck %s --check-prefix=UNSAFE-MATH
				// UNSAFE-MATH: @__oclc_finite_only_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
				// UNSAFE-MATH: @__oclc_unsafe_math_opt = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1

				// Check that we can disable/enable correctly rounded square roots.
				// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -fno-hip-fp32-correctly-rounded-divide-sqrt \
				// RUN: -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CORRECT-SQRT
				// CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 0
				// RUN: %clang_cc1 -x cl -triple amdgcn-amd-amdhsa -target-cpu gfx90a -cl-fp32-correctly-rounded-divide-sqrt \
				// RUN: -disable-llvm-optzns -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CL-CORRECT-SQRT
				// CL-CORRECT-SQRT: @__oclc_correctly_rounded_sqrt32 = linkonce hidden local_unnamed_addr addrspace(4) constant i8 1

clang/test/CodeGen/amdgcn-link-control-constants.c

This file was added.

				// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-globals --include-generated-funcs --global-value-regex "__oclc_daz_opt"
				// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -emit-llvm-bc -o %t.bc -DLIBRARY %s
				// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -target-cpu gfx90a -mlink-builtin-bitcode %t.bc -S -emit-llvm -o - %s | FileCheck %s
				yaxunl (Not Done): This is compiling HIP as host. Please add -fcuda-is-device.
				jhuber6 (Author, Done): This test should only require that the triple is `amdgcn`. I could potentially make the generation of the constants require that HIP, OpenMPDevice, or OpenCL is enabled, if you think that's bad.
				jhuber6 (Author, Done): I can also change it to just `-x c` if HIP is the problem.
				yaxunl (Not Done): We don't officially support C on amdgcn, but we officially support HIP. I would suggest moving this to CodeGenCUDA, compiling it as HIP, and using HIP syntax.
				JonChesterfield (Not Done): We probably want these magic constants for C++ code as well, so keying it off the triple (at least the triple plus the fact that we're using rocm / compute stuff, which I think is adequately indicated by hsa in the triple) is better. Likewise, we don't want to emit these constants for non-GPU code, e.g. x64 host HIP doesn't need the daz_opt constant, which also suggests the triple is the right hook.

				#ifdef LIBRARY

				extern unsigned char [[clang::address_space(5)]] __oclc_daz_opt;
				yaxunl (Not Done): use `__constant__` instead

				int foo(void) {
				yaxunl (Not Done): add `__device__`
				return __oclc_daz_opt ? 1 : 0;
				}

				#else

				extern int foo(void);
				yaxunl (Not Done): add `__device__`

				void bar(void) {
				yaxunl (Not Done): add `__device__`
				foo();
				}

				#endif
				//.
				// CHECK: @__oclc_daz_opt = internal local_unnamed_addr addrspace(4) constant i8 0, align 1
				//.
				// CHECK-LABEL: define {{[^@]+}}@_Z3barv
				// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[CALL:%.*]] = call noundef i32 @_Z3foov()
				// CHECK-NEXT: ret void
				//
				//
				// CHECK-LABEL: define {{[^@]+}}@_Z3foov
				// CHECK-SAME: () #[[ATTR0]] {
				// CHECK-NEXT: entry:
				// CHECK-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
				// CHECK-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
				// CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr addrspace(5) addrspacecast (ptr addrspace(4) @__oclc_daz_opt to ptr addrspace(5)), align 1
				// CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i8 [[TMP0]], 0
				// CHECK-NEXT: [[TMP1:%.*]] = zext i1 [[TOBOOL]] to i64
				// CHECK-NEXT: [[COND:%.*]] = select i1 [[TOBOOL]], i32 1, i32 0
				// CHECK-NEXT: ret i32 [[COND]]
				//
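
JonChesterfield's inline comment on this file argues for keying the emission of the control constants off the target triple rather than off the source language. A minimal sketch of that gating is below, written as standalone C++ and not against clang's real APIs. `splitTriple` and `shouldEmitControlConstants` are hypothetical names, and treating `amdgcn` plus `amdhsa` as the condition is an assumption drawn from that comment, not the logic the patch actually implements.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): split a target triple such as
// "amdgcn-amd-amdhsa" into its '-'-separated components.
static std::vector<std::string> splitTriple(const std::string &Triple) {
  std::vector<std::string> Parts;
  std::stringstream SS(Triple);
  std::string Part;
  while (std::getline(SS, Part, '-'))
    Parts.push_back(Part);
  return Parts;
}

// Hypothetical predicate (an assumption based on the review discussion):
// emit the __oclc_* control constants only when the architecture is amdgcn
// and the OS component is amdhsa, regardless of the source language.
static bool shouldEmitControlConstants(const std::string &Triple) {
  std::vector<std::string> Parts = splitTriple(Triple);
  if (Parts.size() < 3)
    return false; // Need at least arch-vendor-os to make a decision.
  return Parts[0] == "amdgcn" && Parts[2] == "amdhsa";
}
```

Under this sketch, a host compilation such as `x86_64-unknown-linux-gnu` would skip the constants, while any language compiled for `amdgcn-amd-amdhsa` would get them, which is the behavior the comment asks for.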