This is an archive of the discontinued LLVM Phabricator instance.

[Clang][AMDGPU] Fix handling of -mcode-object-version=none arg
AbandonedPublic

Authored by saiislam on Aug 2 2023, 11:30 AM.

Details

Summary

-mcode-object-version=none is a special argument which allows
abi-agnostic code to be generated for device runtime libraries.

Diff Detail

Event Timeline

saiislam created this revision.Aug 2 2023, 11:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2023, 11:30 AM
saiislam requested review of this revision.Aug 2 2023, 11:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2023, 11:30 AM
arsenm added a subscriber: arsenm.Aug 2 2023, 11:33 AM

missing tests

clang/lib/Driver/ToolChains/Clang.cpp
1066

don't need to go through std::string? stick with Twine everywhere?

clang/lib/Driver/ToolChains/CommonArgs.cpp
2309

missing space after if, also return on separate line. Also why starts with, and not ==?

jhuber6 added inline comments.Aug 2 2023, 11:46 AM
clang/include/clang/Basic/TargetOptions.h
90

Typically we just put a COV_LAST to indicate that it's over the accepted enumerations.

clang/lib/Driver/ToolChain.cpp
1364

Is this flag not in the m group? It should be caught here right?

clang/lib/Driver/ToolChains/Clang.cpp
1058

Use clang-format.

1066

You shouldn't assign to a Twine, but in general I think we should probably put this ternary in-line with the other stuff to avoid the temporary.

The handling here is a little confusing, we do

Args.getLastArg(options::OPT_mcode_object_version_EQ);

Which expects a number, if it's not present we get an empty string which default converts to zero which we then convert into "none"?

-mcode-object-version=none was intentionally designed to work with clang -cc1 only, since it does not work with clang driver if users link with device library. Device library can still use it by using it with -Xclang.

-mcode-object-version=none was intentionally designed to work with clang -cc1 only, since it does not work with clang driver if users link with device library. Device library can still use it by using it with -Xclang.

If the intended use is the deviceRTL then that should be sufficient.

saiislam marked 2 inline comments as done.Aug 4 2023, 11:44 AM

-mcode-object-version=none was intentionally designed to work with clang -cc1 only, since it does not work with clang driver if users link with device library. Device library can still use it by using it with -Xclang.

Thanks for the tip @yaxunl . I will abandon this revision and use Xclang for passing cov_none to devicertl.

saiislam abandoned this revision.Aug 4 2023, 11:45 AM

What does code objects version= none mean?

arsenm added a comment.Aug 4 2023, 2:05 PM

What does code objects version= none mean?

Handle any version

What does code objects version= none mean?

Handle any version

So... That should be the default, right? Emit IR that the back end specialises. Or, ideally, the only behaviour as far as the front end is concerned.

yaxunl added a comment.Aug 4 2023, 2:51 PM

What does code objects version= none mean?

Handle any version

So... That should be the default, right? Emit IR that the back end specialises. Or, ideally, the only behaviour as far as the front end is concerned.

Code in the device library depends on a control variable about the code object version. Specifying the code object version in Clang allows internalizing that variable and optimizing code depending on it as early as possible. Not specifying it with Clang will require an LLVM pass in amdgpu backend to define that control variable after linking and it has to have an external linkage. This may lose optimization. Also, you need a way to not specify it in FE but specify it in BE. This just complicates things without much benefits.

yaxunl added a comment.Aug 4 2023, 3:15 PM

What does code objects version= none mean?

Handle any version

So... That should be the default, right? Emit IR that the back end specialises. Or, ideally, the only behaviour as far as the front end is concerned.

Code in the device library depends on a control variable about the code object version. Specifying the code object version in Clang allows internalizing that variable and optimizing code depending on it as early as possible. Not specifying it with Clang will require an LLVM pass in amdgpu backend to define that control variable after linking and it has to have an external linkage. This may lose optimization. Also, you need a way to not specify it in FE but specify it in BE. This just complicates things without much benefits.

On second thoughts, this may inspire us about eliminating not just the code object control variable but all device library control variables.

Basically in Clang we can emit a module flag about required control variables and do not link the device libs that implement these control variables.

Then we add an LLVM pass at the very beginning of the optimization pipeline which checks that module flag and defines those control variables with internal linkage. This way, we should be able to get rid of those control variable device libs without losing performance.

Or, the front end could define those objects directly, without importing IR files that define the objects with the content clang used to choose the object file. E.g. instead of the argument daz=off (spelled differently) finding a file called daz.off.ll that defines variable called daz with a value 0, that argument could define that variable. I think @jhuber6 has a partial patch trying to do that.

If we were more ambitious, we could use intrinsics that are folded reliably at O0 instead of magic variables that hopefully get constant folded. That would kill a bunch of O0 bugs.

In general though, splicing magic variables in the front end seems unlikely to be performance critical relative to splicing them in at the start of the backend.

yaxunl added a comment.Aug 4 2023, 5:07 PM

Or, the front end could define those objects directly, without importing IR files that define the objects with the content clang used to choose the object file. E.g. instead of the argument daz=off (spelled differently) finding a file called daz.off.ll that defines variable called daz with a value 0, that argument could define that variable. I think @jhuber6 has a partial patch trying to do that.

If we were more ambitious, we could use intrinsics that are folded reliably at O0 instead of magic variables that hopefully get constant folded. That would kill a bunch of O0 bugs.

In general though, splicing magic variables in the front end seems unlikely to be performance critical relative to splicing them in at the start of the backend.

Some control variables are per-module. Clang cannot emit control variables that have different values for different modules. Intrinsics should work since it can take an argument as its value.

JonChesterfield added a comment.EditedAug 4 2023, 5:45 PM

Tangent below. Main thrust is this =none feature should be called "default", not none, and be the default, and there should be no feature called ABI=none.

Some control variables are per-module. Clang cannot emit control variables that have different values for different modules. Intrinsics should work since it can take an argument as its value.

Sure it can. In the most limited case, we could exactly replace the variables imported from source IR in the device libs based on command line variables with the same globals with the same value and attributes. That would be NFC except it would no longer matter if the devicertl source was present.

There's just also other better things to be done as well, like not burning ABI decisions in the front end, or not using magic variables at all. But it's definitely not necessary to write constants in IR files that encode the same information as the command line flag. That's a whole indirect no-op that doesn't need to be there.

Or, the front end could define those objects directly, without importing IR files that define the objects with the content clang used to choose the object file. E.g. instead of the argument daz=off (spelled differently) finding a file called daz.off.ll that defines variable called daz with a value 0, that argument could define that variable. I think @jhuber6 has a partial patch trying to do that.

If we were more ambitious, we could use intrinsics that are folded reliably at O0 instead of magic variables that hopefully get constant folded. That would kill a bunch of O0 bugs.

In general though, splicing magic variables in the front end seems unlikely to be performance critical relative to splicing them in at the start of the backend.

I think @saiislam is working on a patch that will handle that. We'll have clang emit some global that OpenMP uses.

Or, the front end could define those objects directly, without importing IR files that define the objects with the content clang used to choose the object file. E.g. instead of the argument daz=off (spelled differently) finding a file called daz.off.ll that defines variable called daz with a value 0, that argument could define that variable. I think @jhuber6 has a partial patch trying to do that.

If we were more ambitious, we could use intrinsics that are folded reliably at O0 instead of magic variables that hopefully get constant folded. That would kill a bunch of O0 bugs.

In general though, splicing magic variables in the front end seems unlikely to be performance critical relative to splicing them in at the start of the backend.

I think @saiislam is working on a patch that will handle that. We'll have clang emit some global that OpenMP uses.

Thanks Joseph.
Yes, I have abandoned this patch and using -Xclang -mcode-object-version=none option in the patch to enable cov5 support for OpenMP.