-mcode-object-version=none is a special argument which allows
ABI-agnostic code to be generated for device runtime libraries.
Details
- Reviewers: jhuber6, yaxunl, JonChesterfield
- Repository: rG LLVM Github Monorepo
Event Timeline
| File | Line | Comment |
|---|---|---|
| clang/include/clang/Basic/TargetOptions.h | 90 | Typically we just put a COV_LAST to indicate that it's over the accepted enumerations. |
| clang/lib/Driver/ToolChain.cpp | 1364 | Is this flag not in the m group? It should be caught here, right? |
| clang/lib/Driver/ToolChains/Clang.cpp | 1058 | Use clang-format. |
| clang/lib/Driver/ToolChains/Clang.cpp | 1066 | You shouldn't assign to a Twine. In general, I think we should put this ternary inline with the other code to avoid the temporary. The handling here is also a little confusing: we call Args.getLastArg(options::OPT_mcode_object_version_EQ), which expects a number; if it's not present we get an empty string, which default-converts to zero, which we then convert into "none"? |
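
Two of the inline comments (the COV_LAST sentinel and the confusing empty-string-to-zero-to-"none" conversion) can be illustrated with a small standalone sketch. This is not the code from the patch or from TargetOptions.h: the enumerator list and the helpers isValidCOV, covSpelling, and parseCOV are hypothetical names chosen for illustration, and only the COV_LAST idea and the "none" spelling come from the review.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for a code-object-version enum. The enumerators and
// their values are illustrative and are not copied from TargetOptions.h.
enum CodeObjectVersionKind {
  COV_None,
  COV_2,
  COV_3,
  COV_4,
  COV_5,
  COV_LAST // sentinel: one past the last accepted enumerator
};

// True when V names a real enumerator rather than the sentinel.
constexpr bool isValidCOV(int V) { return V >= COV_None && V < COV_LAST; }

// The sentinel also sizes lookup tables, so adding an enumerator without
// updating the table leaves a visible null entry instead of an overrun.
const char *covSpelling(CodeObjectVersionKind V) {
  assert(isValidCOV(V) && "the sentinel is not a real version");
  static const char *const Names[COV_LAST] = {"none", "2", "3", "4", "5"};
  return Names[V];
}

// Parses the text after the '=' of the option. "none" is matched explicitly;
// empty or unknown text yields std::nullopt instead of silently becoming
// zero and then "none".
std::optional<CodeObjectVersionKind> parseCOV(std::string_view Text) {
  if (Text == "none") return COV_None;
  if (Text == "2") return COV_2;
  if (Text == "3") return COV_3;
  if (Text == "4") return COV_4;
  if (Text == "5") return COV_5;
  return std::nullopt;
}
```

With this shape, a caller can tell "flag absent" (no call, or std::nullopt) apart from an explicit "none", which is the distinction the last inline comment asks about.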
-mcode-object-version=none was intentionally designed to work with clang -cc1 only, since it does not work with the clang driver if users link with the device library. The device library can still use it by passing it through -Xclang.
Thanks for the tip, @yaxunl. I will abandon this revision and use -Xclang to pass cov_none to devicertl.
So... That should be the default, right? Emit IR that the back end specialises. Or, ideally, the only behaviour as far as the front end is concerned.
Code in the device library depends on a control variable for the code object version. Specifying the code object version in Clang allows that variable to be internalized and the code depending on it to be optimized as early as possible. Not specifying it in Clang would require an LLVM pass in the AMDGPU backend to define that control variable after linking, and the variable would have to have external linkage. This may lose optimization. Also, you would need a way to leave it unspecified in the FE but specify it in the BE. This just complicates things without much benefit.
On second thoughts, this may point to a way of eliminating not just the code object control variable but all device library control variables.
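
As a rough analogy in plain C++ (not device-library code), the difference between an internalized constant and an externally visible variable looks like the sketch below. The names InternalCOV, ExternalCOV, usesV5LayoutEarly, and usesV5LayoutLate are hypothetical, and the value 500 is an assumed encoding used only for illustration.

```cpp
#include <cassert>

// Case A: the value is fixed up front and has internal linkage, so code that
// branches on it can be folded at compile time.
namespace {
constexpr int InternalCOV = 500; // assumed encoding, for illustration only
}

constexpr bool usesV5LayoutEarly() { return InternalCOV >= 500; }
static_assert(usesV5LayoutEarly(), "branch is resolved at compile time");

// Case B: the variable has external linkage and can be written from outside
// this translation unit, so the compiler generally has to keep the load and
// the comparison.
int ExternalCOV = 500;

bool usesV5LayoutLate() {
  assert(ExternalCOV > 0 && "control variable must be set before use");
  return ExternalCOV >= 500;
}
```

In case A the static_assert shows the branch is a compile-time constant; in case B the result depends on whatever value the variable holds when the function runs.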
Basically, in Clang we can emit a module flag listing the required control variables and not link the device libs that implement these control variables.
Then we add an LLVM pass at the very beginning of the optimization pipeline which checks that module flag and defines those control variables with internal linkage. This way, we should be able to get rid of those control variable device libs without losing performance.
Or, the front end could define those objects directly, without importing IR files that define the objects with the content clang used to choose the object file. E.g. instead of the argument daz=off (spelled differently) finding a file called daz.off.ll that defines a variable called daz with a value 0, that argument could define that variable. I think @jhuber6 has a partial patch trying to do that.
If we were more ambitious, we could use intrinsics that are folded reliably at O0 instead of magic variables that hopefully get constant folded. That would kill a bunch of O0 bugs.
In general though, splicing magic variables in the front end seems unlikely to be performance critical relative to splicing them in at the start of the backend.
Some control variables are per-module. Clang cannot emit control variables that have different values for different modules. Intrinsics should work, since an intrinsic can take its value as an argument.
Tangent below. Main thrust is this =none feature should be called "default", not none, and be the default, and there should be no feature called ABI=none.
Sure it can. In the most limited case, we could exactly replace the variables imported from source IR in the device libs based on command line variables with the same globals with the same value and attributes. That would be NFC except it would no longer matter if the devicertl source was present.
There are also other, better things to be done, like not burning ABI decisions into the front end, or not using magic variables at all. But it's definitely not necessary to write constants in IR files that encode the same information as the command line flag. That's a whole indirect no-op that doesn't need to be there.
I think @saiislam is working on a patch that will handle that. We'll have clang emit some global that OpenMP uses.
Thanks Joseph.
Yes, I have abandoned this patch and am using the -Xclang -mcode-object-version=none option in the patch to enable cov5 support for OpenMP.