Byval requires allocating additional stack space, and always requires an implicit copy to be inserted in codegen,
where it can be difficult to optimize.  In this work, we use byref/IndirectAliased promotion method instead of
byval with the implicit copy semantics.
Details
- Reviewers
- arsenm - bcahoon - jdoerfert 
- Group Reviewers
- Restricted Project 
- Commits
- rGd77c62053c94: [clang][AMDGPU]: Don't use byval for struct arguments in function ABI
Diff Detail
Event Timeline
| clang/lib/CodeGen/Targets/AMDGPU.cpp | ||
|---|---|---|
| 253 | Why does this need the type checks? Can this just go under the isIndirect handling? | |
| clang/test/CodeGenOpenCL/addr-space-struct-arg.cl | ||
| 124 | These test checks are pretty thin, I'd like to see the memcpys in the IR. In a pre-commit, can you switch these tests to generated checks? | |
Should also get a mention in the release notes (not sure how much ABI detail we have in AMDGPUUsage too)
I would have assumed it makes more sense to have an IR pass that changes the ABI of functions with local linkage. The problem with this is that other languages might still emit byval, etc.
I think this would be good as an alternative optimization, but this wouldn't change the C ABI to avoid byval. I think we don't want byval for whenever we have an object linkable ABI
I have two points, I should have been more clear:
- This is dangerous since we change something that is tested. Interoperability might not be the biggest thing yet, but special casing clangs behavior seems a slippery slope towards rust/fortran/Julia/... code not being able to talk to C/C++/HIP/... code when both are compiled for AMDGPUs.
- We get more bang and less (potential) problems if we transform the IR whenever we can prove that we can, e.g., we see all the call sites. It should be overall better since we target a closed world, at least for the foreseeable future.
It's not special casing clang's behavior. We are defining the C ABI, a thing which currently does not exist. Other languages that want to be compatible with the C ABI have to follow, but they don't have to. This isn't a unique property, and every frontend has to do this for every target. I don't see why we need to carry the albatross of byval simply because llvm didn't well abstract ABI details.
We do not have a defined ABI. We do not have machine code linking. Calling functions from assembly requires considerable care already. We change things like which registers are live per-function.
The time will come where we are sufficiently constrained to be stuck with inefficiency in calling convention but we are not there yet. Other languages that want to link against IR emitted by clang can still do so, though they'll also need to change to the more efficient convention to do so.
You're saying this is not specializing it but it is. AMDGPU now emits different IR than NVPTX targets. Is that by itself a problem, no. Can I imagine this to be a problem down the line, yes.
That aside, I am not arguing this is by itself wrong. What I'm mostly trying to say is that there is a more generic alternative we should implement instead.
My goal is to get the same effect for all languages targeting AMDGPUs, not only the ones that go through Clang.
Anyway, I'm not blocking this. We'll can deal with the other languages and such later then.
We should optimise more aggressively within a module. There's an idea in mailing lists to deliberately split functions into local versions with a fast calling convention and externally visible shims used for address escapes. That's a good thing that we should do. It's not this thing.
This is a case where the module external functions can be quicker too. So we should do that. I hear that nvptx does the worse thing and you don't like divergence between the GPU targets. Ptx is the stable ABI there, perhaps we should fix nvptx to match in a later patch.
My goal is to get the same effect for all languages targeting AMDGPUs, not only the ones that go through Clang.
Those languages should also not use the convention in place before this patch. The existing behaviour is a design mistake. This frees them to do something better without generating shims to work around the C ABI.
It's not an alternative, it's an entirely orthogonal thing. We should have both. If we commit to a real ABI, it is not going to be limited by trying to pretend that target specific IR is portable, when that's never been the case. NVPTX can switch to match this if it wants. If we want similar IR why consolidate in the worse direction
- Move getIndirectAliased under isAggregateTypeForABI(Ty)
- update LIT addr-space-struct-arg.cl to check corresponding alloca and memcpy to the struct.
| clang/test/CodeGenOpenCL/addr-space-struct-arg.cl | ||
|---|---|---|
| 124 | Do you mean using "update_cc_test_checks.py" to generate the CHECKs? I am not sure why this does not work as expected. But I am including it as a separate file " addr-space-struct-arg-temp.cl" for reference. | |
| clang/test/CodeGenOpenCL/addr-space-struct-arg.cl | ||
|---|---|---|
| 124 | Yes. You might need to manually delete the checks that are already there | |
Does this patch cause the callee to skip making a local copy of the struct type argument? What if the callee makes changes to the argument? That is a common use case since users assume the function arguments in C/C++/HIP are passed by value.
The callee is supposed to use an explicit memcpy. The explicit memcpy can be optimized away most of the time. The current byval is an invisible copy on the caller side which is difficult to eliminate
Don't add the temporary test, pre-commit a switch of the existing test so it's easy to see the diff in the review.
add a pre-commit test, CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl, for convenience of review. This test will be added for checking upon commit.
Need to add to release notes and document in AMDGPUUsage
| clang/test/CodeGenOpenCL/byval.cl | ||
|---|---|---|
| 0–1 | The test name suggests we should test with a different target that does use byval here | |
add x86 target to check pass by value in byval.cl
remove -enable-var-scope in the pre-commit test.
TODO: document and release note
| llvm/docs/AMDGPUUsage.rst | ||
|---|---|---|
| 13812–13813 | s/to allocate memory/for allocating stack memory/ | |
| 13813 | Specify C ABI | |
| 13814 | copying the value of the struct if modified | |
| llvm/docs/AMDGPUUsage.rst | ||
|---|---|---|
| 13813 | Do not get what to do to "Specify C ABI"? Can you suggest explicitly? Thanks. | |
| llvm/docs/AMDGPUUsage.rst | ||
|---|---|---|
| 13813 | Add the letter C | |
| llvm/docs/AMDGPUUsage.rst | ||
|---|---|---|
| 13813 | ... in function C ABI? Or should we remove "function"? | |
| llvm/docs/AMDGPUUsage.rst | ||
|---|---|---|
| 13813 | Can just say C, doesn't really matter if you state function or not. function is implied | |
| clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl | ||
|---|---|---|
| 257 | if (AI.getKind() == ABIArgInfo::Indirect) return "noundef" Should we add IndirectAlised check and include in the this same patch? Thanks | |
| clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl | ||
|---|---|---|
| 257 | Don't understand this snippet, the attribute emission presumably comes from somewhere else | |
| clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl | ||
|---|---|---|
| 257 | In function DetermineNoUndef. "noundef was also missing for kernel byref argument | |
| clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl | ||
|---|---|---|
| 257 | Yes, that should probably also include indirect aliased. You should fix that in a second patch | |
Typo "in stead"