This is an archive of the discontinued LLVM Phabricator instance.

[clang][AMDGPU]: Don't use byval for struct arguments in function ABI
ClosedPublic

Authored by cfang on Jul 21 2023, 11:48 AM.

Details

Summary

Byval requires allocating additional stack space, and always requires an implicit copy to be inserted in codegen,
where it can be difficult to optimize. In this work, we use byref/IndirectAliased promotion method instead of
byval with the implicit copy semantics.

Diff Detail

Event Timeline

cfang created this revision.Jul 21 2023, 11:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2023, 11:48 AM
cfang requested review of this revision.Jul 21 2023, 11:48 AM
arsenm added inline comments.Jul 21 2023, 11:52 AM
clang/lib/CodeGen/Targets/AMDGPU.cpp
253

Why does this need the type checks? Can this just go under the isIndirect handling?

clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
120

These test checks are pretty thin, I'd like to see the memcpys in the IR. In a pre-commit, can you switch these tests to generated checks?

Should also get a mention in the release notes (not sure how much ABI detail we have in AMDGPUUsage too)

arsenm added reviewers: Restricted Project, jdoerfert.Jul 21 2023, 12:07 PM

I would have assumed it makes more sense to have an IR pass that changes the ABI of functions with local linkage. The problem with this is that other languages might still emit byval, etc.

I would have assumed it makes more sense to have an IR pass that changes the ABI of functions with local linkage. The problem with this is that other languages might still emit byval, etc.

The backend still supports byval

I would have assumed it makes more sense to have an IR pass that changes the ABI of functions with local linkage. The problem with this is that other languages might still emit byval, etc.

I think this would be good as an alternative optimization, but this wouldn't change the C ABI to avoid byval. I think we don't want byval for whenever we have an object linkable ABI

I would have assumed it makes more sense to have an IR pass that changes the ABI of functions with local linkage. The problem with this is that other languages might still emit byval, etc.

I think this would be good as an alternative optimization, but this wouldn't change the C ABI to avoid byval. I think we don't want byval for whenever we have an object linkable ABI

I have two points, I should have been more clear:

  1. This is dangerous since we change something that is tested. Interoperability might not be the biggest thing yet, but special casing clangs behavior seems a slippery slope towards rust/fortran/Julia/... code not being able to talk to C/C++/HIP/... code when both are compiled for AMDGPUs.
  2. We get more bang and less (potential) problems if we transform the IR whenever we can prove that we can, e.g., we see all the call sites. It should be overall better since we target a closed world, at least for the foreseeable future.
  1. This is dangerous since we change something that is tested. Interoperability might not be the biggest thing yet, but special casing clangs behavior seems a slippery slope towards rust/fortran/Julia/... code not being able to talk to C/C++/HIP/... code when both are compiled for AMDGPUs.

It's not special casing clang's behavior. We are defining the C ABI, a thing which currently does not exist. Other languages that want to be compatible with the C ABI have to follow, but they don't have to. This isn't a unique property, and every frontend has to do this for every target. I don't see why we need to carry the albatross of byval simply because llvm didn't well abstract ABI details.

We do not have a defined ABI. We do not have machine code linking. Calling functions from assembly requires considerable care already. We change things like which registers are live per-function.

The time will come where we are sufficiently constrained to be stuck with inefficiency in calling convention but we are not there yet. Other languages that want to link against IR emitted by clang can still do so, though they'll also need to change to the more efficient convention to do so.

You're saying this is not specializing it but it is. AMDGPU now emits different IR than NVPTX targets. Is that by itself a problem, no. Can I imagine this to be a problem down the line, yes.
That aside, I am not arguing this is by itself wrong. What I'm mostly trying to say is that there is a more generic alternative we should implement instead.
My goal is to get the same effect for all languages targeting AMDGPUs, not only the ones that go through Clang.

Anyway, I'm not blocking this. We'll can deal with the other languages and such later then.

... though they'll also need to change to the more efficient convention to do so.

Exactly my argument.

That aside, I am not arguing this is by itself wrong. What I'm mostly trying to say is that there is a more generic alternative we should implement instead.

We should optimise more aggressively within a module. There's an idea in mailing lists to deliberately split functions into local versions with a fast calling convention and externally visible shims used for address escapes. That's a good thing that we should do. It's not this thing.

This is a case where the module external functions can be quicker too. So we should do that. I hear that nvptx does the worse thing and you don't like divergence between the GPU targets. Ptx is the stable ABI there, perhaps we should fix nvptx to match in a later patch.

My goal is to get the same effect for all languages targeting AMDGPUs, not only the ones that go through Clang.

Those languages should also not use the convention in place before this patch. The existing behaviour is a design mistake. This frees them to do something better without generating shims to work around the C ABI.

You're saying this is not specializing it but it is. AMDGPU now emits different IR than NVPTX targets. Is that by itself a problem, no. Can I imagine this to be a problem down the line, yes.
That aside, I am not arguing this is by itself wrong. What I'm mostly trying to say is that there is a more generic alternative we should implement instead.

It's not an alternative, it's an entirely orthogonal thing. We should have both. If we commit to a real ABI, it is not going to be limited by trying to pretend that target specific IR is portable, when that's never been the case. NVPTX can switch to match this if it wants. If we want similar IR why consolidate in the worse direction

cfang updated this revision to Diff 543703.Jul 24 2023, 2:46 PM
  1. Move getIndirectAliased under isAggregateTypeForABI(Ty)
  2. update LIT addr-space-struct-arg.cl to check corresponding alloca and memcpy to the struct.
cfang added inline comments.Jul 24 2023, 2:49 PM
clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
120

Do you mean using "update_cc_test_checks.py" to generate the CHECKs? I am not sure why this does not work as expected. But I am including it as a separate file " addr-space-struct-arg-temp.cl" for reference.

cfang added a comment.Jul 24 2023, 2:50 PM

Should also get a mention in the release notes (not sure how much ABI detail we have in AMDGPUUsage too)

Still trying to figure out where to say what in the document.

arsenm added inline comments.Jul 24 2023, 2:50 PM
clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
120

Yes. You might need to manually delete the checks that are already there

Does this patch cause the callee to skip making a local copy of the struct type argument? What if the callee makes changes to the argument? That is a common use case since users assume the function arguments in C/C++/HIP are passed by value.

Does this patch cause the callee to skip making a local copy of the struct type argument? What if the callee makes changes to the argument? That is a common use case since users assume the function arguments in C/C++/HIP are passed by value.

The callee is supposed to use an explicit memcpy. The explicit memcpy can be optimized away most of the time. The current byval is an invisible copy on the caller side which is difficult to eliminate

arsenm requested changes to this revision.Jul 28 2023, 11:50 AM

Don't add the temporary test, pre-commit a switch of the existing test so it's easy to see the diff in the review.

This revision now requires changes to proceed.Jul 28 2023, 11:50 AM
cfang updated this revision to Diff 546255.Aug 1 2023, 3:59 PM

add a pre-commit test, CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl, for convenience of review. This test will be added for checking upon commit.

arsenm added a comment.Aug 1 2023, 4:36 PM

Need to add to release notes and document in AMDGPUUsage

clang/test/CodeGenOpenCL/byval.cl
0–1

The test name suggests we should test with a different target that does use byval here

cfang updated this revision to Diff 546270.Aug 1 2023, 5:03 PM

add x86 target to check pass by value in byval.cl
remove -enable-var-scope in the pre-commit test.

TODO: document and release note

cfang updated this revision to Diff 546992.Aug 3 2023, 1:51 PM

Add comments in ReleaseNotes and AMDGPUUsage.

Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 1:51 PM
arsenm added inline comments.Aug 10 2023, 2:41 PM
llvm/docs/AMDGPUUsage.rst
13812–13813

s/to allocate memory/for allocating stack memory/

13813

Specify C ABI

13814

copying the value of the struct if modified

cfang added inline comments.Aug 11 2023, 11:45 AM
llvm/docs/AMDGPUUsage.rst
13813

Do not get what to do to "Specify C ABI"? Can you suggest explicitly? Thanks.

arsenm added inline comments.Aug 11 2023, 12:01 PM
llvm/docs/AMDGPUUsage.rst
13813

Add the letter C

cfang added inline comments.Aug 11 2023, 12:05 PM
llvm/docs/AMDGPUUsage.rst
13813

... in function C ABI? Or should we remove "function"?

arsenm added inline comments.Aug 11 2023, 12:06 PM
llvm/docs/AMDGPUUsage.rst
13813

Can just say C, doesn't really matter if you state function or not. function is implied

cfang updated this revision to Diff 549487.Aug 11 2023, 12:16 PM
cfang marked 5 inline comments as done.

Update description in docs as suggested.

arsenm accepted this revision.Aug 11 2023, 2:43 PM

Should look into why noundef was lost, but that can be in a follow up

clang/lib/CodeGen/Targets/AMDGPU.cpp
252

Typo "in stead"

clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257

This lost the noundef, shouldn't lose it

This revision is now accepted and ready to land.Aug 11 2023, 2:43 PM
cfang added inline comments.Aug 11 2023, 3:44 PM
clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257

if (AI.getKind() == ABIArgInfo::Indirect)

return "noundef"

Should we add IndirectAlised check and include in the this same patch? Thanks

arsenm added inline comments.Aug 11 2023, 4:19 PM
clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257

Don't understand this snippet, the attribute emission presumably comes from somewhere else

cfang updated this revision to Diff 549545.Aug 11 2023, 4:23 PM

Add "noundef" attribute for IndirectAlised.

cfang added inline comments.Aug 11 2023, 4:26 PM
clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257

In function DetermineNoUndef. "noundef was also missing for kernel byref argument

arsenm added inline comments.Aug 11 2023, 4:28 PM
clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257

Yes, that should probably also include indirect aliased. You should fix that in a second patch

This revision was landed with ongoing or failed builds.Aug 11 2023, 4:38 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 11 2023, 4:38 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Thanks! Happy to see function calls getting cheaper