This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Teach alloca replacement to handle `addrspacecast`
ClosedPublic

Authored by hliao on Mar 27 2023, 9:25 PM.

Details

Summary
  • As the address space cast may not be valid on a specific target, addrspacecast is not handled when an alloca is able to be replaced with the source of memcpy/memmove. This patch addresses that by querying a target hook on whether that address space cast is valid. For example, on most GPU targets, the cast from a global pointer to a generic pointer is valid.
  • If that cast is allowedd (by querying isValidAddrSpaceCast), the replacement is enhanced to handle that addrspacecast as well.

Diff Detail

Event Timeline

hliao created this revision.Mar 27 2023, 9:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 9:25 PM
hliao requested review of this revision.Mar 27 2023, 9:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 9:25 PM
yaxunl added inline comments.Mar 28 2023, 7:05 AM
llvm/test/Transforms/InstCombine/ptr-replace-alloca.ll
437

if half of %alloca is filled with @g1, and half of %alloca is filled with @g2, will it be replaced in the call of @readonly_caller? do we need a test for that?

hliao updated this revision to Diff 509024.Mar 28 2023, 8:18 AM

Add one more keep test

hliao added a comment.Mar 28 2023, 8:19 AM

a new keep test is added.

llvm/test/Transforms/InstCombine/ptr-replace-alloca.ll
437

it won't replace that alloca as isOnlyCopiedFromConstantMemory bails out if more than one copy to alloca (including pointers calculated from GEPs) is found. Even if we could find that's the safe case as the copies from g1 and g2 won't overlap, that initialization of the alloca is way complicated to be handled in this scenario (passing through a function call). For other scenarios, I hope SROA could slice that [32 x i8] alloca into 2 smaller ones. But, it still won't be handled here directly.

hliao updated this revision to Diff 509044.Mar 28 2023, 9:18 AM

Fix typos

hliao added a comment.Mar 29 2023, 9:36 AM

Ping for review

yaxunl added inline comments.Mar 29 2023, 11:28 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
266

can we make SrcAS a member of PointerReplacer since it seems not to change?

hliao updated this revision to Diff 509518.Mar 29 2023, 5:50 PM

Revise following reviewer's comment

hliao marked an inline comment as done.Mar 29 2023, 5:50 PM
nikic requested changes to this revision.Mar 30 2023, 8:18 AM
nikic added a subscriber: nikic.

As far as I can tell, your motivating test cases do not require the TTI hook. They only need trivial address space casts (to the same address space).

Using TTI inside InstCombine is forbidden as a matter of policy. You would need to make an argument for why this exception should be allowed (it would probably go along the lines of "this should really be part of data layout"), so I would recommend to only handle this trivial case first and bypass the TTI issue.

This revision now requires changes to proceed.Mar 30 2023, 8:18 AM

As far as I can tell, your motivating test cases do not require the TTI hook. They only need trivial address space casts (to the same address space).

Using TTI inside InstCombine is forbidden as a matter of policy. You would need to make an argument for why this exception should be allowed (it would probably go along the lines of "this should really be part of data layout"), so I would recommend to only handle this trivial case first and bypass the TTI issue.

Yes, that specific test doesn't need a TTI hook to enhance the instcombine. But, that TTI hook is provided to handle other valid address spaces per target. For example, on most GPU devices, the cast from a global pointer to a generic pointer is valid but not in the reverse direction. However, that needs to change the target code and target-specific test. I do not own any target code to test that hook further. If required, I could modify AMDGPU as an example.

hliao updated this revision to Diff 509785.Mar 30 2023, 12:34 PM

Add the target-specific usage

hliao added a comment.Mar 31 2023, 9:29 AM

Kindly PING for review

hliao updated this revision to Diff 510077.Mar 31 2023, 10:18 AM

Fix clang-format

hliao updated this revision to Diff 510225.Apr 1 2023, 7:18 AM

Rebase and kindling PING for review

hliao updated this revision to Diff 510350.Apr 2 2023, 7:19 AM

Rebase and kindly PING for review

yaxunl added inline comments.Apr 3 2023, 6:08 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
279

Do we really need to check whether the addrspacecast is valid here?

If it is invalid, then it is UB. Therefore we could just assume it is valid.

According to the LLVM IR manual, we can assume the memory pointed to by addrspacecasted pointers are the same, therefore we should be able to do this optimization always.

Another reason is that there should already be validity check for addrspacecast in some target specific passes, therefore there is no need to check it here.

arsenm added inline comments.Apr 3 2023, 7:02 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
99

The langref/verifier do not allow addrspacecast between the same address space

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
173–179

This misses a variety of valid casts. More or less anything except region should be a valid to flat case. Also the constant_32bit can cast to constant and to flat.

arsenm added a comment.Apr 3 2023, 7:32 AM

This is just reinterpreting bits of a pointer in memory. This is not an addrspacecast. You can only use an addrspacecast in the trivial cases, otherwise you need to use ptrtoint+inttoptr. This is also why we should allow bitcasts between any pointers with the same size https://discourse.llvm.org/t/rfc-llvm-ir-should-allow-bitcast-between-address-spaces-with-the-same-size/5759

hliao updated this revision to Diff 510511.Apr 3 2023, 8:32 AM
  • Update TTI hook isValidAddrSpaceCast to be consistent with LLVM IR verifier
  • Check the equal case in instcombine
  • Add other valid cases in AMDGPU
hliao added inline comments.Apr 3 2023, 8:37 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
279

We should not transform valid code with well-defined behavior into one with UB. It's valid for the developer to copy data from an address space (not eligible to be cast into generic address space) into private memory and pass that data into a routine accepting generic pointers only. For this kind of case, we should not transform that code or generate a UB address space cast.

hliao added inline comments.Apr 3 2023, 8:52 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
279

For the last comment, once the alloca is replaced or removed, a new addrspacecast would be created from the source pointer (copied into alloca) to the pointer (previously cast from that alloca pointer). We need to ensure the resulting addrspacecast is valid.

yaxunl added inline comments.Apr 3 2023, 9:30 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
279

I c. I thought only operands of load is replaced. I did not notice that other uses of the pointer are replaced.

Probably need a test for that.

hliao added inline comments.Apr 3 2023, 10:57 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
279

This patch only handles the extra addrspacecast user. It doesn't introduce new users.

hliao updated this revision to Diff 510805.Apr 4 2023, 7:50 AM

Rebase and kindly PING review. Please let me know any pending issue to be addressed.

hliao updated this revision to Diff 511115.Apr 5 2023, 8:29 AM

Rebase and kindly PING for review

nikic resigned from this revision.Apr 5 2023, 12:00 PM

I'm approving this usage of TTI in InstCombine on the following grounds:

  • This is a legality query, not a profitability query. It does not impact IR canonicality in any meaningful way.
  • While legality queries should really not be part of TTI, I also think that including this in DataLayout would be somewhat awkward, and making this a TTI query is the more pragmatic choice for the time being. I'd still be interested in @arsenm's opinion on whether it would make sense to make this a data layout property in the future.

I'll leave final patch review to AMDGPU maintainers, just explicitly withdrawing my objection here.

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
355

Please perform the cast to AddrSpaceCastInst here, not inside the function.

551

Use SrcAddrSpace defined above.

hliao updated this revision to Diff 511258.Apr 5 2023, 7:40 PM

Revise following comments from reviewers

hliao marked an inline comment as done.Apr 5 2023, 7:41 PM
hliao updated this revision to Diff 511468.Apr 6 2023, 10:24 AM

Rebase and kindly PING for review

hliao updated this revision to Diff 512202.Apr 10 2023, 11:37 AM

Rebase and kindly PING for review

yaxunl accepted this revision.Apr 11 2023, 7:30 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Apr 11 2023, 7:30 AM