- As the address space cast may not be valid on a specific target, addrspacecast is not handled when an alloca is able to be replaced with the source of memcpy/memmove. This patch addresses that by querying a target hook on whether that address space cast is valid. For example, on most GPU targets, the cast from a global pointer to a generic pointer is valid.
- If that cast is allowedd (by querying isValidAddrSpaceCast), the replacement is enhanced to handle that addrspacecast as well.
if half of %alloca is filled with @g1, and half of %alloca is filled with @g2, will it be replaced in the call of @readonly_caller? do we need a test for that?
a new keep test is added.
it won't replace that alloca as isOnlyCopiedFromConstantMemory bails out if more than one copy to alloca (including pointers calculated from GEPs) is found. Even if we could find that's the safe case as the copies from g1 and g2 won't overlap, that initialization of the alloca is way complicated to be handled in this scenario (passing through a function call). For other scenarios, I hope SROA could slice that [32 x i8] alloca into 2 smaller ones. But, it still won't be handled here directly.
can we make SrcAS a member of PointerReplacer since it seems not to change?
As far as I can tell, your motivating test cases do not require the TTI hook. They only need trivial address space casts (to the same address space).
Using TTI inside InstCombine is forbidden as a matter of policy. You would need to make an argument for why this exception should be allowed (it would probably go along the lines of "this should really be part of data layout"), so I would recommend to only handle this trivial case first and bypass the TTI issue.
Yes, that specific test doesn't need a TTI hook to enhance the instcombine. But, that TTI hook is provided to handle other valid address spaces per target. For example, on most GPU devices, the cast from a global pointer to a generic pointer is valid but not in the reverse direction. However, that needs to change the target code and target-specific test. I do not own any target code to test that hook further. If required, I could modify AMDGPU as an example.
Do we really need to check whether the addrspacecast is valid here?
If it is invalid, then it is UB. Therefore we could just assume it is valid.
According to the LLVM IR manual, we can assume the memory pointed to by addrspacecasted pointers are the same, therefore we should be able to do this optimization always.
Another reason is that there should already be validity check for addrspacecast in some target specific passes, therefore there is no need to check it here.
The langref/verifier do not allow addrspacecast between the same address space
This misses a variety of valid casts. More or less anything except region should be a valid to flat case. Also the constant_32bit can cast to constant and to flat.
This is just reinterpreting bits of a pointer in memory. This is not an addrspacecast. You can only use an addrspacecast in the trivial cases, otherwise you need to use ptrtoint+inttoptr. This is also why we should allow bitcasts between any pointers with the same size https://discourse.llvm.org/t/rfc-llvm-ir-should-allow-bitcast-between-address-spaces-with-the-same-size/5759
- Update TTI hook isValidAddrSpaceCast to be consistent with LLVM IR verifier
- Check the equal case in instcombine
- Add other valid cases in AMDGPU
We should not transform valid code with well-defined behavior into one with UB. It's valid for the developer to copy data from an address space (not eligible to be cast into generic address space) into private memory and pass that data into a routine accepting generic pointers only. For this kind of case, we should not transform that code or generate a UB address space cast.
For the last comment, once the alloca is replaced or removed, a new addrspacecast would be created from the source pointer (copied into alloca) to the pointer (previously cast from that alloca pointer). We need to ensure the resulting addrspacecast is valid.
I c. I thought only operands of load is replaced. I did not notice that other uses of the pointer are replaced.
Probably need a test for that.
This patch only handles the extra addrspacecast user. It doesn't introduce new users.
I'm approving this usage of TTI in InstCombine on the following grounds:
- This is a legality query, not a profitability query. It does not impact IR canonicality in any meaningful way.
- While legality queries should really not be part of TTI, I also think that including this in DataLayout would be somewhat awkward, and making this a TTI query is the more pragmatic choice for the time being. I'd still be interested in @arsenm's opinion on whether it would make sense to make this a data layout property in the future.
I'll leave final patch review to AMDGPU maintainers, just explicitly withdrawing my objection here.
Please perform the cast to AddrSpaceCastInst here, not inside the function.
Use SrcAddrSpace defined above.