This patch adds on to the functionality implemented
in rG42ab5dc5a5dd6c79476104bdc921afa2a18559cf,
where PHI nodes are supported in the use-def traversal
algorithm to determine if an alloca ever overwritten
in addition to a memmove/memcpy. This patch implements
the support needed by the PointerReplacer to collect
all (indirect) users of the alloca in the case where a PHI
is involved. Finally, a new PHI is defined in the replace
method which takes in replaced incoming values and
updates the WorkMap accordingly.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
- fixed posion value returns
- fixed infinite loop cases by inserting a false check in PointerReplacer::collectUsers(Inst)
- simplified test case and added negative tests for non-simple load and no memcpy
- ran update_test_check.py
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
273 | Why do we need to add the phi incoming values to the worklist? Can't this use the same code as bitcast/GEP? Also, I think it would be good to add a test involving a loop phi, because I think as written, this code might go into infinite recursion in that case. |
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
273 | I suppose the incoming values would have already been inserted prior to collecting users of the PHI. Thanks! A loop test sounds like a great idea! |
Just took another look at this. I believe the change to isOnlyCopiedFromConstantMemory() is fine, but the changes to the PointerReplacer are not. Consider these test cases:
@g1 = constant [32 x i8] zeroinitializer @g2 = addrspace(1) constant [32 x i8] zeroinitializer define i32 @test1(i1 %c, ptr %ptr) { entry: %a = alloca [32 x i8] call void @llvm.memcpy.p0.p0.i64(ptr %a, ptr @g1, i64 32, i1 false) br i1 %c, label %if, label %join if: br label %join join: %phi = phi ptr [ %a, %if ], [ %ptr, %entry ] %v = load i32, ptr %phi ret i32 %v } define i32 @test2(i1 %c, ptr %ptr) { entry: %a = alloca [32 x i8] call void @llvm.memcpy.p0.p1.i64(ptr %a, ptr addrspace(1) @g2, i64 32, i1 false) br i1 %c, label %if, label %join if: br label %join join: %phi = phi ptr [ %a, %if ], [ %ptr, %entry ] %v = load i32, ptr %phi ret i32 %v } declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1) declare void @llvm.memcpy.p0.p1.i64(ptr, ptr addrspace(1), i64, i1)
The first one works, the second one (using PointerReplacer) will cause an assertion failure. The problem is that once we start to handle phis (or selects), additional operands become involved that might not be rooted in the alloca at all. To handle such cases, we need to make sure that all the phi argument are based on the alloca.
I'd personally recommend to split this patch in two parts. First do only the isOnlyCopiedFromConstantMemory() change (which is trivially correct), and then a separate one to deal with the more complex case of address space conversions.
While not strictly related to the patch, we probably need a limit on number of visited users in isOnlyCopiedFromConstantMemory() -- it's a pre-existing problem, but once we start looking through phis, it becomes more likely that we start looking through large use graphs.
llvm/test/Transforms/InstCombine/replace-alloca-phi.ll | ||
---|---|---|
3 | Please use opaque pointers for new tests ([32 x i8] addrspace(4)* -> ptr addrspace(4) etc). |
I believe the change to isOnlyCopiedFromConstantMemory() is fine
I think I spoke too soon ... I can't test this right now, but probably this minor variant is going to be miscompiled:
define i32 @test1(i1 %c, ptr %ptr) { entry: %a = alloca [32 x i8] br i1 %c, label %if, label %join if: br label %join join: %phi = phi ptr [ %a, %if ], [ %ptr, %entry ] call void @llvm.memcpy.p0.p0.i64(ptr %phi, ptr @g1, i64 32, i1 false) %v = load i32, ptr %phi ret i32 %v }
While we can replace uses of %a with @g1, dropping the memcpy would be incorrect, as it would also drop the potential write to %ptr, which is not covered by the replacement.
The minor variant compiles correctly actually, no transformation applies so the code stays the same.
I just tried this, and the memcpy does seem to get dropped:
@g1 = constant [32 x i8] zeroinitializer define i32 @test1(i1 %c, ptr %ptr) { entry: %a = alloca [32 x i8] br i1 %c, label %if, label %join if: br label %join join: %phi = phi ptr [ %a, %if ], [ %ptr, %entry ] call void @llvm.memcpy.p0.p0.i64(ptr %phi, ptr @g1, i64 32, i1 false) %v = load i32, ptr %phi ret i32 %v } declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1) ; RUN: opt -S -instcombine < %s @g1 = constant [32 x i8] zeroinitializer define i32 @test1(i1 %c, ptr %ptr) { entry: br i1 %c, label %if, label %join if: ; preds = %entry br label %join join: ; preds = %if, %entry %phi = phi ptr [ @g1, %if ], [ %ptr, %entry ] %v = load i32, ptr %phi, align 4 ret i32 %v } ; Function Attrs: argmemonly nocallback nofree nounwind willreturn declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #0
@nikic Since alloca instructions are iterated more than once, shouldn't the Worklist under PointerReplacer remove instructions that have already been replaced (new instruction inserted without removing the old instruction)? The first function goes through a strange transformation when opaque pointers are used, where the replace(Instruction&) function is called on a GEP instruction which was already replaced in a previous instruction. This somehow causes the removal of the new GEP instruction.
Not sure I follow, which IR function are you referring to here? Generally InstCombine will automatically clean up dead (side-effect-free) instructions, but it may be possible that this needs to be done explicitly once phis are involved (as they might no longer be trivially dead if there is a cycle).
llvm/test/Transforms/InstCombine/replace-alloca-phi.ll | ||
---|---|---|
4 | Is that a complete specification? I am getting the following error: error: Missing size specification for pointer in datalayout string |
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
61–68 | Cosmetic review: What do you think about using the varadic template version of isa ? You could collapse this into isa<BitCastInst, AddrSpaceCastInst, PHINode>(I) |
llvm/test/Transforms/InstCombine/replace-alloca-phi.ll | ||
---|---|---|
4 | target datalayout = "p5:32:32-A5" |
- In case one of the incoming values being replaced belongs to a different address space with respect to other incoming values, cast them to the target address space.
- Included tests suggested by @nikic
- Minor nits suggested by reviewers
- Eliminated noalias and readonly attributes for parameters as they are irrelevant to this patch
I just tested the updated patch, and this case still gets miscompiled:
@g1 = constant [32 x i8] zeroinitializer define i32 @test_memcpy_after_phi(i1 %c, ptr %ptr) { entry: %a = alloca [32 x i8] br i1 %c, label %if, label %join if: br label %join join: %phi = phi ptr [ %a, %if ], [ %ptr, %entry ] call void @llvm.memcpy.p0.p0.i64(ptr %phi, ptr @g1, i64 32, i1 false) %v = load i32, ptr %phi ret i32 %v } declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
The memcpy gets removed, even though it might be writing to the %ptr argument rather than the %a alloca.
llvm/test/Transforms/InstCombine/replace-alloca-phi.ll | ||
---|---|---|
4 | You should be able to avoid making this AMDGPU specific by copying from a constant global, instead of copying from a special AMDGPU address space. |
@nikic Great catch! The memcpy needs to be preserved in this case. I suppose isOnlyCopiedFromConstantMemory(..) should return false when this case is hit.
- When walking over a memtransfer inst in isOnlyCopiedFromConstantMemory(..), check if the destination of the instruction is a PHI node and return false if so.
- Added the @nikic 's test and another loop test
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
118 | This is not sufficient: E.g. consider doing the memcpy to an addrspacecast of the phi or similar. I think what we'd need instead is that set IsOffset=true for phi nodes, so we'll recursively reject memcpys into it. Of course we would need a different name than IsOffset in that case. It would more generally indicate that the pointer is not equivalent to the base. |
If an alloca has a PHI use which happens to be an ancestor of a memtransfer
instruction, then we cannot determine at compile time whether the result of the
PHI is the alloca or an alternative incoming value. As a result, either the
source or the destination is indeterminate. I added a third variable to
ValuesToInspect, to track whether we have already seen a PHI.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
348–349 | If an incoming value is eligible to be replaced with a pointer from a different addrspace, other incoming values likely do not belong to the addrspace. This makes addrspacecast unavoidable. |
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
348–349 | Right, so I'm saying that would just make this illegal and you need to check the address spaces match. Is there a situation where we would want this to happen, that also isn't possible after InferAddressSpaces? |
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
348–349 | The function @addrspace_cast_remove_alloca in the test I have attached addresses this issue. |
- Eliminated addrspace casts in the PointerReplacer::replace() function
- If the PHI node under inspection resides in the default addrspace, do not allow replacing a value which copied from a non-default addrspace.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
63 | Why do we care about default address space here? I think what you probably want to check is PHI->getType()->getPointerAddressSpace() != TheCopy->getPointerAddressSpace()? But also, this code assumes that TheCopy is available before the PHI is seen, and I don't think anything guarantees that -- order of uses is unpredictable. I think you'd want to perform the bailout in PointerReplacer::collectUsers() instead? Maybe I'm misunderstanding the purpose here. |
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
55 | Why do we need the LastPHI variable? I'd have expected that just setting IsOffset=true is equivalent? (With a rename IsOffset -> ForbidCopy or something?) |
- Eliminated LastPHI as it had no valuable impact
- Removed the addrspace cast checks in isOnlyCopiedFromConstantMemory(..) as pointed out. Instead, determine if the alloca is directly used as an incoming value in a PHI while collecting users of an alloca in PointerReplacer::collectUsers(..). If the address spaces of the PHI and the destination of the Copy inst differ and there is a direct use of the alloca, an addrspace cast will be required to further allow the PHI for replacement. Since we do not allow for addrspace casts, reject the PHI at that moment.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
63 | I implemented it for the function @addrspace_diff_keep_alloca in the test file. I improved upon the solution by moving the changes over to collectUsers() which does make a lot more sense wrt my previous solution. I hope it seems fine now. |
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
118 | I had a revision previously where I was tracking PHI nodes seperately, as opposed to IsOffset. I replaced it with my changes in PointerReplacer::collectUsers(..). |
- PHI's need to be kept track of to avoid inserting them repeatedly into ValuesToInspect.
- Insert PHI into ValuesToInspect only when it hasn't seen before. Otherwise, continue with the use-def traversal.
@gandhi21299 I only landed the basic isOnlyCopiedFromConstantMemory() support, but the PointerReplacer part is still open. I'm think the logic there is still not quite correct -- will take another look tomorrow.
Rebased patch over committed changes: https://gist.github.com/nikic/ce69a66fdc0985d1af7a56bf024bc860
With additional tests added (in particular @addrspace_diff_keep_alloca_extra_gep) the patch causes an assertion failure.
You are currently going over phi operands and checking whether one of them is the memcpy dest, but I'm not sure what this check is intended to achieve, semantically. I think it just works by accident on existing test cases.
The problem is that we need a closed graph to perform the pointer replacement (can't have any external inputs) and additionally the worklist also needs to be ordered correctly, so that inputs are always replaced before the using instruction. The former is a hard requirement for this to work, the latter is a requirement of the current algorithm.
If you don't want to change the current replacement approach, then I think a good way to enforce this would be to do the following: When processing a phi node, check whether all phi operands are already in the worklist. If not, don't visit the phi yet (wait until everything is in the worklist). Add the phi to some extra set, and after collectUsers(), check that all phis are in the worklist. This is to handle the case where an input was not part of the graph and the phi is never added to the worklist.
- Revised PointerReplacer::collectUsers(..) as requested.
- Passes all tests under check-llvm-transforms, including replace-alloca-phi.ll
- Eliminated the redundant loop as pointed out by reviewer
- Created a new method that drives the recursive algorithm for collecting users, this was necessary to avoid publicizing the member variables of PointerReplacer.
Logic looks correct to me now. Looks like the test changes got lost though?
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
15 | Shouldn't be needed (already used in file). | |
251 | Copy argument no longer needed | |
272 | Could make ValuesToRevisit use Instruction * rather than Value * and save this cast. | |
286 | Can use any_of instead of for here and avoid the ValueInserted variable. |
I rebased the changes and there weren't any test changes in the reduced version of this patch.
- Replaced calls from collectUsers to collectUsersRecursive, which now generates the code we expect for @loop_phi_remove_alloca and @loop_phi_late_memtransfer_remove_alloca.
- requested updates
Thanks a bunch for such in-depth review, I will commit this patch.
Shouldn't be needed (already used in file).