This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Handle PHI nodes in PtrReplacer
ClosedPublic

Authored by gandhi21299 on Oct 18 2022, 1:38 PM.

Details

Summary

This patch adds on to the functionality implemented
in rG42ab5dc5a5dd6c79476104bdc921afa2a18559cf,
where PHI nodes are supported in the use-def traversal
algorithm to determine if an alloca ever overwritten
in addition to a memmove/memcpy. This patch implements
the support needed by the PointerReplacer to collect
all (indirect) users of the alloca in the case where a PHI
is involved. Finally, a new PHI is defined in the replace
method which takes in replaced incoming values and
updates the WorkMap accordingly.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
nikic added inline comments.Oct 19 2022, 1:03 AM
llvm/test/Transforms/InstCombine/replace-alloca-phi.ll
2

Use update_test_checks.py.

70

This test is too complicated. Please reduce it to the minimum needed to show the new transform.

There also needs to be a negative test where there is an unsupported use of the phi.

This revision now requires changes to proceed.Oct 19 2022, 1:03 AM
gandhi21299 marked 5 inline comments as done.
  • fixed posion value returns
  • fixed infinite loop cases by inserting a false check in PointerReplacer::collectUsers(Inst)
  • simplified test case and added negative tests for non-simple load and no memcpy
  • ran update_test_check.py
  • combine two if statements due to common blocks
nikic added inline comments.Oct 21 2022, 12:56 PM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
269

Why do we need to add the phi incoming values to the worklist? Can't this use the same code as bitcast/GEP?

Also, I think it would be good to add a test involving a loop phi, because I think as written, this code might go into infinite recursion in that case.

  • removed #include "llvm/ADT/STLExtras.h"
gandhi21299 added inline comments.Oct 21 2022, 1:38 PM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
269

I suppose the incoming values would have already been inserted prior to collecting users of the PHI. Thanks!

A loop test sounds like a great idea!

gandhi21299 added inline comments.Oct 21 2022, 1:56 PM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
269

@nikic Never mind, the incoming values are missing in the WorkMap when PHIs are being replaced replace if I skip inserting them.

  • adding a loop test
gandhi21299 marked an inline comment as done.Oct 21 2022, 3:10 PM
gandhi21299 retitled this revision from [InstCombine] Replace alloca with phi uses to [InstCombine] Handle PHI nodes when eliminating constant memcpy.Oct 22 2022, 3:15 AM
  • insert an assertion in the replace() method
nikic requested changes to this revision.Oct 31 2022, 9:21 AM

Just took another look at this. I believe the change to isOnlyCopiedFromConstantMemory() is fine, but the changes to the PointerReplacer are not. Consider these test cases:

@g1 = constant [32 x i8] zeroinitializer
@g2 = addrspace(1) constant [32 x i8] zeroinitializer

define i32 @test1(i1 %c, ptr %ptr) {
entry:
  %a = alloca [32 x i8]
  call void @llvm.memcpy.p0.p0.i64(ptr %a, ptr @g1, i64 32, i1 false)
  br i1 %c, label %if, label %join

if:
  br label %join

join:
  %phi = phi ptr [ %a, %if ], [ %ptr, %entry ]
  %v = load i32, ptr %phi
  ret i32 %v
}

define i32 @test2(i1 %c, ptr %ptr) {
entry:
  %a = alloca [32 x i8]
  call void @llvm.memcpy.p0.p1.i64(ptr %a, ptr addrspace(1) @g2, i64 32, i1 false)
  br i1 %c, label %if, label %join

if:
  br label %join

join:
  %phi = phi ptr [ %a, %if ], [ %ptr, %entry ]
  %v = load i32, ptr %phi
  ret i32 %v
}

declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
declare void @llvm.memcpy.p0.p1.i64(ptr, ptr addrspace(1), i64, i1)

The first one works, the second one (using PointerReplacer) will cause an assertion failure. The problem is that once we start to handle phis (or selects), additional operands become involved that might not be rooted in the alloca at all. To handle such cases, we need to make sure that all the phi argument are based on the alloca.

I'd personally recommend to split this patch in two parts. First do only the isOnlyCopiedFromConstantMemory() change (which is trivially correct), and then a separate one to deal with the more complex case of address space conversions.

While not strictly related to the patch, we probably need a limit on number of visited users in isOnlyCopiedFromConstantMemory() -- it's a pre-existing problem, but once we start looking through phis, it becomes more likely that we start looking through large use graphs.

llvm/test/Transforms/InstCombine/replace-alloca-phi.ll
3

Please use opaque pointers for new tests ([32 x i8] addrspace(4)* -> ptr addrspace(4) etc).

This revision now requires changes to proceed.Oct 31 2022, 9:21 AM
nikic added a comment.Oct 31 2022, 9:44 AM

I believe the change to isOnlyCopiedFromConstantMemory() is fine

I think I spoke too soon ... I can't test this right now, but probably this minor variant is going to be miscompiled:

define i32 @test1(i1 %c, ptr %ptr) {
entry:
  %a = alloca [32 x i8]
  br i1 %c, label %if, label %join

if:
  br label %join

join:
  %phi = phi ptr [ %a, %if ], [ %ptr, %entry ]
  call void @llvm.memcpy.p0.p0.i64(ptr %phi, ptr @g1, i64 32, i1 false)
  %v = load i32, ptr %phi
  ret i32 %v
}

While we can replace uses of %a with @g1, dropping the memcpy would be incorrect, as it would also drop the potential write to %ptr, which is not covered by the replacement.

This comment was removed by gandhi21299.

The minor variant compiles correctly actually, no transformation applies so the code stays the same.

nikic added a comment.Nov 1 2022, 1:56 AM

The minor variant compiles correctly actually, no transformation applies so the code stays the same.

I just tried this, and the memcpy does seem to get dropped:

@g1 = constant [32 x i8] zeroinitializer

define i32 @test1(i1 %c, ptr %ptr) {
entry:
  %a = alloca [32 x i8]
  br i1 %c, label %if, label %join

if:
  br label %join

join:
  %phi = phi ptr [ %a, %if ], [ %ptr, %entry ]
  call void @llvm.memcpy.p0.p0.i64(ptr %phi, ptr @g1, i64 32, i1 false)
  %v = load i32, ptr %phi
  ret i32 %v
}
 
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

; RUN: opt -S -instcombine < %s

@g1 = constant [32 x i8] zeroinitializer

define i32 @test1(i1 %c, ptr %ptr) {
entry:
  br i1 %c, label %if, label %join

if:                                               ; preds = %entry
  br label %join

join:                                             ; preds = %if, %entry
  %phi = phi ptr [ @g1, %if ], [ %ptr, %entry ]
  %v = load i32, ptr %phi, align 4
  ret i32 %v
}

; Function Attrs: argmemonly nocallback nofree nounwind willreturn
declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #0
arsenm added inline comments.Nov 1 2022, 1:50 PM
llvm/test/Transforms/InstCombine/replace-alloca-phi.ll
4

Drop the triple, this just needs target datalayout ="A5". Otherwise this will fail without amdgpu built

4

Probably should do "p5-32-A5"

@nikic Since alloca instructions are iterated more than once, shouldn't the Worklist under PointerReplacer remove instructions that have already been replaced (new instruction inserted without removing the old instruction)? The first function goes through a strange transformation when opaque pointers are used, where the replace(Instruction&) function is called on a GEP instruction which was already replaced in a previous instruction. This somehow causes the removal of the new GEP instruction.

nikic added a comment.Nov 3 2022, 3:50 AM

@nikic Since alloca instructions are iterated more than once, shouldn't the Worklist under PointerReplacer remove instructions that have already been replaced (new instruction inserted without removing the old instruction)? The first function goes through a strange transformation when opaque pointers are used, where the replace(Instruction&) function is called on a GEP instruction which was already replaced in a previous instruction. This somehow causes the removal of the new GEP instruction.

Not sure I follow, which IR function are you referring to here? Generally InstCombine will automatically clean up dead (side-effect-free) instructions, but it may be possible that this needs to be done explicitly once phis are involved (as they might no longer be trivially dead if there is a cycle).

gandhi21299 added inline comments.Nov 3 2022, 11:35 AM
llvm/test/Transforms/InstCombine/replace-alloca-phi.ll
4

Is that a complete specification? I am getting the following error:

error: Missing size specification for pointer in datalayout string

jmmartinez added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
60–64

Cosmetic review: What do you think about using the varadic template version of isa ? You could collapse this into isa<BitCastInst, AddrSpaceCastInst, PHINode>(I)

arsenm added inline comments.Nov 7 2022, 9:32 AM
llvm/test/Transforms/InstCombine/replace-alloca-phi.ll
4

target datalayout = "p5:32:32-A5"

gandhi21299 marked an inline comment as done.
  • In case one of the incoming values being replaced belongs to a different address space with respect to other incoming values, cast them to the target address space.
  • Included tests suggested by @nikic
  • Minor nits suggested by reviewers
  • removed the test backup file I added accidentally
gandhi21299 marked 3 inline comments as done.Nov 8 2022, 10:55 AM
gandhi21299 added inline comments.
llvm/test/Transforms/InstCombine/replace-alloca-phi.ll
4

isOnlyCopiedFromConstantMemory seems to depend on AMDGPUAliasAnalysis results. It returns false for the tests I added when I replace the triple string with the datalayout you suggested.

cc @bcahoon

@nikic Would you still prefer for this patch to be split?

  • removed Copy argument from PointerReplacer::collectUsers(...)
  • removed included headers
  • Eliminated noalias and readonly attributes for parameters as they are irrelevant to this patch
nikic requested changes to this revision.Nov 11 2022, 1:41 AM

I just tested the updated patch, and this case still gets miscompiled:

@g1 = constant [32 x i8] zeroinitializer

define i32 @test_memcpy_after_phi(i1 %c, ptr %ptr) {
entry:
  %a = alloca [32 x i8]
  br i1 %c, label %if, label %join

if:
  br label %join

join:
  %phi = phi ptr [ %a, %if ], [ %ptr, %entry ]
  call void @llvm.memcpy.p0.p0.i64(ptr %phi, ptr @g1, i64 32, i1 false)
  %v = load i32, ptr %phi
  ret i32 %v
}
 
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

The memcpy gets removed, even though it might be writing to the %ptr argument rather than the %a alloca.

llvm/test/Transforms/InstCombine/replace-alloca-phi.ll
4

You should be able to avoid making this AMDGPU specific by copying from a constant global, instead of copying from a special AMDGPU address space.

This revision now requires changes to proceed.Nov 11 2022, 1:41 AM
gandhi21299 marked 2 inline comments as done.Nov 11 2022, 12:35 PM

@nikic Great catch! The memcpy needs to be preserved in this case. I suppose isOnlyCopiedFromConstantMemory(..) should return false when this case is hit.

  • When walking over a memtransfer inst in isOnlyCopiedFromConstantMemory(..), check if the destination of the instruction is a PHI node and return false if so.
  • Added the @nikic 's test and another loop test
  • replaced target triple with target datalayout for generalizing across targets.
nikic added inline comments.Nov 12 2022, 12:57 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
114

This is not sufficient: E.g. consider doing the memcpy to an addrspacecast of the phi or similar. I think what we'd need instead is that set IsOffset=true for phi nodes, so we'll recursively reject memcpys into it. Of course we would need a different name than IsOffset in that case. It would more generally indicate that the pointer is not equivalent to the base.

gandhi21299 added inline comments.Nov 14 2022, 10:18 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
114

@nikic What do you mean by "need a different name"? I could simply emplace_back a pair of the instruction and a true to ValuesToInspect vector right?

If an alloca has a PHI use which happens to be an ancestor of a memtransfer
instruction, then we cannot determine at compile time whether the result of the
PHI is the alloca or an alternative incoming value. As a result, either the
source or the destination is indeterminate. I added a third variable to
ValuesToInspect, to track whether we have already seen a PHI.

arsenm added inline comments.Nov 18 2022, 3:06 PM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
51–52

Use the new C++17 tuple syntax,

344–345

In general InstCombine can't introduce a new address space cast when one didn't exist in the first place

353

Ditto

gandhi21299 marked an inline comment as done.Nov 21 2022, 10:28 AM
gandhi21299 added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
344–345

If an incoming value is eligible to be replaced with a pointer from a different addrspace, other incoming values likely do not belong to the addrspace. This makes addrspacecast unavoidable.

Using C++17 syntax when storing value returned from the ValuesToInspect vector.

arsenm added inline comments.Nov 21 2022, 10:34 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
344–345

Right, so I'm saying that would just make this illegal and you need to check the address spaces match. Is there a situation where we would want this to happen, that also isn't possible after InferAddressSpaces?

gandhi21299 added inline comments.Nov 21 2022, 10:40 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
344–345

The function @addrspace_cast_remove_alloca in the test I have attached addresses this issue.

  • Eliminated addrspace casts in the PointerReplacer::replace() function
  • If the PHI node under inspection resides in the default addrspace, do not allow replacing a value which copied from a non-default addrspace.
  • Applied clang-format
nikic added inline comments.Dec 8 2022, 6:44 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
62

Why do we care about default address space here? I think what you probably want to check is PHI->getType()->getPointerAddressSpace() != TheCopy->getPointerAddressSpace()? But also, this code assumes that TheCopy is available before the PHI is seen, and I don't think anything guarantees that -- order of uses is unpredictable. I think you'd want to perform the bailout in PointerReplacer::collectUsers() instead? Maybe I'm misunderstanding the purpose here.

nikic added inline comments.Dec 8 2022, 7:06 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
54

Why do we need the LastPHI variable? I'd have expected that just setting IsOffset=true is equivalent? (With a rename IsOffset -> ForbidCopy or something?)

gandhi21299 updated this revision to Diff 481948.EditedDec 11 2022, 1:29 PM
gandhi21299 marked an inline comment as done.
  • Eliminated LastPHI as it had no valuable impact
  • Removed the addrspace cast checks in isOnlyCopiedFromConstantMemory(..) as pointed out. Instead, determine if the alloca is directly used as an incoming value in a PHI while collecting users of an alloca in PointerReplacer::collectUsers(..). If the address spaces of the PHI and the destination of the Copy inst differ and there is a direct use of the alloca, an addrspace cast will be required to further allow the PHI for replacement. Since we do not allow for addrspace casts, reject the PHI at that moment.
gandhi21299 marked 4 inline comments as done.Dec 11 2022, 1:33 PM
gandhi21299 added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
62

I implemented it for the function @addrspace_diff_keep_alloca in the test file. I improved upon the solution by moving the changes over to collectUsers() which does make a lot more sense wrt my previous solution. I hope it seems fine now.

gandhi21299 marked an inline comment as done.

removed dbg statements

arsenm added inline comments.Dec 13 2022, 3:17 PM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
114

I think I'm getting lost in the different revisions of the patch; not sure what this comment is referring to anymore. There isn't anything about phi handling right here?

269

Don't like auto for addrspace

Use unsigned instead of auto

gandhi21299 marked 2 inline comments as done.Dec 13 2022, 11:19 PM
gandhi21299 added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
114

I had a revision previously where I was tracking PHI nodes seperately, as opposed to IsOffset. I replaced it with my changes in PointerReplacer::collectUsers(..).

gandhi21299 marked an inline comment as done.

Allow PHI nodes only if the number of incoming values is less than 5

  • PHI's need to be kept track of to avoid inserting them repeatedly into ValuesToInspect.
  • Applied clang-format and rebased
  • Insert PHI into ValuesToInspect only when it hasn't seen before. Otherwise, continue with the use-def traversal.

@nikic Thanks for committing on my behalf, may I close this revision?

@gandhi21299 I only landed the basic isOnlyCopiedFromConstantMemory() support, but the PointerReplacer part is still open. I'm think the logic there is still not quite correct -- will take another look tomorrow.

nikic requested changes to this revision.Jan 12 2023, 5:32 AM

Rebased patch over committed changes: https://gist.github.com/nikic/ce69a66fdc0985d1af7a56bf024bc860

With additional tests added (in particular @addrspace_diff_keep_alloca_extra_gep) the patch causes an assertion failure.

You are currently going over phi operands and checking whether one of them is the memcpy dest, but I'm not sure what this check is intended to achieve, semantically. I think it just works by accident on existing test cases.

The problem is that we need a closed graph to perform the pointer replacement (can't have any external inputs) and additionally the worklist also needs to be ordered correctly, so that inputs are always replaced before the using instruction. The former is a hard requirement for this to work, the latter is a requirement of the current algorithm.

If you don't want to change the current replacement approach, then I think a good way to enforce this would be to do the following: When processing a phi node, check whether all phi operands are already in the worklist. If not, don't visit the phi yet (wait until everything is in the worklist). Add the phi to some extra set, and after collectUsers(), check that all phis are in the worklist. This is to handle the case where an input was not part of the graph and the phi is never added to the worklist.

This revision now requires changes to proceed.Jan 12 2023, 5:32 AM
  • Revised PointerReplacer::collectUsers(..) as requested.
  • Passes all tests under check-llvm-transforms, including replace-alloca-phi.ll
  • Rebased and removed an irrelevant comment.
nikic added inline comments.Jan 13 2023, 1:39 PM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
289

I don't think you need this check (and as such this whole loop can be dropped).

302

I think this should only be done after the whole collectUsers() phase, currently you do it on each recursive call.

gandhi21299 marked 2 inline comments as done.
  • Eliminated the redundant loop as pointed out by reviewer
  • Created a new method that drives the recursive algorithm for collecting users, this was necessary to avoid publicizing the member variables of PointerReplacer.
  • clang-format
gandhi21299 retitled this revision from [InstCombine] Handle PHI nodes when eliminating constant memcpy to [InstCombine] Handle PHI nodes in PtrReplacer.Jan 13 2023, 2:29 PM
gandhi21299 edited the summary of this revision. (Show Details)
nikic added a comment.Jan 13 2023, 2:38 PM

Logic looks correct to me now. Looks like the test changes got lost though?

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
15

Shouldn't be needed (already used in file).

247

Copy argument no longer needed

268

Could make ValuesToRevisit use Instruction * rather than Value * and save this cast.

282

Can use any_of instead of for here and avoid the ValueInserted variable.

gandhi21299 marked 3 inline comments as done.Jan 13 2023, 3:04 PM

Logic looks correct to me now. Looks like the test changes got lost though?

I rebased the changes and there weren't any test changes in the reduced version of this patch.

gandhi21299 marked an inline comment as done.Jan 13 2023, 9:30 PM
  • Changes as requested by reviewer, updated tests accordingly
nikic added inline comments.Jan 14 2023, 12:29 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
269

Drop llvm:: prefix.

276

Missing the insertion into ValuesToRevisit here?

  • Replaced calls from collectUsers to collectUsersRecursive, which now generates the code we expect for @loop_phi_remove_alloca and @loop_phi_late_memtransfer_remove_alloca.
gandhi21299 marked 2 inline comments as done.Jan 14 2023, 11:23 AM
  • removed #include "STLExtras.h"
nikic added inline comments.Jan 16 2023, 1:17 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
55

Spurious change.

276–282

The ValuesToRevisit update should happen outside the any_of.

438

Spurious change.

llvm/test/Transforms/InstCombine/replace-alloca-phi.ll
11

Broken check line (should be CHECK).

gandhi21299 marked 5 inline comments as done.
  • changes as suggested by reviewer
nikic accepted this revision.Jan 17 2023, 3:59 AM

LGTM

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
276

Unneeded Inst capture.

279

Spurious change

This revision is now accepted and ready to land.Jan 17 2023, 3:59 AM
gandhi21299 marked 2 inline comments as done.
  • requested updates

Thanks a bunch for such in-depth review, I will commit this patch.

This revision was landed with ongoing or failed builds.Jan 17 2023, 9:56 AM
This revision was automatically updated to reflect the committed changes.