This is an archive of the discontinued LLVM Phabricator instance.

[SROA] Fold a PHI node if all its incoming values are the same
ClosedPublic

Authored by jingyue on Jul 24 2014, 11:07 AM.

Details

Summary

Fixes PR20425.

During slice building, if all of the incoming values of a PHI node are the same, replace the PHI node with the common value. This simplification makes alloca's used by PHI nodes easier to promote.

Diff Detail

Event Timeline

jingyue updated this revision to Diff 11851.Jul 24 2014, 11:07 AM
jingyue retitled this revision from to [SROA] Simplify PHI nodes before promoting the alloca.
jingyue updated this object.
jingyue edited the test plan for this revision. (Show Details)
jingyue added reviewers: chandlerc, nlewycky, eliben, meheff.
jingyue added a subscriber: Unknown Object (MLST).
eliben added inline comments.Jul 24 2014, 11:35 AM
lib/Transforms/Scalar/SROA.cpp
3101

by "of the PHI nodes" do you mean "PN"?

3102

Please document all the arguments of this function

3140

Does the logic have to be inverted? If you only use this method in a "!" clause, can't it check positively - i.e. origination from only NewAI? This can make reasoning about it simpler.

3143

whitespace around ":"

meheff edited edge metadata.Jul 24 2014, 1:06 PM

Just a couple suggestions which might simplify the code.

lib/Transforms/Scalar/SROA.cpp
3106

Couple bike-shed suggestions:

Maybe name the function PHINodeEquivalentToSource?

You could also replace both ptr sets with a map<PHINode *, bool> where membership indicates it's been visited and the value indicates equivalence.

3141

If you use a map<PHINode *, bool> for equivalence as mentioned above, you can cleanly do away with ToDelete. Just iterate through the map and replace the nodes with a true value. I, think, necessarily those will all be in PHIUsers.

jingyue updated this revision to Diff 11855.Jul 24 2014, 2:36 PM
jingyue edited edge metadata.

Thanks Eli and Mark for your comments! I believe the logic is simpler now.

jingyue added inline comments.Jul 24 2014, 2:38 PM
lib/Transforms/Scalar/SROA.cpp
3101

Changed the comment to "incoming values of PN".

3102

done

3106

Merged these data structures to one DenseMap, and renamed the function to PHINodeEquivalentToNewAlloca.

3140

done

3141

Agreed. Done.

3143

Ack'ed.

chandlerc added inline comments.Jul 24 2014, 2:39 PM
lib/Transforms/Scalar/SROA.cpp
3097–3102

I think this is the wrong approach in several ways.

Depth first search when there is an early exit shortcut is usually more expensive than testing the predicate breadth first while adding the nodes to a worklist. Also you should use a worklist rather than recursion or you would easily blow out the stack. Finally, I agree that we shouldn't need a set for originating from other and just a visit....

But before we dive into the details to fix these issues -- why is this the right approach? Walking *up* from the PHI operands seems necessarily much more work than walking *down* from the alloca. In fact, we're already going to walk down from the alloca because we iterate on every non-promoted alloca. Why can't we fold no-op PHI nodes in the partition builder the same way we do no-op selects? I think that would solve the problem you have here with no extra use-list traversal.

For future reference, keep in mind that uselist traversal is a cache-hostile thing. We should minimize the number of traversals and merge traversals where possible.

3102

Wo don't actually insist on that usually. I'd rather the arguments have descriptive names than comments personally.

jingyue updated this revision to Diff 11909.Jul 25 2014, 9:28 PM
jingyue retitled this revision from [SROA] Simplify PHI nodes before promoting the alloca to [SROA] Fold a PHI node if all its incoming values are the same.
jingyue updated this object.

Chandler, thanks for your suggestions! I think moving the logic to SliceBuilder makes perfect sense.

After modifying SliceBuilder::visitPHINode, I found it almost the same as visitSelectInst. Therefore, I also refactored the code to merge visitPHINode and visitSelectInst to one function. Let me know if it looks good to you.

chandlerc edited edge metadata.Jul 25 2014, 10:45 PM

Yea, this looks nice. I have a bunch of nit-picky comments below... Nothing really significant.

lib/Transforms/Scalar/SROA.cpp
332

Just use nullptr as the initial value?

333

I prefer to only use "I" (capitalized) for iterators. I'd use 'i' or 'Idx' here.

(and we should really add an iterator and range adaptor set to PHINode for these so you can use a range based for loop...)

336–344

I think this would be slightly simpler as:

if (isa<UndefValue>(IV))
  continue;

if (!CommanValue)
  CommonValue = IV;
else if (CommonValue != IV)
  return nullptr;
671–679

I'd probably put this in a 'foldPHINodeORSelectInst' static helper so you can write the if below:

if (Value *Result = foldPHINodeOrSelectInst(I)) {
699

How about just "Size"?

test/Transforms/SROA/phi-and-select.ll
567–568

I like to have all my CHECKs within the function body so I can find them and they don't get spliced. I also try to attach them to the instruction they reference. Can you put the label at the start of the function and then the -NOT after the alloca?

jingyue updated this revision to Diff 11915.Jul 26 2014, 7:51 AM
jingyue updated this object.
jingyue edited edge metadata.

All comments addressed

jingyue added inline comments.Jul 26 2014, 7:55 AM
lib/Transforms/Scalar/SROA.cpp
332

That would make foldPHINode return null for phi(undef, undef).

333

Done.

I agree. Will add a range for PHINode in a separate patch.

336–344

Done

671–679

done

699

done.

test/Transforms/SROA/phi-and-select.ll
567–568

done

Hi Duncan,

I tried SimplifyPHINode and it worked pretty well. Thanks!

That makes me consider using SimplifySelectInst on select instructions too. However, I found one regression test PR16651.2 would fail after this potential modification. We would transform

define void @PR16651.2() {
; This test case caused a crash due to failing to promote given a select that
; can't be speculated. It shouldn't be promoted, but we missed that fact when
; analyzing whether we could form a vector promotion because that code didn't
; bail on select instructions.
;
; CHECK-LABEL: @PR16651.2(
; CHECK: alloca <2 x float>
; CHECK: ret void

entry:
  %tv1 = alloca { <2 x float>, <2 x float> }, align 8
  %0 = getelementptr { <2 x float>, <2 x float> }* %tv1, i64 0, i32 1
  store <2 x float> undef, <2 x float>* %0, align 8
  %1 = getelementptr inbounds { <2 x float>, <2 x float> }* %tv1, i64 0, i32 1, i64 0
  %cond105.in.i.i = select i1 undef, float* null, float* %1
  %cond105.i.i = load float* %cond105.in.i.i, align 8
  ret void
}

to

define void @PR16651.2() {
entry:
  %cond105.in.i.i = select i1 undef, float* null, float* undef
  %cond105.i.i = load float* %cond105.in.i.i, align 8
  ret void
}

Is this transformation on PR16651.2 valid? If no, can somebody help me understand why it isn't?

Thanks,
Jingyue

  • Original Message -----

From: "Jingyue Wu" <jingyue@google.com>
To: jingyue@google.com, nlewycky@google.com, eliben@google.com, meheff@google.com, chandlerc@gmail.com
Cc: llvm-commits@cs.uiuc.edu
Sent: Sunday, July 27, 2014 11:21:24 AM
Subject: Re: [PATCH] [SROA] Fold a PHI node if all its incoming values are the same

Hi Duncan,

I tried SimplifyPHINode and it worked pretty well. Thanks!

That makes me consider using SimplifySelectInst on select
instructions too. However, I found one regression test PR16651.2
would fail after this potential modification. We would transform

define void @PR16651.2() {
; This test case caused a crash due to failing to promote given a
select that
; can't be speculated. It shouldn't be promoted, but we missed that
fact when
; analyzing whether we could form a vector promotion because that
code didn't
; bail on select instructions.
;
; CHECK-LABEL: @PR16651.2(
; CHECK: alloca <2 x float>
; CHECK: ret void

entry:
  %tv1 = alloca { <2 x float>, <2 x float> }, align 8
  %0 = getelementptr { <2 x float>, <2 x float> }* %tv1, i64 0, i32 1
  store <2 x float> undef, <2 x float>* %0, align 8
  %1 = getelementptr inbounds { <2 x float>, <2 x float> }* %tv1, i64
  0, i32 1, i64 0
  %cond105.in.i.i = select i1 undef, float* null, float* %1
  %cond105.i.i = load float* %cond105.in.i.i, align 8
  ret void
}

to

define void @PR16651.2() {
entry:
  %cond105.in.i.i = select i1 undef, float* null, float* undef
  %cond105.i.i = load float* %cond105.in.i.i, align 8
  ret void
}

Is this transformation on PR16651.2 valid?

It looks like this transformation is valid only if loading from an undef address always produces an undef value (and never traps). In the original code, loading from %1 always produced an undef value (because undef was explicitly stored into that location, but would have been undef anyway because it is an otherwise-fresh alloca), but did not trap. So if we always replaced undef addresses with some equivalently-sized constant-pool load (or something like that), then this might work. I don't know that we do that, however.

-Hal

If no, can somebody help
me understand why it isn't?

Thanks,
Jingyue

http://reviews.llvm.org/D4659


llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

jingyue updated this revision to Diff 11968.Jul 28 2014, 6:40 PM
jingyue edited the test plan for this revision. (Show Details)

To avoid redundancy, use SimplifyInstruction to fold PHI nodes and select instructions. One tricky case is when clobbering one operand in

select undef, a, b

we need to have the select always return the other operand by setting the condition "undef" to 0 or 1; otherwise, the transformation of clobbering dead operands may introduce new undefined behavior. Added test @simplify_undef in phi-and-select.ll to cover this case.

Updated basictest.ll per this change.

Can anyone take a look at the latest diff?

In the latest diff, I use SimplifyInstruction to fold both PHI nodes and select instructions. Doing so requires a minor fix in the logic of clobbering dead uses in select instructions: to clobber x/y in "select undef, x, y", we should change the condition from "undef" to 0/1. Otherwise, "select undef, undef, y" or "select undef, x, undef" can introduce more undefined behavior. This wasn't an issue in the old code, because the old code does not fold a select with undef condition.

Sorry, I've been chasing other bugs. Will look today.

Hi Chandler,

I hope your bug chasing is going well. Wondering if you can get a chance to take a look at this patch.

Thanks a lot,
Jingyue

Sorry. I tried to get back to this a couple of times but it required a lot of thought. And then your pings kept hitting me at bad times. =/

lib/Transforms/Scalar/SROA.cpp
689–708

This is pretty horrible. It makes me question the entire approach of cleaning up instructions here.

I hate phase ordering.

If we're going to go down the rabbit hole here if using instsimplify within SROA to address phase ordering issues, I think it would be much cleaner to do as part of the recursive user walk for the pointer value, and to actually RAUW the values as they simplify rather than doing the dead-operand-tracking thing here. Consider -- if we don't RAUW then we won't notice if doing so would cause some user of the select (perhaps another select or PHI) to collapse to all inputs being the same.

But that requires parameterizing PtrUseVisitor, teaching it to use instsimplify, teaching it to track RAUWs correctly, etc etc etc. Yuck. Absolutely terrible. This is why phase ordering problems are hard.

Ultimately, I think the suggestion to use instsimplify is wrong here. We really only want to handle the simplifications that arise *because* of mem2reg, and the primary one there is when all inputs to the PHI node become the same because we promoted them all to the same register (rather than N different loads). We don't need or want to handle the more complex simplifications because other passes will.

If we want to make SROA really powerful against phase ordering issues like this, I think it would need serious architectural changes.

So can we go back to the simpler patch after my round of nit-picky comments? Because I suspect that one is essentially "ready to go".

jingyue updated this revision to Diff 12856.Aug 22 2014, 2:20 PM
jingyue edited the test plan for this revision. (Show Details)

Thanks for the review, Chandler.

I see what you are saying. I agree the current dead-operand-trackign mechanism is a little cumbersome for the use of SimplifyInstruction.

I reverted the use of SimplifyInstruction and left the architectural changes as TODO. PTAL.

chandlerc accepted this revision.Aug 22 2014, 2:29 PM
chandlerc edited edge metadata.

Looks great, thanks!

This revision is now accepted and ready to land.Aug 22 2014, 2:29 PM
jingyue closed this revision.Aug 22 2014, 3:55 PM