This is an archive of the discontinued LLVM Phabricator instance.

[ADCE] Add handling of PHI nodes when removing control flow
ClosedPublic

Authored by david2050 on Aug 23 2016, 6:54 PM.

Details

Summary

This is part of a series of patches to evolve ADCE.cpp to support
removing of unnecessary control flow.

This patch updates the propagation of liveness information to handle
special properties of PHI nodes.

We still force all terminators live for now until we add code to
handle removing control flow in a later patch.

No changes to effective behavior with this patch

Previous patches:

D23559 [ADCE] Add control dependence computation
D23225 [ADCE] Modify data structures to support removing control flow
D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)

Diff Detail

Event Timeline

david2050 updated this revision to Diff 69069.Aug 23 2016, 6:54 PM
david2050 retitled this revision from to [ADCE] Add handling of PHI nodes when removing control flow.
david2050 updated this object.
david2050 added reviewers: mehdi_amini, nadav, majnemer.
david2050 added subscribers: llvm-commits, freik, twoh.
majnemer added inline comments.Aug 23 2016, 7:56 PM
lib/Transforms/Scalar/ADCE.cpp
286

I'd use auto *.

329

Ditto.

443

Ditto.

david2050 updated this revision to Diff 69072.Aug 23 2016, 8:46 PM
david2050 updated this object.

Use auto keyword

mehdi_amini added inline comments.Aug 23 2016, 8:48 PM
lib/Transforms/Scalar/ADCE.cpp
116

Can you revisit the comment? I find it hard to figure what you want to do here.

Also, as a side note, "verify" makes me think of a "check", which does not go well with the name of the method prefixed with "mark".

448

This is not clear to me, I have no idea what is going on in this code?
I mean I can follow what your doing, but I don't know why.

david2050 updated this revision to Diff 69103.Aug 24 2016, 6:04 AM

Expand comments on actions taken by makeLivePhiNodeInputs

mehdi_amini added inline comments.Aug 25 2016, 11:12 AM
lib/Transforms/Scalar/ADCE.cpp
116

s/poeration/operation

121

s/hte/the

469

There is no br label %cond.false here.

470

Thanks for the extensive comment and the example, unfortunately this is not really the information I was looking for.
Phi nodes are dependent on the control flow, and we need to mark predecessors live in general, I got that part, but what isn't just clear to me is *why* only *some* blocks are marked live.

For instance:

define i8 @foo(i1 %cond1, i1 %cond2, i8 %a, i8 %b, i8 %c, i8 %d) nounwind {
entry:
  br i1 %cond1, label %first_true, label %first_false
first_true:
  br i1 %cond2, label %second_true, label %second_false
first_false:
  br label %end
second_true:
  br label %end
second_false:
  br label %end
end: 
  %result = phi i8 [ %a, %first_false ], [ %b, %second_true ], [ %c, %second_false ] 
  ret i8 %result
}

Assuming none of the br label %end is live, and considering the only phi, as written, all predecessors will be marked live.
If I replace the phi with phi i8 [ %a, %first_false ], [ %a, %second_true ], [ %c, %second_false ] , then only %second_false is marked live.

Now, what I find annoying is that with the same edges but just changing the order of the operands, you would get a different results. For instance: phi i8 [ %c, %second_false ], [ %a, %first_false ], [ %a, %second_true ] now %second_false is *not* marked live and instead %first_false and %second_true are.

472

s/CommonReachngDefintion/CommonReachingDefinition/

david2050 updated this revision to Diff 69277.Aug 25 2016, 11:47 AM
david2050 marked 9 inline comments as done.

Address comments.
Remove code that is functionally redundant.

mehdi_amini added inline comments.Aug 25 2016, 11:50 AM
lib/Transforms/Scalar/ADCE.cpp
465

Sorry missed this one: S/tis/this

david2050 added inline comments.Aug 25 2016, 12:34 PM
lib/Transforms/Scalar/ADCE.cpp
470

You are correct the it is sensitive to the order. Marking one is sufficient for correctness. I have considered marking the lexically first one as live which would provide more stability but it is a rare occurrence so it is not clear if it is worth fixing. I don't have any collected data on this.

david2050 marked an inline comment as done.Sep 1 2016, 9:10 AM
david2050 added inline comments.
lib/Transforms/Scalar/ADCE.cpp
470

I ran an experiment by prioritize that we keep the lexically last defined value. This should bias toward forcing live earlier branch decisions which are likely to be forced live anyway if we mark a later branch live. This had minimal impact on the my set of C++ tests (two changes, in different directions).

dberlin added inline comments.
lib/Transforms/Scalar/ADCE.cpp
483

I'm curious why you bother with this detection?
Pretty much every other pass we have can already do a better job of it :)
That is, they can determine not just if they are dead or trivially equivalent, but produce equivalent values.

You are basically detecting the block is not necessary to produce that value to the phi node.
That's, IMHO, more of a redundancy elimination technique than a dead code one.
The code isn't dead, it's just equivalent to some other code.
Or am i missing something?

Do you have examples where the standard pass pipeline doesn't remove the block producing the equivalent value?

david2050 added inline comments.Sep 5 2016, 10:32 AM
lib/Transforms/Scalar/ADCE.cpp
483

This step is necessary otherwise we delete a branch which determines a live value. It is an artifact of the semantics of PHI nodes. This is about preserving existing structure not translating to a new form.

And Yes, outside of loops, the baseline code catches almost all cases. The primary motivation was dead, may-be-infinute loops.

dberlin added inline comments.Sep 5 2016, 11:17 AM
lib/Transforms/Scalar/ADCE.cpp
483

I understand the semantics of phi nodes :)

I'm asking why you don't just mark them *all* live.
Traditionally, in DCE, what would happen is that if the phi is live, you mark all values that are incoming to the phi as live (and in a control dependence DCE, any control dependences of the necessary edges)

Here, it seems like you are trying to figure out if the phi is useless by seeing if all branches have the same incoming value, and if so, you are arbitrarily choosing one so you can delete at least one branch.
That seems .. pointless.

It should only also matter in the case of either predecessors that only exist for control flow (IE empty basic blocks), or critical edges.

The only interesting loop case i'm aware of where this matters is:
if (b)
{
int i;
for (i = 0; i<1000; ++i);
j = 0;
}
return j;
}

But your current code won't allow deleting the loop in this case, AFAICT.
(among other things, you would need to determine what the control dependence would look like if the critical edge was split without splitting it. This is what GCC does)

dberlin added inline comments.Sep 5 2016, 7:20 PM
lib/Transforms/Scalar/ADCE.cpp
483

(and to be super clear, when i say "I'm asking why you don't just mark them *all* live.", i mean "all the arguments and edges of a given phi node", not "all phis").

david2050 added inline comments.Sep 6 2016, 9:12 AM
lib/Transforms/Scalar/ADCE.cpp
483

It is not all branches, it is all branches from dead predecessors since in the case that we should remove all such predecessors to be replaced with a single edge, we verify that there is a unique value for that edge. If not we select one predecessor to keep live. Consider

a = 0
if (x) { 
      if (y) a =1
      else  a = 2
 }
 else a  =3
 phi(...)

The phi node following this block has tree edges labeled with values 1,2,3 respectively.
The guarded basic blocks are empty because the assignments are folded into the phi node.
The branch decision at which tests 'y' is dead at this point because there are no live operations
in any block it controls.
Here we determine that we need to keep at least one of the empty blocks around to distinguish between 1 and 2, which models the behavior if the assignments to 'a' were instead explicitly in the original blocks.

Given a more complex example we might have

a = 0
if (x) { 
      if (z)  
         a=1
      else { 
         if (y)  { }  
         else { } 
      }
 }
 else a  =3
 phi(...)

Here there are 4 predecessors to the join, three are 2 but only two distinct values and we only
need to have the branch on 'z' live. The branch on 'y' need not be. This is why the code does not mark
all incoming edges live but does not do the extra work to try and pick and optimal choice (the path with a=1 in this case). The scenario is rare enough that it is not clear extra work is worth the effort.

In your example, I believe the current code will allow the remove the loop since the assignment to J is not control dependent on the loop but only on the branch testing 'b'. The subset of the CFG corresponding to the loop would be all marked dead and removed and the edge leaving the block holding the test on 'b' redirected to the block holding 'j=0'. I don't understand your reference to splitting a critical edge.

dberlin added inline comments.Sep 6 2016, 10:30 AM
lib/Transforms/Scalar/ADCE.cpp
483

(Sorry if this is short, i lost this comment once already).
Okay, we are talking past each other.
I understand what you are doing, and why you are trying to mark only certain values of a phi live.
I'm saying "this is not worth it, and is complex" *compared to just marking all of them*. You produced an example showing it removes a single block. I agree it can remove that block.
Instead of arguing by example, i produced data:

I ran your patch on ~5000 C++ packages as is, and then ran it on ~5000 C++ packages with it changed to just mark all phi node inputs as live when the phi node is live.

The number of cases i find where it makes a difference is ~0.01%.
The average size of the binaries in which it does something are ~16 bytes smaller (which, for those binaries, amounts to an average size change of 0.00005%)

That seems "not worth it" to me :)

Now, i'm just trying to provide data not actually to say "we shouldn't do it this way", but more "if we are going to add complexity, let's do it where we have data that shows it's worth it". We both can clearly come up with contrived examples where it matters.

So if you have data showing trying to only mark single branches of unique values live is worth it, great, add the data and let's stop arguing, and let's do it.

"In your example, I believe the current code will allow the remove the loop since the assignment to J is not control dependent on the loop but only on the branch testing 'b'. The subset of the CFG corresponding to the loop would be all marked dead and removed and the edge leaving the block holding the test on 'b' redirected to the block holding 'j=0'. I don't understand your reference to splitting a critical edge."

You should try it before you knock it :)

You may have to change it to j=b or add a new parameter somewhere, depending on what passes you run before it (otherwise the phi ends up with the constant propagated into it).

You will end up with a phi node in the end block that merges the else-condition value (IE from outside the condition entirely) of jwith the inside-condition value. Control-dependence will force you to keep the empty loop live due to the critical edge (again, whether the edge stays critical or not depends on where you end up in the pass pipeline)

This case is more common, it affects at least ~1% of the *packages*.
I did not fix it, so i can't say how much it saves or not saves.
It is common enough that gcc special cases it (as i said) and has regression tests to make sure the loop is eliminated by DCE.

david2050 added inline comments.Sep 6 2016, 11:12 AM
lib/Transforms/Scalar/ADCE.cpp
483

Thanks for great due diligence for the patch.

The experiment you ran seems doomed because the patches submitted so far are incomplete. If you look around line 305 there is loop which forces all branches live. This is temporary waiting for the next patch which includes the code to delete branches. Right now the patch reduces to prior behavior of not actually deleting in branches although it takes a long way to get there. Thus any changes you see based on your edit are curious since it should not have any impact on the generated code at all.

I certainly agree that even the relatively simple pick-one-value here may have no value. Perhaps we can accept this change as is, with a TODO to evaluate this point after the next (and final) patch is up? (Assuming there is nothing else to change along the way)

dberlin added inline comments.Sep 6 2016, 12:00 PM
lib/Transforms/Scalar/ADCE.cpp
483

I feel like you are implying something here.

In this case, I simply hacked in the right parts of the rest of the old patch (and a little work) before it was broken up before running the experiment. (You could have simply asked for the code if you are concerned).

You are talking about something that is < 50 lines of code to make work (control dependence DCE is not that hard, nor is this even the first implementation of it in LLVM!), so i'm not sure why you would think i didn't run the experiment properly?
(I could also point out i don't actually care about correctness of the generated code, only the maximum optimization value)

If you would like to actually run *your own experiment* show that this is worth it, you are, as i said, welcome. You may want to focus on that instead of trying to pick mine apart ;)

Right now, like I said, i see *nothing* that makes it seem like this is worth it, and your response here doesn't help that. If you want to produce data that says it's worth it, great!. Otherwise, I'm loathe to start by adding complexity and then try to take it away later. That never works.

david2050 added inline comments.Sep 6 2016, 12:30 PM
lib/Transforms/Scalar/ADCE.cpp
483

No implications, and apologies for suggesting you were incomplete in your analysis.

To be clear then, in your variant, you marked every basic block which is a predecessor of a live-phi as live?

I ran this test:

int foo(int a, int b, int N) {
  int j = 0;
  if (b) {
    int i;
    for (i = 0; i < N; i++)
      ;
    j = 1;
  }
  return j;
}

compiled to bc and then processed: opt -sroa -adce -adce-remove-loops and it did successfully remove the loop. I also tried this variant

int foo(int b, int N) {
  int j = 0;
  if (b) {
    int i;
    j = 1;
    for (i = 0; i < N; i++)
      ;
  }
  return j;
}

The final output for that one looked like:

define i32 @foo(i32 %b, i32 %N) #0 {
  %1 = icmp ne i32 %b, 0
  br i1 %1, label %2, label %3

; <label>:2:                                      ; preds = %0
  br label %3

; <label>:3:                                      ; preds = %2, %0
  %j.0 = phi i32 [ 0, %0 ], [ 1, %2 ]
  ret i32 %j.0
}

This snippet

int foo(int b, int j, int N) {
  if (b) {
    int i;
    for (i = 0; i < N; i++)
      ;
    j =	0;
  }
  return j;
}

via

clang -c -emit-llvm -O0 ...
opt -sroa -adce -adce-remove-loops -S ...

generates

define i32 @foo(i32 %b, i32 %j, i32 %N) #0 {
  %1 = icmp ne i32 %b, 0
  br i1 %1, label %2, label %3

; <label>:2:                                      ; preds = %0
  br label %3

; <label>:3:                                      ; preds = %2, %0
  %.0 = phi i32 [ %j, %0 ], [ 0, %2 ]
  ret i32 %.0
}
david2050 updated this revision to Diff 70635.Sep 7 2016, 7:13 PM

Change handling of PHI nodes to force predecessors live

david2050 updated this revision to Diff 70829.Sep 9 2016, 6:58 AM

Introduce CFLive field to mark blocks whose control dependence sources should be live; use this for PHI predecessors and blocks with live operations

mehdi_amini accepted this revision.Sep 19 2016, 8:20 AM
mehdi_amini edited edge metadata.

A lot simpler now!

LGTM, thanks.

This revision is now accepted and ready to land.Sep 19 2016, 8:20 AM
majnemer added inline comments.Sep 19 2016, 10:10 AM
lib/Transforms/Scalar/ADCE.cpp
391

I don't think you need llvm:: here.

394

Needs a space after 'if'

david2050 marked 2 inline comments as done.Sep 19 2016, 10:35 AM
david2050 added inline comments.
lib/Transforms/Scalar/ADCE.cpp
391

stupid xcode autocompletion :-)

david2050 updated this revision to Diff 71847.Sep 19 2016, 10:37 AM
david2050 marked an inline comment as done.
david2050 edited edge metadata.

Respond to David's comments

majnemer accepted this revision.Sep 19 2016, 10:51 AM
majnemer edited edge metadata.

LGTM

david2050 closed this revision.Sep 19 2016, 4:25 PM