Page MenuHomePhabricator

[InstCombine] Fold add/sub over phi node
Needs ReviewPublic

Authored by inouehrs on Mar 6 2018, 6:21 AM.



This patch adds FoldPHIUserOpIntoPred, which finds a phi node that is

  1. used by only one add/sub instruction and
  2. all its incoming values are ConstantInt or add/sub instruction used only by this phi node.

For such Phi nodes, we can eliminate add/sub instruction after the phi node by changing immediates of previous add/sub instructions.

Example of redundant add instruction to be optimized:

  %add = add i64 %a, 5 # -> immediate will be changed to 6
  br label %BB3
  %sub = sub i64 %b, 3 # -> immediate will be changed to 2
  br label %BB3
  %phi = phi i64 [ %add, %BB1 ], [ %sub, %BB2 ]
  %rc = add i64 %phi, 1 # -> will be removed

Additionally, if only one incoming value to the phi node does not meet above conditions, we can move the add/sub instruction to avoid partially redundant computation.
This optimization happens about 19k times (4k for full redundant cases and others for partially redundant cases) while building LLVM/Clang.

I am going to add similar optimization for logical operations as well as add/sub as a separate patch later.

Diff Detail

Event Timeline

inouehrs created this revision.Mar 6 2018, 6:21 AM

So far, I disabled this optimization for Hexagon since this patch affects Hexagon loop idiom recognition (test/CodeGen/Hexagon/loop-idiom/pmpy-mod.ll).
Should I add a flag in backend to control this optimization rather than explicitly checking the triple in this method?

It looks like a lot of the test changes are changing post-increment loop exit checks to pre-increment loop exit checks; that isn't a profitable transform.

This transform is a variation of some of the other phi of ops vs op of phi transforms we already do, just applied to a very limited case of expressions that would not normally be fully available, but where you've decide the cost is low.
It is also likely to screw up loop induction variables :)

None of these are fully redundant (because the expressions do not currently exist in the program in the places you have them), they are all partially redundant.
There is no need to special case it as you have in general, it just requires expression insertion. You are inserting them too, you are just repurposing part of an existing expression that you know will be dead.

The generalization is essentially what you get if you apply the phi-of-ops transform to PRE on top of NewGVN.

I doubt this should be done here because it's going to be hard to control the cost model or anything else here, and it's non-trivial to predict the effects.

inouehrs added a comment.EditedMar 7 2018, 1:32 AM

Thanks for the comments!
After disabling optimization for loop induction variables, optimization still happens more than 10k during the bootstrap. However, I cannot see visible change in code size and performance with benchmarks.
So, I will revisit this when I find a realistic case for which it matters.

As a simplified toy example, the following C code results in obviously redundant generated code:

unsigned long func(unsigned long v, bool b) {
  unsigned long rc;
  if (b) rc = v + 1;
  else rc = callee(v);
  return rc - 1;

generates a code sequence like

	addi 3, 3, 1
	addi 3, 3, -1

Yeah, this is a generally tricky case to get.
It requires a value based PRE that is going to do a phi of ops transform,
because this testcase is really:

if (b) phival1 = v + 1
else phival2 = callee(v)
phi = (phival1, phival2)
return phi -1

So it has to understand this is equivalent to:

if (b) phival1 = v+ 1, tmpval1 = v
else phival2 = callee(v), tmpval2 = 1 + callee
phi = (phival1, phival2)
return phi-1

Then it can see that v is already available, so tmpval1 is available, so
this is PREable.

None of our PRE's do.
NewGVN does detect this, but it's not a full redundancy, and full
NewGVN-PRE is not done.
IE you will get:
Processing instruction %10 = sub nsw i64 %.0, 1
Simplified <badref> = sub nsw i64 %6, 1 to variable i64 %0
Found phi of ops operand i64 %0 in %5
Cannot find phi of ops operand for <badref> = sub nsw i64 %8, 1 in block

You could trivially handle these kinds of case though by making
makePossiblePHIOfOps do PRE insertion depending on safety and availability
in preds. It computes availability already, and knows this is a PRE case.
It computes some parts of safety but not the parts you'd need to insert
memory instructions.
It also would not be "lifetime optimal" PRE.

It would subsume what we do for GVN's scalar PRE, but it would also require
multiple iterations of NewGVN (just as we require multiple iterations of
GVN's PRE) to get all cases instead of being able to do it in one step.

If you change it to a full redundancy, you'll see it will already perform
the no-cost transform:

c = v+2
printf("%d\n", c); make c used
if (b) phival1 = v + 1
else phival2 =v + 3
phi = (phival1, phival2)
return phi -1
c = v+2
printf("%d\n", c);
make c used
if (b) phival1 = v + 1
else phival2 =v + 3
phi = (phival1, phival2)
phiofops = (v, c)
return phiofops

(I didn't add global reassociation, so how far you get here depends on how
good a just InstSimplify does at reassociating for you. It's actually very
trivial to add though if we wanted)

Bryant Wong is working on a NewGVN PRE that could perform this in a
lifetime optimal way if you add the phi of ops transform we do in the
appropriate places.