Page MenuHomePhabricator

Fixing inst-combine not to drops nsw when combining adds into mul (PR19263)
ClosedPublic

Authored by dinesh.d on May 16 2014, 7:34 AM.

Details

Summary

I tried to fix this but I am not sure about few things.

  1. We always convert (X * C1) + (X * C2) to X *(C1 + C2) even if (C1 + C2)

overflows. e.g.

%1 = mul nsw i32 %x, 2147483647
%2 = mul nsw i32 %x, 3
%3 = add nsw i32 %1, %2

becomes

%1 = mul i32 %x, -2147483646

Is this what we want. In that case, we can blindly copy nsw from
add instruction to mul instruction. If so, I will update this patch.

  1. In this patch, I have tried not to combine add to mul for above case

but that is not enough to stop this conversion (getting converted from
some other code). If above conversion is wrong, I can go through code
and fix this. For now, this patch should take care of not dropping nsw
flags.

  1. I haven't added similar checks in visitSub. Actually, all c examples,

I can come up with, was not creating sub with nsw. If is it required to
add similar check, I can update this patch.

Diff Detail

Repository
rL LLVM

Event Timeline

dinesh.d updated this revision to Diff 9481.May 16 2014, 7:34 AM
dinesh.d retitled this revision from to Fixing inst-combine not to drops nsw when combining adds into mul (PR19263).
dinesh.d updated this object.
dinesh.d edited the test plan for this revision. (Show Details)
dinesh.d added reviewers: bkramer, rafael, majnemer.
dinesh.d added a subscriber: Unknown Object (MLST).
rafael edited edge metadata.Jun 4 2014, 12:37 PM

+ if (ConstantInt *CI1 = dyn_cast<ConstantInt>(LHS))
+ if (ConstantInt *CI2 = dyn_cast<ConstantInt>(RHS)) {
+ APInt ACI1 = CI1->getValue();
+ APInt ACI2 = CI2->getValue();
+ bool IsLHSNegative = ACI1.isNegative();
+ bool IsRHSNegative = ACI2.isNegative();

If both are constants, the result should have been folded before
getting here, no?

Can you rebase on trunk? WillNotOverflowSignedAdd has learned a few
extra tricks.

dinesh.d updated this revision to Diff 10138.Jun 5 2014, 5:03 AM
dinesh.d edited edge metadata.

rebased code on trunk

+ if (ConstantInt *CI1 = dyn_cast<ConstantInt>(LHS))
+ if (ConstantInt *CI2 = dyn_cast<ConstantInt>(RHS)) {
+ APInt ACI1 = CI1->getValue();
+ APInt ACI2 = CI2->getValue();
+ bool IsLHSNegative = ACI1.isNegative();
+ bool IsRHSNegative = ACI2.isNegative();

If both are constants, the result should have been folded before
getting here, no?

Here both constants are from different expression, one from LHS and other
one is from RHS.

Can you rebase on trunk? WillNotOverflowSignedAdd has learned a few
extra tricks.

rebased.

Here both constants are from different expression, one from LHS and other
one is from RHS.

All tests still pass if I delete the extra logic from
WillNotOverflowSignedAdd. Can you add an extra test of split this off
to an independent patch?

Cheers,
Rafael

All tests still pass if I delete the extra logic from
WillNotOverflowSignedAdd. Can you add an extra test of split this off
to an independent patch?

More importantly, it looks like this misoptimizes a few cases. For example,

define i16 @f(i16 %a) {

%b = mul i16 %a, 2
%c = mul i16 %a, 3
%d = add nsw i16 %b, %c
ret i16 %d

}

this would optimize it into

define i16 @f(i16 %a) {

%d = mul nsw i16 %a, 5
ret i16 %d

}

But I think that is invalid. There are inputs where the original add
would not overflow the add but the mul will. For example:

a = 0x2aab
b = 0x5556
c = 0x8001 the computation of c has an overflow, but that is fine
since the mul is not nws
d = 0xd557
no overflow in the add

Cheers,
Rafael

dinesh.d added a comment.EditedJun 5 2014, 11:35 AM

Yes, now WillNotOverflowSignedAdd can handle cases where
one of LHS and RHS is power of 2 and other has known zero
after high bit in first one. I will update test accordingly or if you
think that should go as independent patch I can surely do that.

For you other comment, I think we can not have nsw in addition
of 2 variables even if any one of them is not known to have nsw.
So I think nsw in add is wrong. But I may be wrong.

How about this input
a = 0x2aaa
b = 0x5554
c = 0x7ffe
d = 0xd552

b and c are ok here as mul instructions does not guaranty either
(overflow or no overflow) but add instruction guaranties no sign
overflow and d is overflowing.

dinesh.d updated this revision to Diff 10152.Jun 5 2014, 12:53 PM

updated patch as per comments

There is an interesting organizational problem:

Given

define i16 @f(i16 %a) {

%b = mul nsw i16 %a, 3
%c = mul nsw i16 %a, 7
%d = add nsw i16 %b, %c
ret i16 %d

}

With your patch we produce

define i16 @f(i16 %a) {

%d = mul i16 %a, 10
ret i16 %d

}

but given

define i16 @f(i16 %a) {

%b = mul nsw i16 %a, 2
%c = mul nsw i16 %a, 7
%d = add nsw i16 %b, %c
ret i16 %d

}

it produces

define i16 @f(i16 %a) {

%d = mul nsw i16 %a, 9
ret i16 %d

}

The problem comes from some cases being transformed in
SimplifyUsingDistributiveLaws before getting to your code.

Maybe what should happen is that as an early cleanup patch
dyn_castFoldableMul should be moved and used by
SimplifyUsingDistributiveLaws which would now handle all relevant
cases.

What do you think?

Yet another thing to be careful about. The proposed patch would have
disable the combining in

define i16 @f(i16 %a) {

%b = mul i16 %a, 2
%c = mul i16 %a, 32767
%d = add i16 %b, %c
ret i16 %d

}

I added the testcase in r210287.

dinesh.d updated this revision to Diff 10173.Jun 6 2014, 3:42 AM

I agree with the idea that these cases should get handled in SimplifyUsingDistributiveLaws and
I am working on generalized approach for that.

I have updated patch to just put following changes in record.
I have removed check for overflow because after analyzing few cases, I think we should not
disable transformation to pattern like this even if (2 + 32767)

%b = mul i16 %a, 2
%c = mul i16 %a, 32767
%d = add i16 %b, %c
ret i16 %d

as original case and transformed case will result in same output. When I pointed out case similar to
this, I thought that for some value of 'a', '(b+c)' might be positive but if we add constant which will become
negative due to overflow, result will mismatch.
I realized that as 'a' in integer, if a == 0, we will get 0 in any case and any value of a > 0, b and c will never
be less than 2 and 32767 respectively. so add result will always overflow.

What do you think? Should be disable above transform.

I have added a TODO to fix transform happening in SimplifyUsingDistributiveLaws and dropping nsw.
I will update it soon so that all these pattern gets handle in SimplifyUsingDistributiveLaws.

dinesh.d updated this revision to Diff 10270.Jun 10 2014, 3:08 AM

moved dyn_castFoldableMul logic to InstructionCombining.cpp.
Now SimplifyUsingDistributiveLaws handles all relevant cases.

Looks a lot better, thanks.

lib/Transforms/InstCombine/InstructionCombining.cpp
400 ↗(On Diff #10270)

Please name functions starting with a lowercase letter.
This function can be static.

418 ↗(On Diff #10270)

lowercase, static.

It seems a bit strange to return one operand and put the other one in an out value. Also, this function doesn't do the factorization, it just splits the expression.

Also, the above function for handling 1 could be merged into this (and with that avoid the extra indentation), no? Something like

Try to split Op0 and Op1 into (A op B) and (C op D) with the same operation
in both. This handles cases like matching a shift as a mul and B or D being
// implicitly 1.
static bool matchForFactorization(const BinaryOperator *Op0,

const BinaryOperator *Op1,
Instruction::BinaryOps &OpCode, Value *&A,
Value *&B, Value *&C, Value *&D)

Hi Dinesh,

It looks pretty good overall. Thanks for working on this! Please find my comments inlined.

lib/Transforms/InstCombine/InstructionCombining.cpp
445 ↗(On Diff #10270)

I'd move these definitions closer to their uses. e.g.,

Value *B = nullptr, *D = nullptr;
Value *A = FactorBinaryOps(Op0, B, LHSOpcode);
452 ↗(On Diff #10270)

Correct me if I'm wrong. I think the current logic seems unable to convert

%z = mul nsw i32 %x, %y
%z3 = mul nsw i32 %z, 3
%z4 = add nsw i32 %z, %z3

to

%z = mul nsw i32 %x, %y
%z4 = mul nsw i32 %z, 4

because if %z is in the format of %x * %y, you will not try the identity factorization (i.e., %z * 1).

If it is indeed a missed opportunity, it may not be too difficult to cover. For example, we can try all four combinations:

  1. LHS (binary op), RHS (binary op)
  2. LHS (identity), RHS (binary op)
  3. LHS (binary op), RHS (identity)
  4. LHS (identity), RHS (identity) // probably already handled somewhere else?

Anyway, we can do this later.

466 ↗(On Diff #10270)

I feel the logic of this IF and next IF (if (RightDistributesOverLeft) a bit difficult to follow. There is also some copy and paste. What about extracting the check on InnerCommutative out? Something like:

try(a, b, c, d);
if (op' is commutative) {
  try(b, a, c, d);
  try(a, b, d, c);
  try(b, a, c, d);
}

By doing this, inside each "try", we don't need to think about whether op' is commutative.

What do you think?

dinesh.d updated this revision to Diff 10499.Jun 17 2014, 7:12 AM

Update patch as per review comments

Thanks for review. I ahve added comments inline.

lib/Transforms/InstCombine/InstructionCombining.cpp
400 ↗(On Diff #10270)

updated.

I have kept this funtion so it can be used to get identity values
for other operations e.g. and, or etc. as well.

418 ↗(On Diff #10270)

Update function name and added static.
I have to keep both function separate to try to explore optimization, jingyue
has mentioned in his comment.

I have changed function parameter and return value. Let me know if it looks ok.

445 ↗(On Diff #10270)

updated.

452 ↗(On Diff #10270)

updated. New patch takes care of all except LHS (identity), RHS (identity).
it is getting handled before SimplifyUsingDistributiveLaws called.

466 ↗(On Diff #10270)

Updated patch to handle this. We can not just depend on isCommutative as
there are oparation which are not commutative but still can be distributed e.g.
"(X + Y) / Z = X/Z + Y/Z"

We do not handle this now but we might want to do it in future.

jingyue added inline comments.Jun 17 2014, 10:31 AM
lib/Transforms/InstCombine/InstructionCombining.cpp
466 ↗(On Diff #10270)

I think the first "try" in my pseudo-code handle this case. Right?

While "try" doesn't swap a and b or c and d, it considers both LeftDistributesOverRight and RightDistributesOverLeft. Therefore, it is able to convert "a / b + c / d" to "(a + c) / d".

dinesh.d added inline comments.Jun 17 2014, 1:41 PM
lib/Transforms/InstCombine/InstructionCombining.cpp
466 ↗(On Diff #10270)

Yes, if we considers both LeftDistributesOverRight and RightDistributesOverLeft, it handles all cases.
But then I think current code is handling all cases more efficiently.

If left distributes over right and inner op is commutative, then right will also distribute over left
too and vice versa. So Checking following 2 combinations if left distributes over right

try(a, b, c, d);
if (op' is commutative) {
  try(a, b, d, c);
}

and following 2 combination if right distributes ove left should be sufficient.

try(a, b, c, d);
if (op' is commutative) {
  try(a, b, d, c);
}

Because for inner commutative operation

if try(a, b, c, d) fails for LeftDistributesOverRight then try(b, a, d, c) will fail for RightDistributesOverLeft
if try(a, b, c, d) fails for RightDistributesOverLeft then try(b, a, d, c) will fail for LeftDistributesOverRight

similarly

if try(b, a, c, d) fails for LeftDistributesOverRight then try(a, b, d, c) will fail for RightDistributesOverLeft
if try(a, b, c, d) fails for RightDistributesOverLeft then try(a, b, d, c) will fail for LeftDistributesOverRight

Am I missing something?

typo :(

lib/Transforms/InstCombine/InstructionCombining.cpp
466 ↗(On Diff #10270)

Please read last line in previous reply as
if try(b, a, c, d) fails for RightDistributesOverLeft then try(a, b, d, c) will fail for LeftDistributesOverRight

Still reading the code, just wanted to post the easy bits first.

lib/Transforms/InstCombine/InstructionCombining.cpp
398 ↗(On Diff #10499)

Nit: don't duplicate the function name in the comment.

418 ↗(On Diff #10499)

same

442 ↗(On Diff #10499)

here too.

Mostly nits. This is LGTM with the issues on line 552 addressed, but please wait for Jingyue Wu to confirm that he is OK with it too.

Thanks!

lib/Transforms/InstCombine/InstructionCombining.cpp
415 ↗(On Diff #10499)

You have both a default case with a return and a return after the switch.

Given that we only handle Mul, it is probably better to write this as an if for now. When we add support for other opcodes it is easy to go back to an switch.

552 ↗(On Diff #10499)

This should be

if (Value *V = tryFactorization(Builder, DL, I, RHSOpcode, LHS,
                                getIdentityValue(LHSOpcode, LHS), C, D))

no?

jingyue added inline comments.
lib/Transforms/InstCombine/InstructionCombining.cpp
481 ↗(On Diff #10499)

Do we still need to try right distribution if left distribution already succeeds?

494 ↗(On Diff #10499)

A question: any chance to preserve NSW/NUW for TopLevelOpcode?

508 ↗(On Diff #10499)

I think it's better to write:

if (BinaryOperator *Op0 = dyn_cast<BinaryOperator>(LHS)) {
  if (isa<OverflowingBinaryOperator>(Op0)) {
    HasNSW &= Op0->hasNoSignedWrap();
}
515 ↗(On Diff #10499)

I may be over-concerned, but is this cast dangerous? CreateBinOp returns a ConstantExpr if its operands are both Constant. Note that ConstantInt op ConstantInt is not always foldable, e.g., ptrtoint (a global variable) + 5.

543 ↗(On Diff #10499)

(A op' B) op (C)?

466 ↗(On Diff #10270)

Good point. I agree

try(a, b, c, d);
if (op' is commutative) {
  try(a, b, d, c);
}

is enough.

Thanks for review and comments. I have replied them inline.

lib/Transforms/InstCombine/InstructionCombining.cpp
398 ↗(On Diff #10499)

updated.

415 ↗(On Diff #10499)

updated.

418 ↗(On Diff #10499)

updated.

442 ↗(On Diff #10499)

updated.

481 ↗(On Diff #10499)

No. We should not. I have updated code to check if left distribution already succeeds

494 ↗(On Diff #10499)

I will try to do that in next patches.

508 ↗(On Diff #10499)

updated.

515 ↗(On Diff #10499)

We have already check if 'isa<OverflowingBinaryOperator>(SimplifiedInst)' so casting SimplifiedInst to BinaryOperator should be safe.

543 ↗(On Diff #10499)

updated.

552 ↗(On Diff #10499)

updated code. last minute copy paste issue. it should be

getIdentityValue(RHSOpcode, LHS)

I will be more carefull to avoid these mistakes.

dinesh.d updated this revision to Diff 10526.Jun 17 2014, 11:09 PM

Updated patch as per comments. Thanks for review.

jingyue accepted this revision.Jun 18 2014, 3:04 PM
jingyue edited edge metadata.

Only one nit, otherwise LGTM.

lib/Transforms/InstCombine/InstructionCombining.cpp
515 ↗(On Diff #10499)

http://llvm.org/docs/doxygen/html/classllvm_1_1OverflowingBinaryOperator.html

OverflowingBinaryOperator is not a subclass of BinaryOperator. BinaryOperator is an Instruction, but OverflowingBinaryOperator can be either an Instruction or a ConstantExpr.

I agree the naming is a little confusing :)

This revision is now accepted and ready to land.Jun 18 2014, 3:04 PM
dinesh.d updated this revision to Diff 10613.Jun 19 2014, 1:12 AM
dinesh.d edited edge metadata.

Thanks for review. I have updated cast<BinaryOperator>(SimplifiedInst) to use dyn_cast.

dinesh.d closed this revision.Jun 19 2014, 1:37 AM
dinesh.d updated this revision to Diff 10616.

Closed by commit rL211261 (authored by dinesh).